5. SwiftForth-i386 Implementation

SwiftForth-i386 is designed to produce optimal performance in a 32-bit environment. This section describes the implementation of the Forth virtual machine in this context.

5.1 Implementation Overview

This section provides a summary of the important implementation features of SwiftForth. More detail on critical issues is provided in later sections.

5.1.1 Execution model

SwiftForth-i386 is a 32-bit, subroutine-threaded Forth. Subroutine threading is an implementation strategy in which references in a colon definition are compiled as subroutine calls, rather than as addresses that must be processed by an address interpreter.

Colon and code definitions do not have a code field distinct from the content of the definition itself; data structures typically have a code field consisting of a call to the code for that data type. At the end of a colon definition, the EXIT used in other Forth implementation strategies is replaced by a subroutine return.

A subroutine-threaded implementation lends itself to inline code expansion. SwiftForth takes advantage of this via a header flag indicating that a word is to be compiled inline or called. Many kernel-level primitives are designated for inline expansion by being defined with ICODE rather than by CODE. The compiler will automatically inline a definition whose INLINE field is set.

5.1.2 Code Optimization

More extensive optimization is provided by a powerful rule-based optimizer that can optimize over 200 common high-level phrases. This optimizer is normally on, but can be turned off for debugging or comparison purposes. Consider the definition of DIGIT, which converts a small binary number to a digit:

: DIGIT ( u -- char)
   DUP 9 > IF  7 +  THEN  [CHAR] 0 + ;

With the optimizer turned off, you would get:

SEE DIGIT
4078BF   4 # EBP SUB                    83ED04
4078C2   EBX 0 [EBP] MOV                895D00
4078C5   4 # EBP SUB                    83ED04
4078C8   EBX 0 [EBP] MOV                895D00
4078CB   9 # EBX MOV                    BB09000000
4078D0   403263 ( > ) CALL              E88EB9FFFF
4078D5   EBX EBX OR                     09DB
4078D7   0 [EBP] EBX MOV                8B5D00
4078DA   4 [EBP] EBP LEA                8D6D04
4078DD   4078F4 JZ                      0F8411000000
4078E3   4 # EBP SUB                    83ED04
4078E6   EBX 0 [EBP] MOV                895D00
4078E9   7 # EBX MOV                    BB07000000
4078EE   0 [EBP] EBX ADD                035D00
4078F1   4 # EBP ADD                    83C504
4078F4   4 # EBP SUB                    83ED04
4078F7   EBX 0 [EBP] MOV                895D00
4078FA   30 # EBX MOV                   BB30000000
4078FF   0 [EBP] EBX ADD                035D00
407902   4 # EBP ADD                    83C504
407905   RET                            C3 ok

With the optimizer turned on, the code is much more compact and efficient:

SEE DIGIT
41A40F   9 # EBX CMP                    83FB09
41A412   41A41B JLE                     0F8E03000000
41A418   7 # EBX ADD                    83C307
41A41B   30 # EBX ADD                   83C330
41A41E   RET                            C3

Another example shows the compiler’s ability to “fold” literals, and operations on literals, into shorter sequences.

: TEST ( addr -- x )
   DUP 6 CELLS + CELL+ $FFFF AND @ ;

SEE TEST
45A2F3   4 # EBP SUB                    83ED04
45A2F6   EBX 0 [EBP] MOV                895D00
45A2F9   18 # EBX ADD                   83C318
45A2FC   4 # EBX ADD                    83C304
45A2FF   FFFF # EBX AND                 81E3FFFF0000
45A305   0 [EBX] EBX MOV                8B1B
45A307   RET                            C3 

To experiment with this further, follow this procedure:

  1. Turn off the optimizer, by typing -OPTIMIZER
  2. Type in a definition
  3. Decompile it, using SEE <name>
  4. Turn the optimizer back on with +OPTIMIZER
  5. Re-enter your definition
  6. Decompile it and compare

Let’s do that for the last example above.

-OPTIMIZER

: TEST ( addr -- x )
   DUP 6 CELLS + CELL+ $FFFF AND @ ;

SEE TEST
481C36   -4 [EBP] EBP LEA               8D6DFC
481C39   EBX 0 [EBP] MOV                895D00
481C3C   -4 [EBP] EBP LEA               8D6DFC
481C3F   EBX 0 [EBP] MOV                895D00
481C42   6 # EBX MOV                    BB06000000
481C47   EBX 2 # SHL                    C1E302
481C4A   0 [EBP] EBX ADD                035D00
481C4D   4 [EBP] EBP LEA                8D6D04
481C50   4 [EBX] EBX LEA                8D5B04
481C53   -4 [EBP] EBP LEA               8D6DFC
481C56   EBX 0 [EBP] MOV                895D00
481C59   FFFF # EBX MOV                 BBFFFF0000
481C5E   0 [EBP] EBX AND                235D00
481C61   4 [EBP] EBP LEA                8D6D04
481C64   0 [EBX] EBX MOV                8B1B
481C66   RET                            C3

+OPTIMIZER

: TEST ( addr -- x )
   DUP 6 CELLS + CELL+ $FFFF AND @ ;

SEE TEST
481C77   -4 [EBP] EBP LEA               8D6DFC
481C7A   EBX 0 [EBP] MOV                895D00
481C7D   18 # EBX ADD                   83C318
481C80   4 [EBX] EBX LEA                8D5B04
481C83   FFFF # EBX AND                 81E3FFFF0000
481C89   0 [EBX] EBX MOV                8B1B
481C8B   RET                            C3 

At the end of each colon definition, the optimizer attempts to convert a CALL followed by a RET into a single JMP instruction. This process is known as tail recursion and is a common compiler technique. To prevent tail recursion on a definition (e.g. a word that performs implementation-specific manipulation of the return stack), follow the end of the definition with NO-TAIL-RECURSION.

Consider this simple example and note that DEBARK is called at the end of TEST. Without NO-TAIL-RECURSION, that would have been a JMP to DEBARK and the return stack would be out of balance, likely crashing the program.

: EMBARK ( n1 n2 -- )   R>  SWAP >R  SWAP >R  >R ;  NO-TAIL-RECURSION
: DEBARK ( -- n1 n2 )   R>  R> SWAP  R> SWAP  >R ;  NO-TAIL-RECURSION

: TEST ( -- n1 n2 )   \ Should return 3 4
   1 2 3 4 EMBARK 2DROP DEBARK ;

SEE TEST
481D52   -8 [EBP] EBP LEA               8D6DF8
481D55   EBX 4 [EBP] MOV                895D04
481D58   1 # 0 [EBP] MOV                C7450001000000
481D5F   2 # EBX MOV                    BB02000000
481D64   -8 [EBP] EBP LEA               8D6DF8
481D67   EBX 4 [EBP] MOV                895D04
481D6A   3 # 0 [EBP] MOV                C7450003000000
481D71   4 # EBX MOV                    BB04000000
481D76   481C9E ( EMBARK ) CALL         E823FFFFFF
481D7B   4 [EBP] EBX MOV                8B5D04
481D7E   8 [EBP] EBP LEA                8D6D08
481D81   481CF9 ( DEBARK ) CALL         E873FFFFFF
481D86   RET                            C3

5.1.3 Register usage

Following are the SwiftForth Virtual Machine (VM) register assignments:

RegisterUsage
EBXTop of stack
ESIUser area pointer
EDIStart of SwiftForth memory
EBPData stack pointer
ESPReturn stack pointer

All other registers are available for use without saving and restoring.

Note that the processor stack pointer is now used for the return stack; this is a consequence of the subroutine-threaded model. This implementation uses EBX to cache the top stack item, as noted above, which further improves performance.

SwiftForth’s dictionary occupies a single, contiguous, flat 32-bit address space in virtual memory. SwiftForth is position-independent, which means it can run wherever it happens to be loaded. This means that compiled address references are relative; however, when words that reference data objects (e.g., things constructed with CREATE, VARIABLE, etc.) are executed, they return an absolute address.

By being position independent, SwiftForth simplifies and speeds up all host system interactions. This feature is a natural consequence of subroutine threading, because the i386 calls use relative addresses. Forth execution tokens are zero-relative to the start of the run-time memory space, but they are used only internally and, thus, do not need conversion.

Therefore, although it is efficient for data objects to return absolute addresses, these must never be compiled as literals into definitions. Instead, always execute the object name at run time to get its address.

5.1.5 Stack Implementation and Rules of Use

The Forth virtual machine has two stacks with 32-bit items, which in SwiftForth are located in the stack frame assigned by the host system when SwiftForth is loaded. Stacks grow downward in address space. The return stack is the CPU’s subroutine stack, and it functions analogously to the traditional Forth return stack (i.e., carries return addresses for nested calls). A program may use the return stack for temporary storage during the execution of a definition, subject to the following restrictions:

  • A program shall not access values on the return stack (using R@, R>, 2R@, or 2R>) that it did not place there using >R or 2>R.
  • When within a DO loop, a program shall not access values that were placed on the return stack before the loop was entered.
  • All values placed on the return stack within a DO loop shall be removed before I, J, LOOP, +LOOP, or LEAVE is executed.
  • In a definition in which local variables will be used, values may not be placed on the return stack before the local variables declaration.
  • All values placed on the return stack within a definition shall be removed before the definition is terminated or before EXIT is executed.

The dictionary can accommodate word names up to 254 characters in length.

There is no address alignment requirement in SwiftForth-i386.

5.2 Memory Organization

SwiftForth’s dictionary resides in a memory space provided by the host system whose size is fixed when SwiftForth is loaded. The current memory allocation and usage can be displayed with .ALLOCATION.

The SwiftForth memory space begins at ORIGIN. The dictionary grows toward high memory.

The Standard Forth command UNUSED returns on the stack the current amount of free space remaining in the dictionary; the command FYI displays the origin, current value of HERE (top of dictionary), and remaining free space.

When SwiftForth is loaded, the host system allocates a generous stack space to the main SwiftForth process. SwiftForth reserves the top portion of this for its data stack, and the balance is left for the return stack.

SwiftForth is an inherently multitasked system. When booted, it has a single task, whose name is OPERATOR. OPERATOR’s user area is allocated above the kernel. You may define and manage additional tasks, as described in a subsequent section.

.ALLOCATION
( — )
Displays the current SwiftForth memory allocation and usage.
MEMTOP
( — addr )
Return the address of the top of SwiftForth’s committed memory.
ORIGIN
( — addr )
Return the address of the origin of SwiftForth’s committed memory.
+ORIGIN
( xt — addr )
Convert an execution token to an address by adding ORIGIN.
-ORIGIN
( addr — xt )
Convert an absolute address to a relative address by subtracting ORIGIN.
UNUSED
( — n )
Return the amount of available memory remaining in the dictionary.

5.3 Control Structure Balance Checking

The SwiftForth colon definition compiler performs an optional control structure balance check at the end of each definition. If any control structure is left unbalanced, SwiftForth will abort with this error message:

Unbalanced control structure

This feature can be turned on and off with the words +BALANCE and -BALANCE. The default is on.

Here is an advanced use of control structures that passes control between two high-level definitions, but leaves the first definition out of balance until the IF and BEGIN are resolved in the second definition:

-BALANCE
: DUMIN ( d1 d2 -- d3 )   2OVER 2OVER DU< IF  BEGIN  2DROP ;
: DUMAX ( d1 d2 -- d3 )   2OVER 2OVER DU< UNTIL  THEN  2SWAP 2DROP ;
+BALANCE
-BALANCE
( — )
Turn off control structure balance checking.
+BALANCE
( — )
Turn on control structure balance checking. This is the default setting.

5.4 Dynamic Memory Allocation

SwiftForth supports the Forth Standard memory allocation wordset. ALLOCATE gets memory from the host system’s virtual memory pool, which is outside SwiftForth’s dictionary space.

This is a particularly useful strategy for large buffers, because it avoids enlarging the size of a saved program (which would be the result of using BUFFER: or a CREATE … ALLOT sequence).

Virtual memory allocated by your program does not have to be explicitly released, as it will be automatically released when the program terminates. However, memory allocated for short-term use should be released by your program.

5.5 Dictionary Management

This section describes the layout and management of the SwiftForth dictionary.

5.5.1 Dictionary Structure

The SwiftForth dictionary includes code, headers, and data. There is no separation of code and data spaces in this implementation.

Dictionary header layout:

FieldSizeDescription
LocateCellLocation of source code for this word (e.g. source file, line number).
LinkCellRelative link to previous definition in this word list.
SentinelByteMarker used to identify the start of the name field. Always FFHEX.
FlagsByteSee flag byte layout below
CountByteLength of the name field in bytes (1..254)
NameBytesNumber of bytes specified by the Count field
InlineByteIf non-zero, the number of bytes to inline in place of a call to this word

Notes:

  • The only byte in the header that can have the value FFHEX is the Sentinel field
  • Expansion of an inline definition does not include a trailing RET if one is present.

Flags byte layout:

Bit(s)NameDescription
7SmudgeIf set, the word cannot be found by a dictionary search.
6ImmediateIf set, this is an IMMEDIATE word.
5..0ReservedReserved for future use

SwiftForth also provides the following words for managing and navigating various parts of a definition header.

IMMEDIATE
( — )
Set the Immediate bit in the most recently created header.
+SMUDGE
( — )
Set the Smudge bit in the flags byte, rendering the name invisible to the dictionary search. This bit is set for a colon definition while it is being constructed, to avoid inadvertent recursive references.
-SMUDGE
( — )
Clear the Smudge bit.
>BODY
( xt — addr )
Return the parameter field address for the definition xt.
BODY>
( addr — xt )
Return the xt corresponding to the parameter field address addr.
>CODE
( xt — addr )
Return the code address addr corresponding to xt.
CODE>
( addr — xt )
Return the xt corresponding to the code address addr.
>NAME
( xt — addr )
Return the address of the name field for the definition xt.
NAME>
( addr — xt )
Return the xt corresponding to the name at addr.

5.5.2 Wordlists and Vocabularies

Forth provides for the dictionary to be organized in multiple linked lists. This serves several purposes:

  • Shortens dictionary searches
  • Allows names to be used in different contexts with different meanings (as often occurs in human languages)
  • Allows internal words to be protected from inappropriate or unintentional use or redefinition
    to enable the programmer to control the order in which various categories of words are searched.

Such lists are called wordlists. Standard Forth provides a number of system-level words for managing wordlists, discussed in Forth Programmer’s Handbook. These facilities are fully implemented in SwiftForth.

In addition, SwiftForth defines a number of vocabularies. A vocabulary is a named wordlist. When the name of a vocabulary is invoked, its associated wordlist will be added to the top of the search order. A vocabulary is defined using the word VOCABULARY (also discussed in Forth Programmer’s Handbook).

New definitions are linked into the current wordlist, which is normally a vocabulary. However, there may be times when you want to manage a special wordlist outside the system of Forth defining words and vocabularies.

The word WORDLIST creates a new, unnamed wordlist, and returns a unique single-cell numeric identifier for it called a wid (wordlist identifier). This may be given a name by using it as the argument to CONSTANT.

The word (WID-CREATE) will create a definition from a string in a specified wordlist identifier, given its wid.

In this example, the string parameters for FOO and the wid returned by FORTH-WORDLIST are passed to (WID-CREATE):

S" FOO" FORTH-WORDLIST (WID-CREATE)

To search a wordlist of this type, you may use SEARCH-WORDLIST. It takes string parameters for the string you’re searching for and a wid. It will return a zero if the word is not found; if the word is found, it will return its xt and a 1 if the definition is immediate, and a -1 otherwise.

Wordlists are linked in multiple strands, selected by a hashing mechanism, to speed up the dictionary search process. Wordlists in SwiftForth may have any number of strands, and the user can set the system default for this. The default number of strands in a wordlist is 31.

WORDLIST
( — wid )
Create a new empty wordlist, returning its wordlist identifier.
(WID-CREATE)
( addr u wid — )
Create a definition for the counted string at addr, in the wordlist wid.
SEARCH-WORDLIST
( addr u wid — 0 | xt 1 | xt -1 )
Find the definition identified by the string addr u in the wordlist identified by wid. If the definition is not found, return zero. If the definition is found, return its execution token xt and 1 if the definition is immediate, -1 otherwise.
#STRANDS
( — addr )
Variable containing the number of strands for subsequent wordlist creation. Its default value is 31.

5.5.3 Packages

Encapsulation is the process of containing a set of entities such that the members are only visible through a user-defined window. Object-oriented programming is one kind of encapsulation. Another is when a word or routine requires supporting words for its definition, but which have no interest to the “outside” world.

Packages are a technique for encapsulating groups of words. This is useful when you are writing special-purpose code that includes a number of support words plus a specific API. The words that constitute the API need to be public (globally available), whereas the support words are best hidden (private) in order to keep dictionary searches short and avoid name collisions. Packages in SwiftForth are implemented using wordlists.

The simplest way to show how packages work is with an example.

PACKAGE MYAPPLICATION

PACKAGE defines a named wordlist, and places a set of marker values (referred to as tag in the glossary below) on the data stack. These marker values will be used by the words PRIVATE and PUBLIC to specify the scope of access for groups of words in the package, and must remain on the stack throughout compilation of the package.

Initially words defined in a package are PRIVATE. This means that the system variable CURRENT, which indicates the wordlist into which to place new definitions, is set to the newly created wordlist MYAPPLICATION.

Now we define a few private words. These words are the building blocks for the application, but are considered not to be generally useful or interesting. They are available for use while the package is being compiled, and are available at any time by explicit wordlist manipulation.

: WORD1 ( -- ) ...  ;
: WORD2 ( -- ) ...  ;
: WORD3 ( -- ) ...  ;

PUBLIC words are the words that are available to the user as the API, or to make the package accessible from other code. These words are placed into whatever wordlist was CURRENT when the package was opened. PUBLIC words may reference any words in the PRIVATE section, as well as any words normally available in the current search order.

PUBLIC
: WORD4 ( -- )   WORD1 WORD2 DUP + ;
: WORD5 ( -- )   WORD1 WORD3 OVER SWAP ;

We can switch back to PRIVATE words anytime.

PRIVATE
: WORD6 ( -- )   WORD1 WORD5 DUP + OVER ;
: WORD7 ( -- )   WORD1 WORD4 WORD6 SWAP ROT DROP ;

We can switch between PUBLIC and PRIVATE as many times as we wish. When we are finished, we close the package with the command:

END-PACKAGE

With the package closed, only the PUBLIC words are still visible.

If you need to add words to a previously-constructed package, you may reopen it by reasserting its defining phrase:

PACKAGE MYAPPLICATION

The word PACKAGE will create a new package only if it doesn’t already exist; so using PACKAGE with an already-created package name reopens it. After reopening, the normal PUBLIC, PRIVATE, and END-PACKAGE definitions apply.

PACKAGE
( — tag )
If the package name has been previously defined, open it and return its tag. Otherwise, create it and return a tag.
PRIVATE
( tag — tag )
Mark subsequent definitions invisible outside the package. This is the default condition following the use of PACKAGE.
PUBLIC
( tag — tag )
Mark subsequent definitions available outside the package.
END-PACKAGE
( tag — )
Close the package.

5.5.4 Windows Constants

This section applies to SwiftForth for Windows only.

Windows programming uses a large number of named constants, such as GENERIC_READ. The system would be burdened to include all of these as Forth definitions. The SwiftForth compiler contains a link to a precompiled wordlist (included with SwiftForth), and the dictionary search mechanism is extended to include it if the parsed text is not a known word or a valid number.

References to Windows constants are compiled as literals.

5.5.5 Dictionary Search Extensions

Because numerous entities can be referenced in SwiftForth beyond just defined words and numbers, the dictionary search mechanism has been extended. The full search includes the following steps (listed in the order performed):

  • Search the list of currently active local variables (if any)
  • Search the dictionary, according to the current search order
  • Try to convert the word as a number (including floating point)
  • Check the Windows Constants wordlist (Windows systems only)
    • Abort, with the error message “?”

    5.6 Terminal-type Devices

    SwiftForth provides a standard API for terminal-type devices (those that handle character I/O) that is described in the Terminal Input and Terminal Ouput topics in Forth Programmer’s Handbook. Most implementations handle the existence of varied actual character-oriented devices by vectoring the standard words KEY, EMIT, TYPE, etc., to perform appropriate behaviors for each device supported, along with a mechanism for selecting devices.

    5.6.1 Device Personalities

    In SwiftForth, the collection of device-specific functions underlying the terminal API is called a personality. Personalities are useful for making specialized serial-type destinations; all the serial I/O words in SwiftForth (e.g., TYPE, EMIT, KEY, etc.) are implemented as vectored functions whose actual behavior depends upon the current personality. SwiftForth’s command window has a specific personality, for example, which is different from the one supported for buffered I/O.

    A personality is a data structure composed of data and execution vectors; the first two cells contain the number of bytes of data and the number of vectors present. Any SwiftForth task or callback may have its own current personality; a pointer to the current personality is kept in the user variable ‘PERSONALITY.

    A description of a personality data structure is given below. When defining a personality for a device that supports only a subset of these, the unsupported ones must be given appropriate null behaviors. In most cases, this can be provided with ‘ NOOP (address of a word that does nothing); or you could use DROP or 2DROP to discard parameters.