.de Ls .RS .nr L 0 1 .. .de Le .RE .LP .. .de Np .IP (\\n+L) .25i .. .ds ar \v'2p'\s18\(->\s0\v'-2p' .ds sd \s8\v'.2m'\h'-0.4n' .ds su \v'-.2m'\s0 .ds ex \fIarg\fP .ds e1 \fIarg\*(sd1\*(su\fP .ds e2 \fIarg\*(sd2\*(su\fP .ds e3 \fIarg\*(sd3\*(su\fP .ds e4 \fIarg\*(sd4\*(su\fP .ds e5 \fIarg\*(sd5\*(su\fP .ds ei \fIarg\*(sdi\*(su\fP .ds en \fIarg\*(sdn\*(su\fP .ds e0 \fIarg\*(sd0\*(su\fP .ds xx \fIexpr\fP .ds x1 \fIexpr\*(sd1\*(su\fP .ds x2 \fIexpr\*(sd2\*(su\fP .ds x3 \fIexpr\*(sd3\*(su\fP .ds x4 \fIexpr\*(sd4\*(su\fP .ds x5 \fIexpr\*(sd5\*(su\fP .ds xi \fIexpr\*(sdi\*(su\fP .ds xn \fIexpr\*(sdn\*(su\fP .ds x0 \fIexpr\*(sd0\*(su\fP .ds v0 \fIvar\fP .ds v1 \fIvar\*(sd1\*(su\fP .ds v2 \fIvar\*(sd2\*(su\fP .ds v3 \fIvar\*(sd3\*(su\fP .ds vi \fIvar\*(sdi\*(su\fP .ds vn \fIvar\*(sdn\*(su\fP .ds ax \fIarg\fP .ds a1 \fIarg\*(sd1\*(su\fP .ds a2 \fIarg\*(sd2\*(su\fP .ds a3 \fIarg\*(sd3\*(su\fP .ds a4 \fIarg\*(sd4\*(su\fP .ds a5 \fIarg\*(sd5\*(su\fP .ds ai \fIarg\*(sdi\*(su\fP .ds an \fIarg\*(sdn\*(su\fP .ds a0 \fIarg\*(sd0\*(su\fP .de St .ta 1.0iR +.5i 4i .. .de S1 .ta 0.75i .. .de Pt .ta 0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i .. .TR 83-10d .DA "June 1983; Revised July 1983,\^ January 1984,\^ June 1984,\^ and August 1984" .Gr .TL Porting the UNIX Implementation of Icon .AU William H. Mitchell .AB This document explains how to port the UNIX implementation of the Icon programming language. The Icon system is composed of a translator,\^ a linker,\^ and an interpreter. Procedures for porting each system component are described in detail. This document is meant to be a companion to the Icon ``tour'' (TR 84-11) and the source code for the system. .AE .SH Introduction .PP This document describes how to port the Version 5 Icon interpreter to a \*U environment. .Un There is both an interpreter and a compiler available for Icon; this document only addresses porting the interpreter. The Icon system has three major components: a translator,\^ a linker,\^ and an interpreter. 
The translator and the linker are entirely written in C and porting them is merely a matter of setting constant values that are appropriate for the target machine. Portions of the interpreter are written in assembly language and thus must be written anew for each machine. The interpreter also contains a very small amount of C code that must be written on a per-machine basis. .PP The sections of this document that describe the porting of the translator and the linker are straightforward,\^ being merely a description of a process. While porting the translator and the linker is a task of following instructions,\^ porting the interpreter is a task of design and programming. The approach taken is to describe what function each routine must perform and how it is implemented in the VAX\u\(dg\d version of Icon. The porter's job is to determine how to .FS \u\(dg\dVAX is a trademark of Digital Equipment Corporation. .FE implement the various routines on the target machine. .PP In light of the increasing popularity of the C language and the availability of C compilers for non-UNIX environments,\^ it is quite possible that one may desire to port Icon to a non-UNIX environment. Because the matter of porting a UNIX program to a non-UNIX environment is a problem in itself,\^ it is not addressed in this document. Rather,\^ this document assumes that the target environment is UNIX. This is not to say that porting Icon to a non-UNIX environment is not feasible. Icon is not strongly bound to UNIX,\^ the primary association being that Icon is written in C. It is anticipated that most C systems that are available for a non-UNIX environment will provide most of the UNIX-independent C standard functions as part of a library. If such a library is available,\^ it should be possible to port Icon without great difficulty. .PP This document is a companion document of the Icon ``tour''\^[1] and should be studied with the source code for Version 5.8 of Icon at hand. 
In particular,\^ the porter should be familiar with the information contained in the tour. .PP The sections of this document that describe the VAX assembly language code attempt to explain the operation of instructions when the operation is not obvious. However,\^ this document does assume that the porter has a rudimentary familiarity with the basic concepts of the VAX-11 architecture\^[2]. .SH C Compiler Requirements .PP Because there is no standard for the C programming language,\^ it is difficult to say how ``standard'' the usage of C in the system is. The system was developed using the V7 C compiler,\^ often referred to as the Ritchie compiler\^[3]. The system was later ported to the VAX using the \fIPortable C Compiler\fP\^[4] and no serious problems were encountered. .PP In addition to supporting ``full'' C,\^ a few specific requirements and non-requirements are made on the C compiler: .Ls .Np The compiler must support both assignment and call-by-value for structures. .Np The compiler need not support bit field operations. .Np Arguments to C functions must be stored in consecutive,\^ ascending memory locations. .Np There may be problems if \*Msizeof(int)\fP and \*Msizeof(char *)\fP are not the same,\^ but no definite problems are known. .Np It is believed that there are great,\^ perhaps insurmountable problems,\^ if \*Msizeof(char *)\fP is not equal to \*Msizeof(int *)\fP. .Le .SH System Testing .PP The test programs and testing procedures to be used for porting Icon are described in \^[5]. At various points in this document,\^ the porter is directed to test the system component just completed. At such times,\^ the porter should refer to \^[5] to determine what should be done. .NH 1 Porting the Icon Translator .NH 2 Overview .PP The Icon translator,\^ known as \*Mitran\fR,\^ is the first logical component of the Icon system. The translator takes Icon source files as input and produces two \fIucode\fR output files for each input file. 
The translator may be run by saying:
.Ds
itran hello.icn
.De
This produces two ASCII files,\^ \*Mhello.u1\fR and \*Mhello.u2\fR. \*Mhello.u1\fR contains interpretable instructions and data in a printable format. \*Mhello.u2\fR contains information about global symbols and scope.
.PP
The translator is written entirely in C and is the most machine-independent major system component. No serious problems should be encountered in porting it. If difficulties are encountered,\^ it probably indicates that there are major problems with the C compiler being used.
.NH 2
Porting Procedure
.PP
The Icon system contains a number of instances of values that must be specified on a per-machine basis. The system also contains assembly code and,\^ of course,\^ such code is different on each machine. Rather than maintaining a source copy of Icon for each machine that Icon runs on,\^ C preprocessor control statements are used to select portions of code specific to a certain machine. The source as distributed can be compiled on either a VAX or a PDP-11* system by defining \*MVAX\fP or
.FS
*PDP is a trademark of Digital Equipment Corporation.
.FE
\*MPDP11\fP respectively in \*Mh/config.h\fP. The porting source has neither \*MVAX\fP nor \*MPDP11\fP defined; rather,\^ \*MPORT\fP is defined. Where machine-specific code is to appear,\^ along with sections bracketed by \*M#ifdef\fPs for \*MVAX\fP and \*MPDP11\fP,\^ there is a skeletal section bracketed by an \*M#ifdef\fP for \*MPORT\fP. The \*MPORT\fP section is to be filled out for the target machine. This convention is followed throughout and porting Icon is nothing more than filling in all the \*MPORT\fP sections.
.PP
The source for the translator is contained in the directory \*Mtran\fP. Translator machine dependencies are confined to the file \*Mtran/sym.h\fR. A pair of constants defines the sizes of two data structures used during the translation process. Edit the file \*Msym.h\fR and search for the string \*MPORT\fR.
The code looks something like
.Ds
.ta 2.0iR 2.5i
#ifdef PORT
#define TSIZE	x	/* default size of parse tree space */
#define SSIZE	x	/* default size of string space */
#endif PORT
#ifdef VAX
#define TSIZE	15000	/* size of parse tree space */
#define SSIZE	15000	/* default size of string space */
#endif VAX
#ifdef PDP11
#define TSIZE	5000	/* default size of parse tree space */
#define SSIZE	5000	/* default size of string space */
#endif PDP11
.De
The values of \*MTSIZE\fP and \*MSSIZE\fP are not critical and the current values have been chosen rather arbitrarily. If you are on a large machine,\^ use the values of \*MTSIZE\fR and \*MSSIZE\fR specified for the VAX; otherwise,\^ use the values specified for the PDP-11.
.PP
The translator may now be compiled by issuing the \fImake\fR command without any arguments.
.PP
It should be noted that although Icon programs are used to create some of the translator source files (namely \*Mkeyword.h\fP,\^ \*Mkeyword.c\fP,\^ \*Moptab.c\fP,\^ and \*Mtoktab.c\fP),\^ these files are machine independent and do not need to be remade. If for some reason \*Mmake\fP tries to create any of these files,\^ just \*Mtouch\fP the file in question to update the last-modified date. Similarly,\^ \*Mparse.c\fP is generated by \fIyacc\fP and does not need to be regenerated unless the grammar is modified.
.PP
When the translator has been successfully compiled using \fImake\fP,\^ refer to [5] for testing.
.PP
Porting the translator may seem like a trivial task,\^ but its successful completion is a definite milestone because it is a good sign that the C compiler in use is suitable.
.nr $1 1
.nr $2 0
.NH 1
Porting the Icon Linker
.NH 2
Overview
.PP
The Icon linker,\^ known as \*Milink\fP,\^ is the second logical component of the Icon system. The linker takes \*Mu1\fP and \*Mu2\fP files produced by the translator and binds them together to form an \fIinterpretable\fP file. The interpretable file serves as input for the Icon interpreter.
The linker is written entirely in C and is a fairly small and simple program. However,\^ the interpretable files produced by the linker are not machine independent and because of this,\^ porting the linker is more troublesome than porting the translator. .PP Interpretable files contain two distinct types of data: opcodes and associated operands that the interpreter ``understands''; and data that is directly mapped into run-time data structures. By ``mapping'' it is meant that the data is loaded into memory and then C structure references are used to access elements of the object at a certain location in memory. The formats of the opcodes and operands must conform to what the interpreter is expecting. The data that is directly mapped must conform to the format of the C data structures used by the run-time system. .PP On the VAX,\^ for example,\^ interpreter opcodes are one byte long and operands are four bytes long. On the PDP-11,\^ opcodes are also one byte long,\^ but operands are only two bytes long. Opcode and operand size are fairly arbitrary,\^ but it is important that the linker and the interpreter be coordinated. .PP The mapped data structures are slightly more complicated because the linker must conform to the format produced by the C compiler. This is not difficult,\^ since the data structures involved have a regular form. All are composed of some number of \fIwords\fP where each word is the same size in every structure.* .FS *Literature about the VAX conventionally uses the term \fIword\fP to refer to 16-bit quantities and the term \fIlongword\fP to refer to 32-bit quantities. In this document,\^ \fIword\fP in a generic context refers to the basic unit of the run-time data structures; \fIword\fP in a VAX-specific context refers to a 32-bit quantity. .FE .PP The opcodes,\^ operands,\^ and mapped data are accumulated in memory during the linking process. This conglomerate is referred to as the \fIcode\fP section. 
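.PP
As a rough illustration of how a code section of this kind might be accumulated,
the following sketch appends opcodes, operands, and words to a byte buffer and
advances a location pointer after each one.
It is not the code in the linker source: the names emit_op, emit_opnd,
emit_word, emit_block, emit_bytes, codebuf, codep, and CODE_LIMIT are
hypothetical, the opcode and operand sizes are the VAX values quoted above,
and the word size is assumed to be that of a pointer on the host.
Bytes are stored least significant first, as on the VAX; a real port must use
whatever layout the target C compiler produces for the mapped structures.

```c
#include <assert.h>
#include <string.h>

#define OPSIZE     1                      /* bytes per opcode (VAX value) */
#define OPNDSIZE   4                      /* bytes per operand (VAX value) */
#define WORDSIZE   ((int)sizeof(int *))   /* assumed: a word is pointer-sized */
#define CODE_LIMIT 1024                   /* hypothetical buffer size */

static unsigned char codebuf[CODE_LIMIT]; /* the code section */
static unsigned char *codep = codebuf;    /* next free location */

/* Append n bytes to the code section and advance the location pointer. */
static void emit_block(const void *src, int n)
{
    assert(n >= 0 && (codep - codebuf) + n <= CODE_LIMIT);
    memcpy(codep, src, (size_t)n);
    codep += n;
}

/* Append the low-order n bytes of v, least significant byte first. */
static void emit_bytes(unsigned long long v, int n)
{
    unsigned char b[sizeof(unsigned long long)];
    int i;

    assert(n >= 1 && n <= (int)sizeof b);
    for (i = 0; i < n; i++)
        b[i] = (unsigned char)(v >> (8 * i));
    emit_block(b, n);
}

/* One routine per primitive unit; each differs only in the size written. */
static void emit_op(int op)   { emit_bytes((unsigned long long)op, OPSIZE); }
static void emit_opnd(long v) { emit_bytes((unsigned long long)v, OPNDSIZE); }
static void emit_word(long v) { emit_bytes((unsigned long long)v, WORDSIZE); }
```

Because every routine funnels through one sized writer, changing OPNDSIZE or
WORDSIZE in this sketch changes the layout without touching the callers, which
is the kind of parameterization the linker relies on.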
Several routines are used to add data to the code section. These routines are parameterized so that porting the linker to a new machine is merely a matter of setting the parameters correctly. Four primitive data units compose the code section. These are \fIopcodes\fP,\^ \fIoperands\fP,\^ \fIwords\fP,\^ and \fIblocks\fP. .IP opcodes .br are instructions for the interpreter. An opcode may direct the interpreter to push a value on the stack,\^ branch to a location,\^ perform an arithmetic operation,\^ etc. .IP operands .br are associated with some opcodes. For example,\^ the \*Mgoto\fP instruction has a location to branch to as its single operand. .IP words .br compose mapped data structures. A word is the basic unit of the run-time data structures and should consist of \*Msizeof(int *)\fP bytes. .IP blocks .br are merely some number of bytes. For example,\^ a \*Mcset\fP constant is loaded into the code section as a block of 32 8-bit bytes (256 bits). .PP Routines in \*Mlink/lcode.c\fP are used to add a unit of data of one of the preceding types to the code section. These routines are \*Moutop\fP,\^ \*Moutopnd\fP,\^ \*Moutword\fP,\^ and \*Moutblock\fP. Each routine adds the appropriate data into the code section at the current location (maintained as a pointer),\^ and then the location pointer is advanced to the next free location. .NH 2 Porting Procedure .PP Edit \*Milink.h\fP and search for the string \*MPORT\fP. Define the following constants as described. .IP \*MINTSIZE\fP .br The number of bits in an \*Mint\fP. .IP \*MLOGINTSIZE\fP .br The base 2 log of \*MINTSIZE\fP. That is,\^ \*MLOGINTSIZE\fP answers the question ``\fIWhat power of 2 is \*MINTSIZE\fR\^?''. .IP \*MLONGS\fP .br Icon has an integer data type. On the VAX and the PDP-11 the range of integer values is \-2\u\s-231\s0\d to 2\u\s-231\s0\d-1. On the VAX,\^ C \*Mint\fPs and \*Mlong\fPs are both 32 bits wide. On the PDP-11,\^ C \*Mint\fPs are 16 bits wide while \*Mlong\fPs are 32 bits wide. 
The PDP-11 Icon system makes an internal distinction between integers that ``fit'' in 16 bits and integers that require 32 bits. The former are stored in two-word descriptors (the actual value being in the second of the two 16-bit words),\^ while the latter have a value descriptor that points to a block in the heap that holds the two-word,\^ 32-bit value. On the other hand,\^ the VAX uses two 32-bit words for descriptors and thus the second word of a descriptor can hold the largest possible integer value used by Icon. Rather than having an internal distinction between integer types on the VAX,\^ integers are always represented by two-word integer descriptors. There are places in the code where special provisions must be made if C \*Mint\fPs are not the same size as C \*Mlong\fPs. .sp If \*Msizeof(int) != sizeof(long)\fP for the C compiler in use,\^ define \*MLONGS\fP. (\*MLONGS\fP need not be given a value,\^ \*M#define LONGS\fP is sufficient.) If \*MLONGS\fP must be defined,\^ the minimum and maximum values that can be represented by an \*Mint\fP must also be defined. Define \*MMINSHORT\fP to be the smallest value that an \*Mint\fP can hold and define \*MMAXSHORT\fP to be the largest value that an \*Mint\fP can hold. .IP \*MMAXCODE\fP .br This is the maximum size in bytes of the code that can be generated for each procedure. This value is not critical; 10,\^000 is used for the VAX,\^ while 2000 is used for the PDP-11. .IP \*Mstrchr\fP\ and\ \*Mstrrchr\fP .br If you are on a USG UNIX system,\^ \*M#define\fP \*Mindex\fP to be \*Mstrchr\fP and \*Mrindex\fP to be \*Mstrrchr\fP. .PP Edit \*Mdatatype.h\fP and search for the \*MPORT\fP section. This section contains \*M#define\fPs that are used to set and test flags contained in the first word of descriptors. The basic idea in forming these constants is to set some bits at the high end of the word,\^ and set some other bits at the low end. The number of unused bits in the middle depends on the size of a word. 
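.PP
The idea of placing one flag per bit at the high end of a word can be sketched
as follows.
The helper flag_mask and the POS_ names are hypothetical and do not appear in
datatype.h; the assignment of successive bit positions beyond the first two
follows the order in which the flags are listed in this document and is an
assumption.
The word width is passed as a parameter so that the same sketch covers 16-bit
and 32-bit words.

```c
#include <assert.h>

/* Bit positions counted from the leftmost bit of a word (0 = leftmost). */
enum flag_pos {
    POS_NQUAL, POS_VAR, POS_TVAR, POS_PTR, POS_NUM, POS_INT, POS_AGGR
};

/*
 * Return a mask with exactly one bit set: the bit `pos` places in from
 * the left edge of a word that is `wordbits` bits wide.
 */
static unsigned long long flag_mask(int wordbits, int pos)
{
    assert(wordbits >= 8 && wordbits <= 64);
    assert(pos >= 0 && pos < wordbits);
    return 1ULL << (wordbits - 1 - pos);
}
```

For a 32-bit word, flag_mask(32, POS_NQUAL) yields 0x80000000 and
flag_mask(32, POS_VAR) yields 0x40000000; for a 16-bit word the same calls
yield 0x8000 and 0x4000.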
.PP
\*MF_NQUAL\fP,\^ \*MF_VAR\fP,\^ \*MF_TVAR\fP,\^ \*MF_PTR\fP,\^ \*MF_NUM\fP,\^ \*MF_INT\fP,\^ and \*MF_AGGR\fP should be set to mask values with one bit set to 1 in each. For \*MF_NQUAL\fP,\^ the leftmost bit should be set; for \*MF_VAR\fP,\^ the next to leftmost bit should be set; and so forth. The values for the VAX and PDP-11 should be suitable for machines with 32-bit and 16-bit words,\^ respectively.
.PP
The constants \*MOPSIZE\fP,\^ \*MOPNDSIZE\fP,\^ and \*MWORDSIZE\fP control the sizes of opcodes,\^ operands,\^ and words in the code section. Before setting these constants to values appropriate for the target machine,\^ a ``standard'' linker should be built and tested using the supplied values (under \*MPORT\fP) for these constants. This allows the linker to be checked against output files that are known to be correct. The purpose of this is to attempt to discover C compiler problems. Compile the linker using \*Mmake\fP and refer to [5] for the testing procedure.
.PP
Once the ``standard'' linker has been checked out,\^ the following ``sizing'' parameters in \*Milink.h\fP should be set to values appropriate for the target machine.
.IP \*MOPSIZE\fP
.br
This is the size in bytes of interpreter opcodes. The interpreter treats opcodes as unsigned quantities. One byte (8 bits) is currently large enough to accommodate all opcodes and a value of 1 is recommended for \*MOPSIZE\fP. The \*Moutop\fP routine in \*Mlcode.c\fP assumes that opcodes are one byte. If a larger size is desired,\^ \*Moutop\fP will have to be recoded. It might be wise to use a value other than 1 for \*MOPSIZE\fP on machines that are not byte-addressable and have ample memory.
.IP \*MOPNDSIZE\fP
.br
This is the size in bytes of operands for interpreter instructions. For some instructions,\^ the operand value represents an offset from the interpreter program counter and thus,\^ the maximum possible offset is limited by the magnitude of values that can be represented in \*MOPNDSIZE\fP bytes.
Because larger operands occupy more code space and smaller operands limit addressing ``distance'',\^ a trade-off is involved. On the VAX,\^ operands are four bytes because memory space is not very critical. On the PDP-11,\^ operands are two bytes because of the limited memory. While it is easy to change the value of \*MOPNDSIZE\fP in the linker,\^ the operand size is pervasive in the interpreter. If the target machine has a large,\^ perhaps virtual address space,\^ use a value such as 4 for \*MOPNDSIZE\fP. A value such as 2 may be appropriate for a smaller machine. A value of 1 is not advisable under any circumstances. The suggested value for \*MOPNDSIZE\fP is \*Msizeof(int)\fP.
.IP \*MWORDSIZE\fP
.br
This should be set to \*Msizeof(int *)\fP on the target machine. The various run-time data structures are all composed of a number of words,\^ each of which contains \*MWORDSIZE\fP bytes. For example,\^ the data blocks for user-defined procedures are built in the code section by a sequence of calls to \*Moutword\fP.
.PP
The \*Mbackpatch\fP routine in \*Mlcode.c\fP needs some machine-specific modifications. This routine backpatches forward references to ucode labels. In the \fIwhile\fP loop,\^ the operand (which is \*MOPNDSIZE\fP bytes long) that is pointed at by \*Mq\fP is loaded into the variable \*Mp\fP. Then,\^ the operand is replaced by the value of \*Mr\fP. On the VAX,\^ this can be expressed as:
.Ds
p = *q;
*q = r;
.De
where \*Mq\fP is an \*Mint *\fP. This is possible because the VAX allows word references on an arbitrary boundary. On the PDP-11,\^ such references are illegal and the assignments must be made on a byte-wise basis. If the target machine allows word accesses on arbitrary boundaries,\^ the VAX code may be used (assuming \*MOPNDSIZE\fP is equal to \*Msizeof(int)\fP). If not,\^ but operands are the same size as \*Mint\fPs,\^ the PDP-11 code may be used. Other situations may require ingenuity.
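.PP
For a machine that does not allow word accesses on arbitrary boundaries, one
possible shape for the byte-wise exchange is sketched below.
This is not the PDP-11 code from lcode.c; the function swap_opnd is
hypothetical, and the sketch assumes that OPNDSIZE equals sizeof(int).
It performs the same two steps as the VAX fragment (fetch the old operand,
store the new one) but touches memory only one byte at a time.

```c
#include <assert.h>

#define OPNDSIZE ((int)sizeof(int))  /* assumed: operands are int-sized */

/*
 * Replace the operand stored at q (any byte alignment) with r and return
 * the operand that was there before.  Each byte is copied individually,
 * so no unaligned word reference is ever made.
 */
static int swap_opnd(unsigned char *q, int r)
{
    int p = 0;
    unsigned char *pp = (unsigned char *)&p;
    unsigned char *rp = (unsigned char *)&r;
    int i;

    assert(q != 0);
    for (i = 0; i < OPNDSIZE; i++) {
        pp[i] = q[i];   /* p = *q, one byte at a time */
        q[i] = rp[i];   /* *q = r, one byte at a time */
    }
    return p;
}
```

The bytes are copied in the host's native order, so the operand read back by
the same machine has the value that was stored, whatever its byte order.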
Be sure to alter the first \*MPORT\fP section in \*Mbackpatch\fP to contain an appropriate declaration for \*Mq\fP (that section currently contains a declaration for \*Mq\fP and a \*Mreturn\fP).
.PP
When the linker has been compiled,\^ refer to [5] for directions on testing.
.nr $1 2
.nr $2 0
.NH 1
Porting the Icon Interpreter
.NH 2
Introduction
.PP
The Icon interpreter,\^ known as \fIiconx\fP,\^ is the third major logical component of the system. The interpreter takes interpretable files produced by the linker and ``executes'' them. The interpreter is run by:
.Ds
iconx hello
.De
where \*Mhello\fP has been produced by the linker.
.PP
Due to the stack manipulations that the interpreter performs,\^ it is necessary for a small portion of the interpreter to be written in assembly language rather than in C. On the VAX,\^ about 550 lines of assembly instructions are required. The coding of these assembly instructions is the most difficult part of the port.
.NH 2
Source File Layout
.LP
The interpreter is divided into four parts:
.DS
.ft R
start-up code
the main loop
primary subroutines
support subroutines
.DE
.LP
The start-up code initializes the interpreter and passes control to the main loop. The main loop,\^ referred to as \*Minterp\fP,\^ fetches interpreter instructions and executes them. An interpreter instruction may be entirely performed by \*Minterp\fP or \*Minterp\fP may call a \fIprimary subroutine\fP to perform the operation. In turn,\^ a primary subroutine may call a number of \fIsupport subroutines\fP. Each primary subroutine has a direct correspondence to a source language operation of some type or to a stack manipulation.
.PP
While the translator and linker source files are in their own directories,\^ the interpreter source files are segregated into several directories.
.nr a \w'\*Moperators\fR'+1m
.IP \*Miconx\fP (\na)u
The start-up code and the main interpreter loop reside in this directory.
Files of particular interest are: \*Mstart.s\fP,\^ which is entered when the interpreter is run and does some low-level initialization; \*Minit.c\fP,\^ which is called from \*Mstart.s\fP and completes initialization of the interpreter; and \*Minterp.s\fP,\^ which is the interpreter loop itself. .IP \*Mfunctions\fP (\na)u This directory contains code for the built-in procedures. For example,\^ \*Mwrite.c\fP contains the source for the \*Mwrite\fP function. The source for each built-in procedure appears in a file of its own. .IP \*Moperators\fP (\na)u This directory contains code for the Icon operators. The routines in this directory implement the various Icon source level operators. For example,\^ \*Mplus.c\fP is called to perform the \*M+\fP (addition) operation,\^ and \*Mbang.c\fP is called to perform the \*M!\fP (element generation) operation. As with the built-in procedures,\^ there is one operator per file. .IP \*Mlib\fP (\na)u This directory contains routines that do not fit anywhere else. First of all,\^ there is code for routines that perform actions similar in nature to those in \*Mfunctions\fP and \*Moperators\fP,\^ but that do not have a functional or operator syntax. For example,\^ \*Mllist.c\fP creates a list that is specified syntactically as \*M\^[\*(e0,\^\*(e1,\^\*(El,\^\*(en]\fR,\^ and \*Mfield.c\fP handles record element accesses that arise from \*M\*(e1.\*(e2\fR. .sp .8 \*Mlib\fP also contains routines such as \*Mesusp.s\fP and \*Mefail.s\fP that handle stack manipulations during expression evaluation. The routines \*Mpret.s\fP and \*Mpfail.s\fP handle procedure return and failure respectively. .sp .8 The directories \*Mfunctions\fP,\^ \*Moperators\fP,\^ and \*Mlib\fP compose the primary subroutines mentioned above. .IP \*Mrt\fP (\na)u The support subroutines are contained in the \*Mrt\fP directory. The primary subroutines are autonomous with respect to each other and use the \*Mrt\fP routines for common operations. 
For example,\^ \*Mcvstr.c\fP is used to convert a value to a string,\^ \*Mtrace.c\fP produces various types of tracing messages,\^ and \*Mgc.c\fP is the garbage collector. .IP \*Mh\fP (\na)u This directory contains a number of header files that are \*M#include\fPd in the other files that compose the interpreter. Of particular interest is \*Mrt.h\fP,\^ which defines a number of constants and data structures. .NH 2 Overview of the Porting Process .PP The following steps are to be followed when porting the interpreter. .Ls .Np Determination of layout of procedure,\^ generator,\^ and expression markers and associated frame pointers. .Np Setting of implementation specific constants in \*Mh/rt.h\fP and creation of \*Mh/defs.s\fP from \*Mrt.h\fP. .Np Complete system compilation. .Np Coding of a ``basis'' of routines for the interpreter,\^ consisting of \*Miconx/start.s\fP,\^ \*Mrt/setbound.s\fP,\^ \*Mlib/invoke.s\fP,\^ \*Miconx/interp.s\fP,\^ \*Mlib/efail.s\fP,\^ \*Mlib/pfail.s\fP. .Np Testing of the basis routines for the interpreter. .Np Coding and testing of .Ds rt/arith.s rt/fail.s lib/pret.s lib/esusp.s lib/lsusp.s lib/psusp.s rt/suspend.s functions/display.c .De in an incremental fashion. Test programs are provided to test the system after adding each routine. .Np Coding of \*Mrt/gcollect.s\fP and \*Mrt/sweep.c\fP. Testing of garbage collection. .Np Complete system testing. .Le .PP This document does not explain how to port the sections of the system that are related to co-expressions. The involved files are \*Mlib/coact.s\fP,\^ \*Mlib/cofail.s\fP,\^ \*Mlib/coret.s\fP,\^ \*Mlib/create.c\fP,\^ and \*Moperators/refresh.c\fP. Icon works properly with these sections of code left unimplemented,\^ provided no attempt is made to use co-expressions,\^ in which case the system notes it as a fatal error. 
.NH 2
Porting Procedure
.SH
Determination of Frame Layouts
.PP
Unfortunately,\^ one of the most far-reaching decisions that must be made during the porting process is also one of the first decisions that must be made. The decision (actually,\^ a number of decisions) is how to lay out the procedure,\^ generator,\^ and expression frames and what registers should be used as frame pointers. The various frames and their usages are explained in detail in \^[1] and the portions of this document that describe routines that manipulate a particular frame also provide further explanations. The porter should have a good understanding of what the frames are used for before setting frame layouts as they are pervasive throughout the assembly language portions of the system.
.PP
This document is rather tightly bound to the VAX implementation of Icon. Because of this,\^ the stack model that is used is that of the VAX. Specifically,\^ the VAX stack starts in high memory and grows downward. Thus,\^ when something is pushed on the stack,\^ the stack pointer goes down. When something is removed,\^ the stack pointer goes up. The only time that this convention is departed from is in the use of the phrase ``the top of the stack''. The top of the stack is the stack word that has the \fIlowest\fP memory address.
.PP
The procedure frame layout is the first to be determined. The layout is somewhat fixed by the C compiler and target machine,\^ so the task is a combination of making a decision and also recognizing what has been pre-determined. On most machines,\^ the task of the porter is more one of recognition than of design.
.PP
The first thing to determine is the frame layout imposed by the target machine and the C compiler. Create a file containing the following
.Ds
f()
{
	x(1,\^2);
}
.De
Compile the file using \fIcc\fP in such a manner as to capture the generated assembly code in a file. The \*M\-S\fP option of \fIcc\fP should cause assembly code to be placed in a file.
On the VAX,\^ the code generated by \*Mx(1,\^2)\fP is
.Ds
.ta .7i
pushl	$2
pushl	$1
calls	$2,\^_x
.De
From this it can be seen that arguments are pushed on the stack using the \*Mpushl\fP instruction,\^ and that the \*Mcalls\fP instruction does the actual procedure call. The first argument to \*Mcalls\fP is the number of arguments that are on the stack. When a return is made from a procedure called with a \*Mcalls\fP instruction,\^ the arguments are removed from the stack by the return mechanism. On some machines,\^ the removal of arguments after a subroutine call is left to the programmer (or code generator,\^ in this case). This is usually done by adding a value to the stack pointer or incrementing the stack pointer several times.
.PP
Examine the assembly code produced on the target machine by the given C statements. Determine what actions are taken by the machine when the appropriate call instruction is performed. It is important to understand completely what the target machine does when a call is performed. Next,\^ determine what sort of procedure frame is used by C routines. Compile the following C function using \*M\-S\fP.
.Ds
.ta .7i
f(a,\^b,\^c)
int a; char b; char *c;
{
	int x,\^y;

	x = a;
	a = 1;
	y = 2;
}
.De
Look at the generated code and try to get a feel for what is going on. The things that need to be determined are:
.Ds
.ft R
how arguments are accessed
the format of the C call frame
register saving and restoring conventions
.De
For example,\^ on the VAX,\^ the following code is generated for the test procedure.
.Ds
.Pt
	.word	L12		register save mask,\^ filled in later
	jbr	L14		jump to end to make stack space
L15:	movl	4(ap),\^-4(fp)	x = a
	movl	$1,\^4(ap)	a = 1
	movl	$2,\^-8(fp)	y = 2
	ret			return
	.set	L12,\^0x0	set register mask
L14:	subl2	$8,\^sp		make room for two local variables of four bytes each
	jbr	L15		jump to start of routine
.De
Several inferences can be made.
First of all,\^ arguments are accessed relative to \*Map\fP,\^ the argument pointer. Secondly,\^ local variables are accessed relative to \*Mfp\fP,\^ the frame pointer. On the VAX,\^ because of the hardware register save and restoration based on the entry mask (the first word of the routine),\^ no subroutine calls are required to save registers.
.PP
The Icon procedure frame must have the following attributes:
.Ls
.Np
The values on the stack at the time of call to the procedure appear as arguments to the procedure. Furthermore,\^ the values must be accessible in a deterministic fashion.
.Np
Register values are saved in the frame and can be accessed deterministically.
.Np
\*M_line\fP and \*M_file\fP appear in the procedure frame just below the last word pushed on the stack as part of the C procedure calling protocol.
.Np
The region for local variables begins at the lower end of the ``constant'' portion of the frame. Local variables must be accessible via deterministic means.
.Np
The procedure frame created by a C procedure call must be a subset of the procedure frame selected. That is,\^ the Icon procedure frame must be an augmentation of the C procedure frame.
.Le
.LP
The VAX uses this procedure frame layout:
.Ds
.St
.ft R
		arguments
	4	number of arguments (\*Mnargs\fR)
\*Map\fR \*(ar	0	number of words in argument list (\*Mnwords\fR)
		saved \*Mr11\fR (\*Mefp\fR)
		saved \*Mr10\fR (\*Mgfp\fR)
		\*(El
		last saved register
	16	saved \*Mpc\fR
	12	saved \*Mfp\fR
	8	saved \*Map\fR
	4	program status word and register mask
\*Mfp\fR \*(ar	0	0 (condition handler address)
	-4	saved value of \*M_line\fR
	-8	saved value of \*M_file\fR
\*Msp\fR \*(ar		Icon local variables
.De
.PP
Actually,\^ on the VAX,\^ most of the decisions are predetermined by the VAX architecture. The arguments are present on the stack,\^ so they are the high end of the frame. The registers are saved on the stack by the \*Mcalls\fP instruction.
The values of \*M_line\fP and \*M_file\fP naturally fit after the saved registers. The locals then appear on the lower end and extend for a variable distance (on a per-procedure basis). Note that the first local is at \*M\-16(fp)\fP and the \fIlast\fP argument is at \*M8(ap)\fP. .PP The VAX hardware takes care of saving and restoring registers upon subroutine entry and exit. It is quite possible that the target machine will not have this capability and the task must be delegated to software. This is usually evidenced by a call to a routine with a name such as \*Mcsave\fP as the very first thing in the routine and a call to a routine with a name such as \*Mcrestore\fP at the end of a routine. If this is the case,\^ the actions of the saving and restoring routines must be taken into account when determining the procedure frame layout. .PP In addition to determining the procedure frame layout,\^ a procedure frame pointer must also be selected. On the VAX,\^ the \*Mfp\fP stays constant throughout execution of a C procedure; it is used as the procedure frame pointer. For the target machine,\^ there should be some register on which references to local variables (and perhaps parameters) are based. That register should be used as the procedure frame pointer (sometimes referred to as the \*Mpfp\fP). The \*Mpfp\fP need not point at the lowest word pushed on the stack as part of the procedure call; it only needs to be constant while a procedure is executing. Of course,\^ the \*Mpfp\fP changes while the program is executing; by ``pointing at'' a particular word,\^ it is meant that the \*Mpfp\fP always references a certain word in the procedure frame marker. An \*Mrt.h\fP constant,\^ \*MFRAMELIMIT\fP,\^ is dependent on the number of words between the lowest word of the procedure marker and the word that the \*Mpfp\fP points to. Setting \*MFRAMELIMIT\fP is described below. .PP A point about terminology should be stressed. 
The procedure frame marker is bounded by arguments on one end and the Icon local variables on the other. A procedure marker,\^ the arguments,\^ the Icon local variables,\^ and the stack below the local variables compose a procedure frame. .\"\^[Note in here about variable size of marker,\^ forced saving of efp,\^ gfp.,\^ .\"ap being needed] .PP Determining the procedure frame layout is by no means a deterministic process. It takes work,\^ but once it's successfully set,\^ the single hardest task of the port is complete. .PP Once the procedure frame has been set,\^ the generator frame layout follows rather easily. A generator frame is merely an augmented procedure frame. The generator frame has two additional pieces of information,\^ a saved value of \*M_k_level\fP,\^ and a saved value for the boundary. It is recommended that the generator frame be identical to a procedure frame except that the two extra words required be located between the lowest word that is pushed on the stack by the procedure call mechanism and the saved value of \*M_line\fP. Thus,\^ on the VAX,\^ the generator frame \fImarker\fP is .Ds .ft R .St saved \*Mr11\fR saved \*Mr10\fR \*(El last saved register 20 reactivation address 16 saved \*Mfp\fR 12 saved \*Map\fR 8 program status word and register mask 4 0 (condition handler address) \*Mgfp\fR \*(ar 0 saved value of the boundary -4 saved value of \*M_k_level\fR -8 saved value of \*M_line\fR -12 saved value of \*M_file\fR .De Note that instead of a saved \*Mpc\fR value,\^ the generator frame marker holds a reactivation address. Control passes to this address when the generator is reactivated. Reactivation is fully explained in later sections. .PP A generator frame pointer (\*Mgfp\fP) is associated with a generator frame. On the VAX,\^ \*Mr10\fP is the \*Mgfp\fP. The choice of a \*Mgfp\fP is indirectly determined by the machine architecture and is intertwined with the selection of an expression frame pointer. 
The selection of the register to use as the \*Mgfp\fP is discussed below. It is recommended that the \*Mgfp\fP point at the word containing the saved boundary value. .PP The third type of frame marker is the expression frame marker. Expression frame markers are totally machine independent and contain three pieces of information: a saved expression marker address,\^ a saved generator marker address,\^ and a failure label that is to be given control in certain circumstances. On the VAX,\^ the expression marker layout is .Ds .ft R .St \*Mefp\fR \*(ar 0 saved \*Mefp\fR value -4 saved \*Mgfp\fR value -8 failure address .De This same format should be used on the target machine; there is no apparent reason to need an alternative format. The expression frame pointer (\*Mefp\fP) should point at the high word of the expression marker. .PP The registers that should be used for the \*Mgfp\fP and \*Mefp\fP are indirectly dependent on the procedure call mechanism. The primary requirement for the registers used as the \*Mefp\fP and \*Mgfp\fP is that they are saved across procedure calls. The secondary requirement is that the \*Mgfp\fP and \*Mefp\fP always be saved in a procedure frame. If the target machine has two general purpose registers that are always saved in a procedure frame,\^ those two registers are quite suitable for the \*Mgfp\fP and \*Mefp\fP. .PP If the procedure call mechanism does not always save a pair of general purpose registers,\^ the problem is more complicated. Some stack manipulations \fIrequire\fP saved values of \*Mefp\fP and \*Mgfp\fP to be present in procedure and generator frames. For built-in procedures and Icon procedures this is no problem because \*Minvoke\fP creates the procedure frame for them and can ensure that the registers are saved.
On the VAX,\^ for the C routines that are directly called from \*Minterp\fP,\^ no such assurances can be made because the VAX C compiler directs only the registers used in a routine to be saved in the C procedure frame. This creates a problem because Icon counts on the registers being saved. The problem is countered by making the C compiler think that certain registers are used in certain routines. Specifically,\^ declarations for a pair of \*Mregister int\fP variables are placed at the start of appropriate routines. On the VAX,\^ the first two local variables declared in a C routine \fIalways\fP get allocated to \*Mr10\fP and \*Mr11\fP. Thus,\^ \*Mr10\fP and \*Mr11\fP are used for the \*Mgfp\fP and the \*Mefp\fP respectively. If the target machine is like the VAX in that it doesn't always save certain registers,\^ a similar tactic may need to be used. If this is the case,\^ try compiling a routine with a pair of \*Mregister int\fP variables declared and see what the compiler does. If the compiler saves the two registers assigned to the variables,\^ use those registers for the \*Mgfp\fP and the \*Mefp\fP. It is wise to attempt to be sure that the compiler is deterministic in making its choice of registers to allocate to the variables. Routines that require this ruse to be employed have a line containing the string \*MDclSave\fP as the first line of the declarations. \*MDclSave\fP is defined in \*Mrt.h\fP and should be set to an appropriate value. It may be the case that no registers need to be saved. If so,\^ define \*MDclSave\fP,\^ but specify no value. This is done for the PDP-11. .PP It is also necessary to select a register to use as the interpreter program counter (\*Mipc\fP). Any general register that is preserved across procedure calls is suitable. The VAX uses \*Mr9\fP for the \*Mipc\fP. .NH 2 Machine and System Specific Values .PP Edit \*Mh/rt.h\fP and search for the first \*MPORT\fP section. Define the various constant values as outlined below. 
.IP \*MMAXHEAPSIZE\fP .br The size of the heap storage region in bytes. The VAX uses 50k and the PDP-11 uses 10k. If you have a small machine,\^ use 10k. Larger machines should use larger values,\^ such as that for the VAX. .IP \*MMAXSTRSPACE\fP .br The size of the string storage region in bytes. As with \*MMAXHEAPSIZE\fP,\^ this value is somewhat arbitrary. A value similar to that used for the heap size should be used. .IP \*MSTACKSIZE\fP .br The size of co-expression stacks in words. Use 1000 for smaller machines,\^ 2000 for larger ones. .IP \*MMAXSTACKS\fP .br The number of co-expression stacks initially allocated. Use 2 for smaller machines,\^ 4 for larger ones. .IP \*MNUMBUF\fP .br The number of i/o buffers available. When a file is opened,\^ a buffer is assigned to the file if one is available. A value from 5 to 10 is recommended. .IP \*MINTSIZE\fP .IP \*MLOGINTSIZE\fP .IP \*MLONGS\fP .IP \*MMINSHORT\fP .IP \*MMAXSHORT\fP .br These constants must be set to the values they were given (if any) in \*Mlink/ilink.h\fP. .IP \*MMINLONG\fP .br The smallest value that can be represented in a \*Mlong\fP. .IP \*MMAXLONG\fP .br The largest value that can be represented in a \*Mlong\fP. .IP \*MLGHUGE\fP .br One more than the highest base-10 exponent representable by a \*Mfloat\fP. For example,\^ on the VAX,\^ the highest number representable by a \*Mfloat\fP is about 1.7x10\u38\d. Thus,\^ \*MLGHUGE\fP is 39 on the VAX. .IP \*MFRAMELIMIT\fP .br As discussed above,\^ set \*MFRAMELIMIT\fP to the number of words between the low word of the procedure frame marker and the word that the procedure frame pointer references. .IP \*MSTKBASE\fP .br This value represents the approximate base of the stack when execution begins. On machines such as the VAX,\^ where the stack grows down from high memory,\^ \*MSTKBASE\fP should have a high value,\^ whereas on machines where the stack grows up from low memory,\^ \*MSTKBASE\fP should have a low value.
The \fIman\fP page for \fIexec(2)\fP usually specifies the initial value for the stack pointer when program execution begins. If uncertain,\^ be extreme with the value. .IP \*MGRANSIZE\fP .br The granularity of memory allocations. Calls to \fIbrk(2)\fP are used to expand the main memory that is being used. When \fIbrk\fP is given an address to expand to,\^ it rounds it to a multiple of a certain number. That number should be used for \*MGRANSIZE\fP. The \fIman\fP page for \fIbrk(2)\fP should state what value is used on a particular system. .IP \*MDclSave\fP .br Give \*MDclSave\fP the value needed as previously described. .IP \*MEntryPoint(x)\fP .br \*MEntryPoint\fP is a macro that is used to yield the address of the first instruction of the C routine \*Mx\fP that is past any procedure entry protocol instructions. On the VAX,\^ the register mask is two bytes long and thus the first executable instruction of a routine \*Mx\fP is at \*M(char *)x + 2\fP. On the PDP-11,\^ there is a four-byte instruction at the start of each routine that calls the routine \*Mcsv\fP to save registers and establish the procedure frame. Thus for the PDP-11,\^ \*MEntryPoint(x)\fP is \*M(char *)x + 4\fP. Values calculated by \*MEntryPoint\fP are used in \*Minvoke\fP. .IP \*MDummyFcn(name)\fP .br Initially,\^ each of the assembly language portions of the system that must be filled in consists of a single line of the form \*MDummyFcn(name)\fP. \*MDummyFcn\fP should be defined to generate \fIassembly\fP language statements that form a dummy routine with the label \*Mname\fP. This can be as simple as a label and a global declaration. It is advisable to include as part of the definition something that will cause a program abort. A halt instruction usually does the job. Thus,\^ the system can be built and will function normally unless an incomplete routine is called.
.IP \*MDummyDcl(x)\fP .br A macro that should expand into an assembly language declaration that allocates a word of storage for a variable named \*Mx\fP. .IP \*MDummyRef(x)\fP .br A macro that should expand into an assembly language reference to the variable \*Mx\fP. .IP \*MGlobal(x)\fP .br A macro that should expand into an assembly language declaration of \*Mx\fP as a global symbol. .IP \*Mfp\fP,\^\ \*Mefp\fP,\^\ \*Mgfp\fP,\^\ \*Mipc\fP .br It is advisable to use \*M#define\fPs for these registers rather than explicitly name them in the code. .IP \*Mcset_display\fP .br This is a rather complicated macro that is used to initialize the values of \*Mcset\fPs such as \*M&cset\fP and \*M&lcase\fP. If the target machine has \*Mint\fPs with 32 or 16 bits,\^ then the VAX or PDP-11 definition (respectively) of \*Mcset_display\fP may be used. If this is not the case,\^ \*Mcset_display\fP will have to be hand-crafted and the various uses of it will have to be altered for the machine in question. Briefly,\^ a \*Mcset_display\fP specifies which of the 256 bits that comprise a cset are to be set to 1. For example,\^ the \*Mcset_display\fP for \*M&cset\fP has all the bits set to 1,\^ while \*M&ascii\fP has the first 128 bits set to 1. \*Mcset\fPs are accessed using the \*Msetb\fP and \*Mtstb\fP macros,\^ which are also defined in \*Mrt.h\fP. \*Mcset_display\fPs appear in \*Miconx/init.c\fP,\^ \*Mfunctions/bal.c\fP,\^ and \*Mfunctions/trim.c\fP. It may also be necessary to modify the definitions of \*MCSETSIZE\fP,\^ \*Msetb\fP,\^ and \*Mtstb\fP. .PP Search for the second \*MPORT\fP section. \*MF_NQUAL\fP,\^ \*MF_VAR\fP,\^ \*MF_TVAR\fP,\^ \*MF_PTR\fP,\^ \*MF_NUM\fP,\^ \*MF_INT\fP,\^ and \*MF_AGGR\fP should be given the same values they have in \*Mlink/datatype.h\fP. .PP Once \*Mrt.h\fP has been completed,\^ an analogous file,\^ \*Mh/defs.s\fP,\^ must be tailored. \*Mdefs.s\fP is a subset of \*Mrt.h\fP that is included in assembly language files.
The \*MPORT\fP section of \*Mdefs.s\fP lists a number of constants that must be defined. Use the appropriate values from \*Mrt.h\fP for each constant. If all assemblers used a default radix of 10 for constants,\^ it would be possible to tailor \*Mdefs.s\fP mechanically,\^ but since this is not the case,\^ \*Mdefs.s\fP must be modified by hand. .NH 2 Complete System Compilation .PP In order to determine if there are serious C compiler problems with the interpreter source,\^ the entire system should be made at this point. Do a .Ds make Icon .De in the root directory of the Icon distribution. The entire system should compile without any problems. The resulting interpreter will be completely dysfunctional,\^ but if it is built without any problems,\^ it provides further evidence that the C compiler is up to the task. .NH 2 Porting the Assembly Language Routines .PP The porting of the assembly language routines is the most difficult part of porting Icon. This document has a section for each assembly language routine and each routine is described in three ways: .Ds .ft R overview generic operation the routine on the VAX .De .PP The overview section briefly describes the action of the routine and how the routine may be encountered during the course of execution. The generic operation section tells what steps the routine takes to perform its given task. Each major step that the routine takes is described. These steps should be very similar from machine to machine. The section about the routine on the VAX details the operation of the routine on the VAX. This section complements the comments contained in the source code for the routine and should be read with the source code at hand. This section is very machine specific. (Ideally there would be one such section for each existing Icon implementation.) .PP Each routine must be formulated for the target machine. For the most part,\^ the best approach is to take the same steps that are taken on the VAX.
It is important to select the right level for modeling the VAX routines. Try to recognize the steps that are made rather than following the operations on a per-instruction basis. The most important thing is to have a good understanding of what actions are performed and how these can be done on the target machine. .PP The first goal is to get a very simple Icon program working. This first program is \*Mtest/hello.icn\fP. It is quite short: .Ds procedure main() write("hello world") end .De The basic set of routines mentioned above (\*Mstart.s\fP,\^ \*Msetbound.s\fP,\^ \*Minvoke.s\fP,\^ \*Minterp.s\fP,\^ \*Mefail.s\fP,\^ and \*Mpfail.s\fP) must be implemented for even a very simple Icon program to work. However,\^ not all of these routines need to be written to make \*Mhello\fP \fIbegin\fP to work. .PP Translate and link \*Mhello\fP by running the translator and the linker: .Ds tran/itran hello.icn link/ilink hello.u1 .De This creates an interpretable file named \*Mhello\fP. Just to get the feel of things,\^ run the interpreter on the file: .Ds iconx/iconx hello .De A message of some type and a core dump should be produced. .PP The files \*Mtran/itran\fP,\^ \*Mlink/ilink\fP,\^ and \*Miconx/iconx\fP are copied into the \*Mbin\fP directory as the last action of .Ds make Icon .De in the root directory. The porter may find it convenient to link these files to the \*Mbin\fP directory and then place the full pathname in his search path. It is necessary to remove \*Mitran\fP,\^ \*Milink\fP,\^ and \*Miconx\fP in the \*Mbin\fP directory before linking them. Also,\^ if the files are linked,\^ the last step of \fImake\fP in the root directory will fail. This failure is inconsequential. .PP As \*Mstart.s\fP et al. are written,\^ try stepping through them to be sure the correct actions are being performed.
Most of the assembly language source files are straight line code with a branch or two and it is possible to do a large amount of verification of the assembly code by single stepping through it with a debugger. .PP When a routine has been completed,\^ it may be added to the interpreter by doing a \fImake\fP in the directory containing the routine and then doing another \fImake\fP in the directory \*Miconx\fP.