% -*-LaTeX-*- % TECOEPS.LTX.54, 3-Jun-86 10:23:23, Edit by BEEBE % This is part of TECO.LTX In this section, we discuss some of the features of the implementation of \TECO{} under \EPSILON{}. Although some critical remarks may be made, \EPSILON{} and \EEL{} are products of high quality and I commend Lugaru Software for a fine job. Considering the dismal history on the \IBMPC{} of compiler and assembler quality from numerous vendors, including {\sc ibm} and {\sc microsoft}, it is gratifying to have a compiler for a language as complex as \C{} which actually works! \subsection{Epsilon Support} \EPSILON{} provides all of the underlying support for buffer, file, and window management, and runs the main command loop which interprets keyboard characters. Most \EPSILON{} functions have been patterned after capabilities of \EMACS{}, and most are written in \EEL{}, a \C{}-like language which must be compiled, \index{compilation} but once compiled, can be dynamically loaded into the editor. Since \EPSILON{} supports a process buffer in which one can run operating system commands, in practice, this means that one can edit a function written in \EEL{}, save it, compile it in the process buffer, load it, and test it, all without leaving the editor. This is quite similar to \EMACS{}, which offers a {\em Tcompile} \INDEX{\T{Tcompile}}{Tcompile} function to compile a \TECO{} program in the current region, after which it is immediately available for execution. By using the minibuffer in both editors, \TECO{} code can be developed and interpreted immediately. \EPSILON{} also provides the basic editing primitives of insertion, deletion, replacement, and searching, and these are used directly by the \TECO{} interpreter. The \EPSILON{} model of buffer selection is unusual and not \C{}-like, but nevertheless straightforward. A buffer is selected for editing by assigning a buffer name pointer to the global variable \TX{bufname}, and \POINT{} is updated by assigning a new value to the global variable \T{point}\INDEX{\T{point} variable}{point variable}. The function \EF{size} returns the current buffer size, and all access to the buffer is via function calls, such as \TX{character(n)} to obtain the $\T{n}^{th}$ character, or \TX{insert(c)} and \TX{stuff(s)} to insert single characters or strings. This implementation means that \EPSILON{} can support paging of large files, even ones which exceed the limited address space of the \IBMPC{}. Updating of \POINT{} inside these functions is pretty much the same as \ETECO{} does, so programming functions for the \TECO{} interpreter is not fraught with mental remappings of {\em where are we now?}. \subsection{The Epsilon Debug Facility} \EPSILON{} has a limited source-level debugger for \EEL{} functions. It displays the function execution line by line, allowing stepping into, or over, function calls, suspending stepping until the function completes, changing the debug window size, and entering a recursive edit. The \EEL{} debugger has two serious lacks, however. The first is that there is no facility for examining, and possibly changing, variable values, and the second is that there is no call traceback display. These drawbacks seriously limit the speed at which debugging can be carried out, because one still has to go back to the source code and insert debug output statements, then recompile, reload, and re-execute it. \subsection{TECO Object Representation} One of the first considerations that must be addressed in designing a \TECO{} interpreter is the representation of objects which could be either strings or numbers. Since these are what Q-registers hold, we shall call them \X{Q-objects}. On the \TWENTY{}, the machine on which \EMACS{} was originally developed, the computer word contains 36 bits, and until the advent of extended addressing with later models of the family, a word address required 18 bits, giving a total address space of $2^{18} = 262144$ words, or about 1.1Mbytes. Characters are normally stored as 7-bit ASCII, five per word, with the left-over low-order bit set to zero. The architecture provides for efficient addressing of bytes of any size from 1 to 36 bits, and a one-word byte pointer consists of position and size fields in the high-order 18 bits, and the word address in the low-order 18 bits. A character pointer and an integer can therefore both occupy the same storage, and the particular bit patterns in the pointer give it an integer value which is near the most negative integer, $-2^{35} \equiv -34359738368$. \ETECO{} allocates one word for each Q-object, and when necessary, determines from the high-order bits whether it is a string pointer or integer. This means, for example, that copying one Q-register into another, such as \T{QpUq}, can be efficiently implemented by copying a single word, without regard for its type. Arithmetic performed on a Q-object which is actually a string pointer just uses the pointer value, but the result is unlikely to be a valid pointer. No check is made in arithmetic expression evaluations for such pointer arithmetic. The copying of Q-objects which might be pointers has an important ramification---after \T{QpUq}, both Q-registers point to the {\em same} string. If one subsequently assigns one of them a numeric value, as by \T{3Uq}, the interpreter cannot free the string storage pointed to by the previous contents of Q-register \T{q}, because there could be, and in this case, {\em is}, another reference to it. \ETECO{} therefore must have a {\em garbage collector}, \index{garbage collection} just like \LISP{}. The {\sc intel {\rm i}apx}\INDEX{\sf INTEL {\rm i}APX}{intel iapx} architecture on which the \IBMPC{} is based has a byte-addressable segmented memory space in which an address is a composite object consisting of a 16-bit segment, and a 16-bit offset. On the 8088, 8086, and 80186 processors, \index{80xxx microprocessors} a complete address is formed by multiplying the segment value by 16 and adding to the offset, giving a 20-bit address space of $2^{20} = 1048576$ bytes. On the more powerful 80286 and 80386 processors in virtual address mode (instead of the real address mode compatible with the smaller processors), the segment is no longer a number to be added to the address, but instead an index into a table which provides a base address to which the offset portion is added. On the 80286, this constructs a 24-bit physical memory address, and on the 80386, a 32-bit address. Since all but the 80386 have only 16-bit registers, and all lack an instruction to perform arithmetic on a segment-offset pair,\footnote{This is probably the greatest weakness of the i{\sc apx} architecture, for it takes about 30 instructions to add a constant to a pointer!} the developers of \EPSILON{} and \EEL{} chose to make pointers and 32-bit integers incommensurate, and coercion between them is forbidden by the language. The effect of this design decision is that since we cannot perform pointer to integer coercions, Q-objects cannot be represented by single 32-bit integers. Instead, they become a record structure containing a type flag, an integer, and a pointer. In \EEL{}, this is \begin{verbatim} struct Q_object { int type; /* QT_NUMBER or QT_STRING */ int n; /* value if type == QT_NUMBER */ char *s; /* value if type == QT_STRING */ int name; /* Q-register name (for function return restore) */ }; \end{verbatim} Use of a \T{union} could overlay the integer and pointer, but we have not done so. Throughout the interpreter, therefore, when Q-objects are manipulated, it is necessary to check the type flags to determine what code branch must be executed. Of course, much of this is encapsulated in preprocessor macro definitions to simplify the coding and hide the grubby details. In this \TECO{} interpreter, both Q-registers and stacks are implemented as Q-objects. This makes it simple to support looping, push and pop commands, and recursive invocation of the interpreter and \EPSILON{}, all the while maintaining a global set of Q-registers and Q-register stack. \subsection{TECO Garbage Collection \index{garbage collection}} The string assignment problem of the last section now brings us to the issue of {\em garbage collection}. The core of most \LISP{} and \TECO{} interpreters is written at a low level, usually assembly language, where the programmer has complete control over the environment and how things get implemented. Garbage collection can be designed in from the start. With \EEL{}, we have a \C{} run-time environment, where standard library functions \EF{malloc} and \EF{free} can be used for dynamic memory management. However, we do not have access to the internals of this management to implement something like a garbage collector. Of course, one could initially allocate a large portion of free memory and then write private allocation and deallocation routines which could handle garbage collection, but it seems like overkill to do this. Instead, the choice made in this \TECO{} interpreter is that string assignment is done by making a fresh copy of the string, rather than by straight pointer assignment. We are then guaranteed that no string has more than one pointer pointing to it, and therefore, if a number is subsequently assigned to the Q-object, the string space can be deallocated without fear of a dangling pointer somewhere. Although this takes more care than would be necessary if we really did have a garbage collector, there is in fact only a small number of places where such extra bookkeeping need be done, so it is not burdensome. One can of course regret that the standard \C{} library does not provide for garbage collection, because careless programming can lead to growth in allocated memory. Indeed, one of the popular command shells in the {\sc unix} system for a long time had such a bug, and would grow during a login session to the point where it finally died for lack of memory, and logged out the user. We try to be more careful. \subsection{Outline of the Interpreter\index{interpreter}} When the compiled \TECO{} interpreter is first loaded into \EPSILON{}, the function \EF{when\_loading} is executed; it loads in the various compiled files that make up the interpreter, then invokes the function \EF{teco\_startup}.\footnote{ We use \EF{foo} to mean an \EEL{} function with zero or more arguments, and \EC{foo} for an \EEL{} command, which is not permitted to have arguments. Although commands are otherwise normal integer functions, they rarely return values. Another distinction between them is that command names are recognized for completion by \EC{extended-command}, while function names are not. Both can be invoked as extended commands.} That function initializes the Q-registers, flags, mark ring, and function dispatch table, and then calls \EF{teco\_init} which initializes stacks and clears the arguments. \EF{teco\_init} is also called by \EC{execute-minibuffer} and \EC{step-minibuffer}. This initialization is not part of \EF{teco\_interpreter}, so that recursive invocation of the interpreter preserves stacks, arguments, and Q-registers. With these initializations complete, the user can run \EC{execute-minibuffer} and \EC{step-minibuffer}. The former is the simplest---it checks first for the existence of the minibuffer, then calls \EF{teco\_init}, \EF{save\_minibuffer}, and finally \EF{teco\_interpreter}. The intermediate function, \EF{save\_minibuffer}, is introduced partly because of efficiency concerns. Because \EPSILON{} permits buffer access only through function calls, it is advisable to copy the minibuffer into a string which can be accessed in-line. There is the happy side benefit that since \EF{teco\_interpreter} operates on a string, rather than a buffer, it is easy to invoke it recursively on a new string, providing straightforward support for the \TECO{} function-call function, \T{M}. \EF{teco\_interpreter} is the core of it all, but is surprisingly simple. It loops over the string into which \EF{save\_minibuffer} copied the minibuffer, using a string index which is globally available. At each iteration, it just calls \EF{teco\_eval} to evaluate the next function. \EF{teco\_eval} checks for a user abort, then skips over whitespace and comments (another efficiency measure), checks the debugger flags and calls \EC{teco\_debugger} if necessary, and finally, dispatches through a function table to one of up to 256 functions based on the value of the current minibuffer character. When a binary operator, like $\T{+}$, is found, the dispatch calls the function \EF{teco\_binop} which saves the current post-comma argument for the left operand and calls \EF{teco\_eval} to evaluate the right operand. This results in a recursive entry to \EF{teco\_eval}. Similarly, when a left parenthesis is met, the dispatch calls \EF{teco\_leftparen} which saves the current flags and arguments, then loops testing for a right parenthesis which terminates the loop, and otherwise calls \EF{teco\_eval}, which again is recursive. At loop exit, the current flags and arguments are merged with the values saved on entry. Thus, parenthesized expression lists can be nested arbitrarily deep, and any number of comma-separated expressions can be present in any list. It is important for the string to be an exact copy of what is in the minibuffer, because then the string index corresponds to \POINT{} in the minibuffer, and viewing the minibuffer during the course of execution of the \TECO{} program will always show the cursor tracking the execution. Even if this were not the case, it is not feasible to compact the input, because compression of white space and comments requires a complete \TECO{} interpretation, and comments which are present to balance conditionals and loops cannot be removed anyway. All functions called advance the global string index at least one position, except for the loop end function, which backs it up. The index movement is in general different for each function, so it cannot be handled by \EF{teco\_eval} itself. The functions which handle a single whitespace character or comment have a loop which gobbles such text, rather than suffering the function call overhead which would be occasioned by an immediate return after handling one character or one comment. Because of this simple structure of the interpreter, and general independence of the individual functions from each other, adding a new \TECO{} function just involves writing the \EEL{} code to support it, then installing it in the function dispatch table in \EF{teco\_startup}. \subsection{Additional Support Functions} While \EPSILON{} has most of the basic editing functionality of \EMACS{}, aside from \TECO{}, it lacks several functions which I have come to rely on, and have therefore added. These are not strictly part of the \TECO{} interpreter, although those that access Q-registers obviously need some of that support. In order to make their presence known, it is well to document them here. We use the same format as for the \TECO{} language reference, but omit the returned values and error entries, since \EPSILON{} commands do not normally return anything, and errors simply revert to the top-level \EPSILON{} command loop. \begin{commandtable} \item[after]\NEWLINE{} \begin{commandentry} \item[Action] Prompt for a string, then display each matching line followed by the {\em arg} lines after the match. An omitted argument is equivalent to an argument of 0. \item[Relations] \EC{before} and \EC{occur} \item[Key Binding] None. \end{commandentry} \item[before]\NEWLINE{} \begin{commandentry} \item[Action] Prompt for a string, then display each matching line preceded by the {\em arg} lines before the match. An omitted argument is equivalent to an argument of 0. \item[Relations] \EC{after} and \EC{occur} \item[Key Binding] None. \end{commandentry} \item[blank-trim-lines]\NEWLINE{} \begin{commandentry} \item[Action] Trim trailing blanks (spaces and tabs) in the entire buffer. The number of characters deleted is printed in the echo area. \item[Key Binding] None. \end{commandentry} \item[check-line-length]\NEWLINE{} \begin{commandentry} \item[Action] Find the next line in the buffer longer than {\em arg} characters and position \POINT{} after it. An omitted argument is equivalent to an argument of 72. \item[Key Binding] None. \end{commandentry} \item[execute-minibuffer]\NEWLINE{} \begin{commandentry} \item[Action] Execute the \TECO{} code in the minibuffer. \item[Relations] \EC{minibuffer} and \EC{step-minibuffer}. \item[Key Binding] \T{C-X} \A{ESC} \end{commandentry} \item[get-q-register]\NEWLINE{} \begin{commandentry} \item[Action] Get the contents of a Q-register into the edit buffer at \POINT{}. The Q-register name is prompted for from the keyboard. \item[Relations] \EC{put-q-register} and \EC{view-q-register} \item[Key Binding] \T{C-X G}, \T{C-X g} \end{commandentry} \item[how-many]\NEWLINE{} \begin{commandentry} \item[Action] Prompt for a string, then display a count of how many lines in the buffer (starting from \POINT{}) contain that string. With an argument, the string is a regular-expression instead of a normal search string. \item[Relations] \EC{occur} \item[Key Binding] None. \end{commandentry} \item[insert-file]\NEWLINE{} \begin{commandentry} \item[Action] Prompt for a filename, then read the file into the current buffer at \POINT{}. \PPOINT{} is left at the beginning of the inserted text, and mark at the end. The insertion may be undone either by killing the region, or by \EC{undo}. \item[Key Binding] None. \end{commandentry} \item[mark-whole-buffer]\NEWLINE{} \begin{commandentry} \item[Action] Put {\em mark} at the end of the buffer, and \POINT{} at the beginning. \item[Key Binding] \T{C-X H}, \T{C-X h} \end{commandentry} \item[minibuffer]\NEWLINE{} \begin{commandentry} \item[Action] Create a minibuffer if necessary, giving it the fixed name \T{*MINIBUFFER*}, select it, set \TECO{} editing mode, then call \EC{recursive-edit} to allow you to edit text in the buffer. On exit from the recursive edit with \EC{exit-level} on \T{C-X C-Z}, it checks the last two characters in the minibuffer to see if they are \A{ESC}, which requests that \EC{step-minibuffer} should be called before returning, then reselects the original buffer, and calls \EC{step-minibuffer} if wanted. \item[Relations] \EC{execute-minibuffer} and \EC{step-minibuffer} \item[Key Binding] \T{A-}\A{ESC} \end{commandentry} \item[occur]\NEWLINE{} \begin{commandentry} \item[Action] Prompt for a search string, then display {\em arg} lines before and after each line in the buffer which matches that string. An omitted argument is equivalent to 0. \item[Relations] \EC{after}, \EC{before}, and \EC{how-many} \item[Key Binding] None. \end{commandentry} \item[put-q-register]\NEWLINE{} \begin{commandentry} \item[Action] Copy the text between {\em mark} and \POINT{} into a Q-register. The Q-register name is prompted for from the keyboard. \item[Relations] \EC{get-q-register} and \EC{view-q-register} \item[Key Binding] \T{C-X X}, \T{C-X x} \end{commandentry} \item[set-pop-mark]\NEWLINE{} \begin{commandentry} \item[Action] \EPSILON{} maintains only a single {\em mark}, which is inadequate. This function maintains a ring of marks unique to each edit buffer; the ring size is set at compile time to the value in the header file, \T{teco.h},\INDEX{\T{teco.h}}{teco.h} currently 12. Without an argument, this function sets the mark at \POINT{}, and pushes it onto the ring. With an argument, it pops the ring into \POINT{}. A ring instead of a stack is used so that underflow and overflow is impossible, and repeated pops will finally bring \POINT{} back to where it started, preserving all the marks on the ring. \item[Key Binding] \CTL{@} (this replaces the binding to the \EPSILON{} \EC{set-mark} function). \end{commandentry} \item[set-teco-trace]\NEWLINE{} \begin{commandentry} \item[Action] Without an argument, this function toggles the \TECO{} trace flag. An explicit argument sets it (zero $\equiv$ off, non-zero $\equiv$ on). The trace flag gives additional output when \TECO{} is single stepped. \item[Relations] \EC{step-minibuffer} \item[Key Binding] None. \end{commandentry} \item[show-buffer]\NEWLINE{} \begin{commandentry} \item[Action] Prompt for a buffer name and then display it. \item[Key Binding] None. \end{commandentry} \item[show-columns]\NEWLINE{} \begin{commandentry} \item[Action] Type out a column number display. \item[Key Binding] \T{M-=} \end{commandentry} \item[step-minibuffer]\NEWLINE{} \begin{commandentry} \item[Action] Initiate single stepping of the \TECO{} interpretation of the minibuffer. For details, see the earlier section on the debugger. \item[Relations] \EC{execute-minibuffer}, \EC{minibuffer}, and \EC{set-teco-trace} \item[Key Binding] None. \end{commandentry} \item[teco-mode]\NEWLINE{} \begin{commandentry} \item[Action] Set up for editing \TECO{}. At present, the only action is to rebind \A{ESC} to \EC{normal-character}, since that character is often needed. It is invoked automatically when a file with extension \T{.TEC} is edited. \item[Key Binding] None. \end{commandentry} \item[undo]\NEWLINE{} \begin{commandentry} \item[Action] Undo the last function which ran the \T{save-for-undo} command. The latter makes a copy of the current buffer in an undo buffer named \T{*UNDO*}, and \EC{undo} exchanges the two buffers. Without an argument, it asks for confirmation. The exchange can be undone by another call to \EC{undo}. Unfortunately, no \EPSILON{} functions call this one, only some of the functions described here. The convention in \EMACS{} is that all functions which make radical changes to the buffer run \EC{save-for-undo} before they begin. \EMACS{} saves only the changed region of the buffer, which is fast. This version saves the entire buffer, which may be slow if the file is big. This could initiate swapping to disk, but thanks to \EPSILON{}'s design, this only forces you to wait momentarily; you cannot run out of memory. \item[Key Binding] None. \end{commandentry} \item[view-minibuffer]\NEWLINE{} \begin{commandentry} \item[Action] This function runs \EC{show-buffer} on the minibuffer. \item[Key Binding] None. \end{commandentry} \item[view-q-register]\NEWLINE{} \begin{commandentry} \item[Action] Prompt for a Q-register name, then display its contents in the echo area. If it contains a long string, only the first 50 characters are shown. If it contains a number, it is displayed in decimal, octal, and hexadecimal. \item[Relations] \EC{get-q-register} and \EC{put-q-register} \item[Key Binding] \T{C-A-Q}, \T{C-A-q}, \T{C-A-Y}, \T{C-A-y} \end{commandentry} \end{commandtable} \subsection{Epsilon Limitations} \EPSILON{} has several limitations which restrict the degree of closeness of this implementation of \TECO{} to \ETECO{}. Here is a list of some of them. \begin{itemize} \item There is no bounded search function. This is a serious lack, because its simulation is necessarily inefficient. \item Commands cannot take string arguments which replace the string value they normally obtain from the keyboard. This too is a serious restriction. If each command which prompts for input then called a normal function to do the actual work, the problem could be eliminated. \item There is a function, \EF{try\_calling}, which can be used to call a function defined by a string argument, but there is no way to test if the call was successful, or if the function even exists. There is no analogous routine to lookup the value of a global variable, or test for its existence. This makes it impossible for \TECO{} programs to create variables dynamically, and it is impossible to implement a function like the \EMACS{} \EC{Edit Options} command which displays in a temporary buffer all variables matching a specified string pattern, allowing their values to be edited and modified. \EPSILON{} is clearly capable of this, since it supports dynamic loading and binding of functions and variables; the primitives are just not available to the programmer in \EEL{}. \item There is a source token length limit in \EEL{} of about 250 characters, which is much too small for string tokens which can legitimately span several lines---an example is the debugger help message. \item \EPSILON{} has no concept of a narrowed region, which is a very valuable feature of \EMACS{}. It is used to limit the scope of editing to a portion of the buffer, such as a replacement of a string that occurs dozens of times but for which replacement must happen only in part of the buffer. The only way to do this at present is to move the region to a new buffer, perform the edits, and move the region back. \item There is no traceback display in the \EEL{} debugger, so you cannot tell where you came from. \item It is not possible to display values of variables in the \EEL{} debugger. \item The \EF{say} and \EF{sayput} primitives type only in the one-line echo area. This drastically limits typeout. If you want to see a larger amount of text with {\em more} processing (pause at end of screen), you have to put it in a temporary buffer and then run \EC{show-buffer} on it. For some purposes, this is satisfactory, but for others, it is not. For example, \EC{occur} can potentially display a lot of text, but in this implementation, it must find {\em all} of that text first before the user gets to see any of it. In \EMACS{}, it is possible to abort the typeout at any time as the text is being found. I would very much like to see a set of \EEL{} functions, \EF{start\_more\_display}, \EF{more\_display}, and \EF{end\_more\_display} to handle this. The first and last would not require arguments; \EF{more\_display} would be called with a text string to be displayed. \item Although \EPSILON{} search and display functions are very fast, insertion and deletion are not. There is an objectionable delay if you do something like \T{C-U 80 *} to insert 80 asterisks in the buffer. A command like the \EMACS{} \EC{Flush Lines}, which deletes all lines in the buffer which contain a match with its string argument, when implemented in \EEL{} or \TECO{} can be {\em very} slow. This clearly is evidence of the \EPSILON{} iteration count handling (the \EC{normal-character} command gets called 80 times, instead of a string of 80 characters being created and inserted in one operation), and of the text buffer implementation, which is probably a contiguous block of text which has to be moved for every insertion or deletion. \ETECO{} uses this organization too, but maintains a gap where insertion is currently happening, so that buffer movement is minimized. The \TWENTY{} is fast enough that massive deletions in an edit buffer of about 1 Mbyte do not noticeably slow performance; the \IBMPC{} is not. Several other \EMACS{} implementations have used linked lists of lines which makes insertion and deletion fast, and allows an easy implementation of a function to undo changes of single lines. The linked list implementation does suffer a startup overhead for the list construction, which can be very noticeable on large files. I have experienced startup delays of several {\em minutes} on a {\sc vax-\small750} with {\sc cca} \EMACS{} on editing a multi-megabyte file. \item Richard Stallman in his article about \EMACS{} which was cited in the introduction makes a point about the value of dynamic variable binding in addition to argument binding. He gives the example of three functions A, B, and C, where A calls B and B calls C. If A dynamically creates a new variable, C can access it without any changes being required in B or its arguments. If the variable already existed globally, or in some calling ancestor of A, the new value would hide the old one, until A exited. With \EEL{}, we can achieve part of this---A can create a global or buffer-specific variable which B is ignorant of and C can find, but it cannot hide any previously-existing variable. The \C{} language allows three levels of variable encapsulation---local to a function, local to all functions in a source file, and global to all functions. \EEL{} has only the first and third of these. It appears difficult to introduce any kind of dynamic scoping in a \C{}-like language; indeed, dynamic scoping has been largely deprecated in {\sc common} \LISP{} in favor of lexical scoping. Nevertheless, there are certainly occasions where it will be missed. \end{itemize}