.so tmac.tr .DA "May 13, 1983" .TR 83-3a .GR MCS81-01916 .GE .TL An Overview of the Icon Programming Language .AU Ralph E. Griswold .AE .tr *\(** .NH Introduction .PP Icon is a high-level programming language with extensive facilities for processing strings and lists. Icon has several novel features, including expressions that may produce sequences of results, goal-directed evaluation that automatically searches for a successful result, and string scanning that allows operations on strings to be formulated at a high conceptual level. .PP Icon resembles SNOBOL4 [1] in its emphasis on high-level string processing and a design philosophy that allows ease of programming and short, concise programs. Like SNOBOL4, storage allocation and garbage collection are automatic in Icon, and there are few restrictions on the sizes of objects. Strings, lists, and other structures are created during program execution and their size does not need to be known when a program is written. Values are converted to expected types automatically; for example, numeral strings read in as input can be used in numerical computations without explicit conversion. Whereas SNOBOL4 has a pattern-matching facility that is separate from the rest of the language, string scanning is integrated with the rest of the language facilities in Icon. Unlike SNOBOL4, Icon has an expression-based syntax with reserved words; in appearance, Icon programs resemble those of several other conventional programming languages. .PP Examples of the kinds of problems for which Icon is well suited are: .in .5i .IP \(bu text analysis, editing, and reformatting .IP \(bu document preparation .IP \(bu symbolic mathematics .IP \(bu text generation .IP \(bu program parsing and translation .IP \(bu data laundry .IP \(bu graph manipulation .in 0 .PP Icon is implemented in C [2] and runs under UNIX\u*\d .FS \u*\dUNIX is a trademark of Bell Laboratories. .FE on the PDP-11, VAX-11, and Onyx C8002 computers. Implementations for other computers and operating systems are presently underway. An earlier version of Icon [3] is available on several large-scale computers, including the CRAY-1, DEC-10, IBM 360/370, PRIME 450/550/650, DG MV8000, and CDC Cyber/6000. .PP A brief description of some of the representative features of Icon is given in the following sections. This description is not rigorous and does not include many features of Icon. See [4] for a complete description. .NH Strings .PP Strings of characters may be arbitrarily long, limited only by the architecture of the computer on which Icon is implemented. A string may be specified literally by enclosing it in double quotation marks, as in .Ds greeting := "Hello world" .De which assigns an 11-character string to \*Mgreeting\fR, and .Ds address := "" .De which assigns the zero-length \fIempty\fR string to \*Maddress\fR. The number of characters in a string \*Ms\fR, its size, is given by \*M*s\fR. For example, \*M*greeting\fR is 11 and \*M*address\fR is 0. .PP Icon uses the ASCII character set, extended to 256 characters. There are escape conventions, similar to those of C, for representing characters that cannot be keyboarded. .PP Strings also can be read in and written out, as in .Ds line := read() .De and .Ds write(line) .De Strings can be constructed by concatenation, as in .Ds element := "(" || read() || ")" .De If the concatenation of a number of strings is to be written out, the \*Mwrite\fR function can be used with several arguments to avoid actual concatenation: .Ds write("(",read(),")") .De .PP Substrings can be formed by subscripting strings with range specifications that indicate, by position, the desired range of characters. For example, .Ds middle := line\^[10:20] .De assigns to \*Mmiddle\fR the string of characters of \*Mline\fR between positions 10 and 20. Similarly, .Ds write(line\^[2])\fR .De writes the second character of \*Mline\fR. The value 0 is used to refer to the position after the last character of a string. Thus .Ds write(line\^[2:0]) .De writes the substring of \*Mline\fR from the second character to the end, thus omitting the first character. .PP An assignment can be made to the substring of string-valued variable to change its value. For example, .Ds line[2] := "..." .De replaces the second character of \*Mline\fR by three dots. Note that the size of \*Mline\fR changes automatically. .PP There are many functions for analyzing strings. An example is .Ds find(s1,\*bs2) .De which produces the position in \*Ms2\fR at which \*Ms1\fR occurs as a substring. For example, if the value of \*Mgreeting\fR is as given earlier, .Ds find("or",\*bgreeting) .De produces the value 8. See Section 4.2 for the handling of situations in which \*Ms1\fR does not occur in \*Ms2\fR, or in which it occurs at several different positions. .NH Character Sets .PP While strings are sequences of characters, \fIcsets\fR are sets of characters in which membership rather than order is significant. Csets are represented literally using single enclosing quotation marks, as in .Ds vowels := 'aeiouAEIOU' .De Two useful built-in csets are \*M&lcase\fR and \*M&ucase\fR, which consist of the lowercase and uppercase letters, respectively. Set operations are provided for csets. For example, .Ds letters := &lcase ++ &ucase .De forms the cset union of the lowercase and uppercase letters and assigns the resulting cset to \*Mletters\fR, while .Ds consonants := letters -- 'aeiouAEIOU' .De forms the cset difference of the letters and the vowels and assigns the resulting cset to \*Mconsonants\fR. .PP Csets are useful in situations in which any one of a number of characters is significant. An example is the string analysis function .Ds upto(c,\*bs) .De which produces the position \*Ms\fR at which any character in \*Mc\fR occurs. For example, .Ds upto(vowels,\*bgreeting) .De produces 2. Another string analysis function that uses csets is .Ds many(c,\*bs) .De which produces the position in \*Ms\fR after an initial substring consisting only of characters that occur in \*Ms\fR. An example of the use of \*Mmany\fR is in locating words. Suppose, for example, that a word is defined to consist of a string of letters. The expression .Ds write(line\^[1:many(letters,\*bline)]) .De writes a word at the beginning of \*Mline\fR. Note the use of the position returned by a string analysis function to specify the end of a substring. .NH Expression Evaluation .NH 2 Conditional Expressions .PP In Icon there are \fIconditional expressions\fR that may \fIsucceed\fR and produce a result, or may \fIfail\fR and not produce any result. An example is the comparison operation .Ds i > j .De which succeeds (and produces the value of \*Mj\fR) provided that the value of \*Mi\fR is greater than the value of \*Mj\fR, but fails otherwise. .PP The success or failure of conditional operations is used instead of Boolean values to drive control structures in Icon. An example is .Ds if i > j then k := i else k := j .De which assigns the value of \*Mi\fR to \*Mk\fR if the value of \*Mi\fR is greater than the value of \*Mj\fR, but assigns the value of \*Mj\fR to \*Mk\fR \%otherwise. .PP The usefulness of the concepts of success and failure is illustrated by \*Mfind(s1,\*bs2)\fR, which fails if \*Ms1\fR does not occur as a substring of \*Ms2\fR. Thus .Ds if i := find("or",line) then write(i) .De writes the position at which \*Mor\fR occurs in \*Mline\fR, if it occurs, but does not write a value if it does not occur. .PP Many expressions in Icon are conditional. An example is \*Mread()\fR, which produces the next line from the input file, but fails when the end of the file is reached. The following expression is typical of programming in Icon and illustrates the integration of conditional expressions and conventional control structures: .Ds while line := read() do write(line) .De This expression copies the input file to the output file. .PP If an argument of a function fails, the function is not called, and the function call fails as well. This ``inheritance'' of failure allows the concise formulation of many programming tasks. Omitting the optional \f3do\fR clause in \f3while-do\fR, the previous expression can be rewritten as .Ds while write(read()) .De .NH 2 Generators .PP In some situations, an expression may be capable of producing more than one result. Consider .Ds sentence := "Store it in the neighboring harbor" find("or",\*bsentence) .De Here \*Mor\fR occurs in \*Msentence\fR at positions 3, 23, and 33. Most programming languages treat this situation by selecting one of the positions, such as the first, as the result of the expression. In Icon, such an expression is a \fIgenerator\fR and is capable of producing all three positions. .PP The results that a generator produces depend on context. In a situation where only one result is needed, the first is produced, as in .Ds i := find("or",\*bsentence) .De which assigns the value 3 to \*Mi\fR. .PP If the result produced by a generator does not lead to the success of an enclosing expression, however, the generator is \fIresumed\fR to produce another value. An example is .Ds if (i := find("or",\*bsentence)) > 5 then write(i) .De Here the first result produced by the generator, 3, is assigned to \*Mi\fR, but this value is not greater than 5 and the comparison operation fails. At this point, the generator is resumed and produces the second position, 23, which is greater than 5. The comparison operation then succeeds and the value 23 is written. Because of the inheritance of failure and the fact that comparison operations return the value of their right argument, this expression can be written in the following more compact form: .Ds write(5 < find("or",\*bsentence)) .De .PP Goal-directed evaluation is inherent in the expression evaluation mechanism of Icon and can be used in arbitrarily complicated situations. For example, .Ds find("or",\*bsentence1) = find("and",\*bsentence2) .De succeeds if \*Mor\fR occurs in \*Msentence1\fR at the same position as \*Mand\fR occurs in \*Msentence2\fR. .PP A generator can be resumed repeatedly to produce all its results by using the \f3every-do\fR control structure. An example is .Ds every i := find("or",\*bsentence) do write(i) .De which writes all the positions at which \*Mor\fR occurs in \*Msentence\fR. For the example above, these are 3, 23, and 33. .PP Generation is inherited like failure, and this expression can be written more concisely by omitting the optional \f3do\fR clause: .Ds every write(find("or",\*bsentence)) .De .PP There are several built-in generators in Icon. One of the most frequently used of these is .Ds i to j .De which generates the integers from \*Mi\fR to \*Mj\fR. This generator can be combined with \f3every-do\fR to formulate the traditional \f3for\fR-style control structure: .Ds every k := i to j do f(k) .De Note that this expression can be written more compactly as .Ds every f(i to j) .De .PP There are a number of other control structures related to generation. One is \fIalternation\fR, .Ds \*1 | \*2 .De which generates the results of \*1 followed by the results of \*2. Thus .Ds every write(find("or",\*bsentence1) | find("or",\*bsentence2)) .De writes the positions of \*Mor\fR in \*Msentence1\fR followed by the positions of \*Mor\fR in \*Msentence2\fR. Again, this sentence can be written more compactly by using alternation in the second argument of \*Mfind\fR: .Ds every write(find("or",\*bsentence1 | sentence2)) .De .PP Another use of alternation is illustrated by .Ds (i | j | k) = (0 | 1) .De which succeeds if any of \*Mi\fR, \*Mj\fR, or \*Mk\fR has the value 0 or 1. .NH String Scanning .PP The string analysis and synthesis operations described in Sections 2 and 3 work best for relatively simple operations on strings. For complicated operations, the bookkeeping involved in keeping track of positions in strings becomes burdensome and error prone. In such cases, Icon has a string scanning facility that is analogous in many respects to pattern matching in SNOBOL4. In string scanning, positions are managed automatically and attention is focused on a current position in a string as it is examined by a sequence of operations. .PP The string scanning operation has the form .Ds s ? \*0 .De where \*Ms\fR is the \fIsubject\fR string to be examined and \*0 is an expression that performs the examination. A position in the subject, which starts at 1, is the focus of examination. .PP \fIMatching functions\fR change this position. One matching function, \*Mmove(i)\fR, moves the position by \*Mi\fR and produces the substring of the subject between the previous and new positions. If the position cannot be moved by the specified amount (because the subject is not long enough), \*Mmove(i)\fR fails. A simple example is .Ds line ? while write(move(2)) .De which writes successive two-character substrings of \*Mline\fR, stopping when there are no more characters. .PP Another matching function is \*Mtab(i)\fR, which sets the position in the subject to \*Mi\fR and also returns the substring of the subject between the previous and new positions. For example, .Ds line ? if tab(10) then write(tab(0)) .De first sets the position in the subject to 10 and then to the end of the subject, writing \*Mline\^[10:0]\fR. Note that no value is written if the subject is not long enough. .PP String analysis functions such as \*Mfind\fR can be used in string scanning. In this context, the string that they operate on is not specified and is taken to be the subject. For example, .Ds line ? while write(tab(find("or"))) do move(2) .De writes all the substrings of \*Mline\fR prior to occurrences of \*Mor\fR. Note that \*Mfind\fR produces a position, which is then used by \*Mtab\fR to change the position and produce the desired substring. The \*Mmove(2)\fR skips the \*Mor\fR that is found. .PP Another example of the use of string analysis functions in scanning is .Ds line ? while tab(upto(letters)) do write(tab(many(letters))) .De which writes all the words in \*Mline\fR. .PP As illustrated in the examples above, any expression may occur in the scanning expression. Unlike SNOBOL4, in which the operations that are allowed in pattern matching are limited and idiosyncratic, string scanning is completely integrated with the rest of the operation repertoire of Icon. .NH Structures .NH 2 Lists .PP While strings are sequences of characters, lists in Icon are sequences of values of arbitrary types. Lists are created by enclosing the lists of values in brackets. An example is .Ds car1 := ["buick",\*b"skylark",\*b1978,\*b2450] .De in which the list \*Mcar1\fR has four values, two of which are strings and two of which are integers. Note that the values in a list need not all be of the same type. In fact, any kind of value can occur in a list \(em even another list, as in .Ds inventory := [car1,\*bcar2,\*bcar3,\*bcar4] .De .PP Lists also can be created by .Ds a := list(i,\*bx) .De which creates a list of \*Mi\fR values, each of which has the value \*Mx\fR. .PP The values in a list can be referenced by position much like the characters in a string. Thus .Ds car1\^[4] := 2400 .De changes the last value in \*Mcar1\fR to 2400. A reference that is out of the range of the list fails. For example, .Ds write(car1\^[5]) .De fails. .PP The values in a list \*Ma\fR are generated by \*M!a\fR. Thus .Ds every write(!a) .De writes all the values in \*Ma\fR. .PP Lists can be manipulated like stacks and queues. The function \*Mpush(a,\*bx)\fR adds the value of \*Mx\fR to the left end of the list \*Ma\fR, automatically increasing the size of \*Ma\fR by one. Similarly, \*Mpop(a)\fR removes the leftmost value from \*Ma\fR, automatically decreasing the size of \*Ma\fR by one, and produces the removed value. .PP A list value in Icon is a pointer (reference) to a structure. Assignment of a structure in Icon does not copy the structure itself but only the pointer to it. Thus the result of .Ds demo := car1 .De causes \*Mdemo\fR and \*Mcar1\fR to reference the same list. Graphs with loops can be constructed in this way. For example, .Ds node1 := ["a"] node2 := [node1,\*b"b"] push(node1,\*bnode2) .De constructs a structure that can be pictured as follows: .if \n(Ex .ig .Ds .ta 1.2i .sp 2 node1 a .sp 2 node2 b .sp 2 .De .. .if !\n(Ex .ig .ne 2i .nf .in 1i .ft B .sp 2 .cs B 20 node1 .->a--. | | | | node2 '--b<-' .sp 2 .cs B .in 0 .fi .. .NH 2 Tables .PP Icon has a table data type similar to that of SNOBOL4. Tables essentially are sets of pairs of values, an \fIentry value\fR and a corresponding \fIassigned value\fR. The entry and assigned values may be of any type, and the assigned value for any entry value can be looked up automatically. Thus tables provide a form of associative access in contrast with the positional access to values in lists. .PP A table is created by an expression such as .Ds symbols := table(x) .De which assigns to \*Msymbols\fR a table with the default assigned value \*Mx\fR. Subsequently, \*Msymbols\fR can be referenced by any entry value, such as .Ds symbols\^["there"] := 1 .De which assigns the value 1 to the \*Mthere\fRth entry in symbols. .PP Tables grow automatically as new entry values are added. For example, the following program segment produces a table containing a count of the words that appear in the input file: .Ds words := table(0) while line := read() do line ? while tab(upto(letters)) do words\^[tab(many(letters))] +:= 1 .De Here the default assigned value for each word is 0, as given in \*Mtable(0)\fR, and \*M+:=\fR is an augmented assignment operation that increments the assigned values by one. There are augmented assignment operations for all binary operators. .PP Tables can be converted to lists, so that their entry and assigned values can be accessed by position. This is done by \*Msort(t)\fR, which produces a list of two-element lists from \*Mt\fR, where each two-element list consists of an entry value and its corresponding assigned value. For example, .Ds wordlist := sort(words) every pair := !wordlist do write(pair\^[1],\*b" : ",\*bpair\^[2]) .De writes the words and their counts from \*Mwords\fR. .NH Procedures .PP An Icon program consists of a sequence of procedure declarations. An example of a procedure declaration is .Ds procedure max(i,\*bj) if i > j then return i else return j end .De where the name of the procedure is \*Mmax\fR and its formal parameters are \*Mi\fR and \*Mj\fR. The \f3return\fR expressions return the value of \*Mi\fR or \*Mj\fR, whichever is larger. .PP Procedures are called like built-in functions. Thus .Ds k := max(*s1,\*b*s2) .De assigns to \*Mk\fR the size of the longer of the strings \*Ms1\fR and \*Ms2\fR. .PP A procedure also may suspend instead of returning. In this case, a result is produced as in the case of a return, but the procedure can be resumed to produce other results. An example is the following procedure that generates the words in the input file. .Ds procedure genword() local line, letters, words letters := &lcase ++ &ucase while line := read() do line ? while tab(upto(letters)) do { word := tab(many(letters)) suspend word } end .De The braces enclose a compound expression. .PP Such a generator is used in the same way that a built-in generator is used. For example .Ds every word := genword() do if find("or",\*bword) then write(word) .De writes only those words that contain the substring \*Mor\fR. .NH An Example .PP The following program sorts graphs topologically. .Ds .ta 3.5i .Px procedure main() local sorted, nodes, arcs, roots while nodes := read() do { # get next node list arcs := read() # get arc list sorted := "" # sorted nodes # get nodes without predecessors while *(roots := nodes -- snodes(arcs)) > 0 do { sorted ||:= roots # add to sorted nodes nodes --:= roots # delete these nodes arcs := delarcs(arcs,\*broots) # delete their arcs } if *arcs = 0 then write(sorted) # successfully sorted else write("graph has cycle") # cycle if node remains } end .De .Ds .Px procedure snodes(arcs) local nodes nodes := "" arcs ? while move(1) do { # predecessor move(2) # skip "->" nodes ||:= move(1) # successor move(1) # skip ";" } return nodes end .De .Ds .Px procedure delarcs(arcs,\*broots) local newarcs, node newarcs := "" arcs ? while node := move(1) do { # get predecessor node if many(roots,\*bnode) then move(4) # delete arc from root node else newarcs ||:= node || move(4) # else keep arc } return newarcs end .De Graph nodes are represented by single characters with a list of the nodes on one input line followed by a list of arcs. For example, the graph .if \n(Ex .ig .Ds .ta .75i +.75i +.75i \0 \0 \0 a b c .sp 2 d e .sp 2 .De .. .if !\n(Ex .ig .nf .ft B .cs B 20 .in 1i .ne 2i .sp 2 .---------------. | | | \o'v|' a------>b------>c \o'^|' | \o'^|' | | | | \o'|v' | d------>e-------' .sp 1 .cs B .fi .in 0 .sp 1 .ft R .. is given as .Ds abcde a\*(->b;a\*(->c;b\*(->c;b\*(->e;d\*(->a;d\*(->e;e\*(->c; .De for which the output is .Ds dabec .De .PP The nodes are represented by csets and automatic type conversion is used to convert strings to csets and vice versa. Note the use of augmented assignment operations for concatenation and in the computation of cset differences. .SH Acknowledgement .PP Icon was designed by the the author in collaboration with Dave Hanson, Tim Korb, Cary Coutant, and Steve Wampler. The current implementation is largely the work of Cary Coutant and Steve Wampler with recent contributions by Bill Mitchell. Dave Hanson and Bill Mitchell also made several helpful suggestions on the presentation of material in this paper. .SH References .LP .IP 1. Griswold, Ralph E., Poage, James F., and Polonsky, Ivan P. \fIThe SNOBOL4 Programming Language\fR, second edition. Prentice-Hall, Inc., Englewood Cliffs, New Jersey. 1971. .IP 2. Kernighan, Brian W. and Ritchie, Dennis M. \fIThe C Programming Language\fR. Prentice-Hall, Inc., Englewood Cliffs, New Jersey. 1978. .IP 3. Griswold, Ralph E. \fIDifferences Between Versions 2 and 5 of Icon\fR, Technical Report TR 83-5, Department of Computer Science, The University of Arizona. 1983. .IP 4. Griswold, Ralph E. and Griswold, Madge T. \fIThe Icon Programming Language\fR. Prentice-Hall, Inc., Englewood Cliffs, New Jersey. 1983.