.SH Section 3: Lexical Analysis .PP The user must supply a lexical analyzer which reads the input stream and communicates tokens (with values, if desired) to the parser. The lexical analyzer is an integer valued function called yylex, in both C and Ratfor. The function returns an integer which represents the type of the token. The value to be associated in the parser with that token is assigned to the integer variable yylval. Thus, a lexical analyzer written in C should begin .DS yylex ( ) { extern int yylval; . . . .DE while a lexical analyzer written in Ratfor should begin .DS integer function yylex(yylval) integer yylval . . . .DE .PP Clearly, the parser and the lexical analyzer must agree on the type numbers in order for communication between them to take place. These numbers may be chosen by Yacc, or chosen by the user. In either case, the ``define'' mechanisms of C and Ratfor are used to allow the lexical analyzer to return these numbers symbolically. For example, suppose that the token name DIGIT has been defined in the declarations section of the specification. The relevant portion of the lexical analyzer (in C) might look like: .DS yylex( ) { extern int yylval; int c; . . . c = getchar( ); . . . if( c >= \'0\' && c <= \'9\' ) { yylval = c\-\'0\'; return(DIGIT); } . . . .DE .PP The relevant portion of the Ratfor lexical analyzer might look like: .DS integer function yylex(yylval) integer yylval, digits(10), c . . . data digits(1) / "0" /; data digits(2) / "1" /; . . . data digits(10) / "9" /; . . . # set c to the next input character . . . do i = 1, 10 { if(c .EQ. digits(i)) { yylval = i\-1 yylex = DIGIT return } } . . . .DE .PP In both cases, the intent is to return a token type of DIGIT, and a value equal to the numerical value of the digit. Provided that the lexical analyzer code is placed in the programs section of the specification, the identifier DIGIT will be redefined to be equal to the type number associated with the token name DIGIT. .PP This mechanism leads to clear and easily modified lexical analyzers; the only pitfall is that it makes it important to avoid using any names in the grammar which are reserved or significant in the chosen language; thus, in both C and Ratfor, the use of token names of ``if'' or ``yylex'' will almost certainly cause severe difficulties when the lexical analyzer is compiled. The token name ``error'' is reserved for error handling, and should not be used naively (see Section 5). .PP As mentioned above, the type numbers may be chosen by Yacc or by the user. In the default situation, the numbers are chosen by Yacc. The default type number for a literal character is the numerical value of the character, considered as a 1 byte integer. Other token names are assigned type numbers starting at 257. It is a difficult, machine dependent operation to determine the numerical value of an input character in Ratfor (or Fortran). Thus, the Ratfor user of Yacc will probably wish to set his own type numbers, or not use any literals in his specification. .PP To assign a type number to a token (including literals), the first appearance of the token name or literal .I in the declarations section .R can be immediately followed by a nonnegative integer. This integer is taken to be the type number of the name or literal. Names and literals not defined by this mechanism retain their default definition. It is important that all type numbers be distinct. .PP There is one exception to this situation. For sticky historical reasons, the endmarker must have type number 0. Note that this is not unattractive in C, since the nul character is returned upon end of file; in Ratfor, it makes no sense. This type number cannot be redefined by the user; thus, all lexical analyzers should be prepared to return 0 as a type number upon reaching the end of their input.