.so tmac.ilib
.TH RSG 1 "The University of Arizona \- 5/16/83"
.SH NAME
rsg \- generate random sentences
.SH SYNOPSIS
\f3rsg\fP [\f3\-s\fI n\fR] [\f3\-l\fI n\fR] [\f3\-t\fR]
.SH DESCRIPTION
\fIRsg\fR generates randomly selected sentences from a grammar specified
by the user.
.PP
The following options may appear in any order:
.IP "\f3\-s\fI n\fR"
Set the seed for random generation to \fIn\fR.
The default seed is 0.
.IP "\f3\-l\fI n\fR"
Terminate generation if the number of symbols remaining to be processed
exceeds \fIn\fR.
There is no default limit.
.IP \f3\-t\fR
Trace the generation of sentences.
Trace output is written to standard error.
.PP
\fIRsg\fR works interactively, allowing the user to build, test, modify,
and save grammars.
Input to \fIrsg\fR consists of several kinds of specifications,
which can be intermixed:
.PP
\fIProductions\fR define nonterminal symbols in a syntax similar to the
rewriting rules of BNF, in which each alternative is a concatenation of
nonterminal and terminal symbols.
.PP
\fIGeneration specifications\fR cause the generation of a specified number
of sentences from the language defined by a given nonterminal symbol.
.PP
\fIGrammar output specifications\fR cause the definition of a specified
nonterminal symbol, or of the entire current grammar, to be written to a
given file.
.PP
\fISource specifications\fR cause subsequent input to be read from a
specified file.
.PP
In addition, any line beginning with \*M#\fR is treated as a comment,
while any line beginning with \*M=\fR causes the rest of that line to be
used as a prompt to the user whenever \fIrsg\fR is ready for input
(normally there is no prompt).
A line consisting of a single \*M=\fR stops prompting.
.SH \0\0\0Productions
Examples of productions are:
.DS
::=|+
::=|*
::=x|y|z|()
.DE
Productions may occur in any order.
The definition of a nonterminal symbol can be changed by specifying
a new production for it.
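.PP
The generation process can be illustrated by a minimal sketch in Python.
It is not the implementation used by \fIrsg\fR: the names parse_production
and generate are hypothetical, the grammar \*M<s>::=a|b<s>\fR is an invented
example, and keeping pending symbols on a stack with the leftmost symbol
expanded first is an assumption.
As the CAVEATS section implies, every alternative of a production is chosen
with equal probability.
The optional limit plays the role of \f3\-l\fR; Python's generator seeded
with 0 does not reproduce the sentences \fIrsg\fR produces with its default seed.
.DS
```python
import random
import re

# Hypothetical sketch, not rsg's own implementation.
# A symbol written as <name> is a nonterminal; any other run of text
# is a terminal string. <> stands for the empty string.
SYMBOL = re.compile(r"<[^<>]*>|[^<>]+")


def parse_production(line):
    """Split a production such as '<s>::=a|b<s>' into (name, alternatives)."""
    lhs, rhs = line.split("::=", 1)
    name = lhs.strip()[1:-1]
    alternatives = [SYMBOL.findall(alt) for alt in rhs.split("|")]
    return name, alternatives


def generate(grammar, start, rng, limit=None):
    """Expand nonterminal start, picking each alternative uniformly at random."""
    out = []
    pending = ["<" + start + ">"]  # symbols still to be processed
    while pending:
        # Analogue of the -l option: stop if too many symbols are pending.
        if limit is not None and len(pending) > limit:
            raise RuntimeError("limit exceeded after: " + "".join(out))
        symbol = pending.pop()
        if symbol.startswith("<") and symbol.endswith(">"):
            name = symbol[1:-1]
            if name == "":
                continue  # <> contributes nothing
            if name not in grammar:
                raise KeyError("undefined " + symbol + " after: " + "".join(out))
            choice = rng.choice(grammar[name])  # all alternatives equally likely
            pending.extend(reversed(choice))  # leftmost symbol ends up on top
        else:
            out.append(symbol)  # terminals go straight to the sentence
    return "".join(out)


if __name__ == "__main__":
    grammar = dict(parse_production(p) for p in ["<s>::=a|b<s>"])
    rng = random.Random(0)  # seed 0, but not rsg's random sequence
    for _ in range(3):
        print(generate(grammar, "s", rng))  # each line has the form b...ba
```
.DE
In this sketch an undefined nonterminal raises KeyError and an exceeded
limit raises RuntimeError, each reporting the partial sentence generated so
far, loosely following the behavior described under DIAGNOSTICS.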
.PP
There are a number of special devices to facilitate the definition of
grammars, including eight predefined, built-in nonterminal symbols:
.nf
.sp 1
.ta .5i 1.5i
	symbol	definition
.sp .5
	\*M < > | \fR newline
	\*M<>\fR	empty string
	\*M<&lcase>\fR	any single lowercase letter
	\*M<&ucase>\fR	any single uppercase letter
	\*M<&digit>\fR	any single digit
.sp 1
.fi
In addition, if the string between a \*M<\fR and a \*M>\fR begins and ends
with a single quotation mark, the construction stands for any single
character between the quotation marks.
For example,
.DS
<'xyz'>
.DE
is equivalent to
.DS
x|y|z
.DE
Finally, if the name of a nonterminal symbol between the \*M<\fR and
\*M>\fR begins with \*M?\fR, the user is queried during generation to
supply a string for that nonterminal symbol.
For example, in
.DS
::=|+|
.DE
if the third alternative is encountered during generation, the user is
asked to provide a string for \*M\fR.
.SH \0\0\0Generation Specifications
A generation specification consists of a nonterminal symbol followed by
a nonnegative integer.
An example is
.DS
10
.DE
which specifies the generation of 10 \*M\fRs.
If the integer is omitted, it is assumed to be 1.
Generated sentences are written to standard output.
.SH \0\0\0Grammar Output Specifications
A grammar output specification consists of a nonterminal symbol,
followed by \*M\->\fR, followed by a file name.
Such a specification causes the current definition of the nonterminal
symbol to be written to the given file.
If the file name is omitted, standard output is assumed.
If the nonterminal symbol is omitted, the entire grammar is written out.
Thus,
.DS
\->
.DE
causes the entire grammar to be written to standard output.
.SH \0\0\0Source Specifications
A source specification consists of \*M@\fR followed by a file name.
Subsequent input is read from that file.
When an end of file is encountered, input reverts to the previous file.
Input files can be nested.
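.PP
The input conventions described in the preceding sections (comment, prompt,
and source lines, productions, grammar output and generation specifications,
the quoted-character construction, and the built-in character classes) can
be summarized in a small Python sketch.
The names classify, parse_generation, special_alternatives, and read_input
are hypothetical; the order in which classify tests a line, and the use of a
dictionary of line lists in place of real files, are assumptions made only
for illustration.
.DS
```python
import re
import string

# Hypothetical sketch of rsg's input conventions; the function names and
# the order of the checks in classify() are assumptions, not rsg's code.


def classify(line):
    """Return the kind of specification an input line represents."""
    if line.startswith("#"):
        return "comment"
    if line.startswith("="):
        return "prompt"  # a lone = stops prompting
    if line.startswith("@"):
        return "source"
    if "::=" in line:
        return "production"
    if "->" in line:
        return "output"
    return "generation"


GENERATION = re.compile(r"<([^<>]+)>([0-9]*)")


def parse_generation(line):
    """Parse '<name>n' into (name, n); an omitted count means 1."""
    m = GENERATION.fullmatch(line.strip())
    if m is None:
        return None
    count = int(m.group(2)) if m.group(2) else 1
    return m.group(1), count


# Built-in character classes (ASCII letters and digits assumed).
BUILTINS = {
    "&lcase": string.ascii_lowercase,
    "&ucase": string.ascii_uppercase,
    "&digit": string.digits,
}


def special_alternatives(name):
    """Single-character alternatives for <'chars'> or a built-in, else None."""
    if len(name) >= 2 and name[0] == "'" and name[-1] == "'":
        return list(name[1:-1])  # <'xyz'> is equivalent to x|y|z
    chars = BUILTINS.get(name)
    return list(chars) if chars is not None else None


def read_input(files, name):
    """Yield input lines, following @file specifications with a stack."""
    stack = [iter(files[name])]  # files maps a hypothetical name to its lines
    while stack:
        line = next(stack[-1], None)
        if line is None:
            stack.pop()  # end of file: input reverts to the previous file
        elif line.startswith("@"):
            stack.append(iter(files[line[1:].strip()]))  # nested source
        else:
            yield line
```
.DE
With files such as main containing a, @inc, b and inc containing x, y,
read_input yields a, x, y, b: the stack is what allows source
specifications to nest.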
.SH DIAGNOSTICS
Syntactically erroneous input lines are noted but ignored.
.PP
Specifications for a file that cannot be opened are noted and treated
as erroneous.
.PP
If an undefined nonterminal symbol is encountered during generation,
an error message that identifies the undefined symbol is produced,
followed by the partial sentence generated to that point.
Exceeding the limit on symbols remaining to be processed, as specified
by the \f3\-l\fR option, is handled similarly.
.SH CAVEATS
Generation may fail to terminate because of a loop in the rewriting rules
or, more seriously, because of the progressive accumulation of nonterminal
symbols.
The latter problem can be identified by using the \f3\-t\fR option and
controlled by using the \f3\-l\fR option.
The problem often can be circumvented by duplicating alternatives that
lead to fewer rather than more nonterminal symbols.
For example, changing
.DS
::=|+
.DE
to
.DS
::=||+
.DE
increases the probability of selecting \*M\fR from 1/2 to 2/3.
See the second reference listed below for a discussion of the general problem.
.SH SEE ALSO
.Ib pp. 211-219, 301-302.
.PP
Wetherell, C. S.
``Probabilistic Languages: A Review and Some Open Questions'',
\fIComputing Surveys\fR, Vol. 12, No. 4 (1980), pp. 361-379.
.SH AUTHOR
Ralph E. Griswold