.sp |2.5i .ce 3 .ps 12 .ft G Programming in C _ A Tutorial .ps 10 .sp .ft R Brian W. Kernighan .ft I .sp Bell Laboratories, Murray Hill, N. J. .sp |4.1i .ft R .ps 10 .fi .vs 12p .NH Introduction .PP C is a computer language available on the .UC GCOS and .UC UNIX operating systems at Murray Hill and (in preliminary form) on OS/360 at Holmdel. C lets you write your programs clearly and simply _ it has decent control flow facilities so your code can be read straight down the page, without labels or GOTO's; it lets you write code that is compact without being too cryptic; it encourages modularity and good program organization; and it provides good data-structuring facilities. .PP This memorandum is a tutorial to make learning C as painless as possible. The first part concentrates on the central features of C; the second part discusses those parts of the language which are useful (usually for getting more efficient and smaller code) but which are not necessary for the new user. This is .ul not a reference manual. Details and special cases will be skipped ruthlessly, and no attempt will be made to cover every language feature. The order of presentation is hopefully pedagogical instead of logical. Users who would like the full story should consult the .ul C Reference Manual by D. M. Ritchie [1], which should be read for details anyway. Runtime support is described in [2] and [3]; you will have to read one of these to learn how to compile and run a C program. .PP We will assume that you are familiar with the mysteries of creating files, text editing, and the like in the operating system you run on, and that you have programmed in some language before. .NH A Simple C Program .PP .E1 main(~) { printf("hello, world"); } .E2 .PP A C program consists of one or more .ul functions, which are similar to the functions and subroutines of a Fortran program or the procedures of PL/I, and perhaps some external data definitions. .UL main is such a function, and in fact all C programs must have a .UL main\*. Execution of the program begins at the first statement of .UL main\*. .UL main will usually invoke other functions to perform its job, some coming from the same program, and others from libraries. .PP One method of communicating data between functions is by arguments. The parentheses following the function name surround the argument list; here .UL main is a function of no arguments, indicated by (~). The {} enclose the statements of the function. Individual statements end with a semicolon but are otherwise free-format. .PP .UL printf is a library function which will format and print output on the terminal (unless some other destination is specified). In this case it prints .E1 hello, world .E2 A function is invoked by naming it, followed by a list of arguments in parentheses. There is no .UC CALL statement as in Fortran or .UC PL/I. .NH A Working C Program; Variables; Types and Type Declarations .PP Here's a bigger program that adds three integers and prints their sum. .E1 main(~) { int a, b, c, sum; a = 1; b = 2; c = 3; sum = a + b + c; printf("sum is %d", sum); } .E2 .PP Arithmetic and the assignment statements are much the same as in Fortran (except for the semicolons) or .UC PL/I. The format of C programs is quite free. We can put several statements on a line if we want, or we can split a statement among several lines if it seems desirable. The split may be between any of the operators or variables, but .ul not in the middle of a name or operator. As a matter of style, spaces, tabs, and newlines should be used freely to enhance readability. .PP C has four fundamental .ul types of variables: .DS \fGint\fR integer (PDP-11: 16 bits; H6070: 36 bits; IBM360: 32 bits) \fGchar\fR one byte character (PDP-11, IBM360: 8 bits; H6070: 9 bits) \fGfloat\fR single-precision floating point \fGdouble\fR double-precision floating point .DE There are also .ul arrays and .ul structures of these basic types, .ul pointers to them and .ul functions that return them, all of which we will meet shortly. .PP .ul All variables in a C program must be declared, although this can sometimes be done implicitly by context. Declarations must precede executable statements. The declaration .E1 int a, b, c, sum; .E2 declares .UL a, .UL b, .UL c, and .UL sum to be integers. .PP Variable names have one to eight characters, chosen from A-Z, a-z, 0-9, and \(ul, and start with a non-digit. Stylistically, it's much better to use only a single case and give functions and external variables names that are unique in the first six characters. (Function and external variable names are used by various assemblers, some of which are limited in the size and case of identifiers they can handle.) Furthermore, keywords and library functions may only be recognized in one case. .NH Constants .PP We have already seen decimal integer constants in the previous example _ 1, 2, and 3. Since C is often used for system programming and bit-manipulation, octal numbers are an important part of the language. In C, any number that begins with 0 (zero!) is an octal integer (and hence can't have any 8's or 9's in it). Thus 0777 is an octal constant, with decimal value 511. .PP A ``character'' is one byte (an inherently machine-dependent concept). Most often this is expressed as a .ul character constant, which is one character enclosed in single quotes. However, it may be any quantity that fits in a byte, as in .UL flags below: .E1 char quest, newline, flags; quest = '?'; newline = '\\n'; flags = 077; .E2 .PP The sequence `\\n' is C notation for ``newline character'', which, when printed, skips the terminal to the beginning of the next line. Notice that `\\n' represents only a single character. There are several other ``escapes'' like `\\n' for representing hard-to-get or invisible characters, such as `\\t' for tab, `\\b' for backspace, `\\0' for end of file, and `\\\\' for the backslash itself. .PP .UL float and .UL double constants are discussed in section 26. .NH Simple I/O _ getchar, putchar, printf .PP .E1 main( ) { char c; c = getchar(~); putchar(c); } .E2 .PP .UL getchar and .UL putchar are the basic I/O library functions in C. .UL getchar fetches one character from the standard input (usually the terminal) each time it is called, and returns that character as the value of the function. When it reaches the end of whatever file it is reading, thereafter it returns the character represented by `\\0' (ascii .UC NUL, which has value zero). We will see how to use this very shortly. .PP .UL putchar puts one character out on the standard output (usually the terminal) each time it is called. So the program above reads one character and writes it back out. By itself, this isn't very interesting, but observe that if we put a loop around this, and add a test for end of file, we have a complete program for copying one file to another. .PP .UL printf is a more complicated function for producing formatted output. We will talk about only the simplest use of it. Basically, .UL printf uses its first argument as formatting information, and any successive arguments as variables to be output. Thus .E1 printf ("hello, world\\n"); .E2 is the simplest use _ the string ``hello, world\\n'' is printed out. No formatting information, no variables, so the string is dumped out verbatim. The newline is necessary to put this out on a line by itself. (The construction .E1 "hello, world\\n" .E2 is really an array of .UL chars\*. More about this shortly.) .PP More complicated, if .UL sum is 6, .E1 printf ("sum is %d\\n", sum); .E2 prints .E1 sum is 6 .E2 Within the first argument of .UL printf, the characters ``%d'' signify that the next argument in the argument list is to be printed as a base 10 number. .PP Other useful formatting commands are ``%c'' to print out a single character, ``%s'' to print out an entire string, and ``%o'' to print a number as octal instead of decimal (no leading zero). For example, .E1 n = 511; printf ("What is the value of %d in octal?", n); printf (" %s! %d decimal is %o octal\\n", "Right", n, n); .E2 prints .E1 .fi What is the value of 511 in octal? Right! 511 decimal is 777 octal .E2 Notice that there is no newline at the end of the first output line. Successive calls to .UL printf (and/or .UL putchar, for that matter) simply put out characters. No newlines are printed unless you ask for them. Similarly, on input, characters are read one at a time as you ask for them. Each line is generally terminated by a newline (\\n), but there is otherwise no concept of record.