CRASH(8)            UNIX Programmer's Manual             CRASH(8)

NAME
     crash - what happens when the system crashes

DESCRIPTION
     This section explains what happens when the system crashes
     and how to analyze crash dumps.

     When the system crashes voluntarily, it prints a message of
     the form ``panic: specific panic message'' on the console,
     takes a dump on a mass storage peripheral, and then invokes
     an automatic reboot procedure as described in reboot(8).
     (If auto-reboot is disabled, the system simply halts at
     this point.)  Unless some unexpected inconsistency caused by
     hardware or software failure is found in the state of the
     file systems, the system then resumes multi-user
     operations.  If automatic reboots are not enabled, or if the
     automatic file system check fails, the file systems should
     be checked and repaired with fsck(8) before continuing.

     If the system stops or hangs without a panic, it is possible
     to stop it and take a dump of memory before rebooting.  If
     automatic reboot is enabled, a panic can be forced from the
     console, which will allow a dump, automatic reboot and file
     system check.  This is accomplished by halting the CPU,
     loading the PC with 040, and continuing without a reset (use
     continue, not start).  The message ``panic:  forced from
     console'' should print, and the autoreboot will start.  If
     this fails or is not enabled, a dump of the first 248K bytes
     of memory can be made on magtape.  Mount a tape (with write
     ring!), halt the CPU, load address 044, and start (which
     does a reset).  After this completes, halt again and reboot.
     After rebooting, or after an automatic file system check
     fails, check and fix the file systems with fsck.  If the
     system will not reboot, a runnable system must be obtained
     from a backup medium after verifying that the hardware is
     functioning normally.  A damaged root file system should be
     patched while running with an alternate root if possible.

     The system has a large number of internal consistency
     checks; if one of these fails, then it will panic with a
     very short message indicating which one failed.

     The most common cause of system failures is hardware
     failure, which can manifest itself in different ways.  Here
     are the messages most commonly encountered, with
     some hints as to causes.  Left unstated in all cases is the
     possibility that hardware or software error produced the
     message in some unexpected way.

     IO err in swap
          The system encountered an error trying to write to the
          swap device or an error in reading information from a
          disk drive.  The disk should be fixed or replaced if it
          is broken or unreliable.

     Timeout table overflow
          This really shouldn't be a panic.  If this happens, the
          timeout table should be made larger (NCALL in param.c).

     Out of swap
     Out of swap space
          These really shouldn't be panics but there's no other
          satisfactory solution.  The size of the swap area must
          be increased.  The system attempts to avoid running out
          of swap by refusing to start new processes when short
          of swap space (resulting in ``No more processes''
          messages from the shell).

     &remap_area > 0120000
     _end > 0120000
          The kernel detected at boot time that an unacceptable
          portion of its data space extended into the region
          controlled by KDSA5 (kernel virtual addresses 0120000
          through 0137777).  In the case of the first message,
          the size of the kernel's data segment (excluding the
          file, proc, and text tables) must be decreased.  In the
          case of the second message, there are two
          possibilities: if &remap_area is not greater than
          0120000, the kernel must be recompiled without defining
          the option NOKA5; otherwise, as above, the size of the
          kernel's data segment must be decreased.

     init died
          The system initialization process (process 1) has
          exited.  This is serious, as the system will slowly die
          away or constipate.  Rebooting is the only fix, so the
          system panics.

     Can't exec /etc/init
          This is not a normal panic, as the system does not
          reboot.  This occurs during a bootstrap when the system
          is unable to exec /etc/init.  Either /etc/init is not
          present on the root file system, the root file system
          device was set incorrectly, or /etc/init is not
          executable (no execute permission).

     trap type %o
          An unexpected trap has occurred within the system; the
          trap types are:

     0    bus error
     1    illegal instruction trap
     2    BPT/trace trap
     3    IOT
     4    power fail trap (if autoreboot fails)
     5    EMT
     6    recursive system call (TRAP instruction)
     7    programmed interrupt request
     11   protection fault (segmentation violation)
     12   parity trap

     In some of these cases octal 020 may be added to the trap
     type; this indicates that the processor was in user mode
     when the trap occurred.

     In addition to the trap type, the system will have printed
     out three (or four) other numbers: ka6, which is the
     contents of the segmentation register for the area in which
     the system's stack is kept; aps, which is the location where
     the hardware stored the program status word during the
     trap; pc, which was the system's program counter when it
     faulted (already incremented to the next word); and _ovno,
     the overlay number from which the trap occurred (printed
     only if the kernel is overlaid).

     That completes the list of panic types that are most likely
     to be seen.  There are many other panic messages which are
     less likely to occur; most of them detect logical incon-
     sistencies within the kernel and thus ``cannot happen''
     unless some part of the kernel has been modified.

     _I_n_t_e_r_p_r_e_t_i_n_g _d_u_m_p_s. When the system crashes it writes (or at
     least attempts to write) an image of the current memory into
     the last part of the swap area.  After the system is
     rebooted, the program _s_a_v_e_c_o_r_e(8) runs and preserves a copy
     of this core image and the current system in a specified
     directory for later perusal.  See _s_a_v_e_c_o_r_e(8) for details.
     A magtape dump can be read onto disk with _d_d(1).

     To analyze a dump, begin by running _p_s -_a_l_x_k and/or _p_s_t_a_t -_p
     to print the process table at the time of the crash.  Use
     _a_d_b(1) with the -_k option to examine the core file and to
     get a reverse calling order with the $_c or $_C command.  If
     the mapping or the stack frame are incorrect, the following
     magic locations may be examined in an attempt to find out
     what went wrong.  The registers R0, R1, R2, R3, R4, R5, SP,
     and KDSA6 (or KISA6 for machines without separate instruc-
     tion and data) are saved at location 04.  If the core dump
     was taken on disk, these values also appear at 0300.  The
     value of KDSA6 (KISA6) multiplied by 0100 (8) gives the
     address of the user structure and kernel stack for the run-
     ning process.  Relabel these addresses 0140000 through
     0142000.  R5 is C's frame or display pointer.  Stored at
     (R5) is the old R5 pointing to the previous stack frame.  At
     (R5)+2 is the saved PC of the calling procedure.  Trace this
     calling chain to an R5 value of 0141756 (0141754 for over-
     laid kernels), which is where the user's R5 is stored.  If
     the chain is broken, look for a plausible R5, PC pair and
     continue from there.  In most cases this procedure will give
     an idea of what is wrong.  A more complete discussion of
     system debugging is impossible here.

SEE ALSO
     adb(1), ps(1), pstat(1), boot(8), fsck(8), reboot(8),
     savecore(8)

Printed 3/28/83