FS(5)		    UNIX Programmer's Manual		    FS(5)


NAME
     fs, inode - format of file system volume (2BSD)

SYNOPSIS
     #include <sys/types.h>
     #include <sys/fs.h>
     #include <sys/inode.h>

DESCRIPTION
     Every file system storage volume (e.g. disk) has a common
     format for certain vital information.  Every such volume is
     divided into a certain number of blocks.  The block size is
     DEV_BSIZE bytes; specified in <sys/param.h> - currently
     1024.

     Each disk drive contains some number of file systems each
     laid out on a contiguous partition of the disk.  A file sys-
     tem consists of a boot block, followed by a super block,
     followed by an inode area, followed by a data block area
     which takes up the remainder of the disk partition.  The
     layout of the super block as defined in <sys/fs.h> is:

     #define MAXMNTLEN 12

     /*
      * Structure of the super-block
      */
     struct fs
     {
	    u_short fs_isize;		 /* first block after i-list */
	    daddr_t fs_fsize;		 /* size in blocks of entire volume */
	    short   fs_nfree;		 /* number of addresses in fs_free */
	    daddr_t fs_free[NICFREE];	 /* free block list */
	    short   fs_ninode;		 /* number of inodes in fs_inode */
	    ino_t   fs_inode[NICINOD];	 /* free inode list */
	    char    fs_flock;		 /* lock during free list manipulation */
	    char    fs_ilock;		 /* lock during i-list manipulation */
	    char    fs_fmod;		 /* super block modified flag */
	    char    fs_ronly;		 /* mounted read-only flag */
	    time_t  fs_time;		 /* last super block update */
	    daddr_t fs_tfree;		 /* total free blocks */
	    ino_t   fs_tinode;		 /* total free inodes */
	    short   fs_step;		 /* optimal step in free list pattern */
	    short   fs_cyl;		 /* number of blocks per pattern */
	    char    fs_fsmnt[MAXMNTLEN]; /* ordinary file mounted on */
	    ino_t   fs_lasti;		 /* start place for circular search */
	    ino_t   fs_nbehind;          /* est # free inodes before s_lasti */
	    u_short fs_flags;		 /* mount time flags */
     };

     File system: A file system is described by its super-block.
     Block 0 of each file system partition is unused and is


Printed 11/26/99	January 27, 1996			1


FS(5)		    UNIX Programmer's Manual		    FS(5)


     available to contain a bootstrap program, pack label, or
     other information.  Block 1 (SUPERB) is the super block.
     The inode area starts immediately after the super-block, in
     block 2.  Fs_isize is the address of the first block after
     the inode area.  Thus the inode area is fs_isize-2 blocks
     long.  Fs_fsize is the address of the first block not poten-
     tially available for allocation to a file.  Thus the data
     block area is fs_fsize - fs_isize blocks long.

     Super block: The path name on which the file system is
     mounted is maintained in fs_fsmnt.  Fs_flock, fs_ilock,
     fs_fmod, fs_ronly and fs_flags are flags maintained in the
     in core copy of the super block while its file system is
     mounted and their values on disk are immaterial.  Fs_fmod is
     used as a flag to indicate that the super-block has changed
     and should be copied to the disk during the next periodic
     update of file system information.  Fs_ronly is a write-
     protection indicator.  It is a copy of the mount flags
     fs_flags anded with MNT_RDONLY(see/sys/h/mount.h).

     Fs_time is the last time the super-block of the file system
     was changed.  During a reboot, the fs_time of the super-
     block for the root file system is used to set the system's
     idea of the time.

     Inode: The inode is the focus of all file activity in the
     UNIX file system.	There is a unique inode allocated for
     each active file, each current directory, each mounted-on
     file, text file, and the root.  An inode is `named' by its
     device/i-number pair.

     Inodes are 64 bytes long, so 16 of them fit into a block if
     DEV_BSIZE is 1024.  The root inode is the root of the file
     system.  Inode 0 can't be used for normal purposes and his-
     torically bad blocks were linked to inode 1, thus the root
     inode is 2 (inode 1 is no longer used for this purpose, how-
     ever numerous dump tapes make this assumption, so we are
     stuck with it).  No other i-number has a built-in meaning.

     The format of an inode as given in <sys/inode.h> is:

     /*
      * Inode structure as it appears on
      * a disk block.
      */
     struct dinode {
	    u_short di_mode;	 /* mode and type of file */
	    short   di_nlink;	 /* number of links to file */
	    uid_t   di_uid;	 /* owner's user id */
	    gid_t   di_gid;	 /* owner's group id */
	    off_t   di_size;	 /* number of bytes in file */
	    daddr_t di_addr[7];  /* 7 block addresses 4 bytes each */


Printed 11/26/99	January 27, 1996			2


FS(5)		    UNIX Programmer's Manual		    FS(5)


	    u_short di_reserved[5];/* pad of 10 to make total size 64 */
	    u_short di_flags;
	    time_t  di_atime;	 /* time last accessed */
	    time_t  di_mtime;	 /* time last modified */
	    time_t  di_ctime;	 /* time created */
     };

     /*
      * 28 of the di_addr address bytes are used; 7 addresses of 4
      * bytes each: 4 direct (4Kb directly accessible) and 3 indirect.
      */
     #define NADDR  7

     /* modes */

     #define IFMT   0170000 /* type of file */
     #define IFCHR  0020000 /* character special */
     #define IFDIR  0040000 /* directory */
     #define IFBLK  0060000 /* block special */
     #define IFREG  0100000 /* regular */
     #define IFLNK  0120000 /* symbolic link */
     #define IFSOCK 0140000 /* socket */
     #define ISUID  04000   /* set user id on execution */
     #define ISGID  02000   /* set group id on execution */
     #define ISVTX  01000   /* save swapped text even after use */
     #define IREAD  0400    /* read, write, execute permissions */
     #define IWRITE 0200
     #define IEXEC  0100

     Di_mode identifies the type of file the inode represents; it
     is encoded identically to the st_mode field of stat(2).
     Di_nlink is the number of directory entries (links) that
     refer to this inode.  Di_uid and di_gid are the owner's user
     and group IDs.  Di_size is the number of bytes in the file.
     Di_atime and di_mtime are the times of last access and
     modification of the file contents (read, write or create);
     Di_ctime records the time of last modification to the inode
     or to the file, and is used to determine whether it should
     be dumped by dump(8).

     Special files are recognized by their modes.  A block-type
     special file is one which can potentially be mounted as a
     file system; a character-type special file cannot, though it
     is not necessarily character-oriented.  For special files,
     the first two bytes of the di_addr field are occupied by the
     device code (see types(5)).  The device codes of block and
     character special files overlap.

     Disk addresses of plain files and directories are kept in
     the array di_addr. For a DEV_BSIZE of 1K bytes, 7 addresses
     are kept in di_addr using 28 of the 40 bytes.  The first 4
     addresses specify device blocks directly.	The last 3


Printed 11/26/99	January 27, 1996			3


FS(5)		    UNIX Programmer's Manual		    FS(5)


     addresses are singly, doubly and triply indirect and point
     to blocks containing 256 further block pointers.  There are
     3 block addresses reserved as a pad to bring the total size
     of an inode to 64 bytes.  All block addresses are of type
     daddr_t (see types(5)).

     For block b in a file to exist, it is not necessary that all
     blocks less than b exist.	A zero block number indicates
     that the corresponding block has never been allocated.  Such
     a missing block reads as if it contained all zero bytes.

     Free block list: The free data block list for each volume is
     maintained as follows.  Fs_free[1], ... ,
     fs_free[fs_nfree-1], contain up to NICFREE free block
     numbers (NICFREE is a configuration constant defined in
     <sys/param.h>).  Fs_free[0] is the block address of the head
     of a chain of blocks constituting the free list.  The layout
     of each block of the free chain as defined in <sys/fs.h> is:

     struct fblk
     {
	    short   df_nfree;	      /* number of addresses in df_free */
	    daddr_t df_free[NICFREE]; /* free block list */
     };

     The fields df_nfree and df_free in a free block are used
     exactly like fs_nfree and fs_free in the super block.

     The algorithm used to allocate a block is:  decrement
     fs_nfree, and the new block number is fs_free[fs_nfree]. If
     the new block address is 0, there are no blocks left, so
     give an error.  If fs_nfree became 0, read the new block
     into fs_nfree and fs_free.

     To free a block: check if fs_nfree is NICFREE; if so, copy
     fs_nfree and the fs_free array into the newly freed block,
     write it out, and set fs_nfree to 0.  In any event set
     fs_free[fs_nfree] to the freed block's address and increment
     fs_nfree.

     Fs_isize and fs_fsize are used by the system to check for
     bad block addresses; if an `impossible' block address is
     allocated from or returned to the free list, a diagnostic is
     written on the console.  Moreover, the free array is
     cleared, to prevent further allocation from a presumably
     corrupted free list.

     Fs_step and fs_cyl determine the block interleaving of files
     for fastest access; traditionally these were referred to as
     s_m and s_n respectively.	Fs_step is the distance between
     successive blocks and fs_cyl is the number of blocks before
     the pattern repeats.  A file system's interleaving factors


Printed 11/26/99	January 27, 1996			4


FS(5)		    UNIX Programmer's Manual		    FS(5)


     are determined when it is first created by mkfs(8).  Mkfs
     lays out the initial free list with these parameters and
     fsck(8) can be used to rebuild the free list optimally (and
     assign new interleaving factors if necessary).

     Free inode list: Fs_ninode is the number of free inode
     numbers in the fs_inode array.

     To allocate an inode: if fs_ninode is greater than 0, decre-
     ment it and return fs_inode[fs_ninode]. If it was 0, read
     through the inode area and place the numbers of all free
     inodes (up to NICINOD) into the fs_inode array, then try
     again.  If a search for free inodes is necessary, the search
     will start at the beginning of the inode area if fs_nbehind
     >= 4 x NICINOD, otherwise starting at fs_lasti and continu-
     ing at the beginning of the inode area if NICINOD free
     inodes aren't found when the end of the inode area is
     reached.  When a search completes the i-number of the first
     inode of the last block scanned in the search is left in
     fs_lasti.

     To free an inode, provided fs_ninode is less than NICINODE,
     place its number into fs_inode[fs_ninode] and increment
     fs_ninode. If fs_ninode is already NICINODE, don't bother to
     enter the freed inode into any table (fs_inode is only to
     speed up the allocation process; the information as to
     whether the inode is really free or not is maintained in the
     inode itself).  If the i-number of the freed inode is less
     than fs_lasti increment fs_nbehind.

SEE ALSO
     stat(2), dir(5), types(5), dcheck(8), fsck(8), icheck(8),
     mkfs(8), mount(8)

BUGS
     It isn't the 4BSD fast file system.  The 2BSD file system is
     a direct descendent of the V7 file system and exists little
     changed from that ancestor.  There are many performance
     holes in the file system.

     Some changes from the original V7 file system have resulted
     in better performance: The larger block size (1Kb as opposed
     to the 512 byte block size of V7) cuts the average number of
     system calls necessary to access a file by a factor of two;
     the smaller (in core) inodes allowed by the smaller number
     of direct links kept in inodes saves valuable kernel data
     space allowing the kernel buffer cache to be made larger
     while sacrificing only 1Kb of direct file accessing; and
     starting free inode searches at the position the last search
     ended cuts the time to gather free inodes significantly.


Printed 11/26/99	January 27, 1996			5


FS(5)		    UNIX Programmer's Manual		    FS(5)


     However, the separation of inodes and data blocks into com-
     pletely different areas of the disk, the handling of the
     free list, the lack of any file allocation layout policy
     encouraging locality such as that found in the 4BSD file
     system and the still too small block size often leads to
     extremely poor performance.

     The separation of inodes and data blocks in the file system
     means that to access a file a seek will have to be made to
     the beginning of the disk partition containing the file sys-
     tem followed another to the the actual data blocks of the
     file (often quite distant from the inode area).

     The free list which is laid out at file system creation for
     optimal file block allocation, becomes scrambled over time
     on an active file system.	This process is slowed down by
     the kernel which always frees blocks from unlink'ed or trun-
     cated files in reverse order thereby maintaining strings of
     optimally laid out free blocks in the free list.  Eventu-
     ally, however, since both freed and allocated blocks use the
     head of the free list, it's possible (and quite probable) to
     have most of the free list laid out optimally with the first
     portion totally scrambled.  As a trade off, a file system's
     free list may be rebuilt fairly frequently via icheck -s or
     fsck -s and most blocks allocated will be localized as close
     to the the inode area as possible.  Because of this problem,
     files are sometimes scattered across a file system generat-
     ing an unpleasant amount of disk arm movement.  A nasty
     oscillation also occurs in the free block list when fs_nfree
     hovers around NICFREE and 0 causing the free array to be
     constantly written out and read back in as blocks are freed
     and allocated.

     For a more in depth analysis of the 2BSD file system, its
     shortcomings, and a description of the changes made for the
     4BSD file system see "A Fast File System for UNIX" by M.
     McKusick; W. Joy; S. Leffler; and R. Fabry.


Printed 11/26/99	January 27, 1996			6


 
Generated: 2016-12-26
Generated by man2html V0.25
page hit count: 1650
Valid CSS Valid XHTML 1.0 Strict