.SH
Shared Memory
.PP
On a fast display and processor, X may perform more than one thousand operations (X requests) per second.
If every access to the device requires a system call, the overhead rapidly dominates all other costs.
X shares a memory structure with the device driver for two purposes: 1) to get mouse and keyboard input and 2) to access the device or write into a memory bitmap.
.PP
As pointed out before, X is a single threaded server.
Since client programs should be able to overlap with the window system as much as possible (remember that you may be running applications on other machines), it is particularly important to send input events to the correct client as soon as possible.
It is therefore desirable to test for input after each graphic output operation.
Given shared memory, this test takes only a couple of instructions; without it, the server would need either one system call per output operation (to check for new input) or a compromise in how quickly input is handled.
.PP
All input events are put into a circular buffer in shared memory; since the driver only inserts into the buffer, and X only removes from the buffer, synchronization is easy to provide with separate head and tail indices (presuming a write to shared memory is atomic).
.PP
Output on the QVSS is directly to a mapped bitmap.
In the case of the Vs100, a piece of the UNIBUS\(dg and a shared DMA buffer are statically mapped where both the driver and the X server can access them.
.FS
\(dg UNIBUS is a trademark of Digital Equipment Corporation.
.sp
.FE
Output requests to the Vs100 are formatted directly into this buffer, minimizing copying of data.\(dd
.FS
\(dd Our thanks go to Phil Karlton, of Digital's Western Research Lab, for the first implementation of this mechanism.
.FE
This permits the device dependent routines to start I/O transfers without system call overhead (by directly accessing device CSRs), and avoids the UNIBUS map setup overhead that DMA from user space requires.
.PP
These changes dramatically increased performance and improved interactive feel when implemented, while greatly reducing CPU overhead.
Since proper memory sharing primitives are lacking in 4.2BSD, the sharing was implemented by making pages readable and writable in system space, where they are accessible to any process.
In theory, any program on the machine could cause a Vs100 implementation to machine check (odd byte access in the UNIBUS space), though in practice it has never happened.
Nonetheless, it is the ugliest piece of the current X implementation.
We are more willing to let a server process access hardware directly than to put that access in kernel code, as it is much easier to debug user processes than kernel code.
.PP
The current X implementation uses a TCP stream both locally and remotely, though one could easily use
.UX
domain sockets for the local case at the cost of a file descriptor.
For current applications, the bandwidth limitation (approximately 1 million bits/second on a 780 class processor) is not major, though faster devices (and image processing applications) would probably benefit from implementation of a shared memory path between the X server and client applications.
.PP
Current shared memory implementations in variants of
.UX
are not sufficient.
Memory sharing primitives should allow appropriately privileged programs to both share memory with other processes and map to both kernel space and I/O space.
Shared libraries (available in some versions of
.UX )
would also increase the options available to window system designers (see below).
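.PP
The input queue described earlier in this section, a circular buffer with separate head and tail indices, one written only by the driver and the other only by the server, can be illustrated with a minimal sketch.
This is not the code of the X server or its drivers: the names (event_queue, queue_put, queue_get, queue_has_input), the event layout, and the capacity are all hypothetical, and C11 atomics are used here as a stand-in for the presumption that a write to shared memory is atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_SIZE 64u   /* hypothetical capacity, not taken from the text */
static_assert((QUEUE_SIZE & (QUEUE_SIZE - 1u)) == 0,
              "QUEUE_SIZE must be a power of two");

/* Hypothetical event record; the real driver layout is not given. */
struct input_event {
    uint16_t x, y;       /* pointer position */
    uint16_t key;        /* key or button code */
    uint16_t flags;      /* press or release, source device */
};

/* Region assumed to be mapped by both the driver and the server. */
struct event_queue {
    atomic_uint head;    /* next slot to read; written only by the server */
    atomic_uint tail;    /* next slot to fill; written only by the driver */
    struct input_event events[QUEUE_SIZE];
};

void queue_init(struct event_queue *q)
{
    atomic_init(&q->head, 0u);
    atomic_init(&q->tail, 0u);
}

/* The cheap test the server can make after each output operation:
 * two loads and a compare, with no system call. */
int queue_has_input(struct event_queue *q)
{
    return atomic_load_explicit(&q->head, memory_order_relaxed)
        != atomic_load_explicit(&q->tail, memory_order_acquire);
}

/* Driver side: insert one event. Returns 0 if the buffer is full. */
int queue_put(struct event_queue *q, const struct input_event *ev)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned next = (tail + 1u) & (QUEUE_SIZE - 1u);

    if (next == atomic_load_explicit(&q->head, memory_order_acquire))
        return 0;                    /* full: one slot is always kept empty */
    q->events[tail] = *ev;           /* fill the slot first */
    atomic_store_explicit(&q->tail, next, memory_order_release); /* publish */
    return 1;
}

/* Server side: remove one event. Returns 0 if the buffer is empty. */
int queue_get(struct event_queue *q, struct input_event *ev)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);

    if (head == atomic_load_explicit(&q->tail, memory_order_acquire))
        return 0;                    /* empty */
    *ev = q->events[head];           /* copy the event out first */
    atomic_store_explicit(&q->head, (head + 1u) & (QUEUE_SIZE - 1u),
                          memory_order_release);  /* then free the slot */
    return 1;
}
```

.PP
Because each index has exactly one writer, no lock is needed: the producer publishes an event with a single store to tail, and the consumer releases a slot with a single store to head.
Under this scheme a server loop would call queue_has_input after each output request and drain the queue with queue_get only when the test succeeds.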