The following paper was originally published in the
Proceedings of the USENIX Microkernels and Other Kernel Architectures Symposium
San Diego, California, September 20-23, 1993

For more information about USENIX Association contact:

1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: https://www.usenix.org

From V to Vanguard: The Evolution of a Distributed, Object-Oriented Microkernel Interface

Ross S. Finlayson*, Mark D. Hennecke, Steven L. Goldberg
Apple Computer, Inc.

Abstract

The Vanguard operating system kernel was designed and implemented as a research testbed for distributed applications and higher-level operating system services. Using the design of the V-System as a starting point, we developed an extensible set of operating system services, organized in an object type hierarchy. We also built a modular OS (micro)kernel that implements these services. An important part of any microkernel design is its exported interface, as the design of this interface affects the ease with which programmers can develop higher-level operating system services on top of the kernel. In this paper we describe several notable features of the Vanguard microkernel interface: in particular, its process and object model, its object identification scheme, and its use of group communication. We show how these features lead to a simple yet powerful interface that avoids the need to provide an excessive number of operations.

1. Introduction

An important part of any microkernel design is the interface that the microkernel exports to higher-level OS services.
In fact, this interface can be said to define the microkernel itself. A microkernel interface must be sufficiently rich to easily support all possible higher-level services that could be built on top of it, including value-added OS services (such as file systems), application program interfaces and runtime libraries, and emulation of other OS interfaces. At the same time, the interface should not hinder programmers by being too complex or inconsistent. Fortunately, compared to application programming interfaces, microkernel interfaces are usually less constrained by backwards compatibility concerns, and this gives microkernel architects a valuable opportunity to produce well-designed interfaces.

Modern operating system interfaces are frequently based upon the object model [7], which is well suited for kernelized OSs. Because the implementation of a kernel service is hidden behind its interface, clients of the service are oblivious to whether the service's implementation exists inside or outside the kernel (or even on a remote node). Inheritance and polymorphism further simplify the interface (by reducing the total number of operations), and allow higher-level OS software to seamlessly extend the interface, if desired.

In this paper we describe several key features of the (object-oriented) interface to the Vanguard operating system (micro)kernel. Vanguard [5] was designed as a research testbed for distributed applications and higher-level operating system services, and has been implemented both on raw hardware (Motorola 88100-based coprocessor boards) and hosted on top of other operating systems (Macintosh OS and Unix). Vanguard's OS interface is intended to be used by one or more levels of system software above the OS kernel, although not necessarily by application code, which may wish to use existing APIs.
Vanguard's design was heavily influenced by that of the V-System [3]; this paper concentrates on those features of Vanguard that either did not exist in the V-System, or are extensions of similar features in V.

2. Processes, Invocation, and Objects

Like the V-System, Vanguard is defined by a suite of request and response messages. The units of control structuring and concurrency are lightweight threads (called processes), which communicate using a synchronous "Send-Receive-Reply" IPC model.

In V the object of a "Send" operation was a server process id. To identify a particular entity or service (e.g., a window or an open file) that is implemented by this process, request messages would typically contain a separate local id, interpreted by the server, that identifies this entity. This style of request message, denoting a distinct local object managed by the server process, was so common in V that in Vanguard we chose to make it a fundamental part of the IPC model.

In Vanguard the object of each "Send" invocation is a (128-bit) object id. Although client processes treat object ids as opaque, they really consist of two 64-bit portions: a server process id, and a server-relative local id. The kernel uses the process id to route each message, delivering it to the designated process if it is local, otherwise delivering it to a separate transport server (similar to Mach's "netserver" [1]). The transport server then delivers the message to the appropriate remote kernel, using a transport protocol suited to the interconnect. (On a local area network we use the same VMTP protocol [2] used by the V-System.) The local id portion of the id is not used for message routing; instead, the allocation, interpretation and use of the local id are the responsibility of the server processes.

An object id is valid as long as the server process designated by its process id remains running. A rebooting kernel does not attempt to resurrect servers with their old ids.
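As a rough illustration of the object id layout and routing rule described above, the following Python sketch packs a 64-bit server process id and a 64-bit local id into one 128-bit value and chooses a delivery path from the process id alone. The names `ObjectId` and `route` and the set of local process ids are hypothetical, and placing the process id in the high-order half is an assumption; the actual Vanguard encoding is not specified here.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass(frozen=True)
class ObjectId:
    """Hypothetical 128-bit object id: (server process id, local id)."""
    process_id: int  # 64-bit id of the server process; used for routing
    local_id: int    # 64-bit server-relative id; interpreted only by the server

    def pack(self) -> int:
        # Assumed layout: process id in the high 64 bits, local id in the low 64.
        return ((self.process_id & MASK64) << 64) | (self.local_id & MASK64)

    @staticmethod
    def unpack(value: int) -> "ObjectId":
        return ObjectId((value >> 64) & MASK64, value & MASK64)


def route(oid: ObjectId, local_processes: set) -> str:
    """Choose a delivery path using only the process id portion."""
    if oid.process_id in local_processes:
        return "local"          # deliver directly to the server process
    return "transport-server"   # hand off for delivery to a remote kernel


oid = ObjectId(process_id=0x2A, local_id=7)
print(route(oid, {0x2A}))
print(route(oid, {0x99}))
```

Note that `route` never inspects `local_id`, which mirrors the division of labour in the text: the kernel routes on the process id, and the server alone interprets the local id.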
This policy of tying an object id's lifetime to that of its server process is appropriate for a microkernel object system, as opposed to a more application-oriented object system such as CORBA [8], where object references can persist across server shutdowns. In Vanguard, longer-lived references, when desired, can be created using a higher-level naming mechanism (described below).

Object ids identify all operating system services, including those services that are exported by the kernel itself (for example, devices and address spaces). Processes are also objects, managed by a kernel process server. (Thus, each process server is conceptually managed by itself, which terminates the recursion.) As an object, a process can be operated on just like any other object, by sending a message to its manager, the process server. For example, a file f managed by server s can be deleted by sending a "delete" message to process s, invoking the object (i.e., file) whose id is (s, f). In the same way, the process s can be destroyed by sending a "delete" message to the process server p, invoking the object (i.e., process) whose id is (p, s).

3. The Object Type Hierarchy

To reduce the complexity of the Vanguard interface, objects are classified in an abstract type hierarchy. Each type defines a set of operations (with corresponding opcodes, request messages and response messages) that are applicable to objects of this type, and inherited by subtypes. For example, the "delete" operation described earlier is one of the operations defined on the base type, "Vanguard Object". (The V-System's interface, in contrast, had separate "delete" messages for each kind of object.) Subtrees of the type hierarchy include "I/O objects" (files, disks, consoles, networks and other devices) and "character-string-named objects" (described below).

The Vanguard interface, being message-based, is programming language independent: there is no fundamental relationship between Vanguard objects and the lighter-weight objects defined by object-oriented programming languages.
This language independence allows clients and servers to be written in different programming languages, including non-object-oriented languages, as long as they conform to the Vanguard protocols. However, if clients are programmed in an object-oriented language, then it is natural to use programming language objects as message stubs encapsulating Vanguard objects. Similarly, programming language objects can be used in server implementations. At present, these client and server stubs are written by hand, but in principle they could also be generated automatically from specifications written in a common "interface description language", as is done in many other systems. Vanguard currently has client and server bindings for C++, and client bindings for Common Lisp. (Additional details of Vanguard's C++ language binding can be found in [6].)

4. Object Groups

One of the more noteworthy features of the V-System was its notion of a process group. A process id could be used as a group id, representing an arbitrary number of processes. A message sent to a process group would be delivered (1-reliably) to all members of the group. The process group mechanism takes advantage of multicast, if this is supported by the underlying network or interconnect.

A process group can be used in two possible ways: for "resource location" or for "multi-destination delivery". In the first case, the group id provides a level of indirection: a new process can join an already-known group, and can then be reached by sending to the group id. In the second case, the group id represents a group of several processes that can be notified collectively using a single message.

Vanguard extends this concept to that of an object group. An object group id is identical to a regular object id, except that either the "process id" portion or the "local id" portion can denote multiple members.
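The two uses of a group id described above can be sketched with a toy, single-machine model. The names `GroupTable`, `join`, and `send` are hypothetical; the sketch models only membership and fan-out, and makes no attempt to represent multicast, 1-reliable delivery, or the real Vanguard message formats.

```python
from collections import defaultdict


class GroupTable:
    """Toy model of group ids: one id stands for any number of members."""

    def __init__(self):
        # group id -> list of member handlers, kept in join order
        self._members = defaultdict(list)

    def join(self, group_id, handler):
        self._members[group_id].append(handler)

    def send(self, group_id, message):
        # Deliver to every current member and collect what each returns.
        return [handler(message) for handler in self._members[group_id]]


groups = GroupTable()

# Resource location: a new server joins a well-known group and becomes
# reachable through the group id, without clients learning its own id.
groups.join("printers", lambda msg: f"printer-A got {msg}")

# Multi-destination delivery: a single send reaches every member.
groups.join("printers", lambda msg: f"printer-B got {msg}")
replies = groups.send("printers", "status?")
print(replies)
```

The sender uses the group id exactly as it would use the id of a single destination, which is the property that object groups carry over to object ids.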
An object group can therefore be managed by an arbitrary number of server processes, each of which may in turn implement an arbitrary number of local objects. (The "process server" described earlier is an example of an object group, because there are really several individual process servers, one for each kernel in the system.)

When an object group is invoked, the invocation's request message is delivered (1-reliably) to each server process, which in turn delivers the message to each local object (if any). For each delivery, a server process can choose either to return a response, or not to respond (or more precisely, to discard the request). (Multiple responses from the same server are aggregated into a single response message.) The client process can also specify how many responses it desires. The default value is 1, meaning that the client remains blocked until the first response returns, or until the kernel(s) can determine that no responses will return. A value of 0 indicates that the invocation is a "best-effort" datagram send, with no guarantee of delivery.

Object group membership is completely decentralized, and group "join" and "leave" operations are handled efficiently. In particular, a request to join object o to a group is sent directly to o's kernel, and a network multicast occurs only if the group was previously unknown to this kernel.

Object groups provide a convenient way of operating on collections of objects, without having to complicate the interface by introducing an additional set of opcodes and message formats solely for this purpose. For example, each directory in Vanguard's character-string name space (see below) has an associated object group representing the members of the directory. Any operation that can be applied to any individual member of the directory can also be applied to all members of the directory, simply by invoking its object group instead.

5. Decentralized Character-String Naming

Along with a low-level identification mechanism (such as the object id scheme described earlier), it is also useful for an operating system to have an additional, higher-level naming mechanism that allows more permanent, human-understandable names to be assigned to certain system objects. This kind of naming mechanism is typically provided by the file system, i.e., above the kernel level in most microkernel-based designs. However, such a naming mechanism, if efficiently implemented, can also be usefully applied inside the kernel. In particular, the ability to give important kernel services (such as devices) character string names can make the kernel easier to configure, maintain and debug.

Vanguard uses the same naming mechanism as the V-System: decentralized naming [4]. To review: the main idea of decentralized naming is that the system has no (physically or logically) centralized name servers. Instead, each server that wishes to enter an object into the name space must also be prepared to handle "lookup" operations on this object. A directory in the name space may be distributed; a name in such a directory is resolved by multicasting a "lookup" operation to all servers that implement parts of this directory, i.e., by invoking the object group for the directory. Only the server that actually implements this name will return a response; the rest will simply discard the request.

To reduce the number of multicast messages, the V-System allowed clients to maintain a cache of mappings from hierarchical name prefixes to server pids, and would consult this cache before resorting to multicast. Vanguard's implementation of decentralized naming does not currently include client caching, but caching could be added to the client language stub for the "lookup" operation, in effect making this stub a "proxy" [9] for the actual "lookup" request message.
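A minimal sketch of name resolution by multicast, with an optional caching stub, might look as follows. Everything here is an assumption made for illustration: `NameServerStub`, `multicast_lookup`, and `CachingLookupStub` are hypothetical names, servers are modelled as in-process objects, and the cache maps whole names to object ids rather than hierarchical prefixes to server pids as in the V-System.

```python
class NameServerStub:
    """A server that implements part of a directory (hypothetical model)."""

    def __init__(self, pid, names):
        self.pid = pid
        self.names = names  # name -> local id, for the names this server owns

    def lookup(self, name):
        # Respond only for names implemented here; None models "discard".
        if name in self.names:
            return (self.pid, self.names[name])
        return None


def multicast_lookup(directory_group, name):
    """Send "lookup" to every server in the directory's object group."""
    responses = [server.lookup(name) for server in directory_group]
    answers = [r for r in responses if r is not None]
    return answers[0] if answers else None


class CachingLookupStub:
    """Client-side proxy that consults a cache before multicasting."""

    def __init__(self, directory_group):
        self.group = directory_group
        self.cache = {}      # name -> (pid, local id)
        self.multicasts = 0  # number of times the cache was bypassed

    def lookup(self, name):
        if name not in self.cache:
            self.multicasts += 1
            result = multicast_lookup(self.group, name)
            if result is None:
                return None  # unresolved names are not cached
            self.cache[name] = result
        return self.cache[name]


group = [NameServerStub(1, {"console": 10}), NameServerStub(2, {"disk0": 20})]
stub = CachingLookupStub(group)
print(stub.lookup("disk0"), stub.lookup("disk0"), stub.multicasts)
```

A real cache would also need to cope with stale entries, for example when a cached server process has exited; the sketch ignores this.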
Decentralized naming allows us to assign character-string names to several of Vanguard's exported kernel objects with little overhead. In particular, there is no separate "name server" process, either inside or outside the kernel.

6. Request Chaining, and High-level Ids

Quite frequently, the result of a Vanguard object invocation is also a Vanguard object, perhaps even the same object as the original. Thus, Vanguard response messages contain an optional result object id; a client may then use this id as the target of a subsequent invocation.

Unfortunately, the accumulated round-trip delays from a sequence of object invocations may be costly, especially over a network. To alleviate this problem, Vanguard's protocols allow a sequence of object invocations to be chained into a single message, in the common case where each invocation in the sequence is applied to the object resulting from the previous invocation. For example, to delete the object whose character string name is "foo" in some directory, one could combine the two requests "lookup foo" and "delete" into a single message, and use this chained message to invoke the directory object. A response message would be returned only from the last subrequest ("delete"), and not from the intermediate "lookup".

Chaining not only gives a performance benefit, but also simplifies the OS interface by allowing new operations to be created by combining existing operations, rather than inventing (for example) a separate "delete by name" operation.

One could imagine extending this idea even further, by allowing requests to be combined in more complex ways, for example by introducing variables, conditional expressions and loops. Such extensions, however, would considerably complicate the implementation of servers. Therefore we have chosen to support only linear chains; these are simple to implement, and are especially useful.
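The chaining rule can be sketched in a few lines. In this hypothetical model (`invoke_chain`, `Directory`, and `File` are illustrative names, not part of Vanguard's protocol), each subrequest is applied to the object returned by the previous one, and only the result of the last subrequest is returned to the client.

```python
class File:
    def __init__(self, name):
        self.name = name
        self.deleted = False

    def invoke(self, request):
        if request == ("delete",):
            self.deleted = True
            return "ok"
        raise ValueError(f"unsupported request: {request!r}")


class Directory:
    def __init__(self, entries):
        self.entries = entries  # name -> object

    def invoke(self, request):
        if request[0] == "lookup":
            return self.entries[request[1]]  # the result is another object
        raise ValueError(f"unsupported request: {request!r}")


def invoke_chain(target, chain):
    """Apply each subrequest to the result of the previous one.

    Only the final result is returned, mirroring a single response
    from the last subrequest in the chain.
    """
    result = target
    for request in chain:
        result = result.invoke(request)
    return result


foo = File("foo")
directory = Directory({"foo": foo})

# "Delete by name" expressed as one chained message instead of two round trips.
response = invoke_chain(directory, [("lookup", "foo"), ("delete",)])
print(response, foo.deleted)
```

Because the chain is strictly linear, the server-side loop needs no variables or control flow beyond "feed the previous result to the next request", which is the simplicity argument made above.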
An interesting consequence of the chaining mechanism is that a prefix of a chain can be used as an alternative, high-level form of object id. For instance, in the example above, the directory's object id and the request "lookup foo" can be encapsulated together as an opaque, high-level id. This high-level id (like any other object id) can be invoked with a request message; that request message is chained onto the end of the "lookup" request. Thus, the "lookup" operation is evaluated anew on each invocation.

A programming language stub for an object encapsulates an object id: either a simple, low-level object id, or a high-level id. Client code does not know or care which. High-level ids are used quite frequently in Vanguard, for example in the virtual memory system, where a high-level id may (transparently) represent a subrange of an address space, rather than a whole address space.

7. Summary

We have presented a summary of the interface exported by the Vanguard microkernel, showing in particular how a rich operating system interface can be built from a relatively small set of basic operations. We have described three separate techniques that make this possible. First, objects are organized in an inheritance hierarchy. Second, the interface supports object group ids, which appear to clients exactly like regular object ids. This makes it possible to operate on groups of related objects without introducing new operations. Third, chains of dependent operations can be grouped together into single requests. Prefixes of these chains can also be used as alternative, higher-level ids. We believe that these techniques are applicable to many microkernel designs.

8. References

1. Accetta, M. J., R. V. Baron, W. Bolosky, D. B. Golub, R. F. Rashid, A. Tevanian Jr. and M. W. Young. Mach: A New Kernel Foundation for Unix Development. Proceedings of the Summer USENIX Conference. 93-113, 1986.
2. Cheriton, D. R. VMTP: A transport protocol for the next generation of communication systems. SIGCOMM '86 Symposium: Communication Architectures and Protocols. 406-415, 1986.
3. Cheriton, D. R. The V distributed system. Communications of the ACM. 31(3): 314-333, 1988.
4. Cheriton, D. R. and T. P. Mann. Decentralizing a global naming service for improved performance and fault-tolerance. ACM Transactions on Computer Systems. 7(2): 147-183, 1987.
5. Finlayson, R. S., M. D. Hennecke and S. L. Goldberg. Vanguard: A protocol suite and OS kernel for distributed object-oriented environments. IEEE Workshop on Experimental Distributed Systems. 1990.
6. Finlayson, R. S., M. D. Hennecke, S. L. Goldberg, J. L. Coolidge, A. G. Parghi and E. W. Sznyter. Object-Oriented Communication and Structuring in Vanguard. IEEE International Workshop on Object Orientation in Operating Systems. 112-113, 1991.
7. Jones, A. The object model: A conceptual tool for structuring software. Operating systems: An advanced course. Bayer, Graham and Seegmueller ed. Springer Verlag, 1979.
8. Object Management Group, Inc. The Common Object Request Broker: Architecture and Specification. 1992.
9. Shapiro, M. Structure and encapsulation in distributed systems: The proxy principle. Sixth International Conference on Distributed Computer Systems. 1986.

* Author's current affiliation: SunSoft, Inc. (finlayson@eng.sun.com)