The proposed work, spanning collaborative and learning applications; distributed operating, storage, and data management systems; path/flow processing; component-based middleware; scalable computing platforms; methodologies for hardware/software co-design, correctness/trust, and UIs; and MEMS-based information devices, touches on virtually all of the emerging themes in computing systems research. We are aware of no other effort that proposes such a comprehensive approach to constructing the pervasive information utility for the next century. In this section, we touch only on the research and industrial efforts most relevant to the proposed work.
Several commercial and research efforts have developed OSs for small computers, including the Apple Newton OS, Windows CE, Palm OS, Geoworks GEOS, Lucent Inferno, VxWorks, QNX, Chorus, JavaOS, JINI, Microsoft Smartcard OS, and Scout [Mosberger96]. With the possible exception of the last three, each targets devices roughly comparable to a late-80s PC, with similar single-user simplifications, but with a limited display and keyboard, no disk, modem-like communication, and a small, fixed collection of real-time devices. Data exchange with the infrastructure is infrequent and consists primarily of bulk synchronization with a specific file structure. The design focuses on thread management, storage management, window management, and management of ports and the PPP stack. A microkernel approach is frequently adopted as a strategy to allow system components to be assembled for each new consumer device and to provide some protection among the components for reliability.
However, there has been little innovation in OS design itself, in spite of the radical changes in technology and application (Scout is an exception, but it applies its version of the path concept not in the network infrastructure, but to managing fast processing paths inside a single-node OS). Strong forces preserve much of the current interfaces to reduce the application porting effort. Windows CE, for example, provides "140 compatible Windows APIs" to minimize the effort of porting applications to smaller systems.
There has also been almost no substantial investigation in the research community into system design principles for emerging tiny devices. Most experimental systems simply boot a scaled-down Linux. We do, however, see two important changes in orientation in PalmOS and JINI. PalmOS takes the view that applications are inherently split between the device and the infrastructure, with a conduit between them and a storage system fundamentally oriented around time-stamped records. JINI centers on spontaneous networking, with mobile code used to provide the interface to previously unknown devices, and utilizes a shared tuple space as the storage model and framework for communication.
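The tuple-space model that JINI adopts can be illustrated with a minimal sketch. The `TupleSpace` class and the sensor-reading tuples below are our own illustrative inventions, not JINI or JavaSpaces APIs; the essential operations are `write`, which deposits a tuple, and `take`, which removes the first tuple matching a template (with `None` as a wildcard), blocking until one appears:

```python
import threading

class TupleSpace:
    """A minimal shared tuple space in the JavaSpaces spirit:
    producers write() tuples; consumers take() them by template."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        """Deposit a tuple and wake any blocked consumers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _match(self, template, tup):
        """A template matches a tuple of equal length, field by field,
        with None acting as a wildcard."""
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def take(self, template):
        """Remove and return the first matching tuple, blocking until
        one is available."""
        with self._cond:
            while True:
                for tup in self._tuples:
                    if self._match(template, tup):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()

# A device deposits a reading; any interested party can take it later,
# without the two ever rendezvousing directly.
space = TupleSpace()
space.write(("temperature", "room-410", 21.5))
reading = space.take(("temperature", None, None))
print(reading)  # -> ('temperature', 'room-410', 21.5)
```

The decoupling in space and time visible here (writer and taker never meet) is what makes the model attractive for spontaneous networks of transient devices.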
A notable wide-area, distributed file system is the Andrew File System (AFS) [Spasojevic96]. AFS supports consistent file naming semantics across a global scale, allows client-side caching of data, and provides security through access control backed by Kerberos [Steiner88] authentication. AFS is constructed on the traditional client-server model; thus, although it supports replication of servers for availability, it is vulnerable to network partitions and server outages. The Berkeley xFS file system [Anderson95] supports "serverless" operation: all clients cooperate to provide file service. Although xFS is a step toward nomadic data, it is restricted to machines that communicate over a fast, local-area network and trust each other's kernels to enforce security. Mariposa [Stonebraker94] is a wide-area, distributed database that uses a computational economy to trade storage and compute resources among a set of distributed servers.
Our work builds on the CONTROL project at Berkeley, which developed interactive systems for approximate data analysis, mining, and visualization [Hellerstein98]. Existing techniques for data reduction are well known [Barbara97]. [Gibbons98] describes techniques for approximate query answers over reduced data. Earlier work on approximate query answering focused on carefully designed semantic hierarchies of data [Read92, Vrbsky93]; these are inappropriate for the less structured environment of streaming sensor data. Systems for managing sequence data have also been studied [Seshadri96]. The specific instance of temporal sequences, with a query focus, has been extensively researched [e.g., Jensen98]. Data mining over time-series and other sequences is an active area (e.g., [Agrawal95, Das98, Guralnik98, Yi98]). Yet none of these efforts addresses the complete problem of sensor-centric data management constructed on a fluid software paradigm: handling infinite sequences of data from sensors in real time, accommodating noisy data and generating approximate answers in response, and generating and placing data reduction agents in the infrastructure.
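The flavor of online, approximate answering that CONTROL pioneered can be sketched in a few lines: maintain a running aggregate over an unbounded stream and report an estimate with a shrinking confidence interval at every step. The synthetic Gaussian "sensor readings" below are illustrative, and the interval uses a simple normal approximation rather than any specific CONTROL estimator:

```python
import math
import random

def running_estimate(stream, z=1.96):
    """Online aggregation sketch: maintain a running mean and variance
    (Welford's algorithm) over a stream, yielding the current estimate
    and a ~95% confidence half-width after each sample."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            half_width = z * math.sqrt(m2 / (n - 1)) / math.sqrt(n)
            yield n, mean, half_width

# Synthetic sensor stream: 10,000 noisy temperature readings around 20.0.
random.seed(0)
stream = (random.gauss(20.0, 2.0) for _ in range(10_000))
for n, est, half_width in running_estimate(stream):
    pass  # a real client could stop early once half_width is small enough
print(f"after {n} samples: {est:.2f} +/- {half_width:.2f}")
```

The point of the sketch is the interaction model: an answer is available at all times, and the user (or a data reduction agent placed in the infrastructure) decides when it is good enough.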
The component/object metaphor has received considerable attention. Microsoft's COM provides a binary standard and a query method for a component to discover whether another provides an interface it can use. CORBA [OMG93] externalizes this matching process into a broker, allowing client invocation of server objects across heterogeneous systems. JINI [JINI98] specifies how components or devices reveal their interfaces to others, to form spontaneous networks. Many registry and lookup services exist to facilitate the rendezvous. However, existing systems provide no notion of iterative negotiation, nor even a specification of terms and conditions (beyond the temporal extent for which a service is to be provided). Nor do they specify how to monitor the performance of a service, or what to do when that service is not performing as expected. We demand not just polymorphic interfaces and matching, but the negotiation of component roles in a larger system.
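The kind of iterative negotiation missing from COM, CORBA, and JINI might proceed as in the following sketch. The single `rate` term and the `capacity` service are hypothetical stand-ins for a richer terms-and-conditions vocabulary; the point is the proposal/counter-offer loop rather than any particular term:

```python
def negotiate(want_rate, service, max_rounds=8):
    """Hypothetical iterative negotiation: the client proposes a
    sampling rate; the service counters with the best rate it can
    sustain. Rounds continue until proposal and counter-offer meet,
    or the round limit is exhausted."""
    proposal = want_rate
    for _ in range(max_rounds):
        counter = service(proposal)
        if counter >= proposal:
            return proposal          # terms accepted as offered
        proposal = counter           # concede to the service's best offer
    return None                      # no agreement within the round limit

# A toy service that can sustain at most 100 samples/sec.
capacity = lambda rate: min(rate, 100)

print(negotiate(250, capacity))  # -> 100
```

In a full design, the accepted terms would also carry monitoring obligations, so that either party can detect and react when the service stops performing as agreed.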
Similarly, there are higher-level languages for general communication of service metadata. For example, the Resource Description Framework (RDF) [RDF98] is designed to provide an infrastructure to support metadata across many web-based activities. It may be a useful substrate in which to cast the languages we will need to develop for negotiation.
Even for modern embedded systems, hardware and software design has been performed separately. Hardware is developed for standard instruction sets, with coprocessors for special-purpose extensions. Software is developed in assembly language and C. Real-time OSs are developed internally, or a standard commercial offering (e.g., Wind River or ISI) is adapted for the task. These technologies are "scaled down" von Neumann uniprocessor hardware and software architectures adapted for embedded applications. They leave much to be desired in terms of performance, power, cost, reliability, and adaptability.
Recent attempts have been made to develop tools for hardware/software "codesign," which optimize an initial abstract system specification and map it to either hardware or software. Generally these efforts lack a target application domain and have shown virtually no progress. Many are restricted to uniprocessor applications and do not model real-time and distributed environments. When a specific application domain and environment have been targeted from the outset (e.g., distributed automotive control [Cuatto98]), initial results are very promising. By restricting the application domain to refine the development of the implementation, and by considering the distributed environment together with the applications themselves, we believe a major breakthrough can be achieved in optimizing the hardware (processing and sensing)/software tradeoff.
Proving complete program correctness is notoriously difficult. Proving realistic programs correct is currently either impossible or, at best, extraordinarily expensive. A recent breakthrough has been achieved by George Necula, who has just joined our Department as an Assistant Professor. Necula developed the first practical method for "proof-carrying code" [Necula98a]: software that carries an associated, and in some cases automatically generated, proof of its safety. Necula developed his method with a practical subset of the C language while focusing on safety rather than general correctness. Such methods are essential for realizing our vision of fluid software, and we will certainly build on Necula's technology, including his certifying compiler [Necula98b], in the proposed work.
Model-based user interfaces have been explored in several systems [Sukaviriya93, Szekely93]. We intend to combine these approaches with the more informal methods that interface designers actually use in practice [Landay95].
Several researchers have discussed and evaluated the positive impacts of problem-based learning (PBL) on educational outcomes [Hadgraft98, Maskell98]. Others have attempted PBL using distance learning [Mackenzie97] and web technologies [Lautenbacher97], but they did not implement tools that allowed them to scale up class sizes, nor have they integrated interesting information appliances into classroom environments. Several groups, at Berkeley and elsewhere, have worked on smart classrooms that contain large electronic whiteboards [Abowd96], note-taking tools [Davis99], and video capture [Smith98]. Finally, Microsoft has had considerable success integrating Bayesian learning networks into its help software [Horvitz98]. None of these systems has been constructed on top of anything resembling the Information Utility we have proposed here.
Randy H. Katz, 17 July 1999, randy@cs.Berkeley.edu