------------------------------------------------------------------------ Comp.os.research: Welcome to comp.os.research [l/m 6 Jan 1996] -------------------------------------------------------------------------------- From: bos@serpentine.com (Bryan O'Sullivan) Newsgroups: comp.os.research Subject: Comp.os.research: Welcome to comp.os.research [l/m 6 Jan 1996] Date: 15 Nov 1997 09:00:30 GMT Message-ID: <64jobe$4a1@darkstar.ucsc.edu> Reply-To: os-faq@cse.ucsc.edu Summary: a first introduction to the operating systems research group Archive-name: os-research/welcome Version: $Revision: 1.5 $ Last-Modified: Sat Jan 6 18:16:47 1996 This periodic posting serves as an introduction to newsgroup comp.os.research. A set of companion postings, the Frequently Answered Questions for comp.os.research, summarises some of the past discussions in the group. Comp.os.research is a moderated Usenet newsgroup created to foster the discussion of and exchange of information on topics in the field of operating systems research. The moderator is Darrell Long of the University of California at Santa Cruz. The maintainer of the FAQ is Bryan O'Sullivan . As far as moderation policy is concerned, the following two principles are applied to each article submitted for posting: is it interesting? is it related to research? Job announcements will be posted if they are related to operating systems research; blatantly commercial postings are not acceptable. Calls for papers for most conferences and journals related to OS research will be posted. This is fairly broadly interpreted, since OS research is a large field. Smart alec comments are generally filed unceremoniously in /dev/null. Finally, discussions are cut off when the signal-to-noise ratio falls below a certain (fairly subjective) level. Please consider setting the Followup-To: line whenever discussion drifts away from operating systems research. This posting, like much of Usenet, is maintained on a purely volunteer basis. It is subject to comment and improvement by sending email to . Please note that there is a discaimer and copyright notice at the end of Part 2 of the FAQ. ------------------------------------------------------------------------ Comp.os.research: Frequently answered questions [1/3: l/m 13 Aug 1996] -------------------------------------------------------------------------------- From: bos@serpentine.com (Bryan O'Sullivan) Newsgroups: comp.os.research Subject: Comp.os.research: Frequently answered questions [1/3: l/m 13 Aug 1996] Date: 1 Nov 1997 10:00:12 GMT Message-ID: <63eujc$iq@darkstar.ucsc.edu> Reply-To: os-faq@cse.ucsc.edu Summary: frequent topics of discussion on the operating systems research group Archive-name: os-research/part1 Version: $Revision: 1.19 $ Posting-Frequency: monthly Last-Modified: Tue Aug 13 21:03:39 1996 URL: http://www.serpentine.com/~bos/os-faq/ Answers to frequently asked questions for comp.os.research: part 1 of 3 Copyright (C) 1993--1996 Bryan O'Sullivan TABLE OF CONTENTS 1. Introduction 1.1. How to read this article 1.2. Reader contributions and comments 1.3. Acknowledgments and caveats 2. Recurrent discussions 2.1. Microkernels, macrokernels, and the in-betweenies 2.2. Threads 2.2.1. Distinguishing features 2.2.2. Characterising implementations of multithreading 2.2.3. The history of threads 3. File systems 3.1. Extent-based versus log-structured file systems 4. Mobile and disconnected computing 4.1. Constraints on software 4.2. Communications protocols 4.3. Access to files 4.4. Power management 4.5. 
Other issues 4.6. An introductory mobile computing bibliography 5. Operating systems teaching 5.1. What good undergraduate-level texts are available? 5.2. Graduate-level texts 5.3. Do any texts cover the implementation of specific operating systems? 5.4. What instructional operating systems can I use? 5.5. Where can I find the canonical list of OS papers for grad courses? -------------------------------------------------------------------------------- Subject: [1] Introduction From: Introduction This posting consists of answers to many of the questions most frequently asked and summaries of the topics most frequently covered on comp.os.research, the Usenet operating systems research discussion group. The purpose of this posting is to circulate existing information, and to avoid rehashing old topics of discussion and questions. Please read all parts of this document before posting to this newsgroup. This newsgroup is moderated; the moderator is Darrell Long . A companion posting to the FAQs, `Welcome to comp.os.research', briefly covers the moderation policy and guidelines for posting to comp.os.research. It can be found in either comp.os.research or news.answers, and is posted regularly. Due to its size, the FAQ is split up into three parts; each is posted once a month. The welcome posting is posted fortnightly. The FAQ is also available in hypertext form on the World-Wide Web, at . You may prefer to browse the FAQ on the Web rather than on Usenet, as it contains many useful hyperlinks. Note: chunks of text of the form [92-02-12-21-20.29] indicate the original posting from which a section of this article was inspired, snarfed, or just plain copied wholesale. The FAQ as available on the Web has hyperlinks to the relevant articles. Other chunks in square brackets are comments and reminders to myself. These latter sections of text will be removed as appropriate material is added, but the attributions will remain. -------------------------------------------------------------------------------- Subject: [1.1] How to read this article From: Introduction This article is posted in digest format; using the `G%' command from within the `nn' newsreader should split it up into separate sub-articles which you can browse through. To skip to a particular question numbered n.m, use `/: \[n\.m\]' from most pagers. From within GNU Emacs, you can use `C-s [n.m]'. This article is treated as an outline when edited by GNU Emacs. -------------------------------------------------------------------------------- Subject: [1.2] Reader contributions and comments From: Introduction Your contributions, comments, and corrections are welcomed; mail sent to will be dealt with as quickly as I can manage. Generally, performing a reply or followup to this article from within your newsreader should do the Right Thing. While I am more than happy to include submissions of material for the FAQ if they seem appropriate, it would make my life a lot easier if such text were proof-read in advance, and kept concise. I don't have as much time as I would like to digest 15K text files and summarise them in three paragraphs for inclusion here. If you are interested in contributing material, please see the to-do list at the end of part 3 of the FAQ. -------------------------------------------------------------------------------- Subject: [1.3] Acknowledgments and caveats From: Introduction Although this FAQ has been the result of a co-operative effort, any blame for inaccuracies and errors lies entirely with my edits. 
I would like to thank the following people for their part in contributing to this article: Arindam Banerji, Surendar Chandra, Steve Chapin, Crispin Cowan, Dan Hildebrand, Gordon Irlam, Alan Judge, Darrell Long, Chris Maeda, Peter Magnusson, Craig Partridge, Tom Van Vleck, and Robert Walsh.

--------------------------------------------------------------------------------

Subject: [2] Recurrent discussions
From: Recurrent discussions

A number of topics tend to appear with regularity in comp.os.research. This section goes over some of the most commonly covered ground; the list of topics is by no means exhaustive.

--------------------------------------------------------------------------------

Subject: [2.1] Microkernels, macrokernels, and the in-betweenies
From: Recurrent discussions

A recurrent topic of discussion in this newsgroup has been the comparison between microkernel (for example Mach and QNX) and `macrokernel' (traditional Unix) operating systems. The basic notion of a microkernel is to devolve as much functionality as possible into processes rather than the kernel itself; different systems take different approaches to implementing this. For example, some systems (such as Mach) leave device drivers in the kernel and place higher-level services (such as file systems) outside; others (such as QNX) move device drivers out of the kernel as well.

However, anecdotal evidence [93-03-03-07-56.52] suggests that the distinction between microkernel and monolithic architectures is becoming more blurred as time goes on, as the two advance. For example, most modern monolithic kernels now implement multiple threads of execution and fine-grained parallelism. Architecturally, this approach begins to resemble a microkernel with several kernel-space processes working from shared memory.

As an aside, people often complain that the Mach system can't be a `real' microkernel because it is so large (at least, this is the argument most frequently cited). However, I have been told that automatically-generated code stubs contribute very significantly to the size of the kernel, and that some size reduction would be likely if MIG (the stub generator) produced better code. [Can someone from CMU comment on this?] As mentioned above, keeping device drivers in the kernel also contributes to Mach's size.

Debating microkernels versus monolithic kernels on the basis of kernel size misses the central, architectural point. Just as the point of a RISC processor is not to minimise the instruction count, but to make a different tradeoff between what is implemented in the processor instruction set and what is implemented in other ways, the microkernel question is which services are implemented inside the microkernel and which are implemented outside it. By making appropriate choices here, the goal is to enhance various OS attributes (such as performance, flexibility, and realtime behaviour) in a manner that might not be achievable with a monolithic kernel.

Some history: Ira Goldstein and Paul Dale coined the term `microkernel' around 1989.

--------------------------------------------------------------------------------

Subject: [2.2] Threads
From: Recurrent discussions

The exact meaning of the term `thread' is not generally agreed upon.
One of the more common usages denotes a `lightweight' process (sequential flow of control) which shares an address space and some other resources with others, and for which context switching time is lower than for `heavyweight' (i.e. kernel-supported) processes. Throughout the following material, when we refer to `processes', this can be taken as meaning heavyweight processes. -------------------------------------------------------------------------------- Subject: [2.2.1] Distinguishing features From: Recurrent discussions Some of the features which distinguish different approaches to threading are listed below: - Number of *concurrent* flows of control: generally, threads may potentially make use of multiple processors in order to allow several to execute concurrently. That is, the model usually takes into consideration the possibility that there may be more than one flow of control active at any time. - Scheduling policy: a thread scheduler may be pre-emptive, in which case a thread is put to sleep either when it waits upon some resource or runs for the full duration of its time quantum, or non-pre-emptive, in which case individual threads continue to run until they relinquish the processor themselves (either through waiting on a resource or calling the analogue of a sleep() function). Systems which are non-pre-emptive and may only ever have a single active flow of control (regardless of the number of processors available) are referred to as coroutine systems. Coroutine programming requires quite a different approach from threads-based programming, as many of the synchronisation and resource-sharing problems which occur in threaded environments need never trouble the coroutines programmer. -------------------------------------------------------------------------------- Subject: [2.2.2] Characterising implementations of multithreading From: Recurrent discussions An important distinction may be made between user-level threads and kernel-supported threads. A user-level thread maintains all its state in user space. A consequence of this is that no kernel resources need to be allocated per thread, and switching between threads can be done without changing address space. A disadvantage is that user level threads cannot execute while the kernel is busy, for instance, with paging or I/O. This would require some knowledge and participation on the part of the kernel. It is possible to combine both methods, as is done in SunOS 5.x (aka Solaris 2.x). Here, one or more light weight processes (LWPs) multitask one or more user-level threads, which in turn are implemented using user-space libraries. Some issues which characterise thread implementations, and which determine the uses to which a threads package may be put, include: - How much by way of kernel resources does a thread require? This will typically limit the number of threads that can be started by a process. - Under what circumstances will the entire process hang? For instance, if some thread gets a page fault, may another thread in that process be dispatched? - Does switching threads require a full system call (as on the SPARC), or may context switches between threads be performed entirely at user level? - How are signals handled? Can signals be masked individually per thread? Is there a `broadcast' signal? - How are stacks handled? Will the stacks shrink/grow dynamically on a per thread basis? Several systems today (QNX and Plan 9, for instance) take the stance that threads `fix the symptom, but not the problem'. 
Rather than using threads because the OS context switch time is too slow, a better approach, according to the architects of these systems, is to fix the OS. It is ironic that, now that even PC-hosted desktop OSes provide MMU-protected multitasking, the fashionable programming model has become multiple threads running in a common address space, which makes debugging harder and reliable code more difficult to write. With fast context switching, existing OS services such as explicitly allocated shared memory between a team of cooperating processes can create a `threaded' environment, without opening the Pandora's box of problems that a fully shared memory space entails.

--------------------------------------------------------------------------------

Subject: [2.2.3] The history of threads
From: Recurrent discussions

[93-04-21-13-32.11] [92-01-27-17-05.54]

The notion of a thread, as a sequential flow of control, dates back at least to 1965, with the Berkeley Timesharing System. Only they weren't called threads at that time, but processes [Dijkstra, 65]. Processes interacted through shared variables, semaphores, and similar means. Max Smith did a prototype threads implementation on Multics around 1970; it used multiple stacks in a single heavyweight process to support background compilations.

Perhaps the most important progenitor of threads is the programming language PL/I, from about the 1965 time frame. The language as defined by IBM provided a `CALL XXX (A, B) TASK;' construct, which forked a thread for XXX. It is not clear whether any IBM compiler ever implemented this feature, but it was examined closely while Multics was being designed; it was decided that the TASK call as defined didn't map onto processes, since there was no protection between the threads of control. So Multics took a different direction, and the TASK feature was removed from PL/I by IBM in any case, along with the ABNORMAL attribute and lots of other weird stuff.

Then came Unix, in the early 1970s. The Unix notion of a `process' became a sequential thread of control *plus* a virtual address space (incidentally, the Unix notion of a process derived directly from the Multics process design [Saltzer, 66]). So `processes', in the Unix sense, are quite heavyweight machines. Since they cannot share memory (each has its own address space), they interact through pipes, signals, etc. Shared memory (also a rather ponderous mechanism) was added much later.

After some time, Unix users started to miss the old processes that could share memory. This led to the `invention' of threads: old-style processes that shared the address space of a single Unix process. They were also called `lightweight', by way of contrast with `heavyweight' Unix processes. This distinction dates back to the very late 70s or early 80s, i.e. to the first `microkernels' (Thoth (precursor of the V-kernel and QNX), Amoeba, Chorus, the RIG-Accent-Mach family, etc). On a side note, threads have been in continuous use in telecommunications applications for quite a long time.

See also:

[Cheriton, 79] Cheriton, D. R., `Multi-process structuring and the Thoth operating system', Ph.D. Thesis, University of Waterloo, 1979.

[Daley & Dennis, 68] Daley, R. C. & Dennis, J. B., `Virtual memory, processes, and sharing in Multics', Comm. ACM 11, 306-312, May 1968.

[Dennis & van Horn, 66] Dennis, J. B. & van Horn, E. C., `Programming semantics for multiprogrammed computations', MAC-TR-21, 1966.

[Dijkstra, 65] Dijkstra, E. W., `Cooperating sequential processes', in `Programming Languages', Genuys, F. (ed.), Academic Press, 1965.

[Saltzer, 66] Saltzer, J. H., `Traffic control in a multiplexed computer system', MAC-TR-30 (Sc.D. Thesis), July 1966.

--------------------------------------------------------------------------------

Subject: [3] File systems
From: File systems

This field is discussed both here and in the comp.arch.storage newsgroup. This section needs fleshing out at the moment; it will grow over time (hopefully!).

--------------------------------------------------------------------------------

Subject: [3.1] Extent-based versus log-structured file systems
From: File systems

[92-11-18-10-57.53] [92-11-22-10-16.26]

A general definition for a log-structured storage system might be the following: logging provides append-only storage semantics. The unit of logging is the record. Write accesses append records to the end of the log; other write accesses are forbidden. A log record may become obsolete, and useless records are compacted out of the log when possible.

An extent-based file system is another candidate for better filesystem performance. The approach used under QNX, for example, is to have files exist as an array of extents on disk, where each extent is as many contiguous blocks as could be allocated at that location. By using a bitmap for space allocation, files can also grow `in place' if adjacent free space exists on disk. This approach allows the unit of disk space allocation to remain 512 bytes, which is also the smallest unit of physical I/O. The potential performance bottleneck of this approach is avoided because the filesystem code passes I/O requests to the disk drivers in units of extents, not 512-byte blocks. The filesystem also heuristically modifies the size of pre-read requests based on the historical access pattern to the file. This approach provides the performance advantages of larger physical disk block sizes, without the wasted disk space that results from unused portions of large blocks, or the complexity of trying to allocate space from partially unused blocks.

--------------------------------------------------------------------------------

Subject: [4] Mobile and disconnected computing
From: Mobile and disconnected computing

The subject of operating systems for mobile and frequently-disconnected computers has become a recurrent topic in this newsgroup. This section attempts to give an overview of issues in this area. [Text by Arindam Banerji.]

--------------------------------------------------------------------------------

Subject: [4.1] Constraints on software
From: Mobile and disconnected computing

System software for mobile computing is impeded by five distinct constraints:

- Compared to stationary computers, mobile computers will always be resource poor [Satyanarayanan, 93]. Although currently available PDAs (Personal Digital Assistants) compare favourably with the stand-alone workstations of a few years ago [Marsh, 93], they'll most probably lag behind in compute capabilities, available power, storage availability and communication bandwidth for some time to come.

- Mobility entails computation amid fluctuating resource availability and constraints [Banerji, 93]. Communication bandwidth may be available only at discrete intervals, an available resource may suddenly become unreachable, or an otherwise inexpensive communication link may be replaced by an expensive alternative while in transit.
- Security threats, both to mobile computational elements and to the data they access, are greatly increased [Satyanarayanan, 93]. Not only is it easier to lose, damage or be robbed of a carry-along PDA, but it is often easier to tap into the data transferred (as is well known to much of the cellular communication industry). Very little work, except for that undertaken by the cellular communication industry, has been done in the area of addressing the specific security needs of mobile computing (as far as I know).

- User needs and their application requirements may not be the same as those in stationary systems [Weiser, 91]. As mobile computers become ubiquitous (a phrase coined by Mark Weiser), the number of computer users will most probably increase exponentially. Many of these users will be far less computer literate than the average computer user of today. In addition, shopping, information browsing and entertainment may be the typical uses of such mobile units, as opposed to traditional scientific computing, database support or word processing.

- With the presence of PCMCIA slots in a PDA, it also becomes necessary for an OS to be able to mount and dismount entire OS subsystems on the fly [Hildebrand, 94]. Operating systems need to be able to treat networking, filesystems, and other services as facilities which may be loaded and unloaded on demand.

Based upon an amalgam of these criteria, the next few sections discuss some of the main areas of ongoing research in mobile computing.

--------------------------------------------------------------------------------

Subject: [4.2] Communications protocols
From: Mobile and disconnected computing

Mobile-IP [Myles & Perkins, 93] `allows packets between mobile hosts or networks and other hosts (including fixed hosts) to be delivered along close to optimal routes'. Compatibility with existing IP implementations is one of the key problems in Mobile-IP. For example, [Perkins et al., 93] have suggested a scheme based upon the loose source routing option of IP packets, but most existing IP implementations do not implement this option. Scalability is yet another important issue. The Columbia scheme [Ioannidis et al., 91] uses IP-in-IP encapsulation, thus avoiding problems with non-conforming implementations, but it achieves only sub-optimal routing for mobility across widely distributed locations [Aziz, 94]. Some efficient implementations of IP-in-IP encapsulation capable of supporting near-optimal wide-area mobile routing have been suggested [Aziz, 94], but more experimentation is required.

For resource-constrained mobile computers, hosting a full IP protocol suite may be an unacceptable resource burden. Being able to gateway with a lightweight protocol to a network node which is hosting a `heavyweight' protocol suite is a valuable capability [Hildebrand, 94]. Lightweight protocols can also make better use of the limited bandwidth of wireless links. Apart from this, architectures and implementations that handle the impact of mobility at higher layers have also been proposed, such as the connection-oriented services discussed by Katz [Keeton et al., 93] and the mobile socket interface discussed by Casey [Casey, 93]. Current trends would appear to suggest that some form of Mobile-IP will soon become standard, whereas connection maintenance and caching in higher-level protocols still need to be resolved.
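To make the IP-in-IP encapsulation mentioned above concrete, here is a minimal sketch in C of what a home agent's tunnelling step might look like: the original datagram is wrapped in a new outer IP header whose destination is the mobile host's current care-of address, and whose protocol field marks the payload as an encapsulated IP datagram. The structure layout and the encapsulate() helper are simplified illustrations invented for this FAQ, not code from any of the schemes cited above; a real implementation would use network byte order, compute the header checksum, and deal with options, fragmentation and MTU issues.

  /*
   * Sketch of IP-in-IP encapsulation for Mobile-IP style tunnelling.
   * The home agent prepends an outer IPv4 header addressed to the
   * mobile host's care-of address; the node at that address strips
   * the outer header and processes the original datagram as usual.
   */
  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  #define PROTO_IPIP 4                /* IANA protocol number for IP-in-IP */

  struct ip_hdr {                     /* simplified IPv4 header, no options */
      uint8_t  ver_ihl;               /* version (4) and header length (5 words) */
      uint8_t  tos;
      uint16_t total_len;             /* outer header plus encapsulated datagram */
      uint16_t id;
      uint16_t frag_off;
      uint8_t  ttl;
      uint8_t  protocol;              /* PROTO_IPIP in the outer header */
      uint16_t checksum;              /* left zero here; computed before sending */
      uint32_t src;                   /* home agent's address */
      uint32_t dst;                   /* mobile host's care-of address */
  };

  /* Wrap an already-formed datagram in an outer header; returns the new
   * length, or 0 if the caller's buffer is too small.  Fields are shown
   * in host byte order for clarity. */
  size_t encapsulate(uint8_t *buf, size_t buflen,
                     const uint8_t *inner, size_t inner_len,
                     uint32_t home_agent, uint32_t care_of)
  {
      struct ip_hdr outer;

      if (buflen < sizeof outer + inner_len)
          return 0;

      memset(&outer, 0, sizeof outer);
      outer.ver_ihl   = 0x45;         /* IPv4, 20-byte header */
      outer.ttl       = 64;
      outer.protocol  = PROTO_IPIP;
      outer.total_len = (uint16_t)(sizeof outer + inner_len);
      outer.src       = home_agent;
      outer.dst       = care_of;

      memcpy(buf, &outer, sizeof outer);             /* outer header first */
      memcpy(buf + sizeof outer, inner, inner_len);  /* then the original datagram */
      return sizeof outer + inner_len;
  }

The sub-optimal (`triangle') routing mentioned above follows directly from this picture: every packet destined for the mobile host must first travel to the home agent to be encapsulated, however close the sender may be to the mobile host's current location.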
--------------------------------------------------------------------------------

Subject: [4.3] Access to files
From: Mobile and disconnected computing

File access in a mobile computing environment, where the communication link to a file server is not guaranteed, has been a major area of study. Coda [Satyanarayanan, 90], a descendant of the Andrew File System (AFS), pioneered support for disconnected operation in file systems. Coda increases file availability by replicating a single volume at multiple server locations. Disconnected operation occurs when the set of accessible servers for a particular volume becomes null. Coda supports disconnected operation by pre-caching the files a user is most likely to need and then allowing all operations on the cached copies of these files while disconnected. Upon reconnection, reintegration occurs through reconciliation of the cached copy with the now-reachable server's copy, using a replay log maintained during the disconnection. Disconnected operation has also been implemented for AFS [Huston, 93]. The highly available, peer-to-peer based Ficus file system [Page, 91] achieves similar results, although mobile computing was not one of its initial applications.

Caching issues are beginning to dominate the open research topics in this area. Between the connected and disconnected states there are many states of expensive, intermittent and unreliable connectivity, and adapting caching to these varying situations is a necessity. More importantly, as introduced by Coda's hoarding scheme, user control over some caching behaviour is extremely beneficial, and this need for user input becomes even more important when the server connection is weak.

--------------------------------------------------------------------------------

Subject: [4.4] Power management
From: Mobile and disconnected computing

Current battery technology limits PDA use to only a few hours. The conservation of power through system software is thus becoming a major area of research in mobile computing. Two specific approaches to this problem exist.

- Some researchers [Greenawalt, 93] are attempting to analyse the effects of application type, user input and operating system implementations on device power consumption. Based upon simulation data, several power consumption models have been proposed for disks [Greenawalt, 93] [Douglis & Marsh, 93]. Work in characterising and analysing the power consumption problem is still ongoing.

- Several industry-led efforts, on the other hand, have focussed on building system support for dynamic power-saving mechanisms. The Advanced Power Management standard presents an interface and structure for manipulating power consumption. The Nomadic Systems Group at Sun Microsystems has integrated similar support into SVR4 [Bender et al., 93]; these facilities are also available in QNX. Complete analysis of peripheral device power usage and experimentation with different strategies for implementing power management will certainly be useful.

--------------------------------------------------------------------------------

Subject: [4.5] Other issues
From: Mobile and disconnected computing

Two significant aspects of mobile computing give applications in this environment a very different flavour.

- The dynamic nature of the environment forces applications to handle changes in the availability and allocation of software resources.
Support for dynamic changes to environment variables [Schilit, 93], for changes in the available version of a library [Goldstein, 94], and for looking up and retrieving objects from remote locations [Theimer, 93] is required to solve this problem. For the very same reasons, user interfaces take on an extra dimension, an issue which very few have addressed so far [Landay & Kaufmann, 93]. All this has caused certain vendors to move towards interpreted environments based on scripting languages such as Script-X (Kaleida) and the Open Scripting Architecture (Apple).

- Money will be a constituent of many of the transactions and applications that mobile computers will typically be used for. Hence, many pieces of system software will be required to handle, understand and optimise the use of money [Kulkarni, 94]. As mentioned by Ed Frank at the ICDCS '93 panel discussion on mobile computing, transactions involving `money and sex' may well become the biggest uses of the mobile computer. Some initial forays into reviewing policies for pricing Internet services [Shenker, 93] may prove to be very useful, and so will the experience of current consumer service providers such as CompuServe and America Online. This area will perhaps show the biggest divergence in the years to come, since applications will be far more customer-needs driven than technology-driven, as they have been in the past.

Finally, aspects of hardware support are critical to positioning any discussion on mobile computing. The most ambitious system is perhaps the ParcTab system [Schilit, 93] under development at Xerox PARC. The ParcTab is a PDA that communicates via infrared data packets with a network of infrared transceivers. The network, designed for use within a building, designates each room as a communication cell. This infrastructure has the responsibility for providing reliable service as the ParcTab user moves from room to room. More general-purpose but less ambitious PDAs are currently available from AT&T (EO), Apple (Newton), IBM (Simon), etc. Almost all recognise some alternative form of input, such as handwriting. The capabilities of these PDAs are sure to increase in the coming years, and hopefully their prices will not follow a similar trend.

--------------------------------------------------------------------------------

Subject: [4.6] An introductory mobile computing bibliography
From: Mobile and disconnected computing

[Marsh, 93] Marsh, B., Douglis, F. & Caceres, R., `System issues in mobile computing', Technical Report, Matsushita Information Technology Laboratory, ITL-TR-50-93

[Satyanarayanan, 93] Satyanarayanan, M. et al., `Experience with disconnected operation in a mobile computing environment', Proceedings of Mobile and Location-Independent Computing Symposium, August 93, pp. 11-28

[Banerji, 93] Banerji, A., Cohn, D. & Kulkarni, D., `Mobile computing personae', Proceedings of Fourth Workshop on Workstation Operating Systems, October 93, pp. 14-20

[Weiser, 91] Weiser, M., `The computer for the 21st century', Scientific American, September 91, pp. 94-104

[Myles & Perkins, 93] Myles, A. & Perkins, C., Internet Draft, September 93

[Perkins et al., 93] Bhagwat, P. & Perkins, C., `A mobile networking system based on IP', Proceedings of Mobile and Location-Independent Computing Symposium, August 93, pp. 69-82

[Ioannidis et al., 91] Ioannidis, J. et al., `IP-based protocols for mobile internetworking', Proceedings of the SIGCOMM-91 conference: Communications, Architectures and Protocols, September 91, pp. 235-245

[Aziz, 94] Aziz, A., `A scalable and efficient intra-domain tunneling mobile-IP scheme', ACM SIGCOMM Computer Communication Review, Vol. 24, No. 1, January 94, pp. 12-20

[Keeton et al., 93] Keeton, K. et al., `Providing connection-oriented network services to mobile hosts', Proceedings of Mobile and Location-Independent Computing Symposium, August 93, pp. 83-102

[Casey, 94] Casey, M., `Application and communication support for mobile computing', Dissertation Proposal, University of Notre Dame, January 94

[Satyanarayanan, 90] Satyanarayanan, M. et al., `Coda: a highly available file system for a distributed workstation environment', IEEE Transactions on Computers 39(4), April 90

[Huston, 93] Huston, L. & Honeyman, P., `Disconnected operation for AFS', Proceedings of Mobile and Location-Independent Computing Symposium, August 93, pp. 1-10

[Page, 91] Page, T. et al., `Architecture of the Ficus replicated file system', Technical Report CSD-910005, University of California, Los Angeles, March 91

[Greenawalt, 93] Greenawalt, P., `Modelling power management for hard disks', Intl. Workshop on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, January 94

[Douglis & Marsh, 93] Douglis, F. & Marsh, B., `Low power disk management for mobile computers', Technical Report, Matsushita Information Technology Laboratory, MITL-TR-53-93

[Bender et al., 93] Nomadic Systems Group, Sun Microsystems, `UNIX for Nomads: Making UNIX support mobile computing', Proceedings of Mobile and Location-Independent Computing Symposium, August 93, pp. 53-68

[Schilit, 93] Schilit, B., Theimer, M. & Welch, B., `Customizing mobile applications', Proceedings of Mobile and Location-Independent Computing Symposium, August 93, pp. 129-138

[Hildebrand, 94] Hildebrand, D., `QNX: microkernel technology for open systems handheld computing', Proceedings of Pen and Portable Computing Conference Exposition, May 1994. Available via ftp from .

[Goldstein, 94] Goldstein, T. & Sloane, A., `The object binary interface -- C++ objects for evolvable shared class libraries', Proceedings of the USENIX C++ Conference, April 94

[Theimer, 93] Theimer, M., Demers, A. & Welch, B., `Operating system issues for PDAs', Proceedings of Fourth Workshop on Workstation Operating Systems, October 93, pp. 14-20

[Landay & Kaufmann, 93] Landay, J. & Kaufmann, T., `User-interface issues in mobile computing', Proceedings of Fourth Workshop on Workstation Operating Systems, October 93, pp. 40-47

[Kulkarni, 94] Kulkarni, D., Banerji, A. & Cohn, D., `Operating systems and cost management', Operating Systems Review, January 94

[Shenker, 93] Shenker, S., `Service models and pricing policies for an integrated services Internet', Proceedings of the Conference on Public Access to the Internet, JFK School of Government, Harvard University, May 93

[Schilit, 93] Schilit, B. et al., `The ParcTab mobile computing system', Proceedings of Fourth Workshop on Workstation Operating Systems, October 93, pp. 34-39

--------------------------------------------------------------------------------

Subject: [5] Operating systems teaching
From: Operating systems teaching

This section attempts to give some useful pointers to people who teach operating systems, both at undergraduate and postgraduate level.

--------------------------------------------------------------------------------

Subject: [5.1] What good undergraduate-level texts are available?
From: Operating systems teaching The comments below have been provided by a variety of people, so any `me's or `I's you encounter are not necessarily those of the maintainer! - `Operating Systems Concepts', fourth edition, by Abraham Silberschatz and Peter Galvin is the latest version of this popular text. Addison-Wesley, 1994, ISBN 0-201-50480. This book has been revised to include new and updated information, examples, diagrams, and an expanded bibliography. I think this is the `standard' OS text, although I have a couple of others that I also think are good, and that I draw from when I teach OS. Previous editions of the dinosaur book don't have the greatest organisation, and sometimes wander when describing things. Its strong point lies in the copious examples. Speaking of the third edition (I haven't seen a copy of the fourth edition yet): The first 84 pages cover operating system basics, the next 120 pages cover process management including 30 pages on deadlocks. 130 pages on storage management: memory, virtual memory, secondary storage. 70 pages on file systems and protection. Then 100 pages on distributed systems. The last part of the book has case studies on Unix and Mach: 50 pages on Unix and 30 pages on Mach. The last chapter gives a short 10 page historical perspective. Mail a message with contents `send help' to for further details of the new edition. The book gives a good (but slightly theoretical) overview of operating system concepts. A good complement would be the books covering Minix or BSD, which are more implementation-oriented. - `Operating Systems', Harvey Deitel, Addison-Wesley, 1990, ISBN 0-201-18038-3. Not a bad book; gives the same sort of theoretical treatment of operating systems as the dinosaur book. Includes case studies on Unix, MS DOS, MVS, VM, the Macintosh OS, and OS/2. - `An Operating Systems Vade Mecum', second edition, by Raphael Finkel, 1988, Prentice Hall, ISBN 0-13-637950-8. I really like this book; it is a bit more theoretical than the dinosaur book, but is well-written and clear. I would accompany it with labs based on one of the educational experimental OS's (NachOS, OSP) for hands-on experience. The edition mentioned above is now out of print. However, it may be obtained via anonymous ftp from . Here is the associated chunk of README: This textbook is out of print. It was published by Prentice Hall. The author now owns the copyright. Permission is granted to copy this text for any noncommercial purpose. Feel free to generate copies of the text for your students. You may also photocopy the original book without restriction. Kindly send suggested upgrades to the author: . He is planning a new edition sometime. [It's been a few years since I've looked at this book, so I can't remember what it contains. Can anyone help?] - `The Logical Design of Operating Systems', second edition, Lubomir Bic, Alan Shaw, 1988, Prentice Hall, ISBN 0-13-540139-9. This one isn't as theoretical as Finkel's book, nor is it as long as the dinosaur book. I haven't tried to use it in a course yet, but it looks like a fairly well-rounded text. [Can anyone write a paragraph on the various topics covered ... ?] - `Operating Systems', second edition, William Stallings , Prentice-Hall, 1995, ISBN 0-02-415493-8. I received very positive feedback from students about the first edition of this book; I have not yet seen the second edition. The explanations of topics were easy to understand and complete. 
An especially nice feature was that at the end of each chapter OS/2, Unix and MVS were used to demonstrate real life implementations of the theory talked about. I found this tying together of theory and practice much nicer than having the practice lumped at the end of the book. - `Modern Operating Systems,' Andrew Tanenbaum , 1992, Prentice Hall, ISBN 0-13-588187-0. This started out as a rewrite of the Minix book, but he pulled the Minix-specific material and added seven chapters on distributed systems. It's a bit heavy for undergrads, depending on how far into the distributed systems you go, but I like Tanenbaum as an author. He'll be bringing out a second edition of the Minix book sometime soon; as he says, one is for `hands-on' (Minix) and one is for `hands-off' (Modern OS). The book is divided into two parts: `traditional' introductory material, taken more or less verbatim from the Minix book, and an introduction to distributed systems. Each parts concludes with a case study and comparison of two well-known systems (Unix and MS-DOS, and Mach and Amoeba). The bibliography at the end is organised well for more advanced coverage of the topics encountered throughout the book. Topics covered in the first part include process concepts, memory management, file system organisation and I/O, and deadlock detection and avoidance. The second part addresses issues such as distributed communication, synchronisation (the section on clock synchronisation is well put together), processes in distributed environments (nothing on process migration), and distributed file systems (using AFS as an example). The second part seems more suitable for advanced undergraduate level or introductory graduate level studies. This book has been translated into German; it is available from Carl Hanser Verlag as `Moderne Betriebssysteme', ISBN 3-446-17472-9. - `Operating System Design: the Xinu Approach', Douglas Comer, Timothy Fossum, 1984, Prentice Hall, ISBNs 0-13-638180-4 (PC edition) and 0-13-638529-X (Macintosh edition). A walk-through of the principles behind, and implementation of, the Xinu operating system, a small instructional OS similar to Unix. While this text is aging somewhat, it presents its material in a clear fashion, and does a good job of covering the "standard" fundamentals of operating systems. - `Operating Systems: Design and Implementation', Andrew S. Tanenbaum, 1986 (?), Prentice Hall, ISBN 0-13-637406-9. This, along with Comer's Xinu books, is the classic text which `teaches by doing', covering the design and implementation of Minix, a microkernel operating system which has a programming and user interface similar to Unix. As with Comer's books, this text is showing its age somewhat (the source is very much out of date with the current Minix distribution), but it still does a good job of presenting the basics of operating system implementation. - `Operating Systems Programming: The SR Programming Language', Stephen J. Hartley , Oxford University Press, 1995, ISBN 0-19-5095790. SR is a language for concurrent programming; this book presents the language, presents some example programs in the context of operating systems or concurrent programming, and provides exercises in the form of Open Student Laboratories. The book is designed to be used in conjunction with one of the standard operating systems texts to provide concurrent programming experience, or can be used alone as an introductory concurrent programming book. I have not seen a copy of it yet, and so cannot comment on its quality. 
The example programs in the book are intended for running in a Unix environment; they are available via anonymous ftp from , and the SR language itself is available from . -------------------------------------------------------------------------------- Subject: [5.2] Graduate-level texts From: Operating systems teaching This section is still under construction. - `Distributed Systems', second edition, by Sape Mullender, Addison-Wesley, 1994, ISBN 0-201-62427-3. A review is forthcoming. - `Distributed Operating Systems -- the Logical Design', Andrzej Goscinski, Addison-Wesley, 1991, ISBN 0-201-41704-9. A thorough desk reference, but reads a little too much like an encyclopedia for use as a textbook. - `Modern Operating Systems,' Andrew Tanenbaum , 1992, Prentice Hall, ISBN 0-13-588187-0. The section of this book which covers distributed systems is suitable for use at introductory graduate level. See above for further details. - `Concurrent Systems', Jean Bacon, 1992, Addison-Wesley, ISBN 0-201-41677-8. This covers much the same material as `Modern Operating Systems', but goes into rather more detail on databases and languages. The book is divided into four parts, and comes with a separate instructor's manual (ISBN 0-201-62406-0). The first covers basic material, such as OS functions, and system and language support for concurrent processes. Part 2 deals with simple concurrent actions, covering topics such as shared-memory IPC, message passing, persistent data, crashes, and distributed data. The third part of the book covers transactions, concurrency control, and failure recovery. The final section presents a set of case studies, with Unix, Mach and Chorus being covered, along with some of the work done at Cambridge over the past decade. An interesting emphasis is placed on language-level support for concurrency throughout the book, and the focus on database issues is also a good thing. I haven't read the book in as much detail as I would like, but it seems to be well put together. The cramming of so many topics under one cover means that there is probably too much material for a single undergraduate course, and the book perforce does not go into as much detail as I would like on some topics (a problem I also find with Tanenbaum's book). Well worth a look, however. - `Distributed Systems: Concepts and Design', second edition, George Coulouris , Jean Dollimore, and Tim Kindberg, Addison-Wesley 1994, ISBN 0-201-62433-8. This text treats a wide variety of issues at a level suitable for advanced undergraduate and postgraduate teaching. Basic topics covered include IPC, networking and RPC, upon which notions of distributed operation and provision of services are built. Coverage of distributed synchronisation leads on to a treatment of replication, simple transactions and concurrency control. The final chapters include material on distributed transactions, fault tolerance, security, and distributed shared memory. Illustrative examples taken from modern `real world' systems such as Sun RPC, the Andrew File System, and PGP are provided throughout the book, and case studies of the Amoeba, Mach, Chorus, and Clouds systems appear towards the end. Exercises are presented at the end of each chapter. The prose is clear, and the layout pleasant. This is, by a narrow margin, the best distributed systems textbook I have come across. - `Advanced Concepts in Operating Systems -- Distributed, Multiprocessor, and Database Operating Systems', Mukesh Singhal, Niranjan G. 
Shivaratri, McGraw-Hill, 1994, ISBN 0-07-057572-X. A solid work on advanced operating systems, with some emphasis on theoretical aspects. Well over 2/3 of the book focuses on distributed operating systems. It does a good job of covering all the bases, but at times omits vital information or obfuscates what should be simple issues.

--------------------------------------------------------------------------------

Subject: [5.3] Do any texts cover the implementation of specific operating systems?
From: Operating systems teaching

Some books commonly used in conjunction with the texts listed above are:

- `The Design and Implementation of the 4.3BSD Unix Operating System', Samuel Leffler, Kirk McKusick, Michael Karels, John Quarterman, 1989, Addison-Wesley, ISBN 0-201-06196-1. This book describes the kernel of 4.3BSD Unix in some detail, covering process and memory management (the latter being heavily VAX-oriented), file system organisation, device driver internals, terminal handling, IPC, network communications, some details of the Internet protocols, and system operation. I found this book to be well-written and concise. Accompanying the above is the `4.3BSD Answer Book', Samuel Leffler, Kirk McKusick, 1991, Addison-Wesley, ISBN 0-201-54629-9. This short book provides answers to all of the exercises found at the end of each chapter in the daemon book.

- `The Design of the Unix Operating System', Maurice Bach, 1986, Prentice Hall, ISBN 0-13-201757-1. This is the authoritative description of the internals of System V Release 2 Unix. Due to copyright restrictions, it contains no actual code, but rather pseudo-code (I didn't find this to be a problem). Topics covered include file system internals, process control and scheduling, memory management, IPC, and device driver architecture. Coverage of multiprocessor and distributed Unix systems is dated, but this remains a classic operating systems text.

- `The Magic Garden Explained: The Internals of Unix System V Release 4', Berny Goodheart, James Cox, 1994, Prentice Hall, ISBN 0-13-098138-9. This book covers the workings of SVR4 in substantial detail; it is more detailed than either of the above texts, and appears to be of very high quality. While the authors recommend that the book be read in parallel with study of the original Unix source code, enough pseudocode representation of the key algorithms has been included to permit a fairly detailed study without needing access to the original (restricted) source code.

- `Unix Internals: The New Frontiers', Uresh Vahalia, 1995, Prentice Hall, ISBN 0-13-101908-2. This is quite simply a wonderful book. The broad issues it covers include threads and lightweight processes, and how they interact; signal implementations, process group and session management; process scheduling; IPC; kernel synchronisation and multiprocessor architectures; local and distributed filesystems; kernel memory management; and device driver architectures. Each topic is accompanied by details of its implementation under modern Unix variants such as Solaris 2.x, SVR4.2, and 4.4BSD, and its treatment by the Mach kernel. The writing style is concise and pleasant, and the treatment of each topic is satisfyingly thorough and clear. If you are interested in the implementation of Unix or other operating systems, this book is a "must have".

- `Unix Systems for Modern Architectures: Caching and Symmetric Multiprocessing for Kernel Programmers', Curt Schimmel, 1995, Addison-Wesley, ISBN 0-201-63338-8.
Covers in extensive detail the operation of caches and symmetric multiprocessors, how they interact, and the issues operating systems must address in order to make effective use of them. Issues addressed include the management of virtually- and physically-tagged caches on uniprocessors, synchronisation and mutual exclusion techniques for multiprocessors, standard multiprocessor kernel architectures, and multiprocessor cache coherency. This book is written in a clear manner, and illustrated effectively. Each chapter ends with lists of exercises and references. My copy contains a number of typographical errors, but I am told that later printings have addressed this issue. I am not aware of any similar book which covers the implementation of a distributed system. -------------------------------------------------------------------------------- Subject: [5.4] What instructional operating systems can I use? From: Operating systems teaching - Minix, from Amsterdam's Vrije Universiteit, was developed by Andy Tanenbaum , and is a mostly-Unix lookalike based on a message-passing microkernel-similar architecture. The system is used in Tanenbaum's `Modern Operating Systems' and its predecessor, `Operating Systems: Design and Implementation'. See the Minix Information Sheet, posted regularly to comp.os.minix, for ordering information; Minix is copyrighted and is not in the public domain, but is available from . For further information, see Andy's Web page at . - NachOS is an instructional OS developed at Berkeley by Tom Anderson , Wayne Christopher, Stephen Procter (and others?). It currently runs on DEC MIPS and Sun SPARC workstations, HP PA-RISC, and 386BSD machines. The NachOS system, along with sample assignments and an overview paper which was presented at Usenix, is available via anonymous ftp from . - OSP (current version is 1.7) is an operating systems simulator, available via ftp from , with username ospftp, and password as in the instructor's guide. Used in `OSP---an Environment for Operating Systems', Michael Kifer, Scott Smolka, Addison-Wesley. - RCOS (Ron Chernich's Operating System) is a simulated operating system that is intended to demonstrate graphically the concepts behind operating systems. Students can investigate and modify the algorithms it uses, and write programs in a Pascal-like language (extended with semaphores and shared memory) which it will execute. RCOS has a windowing interface, and currently runs under MS-DOS; an alpha-quality Unix/X11 port is also available. For further details, check out the Web page at . - SunOS Minix consists of a set of patches for Minix which allows the Minix system to run in a single monolithic Unix process on top of SunOS 4.x on Sun 3 and Sun 4 machines. SunOS Minix is produced by applying a set of patches to Mac Minix 1.5 (both 1.5.10.0 and 1.5.10.1 can be used) or PC Minix 1.5. Also, Atari Minix has been used as the base version by at least one person. The latest version (2.0) includes a preliminary attempt at a port to Solaris 2.x. SunOS Minix is available via anonymous ftp from . - VSTa is not intended as an instructional operating system, but it is certainly small and concise enough to be tractable, and the code is clean and provides modern microkernel features. See part 2 of the FAQ for further details. - Xinu was developed at Purdue by Doug Comer and some others. It was designed to be small and layered, so that the code is succinct and easily understandable. It is intended for education, and is a `conventional' operating system. 
Xinu runs on the IBM PC, Sun-3, SPARC, LSI, MIPS, Macintosh, and VAX architectures. The system is used in Comer's `Operating System Design: the Xinu Approach'. See for licensing information. -------------------------------------------------------------------------------- Subject: [5.5] Where can I find the canonical list of OS papers for grad courses? From: Operating systems teaching [93-03-14-17-09.47] Darrell Long maintains a bibliography which provides a good starting point for graduate OS course reading lists. This may be imported using refdbms as ucsc.grad.os, from refdbms.cse.ucsc.edu 4117 or refdbms.cs.vu.nl 4117. ------------------------------------------------------------------------ Comp.os.research: Frequently answered questions [2/3: l/m 13 Aug 1996] -------------------------------------------------------------------------------- From: bos@serpentine.com (Bryan O'Sullivan) Newsgroups: comp.os.research Subject: Comp.os.research: Frequently answered questions [2/3: l/m 13 Aug 1996] Date: 1 Nov 1997 10:00:25 GMT Message-ID: <63eujp$iu@darkstar.ucsc.edu> Reply-To: os-faq@cse.ucsc.edu Summary: frequent topics of discussion on the operating systems research group Archive-name: os-research/part2 Version: $Revision: 1.22 $ Posting-Frequency: monthly Last-Modified: Tue Aug 13 21:03:28 1996 URL: http://www.serpentine.com/~bos/os-faq/ Answers to frequently asked questions for comp.os.research: part 2 of 3 Copyright (C) 1993--1996 Bryan O'Sullivan TABLE OF CONTENTS 1. Available software 1.1. Where can I find Unix process checkpointing and restoration packages? 1.2. What threads packages are available for me to use? 1.3. Can I use distributed shared memory on my Unix system? 1.4. Where can I find operating systems distributions? 1.4.1. Distributed systems and microkernels 1.4.2. Unix lookalikes 1.4.3. Others 2. Performance and workload studies 2.1. TCP internetwork traffic characteristics 2.2. File system traces 2.3. Modern Unix file and block sizes 2.3.1. File sizes 2.3.2. Block sizes 2.3.3. Inode ratios 3. Papers, reports, and bibliographies 3.1. From where are papers for distributed systems available? 3.2. Where can I find other papers? 3.3. Where can I find bibliographies? 4. General Internet-accessible resources 4.1. Wide Area Information Service (WAIS) and World-Wide Web (WWW) servers 4.2. Refdbms---a distributed bibliographic database system 4.3. Willow -- the information looker-upper 4.4. Computer science bibliographies and technical reports 4.5. The comp.os.research archive 4.6. Miscellaneous resources 5. Disclaimer and copyright -------------------------------------------------------------------------------- Subject: [1] Available software From: Available software This section covers various software packages, operating systems distributions, and miscellaneous other such items which may be of interest to the operating systems research community. If you have written, or know of, some software which you believe would be of fairly wide interest, please get in touch with the FAQ maintainer with a view to having a short spiel and availability information included here. -------------------------------------------------------------------------------- Subject: [1.1] Where can I find Unix process checkpointing and restoration packages? From: Available software - [93-01-21-10-18.30] The Condor system is available via anonymous ftp from . Condor works entirely at user level [no kernel modifications required] but doesn't currently support interprocess communication, signals, or fork(). 
Definitely worth a look. - Bennet S Yee implemented a `mostly portable' checkpoint and restore package back around 1987. When the programmer invokes the checkpoint procedure, it saves the state to a file; when a second process with the same program (but with different arguments) is started which calls the restore procedure, it reads the old state from the file. Available via anonymous ftp from . This package is known to work for Pmaxen, Sun4's, Sun3's, IBM RTs, and VAXen. Porting it to a new architecture should be relatively simple -- look at the README file. -------------------------------------------------------------------------------- Subject: [1.2] What threads packages are available for me to use? From: Available software Now that POSIX has arrived at a standard threads interface, it is expected that all major Unix vendors will soon release conformant threads packages. Currently, vendor-supplied threads packages vary widely in the interfaces they provide. Some vendors' packages conform to various drafts of the POSIX standard, while others provide their own interfaces. OS/2, Windows NT and Windows 95 all provide threads interfaces. None conforms to the POSIX standard, and neither IBM nor Microsoft has signalled any intention to provide conformant threads interfaces. - Michael T. Peterson has written a POSIX and DCE threads package, called PCthreads, for Intel-based Linux systems. See for more information. - Christopher Provenzano has written a portable implementation of draft 8 of the IEEE Pthreads standard. See for further details, or fetch the software itself from . Currently supported are i386/i486/Pentium processors running NetBSD 1.0, FreeBSD 1.1, Linux 1.0, and BSDi 1.1; DECstations running Ultrix-4.2; SPARCstations running SunOS 4.1.3; and HP/PA machines running HP/UX-9.03. As far as I can see, development of this library has halted (at least temporarily), and it still contains many serious bugs. - Georgia Tech's OS group has a fairly portable user-level threads implementation of the Mach Cthreads package. It is called Cthreads, and can be found at . It also contains the Falcon integrated monitoring system. It currently runs under SunOS 4.1.X, Irix 4.0.5, Irix 5.3, AIX 3.2.5, Linux 1.0 and higher, and KSR1 and KSR2. It is a fairly easy to port to other architectures. Current ports in progress are Solaris 2.4 and AIX 4.X. - The POSIX / Ada-Runtime Project (PART) has made available an implementation of draft 6 of the POSIX 1003.4a Pthreads specification, which runs under SunOS 4.x; the current release is version 1.20. Available using anonymous ftp from . - Elan Feingold has written a threads package called ethreads; I don't know anything about it, other than that it is available from . - Stephen Crane has written a `fairly portable' threads package, which runs under Sun 3, Sun 4, MIPS/RISCos, Linux, and 386BSD. It is available via anonymous ftp from , with documentation in the same directory named lwp.ps.gz. - QuickThreads is a toolkit for building threads packages, written by David Keppel. It is available via anonymous ftp from , with an accompanying tech report at . The code as distributed includes ports for the Alpha, x86, 88000, MIPS, SPARC, VAX, and KSR1. - On CONVEX SPP Exemplar machines there is a Compiler Parallel Support Library (CPSlib), a library of thread management and synchronisation routines. CPSlib is not compatible with anything else, but the interface is sufficiently similar to the Solaris threads or pthreads interface to allow straight porting. 
One special feature of CPSlib is the (possible) distinction between "symmetric" and "asymmetric" parallelism.

A small number of vendors provide DCE threads packages for various Unix systems.
--------------------------------------------------------------------------------
Subject: [1.3] Can I use distributed shared memory on my Unix system?
From: Available software

- CRL is a simple all-software distributed shared memory system intended for use on message-passing multicomputers and distributed systems. CRL 1.0 can be compiled for use on the MIT Alewife Machine, Thinking Machines' CM-5, and networks of Sun workstations running SunOS 4.1.3 communicating with one another using TCP and PVM. Because CRL requires no functionality from the underlying hardware, compiler, or operating system beyond that necessary to send and receive messages, porting CRL to other platforms should prove to be straightforward. General information about CRL can be found at . The CRL 1.0 source distribution (sources for CRL 1.0 and several applications, user documentation, and a postscript version of a paper about CRL that is to appear at SOSP later this year) is available at .
- Ron Minnich has implemented a distributed shared memory system called MNFS, which is a modified version of NFS and runs alongside NFS in the kernel. Performance is good; page faults under FreeBSD 2.0R run at about the same speed as NFS (~5.9 milliseconds per page). If you need to update a page from one host to many clients, it can be done at a cost of 1.2 milliseconds or so per client. This scales: networks of 128 nodes running MNFS have been set up, and times should improve on LANs faster than Ethernet.
  The MNFS programming model uses mmap'ed files. Programs map files in and then use them as ordinary memory. Cache consistency of a page is maintained by the MNFS servers, ensuring that there is only one writeable copy in the network at a time. The model is not strongly coherent; read-only copies of a page are only refreshed by an explicit action on the part of the holder of a writeable page (using msync). For those who don't like this style of programming, a parallel C compiler has been retargeted to use MNFS on clusters and networks of computers running Condor. Both performance and scalability matched explicitly mmap-coded systems.
  The system has been implemented on SunOS 4.1.x, Solaris 2.2 and 2.3, IRIX 5.2 and 5.3, and AIX 3.2. All of these were legally encumbered, so the FreeBSD version is currently the only freely-available implementation. MNFS is available from , and may be installed either as a set of diffs to the FreeBSD 2.0.5R kernel, or in place. Also included in this directory is a slightly out-of-date paper on MNFS, and a more current manual. A Linux port of MNFS is in the works.
--------------------------------------------------------------------------------
Subject: [1.4] Where can I find operating systems distributions?
From: Available software

This section covers the availability of several well-known systems; the only criterion for inclusion of a system here is that it be of interest to some segment of the OS research community (commercial systems will be accepted for inclusion, so long as they are pertinent to research).
--------------------------------------------------------------------------------
Subject: [1.4.1] Distributed systems and microkernels
From: Available software

See part one of the FAQ for further information on some of the systems listed below.
- [93-03-31-22-49.53] ACE is the distribution, support and sales channel for Amoeba. `Due to overwhelming response from non-profit organisations wishing to obtain Amoeba for their research activities', VU is offering Amoeba 5.2 to research institutions more or less for free (via ftp at no charge, or on tape for $500 on Exabyte or $800 on QIC-24). Amoeba currently supports 68020- and 68030-based VME board machines, as well as i386- and i486-based AT PCs and Sun 3 and 4 machines. For further information on `commercial' Amoeba, you can contact ACE by email at , by phone at +31 20 664 6416, or by fax at +31 20 675 0389. Universities interested in obtaining a license should send mail to , or fax to +31 20 642 7705.
- Chorus Systemes has special programmes for universities interested in using Chorus. For more information on the offerings available, conditions, and other details, get the following files: - - -
- The Cronus object-oriented distributed system may be obtained via ftp from ; email for details of the account name and password. Before attempting to get the Cronus distribution, you must obtain, via anonymous ftp, . Maintenance, hotline support, and training for Cronus are available from BBN. Send email to the above address for information on these, or on obtaining a commercial license.
- Flux is a Mach-based toolkit for developing operating systems; you can find more information about it on the Web at .
- Horus is available for research use; contact Ken Birman or Robbert van Renesse for details.
- Isis has not been publicly available since 1989, but may (I'm not sure) still be obtained using anonymous ftp from or . After 1989, the code was picked up by Isis Distributed Systems, which has subsequently developed and supported it. The commercial version of Isis (available `at very low cost' to academic institutions) is available from the company. Email for information, or call +1-212-979-7729 or +1-607-272-6327.
- Information on obtaining the latest Mach 4 distribution is available from the University of Utah's Mach 4 pages, at .
- The Plan 9 distribution is now commercially available for $350; it consists of a two-volume manual, a CD-ROM with all the sources, and four PC diskettes comprising a binary-only installation of a fairly complete version of the system that runs on a PC. For more information, ; this site houses ordering information, a browsable copy of all the documentation, and the PC binary distribution. Kernels exist for the Sun SLC, Sun4Cs of various types, NeXTstations, the MIPS Magnum 3000, the SGI 4D series, the AT&T Safari, `a whole bunch of' PCs, and the Gnot. The Basser Department of Computer Science at Sydney University has a port of Plan 9 to the DEC Alpha underway at the moment. A port to the Sun 3 has been completed. Contact for details. The Plan 9 user mailing list may be subscribed to by sending mail to <9fans-request@cse.psu.edu>.
- QNX is available for academic applications through an education support programme run by QNX Software Systems, whereby QNX systems can be obtained for educational purposes at very low cost. For commercial and education availability and pricing, contact:

    QNX Software Systems
    175 Terrence Matthews Cr.
    Kanata, Ontario K2M 1W8
    Canada
    1 800 363 9001
    +1 (613) 591 0931
    +1 (613) 591 3579 (fax)

    QNX Software Systems
    Westendstr. 19
    6000 Frankfurt am Main 1
    Germany
    +49 69 9754 6156 x299
    +49 69 9754 6110 (fax)

  Versions after 4.2 of QNX run on the i386 and later processors, with a 16-bit kernel included for i286 machines.
  Native optimisations and a compiler for the Pentium are also included. Further marketing information can be obtained on the World Wide Web from .
- The 1.1 Research Distribution of the Spring distributed, object-oriented operating system is available. Spring is a highly modular, object-oriented operating system which is centred around a uniform interface definition language (IDL). The system is intrinsically distributed, with all system interfaces being accessible both locally and remotely. The 1.1 Research Distribution adds a number of fixes and improvements, including a Spring-Java IDL system that facilitates writing Java applets that can talk across Spring IDL interfaces.
  The Spring SRD 1.1 Binary CDROM is $75 to universities and $750 to commercial research institutions. This includes all of the software and documentation necessary for installing, running, and developing new system modules and applications in Spring. All binaries, IDL files, development tools, key exemplary sources, and course teaching materials are included. A standard full source license and source CDROM is also available for $100 to universities and $1000 to commercial research institutions. For more details and ordering information, see .
- [93-02-07-16-03.48] The Sprite Network Operating System is available on CD-ROM. The disc contains the source code and documentation for Sprite, a research operating system developed at the University of California, Berkeley. All the research papers from the Sprite project are also included on the disc. The software on this disc is primarily intended for research purposes, and is not really intended to be used as a production system. Boot images are provided for Sun SPARCstations and DECstations. The CD-ROM is in ISO-9660 format with Rock Ridge extensions. The disc contains about 550 megabytes of software. You can get an overview of the Sprite Project, and a complete list of what is on this disc, by anonymous ftp from .
  If you would like a CD-ROM, please send $25. Add $4.95 if you would like a caddy too. S&H is $5 (per order, not per disc) for US/Can/Mex, and $10 for overseas. If you live in California, please add sales tax. You can send a check or money order, or you can order with Mastercard/Visa/AmEx.

    Bob Bruce
    Walnut Creek CDROM
    1547 Palos Verdes Mall, Suite 260
    Walnut Creek, CA 94596
    United States
    1 800 786-9907 (USA only)
    +1 510 947-5996
    +1 510 947-1644 (fax)

- VSTa is a copylefted system written by Andrew Valencia which uses ideas from several research operating systems in its implementation. It is currently in an `experimental but usable' state, supports `lots of' POSIX, and runs on a number of different PC configurations. For further information, send mail to , or ftp to .

[Chorus, Clouds?, Choices?]
--------------------------------------------------------------------------------
Subject: [1.4.2] Unix lookalikes
From: Available software

- FreeBSD is available via ftp from , , and . The latest version is derived from 4.4BSD Lite, and contains many extensions. See for further information.
- NetBSD is available via ftp from , and is also derived from 4.4BSD Lite. See for more information.
- Linux is available via anonymous ftp from , , and . It is a freely-distributable System V compatible Unix, and is covered by the GNU General Public License. Linux runs on almost all PCs with i386 or better CPUs and at least 4 megabytes of memory. See for further details.
- 386BSD is available via ftp from or .
  It lies mid-way between 4.3BSD Reno and 4.4BSD internally, and contains no AT&T-copyrighted code. 386BSD runs on ISA bus PCs with i386 or better CPUs. Use of 386BSD is not recommended, since it is unstable and has long since been superseded by FreeBSD and NetBSD.
- The Hurd is the GNU operating system, being written by Michael Bushnell. It is based on Mach 3.0, and should be available on most systems to which Mach has been ported. A preliminary runnable image may be fetched from . Trent A. Fisher runs an unofficial Hurd page at .
- Lites is a free 4.4BSD-based Unix server which runs on top of Mach. Lites provides binary compatibility with 4.4BSD, NetBSD (0.8, 0.9, and 1.0), FreeBSD (1.1.5 and 2.0), 386BSD, UX (4.3BSD), and Linux on the i386 platform. It has also been ported to the pc532 and PA-RISC. Preliminary ports to the R3000 and Alpha processors have also been made. For more information, see the Lites home page at , and see also .
--------------------------------------------------------------------------------
Subject: [1.4.3] Others
From: Available software

[93-03-18-10-19.02] Microsoft is making sources of Windows NT available under license to universities and research laboratories. You should have the appropriate officials contact to get started on this process.

Patrick Bridges' operating systems home page at is an excellent source of information on a variety of other operating systems.
--------------------------------------------------------------------------------
Subject: [2] Performance and workload studies
From: Performance and workload studies

This section covers various publicly-available traces, studies, libraries, and source distributions which may be of use.
--------------------------------------------------------------------------------
Subject: [2.1] TCP internetwork traffic characteristics
From: Performance and workload studies

- The Internet Traffic Archive is a moderated repository to support widespread access to traces of Internet network traffic. The traces can be used to study network dynamics, usage characteristics, and growth patterns, as well as providing the grist for trace-driven simulations. The archive is also open to programs for reducing raw trace data to more manageable forms, for generating synthetic traces, and for analyzing traces. The archive is available on the Web at . There you will find a description of the archive, its associated mailing lists, the moderation policy and submission guidelines, and the contents of the archive (traces and programs).
- [92-10-20-15-04.39] Peter Danzig and Sugih Jamin of USC have made available a report and a source library which simulates realistic day-to-day network traffic between nodes. The library, tcplib, `is motivated by our observation that present-day wide-area tcp/ip traffic cannot be accurately modeled with simple analytical expressions, but instead requires a combination of detailed knowledge of the end-user applications responsible for the traffic and certain measured probability distributions'. The technical report and the source library it describes are available via anonymous ftp from . All you need to transfer to use the library are: README, brkdn_dist.h, tcpapps.h, tcplib.1, and one of libtcp* that matches your setup. You need tcplib.tar.Z only if you must generate the library yourself. The file tcplibtr.ps.Z is the PostScript version of the report. The authors may be contacted at .
- [93-08-09-15-15.54] Vern Paxson of Lawrence Berkeley Laboratory has a report available via anonymous ftp which describes analytic models for wide-area TCP connections, based upon a set of wide-area traffic traces. The report may be obtained from .
- [93-05-13-10-54.09] Vern Paxson has also made available another report, , which provides an analysis of the growth trends of a medium-sized research laboratory's wide-area TCP connections over a period of more than two years.
--------------------------------------------------------------------------------
Subject: [2.2] File system traces
From: Performance and workload studies

- Randy Appleton has a set of filesystem traces which detail every operation performed during a period of more than a week (several hundred thousand events). Timestamps on the traces are accurate to under a millisecond. For more details, contact the author, or visit .
- Chris Ruemmler has done a study on low-level disk access patterns for a workstation, a server, and a time-shared system, which appeared in the Winter 1993 USENIX proceedings. A copy may be obtained via anonymous ftp from .
- Stephen Russell has instrumented the SunOS 4.1.x kernel running on Sun 3 machines. The system allows time-stamped event records to be obtained from various points in the kernel. Events can be categorised (eg, paging, file system, etc), and are read via pseudo-devices. Ioctl calls allow substreams to be enabled/disabled, buffer status checked, etc. An external high resolution timer is used for timestamping.
- [93-05-09-09-23.32] The traces used in `Measurements of a distributed file system' (SOSP 1991) may be obtained from .
--------------------------------------------------------------------------------
Subject: [2.3] Modern Unix file and block sizes
From: Performance and workload studies

The following sections are lifted more or less verbatim from a number of traces which were co-ordinated and analysed by Gordon Irlam . The numbers quoted below are based on Unix file size data for 12 million files, residing on 1000 file systems, with a total size of 250 gigabytes. Further information may be obtained on the World Wide Web at .
--------------------------------------------------------------------------------
Subject: [2.3.1] File sizes
From: Performance and workload studies

There is no such thing as an average file system. Some file systems have lots of little files. Others have a few big files. However, as a mental model the notion of an average file system is invaluable. The following table gives a breakdown of file sizes and the amount of space they consume.

   file size   #files %files %files disk space %space %space
(max. bytes)                  cumm.       (Mb)         cumm.
           0   147479    1.2    1.2       0.0    0.0    0.0
           1     3288    0.0    1.2       0.0    0.0    0.0
           2     5740    0.0    1.3       0.0    0.0    0.0
           4    10234    0.1    1.4       0.0    0.0    0.0
           8    21217    0.2    1.5       0.1    0.0    0.0
          16    67144    0.6    2.1       0.9    0.0    0.0
          32   231970    1.9    4.0       5.8    0.0    0.0
          64   282079    2.3    6.3      14.3    0.0    0.0
         128   278731    2.3    8.6      26.1    0.0    0.0
         256   512897    4.2   12.9      95.1    0.0    0.1
         512  1284617   10.6   23.5     566.7    0.2    0.3
        1024  1808526   14.9   38.4    1442.8    0.6    0.8
        2048  2397908   19.8   58.1    3554.1    1.4    2.2
        4096  1717869   14.2   72.3    4966.8    1.9    4.1
        8192  1144688    9.4   81.7    6646.6    2.6    6.7
       16384   865126    7.1   88.9   10114.5    3.9   10.6
       32768   574651    4.7   93.6   13420.4    5.2   15.8
       65536   348280    2.9   96.5   16162.6    6.2   22.0
      131072   194864    1.6   98.1   18079.7    7.0   29.0
      262144   112967    0.9   99.0   21055.8    8.1   37.1
      524288    58644    0.5   99.5   21523.9    8.3   45.4
     1048576    32286    0.3   99.8   23652.5    9.1   54.5
     2097152    16140    0.1   99.9   23230.4    9.0   63.5
     4194304     7221    0.1  100.0   20850.3    8.0   71.5
     8388608     2475    0.0  100.0   14042.0    5.4   77.0
    16777216      991    0.0  100.0   11378.8    4.4   81.3
    33554432      479    0.0  100.0   11456.1    4.4   85.8
    67108864      258    0.0  100.0   12555.9    4.8   90.6
   134217728       61    0.0  100.0    5633.3    2.2   92.8
   268435456       29    0.0  100.0    5649.2    2.2   95.0
   536870912       12    0.0  100.0    4419.1    1.7   96.7
  1073741824        7    0.0  100.0    5004.5    1.9   98.6
  2147483647        3    0.0  100.0    3620.8    1.4  100.0

A number of observations can be made:
- the distribution is heavily skewed towards small files
- but it has a very long tail
- the average file size is 22k
- pick a file at random: it is probably smaller than 2k
- pick a byte at random: it is probably in a file larger than 512k
- 89% of files take up 11% of the disk space
- 11% of files take up 89% of the disk space

Such a heavily skewed distribution of file sizes suggests that, if one were to design a file system from scratch, it might make sense to employ radically different strategies for small and large files. The seductive power of mathematics allows us to treat a 200-byte file and a 2MB file in the same way. But do we really want to? Are there any problems in engineering where the same techniques would be used in handling physical objects that span 6 orders of magnitude? A quote from sci.physics that has stuck with me: `When things change by 2 orders of magnitude, you are actually dealing with fundamentally different problems'.

People I trust say they would have expected the tail of the above distribution to have been even longer. There are at least some files in the 1-2G range. They point out that DBMS shops with really large files might have been less inclined to respond to a survey like this than some other sites. This would bias the disk space figures, but it would have no appreciable effect on file counts. The results gathered would still be valuable, because many static disk layout issues are determined by the distribution of small files and are largely independent of the potential existence of massive files. (It should be noted that many popular DBMSs, such as Oracle, Sybase, and Informix, use raw disk partitions instead of Unix file systems for storing data, hence the difficulty in gathering data about them in a uniform way.)
--------------------------------------------------------------------------------
Subject: [2.3.2] Block sizes
From: Performance and workload studies

The last block of a file is normally only partially occupied, and so as block sizes are increased, the amount of wasted disk space increases too.
The following historical values for the design of the BSD FFS are given in `Design and implementation of the 4.3BSD Unix operating system':

  fragment size   overhead
  (bytes)         (%)
  512             4.2
  1024            9.1
  2048            19.7
  4096            42.9

Files have clearly gotten larger since then; I obtained the following results:

  fragment size   overhead
  (bytes)         (%)
  128             0.3
  256             0.6
  512             1.1
  1024            2.5
  2048            5.4
  4096            12.3
  8192            27.8
  16384           61.2

By default the BSD FFS typically uses a 1k fragment size. Perhaps this size is no longer optimal, and should be increased. (The FFS block size is constrained to be no more than 8 times the fragment size. Clustering is a good way to improve throughput for FFS-based file systems, but it doesn't do very much to reduce the not insignificant FFS computational overhead.)

It is interesting to note that even though most files are less than 2K in size, having a 2K block size wastes very little space, because disk space consumption is so totally dominated by large files.
--------------------------------------------------------------------------------
Subject: [2.3.3] Inode ratios
From: Performance and workload studies

The BSD FFS statically allocates inodes. By default, one inode is allocated for every 2K of disk space. Since an inode consumes 128 bytes, this means that by default 6.25% of disk space is consumed by inodes. It is important not to run out of inodes, since any remaining disk space is then effectively wasted. Despite this, allocating 1 inode for every 2K is excessive.

For each file system studied, I worked out the minimum sized disk it could be placed on. Most disks needed to be only marginally larger than the size of their files, but a few disks, having much smaller files than average, needed a much larger disk---a small disk had insufficient inodes.

  bytes per   overhead
  inode       (%)
  1024        12.5
  2048        6.3
  3072        4.5
  4096        4.2
  5120        4.4
  6144        4.9
  7168        5.5
  8192        6.3
  9216        7.2
  10240       8.3
  11264       9.5
  12288       10.9
  13312       12.7
  14336       14.6
  15360       16.7
  16384       19.1
  17408       21.7
  18432       24.4
  19456       27.4
  20480       30.5

Clearly, the current default of one inode for every 2K of data is too small. Earlier results suggested that allocating one inode for every 5-6k was in some sense optimal, and that allocating one inode for every 8k would only be 0.4% worse. The new data suggests one inode for every 4k is optimal, and that allocating one inode for every 8k would be 2.1% worse.

The analysis technique I used is very sensitive to even a few file systems with very small files. The main source of file systems with lots of small files would appear to be netnews servers; the typical Usenet message would appear to be 1-2k in length. Ignoring such file systems would drastically alter the conclusions I reach. If, as I believe might already be the case, news servers are manually tuned to have a lower than normal bytes-per-inode ratio, it would then be possible to justify setting the default ratio much higher.

Clearly it is best if the file system dynamically allocates inodes; I believe AIX does this, for instance. Systems that statically allocate inodes should probably increase the bytes-per-inode ratio, but it is not clear to exactly what value. The engineer in me says `it is important to play this one conservatively: stick to 6k'; the artist goes `as Chris Torek says: aesthetics, 8k'.
--------------------------------------------------------------------------------
Subject: [3] Papers, reports, and bibliographies
From: Papers, reports, and bibliographies

Network-available documents are listed in this section.
I'd like to see information for obtaining other sets of reports which aren't electronically-available included here as well, at some stage. -------------------------------------------------------------------------------- Subject: [3.1] From where are papers for distributed systems available? From: Papers, reports, and bibliographies Amoeba Arjuna Choices Chorus Clouds Cronus Mungi ExOS Flexmach Fox Guide Horus Isis Mach Nebula PEACE Plan 9 RTmach Spring SUNMOS / Puma Tigger X kernel / Scout Papers covering Amoeba, Choices, Chorus, Clouds, the Hurd, Guide, Mach, Mars, NonStop, and Plan 9 are also available via anonymous ftp from . [I'd like to find the authoritative home for V---Mars and NonStop are a bit more obscure, I think; they certainly aren't asked after much] -------------------------------------------------------------------------------- Subject: [3.2] Where can I find other papers? From: Papers, reports, and bibliographies Angel Apertos Cache kernel Hive Mungi KeyKOS Pegasus QNX [93-09-19-22-22.26] Solaris 2.x [93-02-23-12-12.43] SPIN Synthetix VSTa Windows NT [92-09-18-11-46.16] -------------------------------------------------------------------------------- Subject: [3.3] Where can I find bibliographies? From: Papers, reports, and bibliographies Distributed shared memory Load balancing Mobile computing Multimedia operating systems [94-04-15-23-29.51] Object-oriented operating systems Parallel and distributed I/O Sprite network operating system See also the section on General Net Resources. [There's quite a lot more at , if anyone wants to add more to this list.] -------------------------------------------------------------------------------- Subject: [4] General Internet-accessible resources From: General Internet-accessible resources This section contains information about a variety of services available to the OS research community via the Internet. -------------------------------------------------------------------------------- Subject: [4.1] Wide Area Information Service (WAIS) and World-Wide Web (WWW) servers From: General Internet-accessible resources [92-09-21-16-38.23] Loughborough University high-performance networking and distributed systems archive may be accessed via the World Wide Web at . This archive contains, according to Jon Knight , the organiser: - Technical reports and papers written at LUT by the networks and distributed systems researchers in the Department of Computer Studies. - Technical reports, papers and theses which have been produced at other sites and then made available for public electronic access. - Software which is of use in research or which has been produced by a specific research project. - Details of relevant conferences, collected from a variety of sources (USENET, email, flyers, etc). - Information on ongoing research projects. - Bibliographies that have been generated for research at LUT and also access to other WAIS indexed bibliographies, both at LUT and elsewhere. - A list of contacts in the field, with details of their research interests. This is entirely voluntary (i.e. people have agreed to Jon entering their details rather than him just rooting round the Internet to build up the information). Bibliographies in the comp.os.research collection are accessible via WAIS from UCSC. 
(:source :version 3 :ip-address "128.114.134.19" :ip-name "ftp.cse.ucsc.edu" :tcp-port 210 :database-name "os-bibliographies" :cost 0.00 :cost-unit :free :maintainer "paul@cse.ucsc.edu" :description "Server created with WAIS release 8 b5 on Jul 9 22:38:27 1992 by paul@cse.ucsc.edu The files of type bibtex used in the index were: /home/ftp/pub/bib" ) -------------------------------------------------------------------------------- Subject: [4.2] Refdbms---a distributed bibliographic database system From: General Internet-accessible resources [92-10-01-11-39.32] The 13th alpha release of refdbms version 3, developed by John Wilkes of the Concurrent Systems Project at Hewlett-Packard Laboratories and Richard Golding of the Concurrent Systems Laboratory at UC Santa Cruz, is now available. It can be obtained by anonymous ftp from . The system has been tested on Sun 3 and 4 systems running SunOS 4.1.x, and on DECstations running Ultrix 4.1. It is an experiment in building weak-consistency wide-area distributed applications, and the databases currently available for the system have a good systems coverage. The system includes tools to query the database, to produce bibliographies for LaTeX documents, and to enter new references into the database. It is part of ongoing research into wide-area distributed information systems on the Internet. Features include: - Distributed databases: a reference database can be shared among multiple sites. Updates can be entered at any site, and will be propagated to the other sites holding a replica of the database. - Multiple databases: every database has a name, and users specify the order in which databases will be searched. - Private databases: databases can be private, available site-wide, or they can be made available to other sites. - Database query by keyword, author, and title word. - Translator for refer-format databases. - Usable with LaTeX documents: the internal refdbms format can be translated into a special BibTeX format. An up-to-date list of bibliographies exported by various institutions may be obtained using anonymous ftp from . -------------------------------------------------------------------------------- Subject: [4.3] Willow -- the information looker-upper From: General Internet-accessible resources The University of Washington's Willow system provides a Motif-based user interface to a heterogeneous collection of on-line bibliographic databases. It will compile and run on most systems which provide a Motif library. For further information, see the Willow home page at . -------------------------------------------------------------------------------- Subject: [4.4] Computer science bibliographies and technical reports From: General Internet-accessible resources - A collection of bibliographies in various fields of computer science is available via anonymous ftp and the World Wide Web. The bibliographies contain about 260,000 references, most of which are references to journal articles, conference papers or technical reports. The collection has been formed by using various freely accessible services in the Internet (anonymous ftp, mailserver, wais, telnet) and converting each bibliography into a uniform BibTeX format. It is organised in files containing references to a (more or less) specific area within computer science. The database has been organised by Alf-Christian Achilles . It may be accessed on the Web at , via ftp from , and through a more useful search mechanism on the Web at . 
- As part of the ARPA Electronic Library Project, the Database Group at Stanford is providing a Selective Dissemination of Information (SDI) service to disseminate information about computer science technical reports. You can have a server email you periodic announcements of new papers on topics that interest you. See for details, or contact Tak Yan or the mail server itself at . -------------------------------------------------------------------------------- Subject: [4.5] The comp.os.research archive From: General Internet-accessible resources [93-02-18-21-18.31] An archive of all messages posted to comp.os.research since 1988 is maintained at UC Santa Cruz. It may be accessed via anonymous ftp at . The archive is organised by year. Postings may also be found via WAIS at UCSC's Computer Science gopher hole: (:source :version 3 :ip-address "128.114.134.19" :ip-name "ftp.cse.ucsc.edu" :tcp-port 210 :database-name "comp-os-research" :cost 0.00 :cost-unit :free :maintainer "paul@cse.ucsc.edu" :description "Server created with WAIS release 8 b5 on Jul 9 03:51:11 1992 by paul@cse.ucsc.edu The files of type netnews used in the index were: /home/ftp/pub/comp.os.research" ) -------------------------------------------------------------------------------- Subject: [4.6] Miscellaneous resources From: General Internet-accessible resources - Paul Harrington maintains a World Wide Web page on checkpointing, at . - Jay Lepreau has made available an electronic version of the proceedings of OSDI '94 at . Available are such things as - Papers: abstracts, papers, slides, bibtex entries, and for most, the actual software. - Keynote: audio and slides - Extensible OS panel: audio, slides, project URLs - Insularity panel: audio - Mach/Chorus workshop: TRs for most, slides, some software - Tutorials: slides for half, descriptions for all - Miscellaneous: summary report from ;login, list of works-in-progress talks, hard-copy proceedings ordering info, CFP, proceedings introduction, list of referees. -------------------------------------------------------------------------------- Subject: [5] Disclaimer and copyright From: Disclaimer and copyright Note that this document is provided as is. The information in it is not warranted to be correct; you use it at your own risk. Following recent reports on the list I think it wise to change the copyright: NOTICE OF COPYRIGHT AND PERMISSIONS Answers to Frequently Asked Questions for comp.os.research (hereafter referred to as These Articles) are Copyright (C) 1993, 1994, 1995, and 1996 by Bryan O'Sullivan . They may be reproduced and distributed in whole or in part, subject to the following conditions: - This copyright and permission notice must be retained on all complete or partial copies of These Articles. - These Articles may be copied or distributed in part or in full for personal or educational use. Any translation, derivative work, or copies made for other purposes must be approved by the copyright holder before distribution, unless otherwise stated. - If you distribute These Articles, instructions for obtaining the complete current versions of them free or at cost price must be included. Redistributors must make reasonable efforts to maintain current copies of These Articles. Exceptions to these rules may be granted, and I shall be happy to answer any questions about this copyright notice -- write to Bryan O'Sullivan, PO Box 62215, Sunnyvale, CA 94088-2215, USA or email . 
These restrictions are here to protect the contributors, not to restrict you as educators and learners. ------------------------------------------------------------------------ Comp.os.research: Frequently answered questions [3/3: l/m 13 Aug 1996] -------------------------------------------------------------------------------- From: bos@serpentine.com (Bryan O'Sullivan) Newsgroups: comp.os.research Subject: Comp.os.research: Frequently answered questions [3/3: l/m 13 Aug 1996] Date: 1 Nov 1997 10:00:34 GMT Message-ID: <63euk2$iv@darkstar.ucsc.edu> Reply-To: os-faq@cse.ucsc.edu Summary: frequent topics of discussion on the operating systems research group Archive-name: os-research/part3 Version: $Revision: 1.3 $ Posting-Frequency: monthly Last-Modified: Tue Aug 13 21:03:20 1996 URL: http://www.serpentine.com/~bos/os-faq/ Answers to frequently asked questions for comp.os.research: part 3 of 3 Copyright (C) 1994--1996 Bryan O'Sullivan TABLE OF CONTENTS 1. Distributed systems 1.1. What is the current status of the (insert name) project? 1.2. How do approaches to load balancing differ? 1.3. Fault tolerance in distributed systems 1.4. Naming in distributed systems 1.5. Distributed shared memory 1.5.1. Data consistency 1.5.1.1. Strictly consistent systems 1.5.1.2. Relaxing consistency 1.5.1.3. Application-specific coherence 1.5.2. Access synchronisation 1.5.3. Transfer and caching granularity 1.5.4. Address space structure 1.5.5. Fault tolerance 1.5.6. A brief bibliography on distributed shared memory 1.6. What have we learned? 2. Needful things -------------------------------------------------------------------------------- Subject: [1] Distributed systems From: Distributed systems A great deal of the high-profile research carried out in operating systems these days deals with distributed computing. Not surprisingly, discussions of distributed systems make up a large amount of the traffic on comp.os.research. -------------------------------------------------------------------------------- Subject: [1.1] What is the current status of the (insert name) project? From: Distributed systems See the section on `available software' for information on distributions of some of the systems mentioned here. - The Amoeba project is still going. There are roughly 20 people working on it, but most of these are no longer kernel hackers. They are working on using it for parallel programming, wide-area distributed systems, and other things. Amoeba is used in over 100 universities at the moment, and is also used at commercial institutions. - Brazil is the new research operating system being developed at AT&T Bell Labs. Research topics being addressed in Brazil center on higher-performance machines and, particularly, networks. A new in-house 300 megabit/s switched fiber network increases the potential bandwidth between machines by at least an order of magnitude; our aim is to realize and exploit that bandwidth. The overall design is to eliminate unnecessary overhead, particularly by restructuring and redesigning where necessary to avoid copying data from element to element along the communications path. Most of this software (except the operating system kernel) is written in a new concurrent systems programming language, Alef, which makes it easy to write multi-process servers and applications that can communicate using messages or shared memory, as appropriate. A paper on Alef is available from the Plan 9 ftp site; see part 2 of this FAQ for a pointer. - Cronus is still under development at BBN. 
  The current public release is 3.0. The project currently has two thrusts---as the base for advanced distributed system R&D, and as a platform for constructing and deploying sophisticated distributed applications. Ongoing research topics include the integration of Cronus and Mach technology, the exploration of techniques for the construction of WAN-based and multi-organisational applications, investigation into the integration of distributed systems and network management systems, and work in high-performance distributed computing.
- Horus is being developed by the same group that worked on Isis; the head of this group is Robbert van Renesse.
- Isis is no longer being developed at Cornell; it is now managed as a commercial product.
- Mach is no longer being developed at CMU. Current work on Mach is being carried out by the OSF Research Institute and at the University of Utah.
- Plan 9 is no longer in development at AT&T Bell Labs. The operating systems research group at Bell Labs has moved on to a new project, called Brazil, which addresses portable computing and distributed applications programming.
- QNX is a commercial POSIX-certified realtime OS with an installed base of over 250,000 systems. It is used extensively in process control, factory automation, medical instrumentation, communications, and point-of-sale systems. A number of universities are also doing research with QNX.
- The Sprite network operating system project has ended.
--------------------------------------------------------------------------------
Subject: [1.2] How do approaches to load balancing differ?
From: Distributed systems

Load-balancing policy falls into two broad groups: static and dynamic. Static policies use algorithms which operate without regard to run-time loads across a system, while dynamic policies use the run-time performance of various parts of a system in order to make more `informed' decisions about balancing.

[92-11-06-12-53.57] A dynamic load-balancing policy is one which uses run-time state information in making scheduling decisions. There are two kinds of dynamic policies: adaptive and non-adaptive. The latter always use the same (fixed, load-dependent) policy; the former may adjust policy parameters in order to gradually improve their performance. The key point is that while non-adaptive policies use only information about the run-time state, adaptive policies use, in addition to this, information about current performance.

In adaptive policies, the rules for adjusting policy parameters may be static or dynamic. An example of the former might be: `shift to a conservative migration rule when system-wide load patterns are varying too rapidly'. An example of the latter could be: `increase the sender-side threshold when migrated jobs cause slowdown rather than speedup'. Some researchers refer to the performance-driven adaptation exhibited by the second policy as `learning'.

Since both non-adaptive policies and adaptive policies with static rules really use only load information, it is confusing to distinguish between them. One way to avoid such confusion is to restrict the use of the word `adaptive' to policies that use performance feedback in order to drive their adjustment of policy parameters.
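To make the distinction concrete, here is a minimal C sketch of a sender-initiated policy. It is not taken from any particular system; local_load() and pick_lightly_loaded_node() are assumed hooks into whatever load-measurement and node-selection machinery the underlying system provides. The point is only that the non-adaptive part consults run-time load, while the adaptive part adjusts a policy parameter using performance feedback from earlier migrations.

  /* Hypothetical hooks into the underlying system. */
  extern double local_load(void);               /* runnable jobs per CPU here */
  extern int pick_lightly_loaded_node(void);    /* candidate target, or -1    */

  static double threshold = 2.0;                /* sender-side migration threshold */

  /* Non-adaptive part: decide using run-time state (load) only. */
  int should_migrate(void)
  {
      return local_load() > threshold && pick_lightly_loaded_node() >= 0;
  }

  /* Adaptive part: use performance feedback (mean speedup of recently
   * migrated jobs) to adjust the policy parameter itself. */
  void adapt_threshold(double mean_speedup)
  {
      if (mean_speedup < 1.0)           /* migration made things worse */
          threshold *= 1.25;            /* become more conservative    */
      else if (mean_speedup > 1.2)      /* migration is clearly paying off */
          threshold *= 0.9;             /* migrate a little more eagerly   */
  }

A policy consisting only of should_migrate() would be dynamic but non-adaptive; adding adapt_threshold(), driven by measured speedups, is what makes it adaptive in the sense used above.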
-------------------------------------------------------------------------------- Subject: [1.3] Fault tolerance in distributed systems From: Distributed systems One approach to providing fault tolerance in distributed systems involves the use of redundant services, such that standby facilities can become active in the event of the failure of, or loss of connection to, a primary service. Another approach is to provide multiple paths of connectivity between the computers that make up the distributed system. The QNX system, for example, supports multiple network drivers per node. The purpose of the network connection under QNX is to merge the microkernels on the LAN into a single logical kernel. Hence, if multiple LAN connections per node are present, the networking code can load balance the LAN traffic on the paths available. It can also route around failed links, providing both greater LAN bandwidth and better fault tolerance. See below for treatment of fault tolerance in systems which make use of distributed shared memory. -------------------------------------------------------------------------------- Subject: [1.4] Naming in distributed systems From: Distributed systems [Material on naming and/or global naming sought.] -------------------------------------------------------------------------------- Subject: [1.5] Distributed shared memory From: Distributed systems Distributed computer systems have evolved using message passing as their main method of communication. Other communication systems used in loosely coupled distributed systems, such as RPC, are usually implemented on top of an underlying message passing system. On the other hand, in tightly coupled systems, such as a multi-processor machine, the communication method used is usually shared memory. In distributed shared memory (DSM) systems [Nitzberg & Lo, 91], processes share data transparently across node boundaries; data faulting, location, and movement is handled by the underlying system. Among other things, this allows parallel programs designed to use shared memory to execute transparently on a loosely coupled distributed system. While the performance implications cannot be ignored, the advantages of the shared memory programming model are well known: - Shared memory programs are usually shorter and easier to understand than equivalent message passing programs. - Large or complex data structures may easily be communicated. - Shared memory gives transparent process-to-process communication. - Programming with shared memory is a well-understood problem. Shared-memory (or `procedure-oriented') and message-oriented operating systems are, in some sense, equivalent [Lauer & Needham, 78], though it has been claimed that the former are `more powerful' [Tam et al., 90]. -------------------------------------------------------------------------------- Subject: [1.5.1] Data consistency From: Distributed systems Despite recent advances in both local and wide-area networking technologies, network latency is still a major factor in distributed systems and likely to remain so. All DSM systems provide some sort of caching in an attempt to improve the performance beyond that provided by doing a network access on every reference to a non-local data item. Each system must decide whether or not to attempt to keep the data coherent, and, if so, what coherence strategy to use. 
The coherence semantics which may be provided to the programmer include:
- `strict' consistency, where a read always returns the value written by the most recent write;
- `loose' consistency, where the system enforces some form of weak consistency guarantee and the application (or compiler or user) can indicate synchronisation points at which consistency must be enforced;
- no automatic consistency mechanism, with the user instead being given the facilities necessary to implement user-level synchronisation and consistency.
--------------------------------------------------------------------------------
Subject: [1.5.1.1] Strictly consistent systems
From: Distributed systems

Older, strictly consistent systems tend to enforce a single-writer, multiple-reader model, where at any time data will be held either at a single node (which may have write access) or at several nodes (none of which may have write access).

Given this model, we must be able to locate a copy of our data when it is not resident. The method most frequently used is to assign an `owner' to each item of data, where the owner has either the only writeable copy of the data, or one of the read-only copies. Ownership may remain fixed throughout the life of a datum, or it may change dynamically. In the latter case, the problem arises of locating the owner. A database of locations may be maintained by centralised managers, or ownership information can be distributed among the nodes of the system [Li and Hudak, 89].

In a strictly consistent system, we must also be able to synchronise writes. The two major solutions to this problem are:
- Write broadcast. The effects of every write are broadcast to every node that has a copy of the data being written; this effectively implements a replication algorithm. Write broadcast is usually considered too expensive to be used as a general solution.
- Write invalidation. Each node in the system holding a read-only copy of the data being written is sent an invalidation message.
--------------------------------------------------------------------------------
Subject: [1.5.1.2] Relaxing consistency
From: Distributed systems

Permitting temporary inconsistencies is a common method of increasing performance in distributed systems. Memory is said to be loosely coherent if the value returned by a read operation is the value written by an update operation to the same object that `could' have immediately preceded the read operation in some legal schedule of the threads in execution [Bennett et al., 90]. Using loose coherence, more than one thread may have write access to the same object, provided that the programmer knows that the writes will not conflict.

Another memory consistency model is `release consistency' [Gharachorloo et al., 90], in which memory accesses are divided into ordinary and synchronisation-related accesses. The latter are further divided into `acquire' and `release' operations. The `acquire' operation indicates that shared data is needed, and a processor's updates are not guaranteed to be performed at other nodes until a `release' is performed. The primary advantage of this form of consistency is that it allows consistency updates to be tied to synchronisation events, and therefore to be delayed until actually needed by applications. However, most release-consistent systems require the programmer to make explicit use of `acquire' and `release' operations.

A DSM system called Midway introduces another new consistency model, `entry consistency' [Bershad et al., 93].
Entry consistency is weaker than many of the other models suggested, including release consistency; it requires explicit annotations to associate synchronisation objects with data. On an `acquire', only the data associated with the synchronisation object is guaranteed to be consistent. This extra weakness permits higher-performance implementations of the underlying consistency protocols to be written. Midway also supports stronger consistency models, so that the application programmer can trade off performance against the extra effort required to write entry-consistent programs.
--------------------------------------------------------------------------------
Subject: [1.5.1.3] Application-specific coherence
From: Distributed systems

From [Cheriton, 86]: `Problem-oriented shared memory' is a shared memory that implements fetch and store operations specialised to the particular problem or application it is supporting. In particular, a problem-oriented shared memory commonly provides a specialised form of consistency and consistency maintenance that exploits application-specific semantics.

Cheriton goes on to propose that consistency constraints be relaxed, and more use be made of problem semantics. He suggests that, in some cases, stale data may be detected on use by the client, and the client may then recover; an example would be hint caching. In some applications, stale data may actually be sufficiently accurate, provided that the client can obtain up-to-date information when necessary. In other applications, some data may be optional, in the sense that the client can continue without it. Still other applications may tolerate having the results of store operations lost or undone; an example is an application that regularly updates the entire data set.

Another approach is presented by the designers of Munin, where the runtime system accepts hints from the compiler or user to determine the coherence mechanism to be used for each object. The default, in the absence of hints, is to use a general read-write consistency mechanism, much like that employed by IVY. Munin supports several different object types that are based on the results of a survey of shared memory access characteristics. The results of the survey showed that a very small percentage of all accesses to shared data fall under the general read-write type. The Munin designers also note that a program moves through various stages of execution, and that the types associated with objects change as time progresses.
--------------------------------------------------------------------------------
Subject: [1.5.2] Access synchronisation
From: Distributed systems

Most parallel applications use some sort of synchronisation system to order and control accesses to shared data before actually accessing the data. The most important thing to note in DSM systems is that blindly using standard test-and-set operations on bytes in shared pages will produce a high fault rate; faults are usually expensive, making this approach unacceptable.

Clouds merges locking with the cache consistency protocol, so that the user may obtain both a lock and the data in one network transaction. This system has the advantage that no invalidation messages are required, since the granting of the lock guarantees that there are no conflicting copies; it has the disadvantage that an explicit unlock/discard operation is required to release access to the data.
This is acceptable in Clouds, as the DSM system was designed specifically to support object invocation, so it is easy to discard on a return. Munin provides a distributed lock mechanism using `proxy objects' to reduce network load. Proxy objects are maintained by a lock server on each node; when a thread wants to obtain a lock on an object, it attempts to lock the proxy instead. The server obtains the global lock if it is not already held locally. Global locking is done by negotiating with all the other lock servers in the system. Each lock may be migrated from server to server, and part of the Munin system allows objects to be migrated along with their locks. Other systems, such as IVY and Mermaid, use modified versions of classic multiprocessor synchronisation facilities. -------------------------------------------------------------------------------- Subject: [1.5.3] Transfer and caching granularity From: Distributed systems When caching objects in local memory, it is necessary to decide what level of granularity to use. All current systems use a fixed block size in the cache, rather than varying the granularity based on object size. Usually this is due to constraints imposed by the system hardware and memory management. The choice of the block size in the cache depends on several issues. - Cost of communication: for example, on many local area networks there is little difference between the time required to send a one-byte message and that required to send a 1024-byte message. Transmitting bulk changes rather than single-byte modifications would therefore seem desirable. - The choice of granularity also depends on the locality of reference in the application, as thrashing may occur when two machines are both accessing the same block (this is also known as the `ping-pong effect'). This would seem to argue for a smaller block size. It should be noted that many object-oriented systems exhibit very poor locality of reference. In practice, a compromise must be achieved, as with conventional virtual memory systems. Most systems use a block size which is the same as that of the virtual memory management unit on the system, or a multiple thereof. Among other things, it allows the hardware to be used to help in the maintenance of consistency. The choice is complicated somewhat when heterogeneous machines are being used, but in these cases, the lowest common multiple of hardware supported page sizes can usually be used. The only major system that doesn't use a large block size is Memnet, in which a hardware based DSM system was implemented on a high speed token ring; a 32-byte block size was used instead [Delp & Farber]. The choice of a small block size is appropriate, as the system is much closer to a shared memory multi-processor than it is to a software DSM system. This is because the entire processor is blocked on a cache miss; the processor is not actually aware of the distributed nature of its address space. Also, the ratio between remote and local memory access times is much lower than in the software based systems due to the dedicated token ring (200Mbps) and hardware assistance. -------------------------------------------------------------------------------- Subject: [1.5.4] Address space structure From: Distributed systems In a single shared address space system, the system appears as a set of threads executing in a shared distributed address space. Objects always appear at the same addresses on all nodes. 
Single address space systems have had a resurgence in popularity with the arrival of 64-bit processors. A number of researchers believe that a 64-bit address space is large enough to act as a single global address space for all the memory (both primary and secondary) in a distributed system. Examples of such systems include Angel, Mungi, and Opal. Security and protection are a major problem in such systems, and current approaches either rely on hardware assistance or stochastic algorithms, or ignore the problem.

Another approach is to divide each process's address space into different fixed regions, some of which are private and not shared, and some of which are shared with some other processes. Ra, the Clouds kernel, takes this approach, using O, P, and K address regions, with the O region shared between all processes executing in a given object; the P and K regions are local to a process and the kernel, respectively. Here, objects always appear at the same address, but may not be visible from every address space.

By contrast, some systems, including Mirage and Mach, allow shared data to exist at differing addresses in different processes' address spaces. However, neither system does transparent pointer translation, so the address changes are not entirely transparent to the application.

As for the structuring of the shared region itself, some systems -- for example, IVY and Mether -- use a single flat region: one continuous range of virtual addresses represents the shared address space and is managed by the DSM system. This single address space is usually sub-divided into pages. Most systems use paged segmentation: the shared region consists of disjoint pieces, which are usually managed separately and are not all mapped in any one process. Frequently, the segments (sometimes called memory objects, or windows) are related to the backing store. For example, in Clouds, the object address space consists of windows onto larger segments; these segments are usually maintained on secondary storage.
--------------------------------------------------------------------------------
Subject: [1.5.5] Fault tolerance
From: Distributed systems

Most DSM systems ignore the fault tolerance issue, or maintain that it is an operating system issue and should be handled by the underlying system. However, it would appear that in practice a DSM system would strongly affect the fault tolerance of a system. For example, where several nodes are sharing access to a set of data, the failure of any one of them could lead to the failure of all the connected sites (or, at least, of some of the processes on each site).

We are also presented with an unusual failure handling problem. It is fairly easy to see how to handle a failed message or RPC, but how do you handle a failed page fault?

The original Clouds system provided recoverability using shadowing of segments and a transactional system using commits. The recovery system was not really integrated with the DSM system, and was merely implemented at the segment storage site. In order to maintain a consistent view of data when one transaction is active at multiple nodes, the Clouds designers have more recently been forced to integrate the transaction system with the DSM support system.
--------------------------------------------------------------------------------
Subject: [1.5.6] A brief bibliography on distributed shared memory
From: Distributed systems

[Nitzberg & Lo, 91] Nitzberg, W.
and Lo, V., `Distributed shared memory: a survey of issues and algorithms', IEEE Computer, August 91, pp. 52-60

[Lauer & Needham, 78] Lauer, H. C. & Needham, R. M., `On the duality of operating system structures', Proceedings of the 2nd International Symposium on Operating Systems, IRIA, October 78; reprinted in ACM Operating Systems Review 13(2), April 79, pp. 3-19

[Tam et al., 90] Tam, M.-C., Smith, J. M. & Farber, D. J., `A taxonomy-based comparison of several distributed shared memory systems', ACM Operating Systems Review 24(3), July 90, pp. 40-67

[Li and Hudak, 89] Li, K. & Hudak, P., `Memory coherence in shared virtual memory systems', ACM Transactions on Computer Systems 7(4), November 89, pp. 321-359

[Bennett et al., 90] Bennett, J. K., Carter, J. B. & Zwaenepoel, W., `Munin: distributed shared memory based on type-specific memory coherence', Proceedings of the 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, SIGPLAN Notices 25(3), March 90, pp. 168-176

[Gharachorloo et al., 90] Gharachorloo, K., et al., `Memory consistency and event ordering in scalable shared-memory multiprocessors', ACM SIGARCH News 18(2), June 90

[Bershad et al., 93] Bershad, B. N., et al., `The Midway distributed shared memory system', Technical Report CMU-CS-93-119, School of Computer Science, Carnegie Mellon University, 1993. Available via anonymous ftp from .

[Cheriton, 86] Cheriton, D. R., `Problem-oriented shared memory: a decentralized approach to distributed system design', Proceedings of the 6th International Conference on Distributed Computing Systems, May 86, pp. 190-197

[Delp & Farber] Delp, G. S. & Farber, D. J., `Memnet -- a different approach to a network', Technical Report, Department of Electrical Engineering, University of Delaware, ???
--------------------------------------------------------------------------------
Subject: [1.6] What have we learned?
From: Distributed systems

Andy Tanenbaum started a (very long) thread on this topic in comp.os.research in April of 1992 [92-04-03-17-10.05]. The interested reader is directed to the comp.os.research archives, since this thread proved rather divisive (i.e. nobody really agreed on any issue).
--------------------------------------------------------------------------------
Subject: [2] Needful things
From: Needful things

This FAQ is incomplete, and will probably remain in this state to a greater or lesser extent for ever and ever. Should you feel willing to contribute some material, the following is a list of topics which ``urgently'' require treatment (some of which I may get around to covering myself at some point):

- naming in distributed systems