IEN 141
INDRA Note 897
11th April 1980
Message System Issues
C. J. Bennett
ABSTRACT: This INDRA Note discusses the
design choices for the message server system
to be built at UCL. Particular issues
considered include: the nature of the UK user
community; the nature of the message service
to be offered on the server; the message
formats and transfer protocols to be used;
addressing; interworking with the ARPANET
community; and the design of the message
management system on the message server.
Table of Contents
1. Introduction...........................................1
2. The User Community.....................................1
3. Message Movement.......................................2
3.1 Message Format.....................................2
3.1.1 Message Format Staging........................3
3.2 Message Protocol...................................3
3.3 Message Transport..................................4
3.3.1 FTP Staging...................................5
3.4 Addressing.........................................5
3.5 Status Reporting...................................8
4. Message Server Design..................................8
4.1 User Interface.....................................8
4.2 Message Management.................................10
5. Conclusions............................................12
1. Introduction
Electronic message services have historically been
one of the most successful services to have developed
from the use of packet switched computer networks.
However, these facilities have not been available to
users of United Kingdom research data networks in the
past, and UK users who wished to send mail to remote
sites were required to obtain mailboxes on remote
machines in the United States, accessible via ARPANET.
With the development of public networks, in particular
IPSS and PSS, and in view of the UKPO's policy of
requiring users to move to these networks, it is no
longer economically feasible to continue this mode of
usage.
For these reasons it is proposed that University
College London will develop a message server system
based on a PDP-11/35 running UNIX and accessible
initially to users through the DARPA Catenet, and later
through PSS. This server would allow users to exchange
messages with other users on the same site, users of
ARPANET mail systems, and eventually users of other UK
and US message servers. The aim of this INDRA note is
to identify the design constraints on this system and
to suggest approaches that may be taken to meet them.
2. The User Community
Five major groups of users can be identified who can
be expected to interact with such a service in the
short term. These are:
(i) Current users of the ARPANET mail system,
especially UK users who have (until recently)
had dialin access through the TIP. The message
server would become the prime mail server for
this group. US users of ARPANET systems must be
able to send messages to this site. This group
will require messages formatted according to
the rules specified in RFC 733 (as modified by
actual practice).
(ii) Users of the DARPA Catenet, who will be using
at least three formats for intersite mail:
those of RFC 733; those of the Internet Mail
Protocol as defined in IEN 85; and the private
formats being developed by RSRE.
(iii) Users who wish to exchange messages between the
UCL server and other servers which may become
available through PSS. This group will
initially require only PSS access to the server
Bennett [Page 1]
INDRA Note 897, IEN 141 Message System Issues
and will exchanges messages locally, but in the
longer term it can be anticipated that other
mail servers will emerge on PSS.
(iv) Users who wish to exchange messages with US
message servers available through Telenet and
IPSS. In particular, such traffic may arise
through the US EDUNET project.
(v) UCL users who will exchange messages through
the UCL ring, and who will wish to exchange
messages with users in one or more of the other
three categories.
3. Message Movement
This section is concerned with the questions which
affect the movement of messages between the message
server and other message sites. Four major questions
must be considered: choice of message format; choice of
transport mechanism; mail protocol; and addressing.
3.1 Message Format
The message format may be based on one of the
following choices:
(i) ARPANET Format (RFC 733)
(ii) Internet Mail Format
(iii) RSRE Mail Format
(iv) Other format not currently in use amongst the
user community, such as those that may arise
through the work of IFIP TC6.5, or through
Telenet and EDUNET.
Of these choices, only the first is feasible at
present. It is that which is most widely used at the
moment, as it provides the current ARPANET mail
service, and the internal UCL Unix mail service, and it
is intended that it shall be used for initial DARPA
Catenet mail. The DARPA Internet Mail format is very
experimental, and although it is expected to remain
stable for the time being no experience has been gained
with it. Much the same comment applies to the RSRE
Bennett [Page 2]
INDRA Note 897, IEN 141 Message System Issues
system. The fourth choice involves either obtaining an
existing commercial system such as COMET, or devising a
new format from scratch. Both these possibilities would
result in considerable delay, and a UCL home-brewed
format would be unlikely to be any more satisfactory,
and would be much less acceptable to the users, than
other alternatives.
As it may be anticipated that the server will have to
interwork eventually with other formats, notably that
of RSRE and whatever emerges amongst the EDUNET group,
the development of other formats should be closely
tracked. It is expected that conversion will eventually
take place through the use of a common Internet Format
such as that being developed in the DARPA Internet
scheme.
3.1.1 Message Format Staging
One result of this is that users who will eventually
require a different format for messages for their own
server - initially, RSRE in particular - will require a
conversion between the two. It is expected that this
will take place at the UCL message server. As noted
above, it is to be hoped that conversions will take
place through a common intermediary format.
An important longterm question in this regard is how
widely the UCL message server system will be
distributed in the UK. If other message servers are
built along the same lines, then the format chosen will
become a __ _____ UK standard, at least among the UK
research community.
3.2 Message Protocol
The current ARPANET message protocol is essentially a
trivial extension to the ARPANET file transfer,
obtained through the MAIL option. This causes each
message to be sent as a separate file to be appended to
the message file of an individual user at that site.
Given future use of IPSS and PSS this is an uneconomic
option. There are two reasons for this.
(i) Demultiplexing for a message which is to be
copied to several users at the same site occurs
at the sender, not the receiver. Thus a
message for N users at site X is transferred N
times, even though it is identical. If mailers
Bennett [Page 3]
INDRA Note 897, IEN 141 Message System Issues
were capable of parsing the message headers
properly, the message need only be sent once.
(ii) For each message transferred a separate data
connection is set up. Thus a queue of N
messages for M sites (M < N) will require N + M
calls to be made. If the messages were
mailbagged by site, only 2M calls need be made.
(Note that if FTP control and data were mixed
on the same call, as in the NIFTP (see below),
these figures reduce to N and M respectively).
Both these changes have some impact on message
format. The first requires, as a minimum, that all
recipients of a message at a given site be visible in
the To: and Cc: fields - that is, it is not possible if
the mailing list facility is used in its current form.
In such cases, the sender must provide the list, and
the receiver must recognise that this list should be
suppressed or separated from the users' copies. It is
to be hoped that the Internet group will accept this
proposal as a minimum change to be made for use in the
Catenet, and that similar procedures will be set up by
other groups.
Mailbagging requires that different messages in a
file transferred must be clearly delimited. This
requires a mailbag structure to be defined - at the
very least, by defining a standard message separator.
However, it does not require restructuring of
individual messages. This is a much more important
change than the first, and as the saving is likely to
be less, it is proposed here that it should await the
results of experiments with the Internet Mail Protocol.
3.3 Message Transport
There are two major choices to be made for the
message transport service, namely the TCP FTP, derived
from the ARPANET FTP, and the NI FTP. It is expected
that the first will be used for mail within the
Catenet, using the same MAIL option as used within the
ARPANET. As has been seen above, however, this protocol
is unsuited to our needs because it is uneconomic. It
may be retained initially, as it gives direct
compatibility with other Catenet sites.
Bennett [Page 4]
INDRA Note 897, IEN 141 Message System Issues
In the slightly longer term, the NI FTP is the more
attractive option. The reasons for this are its
independence of specific transport services and the
fact that it will be widely adopted in the UK. UCL
already has implementations on its research Unix and at
ISIE (though these will have to be changed to reflect
the final specification); an implementation at RSRE is
planned; and future mail servers in the UK will prefer
to use it. The fact that many of these will run above
X25 networks while Catenet sites will use TCP is
immaterial; the necessary transport-level conversion
will be handled by the UCL Protocol Convertor. The
existing ARPANET FTP is demonstrably NCP-specific, and
the TCP version of this will at the minimum be
Catenet-specific in its use of Telnet.
3.3.1 FTP Staging
An important consequence of this is that FTP staging
will be required, for three reasons.
(i) It will be necessary to stage messages into and
out of the ARPANET. This applies regardless of
the FTP used, as ARPANET mail is restricted to
use of the ARPANET FTP.
(ii) It will be necessary to stage messages between
mailers in the Catenet using the TCP FTP and
those using the NI FTP. If UCL does decide to
use the TCP FTP, this decision is merely
postponed until a UK community emerges based on
the NI FTP.
(iii) It may eventually be necessary to stage
messages between UCL and Telenet/Tymnet
servers, even if they adopt a common format, if
a different transport mechanism is used.
It is proposed here that experiments with the first two
stagings be performed at ISIE, or some other TOPS20 on
the ARPANET which has all three systems. In its final
form, the staging system would consist of a daemon
which would process the mail file at a special account
and forward messages to the appropriate sites. The
structure of such a system is shown in Figure 1.
3.4 Addressing
Only four message sites in the UK are initially
Bennett [Page 5]
INDRA Note 897, IEN 141 Message System Issues
Figure 1: Staging Daemon System
expected to be heavily involved in the system.
Initially, development will be in the UCL message
server itself (UCL-MUnix), while at a later stage the
UCL teaching and research machines (UCL-TUnix and UCL-
RUnix), and at least one machine at RSRE will become
involved. While other message servers may emerge at a
later date, it is not expected that this will happen
rapidly. Staging to Catenet and ARPANET sites will be
through ISIE; the problem of staging to Telenet/Tymnet
Bennett [Page 6]
INDRA Note 897, IEN 141 Message System Issues
sites must be considered if and when it arises.
The UK sites should be able to exchange mail directly
through the use of addresses of the form 'user@site'
(e.g. Ruth@UCL-TUnix). This format could be used
throughout the mailing address space, although it
involves the message sites not under UCL control to
make special modifications to their mailers. Thus an
ARPANET mailer presented with a return address
'Ruth@UCL-TUnix' would have to recognise that this
should be sent to ISIE; the ISIE mailer would have to
recognise that the message should be added to the UCL
daemon's mailbox and the UCL daemon would then forward
the message to UCL-TUnix.
Two other alternatives are source routing and
hierarchical addressing. A source routed form of the
address might be identical in appearance to the ARPANET
(by making 'UCL' a synonym for ISIE, in much the same
way the 'UDel-EE' is a synonym for 'Rand-Unix'),
although for parsing purposes it would be preferable to
rearrange it: (Ruth-(TUnix@(UCL))). Local messages
would then appear as: Ruth-TUnix. An ARPANET address
would appear to a message server user in a form such
as: Kirstein-ISI@ISIE. Staging message servers would be
required to parse the address into intermediate forms.
Further, the terminal staging server for the catenet
and for ARPANET would be required to suppress
intermediate fields. Thus the UCL daemon at ISIE would
have to transform all addresses of the form: Kirstein-
ISI@ISIE to Kirstein@ISI and back again for traffic in
the reverse direction. Source routing is the favoured
solution of the University of Delaware's MMDF group.
Hierarchical addressing is actually the official
ARPANET standard as described in RFC 733, although it
is not implemented. It is also the solution favoured in
Postel's Internet system. Under this scheme UCL would
refer to a widely-known addressing domain, and
addresses would take the form: Kirstein-ISI@ARPA and
Ruth-TUnix@UCL. In practice, since only two hops and
only one staging point are involved the two forms are
virtually synonymous - which is a good argument for
postponing a real decision until we see an addressing
hierarchy actually emerging! The differences will be
seen when an RSRE server becomes active. In this case,
an ARPANET site has the choice of the following forms:
Bryan@NSide (global)
Bryan-NSide@PPSN (hierarchical)
Bryan-NSide-MUnix@ISIE (source routing)
Bennett [Page 7]
INDRA Note 897, IEN 141 Message System Issues
Note that in any form changes of the type above are
required to ARPANET mailers. With global and
hierarchical addressing, ARPANET tables must be
modified to recognise mail servers (global address) or
mail address spaces (hierarchical address). This is
not required with source routing. The mailer at the
staging site must additionally recognise that account
names taking a certain format should automatically be
accepted and routed to the UCL mail daemon at that
site. Both solutions therefore require some structuring
of the address. In the examples above, a hyphen ('-')
has been used as a component separator. In fact, this
is probably a bad choice. Two possibilities are:
(i) Use of some other separator, such as %.
(ii) Use of the comment fields allowed by the mail
protocol.
The second choice has the convenient side effect that
the account checking procedure need not be changed at
the staging site, as addresses may then look like:
UCLfor a source-routed
format). However not all message preparation facilities
will include comment fields (e.g. 'answer' under MSG).
Since this note was first drafted my attention has
been drawn to RFC754 (Out-of-Net Host Addresses for
Mail by J. Postel). This note considers four
solutions: three are variants on the global solution,
and the fourth involves name structuring. Postel's note
favours a structured name solution. This is compatible
with either a source routed or hierarchically
structured solution.
3.5 Status Reporting
Finally in this section there is the issue of status
reporting. Currently, most ARPA-type message systems
give an immediate report, with possibly a mailer-
generated message if there is some subsequent failure.
For staged or mailbagged messages an immediate report
of success can only imply success at the first stage.
Thus it is important that staging daemons which cannot
successfully deliver a message must be prepared to
generate messages indicating why failure occurred. This
can be done simply through the use of the current
message generation mechanism.
Bennett [Page 8]
INDRA Note 897, IEN 141 Message System Issues
4. Message Server Design
4.1 User Interface
The primary service which must be provided is a
reliable, efficient and cheap method of sending and
processing text messages exchanged amongst the user
community. It is not intended to provide a multimedia
service, although this is an important research goal of
the program. Within this constraint, a user of the
message server must be able to:
(i) Prepare messages.
(ii) Send messages to remote users.
(iii) Receive messages from remote users.
(iv) Read messages.
(v) Be assured that messages are safely stored and
are recoverable in the event of system failure.
(vi) Be able to obtain adequate online help on the
use of the server.
In addition it is desirable that the user be able to:
(i) Prepare message files which may not be sent
immediately.
(ii) Archive and dearchive messages.
(iii) Manipulate messages in file structures of his
own creation.
(iv) Answer and forward messages.
(v) Obtain hardcopy listings.
(vi) Maintain mailing lists.
(vii) Annotate messages.
This list is clearly not exhaustive, and the aims of
the user interface should be continually reevaluated in
the light of user experience, development experience,
and the recommendations of other message groups, such
as IFIP TC6.5. Nor does it imply any evaluation of the
difficulty of implementation: answering and forwarding
Bennett [Page 9]
INDRA Note 897, IEN 141 Message System Issues
messages should be comparatively trivial; while a
satisfactory remote hardcopy listing service is a major
problem.
Following the general approach taken in this note, it
is proposed that MSG be used at least initially as the
basis of the user interface in the message server. The
user would enter MSG automatically as his login shell.
It is expected that the repertoire of commands will be
changed and extended in order to provide the full range
of services listed above (e.g. for the maintenance of
mailing lists). This may require the single-letter
command interface to be modified. It is also expected
that the character-at-a-time interface and the use of
TV editors would have to be altered to fit the needs of
users accessing the system via XXX terminals, which
favour line-oriented commands and editors. These issues
will be reexamined in the light of experience gained.
4.2 Message Management
An important issue is the internal design of the
message server. The current system of personal mailbox
files each containing a copy of all messages is complex
and wasteful in a Unix system solely devoted to message
handling. It is proposed here that database structures
be examined in which only one copy of a message is kept
in a central directory, and that the user's current
mail file, and any other mail files he keeps, consist
solely of descriptors pointing to the message and to
other cross-referencing descriptors which may be
needed. The structure of the system is shown in Figure
2.
The details of the descriptor structure are not
considered in this note. However, a number of important
issues arise. The fundamental question is: should all
messages be kept in a single file, or each message in a
separate file? The answer chosen has important
implications for the limits on the size of the system,
the method of updating the system, methods of accessing
messages, and many other issues.
In the second method, messages may be found rapidly
by filename, and garbage collection is considerably
simplified through the use of Unix file management
facilities, but on average 256 bytes (half a disc
block) will be wasted per message. Further, at most an
entire file system of 64K blocks can be allocated to
message service, although this is not a serious
Bennett [Page 10]
INDRA Note 897, IEN 141 Message System Issues
Figure 2: Message Management Structure
restriction. Assuming that most messages will be small,
of the order of 2K characters, the file system would
allow something less than 16K messages, wasting some 4K
bytes of space. Thus a more serious limitation is the
number of inodes (file descriptors) allocated to the
system, which is currently about 2^13 - allowing 8K
files. Increasing this to 2^14 is not difficult and
will allow 16K files, of which a significant proportion
would be for user descriptor information.
Bennett [Page 11]
INDRA Note 897, IEN 141 Message System Issues
The first method allows more efficient use of space
and places a much looser restriction on the number of
messages that may be retained, but requires building
searching and garbage collection facilities parallel to
Unix's. In order to use these, moreover, either a
complex file structure must be defined, or a master
descriptor file retained.
Pending further investigation, the second choice is
favoured at this stage. The fact that only one copy of
a message need be kept should help to minimise the
effects of the restrictions. Ensuring this may be a
problem, especially if multiple copies of a message are
received. Hence an important aspect of the system may
be to examine incoming messages and attempt to detect
duplicates of existing messages.
5. Conclusions
The message system discussed here is centred around
text messages based largely on ARPANET-style formats,
at least initially. Nevertheless there are several
important issues which must be resolved in order to
bring up a workable system. These issues include:
(i) Economic use of transfer and storage resources.
(ii) The structure of UCL-style mail daemons at
staging site(s).
(iii) The modification of other mail servers to
handle UCL mail.
(iv) Basic addressing style.
(v) Detailed user interface.
(vi) Message management issues.
This note has indicated some lines of approach to these
problems. They will be examined in more detail in
future notes, prior to the commencement of actual work
on the system later this year. It is clear that
satisfactory progress requires cooperation and
discussion with other parties, notably the DARPA
Catenet group and groups using various public carrier
services. While the projects of the former are more
advanced at this point, it is expected that the latter
groups will become increasingly important in the long
term.
Bennett [Page 12]