IEN #31
On Names, Addresses and Routings (II)
Dan Cohen
ISI
28 April 1978

2.3.3.11 On Names, Addresses and Routings (II)

SUMMARY

This note deals with internetwork addressing and routing. It suggests the following:

(1) Some systems have a tree-like hierarchical Universal-Address (UA) space.
(2) The communication connectivity is a superset of this tree.
(3) The postal system, the telephone and the ARPANET are such systems.
(4) The address of a process tells where it is located (or connected to) by specifying the route to it from the root of the universal addressing tree.
(5) The default routing to any address (unless a better one is specifically known) is up the UA-tree, from the source, and down the tree to the destination. In the case of networks like the ARPANET, the set of all the IMPs (the subnet) is considered as a single process, known as {ARPANET}.
(6) Since the set of all networks is not connected, it cannot be tree structured. However, for ease of name-management it is possible to introduce (e.g., administratively) any arbitrary hierarchy to the address space of all networks. This hierarchy is "artificial" because it is not a subset of the communication connectivity.
(7) Since there is no hierarchical structure to the space of all networks, there is no tree-like hierarchical internetwork Universal-Addressing scheme. In particular, the notion of extending addresses like {NET}-[HOST]-(PORT) or {NET}-/IMP/-[HOST]-(PORT) upward to include METANETs, SUPERNETs and GIGANETs suffers from the lack of a corresponding underlying communication structure.
(8) Since the set of all networks is too big to be captured in local tables, and since routing cannot be derived in general from the addresses, it should always either be a priori known, or supplied by the source. This a priori knowledge does not include every step (e.g., the sequence of intermediate IMPs or PRUs). It has to include only a sequence of addresses such that the routing between them is locally known.
(9) The corollary of this note is OPTIONAL SOURCE ROUTING SHOULD BE IMPLEMENTED.

This note explains in great detail several aspects of the addressing schemes used by the postal and telephone systems. It also mentions the ARPANET addressing. These examples are used to support the arguments which are summarized in paragraphs (1)-(9) above.

Unless one is very interested in the details of these systems and their relevance to the internetwork environment, or in the details of the arguments, there is no need to continue reading this note beyond this point.

INTRODUCTION

Before discussing the internet addressing/routing issues let us summarize the basic concepts:

- Processes (not hardware pieces) have names and addresses.
- A process may have more than one name and more than one address, but each name and each address corresponds to exactly one process.
- The name tells WHAT the process is.
- The address tells WHERE the process is.
- The route tells HOW-TO-GET-THERE.

The last three beautiful definitions are borrowed (with appreciation) from John Shoch.

More detailed discussion of names and addresses may be found in "Internetwork Naming, Addressing, and Routing" [Internet Notebook Section 2.3.3.5, IEN 19] by John Shoch, and in "On Names, Addresses and Routings" [Internet Notebook Section 2.3.3.7, IEN 23] by Danny Cohen.

Generally the concept of address is well understood, but the concept of routing is much more complex. Who performs the address-to-routing mapping? How is it performed? These and similar problems are the topic of this note.

Most of us are familiar with the postal, the telephone, and the ARPANET addressing schemes. We also have a very good understanding of the routing processes performed for these networks. In this note we discuss the similarity of these addressing schemes, and argue that the internetwork environment violates the basic axiom which is common to them, and therefore the internetwork environment requires a different addressing and routing philosophy.
This is why we have so much trouble with internet addressing -- it is different. Our experience cannot be used as a model.

ON OTHER ADDRESSING/ROUTING SCHEMES

Three addressing/routing schemes are discussed: the postal, the telephone, and the ARPANET schemes.

The postal addressing scheme is a UA-scheme. It is defined for human processing, and therefore may tolerate a fair amount of redundancy which improves the robustness of the scheme with respect to errors and to partial losses (like stains on envelopes).

At the top level of the hierarchy there is the country name, and underneath it there are as many addressing schemes as there are countries. Some countries use ZIP codes which identify major post offices, and require more information to identify the terminal addressee. Other countries use the ZIP code (or its equivalent) to identify the individual letter-carriers, and require less additional information to identify the terminal addressee. In some cases, a street address is sufficient. In others, suite number and names are required.

In summary, the postal addressing tree is of variable depth (or: postal addresses are of variable length). Its top level (the country level) has a complete connectivity, since every country knows how to route mail to any other one.

Letters are routed, either directly to the destination or to one of its ancestors. This is performed by POs either directly or via their ancestors.

Since the address and routing processing is not fully automated, human intelligence is used for resolving ambiguities, for coping with unknown addresses, for redundancy handling and the like. Missing information is usually supplied by using common sense and defaults. As a result, a great amount of variability in addresses can be tolerated. For example, while at Tech, I received letters addressed to

    Danny     (I was the only Danny there)
    256-80    (The Computer Science Mail-stop)
    91125     (The zip code for Caltech)

I also received letters addressed to:

    Dr. Dan Cohen
    Computer Science, Mail-Stop 256-80
    California Institute of Technology
    Pasadena, California 91125

The first address contains all the needed information, but requires delicate handling. If any of the digits is mistyped, there is very little chance that the letter would be delivered to the right destination. The second address, which has about six times more characters, is more robust, and can cope with the multi-Danny situation, which (even though undesirable) is still very probable. The terse address has to be modified if another Danny joins this Mail-Stop.

The phone addressing is considered next. Like the postal system, each telephone station has a UA. It always starts with the country code, and usually continues with a Numbering-Plan-Area (NPA, which is the familiar Area-Code, AC), continues through a Central Office (CO) number and terminates with the station number. Even though the above fields are of variable length, the total length never exceeds 12 digits. However, when private networks are connected to the universal phone system, the entire address length may exceed this limit. For example, my current phone number is 12138221511105, which is 14 digits long. The first "1" is the USA code, and the last "105" is my station (extension) number.

At the top level of the hierarchy there is the country code, and underneath it there are as many addressing schemes as there are countries. All countries know how to communicate with any other country.

In both systems addresses are of variable length, and by looking at an arbitrary address one cannot parse it into fields without knowing the specifics of each national addressing scheme.
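
The dependence of parsing on national schemes can be illustrated with a minimal sketch. The table NATIONAL_PLANS and the function parse_number are hypothetical names, and the field widths are assumptions chosen to fit a handful of sample numbers; they are not a real numbering plan.

```python
# Hypothetical per-country tables: country code -> {area code: CO length}.
# The entries are illustrative assumptions, not a real numbering plan.
NATIONAL_PLANS = {
    "1": {"213": 3},
    "44": {"1": 3, "31": 3, "745": 2},
    "972": {"4": 2, "67": 1},
}


def parse_number(digits):
    """Split a digit string into (country, area, central office, station)."""
    for country, areas in NATIONAL_PLANS.items():
        if not digits.startswith(country):
            continue
        rest = digits[len(country):]
        # The area codes in this toy table are prefix-free, so the first
        # match is the only possible match.
        for area, co_len in areas.items():
            if rest.startswith(area):
                local = rest[len(area):]
                return country, area, local[:co_len], local[co_len:]
    raise ValueError("no national scheme known for " + digits)


# Two strings of the same length split at different positions.
print(parse_number("44313322424"))  # ('44', '31', '332', '2424')
print(parse_number("44745583301"))  # ('44', '745', '58', '3301')
```

Without the per-country table there is nothing in the digit string itself that says where one field ends and the next begins.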
Here are some examples of telephone addresses and their correct parsing as COUNTRY-NPA-CO-station:

    1-213-822-1511     (USA, LA)
    44-1-387-3400      (UK, London)
    44-31-332-2424     (UK, Edinburgh)
    44-745-58-3301     (UK, Wales)
    972-4-25-2690      (Israel, Haifa)
    972-67-4-0777      (Israel, Kiryat Shmona)

Obviously, the number of fields to be dialed depends on "distance" from the destination.

Since the telephone routing is performed by (relatively) simple-minded automated equipment, no variability in dialing a number is allowed (except in very unusual situations).

Like the postal addresses, phone dialing sequences are of variable length, as a function of the distance to the destination. While adding the country code ("USA") would not hurt either of the postal addresses shown above, adding the local AC and country code is not allowed for local phone calls. The reason is that one does not dial the address (telephone numbers), but dials the routing! The purpose of the telephone number is to be used for accounting and for deriving routing, but not for verbatim-dialing.

The actual telephone routing (i.e., hop-by-hop) is very efficient and deserves attention. The key to it is the existence of more communication lines than branches in the UA tree.

The USA is divided into 10 regions, which subsequently are divided into sections and areas. The grouping of areas into sections and regions cannot be simply deduced from the ACs but has to be found by a table lookup operation. This is because the ACs were most cleverly assigned to areas by population size and not geographically like the ZIP codes.

The routing is performed by each center assigning each call to a line known to be connected (or en route) to the destination central-office. This is performed by looking at the first six digits of the address (AC+CO). If such a line does not exist, then the call is assigned to a line known to be connected to the principal city of the destination NPA.
If such a line does not exist either, the call is forwarded to the center above this one. Since at the top all regions are interconnected, this process is guaranteed to terminate.

In addition to the six-digit recognition, the system is designed such that in many cases CO numbers do not conflict across NPAs. This eliminates the need to dial an AC of a neighboring town, across a state line, which is necessarily in a different NPA. This allows, for example, dialing from Washington, D.C., (AC=202) to Alexandria, Va., (AC=703) and to Potomac, Md., (AC=301) without dialing the AC.

The third addressing scheme is the one used in the ARPANET. Addresses on the ARPANET are of processes which are either NCP-like in actual hosts or of other types in "fake" hosts. It is logical to extend the address notion "down" to include the port, too.

Conceptually, one can consider the ARPANET as a single process or as a star network. The fact that this single process is implemented in a very clever way by a multitude of IMPs is irrelevant, from a functional point of view. This allows treating this entire network as a single addressable process, the {ARPANET}, if so required.

Each IMP knows how to forward messages to any other, and therefore all IMPs constitute the top-level (and the only level) for routing.

In the ARPANET all the connections are between centers (nodes) of the same (and only) level. This is in contrast to the telephone network, which has several levels of centers, partially connected at all levels (except the top, where they are fully connected) and also partially connected between levels.

An ARPANET address is of the form {ARPANET}-[HOST]-(PORT), which one may consider as {ARPANET}-/IMP/-[HOST]-(PORT). Adding the /IMP/ field may help in the ARPANET situation, though some generality will be lacking.

This is a valid address, since the {ARPANET}-process (which is the set of all IMPs) can forward messages to all hosts, and hosts can give messages to PORTs.
Therefore, routing every message up to the {ARPANET} and then down through the host to the destination port is a good default routing strategy.

INTERNETWORK ADDRESSING

After this (very) long introduction, let's return to the Internet Addressing situation. But first, let's introduce some more formality:

* An address is a string (i.e., an ordered set) of symbols taken from a given alphabet (e.g., ASCII, {0,1}, {1,2,...9,0}).
* In a UA tree the level of a given address is its depth in the tree. The level of the address A is denoted by L(A).
* Address concatenation (extended to the right) will be used. The concatenation is denoted by a "-".
* If both A1 and A2 are addresses of the same process P, then (1) L(A1) and L(A2) may differ, and (2) the addresses (A1)-(X) and (A2)-(X) are necessarily addresses of the same process.
* Addresses should always be decodable in a strictly left-to-right sequential manner (prefix coding).

What is the ARPANET address of my mailbox? In the TENEX environment it is [ISI]-(MAILBOX)-. But in another environment, in the host [X], it could be [X]--(MAILBOX). Or maybe in the form [X]-/TCP/-(MAIL.DEPO)-.

Obviously we want to expand it upwards, to allow for other networks. We could simply add in front of these addresses a network field and get {ARPANET}-[ISI]-(MAILBOX)- and similar addresses. In other networks the value of the NETWORK-field may be {PRNET-SF}, {PRNET-BOS}, {SATNET}, etc.

What is the relation between these nets? Do they all belong to the same parent like USA or USA-DoD? This could be a solution, and if adopted my mail address may be upward expanded in the UA scheme to be like:

    {GALAXY-573}-[SOLAR.SYS]-(EARTH)--{ARPANET}-[ISI]-(MAILBOX)-

With a clever use of defaults, the first several fields may be omitted from most of the intraglobal communication.

The extension of this address upward, to include METANETs, SUPERNETs and GIGANETs, is very elegant.
This is the prevailing popular approach expressed in a series of notes and papers, such as Ken Harrenstien's note, and various other communications between the members of [SRI-KL]FIELD-ADDRESS.LIST. The intelligent reader probably has noticed by now that this note does not subscribe to the same philosophy.

This solution suffers from several problems. What is a network? Is every bus to which several processes are connected a network? If not, why? What is the relation between networks?

In order to stress the difference between the structure of the internet address and of the other universal addressing schemes, let's consider the following example.

One of the ARPANET hosts, [PARC-MAXC], is tied to a private network. This network is actually a very rich internetwork environment, with about 14 networks and about 400 hosts, but for simplicity let us consider it functionally as a single {XEROX-net}.

One of the hosts on this net is [RIG], the University of Rochester Intelligent Gateway to the internal {U-of-R-net}. This description is not accurate, since [RIG] is actually connected directly to the {ARPANET}, not to {XEROX-net}. But for the sake of the argument let us assume this connectivity. We preferred not to use other actual examples for several reasons. A poet's-license is nice to have....

One of the hosts on the {U-of-R-net} is, say, [NOVA-3]. What is its address? Obviously it is

    {ARPANET}-[PARC-MAXC]-{XEROX-Net}-[RIG]-{U-of-R-Net}-[NOVA-3].

By the way, whenever an intermediate host in such a specification (i.e., a gateway) is between two networks only, there is no need to specify the destination network, since it is uniquely defined by context. However, this is not a good practice, since addresses have to be changed when this host is connected to more networks.

What is [ISI]'s address? Obviously {ARPANET}-[ISI]. "Not so!" scream the U-of-R people. The addresses of [NOVA-3] and of [ISI] are quite different from the point of view of [NOVA-2].
According to it, the address of [NOVA-3] is simply {U-of-R-Net}-[NOVA-3] but of [ISI] is {U-of-R-Net}-[RIG]-{XEROX-Net}-[PARC-MAXC]-{ARPANET}-[ISI].

Who is right? Neither!! Either approach is equally wrong. Neither of these addresses is above the other in the UA scheme. All the networks involved in the interconnection of these networks are of equal level, unless we decide otherwise for administrative reasons. The internet communication environment does not have up-and-down relations, except in the eyes of some users, which may be very subjective.

INTERNETWORK ADDRESSES ARE DIFFERENT

Telephone stations are always connected to the system network, and their position in it dictates their addresses. Not so with computer networks which spring into an independent existence until they are interconnected to others, if ever. Therefore, their addresses cannot be deduced from their positions (geographically or connection-wise) and vice versa.

Therefore, the network ID is an arbitrary string. Who assigns it and makes sure that no conflicts occur? Is it NBS? Jon Postel? Another Czar?

At this point we suggest that:

* There is no universal hierarchy of networks, in contrast to the telephone and the postal systems.
* There are too many networks to be named and/or addressed in a single flat name/address space.
* Therefore, some naming/addressing hierarchy has to be introduced. However, this addressing tree does not serve as a basis (or underlying structure) for the communication connectivity.
* Routing cannot be computed from any point to any point from the addresses alone.

How should internetwork routing be performed? There are obviously several possibilities. It could be performed entirely by the networks involved (i.e., the communication system), by the source, or by any combination of the communication system and the source.
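
Returning to the [NOVA-3] and [ISI] example, the claim that neither address is "above" the other can be sketched in a few lines. This is only an illustration under the simplified connectivity assumed in that example; the list CHAIN and the function address_as_seen_from are hypothetical names. The "address" each side writes down is just the path from its own network to the target.

```python
# The simplified chain of connectivity from the example:
# networks are written in braces, gateway hosts in brackets.
CHAIN = ["{ARPANET}", "[PARC-MAXC]", "{XEROX-Net}", "[RIG]", "{U-of-R-Net}"]


def address_as_seen_from(home_net, target_net, target_host):
    """Write the path from home_net to target_host in address notation."""
    i, j = CHAIN.index(home_net), CHAIN.index(target_net)
    # Walk the chain forwards or backwards, depending on the vantage point.
    path = CHAIN[i:j + 1] if i <= j else CHAIN[j:i + 1][::-1]
    return "-".join(path + [target_host])


# The ARPANET view and the U-of-R view of the same two hosts.
print(address_as_seen_from("{ARPANET}", "{U-of-R-Net}", "[NOVA-3]"))
print(address_as_seen_from("{ARPANET}", "{ARPANET}", "[ISI]"))
print(address_as_seen_from("{U-of-R-Net}", "{U-of-R-Net}", "[NOVA-3]"))
print(address_as_seen_from("{U-of-R-Net}", "{ARPANET}", "[ISI]"))
```

Each vantage point produces a short string for its own hosts and a long one for the other side's, and neither set of strings is privileged.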
It is always desirable to distribute the knowledge about possible destinations to the various centers (gateways?), such that their "sphere of knowledge" is as large as possible, though uniformity should not be required. More knowledge should be kept about frequent destinations than about less frequent ones.

Since this information must be limited due to practicalities (such as finite storage and updating procedures), it is impossible that all sources always know about all possible destinations.

What should be done about unknown destinations? Several possibilities may be considered.

o Having a `supernet' of default subdestinations with the hope that they know how to find a way to the terminal destination (like the phone and the postal systems),
o providing internetwork-wide directory services, or
o refusal of service.

Let's consider each of these three possibilities.

Having both -{ARPANET} and -{XEROX-Net-3} does not guarantee the existence of the path {ARPANET}===={XEROX-Net-3}. Therefore, the supernet (metanet?) is not a part of the underlying communication structure as in the phone and postal situations. In order for this approach to work, one has to create this supernet and keep it updated about all changes of the entire internetwork environment. Both the storage and the updating procedures seem to be impractical.

The second approach, the network-wide-directory-help, is a very reasonable one. It has a major drawback in the necessity to maintain centers with indefinite knowledge radius. Note that the telephone directory services are structured according to the addressing hierarchy of NPA-CO, and are not flat as may be suggested for the internetwork environment. In essence, this second approach has all the problems of the first one.

The third approach, namely, refusal of service to unknown destinations, is consistent, to say the least. Admittedly, it seems like an inconvenience to users.
It should be supplemented by ways of "learning" about the unknown destinations, such that refusal should never occur, or at most, only in very rare situations. Directory services could be helpful.

Hence, this approach supports serving only destinations to which the routing is either a priori known or supplied. Note how similar this is to the telephone system philosophy.

In summary: The internet routing problem is different from the routing problems of other systems because the internet environment does not have a communication connectivity which supports a UA scheme, and therefore the addresses cannot support a direct address-to-routing mapping by using only a definite amount of knowledge.

I HATE TO ADMIT IT, BUT ...

At the beginning of this note, and in an earlier note, I used a great line saying that "names tell what the processes are, and addresses tell where they are." It continues with "routings tell how to get there."

I hate to admit that by now I have some reservations about this definition. My name is "Danny." My address is "ISI." When I was at Tech, my name was the same, but the address was different. This supports the definition.

How about the addresses in a broadcast-medium network? When a host changes its position (location) on the same Ethernet, its address does not change. Well, maybe these addresses are not real addresses, according to the definition. Admittedly, this is an uncomfortable thought.

I believe that there is a better explanation. I suggest that an address is "the canonic routing from the root of the addressing-tree." It sounds recursive, doesn't it?

To be more precise, an addressing scheme is a hierarchical organization of elements, with code assignment such that each element has a unique set of codes, corresponding to its position in the hierarchy.

The notion that the address tells how-to-get-there from the root of the tree is very similar to the notion that absolute coordinates are really relative, with respect to the origin.
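
A minimal sketch of this view, treating an address as the tuple of fields on the path from the root of the UA tree. The function name default_route is hypothetical, and turning around at the deepest common ancestor (rather than always at the root) is an assumption made here for illustration; in the worst case the common ancestor is the root itself.

```python
def default_route(source, destination):
    """Default route between two addresses given as root-to-leaf field tuples.

    Returns the list of tree nodes visited: up from the source to the
    deepest common ancestor, then down to the destination.
    """
    # Length of the shared prefix = depth of the deepest common ancestor.
    common = 0
    for a, b in zip(source, destination):
        if a != b:
            break
        common += 1
    # Climb: successively shorter prefixes of the source address.
    up = [source[:n] for n in range(len(source), common, -1)]
    # Turn around at the common ancestor (the empty tuple is the root).
    turn = [source[:common]]
    # Descend: successively longer prefixes of the destination address.
    down = [destination[:n] for n in range(common + 1, len(destination) + 1)]
    return up + turn + down


london = ("44", "1", "387", "3400")
edinburgh = ("44", "31", "332", "2424")
for node in default_route(london, edinburgh):
    print("-".join(node) or "(root)")
```

The descending half of the route is read directly off the destination address, which is the sense in which the address is itself a routing from the root.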
Since we know (by default) how to get from the source to the UA root, and since the address tells how to get to the destination from the root, the address tells how to get from the source to the destination. Hence, by definition, addresses are routings.

This leaves us only with names and routings. This should not surprise us now, since we already discovered that the telephone system has only names and routings.

Since the general internet environment does not have a hierarchy, the notion of addresses suffers. Since the addresses are "routing from the root," and since there is no root of the entire system, our conclusion is that there are no addresses, only names and routing.

In other words, what we are used to calling an address is actually a routing (even though it is of higher level than the hop-by-hop routing). In a well defined (and tightly controlled) environment, such as the {ARPANET}, this address/routing is a well defined string. In general, it may be of indefinite length and structure.

If the destination is in the neighborhood, such as the same network, the same nets-cluster, the same agency (even on a different net), the system may have a built-in knowledge of how to get to it. Otherwise the routing information needed should be supplied by the sender.

PROPOSAL FOR ADDRESSING AND ROUTING

Our proposal for addressing and routing is as follows:

* Establish a UA scheme, of variable level structure.
* Disseminate as much knowledge to each participating node as deemed practical.
* Allow the option of routing to be included in the headers of the messages.
* Refuse delivery of messages to a destination with unknown routing.
* Establish internet-directory-assistance service.

The proper use of the optional routing is to supply a set of subdestinations which may be as far apart as the networks can handle without help.
This is very much like source routing for telephone connections, where a sequence of switching-centers is designated by a user, but the communication subsystem is free to optimize the hop-by-hop routing between these centers.

As long as the number of participating networks in the internetwork environment is small, it is possible to have each of them know about routing to all the others. However, we already have a large community of networks, including several PRNETs, networks in universities, about 15 networks at Xerox, the commercial and the national ones, many in DoD, all the DEC-Nets, and many others. In addition, each big modern substantial computing system is a network.

One may be advised to expect the number of networks to grow, and the internet connectivity to get more and more obscure. Therefore, optional source routing seems to be the most sensible, and the only, alternative.

This does not exclude the notion that internet clusters of any size optimize their internal routing by any scheme, for example by an ARPANET-like dynamic routing scheme.

We recommend that the optional source routing be composed of self level-identifying fields. Note that this is like the telephone dialing sequence. The self-identifying fields could be implemented either by codes-exclusion (like the telephone system) or by identification subfields. Obviously these two schemes are equivalent, and the choice between them is just a matter of convenience.
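
The proposal can be sketched, under stated assumptions, as the decision one node makes for a message that carries an optional source route. The names UnknownRoute, next_hop and known_routes are hypothetical, and the per-node table stands in for whatever routing knowledge a node happens to hold. Only the next subdestination has to be locally known; everything beyond it is left to later nodes.

```python
class UnknownRoute(Exception):
    """Raised when a node has no locally known route to the next subdestination."""


def next_hop(node_name, known_routes, source_route):
    """Choose the next hop at one node for a message carrying a source route.

    known_routes: dict mapping a subdestination to the neighbor (or attached
                  network) that leads toward it; this node's local knowledge.
    source_route: remaining subdestinations; the last one is the destination.
    Returns (neighbor, remaining_route); neighbor is None on final delivery.
    """
    # Drop subdestinations that have already been reached.
    while source_route and source_route[0] == node_name:
        source_route = source_route[1:]
    if not source_route:
        return None, []  # this node is the final destination
    target = source_route[0]
    if target not in known_routes:
        # Refuse delivery: the routing is neither known nor supplied.
        raise UnknownRoute(f"{node_name} cannot route to {target}")
    return known_routes[target], source_route


route = ["[PARC-MAXC]", "[RIG]", "[NOVA-3]"]
print(next_hop("[ISI]", {"[PARC-MAXC]": "{ARPANET}"}, route))
print(next_hop("[PARC-MAXC]", {"[RIG]": "{XEROX-Net}"}, route))
```

In this sketch [ISI] never needs to know anything about {U-of-R-Net}; it only needs to know how to reach the first subdestination the sender named.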