(for context, see a review of OLPC’s networking troubles)
A group of OLPC engineers, especially C. Scott Ananian and Michael Stone, were dissatisfied with the behavior of the current collaboration stack, and decided to start from scratch, laying out a set of “Network principles” from which a new collaboration system could be derived. Their proposal is a bit hard to compare to the Telepathy design, as it is written from a dramatically different perspective.
Telepathy, and in particular Telepathy’s implementation of XMPP, is meant to be a single-source, near-universal solution for all collaboration problems. Telepathy provides identity (as a Jabber ID), buddy lists with presence and status updates, buddy search, text IM, multi-user chat rooms, remote procedure calls (D-Bus Tubes), reliable bytestreams (Stream Tubes), voice over IP, videoconferencing, file transfer, and a growing list of other features. All these features are integrated together, so that an application that starts with a text chat between two users can easily initiate a file transfer, or a voice chat, alongside it.
Telepathy is also designed to make the most of even the worst network topologies. Most of this support comes simply from having a central server, which ensures that any user who can see the server can communicate with any other user who can see the server, or any federated server. Telepathy also provides a number of NAT traversal techniques, many originally standardized by Google, for optimizing bandwidth, latency, and server performance, by bypassing the server when possible.
Telepathy is designed for interoperability with other Jabber clients, up to a point. XOs running Sugar can chat over the local network with Macs running iChat, because both support the link-local XMPP specification. With a properly configured server, and a change to the Sugar UI, Sugar users could chat with anyone on jabber.org or even Google Talk. Of course, other Jabber clients don’t provide Sugar’s Activity system, so interoperability is limited to text, voice, video, and files. If non-Sugar applications start using Telepathy’s Tubes system, then compatible Sugar Activities can potentially be written, but at this time virtually all use of Tubes seems to be within Sugar.
Telepathy’s many features make it a tremendously powerful single source. They also make it complicated and confusing. For example, Telepathy supports both multi-server operation (via Federation) and buddy search (via Gadget), but Gadget only acts within a single server, so locating buddies on a different server requires a separate mechanism. Explaining this within a comprehensible user interface seems difficult to me. “Network Principles” describes the opposite approach.
“Network Principles” is so named because it is not a singular software entity. It’s a set of ideas for how networked collaboration should be implemented, drawn up in such a way that very little new software is actually required to implement it. Its tagline could be “use as little software as possible, but no less.”
In the Network Principles architecture (hereafter referred to as NPA), a user is identified by a DNS name. All knowledge about that user comes by communicating over IP with the IP address specified by that DNS name. For example, instead of asking a central Jabber server what Activities a user is running, one would simply communicate with some daemon running on that user’s machine. The story is similar for file transfer, video chat, or any other network collaboration use case. The user’s identity, the DNS name, immediately enables you to contact that user directly. The principal motivation for this design is to ensure that all traffic is direct and unicast, to avoid the multicast routing breakdown experienced by Salut on wireless networks. It also permits extremely simple, transparent debugging, using tools like ifconfig and ping.
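To make the "identity is an address" idea concrete, here is a minimal sketch of asking a peer directly which Activities it is running. The proposal does not specify any such daemon protocol, so the port number, the `LIST` request, and the newline-separated reply are all hypothetical; only the pattern (resolve the user's DNS name, connect, ask) comes from the text above.

```python
import socket

# Hypothetical port for a per-user "activity daemon"; not defined by NPA.
ACTIVITY_PORT = 8700


def list_activities(user_dns_name, port=ACTIVITY_PORT,
                    connect=socket.create_connection):
    """Ask the machine behind `user_dns_name` which Activities it runs.

    The user's DNS name is the only identifier needed: create_connection
    resolves it and opens a unicast TCP connection to that machine.
    """
    with connect((user_dns_name, port), timeout=5.0) as conn:
        conn.sendall(b"LIST\n")          # hypothetical request
        conn.shutdown(socket.SHUT_WR)    # signal that the request is complete
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:                 # peer closed: reply is complete
                break
            chunks.append(data)
    # Hypothetical reply format: one Activity name per line.
    return b"".join(chunks).decode("utf-8").split()
```

The `connect` parameter exists only so the function can be exercised without a network; in normal use the default, `socket.create_connection`, does the DNS lookup and the connection in one step.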
Of course, this described system only works if there is a DNS server available and all users have mutually routable IP addresses. There are several common scenarios in which the network architecture does not meet this description, and which Telepathy goes to great lengths to support. NPA proposes to support these scenarios instead by slightly changing the network architecture:
- On serverless, link-local networks, there is (by definition) no DNS server. In NPA, link-local networking would be supported by modifying DNS (e.g. via an NSS plugin) to produce a machine’s link-local IPv6 address by hashing the user’s DNS name. The IPv6 address space is sufficiently large that collisions are unlikely. (Telepathy uses mDNS here, which is inefficient on wireless networks and unreliable by design.)
- On a LAN with private IP addresses provided by a router, but no need to communicate outside the LAN, NPA users can simply continue to use their link-local addresses via the described hashing scheme. Alternatively, a cooperating router could provide a dynamic DNS server to allow users to adopt DNS names.
- On a multiple-LAN network with private IP addresses provided by a router, but no need to communicate outside the network, some sort of server is required. The server could be a cooperating router providing a slightly unusual fake-DNS service, or (if efficiency is not critical) it could be an IPv6 tunnel gateway.
- On a network with private IP addresses connected to the internet via NAT, NPA users wishing to communicate outside the LAN would require a network tunnel, such as an IPv6 tunnel, with a global address at the endpoint. That (unique per-user) endpoint address would then be listed in the user’s DNS record, updated by dynamic DNS. (Telepathy employs the Jabber server itself as a sort of tunnel, with the user’s JID serving as her global identifier. Telepathy also adds some NAT traversal techniques to improve efficiency.)
- On dynamic public IP addresses, NPA users would simply employ dynamic DNS to keep their name record up to date.
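The hashing scheme in the first bullet can be sketched as follows. The text does not say which hash function or bit layout the NSS plugin uses, so SHA-256 and "first 64 bits of the digest as the interface identifier" are assumptions made purely for illustration, as are the function names. The second function gives the usual birthday-bound estimate behind the claim that collisions are unlikely.

```python
import hashlib
import ipaddress
import math

# Standard IPv6 link-local prefix; the low 64 bits are the interface identifier.
LINK_LOCAL_PREFIX = ipaddress.IPv6Network("fe80::/64")


def name_to_link_local(dns_name: str) -> ipaddress.IPv6Address:
    """Derive a link-local IPv6 address from a DNS name by hashing it."""
    # Normalize so that case and a trailing dot do not change the result.
    canonical = dns_name.rstrip(".").lower().encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    # Assumption: take the first 64 bits of the digest as the interface ID.
    interface_id = int.from_bytes(digest[:8], "big")
    return ipaddress.IPv6Address(
        int(LINK_LOCAL_PREFIX.network_address) | interface_id
    )


def collision_probability(users: int, bits: int = 64) -> float:
    """Birthday-bound estimate that any two of `users` hash to the same ID."""
    pairs = users * (users - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)
```

Under these assumptions, every machine can compute any peer's address locally from its name, with no query on the network at all, and `collision_probability(1000)` is on the order of 10^-14 for a thousand machines sharing one link.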
Like Telepathy, the NPA is designed for interoperability, but in a very different sense. Because NPA simply demands that collaborating users be able to route to each other’s IP addresses, they may now communicate with any service whose requirements are the same. Most obviously, users might run web servers, and collaboration could occur over HTTP. Anyone with a web browser could participate, even those running operating systems with no notion of NPA. A user could also run an IRC daemon, a gnutella node, or (notably) a Telepathy-compatible XMPP server. Running such a server would allow collaborative Activities to run as they do now… almost.
NPA trades away two major things to achieve its simplicity. The most obvious is efficiency, at least in certain limits. In Telepathy, a user can send a message to many other users with a single packet. In Salut, this is achieved by link-layer multicast. In Gabble, this is a service provided by the server, which is presumed to have substantially more bandwidth than the clients. The server forwards the message to all the right people, acting as a sort of bandwidth amplifier.
In NPA, there is no distinction between link-local connections and global connections, and so no provision for making use of multicast routing, which (for the foreseeable future) is only available within private networks. Similarly, there is no provision for bandwidth amplifiers. Both of these features could be provided by a particular library layered on top of NPA, but then these features would only be applicable to other applications using this library. Such a layer would, in my view, be approaching a reimplementation of Gabble or Salut.
The value of one-to-many transmission should not be overstated. The problems experienced with multicast routing in wireless networks indicate that falling back to multiple independent unicast transmissions might not be such a great loss after all. Similarly, the value of bandwidth amplification by a server is questionable, especially given the poor observed performance of ejabberd under load. In the oft-mentioned case of a “school server”, the server shares a LAN with the users, and so has essentially no bandwidth advantage.
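The efficiency argument above can be made concrete by counting transmissions for one message sent to a group. This is a deliberately crude model with hypothetical names: it counts packets only, and ignores retransmissions, packet sizes, and the low data rates that wireless multicast often falls back to.

```python
def transmissions(recipients: int, strategy: str) -> tuple:
    """Return (sent_by_sender, sent_by_relay) for one message.

    strategy is one of:
      "multicast" - link-layer multicast, as in Salut
      "server"    - relay through a central server, as in Gabble
      "unicast"   - one direct copy per recipient, as in NPA
    """
    if recipients < 0:
        raise ValueError("recipients must be non-negative")
    if recipients == 0:
        return (0, 0)
    if strategy == "multicast":
        return (1, 0)
    if strategy == "server":
        return (1, recipients)   # the server fans the message out
    if strategy == "unicast":
        return (recipients, 0)   # the sender fans the message out itself
    raise ValueError("unknown strategy: " + strategy)
```

For ten recipients, the server strategy costs the sender one packet but puts eleven on the network in total, while unicast costs the sender ten and puts ten on the network. When the server shares the same wireless LAN as the users, as in the school server case, the total is what matters, which is why the server offers no real advantage there.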
Another efficiency loss with NPA occurs in the case of users behind NATs. Thanks to NAT hole-punching, it is possible for two Telepathy users behind NATs to communicate with each other directly, using techniques such as STUN. In NPA, it is difficult to imagine a system for NAT traversal, because NAT’ed users see each other only through the public IPs of their tunnel endpoints, which could easily be inconveniently located in the network. (EDIT: Michael Stone notes that a large family of techniques, notably Teredo tunneling, have developed to provide direct NAT traversal for IPv6 tunnels. Such techniques could be employed to avoid triangle routing and achieve behaviors equal to Telepathy’s.) Direct connections of this sort are particularly important for videochat and VoIP, which are both bandwidth-intensive and demand low latency. The importance should not be overstated: Sugar has done very little in the way of videoconferencing or VoIP, and Skype often routes calls through relays without complaints from users. Nonetheless, we can imagine that if the users’ IPv6 tunnel endpoint is on the other side of a satellite link, then all local collaboration traffic might be forced over this slow, unreliable, expensive link.
Another significant sacrifice is disconnection resilience. Salut and Gabble enable collaborative sessions to persist even as their membership rotates. The lead user need not remain for the activity to continue. Strictly speaking, this is true for NPA as well, but not for collaboration based on HTTP, IRC, or any other standard server-based protocol. Server-based protocols are almost always designed with the assumption that the server is a long-lived third party, providing a service to the users. Running such a server on a child’s laptop, over an unreliable wireless network, is bound to create disappointment. To support this feature in NPA, an Activity would either have to direct traffic through a reliable third party (reprising the XMPP server in Gabble) or implement a distributed data coherence algorithm (reprising much of Salut’s Clique protocol).
Admittedly, many current Sugar Activities simply fall over if the leader leaves. There may even be some bugs in the Presence Service that make the user interface problematic in these cases. I also admit that this particular feature is a personal interest of mine. Nonetheless, I genuinely feel it’s an important property, and not one to be given up so easily.
So after this exhaustive (indeed, exhausting) comparison, what are we left with? I cannot foresee jettisoning Salut or Gabble. Both provide important features, and there is not as yet anything resembling an alternative. However, I do think that the “Network Principles” are valuable, particularly as they provide, in some cases, a way out from our current unwinnable fights with wireless networks. We would do well to see where we can integrate these principles into Sugar, without disrupting our present systems.
The first logical step, to me, is to make use of Michael Stone’s remarkable DNS-hash-based NSS module. Salut relies on mDNS to spread presence information, implemented in the Avahi library. I suspect we could modify Avahi to use hashDNS instead of mDNS, resulting in a dramatically reduced number of network broadcasts. Salut’s performance on wifi and mesh might improve significantly; if it did, we would both validate the approach and enjoy a valuable win. This would at least nominally violate the link-local XMPP standard, but the approach is quite elegant, so it could potentially be standardized.
We would also improve our ability to locate problems with finer granularity. On an NPA network, correct functioning can be verified layer by layer. Even if the top layer is unchanged, this can be valuable for diagnosis in the field.
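As an illustration of what layer-by-layer verification could look like, here is a small sketch that checks name resolution first and the transport connection second, and reports the first layer that fails. The function name and the choice of exactly two layers are my own; a field tool would presumably add link-level and application-level checks around these.

```python
import socket


def diagnose(name, port, resolve=socket.getaddrinfo,
             connect=socket.create_connection):
    """Return the first failing layer for reaching `name`, or "ok"."""
    # Layer 1: can the peer's name be turned into an address at all?
    try:
        infos = resolve(name, port, type=socket.SOCK_STREAM)
    except OSError:                      # socket.gaierror is an OSError
        return "name resolution"
    if not infos:
        return "name resolution"

    # Layer 2: can we open a TCP connection to the first address returned?
    # Simplification: only the host and port of that address are used.
    host, resolved_port = infos[0][4][:2]
    try:
        conn = connect((host, resolved_port), timeout=2.0)
    except OSError:                      # refused, unreachable, timed out
        return "transport"
    conn.close()
    return "ok"
```

Because each step maps onto one layer, a failure report such as "name resolution" or "transport" tells a field technician where to look next, which is hard to get out of a system where presence, naming, and transport are entangled.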