Hum

I woke up at the crack of dawn yesterday (okay, 6:45) to attend a 7 AM online meeting. The meeting, actually held in Sweden, where it was a more reasonable 1 PM, was the IETF 75 Internet Wideband Audio Codec BoF.

I’ll see if I can unpack that expression. The IETF is the Internet Engineering Task Force. They’re the group that defines how the internet works. For example, when you download a web page (like this one), your browser sends a request packet to the server with the name of the web page you want, and the server sends back a reply containing the text of the page. The format of those packets, and the standard rules for processing them, were defined by the IETF.
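
To make that concrete, here’s roughly what the exchange looks like, written out by hand in Python instead of left to the browser. The host name is just a placeholder, and real browsers send many more headers:

    import socket

    # Connect to a web server (the host is just a stand-in example).
    sock = socket.create_connection(("example.com", 80))

    # The request packet: "please send me the page called /".
    sock.sendall(b"GET / HTTP/1.1\r\n"
                 b"Host: example.com\r\n"
                 b"Connection: close\r\n\r\n")

    # The reply: headers describing the response, then the page text.
    reply = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        reply += chunk
    sock.close()
    print(reply.decode(errors="replace")[:300])

The formats of those two messages are exactly the sort of thing the IETF pins down.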

There’s been a lot of interest lately in enabling voice conversations (“telephony”) over the internet. “Wideband Audio” refers to voice conversations with higher quality than your average telephone. In order to make this work, the computers on each end need to agree on a “codec”, a compact digital representation of the audio.

The IETF is a very funny organization. Although they write some of the most precisely worded, widely used, highly technical standards on the planet, the results are always referred to as RFCs, “Requests For Comments”. When the IETF wants to have a preliminary meeting on a topic, it’s called a BoF, from the phrase “birds of a feather flock together”.

Currently, all the standard codecs suitable for high-quality conversation over the internet are patented, and cannot be used without paying royalties. The existing standards bodies in this area (like ITU-T and MPEG) are composed largely of representatives from companies that make money from these royalties, and so are inclined to write standards that require royalty payments. The IETF has been much better about producing royalty-free standards, and so some people came up with the idea to write a new standard codec there, carefully avoiding patented technologies. The purpose of the BoF was to discuss whether to start this standardization effort.

The BoF was held in a room in Sweden, but there was also an online chat room and a live audio stream. People in the room gave very short presentations, after which came a continuous stream of comments from people lining up at a microphone. Chat room participants could ask people in the room to get up and say something for them, which happened occasionally. It was an interesting setup.

The discussion was quite heated, divided principally between those who felt that royalty-free codecs are important and those who did not. It’s not yet clear where this is going.

The IETF is often described by a famous quote from David Clark:

We reject: kings, presidents and voting. We believe in: rough consensus and running code.

To gauge consensus without a vote, the IETF has a “Hum”. The leader says “hum for the formation of a codec working group” and people hum if they agree. Then the leader says “hum against the formation…” and people hum if they disagree. If one’s a lot louder, well, that’s consensus.

In the chat room, people tend to type things like “*HUM* for the formation of the working group”, which may or may not make any sense.

Anyway, it was quite a morning.

Sustainability

I went out for about 7 hours to judge papers for RSI’s annual compendium. It was kind of fun, a chance to sample some papers in many different fields.

I came back home to find 120 e-mails in my inbox.

This is not sustainable.

Fusion

I recently met two people who work at the MIT Plasma Science and Fusion Center (PSFC). One works on Alcator C-Mod, the world’s most powerful tokamak. The other works on the Levitated Dipole Experiment (LDX), a new approach to fusion that I think of as the conjugate of a tokamak. I immediately set about cajoling them into giving me a tour, which happened Friday afternoon.

We visited the LDX control room first. We had scheduled our visit to coincide with a time when they were making plasma. After snaking through the industrial corridors of the PSFC, formerly a Nabisco bread factory, we found the LDX control room. The control room is a bare white medium-size conference room, with computers set up on tables around the edges, and another table in the center. There are snacks cluttering the center table, and papers piled on a handful of filing cabinets. In one corner, there are two innocuous black and white television sets that look like 1987, showing nothing. These are wired to the cameras pointed into the vacuum chamber. They are the only live view of what happens during a shot, though the data is also digitized, and everyone is looking at it on their computer screens a few seconds later.

The vacuum chamber can be filled with a number of different gases. While we were there, “el jefe” (his title, written on a piece of paper taped to the back of his standard-issue swivel-chair) called for an Argon shot, and so a burst of argon was allowed into the chamber. The microwave heaters were activated, and the TV screens suddenly lit up. The only source of light in the chamber is the plasma itself, and so the camera sees a diffuse glow, illuminating the chamber walls, the levitating torus, and the spring-loaded catcher positioned beneath it.

In a way, it’s surprising that you can see anything at all. At its highest operating densities, LDX runs at a pressure of about 10^-5 Torr. That’s a vacuum comparable to what you’d find at the edge of space; there’s really almost nothing there. To get a visible glow from plasma at that density requires astonishing temperature.
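
For a sense of scale, here’s my own back-of-the-envelope estimate (assuming room-temperature gas before the heaters kick in) of how many particles that pressure leaves in each cubic centimeter:

    k_B = 1.381e-23        # Boltzmann constant, J/K
    torr_to_pa = 133.322   # pascals per Torr

    P = 1e-5 * torr_to_pa  # chamber pressure, in pascals
    T = 293.0              # room temperature, K (before heating)

    n = P / (k_B * T)      # ideal gas law: number density in m^-3
    print("%.1e particles per cm^3" % (n / 1e6))             # ~3e11
    print("fraction of sea-level air: %.0e" % (n / 2.5e25))  # ~1e-8

About a hundred-millionth of ordinary air, and the camera can still see it glow.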

I was surprised to see a lot of flicker in the plasma. My host explained that the flicker is exactly what it looks like: density waves in the plasma, rolling around a loop through the center of the torus. The brilliant thing about LDX is that these fluctuations are themselves stable; they don’t turn into a runaway reaction that throws the plasma against the vessel wall, although they do make life interesting for the levitation feedback circuits.

When they turned off the heating magnetrons, the plasma decayed back to blackness in about one second. My host noted that this was actually due to the high density. At lower densities, the glow persists longer, indicating that containment is better, presumably because the mean free time is longer and scattering events are rare.

After the LDX control room, the Alcator control room was a bit of a shock. It looks like an anarchic version of NASA mission control. The room is a huge, high-ceilinged rectangle, with desks along the edges and two rows of tables down the middle. Every desk surface is claimed by somebody’s stuff. One desk even had two ornate shaded lamps, the better to catch those late-night equations. At capacity, the room must hold well over a hundred people, packed in mouse by keyboard.

At the far end of the room is a row of (matching!) modern computers, running identical software, and positioned above them are three enormous projection screens. The leftmost screen is a live video feed from inside the tokamak, and the others present inscrutable diagnostic information in black and green diagrams.

Each “shot” on Alcator is announced by a recorded female voice, referred to by all as “the sexy voice”. “Shot number 31 will commence in 12 minutes”, she said when we arrived. In the meantime, a friendly physics major showed us some footage of previous shots, so we would know what to expect. A normal shot looks much like one on LDX: a diffuse glow lighting the inside of the chamber. What’s more interesting, of course, is a failed shot. Tokamaks employ the world’s most sophisticated feedback control mechanisms. Unlike in LDX, plasma fluctuations in a tokamak will naturally spiral out of control unless caught and corrected electromagnetically. We watched the recording from a recent failed shot. First the video camera starts to malfunction as charged particles disrupt its electronics, and the image bounces haywire around the screen. Then a shower of sparks, and darkness. In this particular incident, a negatively charged jet escaped the confinement fields and punched an inch-deep hole in the solid molybdenum inner cladding. This is especially impressive considering that, as in LDX, the density of the jet was comparable to that of outer space.

The sexy voice initiated a fifteen-second countdown for the shot, and our hosts warned us to expect a vibration, which, when it came, was unmistakable. Alcator’s primary containment electromagnet draws 15 megawatts during a shot, about as much power as a small city, far more than the building’s grid connection can deliver on demand. Instead, Alcator buffers energy in a giant flywheel weighing many tons, down the street from the control room. For each shot, the power is drawn back out of the flywheel, and the torsional vibrations of the generator shake the foundations.

Once firing was done for the day, we were able to visit the machines themselves. Both fit a similar description: enormous, round, inscrutable devices several stories tall, surrounded by gratings and staircases for access at different levels. Alcator lives behind a particularly impressive door over a meter thick, next to a tangle of high-power electrical equipment that continues into the distance. Entry to LDX requires snaking through an entrance designed to ensure that radiation cannot have a straight exit path. LDX lives in a warehouse-like room seemingly taller than it is wide, topped by a gantry crane. The whole room is carefully RF-shielded to avoid perturbing the measurements.

These guys are awesome. Now I’m definitely rooting for ITER, Polywell, LDX, and anyone else working on this stuff.

Halvah

I’d recently acquired a craving for halvah, but couldn’t find any in Boston. When some friends invited me for Shabbat dinner, I looked on the web, and sure enough, I found this recipe. It’s dead simple: honey, tahini, and powdered milk. Heat, mix, and cool.

It took me forever to find the tahini in the grocery store, and when I did, it was right next to a brand-new shipment of factory halvah… but I was already committed to my plan. I biked back and started mixing.

The honey, mixed with the tahini at boiling temperatures, seemed to be working just fine. It smelled like halvah, and it was mixing to a uniform consistency. Adding the milk totally destroyed the illusion, though. The mixture quickly became grainy, and the oil separated out entirely. The mixture was too thick to remelt over a gas stove.

After straining off most of the excess oil, I mashed the mush into a pan and set it in the fridge to cool. It cooled to a tremendously tough solid, easy to tear but impossible to cut. Israeli halvah is dry and flaky, with a strong, almost bitter aftertaste. This stuff is moist and chewy, with just a very mild sesame flavor. My hostess called it a halvah fudge, which is probably a good name for it. It’s not halvah, but it’s not awful.

Anyway, if you’re looking to make halvah, don’t follow this recipe… and if you figure out the right one, let me know.

Completion

Tonight I closed the loop in my target tracking system for the first time. I knew it would take all evening, so I brought dinner with me in the morning.

I spent most of the day fighting with Microsoft. I’d been using MS Visual C++ 5.0 for this project, because it was already installed on the computer with the data acquisition hardware. Today, mid-afternoon, it decided, rather than compile my program, to terminate itself immediately. I could not get it to compile anything, and so I eventually gave up, registered a copy of Visual Studio 2008 Express, and spent the rest of the day fighting with various inexplicable incompatibilities, and occasionally triggering unexpected spontaneous reboots.

I did finally get it all set up, though. I even got to use the motorized motion phantom I had built for this purpose. It was ugly, with the usual complicated dance to start and stop 5 different systems in the correct sequence, but all the components were communicating correctly.

The result is not what I would call a success, though for a first try it was a worthwhile outcome. The system appears to successfully counteract the artifact I am trying to cancel, but it introduces a new artifact that’s even worse … so I still have some work to do.

Beauty

This afternoon I took a stroll around Harvard with some friends. I can recommend without reservation Harvard Square on a sunny summer Sunday afternoon and Herrell’s Mudpie flavor.

I spent much of the weekend building a generic monoid-annotated self-balancing binary tree with bidirectional links and stable leaves. I’m not sure why monoid-annotated trees are so little-known, as they are super-useful. I learned about them here, though that author incorrectly refers to them as Finger Trees.

Anyway, they’re pretty cool. Using the summation monoid over the integers and an enhanced AA balancing procedure, I was able to build a list with O(log N) performance for lookup, insertion, deletion, and search.
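
Here’s a stripped-down sketch of the idea in Python (not the code I actually wrote, and with the AA rebalancing omitted for brevity, so the logarithmic bounds only hold while the tree happens to stay balanced). The point is how the size-monoid annotation turns an ordinary binary tree into an indexable list:

    class Node:
        __slots__ = ("value", "left", "right", "annotation")

        def __init__(self, value, left=None, right=None):
            self.value = value
            self.left = left
            self.right = right
            # Each node caches the monoid sum of its subtree; with the
            # (0, +) monoid counting elements, that's the subtree size.
            self.annotation = 1 + measure(left) + measure(right)

    def measure(node):
        """Monoid sum of a subtree; an empty tree is the identity, 0."""
        return node.annotation if node else 0

    def lookup(node, index):
        """Return the index-th element (0-based), steering by sizes."""
        left_size = measure(node.left)
        if index < left_size:
            return lookup(node.left, index)
        if index == left_size:
            return node.value
        return lookup(node.right, index - left_size - 1)

    def insert(node, index, value):
        """Return a new tree with value as the index-th element."""
        if node is None:
            return Node(value)
        left_size = measure(node.left)
        if index <= left_size:
            return Node(node.value, insert(node.left, index, value),
                        node.right)
        return Node(node.value, node.left,
                    insert(node.right, index - left_size - 1, value))

    # Build the list "monoid" one character at a time, then read it back.
    t = None
    for i, ch in enumerate("monoid"):
        t = insert(t, i, ch)
    print("".join(lookup(t, i) for i in range(measure(t))))  # -> monoid

Swap in a different monoid (say, max over priorities) and the same skeleton answers entirely different queries; that generality is what makes these trees so useful.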

Beautiful.

EDIT: Turns out WordPress really doesn’t like certain Unicode characters…

Science!

I am having, I think, a classic scientist moment.

I just demonstrated that I can take an MRI pulse sequence that produces perfectly good, clear images, and introduce errors of the sort you would see if the target (perhaps an organ) were moving. This, you might say, is not very exciting. After all, I am describing a technique for making medical images worse.

But in fact, this is tremendously exciting, because I am creating motion artifacts, but the target is not moving at all. As we can destroy what we create, so a demonstration of inducing motion artifacts is the first step toward canceling them.
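
For concreteness, here’s a toy numpy sketch of the general principle (emphatically not my actual pulse-sequence technique): motion between the acquisitions of successive k-space lines shows up as inconsistent phase, so phase-modulating the lines of a stationary image’s transform manufactures ghosting with no motion at all.

    import numpy as np

    # A stationary "organ": a bright square on a dark background.
    img = np.zeros((128, 128))
    img[48:80, 48:80] = 1.0

    # Its simulated k-space; in MRI this is what's actually measured,
    # one line at a time.
    k = np.fft.fftshift(np.fft.fft2(img))

    # Pretend the object jittered by a few pixels between lines:
    # a shift by d pixels multiplies a k-space line by a phase ramp.
    rng = np.random.default_rng(0)
    ky = np.fft.fftshift(np.fft.fftfreq(128))  # frequency of each row
    for row in range(128):
        d = rng.uniform(-3, 3)                 # per-line shift, pixels
        k[row, :] *= np.exp(-2j * np.pi * ky[row] * d)

    # Reconstruct: ghosting along the phase-encode axis appears,
    # even though img never moved.
    ghosted = np.abs(np.fft.ifft2(np.fft.ifftshift(k)))
    print("peak artifact amplitude:", np.abs(ghosted - img).max())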

bemasc.net

My car got towed yesterday, because of street cleaning. It cost me $140 to get it back. I was not happy.

To cheer myself up, I bought myself a domain name! You can now reach this page just by typing “bemasc.net” into your browser. It cost me $24 for two years. Name service provided by Joker.com, selected because they seem to be nice people, and they include dynamic DNS for free.

I think it should probably be pronounced with a hard c, even though this breaks the etymology.

To fix the network

(for context, see a review of OLPC’s networking troubles)

A group of OLPC engineers, especially C. Scott Ananian and Michael Stone, were dissatisfied with the behavior of the current collaboration stack, and decided to start from scratch, laying out a set of “Network principles” from which a new collaboration system could be derived. Their proposal is a bit hard to compare to the Telepathy design, as it is written from a dramatically different perspective.

Telepathy, and in particular Telepathy’s implementation of XMPP, is meant to be a single-source, near-universal solution for all collaboration problems. Telepathy provides identity (as a Jabber ID), buddy lists with presence and status updates, buddy search, text IM, multi-user chat rooms, remote procedure calls (D-Bus Tubes), reliable bytestreams (Stream Tubes), voice over IP, videoconferencing, file transfer, and a growing list of other features. All these features are integrated together, so that an application that starts with a text chat between two users can easily initiate a file transfer, or a voice chat, alongside it.

Telepathy is also designed to make the most of even the worst network topologies. Most of this support comes simply from having a central server, which ensures that any user who can see the server can communicate with any other user who can see the server, or any federated server. Telepathy also provides a number of NAT traversal techniques, many originally standardized by Google, for optimizing bandwidth, latency, and server performance, by bypassing the server when possible.

Telepathy is designed for interoperability with other Jabber clients, up to a point. XOs running Sugar can chat over the local network with Macs running iChat, because both support the link-local XMPP specification. With a properly configured server, and a change to the Sugar UI, Sugar users could chat with anyone on jabber.org or even Google Talk. Of course, other Jabber clients don’t provide Sugar’s Activity system, so interoperability is limited to text, voice, video, and files. If non-Sugar applications start using Telepathy’s Tubes system, then compatible Sugar Activities can potentially be written, but at this time virtually all use of Tubes seems to be within Sugar.

Telepathy’s many features make it a tremendously powerful single source. They also make it complicated and confusing. For example, Telepathy supports both multi-server operation (via Federation) and buddy search (via Gadget), but Gadget only acts within a single server, so locating buddies on a different server requires a separate mechanism. Explaining this within a comprehensible user interface seems difficult to me. “Network Principles” describes the opposite approach.

“Network Principles” is so named because it is not a singular software entity. It’s a set of ideas for how networked collaboration should be implemented, drawn up in such a way that very little new software is actually required to implement it. Its tagline could be “use as little software as possible, but no less.”

In the network principles architecture (hereafter referred to as NPA), a user is identified by a DNS name. All knowledge about that user comes by communicating over IP with the IP address specified by that DNS name. For example, instead of asking a central jabber server what Activities a user is running, one would simply communicate with some daemon running on that user’s machine. The story is similar for file transfer, video chat, or any other network collaboration use case. The user’s identity, the DNS name, immediately enables you to contact that user directly. The principal motivation for this design is to ensure that all traffic is direct and unicast, to avoid the multicast routing breakdown experienced by Salut on wireless networks. It also permits extremely simple, transparent debugging, using tools like ifconfig and ping.
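
In code, the entire contact model collapses to resolve-and-connect. A minimal sketch (the port number and the name are my inventions; NPA doesn’t specify either):

    import socket

    def contact(user_dns_name, port=7777):  # hypothetical port
        # One DNS lookup is the entire "find the user" step; after
        # that, all traffic is direct unicast to the user's machine.
        return socket.create_connection((user_dns_name, port))

    # conn = contact("alice.laptop.example.org")  # hypothetical name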

Of course, this described system only works if there is a DNS server available and all users have mutually routable IP addresses. There are several common scenarios in which the network architecture does not meet this description, and which Telepathy goes to great lengths to support. NPA proposes to support these scenarios instead by slightly changing the network architecture:

  • On serverless, link-local networks, there is (by definition) no DNS server. In NPA, link-local networking would be supported by modifying DNS (e.g. via an NSS plugin) to produce a machine’s link-local IPv6 address by hashing the user’s DNS name (see the sketch after this list). The IPv6 address space is sufficiently large that collisions are unlikely. (Telepathy uses mDNS here, with its inefficiency on wireless networks and its unreliable design.)
  • On a LAN with private IP addresses provided by a router, but no need to communicate outside the LAN, NPA users can simply continue to use their link-local addresses via the described hashing scheme. Alternatively, a cooperating router could provide a dynamic DNS server to allow users to adopt DNS names.
  • On a multiple-LAN network with private IP addresses provided by a router, but no need to communicate outside the network, some sort of server is required. The server could be a cooperating router providing a slightly unusual fake-DNS service, or (if efficiency is not critical) it could be an IPv6 tunnel gateway.
  • On a network with private IP addresses connected to the internet via NAT, NPA users wishing to communicate outside the LAN would require a network tunnel, such as an IPv6 tunnel, with a global address at the endpoint. That (unique per-user) endpoint address would then be listed in the user’s DNS record, updated by dynamic DNS. (Telepathy employs the Jabber server itself as a sort of tunnel, with the user’s JID serving as her global identifier. Telepathy also adds some NAT traversal techniques to improve efficiency.)
  • On dynamic public IP addresses, NPA users would simply employ dynamic DNS to keep their name record up to date.
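
Here’s a sketch of the hashing idea from the first bullet. The particular hash and address layout are my own guesses for illustration, not necessarily what the NSS plugin actually does:

    import hashlib
    import ipaddress

    def linklocal_from_name(dns_name):
        # 64 bits of a SHA-256 digest become the interface identifier
        # under the fe80::/64 link-local prefix.
        digest = hashlib.sha256(dns_name.lower().encode()).digest()
        iid = int.from_bytes(digest[:8], "big")
        return ipaddress.IPv6Address((0xfe80 << 112) | iid)

    # Any peer can compute this from the name alone: no broadcasts,
    # no server.
    print(linklocal_from_name("alice.example.org"))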

Like Telepathy, the NPA is designed for interoperability, but in a very different sense. Because NPA simply demands that collaborating users be able to route to each other’s IP addresses, they may now communicate with any service whose requirements are the same. Most obviously, users might run web servers, and collaboration could occur over HTTP. Anyone with a web browser could participate, even those running operating systems with no notion of NPA. A user could also run an IRC daemon, a gnutella node, or (notably) a Telepathy-compatible XMPP server. Running such a server would allow collaborative Activities to run as they do now… almost.

NPA trades away two major things to achieve its simplicity. The most obvious is efficiency, at least in certain limits. In Telepathy, a user can send a message to many other users with a single packet. In Salut, this is achieved by link-layer multicast. In Gabble, this is a service provided by the server, which is presumed to have substantially more bandwidth than the clients. The server forwards the message to all the right people, acting as a sort of bandwidth amplifier.
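
The one-packet fan-out that Salut relies on looks roughly like this in Python (using the well-known mDNS group as an example; Salut’s actual traffic is XMPP, not a bare string):

    import socket

    MCAST_GRP, MCAST_PORT = "224.0.0.251", 5353  # standard mDNS group

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    # One send reaches every subscribed host on the link at once...
    sock.sendto(b"hello, everyone", (MCAST_GRP, MCAST_PORT))
    # ...provided the network routes multicast efficiently, which is
    # exactly the assumption that failed on the mesh.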

In NPA, there is no distinction between link-local connections and global connections, and so no provision for making use of multicast routing, which (for the foreseeable future) is only available within private networks. Similarly, there is no provision for bandwidth amplifiers. Both of these features could be provided by a particular library layered on top of NPA, but then these features would only be applicable to other applications using this library. Such a layer would, in my view, be approaching a reimplementation of Gabble or Salut.

The value of one-to-many transmission should not be overstated. The problems experienced with multicast routing in wireless networks indicate that falling back to multiple independent unicast transmissions might not be such a great loss after all. Similarly, the value of bandwidth amplification by a server is questionable, especially given the poor observed performance of ejabberd under load. In the oft-mentioned case of a “school server”, the server shares a LAN with the users, and so has essentially no bandwidth advantage.

Another efficiency loss with NPA occurs in the case of users behind NATs. Thanks to NAT hole-punching, it is possible for two Telepathy users behind NATs to communicate with each other directly, using techniques such as STUN. In NPA, it is difficult to imagine a system for NAT traversal, because NAT’ed users see each other only through the public IPs of their tunnel endpoints, which could easily be inconveniently located in the network. (EDIT: Michael Stone notes that a large family of techniques, notably Teredo tunneling, have developed to provide direct NAT traversal for IPv6 tunnels. Such techniques could be employed to avoid triangle routing and achieve behaviors equal to Telepathy’s.) Direct connections of this sort are particularly important for video chat and VoIP, which are both bandwidth-intensive and demand low latency. The importance should not be overstated: Sugar has done very little in the way of videoconferencing or VoIP, and Skype often routes calls through relays without complaints from users. Nonetheless, we can imagine that if a user’s IPv6 tunnel endpoint is on the other side of a satellite link, then all local collaboration traffic might be forced over this slow, unreliable, expensive link.

Another significant sacrifice is disconnection resilience. Salut and Gabble enable collaborative sessions to persist even as their membership rotates. The lead user need not remain for the activity to continue. Strictly speaking, this is true for NPA as well, but not for collaboration based on HTTP, IRC, or any other standard server-based protocol. Server-based protocols are almost always designed with the assumption that the server is a long-lived third party, providing a service to the users. Running such a server on a child’s laptop, over an unreliable wireless network, is bound to create disappointment. To support this feature in NPA, an Activity would either have to direct traffic through a reliable third party (reprising the XMPP server in Gabble) or implement a distributed data coherence algorithm (reprising much of Salut’s Clique protocol).

Admittedly, many current Sugar Activities simply fall over if the leader leaves. There may even be some bugs in the Presence Service that make the user interface problematic in these cases. I also admit that this particular feature is a personal interest of mine. Nonetheless, I genuinely feel it’s an important property, and not one to be given up so easily.

So after this exhaustive (indeed, exhausting) comparison, what are we left with? I cannot foresee jettisoning Salut or Gabble. Both provide important features, and there is not as yet anything resembling an alternative. However, I do think that the “Network Principles” are valuable, particularly as they provide, in some cases, a way out from our current unwinnable fights with wireless networks. We would do well to see where we can integrate these principles into Sugar, without disrupting our present systems.

The first logical step, to me, is to make use of Michael Stone’s remarkable DNS-hash-based NSS module. Salut relies on mDNS to spread presence information, implemented in the Avahi library. I suspect we could modify Avahi to use hashDNS instead of mDNS, resulting in a dramatically reduced number of network broadcasts. Salut’s performance on wifi and mesh might improve significantly; if it did, we would both validate the approach and enjoy a valuable win. This would at least nominally violate the link-local XMPP standard, but the approach is quite elegant, so it could potentially be standardized.

We would also improve our ability to locate problems with finer granularity. On an NPA network, correct functioning can be verified layer by layer. Even if the top layer is unchanged, this can be valuable for diagnosis in the field.

A very brief and extremely selective history of OLPC and collaboration technology, performed entirely from memory

OLPC’s founders wanted to improve education, and in their vision, education requires communication. They envisioned a computer system in which students, and teachers, could easily work together on projects, and share all kinds of documents and media. To describe this vision, they adopted a buzzword: “collaboration”.

Implementing this collaboration required a network. In schools with electricity, that network could be provided by standard wireless networking systems, and in schools with excellent systems support, collaboration could be supported by a server. In schools without electricity, or outside of schools entirely, the laptops would have to talk to each other directly, and so OLPC became perhaps the first adopter of the IEEE 802.11s standard for “mesh networking”, using a chip sourced from Marvell to implement the required behaviors.

Collaboration also requires a software system to perform communication over the network, and for this, OLPC contracted Collabora, a free software development firm working on a then-new project called Telepathy. Telepathy’s original purpose was to provide an abstraction layer over chat services, like AIM, MSN Messenger, or Google Chat, so that a chat client could work without knowing the details of each system. OLPC contracted Collabora to extend Telepathy’s XMPP (i.e. Jabber) support to arbitrary data channels, not just human-readable text. They called these channels “Tubes”.

Both Marvell’s mesh system and Collabora’s Telepathy software took years to debug. Debugging was especially hindered by the NDAs surrounding the firmware on Marvell’s chip, which prevented volunteer experts from fixing problems or adding features. (Such NDAs have become deeply ingrained in the culture of wireless device manufacturers, not least due to concerns about liability for FCC compliance violations.) Telepathy too proved difficult for outsiders to improve, due in part to its use of specialized technologies like XMPP, and a large, intricate codebase.

When both systems seemed to be approaching a degree of reliability independently, testing began on using them together. OLPC’s engineers quickly discovered that the combined system was extremely fragile, even in somewhat idealized tests. In particular, two major problems were discovered. The first was that Telepathy’s serverless communications component, known as Salut, could not be used simultaneously by more than roughly a dozen users in a room. With more users than this, typical collaborative applications would begin to fail.

After a great deal of discussion and testing by expert engineers, a rough consensus was reached: the failure to support more users could be attributed to the behavior of multicast routing. Salut was written on the assumption that multicast packets are routed efficiently, and so makes deliberate use of multicast. Marvell’s mesh routing algorithms did not provide efficient multicast routing for a large number of nearby users. (More efficient routing algorithms have been the subject of numerous research papers in recent years, but have not yet reached broad implementation.)

With Salut’s high volume of multicast traffic being routed inefficiently, a small number of users could quickly saturate the available wireless network bandwidth. Performance improved if a wireless access point was provided, but most wireless access points use the inefficient “basic rate” for all broadcast and multicast transmissions, which results in a similar saturation of bandwidth, typically seen at around 20 participating users. Salut did appear to work well on wired 100Mb ethernet, where broadcast is highly efficient and there is a great excess of bandwidth, but this use case was of little interest for OLPC, since its hardware did not have an ethernet port, and its schools could not afford to install the wiring.

A second problem was observed when a server was used to enhance collaboration. When communicating via a server, Salut and multicast are not used, and so network problems are substantially alleviated. However, the only server software recommended by Collabora, ejabberd, proved to have substantial scaling problems of its own. In testing, ejabberd had a tendency to crash when supporting more than about 100 simultaneous users, as might be common in even a small school. While several potential issues were identified and resolved, testing has proven difficult, and recent tests have run into server instabilities again. Debugging is made difficult both by the need to have several hundred active clients, and by ejabberd’s unusual internal structure (it is written in Erlang).

Telepathy also suffered from its immaturity. The developers implemented necessary features as fast as possible, and as such they were often not implemented in the most efficient way. For example, in many cases, Telepathy would take compact binary data, expand it using base64 so that it was valid plain text, and then send that text over a zlib-compressed channel, spending a great deal of CPU time in the process. Telepathy’s network transports have since been made much more efficient in many respects, but not until after OLPC began sending laptops to customers. Much efficiency work still remains.
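
A quick illustration of the waste (my own toy numbers, not Telepathy’s actual code path):

    import base64
    import os
    import zlib

    payload = os.urandom(100000)         # stand-in "compact binary data"
    encoded = base64.b64encode(payload)  # valid plain text, ~33% bigger

    print(len(payload), len(encoded))    # 100000 vs 133336
    print(len(zlib.compress(payload)))   # incompressible: ~100 kB
    print(len(zlib.compress(encoded)))   # ~100 kB again: the compression
    # mostly just cancels the bloat that base64 added, at real CPU cost.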

The final problem that I will mention here is opacity. Telepathy is an abstraction layer; its goal is to hide things from the developer. This can make it difficult to determine what went wrong when things are not working. This is especially true due to the way that Sugar uses Telepathy. Sugar carefully hides all mention of IP addresses, XMPP identifiers, and all other technical matters, behind an extremely simple non-textual user interface, designed to be used by children who do not yet read well.

Sugar still uses Telepathy for collaboration, and its deep integration into every collaborative Activity makes it unlikely to ever be otherwise. However, frustrated with the various issues with the Telepathy stack, some OLPC engineers began searching for a better architecture for networked collaboration. I’ll discuss their proposals in my next post.