Category Archives: Xiph

The Clock

I watched The Clock at the Boston Museum of Fine Arts from about 4:45 to 6:30 on Friday. My thoughts on The Clock:

The experiment works in part because scenes with clocks in them are usually frenetic. In a movie, the presence of a clock usually means someone is in a rush, and so most of the sequences convey urgency.

It’s often hard to spot the clock in each scene. In less exciting sequences, this serves as a game to pass the time. In many cases the clock in question is never in focus, or is moving too fast for the viewer to notice. The editors must have done careful freeze-frames and zoomed in on wristwatches to work out the indicated time.

The selected films are mostly in English, with a fair number in French and very few in any other languages. This feels fairly arbitrary to me.

Scenes from multiple films are often mixed within each segment. It seems like the editors adopted a relaxed rule, maybe something like: “if a clock appeared in an original, then a one minute window around the moment of appearance is fair game to include during that minute of The Clock, spliced together with other clips in any order”.

The editing makes heavy use of L cuts and audio crossfades to make the fairly random assortment of sources feel more cohesive.

I swear I saw a young Michael Caine at least twice in two different roles.

Some of the sources were distinctly low-fidelity, often due to framerate matching issues. I think this might be the first production I’ve seen that would really have benefited from a full Variable Frame Rate render and display pipeline.

I started to wonder about connections to deep learning. Could we train an image captioning network to identify images of clocks and watches, then run it on a massive video corpus to generate The Clock automatically?

Or, could we construct a spatial analogue to The Clock’s time-for-time conceit? How about a service that notifies you of a film clip shot at your current location? With a large GPS-tagged corpus (or a location-finder neural network) it might be possible to do this with pretty broad coverage.


Anyone who loves video software has probably caught more than one glimpse of the Blender Foundation’s short films: Elephants Dream, Big Buck Bunny, Sintel, and Tears of Steel. I’ve enjoyed them from the beginning, and never paid a dime, on account of their impeccable Creative Commons licensing.

I always hoped that the little open source project would one day grow up enough to make a full length feature film. Now they’ve decided to try, and they’ve raised more than half their funding target … with only two days to go. You can donate here. I think of it like buying a movie ticket, except that what you get is not just the right to watch the movie, but actually ownership of the movie itself.


My home for the next two nights is the Hotel 309, right by the office.  It’s the stingiest hotel I’ve ever stayed in.  Nothing is free: the wi-fi is $8/night and the “business center” is 25 cents/minute after the first 20.  There’s no soap by the bathroom sink, just the soap dispenser over the tub.  Even in the middle of winter, there is no box of tissues.  Its status as a 2-star hotel is well-deserved.

The rooms are also very stylish.  There’s a high-contrast color scheme that spans from the dark wood floors and rug to the boldly matted posters and high-concept lamps.  The furniture has high design value, or at least it did before it got all beat up.

These two themes come together beautifully for me in the (custom printed?) shower curtain, which features a repeating pattern of peacocks and crowns … with severe JPEG artifacts!  The luma blocks are almost two centimeters across.  Someone should tell the artist that bits are cheap these days.



So you’re trying to build a DVD player using Debian Jessie and an Atom D2700 on a Poulsbo board, and you’ve even biked down to the used DVD warehouse and picked up a few $3 90’s classics for test materials.  Here’s what will happen next:

  1. Gnome 3 at 1920×1080.  The interface is sluggish even on static graphics.  Video is right out, since the graphics are unaccelerated, so every pixel has to be pushed around by the CPU.
  2. Reduce mode to 1280×720 (under half the pixels to push), and try VLC in Gnome 3.  Playback is totally choppy.  Sigh.  Not really surprising, since Gnome is running in composited mode via OpenGL commands, which are then being faked on the low-power CPU using llvmpipe.  God only knows how many times each pixel is getting copied.  top shows half the CPU time is spent inside the gnome-shell process.
  3. Switch to XFCE.  Now VLC runs, and nothing else is stealing CPU time.  Still VLC runs out of CPU when expanded to full screen.  top shows it using 330% of CPU time, which is pretty impressive for a dual-core system.
  4. Switch to Gnome-mplayer, because someone says it’s faster.  Aspect is initially wrong; switch to “x11” output mode to fix it.  Video playback finally runs smooth, even at full screen.  OK, there’s a little bit of tearing, but just pretend that it’s 1999.  top shows … wait for it … 67% CPU utilization, or about one fifth of VLC’s.  (Less, actually, since at that usage VLC was dropping frames.)  Too bad Gnome-mplayer is buggy as heck: buttons like “pause” and “stop” do nothing, and the rest of the user interface is a crapshoot at best.

On a system like this, efficiency makes a big difference.  Now if only we could get efficiency and functionality together…

It’s Google

I’m normally reluctant to talk about the future; most of my posts are in the past tense. But now the plane tickets are purchased, apartment booked, and my room is gradually emptying itself of my furniture and belongings. The point of no return is long past.

A few days after Independence Day, I’ll be flying to Mountain View for a week at the Googleplex, and from there to Seattle (or Kirkland), to start work as a software engineer on Google’s WebRTC team, within the larger Chromium development effort. The exact project I’ll be working on initially isn’t yet decided, but a few very exciting ideas have floated by since I was offered the position in March.

Last summer I told a friend that I had no idea where I would be in a year’s time, and when I listed places I might be — Boston, Madrid, San Francisco, Schenectady — Seattle wasn’t even on the list. It still wasn’t in March, when I was offered this position in the Cambridge (MA) office. It was an unfortunate coincidence that the team I’d planned to join was relocated to Seattle shortly after I’d accepted the offer.

My recruiters and managers were helpful and gracious in two key ways. First, they arranged for me to meet with ~5 different leaders in the Cambridge office whose teams I might be able to join instead of moving. Second, they flew me out to Seattle (I’d never been to the city, nor the state, nor any of the states or provinces that it borders) and arranged for meetings with various managers and developers in the Kirkland office, just so I could learn more about the office and the city. I spent the afternoon wandering the city and (with help from a friend of a friend) looking at as many districts as I could squeeze between lunch and sleep.

The visit made all the difference. It made the city real to me … and it seemed like a place that I could live. It also confirmed an impressive pattern: every single Google employee I met, at whichever office, seemed like someone I would be happy to work alongside.

When I returned there were yet more meetings scheduled, but I began to perceive that the move was essentially inevitable. The hiring committee had done their job well, and assigned me to the best fitting position. Everything else was second best at best.

It’s been an up and down experience, with the drudgery of packing and schlepping an unwelcome reminder of the feeling of loss that accompanies leaving history, family, and friends behind. I am learning in the process that, having never really moved, I have no idea how to move.

But there’s also sometimes a sense of joy in it. I am going to be an independent, free adult, in a way that cannot be achieved by even the happiest potted plant.

After signing the same lease on the same student apartment for the seventh time, I worried about getting stuck, in some metaphysical sense, about failure to launch from my too-comfortable cocoon. It was time for a grand adventure.

This is it.

Ethics in an unethical world: Ethics Offsets

The recent hubbub regarding the (admirably public) debate within Mozilla about codec support has set me thinking about how to deal with untenable situations. After rightly railing against H.264 on the web for several years, and pushing free codecs with the full thrust of the organization, Mozilla may now be approaching consensus that they cannot win, and that continued refusal to capitulate to the cartel is tantamount to organizational suicide.

So what can you do, when you find yourself compelled to do something that goes against your ethics? To make a choice that you feel is wrong on its own because it benefits you in other ways, a choice you would like to make only when really necessary and never otherwise? Any thinking person will have this problem, to greater and lesser degrees, throughout their lives. We are not martyrs, so we do what we have to do to survive and try to keep in mind our need to escape from the trap.

Organizations cannot simply keep something in mind, but they can adopt structures that remind their members of their values even when those values are compromised. A common structure of this type is the sin tax, a tax designed (in a democracy) by members of a state to help them break or prevent their own bad habits. Sin taxes work by countering the locally perceived benefit of some action that’s harmful in a larger way, by reminding us of less visible but still important negative considerations. Some of their effect is straightforwardly economic, but some is psychological, to help us remember the bigger picture.

Sin taxes are more or less involuntary, but when the government does not impose these reminders, we often choose to remind ourselves. One currently popular implementation of this concept is the Carbon offset, a payment typically made when burning fuel to counter the effect of global warming. Organizations that buy carbon offsets for their fuel consumption do so to send a message, both internally and externally, that they place real value on minimizing carbon emissions. They may send this message both explicitly (by publicizing the purchase) and implicitly (by its effect on internal and external economic incentives).

Carbon offsets may be in fashion this decade, but there are many older forms of this concept. Maybe the most quotidian is the Curse Jar*, traditionally a place in a home or small office where individuals may make a small payment when using discouraged vocabulary. The Curse Jar provides a disincentive to coarse language despite being strictly voluntary, and despite not purchasing any effect on the linguistic environment (although the coffee fund may help for some). The Curse Jar works simply by reminding group members which behaviors are accepted and which are not.

For Mozilla, the difficulty is not emissions, verbal or vaporous, but ethical behavior. How can Mozilla publicly commit to a standard of behavior while violating it? I humbly submit that the answer is to balance its karmic books, by introducing an Ethics Offset**. When Mozilla finds itself cornered, it may take the necessary unfortunate action … and introduce a proportionate positive action as a reminder about its real values.

In the case at hand, a reasonable Ethics Offset might look like an internal “tax” on all uses of patented codecs. For example, for every Boot2Gecko device that is sold, Mozilla could commit to an offset equal to double the amount spent on patent licenses for the device. The offset could be donated to relevant worthy causes, like organizations that oppose software patents or contribute to the development of patent-free multimedia … but the actual recipient matters much less than the commitment. By accumulating and periodically (and publicly) “losing” this money, Mozilla would remind us all about its commitment to freedom in the multimedia realm. A similar scheme may be appropriate for Firefox Mobile if it is also configured for H.264 support.

Without a reminder of this kind, Mozilla risks becoming dangerously complacent about, and complicit in, the cartel-controlled multimedia monopolies. As long as H.264 support appears to serve Mozilla’s other goals, Mozilla’s commitment to multimedia freedom will remain uncomfortable, inconvenient, and tempting to forget. Greater organizations have slid down off their ethical peaks, on paths paved all along with good intentions.

Most companies would not even consider a public and persistent admission of compromise, but Mozilla is not most companies. Neither are the companies that produce free operating systems, and many other components of the free software ecosystem. None of them should be ashamed to admit when they are forced to compromise their values and support enterprises that, on ethical grounds, they despise … but they should make their position clear, by committing to an Ethics Offset until they can escape from the compromise entirely.

*: Why is there no Wikipedia entry for “Curse Jar”!?
**: Let’s not call it an indulgence.


I was really impressed by Michael Bebenita’s Broadway.js, the recent port of an H.264 decoder to pure JavaScript using Emscripten, an LLVM-based C-to-JS converter … but of course this is the opposite of what we want! Who needs H.264? We want WebM!

I’ve spent the past few weekends digging into Broadway.js, stripping out the H.264 bits and replacing them with libvpx and libnestegg. Now it’s working, to a degree. You can see it for yourself at the demo page (so far tested only in Firefox 7…).

I’m not going to be able to take this much further … at least not right now. It’s been a fun exercise though. I invite all interested comers to read some more details and then fork the repo.

Take this thing, and make it your own.
Continue reading Route9.js


So I wrote this song, sort of. Maybe you’ll like it.

YouTube version
Sheet Music
Reference files at

After about 6 years of covering pop songs in my a cappella groups, I really wanted to sing some original music. In part, I was motivated by the US’s aggressively restrictive copyright regime, which always prevented us from freely sharing recordings of our own performances.

I tried to write a song from scratch for a while, but it wasn’t working out, mostly because I don’t have anything interesting to say. Then I struck upon the idea of using the text of an old out-of-copyright poem (which, because of the US’s effectively perpetual copyright, has to be very old indeed). I started browsing through the poetry section of WikiSource, until I stumbled across this brilliant 1895 poem by Langdon Smith. The choice was clear.

I drew up a thoroughly derivative 4-part a cappella arrangement in MuseScore, and VoiceLab indulged me by adding it to the repertoire. We’ve sung it twice so far, but the first time we didn’t have a good recording, and then this time I had to solve this audio-video alignment problem… but now it’s here.

The recordings and sheet music are all CC0 dedicated to the public domain. I would appreciate attribution as the arranger, but I find threats of legal action to be just as distasteful as plagiarism. I wouldn’t want to do anything to discourage people from adopting and adapting the music as they see fit. Maybe someone will make a recording with a soloist who can really sing!

An Auto-Aligner for PiTiVi

It’s rare to get exactly one recording of an a cappella concert. Usually someone’s parents have a fancy but outdated camcorder, someone in the front row has a cell phone video with a great angle but terrible quality, and there’s a beautiful audio-only recording, maybe straight from the mixing board. All the recordings are independent, starting and stopping at different times. Some are only one song long, or are broken into many short pieces.

If you want to combine all these inputs into a video that anyone could watch, you’ll first have to line them up correctly in a video editor. This is a painful process of dragging clips around on the timeline with the mouse, trying to figure out if they’re in sync or not. The usual trick to making this achievable is to look at the audio waveform visualization, but even so, the process can be tedious and irritating.

This year, when I got three recordings from the VoiceLab spring concert, I resolved to solve the problem once and for all. I set about writing an automatic clip alignment algorithm as a patch to PiTiVi, a beautiful (if not mature) free software video editor written in Python.

Today, after about two months of nights and weekends, the result is ready for testing in PiTiVi mainline. Jean-François Fortin Tam has a great writeup explaining how it works from a user’s perspective.

I hadn’t looked into it until after the fact, but of course this is not the first auto-alignment function in a video editor. Final Cut Pro appears to have a similar function built in, and there are also plug-ins such as “PluralEyes” for many editors. However, to the best of my knowledge, this is the first free implementation, and the first available on Linux. Comparing features in PiTiVi vs. the proprietary giants, I think of this as “one down, 20,000 to go”.

I guess this is as good a place as any to talk about the algorithm, which is almost The Simplest Thing that could Possibly Work. Alignment works by analyzing the audio tracks, relying on every video camera to have a microphone of its own. The most direct approach might be to compute the cross-correlation of these audio tracks and look for the peak … but this could require storing multi-gigabyte audio files in memory, and performing impossibly large FFTs. On computers of today, the direct approach is technologically infeasible.
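To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The one-hour clip length, the 48 kHz mono sample rate, and the 16 bytes per complex FFT value are assumptions chosen for illustration, not measurements from the project; the 40 ms block size is the one the envelope approach uses.

```python
# Rough size comparison: raw-sample cross-correlation vs. a 25 Hz envelope.
# Assumed for illustration: a one-hour clip, 48 kHz mono audio, and
# 16 bytes per value (complex128) in an FFT buffer.

CLIP_SECONDS = 60 * 60
SAMPLE_RATE_HZ = 48_000
BYTES_PER_COMPLEX = 16
BLOCK_SECONDS = 0.040


def next_power_of_two(n):
    """Smallest power of two >= n (a common FFT padding choice)."""
    return 1 << (n - 1).bit_length()


raw_samples = CLIP_SECONDS * SAMPLE_RATE_HZ
# Correlating two clips of this length needs a transform about twice as long.
fft_length = next_power_of_two(2 * raw_samples)
fft_bytes = fft_length * BYTES_PER_COMPLEX

envelope_samples = round(CLIP_SECONDS / BLOCK_SECONDS)

print(f"raw samples per clip:      {raw_samples:,}")
print(f"padded FFT length:         {fft_length:,}")
print(f"one FFT buffer:            {fft_bytes / 2**30:.1f} GiB")
print(f"envelope samples per clip: {envelope_samples:,}")
print(f"reduction factor:          {raw_samples // envelope_samples}x")
```

Under these assumptions a single padded FFT buffer comes to about 8 GiB, and a cross-correlation needs several of them, whereas the 25 Hz envelope of the same clip is only 90,000 values.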

The algorithm I settled on resembles the method a human uses when looking at the waveform view. First, it breaks each input audio stream into 40 ms blocks and computes the mean absolute value of each block. The resulting 25 Hz signal is the “volume envelope”. The code subtracts the mean volume from each track’s envelope, then performs a cross-correlation between tracks and looks for the peak, which identifies the relative shift. To avoid performing N^2 cross-correlations, one clip is selected as the fixed reference, and all others are compared to it. The peak position is quantized to the block duration (creating an error of +/- 20ms), so to improve accuracy a parabolic fit is used to interpolate the true maximum. I don’t know the exact residual error, but I expect it’s typically less than 5 ms, which should be plenty good enough, seeing as sound travels about 1 foot per ms.
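The steps above can be sketched in a few dozen lines of Python. This is a minimal illustration, not the code that went into PiTiVi: every function name is made up, the correlation is the slow direct form rather than an FFT, and the test signal is synthetic noise at an artificially low 1 kHz sample rate so that the pure-Python demo runs quickly.

```python
import random


def volume_envelope(samples, sample_rate, block_seconds=0.040):
    """Mean absolute value of each block, with the track's mean subtracted."""
    block = int(round(sample_rate * block_seconds))
    n_blocks = len(samples) // block
    env = [
        sum(abs(s) for s in samples[i * block:(i + 1) * block]) / block
        for i in range(n_blocks)
    ]
    mean = sum(env) / len(env)
    return [v - mean for v in env]


def cross_correlation(ref, other):
    """Direct correlation: score[lag] = sum over i of ref[i] * other[i + lag]."""
    scores = {}
    for lag in range(-(len(ref) - 1), len(other)):
        total = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        scores[lag] = total
    return scores


def find_offset_seconds(ref_env, other_env, block_seconds=0.040):
    """How much later (in seconds) the shared content appears in `other`."""
    scores = cross_correlation(ref_env, other_env)
    peak = max(scores, key=scores.get)
    y0 = scores[peak]
    ym = scores.get(peak - 1)
    yp = scores.get(peak + 1)
    frac = 0.0
    if ym is not None and yp is not None:
        # Fit a parabola through the peak and its two neighbours and take
        # the position of its vertex as the sub-block correction.
        denom = ym - 2.0 * y0 + yp
        if denom != 0:
            frac = 0.5 * (ym - yp) / denom
    return (peak + frac) * block_seconds


def make_test_clip(rng, n_samples):
    """Synthetic noise whose loudness changes every few hundred samples."""
    samples = []
    while len(samples) < n_samples:
        level = rng.choice([0.05, 0.3, 1.0])
        for _ in range(rng.randint(60, 260)):
            samples.append(level * rng.uniform(-1.0, 1.0))
    return samples[:n_samples]


SAMPLE_RATE = 1000  # Hz; deliberately low so the demo is fast
rng = random.Random(0)
reference = make_test_clip(rng, 8000)
# The second "recording" has 0.5 s (12.5 blocks) of near-silence before
# the same content begins.
lead_in = [0.01 * rng.uniform(-1.0, 1.0) for _ in range(500)]
other = lead_in + reference

offset = find_offset_seconds(
    volume_envelope(reference, SAMPLE_RATE),
    volume_envelope(other, SAMPLE_RATE),
)
print(f"estimated offset: {offset * 1000:.0f} ms (true value 500 ms)")
```

The true shift here falls halfway between two blocks, so the raw peak alone could only report 480 ms or 520 ms; the parabolic step is what lets the estimate land in between.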

My original intent was to compensate for clock skew as well, because all these recording devices are using independent sample clocks that are running at slightly different rates due to manufacturing variation. There’s even code in the commit for a far more complex algorithm that can measure this clock skew. At the moment, this code is disused, for two reasons: none of our test clips actually showed appreciable skew, and PiTiVi doesn’t actually support changing the speed of clips, especially audio.
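For a sense of scale, the drift caused by a given clock mismatch is simple arithmetic. The sketch below is illustrative only: the parts-per-million figures and the one-hour duration are assumed values, not measurements from our test clips, and the function name is made up.

```python
def drift_ms(skew_ppm, duration_seconds):
    """Accumulated offset in milliseconds between two clocks that differ
    by skew_ppm parts per million, after duration_seconds of recording."""
    return skew_ppm * 1e-6 * duration_seconds * 1000.0


# Assumed example values, not measured ones.
for ppm in (1, 20, 100):
    print(f"{ppm:>3} ppm over one hour: {drift_ms(ppm, 3600):.1f} ms")
```

A mismatch of a few parts per million stays within a few milliseconds over an hour, comparable to the alignment error itself, while a mismatch in the tens of ppm would add up to something audible over a full concert.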

If you want to help, just stop by the PiTiVi mailing list or IRC channel. We can use more test clips, a real testing framework, a cancel button, UI improvements, conversion to C for speed, and all sorts of general bug squashing. For this feature, and throughout PiTiVi, there’s always more to be done. I’ve found the developer community to be extremely welcoming of new contributions … come and join us.

Transparent Video with GStreamer

So you wrote a script to generate an animation with ImageMagick or something. You have a folder full of transparent PNGs, one for each frame. Now you want to do some alpha-channel compositing in GStreamer (e.g. with PiTiVi). Instead of having to shove around a huge pile of PNGs, you want a single movie file that contains them all, and retains the transparency. So here’s what you do:

gst-launch-0.10 multifilesrc location=images%05d.png caps="image/png,framerate=1/1,pixel-aspect-ratio=1/1" num-buffers=95 ! pngdec ! videorate ! alphacolor ! "video/x-raw-yuv,format=(fourcc)AYUV" ! matroskamux ! filesink location=images_raw.mkv

Boom. I just saved you my last two hours.