Archive for the ‘Xiph’ Category

No, you can’t do that with H.264

Tuesday, February 2nd, 2010

A lot of commercial software comes with H.264 encoders and decoders, and some computers arrive with this software preinstalled. This leads a lot of people to believe that they can legally view and create H.264 videos for whatever purpose they like. Unfortunately for them, it ain’t so.

Maybe the best example comes from the Final Cut Pro license:

To the extent that the Apple Software contains AVC encoding and/or decoding functionality, commercial use of H.264/AVC requires additional licensing and the following provision applies: THE AVC FUNCTIONALITY IN THIS PRODUCT IS LICENSED HEREIN ONLY FOR THE PERSONAL AND NON-COMMERCIAL USE OF A CONSUMER TO (i) ENCODE VIDEO IN COMPLIANCE WITH THE AVC STANDARD (“AVC VIDEO”) AND/OR (ii) DECODE AVC VIDEO THAT WAS ENCODED BY A CONSUMER ENGAGED IN A PERSONAL AND NON-COMMERCIAL ACTIVITY AND/OR AVC VIDEO THAT WAS OBTAINED FROM A VIDEO PROVIDER LICENSED TO PROVIDE AVC VIDEO. INFORMATION REGARDING OTHER USES AND LICENSES MAY BE OBTAINED FROM MPEG LA L.L.C. SEE HTTP://WWW.MPEGLA.COM.

The text could hardly be clearer: you do not have a license for commercial use of H.264. Call it “Final Cut Pro Hobbyist”. Do you post videos on your website that has Google Adwords? Do you edit video on a consulting basis? Do you want to include a video in a package sent to your customers? Do your clients send you video clips as part of your business? Then you’re using the encoder or decoder for commercial purposes, in violation of the license.

Now, you might think “but I’m sticking with MPEG-4, or MPEG-2, so it’s not a problem for me”. No. It’s just as bad. Here’s the relevant section of the license:

13. MPEG-2 Notice. To the extent that the Apple Software contains MPEG-2 functionality, the following provision applies: ANY USE OF THIS PRODUCT OTHER THAN CONSUMER PERSONAL USE IN ANY MANNER THAT COMPLIES WITH THE MPEG-2 STANDARD FOR ENCODING VIDEO INFORMATION FOR PACKAGED MEDIA IS EXPRESSLY PROHIBITED WITHOUT A LICENSE UNDER APPLICABLE PATENTS IN THE MPEG-2 PATENT PORTFOLIO, WHICH LICENSE IS AVAILABLE FROM MPEG LA, L.L.C., 250 STEELE STREET, SUITE 300, DENVER, COLORADO 80206.
14. MPEG-4 Notice. This product is licensed under the MPEG-4 Systems Patent Portfolio License for encoding in compliance with the MPEG-4 Systems Standard, except that an additional license and payment of royalties are necessary for encoding in connection with (i) data stored or replicated in physical media which is paid for on a title by title basis and/or (ii) data which is paid for on a title by title basis and is transmitted to an end user for permanent storage and/or use. Such additional license may be obtained from MPEG LA, LLC. See http://www.mpegla.com for additional details. This product is licensed under the MPEG-4 Visual Patent Portfolio License for the personal and non-commercial use of a consumer for (i) encoding video in compliance with the MPEG-4 Visual Standard (“MPEG-4 Video”) and/or (ii) decoding MPEG-4 video that was encoded by a consumer engaged in a personal and non-commercial activity and/or was obtained from a video provider licensed by MPEG LA to provide MPEG-4 video. No license is granted or shall be implied for any other use. Additional information including that relating to promotional, internal and commercial uses and licensing may be obtained from MPEG LA, LLC.

Noticing a pattern? You have a license to use their software, provided you don’t make any money, your friends are also all correctly licensed, and you only produce content that complies with the MPEG standard. Using video for a commercial purpose? Producing video that isn’t within MPEG’s parameters? Have friends who use unlicensed encoders like x264, ffmpeg, or xvid? Too bad.

This last thing is actually a particularly interesting point. If you encode a video using one of these (open-source) unlicensed encoders, you’re practicing patents without a license, and you can be sued. But hey, maybe you’re just a scofflaw. After all, it’s not like you’re making trouble for anyone else, right? Wrong. If you send a video to a friend who uses a licensed decoder, and they watch it, you’ve caused them to violate their own software license, so they can be sued too.

Oh, and in case you thought this was specific to Apple, here’s the matching piece from the Windows 7 Ultimate License:

18. NOTICE ABOUT THE H.264/AVC VISUAL STANDARD, THE VC-1 VIDEO STANDARD, THE MPEG-4 VISUAL STANDARD AND THE MPEG-2 VIDEO STANDARD. This software includes H.264/AVC, VC-1, MPEG-4 Part 2, and MPEG-2 visual compression technology. MPEG LA, L.L.C. requires this notice:
THIS PRODUCT IS LICENSED UNDER THE AVC, THE VC-1, THE MPEG-4 PART 2 VISUAL, AND THE MPEG-2 VIDEO PATENT PORTFOLIO LICENSES FOR THE PERSONAL AND NON-COMMERCIAL USE OF A CONSUMER TO (i) ENCODE VIDEO IN COMPLIANCE WITH THE ABOVE STANDARDS (“VIDEO STANDARDS”) AND/OR (ii) DECODE AVC, VC-1, MPEG-4 PART 2 AND MPEG-2 VIDEO THAT WAS ENCODED BY A CONSUMER ENGAGED IN A PERSONAL AND NON-COMMERCIAL ACTIVITY OR WAS OBTAINED FROM A VIDEO PROVIDER LICENSED TO PROVIDE SUCH VIDEO. NONE OF THE LICENSES EXTEND TO ANY OTHER PRODUCT REGARDLESS OF WHETHER SUCH PRODUCT IS INCLUDED WITH THIS PRODUCT IN A SINGLE ARTICLE. NO LICENSE IS GRANTED OR SHALL BE IMPLIED FOR ANY OTHER USE. ADDITIONAL INFORMATION MAY BE OBTAINED FROM MPEG LA, L.L.C. SEE WWW.MPEGLA.COM.

Doesn’t seem so Ultimate to me.

My advice: use a codec that doesn’t need a license:

Q. What is the license for Theora?
Theora (and all associated technologies released by the Xiph.org Foundation) is released to the public via a BSD-style license. It is completely free for commercial or noncommercial use. That means that commercial developers may independently write Theora software which is compatible with the specification for no charge and without restrictions of any kind.

Semiconductor prediction: long term

Wednesday, December 30th, 2009

Disclaimer: I don’t know anything about this stuff. I mostly code Python.

Setup

Computer chips (that includes CPUs, GPUs, RAM, and solid-state storage) are expensive. They’re expensive because they’re made in ultra-high-tech lithographic fabs that cost over a billion dollars to build. The chips have to be expensive because the fab’s cost must be amortized in the limited amount of time before the next process shrink, at which point the old fabs are obsolete (or at least behind the times).

As many have noted, there’s a fundamental limit to how many more process shrinks we can have. The latest chips are being produced at a 32 nm process. If you’re willing to extrapolate out Moore’s law for transistor density, then in about 20 years we’ll be at 1 nm process, and further shrinkage is impossible because that’s about the size of a single atom. If you’re a pessimist you think progress will stop sooner, and if you’re an optimist you think we’ll reach that limit sooner, so either way, we’re likely to be mostly done within a decade or two.

As process improvement becomes incremental, the commercial lifetime of fabs will increase dramatically, and the same is true of semiconductor designs. An Intel Core i7 is mostly faster than a Pentium II because of process shrinks. Without changes in process, a single chip design could be sold until the dominant cost consideration is the raw materials themselves… and with high volumes, I suspect that even ultra high purity silicon isn’t all that expensive.

Prediction

The question then is: if silicon is cheap, what do you put on it? From my perspective, the answer is “everything”. Different kinds of tasks require very different kinds of functions, and the most efficient implementation of any task, measured in time or power, is always a special-purpose circuit. A general-purpose computer might be used for word processing, massively multithreaded relational databases, sound manipulation, 3D graphics, physics simulation, videoconferencing, or some obscure low-latency stream processing. These tasks are respectively most efficiently accomplished by a few general-purpose CPUs, a massive number of logic-focused processors, DSP with hardware FFT, a GPU pipeline, an enormously wide floating-point vector machine, codec-specific encode/decode/muxing, and an FPGA. A future main chip might contain all these functions on a single die. If you’re not using them, they don’t draw power, and so are “free”.

This trend has very much already started, to the point that the most dubious thing about this prediction may be calling it “long term”. Today, AMD/ATi, Intel, and VIA each sell a complete package of: a general-purpose CPU, a vector unit attached to the CPU, a GPU with a 3D pipeline, a huge vector unit accessible on the GPU, and codec-specific demux and decode for roughly 5 different codecs. Texas Instruments OMAP3 includes all that (though smaller), plus a DSP. Both AMD and Intel have vowed to combine their functions onto a single die in the near future. Sun has demonstrated the effectiveness of Niagara in certain workloads, and Tilera and Intel’s Larrabee have moved even further down the polycore path.

Software Architecture Implications

One remarkable issue apparent with the growth of GPUs is how hard it is to allocate resources in heterogeneous environments. I’m not aware of any operating system that actually attempts to schedule processes on graphics cards or DSPs. This lack of scheduling hasn’t been a terrible issue in practice … because software that makes use of these special-purpose processors is so hard to write that most of a user’s software doesn’t require it!

The best approach I’ve seen to making use of special-purpose hardware is the one beginning to bubble up in the Gstreamer project. Gstreamer is a semi-special-purpose dataflow framework that knows something about the nature of the data that is flowing. Specifically, it knows the type of the data, the series of high-level operations that are needed, and the available implementation of these operations. Soon, it will know something about the underlying hardware, and the costs of performing operations in various places. The goal, then, is to be able to say “overlay the text from this file, rendered in this font, over this video, and display to this screen”, and let gstreamer work how which system components should be responsible for which steps. For multimedia, this is exactly the right way.

I think this is the way forward: a framework in which one composes high-level operations on typed inputs. If this becomes popular enough, then we really will have a scheduling problem, which leads me to a prediction: gstreamer or its successor will be integrated with the kernel, and especially the scheduler. The only way a scheduler will be able to allocate these heterogeneous resources effectively is if it can see the detailed structure of the tasks themselves. It needs to see things like the relationship between different pieces’ realtime deadlines, and the different possible processor allocations for all running pipelines. This is especially important given the bizarre topologies that seem to be inevitable in designs like Tilera’s.

Software Politics Implications

Right now, integrated chips like TI’s OMAP are also among the worst offenders in the area of proprietary drivers and undocumented functionality. To a businessman focused on differential gains, this makes perfect sense, because there’s nothing a corporation hates more than commoditization. By keeping the public abstraction barrier high, the manufacturer raises the costs for others to reproduce its work, limiting the ability of competition to squeeze prices against manufacturing costs. As semiconductor production becomes increasingly commoditized, the incentive to hide the functioning of the chip becomes higher and higher.

The other problem is that even if the designs aren’t secret, the choice of which functions to implement has a huge impact on the choice of software. The example of the moment is the ubiquitous MPEG-4 accelerator chips. They implement the patented algorithms required to decode MPEG-4, requiring licensing fees both to produce and to use. They are largely undocumented, perhaps due to fear by their manufacturers that releasing documentation would be seen as encouraging patent infringement. As long as patents on software are enforced, some component of a complex chip will likely require a patent license to use. It’s not just video codecs, either: an obvious example of the moment might be a .NET/CLI bytecode acceleration unit.

At a higher level, there’s a political question about compatibility. Currently, CPUs (even of different architectures) are all essentially C processing units, and so most code written in C (or higher) will run across any of them with nothing more than a recompile (and often less than a recompile). As chips acquire more special-purpose functional units, making use of them is going to require something more than ISO C. If chipmakers don’t agree ahead of time to standard abstraction barriers (like OpenGL or VHDL), it could quickly become quite difficult to build an operating system that runs the same applications on multiple architectures. DSPs are already in this position, requiring hand-tuned assembler that’s different for every DSP. Moreover, manufacturers who fear commoditization will shun standards, rendering it almost impossible to use the whole chip without tying yourself to it. We can work around this, by providing high-level operators (e.g. a Theora decoder) with many different backends, but an enormous amount of labor will be required.

Open Issues

My number one unanswered question is: what will the memory topology look like? I’m fairly certain it’s going to look complicated, and severely nonuniform, but beyond that I’m stumped. My best guess is that the components of a chip will be wired together by an on-chip bus, with independent caches and maybe a few different main RAM banks.

As silicon gets faster though, I wonder if the latency to main memory will become unbearable. The only solution is to move the memory closer to the chips, onto the same die. IBM’s POWER designs seem to be headed this way already, with DRAM on the CPU. If you already have NUMA, then naturally you want the memory to be nearest to its functional unit… or perhaps the reverse! Maybe the future is widely separated functional units, each surrounded by its own RAM bank.

The really interesting question here, though, is what happens if nonvolatile RAM picks up. Will we have memory, storage, and processing, all mixed up in a single bank of identical chips? Maybe cache, RAM, and disk will be replaced by a continuum, from the bits that are closest, and therefore fastest, to the ones that are furthest away.

That would be cool.

Old bugs fixed

Friday, November 6th, 2009

In the process of testing Cortado on old operating systems, we discovered that using a recent compiler produced bytecode that wouldn’t run on Sun JDK 1.1. Instead, we got IllegalMonitorState exceptions in an infinite loop.

A little bit of searching made it clear that we weren’t the only ones who’d experienced this problem. There were reports going back to 2001 that Sun had introduced some sort of bug in their compiler in version 1.4. We verified that going back to an old compiler produced code that worked for us, again.

Today Greg Maxwell constructed a minimal test case and printed out the disassembled bytecode produced with old and new compilers. One difference stood out: the new compiler introduced a circular exception handler at the end of a synchronized block. I looked around, and sure enough, this behavior drew complaints when it first appeared over eight years ago.

Rather than attempt to convince the compiler authors that their code has a logical fallacy, or somehow fix ten-year-old versions of closed-source software, we instead decided to add a workaround into ProGuard, a bytecode post-processor that we are already using to shrink Cortado by 30% for faster downloads.

There’s an interesting question here as to what, exactly, the bug is. Is it a code generation bug, in which the compiler produces bytecode that will not run correctly on the Java 1.1 target? Or is it a JVM bug, exposed by newer compilers that make use of previously untested edge cases? This is a case of Software Development Relativity: the number of bugs is conserved, but their precise location depends on your reference frame.

Anyway, I think this is a nice short story about the power of an open development model. We found a bug somewhere in a complex system, and wound up putting a fix in the component whose maintainers, we hope, will be most receptive to it. When one avenue is cut off, open source finds another route.

Cortado

Monday, November 2nd, 2009

A project I’ve been playing with recently is Ogg Theora’s Cortado, a free video player designed to be able to run on an extremely wide variety of computers, including old, obsolete systems. How old, you ask?

Really old:

Screenshot of Cortado playing a video in SheepShaver

Screenshot of Cortado playing a video in SheepShaver

This is a picture of Cortado running on Mac OS 7.5.5, in the Macintosh Runtime for Java 2.0, playing the video from the FSF’s freedom testimonials campaign. This operating system was released in 1996. The system is emulated in SheepShaver, which makes playback far too slow to be usable. Someone will have to test on real hardware to see what happens.

Nonetheless, I think this is strong evidence regarding how serious we are about backwards compatibility and inclusive software. Serious, or at least, enthusiastic.