Semiconductor prediction: long term

Disclaimer: I don’t know anything about this stuff. I mostly code Python.

Setup

Computer chips (that includes CPUs, GPUs, RAM, and solid-state storage) are expensive. They’re expensive because they’re made in ultra-high-tech lithographic fabs that cost over a billion dollars to build. The chips have to be expensive because the fab’s cost must be amortized in the limited amount of time before the next process shrink, at which point the old fabs are obsolete (or at least behind the times).

As many have noted, there’s a fundamental limit to how many more process shrinks we can have. The latest chips are being produced at a 32 nm process. If you’re willing to extrapolate Moore’s law for transistor density, then in about 20 years we’ll be at a 1 nm process, and further shrinkage is impossible because a nanometer is only a few silicon atoms wide. If you’re a pessimist you think progress will stop sooner, and if you’re an optimist you think we’ll reach that limit sooner, so either way, we’re likely to be mostly done within a decade or two.

As process improvement becomes incremental, the commercial lifetime of fabs will increase dramatically, and the same will be true of semiconductor designs. An Intel Core i7 is mostly faster than a Pentium II because of process shrinks. Without changes in process, a single chip design could be sold until the dominant cost consideration is the raw materials themselves… and with high volumes, I suspect that even ultra-high-purity silicon isn’t all that expensive.

Prediction

The question then is: if silicon is cheap, what do you put on it? From my perspective, the answer is “everything”. Different kinds of tasks require very different kinds of functions, and the most efficient implementation of any task, measured in time or power, is always a special-purpose circuit. A general-purpose computer might be used for word processing, massively multithreaded relational databases, sound manipulation, 3D graphics, physics simulation, videoconferencing, or some obscure low-latency stream processing. These tasks are respectively most efficiently accomplished by a few general-purpose CPUs, a massive number of logic-focused processors, DSP with hardware FFT, a GPU pipeline, an enormously wide floating-point vector machine, codec-specific encode/decode/muxing, and an FPGA. A future main chip might contain all these functions on a single die. If you’re not using them, they don’t draw power, and so are “free”.

This trend has very much already started, to the point that the most dubious thing about this prediction may be calling it “long term”. Today, AMD/ATi, Intel, and VIA each sell a complete package of: a general-purpose CPU, a vector unit attached to the CPU, a GPU with a 3D pipeline, a huge vector unit accessible on the GPU, and codec-specific demux and decode for roughly 5 different codecs. Texas Instruments OMAP3 includes all that (though smaller), plus a DSP. Both AMD and Intel have vowed to combine their functions onto a single die in the near future. Sun has demonstrated the effectiveness of Niagara in certain workloads, and Tilera and Intel’s Larrabee have moved even further down the polycore path.

Software Architecture Implications

One issue made apparent by the growth of GPUs is how hard it is to allocate resources in heterogeneous environments. I’m not aware of any operating system that actually attempts to schedule processes on graphics cards or DSPs. This lack of scheduling hasn’t been a terrible problem in practice … because software that makes use of these special-purpose processors is so hard to write that most of a user’s software doesn’t require it!

The best approach I’ve seen to making use of special-purpose hardware is the one beginning to bubble up in the Gstreamer project. Gstreamer is a semi-special-purpose dataflow framework that knows something about the nature of the data that is flowing. Specifically, it knows the type of the data, the series of high-level operations that are needed, and the available implementations of those operations. Soon, it will know something about the underlying hardware, and the costs of performing operations in various places. The goal, then, is to be able to say “overlay the text from this file, rendered in this font, over this video, and display it on this screen”, and let Gstreamer work out which system components should be responsible for which steps. For multimedia, this is exactly the right approach.
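For concreteness, here’s roughly what that kind of request looks like in the GStreamer 0.10 Python bindings. This is a minimal sketch; the file path and overlay text are placeholders, but the elements are real:

    # Minimal sketch using the GStreamer 0.10 Python bindings (pygst).
    # The path and text are placeholders.
    import gobject
    import pygst
    pygst.require("0.10")
    import gst

    gobject.threads_init()

    # Say *what* should happen; decodebin2 and autovideosink let
    # GStreamer pick the concrete decoder and output for this machine,
    # which is the seed of the hardware-aware dispatch described above.
    pipeline = gst.parse_launch(
        'filesrc location=/path/to/video.ogv ! decodebin2 '
        '! textoverlay text="Hello" font-desc="Sans 24" '
        '! ffmpegcolorspace ! autovideosink'
    )
    pipeline.set_state(gst.STATE_PLAYING)
    gobject.MainLoop().run()

Nothing in that description says CPU or GPU; the choice is left for the framework (and someday, the scheduler) to make.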

I think this is the way forward: a framework in which one composes high-level operations on typed inputs. If this becomes popular enough, then we really will have a scheduling problem, which leads me to a prediction: Gstreamer or its successor will be integrated with the kernel, and especially with the scheduler. The only way a scheduler will be able to allocate these heterogeneous resources effectively is if it can see the detailed structure of the tasks themselves. It needs to see things like the relationship between different pieces’ realtime deadlines, and the different possible processor allocations for all running pipelines. This is especially important given the bizarre topologies that seem to be inevitable in designs like Tilera’s.
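To see why the scheduler needs the whole picture, consider a toy placement problem. This is purely hypothetical, with made-up stage names, processor names, and costs; no real scheduler exposes anything like it:

    # Purely hypothetical toy: place pipeline stages onto heterogeneous
    # processors. All names and numbers below are invented.
    from itertools import product

    stages = ["decode", "overlay", "display"]

    # costs[stage][processor] = estimated ms per frame (made-up numbers)
    costs = {
        "decode":  {"cpu": 8.0, "codec_asic": 1.0},
        "overlay": {"cpu": 2.0, "gpu": 0.5},
        "display": {"gpu": 0.2},
    }
    transfer_ms = 1.5  # invented cost of handing a frame between processors

    def total_cost(placement):
        ms = sum(costs[s][p] for s, p in zip(stages, placement))
        ms += transfer_ms * sum(a != b for a, b in zip(placement, placement[1:]))
        return ms

    best = min(product(*(sorted(costs[s]) for s in stages)), key=total_cost)
    print("%s at %.1f ms/frame" % (best, total_cost(best)))

The cheapest home for the overlay stage depends on where decode and display land, because moving data between units costs time; a scheduler that sees only opaque threads can’t even pose the question.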

Software Politics Implications

Right now, integrated chips like TI’s OMAP are also among the worst offenders in the area of proprietary drivers and undocumented functionality. To a businessman focused on differential gains, this makes perfect sense, because there’s nothing a corporation hates more than commoditization. By keeping the public abstraction barrier high, the manufacturer raises the costs for others to reproduce its work, limiting the ability of competition to squeeze prices against manufacturing costs. As semiconductor production becomes increasingly commoditized, the incentive to hide the chip’s inner workings only grows stronger.

The other problem is that even if the designs aren’t secret, the choice of which functions to implement has a huge impact on the choice of software. The example of the moment is the ubiquitous MPEG-4 accelerator chips. They implement the patented algorithms required to decode MPEG-4, requiring licensing fees both to produce and to use. They are largely undocumented, perhaps because their manufacturers fear that releasing documentation would be seen as encouraging patent infringement. As long as patents on software are enforced, some component of a complex chip will likely require a patent license to use. It’s not just video codecs, either: another obvious candidate is a .NET/CLI bytecode acceleration unit.

At a higher level, there’s a political question about compatibility. Currently, CPUs (even of different architectures) are all essentially C processing units, and so most code written in C (or higher) will run across any of them with nothing more than a recompile (and often less than a recompile). As chips acquire more special-purpose functional units, making use of them is going to require something more than ISO C. If chipmakers don’t agree ahead of time on standard abstraction barriers (like OpenGL or VHDL), it could quickly become quite difficult to build an operating system that runs the same applications on multiple architectures. DSPs are already in this position, requiring hand-tuned assembler that’s different for every DSP. Moreover, manufacturers who fear commoditization will shun standards, rendering it almost impossible to use the whole chip without tying yourself to it. We can work around this by providing high-level operators (e.g. a Theora decoder) with many different backends, as sketched below, but an enormous amount of labor will be required.
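Here is what such an operator might look like — a hypothetical sketch, with every class and probe invented for illustration:

    # Hypothetical sketch of a high-level operator with many backends.
    # Every name below is invented; only the dispatch pattern matters.
    class TheoraDecoder(object):
        _backends = []  # (priority, probe, factory) tuples

        @classmethod
        def register(cls, priority, probe, factory):
            cls._backends.append((priority, probe, factory))
            cls._backends.sort(key=lambda b: b[0], reverse=True)

        @classmethod
        def create(cls):
            for priority, probe, factory in cls._backends:
                if probe():  # is this chip or library actually present?
                    return factory()
            raise RuntimeError("no Theora backend available")

    # Porting to a new chip means adding one registration, not a rewrite:
    #   TheoraDecoder.register(10, dsp_is_present, DspTheoraDecoder)
    #   TheoraDecoder.register(0, lambda: True, SoftwareTheoraDecoder)

The labor hides in the factories: someone still has to write a backend for every DSP.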

Open Issues

My number one unanswered question is: what will the memory topology look like? I’m fairly certain it’s going to look complicated, and severely nonuniform, but beyond that I’m stumped. My best guess is that the components of a chip will be wired together by an on-chip bus, with independent caches and maybe a few different main RAM banks.

As silicon gets faster, though, I wonder if the latency to main memory will become unbearable. The only solution is to move the memory closer to the chips, onto the same die. IBM’s POWER designs seem to be headed this way already, with DRAM on the CPU. If you already have NUMA, then naturally you want the memory to be nearest to its functional unit… or perhaps the reverse! Maybe the future is widely separated functional units, each surrounded by its own RAM bank.

The really interesting question here, though, is what happens if nonvolatile RAM picks up. Will we have memory, storage, and processing, all mixed up in a single bank of identical chips? Maybe cache, RAM, and disk will be replaced by a continuum, from the bits that are closest, and therefore fastest, to the ones that are furthest away.

That would be cool.

Semiconductor predictions: near term

The great thing about being home with nothing to do is that I finally have time to write down the stuff that’s been in my head. The great thing about a weblog is that I have someplace to write it, regardless of whether anyone is reading.

The inspiration for this post is a conversation I had on Friday with a friend who is working on Spin Torque Memory. He told me, though I can’t quote him precisely, that it will shortly be as fast as SRAM, as durable and easily addressed as DRAM, as nonvolatile as Flash, and more power-efficient than any of the above. Bit density also seemed to be very high, and the low power draw means that stacking should be possible without heat-sinking issues.

Spin Torque has some competition, from techniques like Phase-Change Memory, which has similar properties and is already available in 64 MB chips. I’m not claiming to be able to pick a winner here. All I’m saying is: someone is going to win this race, and we are going to be flooded with memory that is both wicked fast and absolutely permanent.

Non-volatile, infinitely rewritable memory that’s as fast as RAM is a really big deal for operating systems. Operating systems are necessarily designed around a particular class of computing hardware. OS designers must choose their abstractions to balance the psychology of programmers and users against the structure of the machines. Each shift in the data storage mechanism, from punched cards to drums to core to tape to disk, has inspired a new generation of operating systems and programming languages. Non-volatile memory is the next great shift.

The key reason that non-volatile memory will cause a revolution in operating system design is that it ends the need for a distinction between RAM and disk. Programmers especially are so acclimated to this distinction that we easily forget how artificial it is. We have separate RAM and disk only because disk is too slow and DRAM is volatile.

What will the revolution look like? Initially, it will be invisible. When capacities are large enough, nvRAM will appear behind trivial block device emulation chips (no wear-leveling required!), speaking SATA. When it gets cheap, fast, and low-power enough, it will show up in DIMMs, promising to save you the idle wattage of your memory bank.

At some point boardmakers will tire of buying two chips where one will do, and so the RAM and disk will be fused into a single shared memory bank, with allocation configurable in the firmware. The operating system will still see them as entirely separate, because hardware always moves ahead of software.

Not too far ahead, though, especially with free software. Some manufacturer interested in the Linux market will add a firmware switch to disable zeroing the “RAM” on shutdown, allowing the kernel to address the whole bank over the memory bus. With a few patches that are quickly mainlined, Linux will boot without access to anything but RAM, with userland stored in a (renamed) tmpfs.

The fun begins here. It will start with increased use of mmap(). I can’t tell you where it goes from there. Backwards compatibility dictates that none of the current interfaces are likely to disappear, but I bet they’ll be used in different ways. Programming languages where pointer values are file paths? File systems whose interface looks more like a SQL database, or Java objects? A merger between SLUB and tmpfs? Software that describes input and output by pointers, and has no notion of files?
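The mmap() style already works today, of course; here’s a trivial sketch (the filename is a placeholder) of a counter that persists across runs without a single read() or write() call:

    # Minimal sketch: a persistent counter via mmap, no explicit I/O calls.
    # The filename is a placeholder. On nvRAM the same code would be pure
    # memory access, with no disk shuffling behind the curtain.
    import mmap
    import os
    import struct

    path = "counter.dat"
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(b"\x00" * mmap.PAGESIZE)  # one page of zeroes

    with open(path, "r+b") as f:
        mem = mmap.mmap(f.fileno(), 0)  # the whole file, as bytes in memory
        (count,) = struct.unpack_from("<I", mem, 0)
        struct.pack_into("<I", mem, 0, count + 1)
        print("run number %d" % (count + 1))
        mem.flush()
        mem.close()

Today the kernel quietly pages that file in and out behind the scenes; on fast nvRAM, the mapping could simply be the storage.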

It’s going to be a wild time… and it might happen sooner than you think.

Movie Review: Avatar (3D)

I saw Avatar on Friday with family. I was pleasantly surprised. Within even the broadest construal of its category, it is unmatched.

The movie has been analyzed so extensively, everywhere, that I have very little to add. The plot is deliberately and utterly predictable, but then, this is from the director of Titanic. As a friend put it last night, “he’s not M. Night Shyamalan”. There’s also not a hint of moral ambiguity in the whole film. Every character is unalloyed Good or Evil. Maybe the sequels will make things more interesting.

The story takes place on Pandora, established throughout as a moon of a blue gas giant. While Pandora is in fact a moon of Saturn, the name seems to be more like an oblique reference to Europa, which has for decades been considered the most likely place other than Earth to harbor life. Of course, life in Europa would have to be aquatic, which doesn’t make for such a good movie, and neither Saturn nor Jupiter is blue.

Bus

I scheduled an experiment today, and there was still plenty of snow on the ground, so I walked into Central Square to catch Harvard’s free shuttle bus, the M2, which runs to the Medical Area. The bus comes roughly every half hour.

The bus driver would have gone right past the stop had it not been a red light. He didn’t pull over, and it was only when I waved my student ID card that he opened the door. I got on, swiped my card and looked … at an empty bus.

Me: Not too many people riding today, eh?
Him: You do know that bus service stopped at 1 right?
Me: What?
Him: Yeah.
[by this time we are well down Mass Ave., exiting Central Square]
Him: Where are you going?
Me: Longwood?
Him: I’ll take you there.

That’s how I ended up taking the world’s most convenient, least fuel-efficient, free taxi ride.

I felt kind of bad about taking the driver, and his bus, out of their way, until he got off at the same stop I did and left the bus parked.

Scheduling

It’s Tuesday. In two weeks I’ll be in Salt Lake City, on arguably my first trip to a scientific conference thing. It’s not a real conference, though. It’s the National Alliance for Medical Image Computing Work Week and All Hands Meeting, which is quite a mouthful, and in summary means that it’s like a conference except you don’t have to have done anything yet.

I’d like to have a bit more done before then anyway, though, and this makes for a bit of a scheduling issue. In the next 13 days I have to accomplish my scientific goal, drive home to see friends and family, then drive back to Boston to catch a plane, and somehow get my car un-crunched despite Christmas, New Year’s, and the need to drive to and from Connecticut.

Wish me luck.

Snowstorm

I’ve spent the past month working on a technique to extract motion parameters from ultrasound measurements, even when those measurements provide, at first glance, no relevant information at all. My approach has slowly gotten more sophisticated over time, and now employs a shift-insensitive distance metric, SMACOF MDS in four dimensions, a very specific sparsity structure, global optimization of piecewise quadratics, and a cosmological constant.

Anyway, it works, which means I can move forward to the next phase of my plan. Wait, what was my plan?

BTW, it’s snowing.

Blu-Ray

Tonight I watched my first Blu-Ray movie, though I didn’t realize it until just now. It was Kingdom of Heaven, a crusader epic. Blu-Ray carries far more detail than a DVD, and this movie, with sprawling epic battles and flyovers of an ancient city, is the poster child for a sharper format. We were watching on maybe a 50-inch screen.

It was good. I think the experience really did benefit from the tremendous visual quality. Not a lot, though; if it’s not on your mind, you may not notice. Had I not seen the box, I would almost certainly not have realized, and even having seen the box, the fact didn’t reach my consciousness until half an hour and three miles later.

Whoosh

My desk is in a classic scientific laboratory with slate counters and vents for things like Bunsen burners. It also has a fume hood, which is a powerfully ventilated enclosure for performing experiments that can produce harmful vapors.

Every few days, and sometimes more than once in a day, the fume hood in my lab goes funny. It starts pushing through more and more air, making a louder and louder whooshing sound until it’s a deafening roar. Then it stops roaring, and starts beeping, similarly loudly. After a minute or two it stops beeping, starts roaring again, and then goes back to normal. This is actually not so unusual; most friends who’ve worked in labs with fume hoods report similar poltergeists.

What’s particularly weird in my lab, though, is that every time this happens, we all lose our ethernet connection for about a minute.

Our building has some issues.

Party season

Friday night was the annual Radiology department holiday party, in the usual extravagant setting, with the usual band and food. This year’s terrible song performance by the chair of the department was “My grant”, to the tune of “My girl” (nominally, since no tune was in fact present).

Yesterday I went to a Hanukkah party (billed as a celebration of “cchhhhhhhhhhhhhanukka”) featuring marvelous sweet-potato latkes, followed by a perfect re-enactment of an MIT frat party, complete with most of the 2008 class of ATO. It was a cold night for biking around Boston, but properly suited up (wearing 14 articles of clothing) it felt quite mild.

Crunch?

I moved my car from one side of the street to the other this morning, to avoid tomorrow’s street cleaning, and when I got out, I noticed an enormous dent. Someone, or something, had put a major dent, spanning from the fuel-tank lid to the rear taillight, into my car. I have no idea when this happened, but it was sometime after I got back from Thanksgiving, while the car was parked. No note was in evidence.

I called up my insurance company, who told me to bring it to their Woburn location for appraisal. There I learned that, since the denter is unknown, my full collision deductible of $1000 would apply. Ugh. The appraiser was apparently feeling generous, though, and ruled that the dent was most likely not caused by a vehicle. Non-collision damage has a deductible of $300.

I spent basically all day dealing with this, and I’ll get to spend more time on it next week, finding a body shop and then bringing the car back in for repair verification. I don’t understand how anyone can hold down a 9-5 job, when bad surprises like this happen all the time.