I got pretty lost after going to the bike shop yesterday, and wound up wandering, lost, around Southern Roxbury in the freezing cold. It doesn’t seem to be a very pleasant area, at least for biking, but maybe it would have seemed less hostile if I’d known where I was and could feel my fingers.

Now I’m sneezing up a storm and trying to avoid shaking hands. Curse you, winter.


I’m amazed by the intricacy of everyday life.

I ran out of socks. I was woken up this morning about 7 AM, by birds chirping outside my window. This is going to happen all spring, if last year was any indication. I even went to sleep with earplugs in, but these earplugs are not quite comfortable, with the odd effect that I inevitably wake up in the middle of the night in intense pain and have to remove them.

So I woke up early, which was good because I had a lot to do. The laundromat downstairs opens at 7:30, and I was down to my last pair of socks. I wouldn’t be able to do laundry this evening because of VoiceLab rehearsal until 10 PM, so I put on some arbitrary clothes, ate some cereal, and walked into the laundromat at 7:35. I started the wash, which took 26 minutes, enough time for me to go upstairs, take a shower, put on my real set of clothes and pack a bag. Planning ahead for rehearsal, I packed my sheet music and a snack (there would be no time for dinner). Then I went to the laundromat again, moved my clothes into a dryer, went upstairs again and took down my bicycle and bag.

My bicycle’s brakes had been acting up, so I oiled the front brake, and also took a look at the pads. They were obviously shot; I would need to get new pads immediately. I biked to the bank to deposit some utility cheques, then biked back home, took my laundry out of the dryer, hung up my dress shirts, picked up my full set of Allen keys, and biked off for my meeting.

I arrived at my 9:30 meeting, at the radiology department annex in Brookline Village, almost on time, and while I was there looked up the closest bike shop. It appeared to be Revolution Bicycle Repair, so I set off in that direction. At the bike shop, I bought a full complement of brake blocks, and used my Allen keys to replace the front brakes. I then biked to my actual office, with a working front brake.

None of this is particularly unique or unusual, and that’s precisely my point. Just getting through a morning requires a level of advanced planning and scheduling that borders on Operations Research. People do this stuff every day. They think of it as completely routine, and it is. It also makes me think that just about every person has considerable innate capacity for deductive logic, more than enough to be a scientist, engineer, or mathematician.


Last night I went to the ICCA Quarterfinals at Wellesley to watch the Chorallaries perform. It could hardly have been a more familiar event, since I participated in the 2005-6 quarterfinal that was held in precisely the same room. The groups were all decent, and a few were really great. The Chorallaries took 2nd place, which means they’ll be singing at the semifinals. The Chorallaries compete in the ICCA competition every third year, and this is at least the third consecutive time that they have made it to semifinals, which have been held at MIT’s Kresge Auditorium every time.


There’s not much else to report. Work is going well. It’s been raining all day, which suggests that we are edging out of winter and into spring. I’d like to get home to finish rebuilding my bicycle, but it won’t be next weekend, because its the annual Biophysics recruiting weekend (didn’t we just have one of those? worrying…) and also Bad Taste!


I spent some time this weekend at an OpenStreetMap mapping party. OpenStreetMap is already pretty great in the Boston area, thanks to the MassGIS data, produced by the state government and made freely available, so there wasn’t much left to be done. I added this path down the center of Comm Ave though, including all the memorials. I did this by hanging a GPS around my neck, biking up and down the length of the path, then biking back and tracing the GPS’s recorded route.

Take that, Google.

How Not to Not Sort by Average Rating

My friend Chris pointed out two webpages recommending algorithms for dealing with user-added ranking systems: Bayesian Rating – How to implement a weighted rating system and How Not to Sort by Average Rating. They’re both very problematic, and so I decided to write a response.

Both pages contain a clear description of the problem. For example, Bayesian Rating says

the artworks in TheBroth gallery are visitor rated, using a rather simple + and – system. If you like an item, rate it “plus”. If you don’t like it, give it a “minus”.

The rating of an item would then be: number of positive votes divided by number of total votes. For example, 4 + votes and 1 – vote would correspond to a rating of 0.8, or 80%.

Now if you want to rank the items based on this simple equation, the following happens:

Assume you have on item with a rating of 0.93, based on 100s of votes. Now another new item is rated by a total of 2 visitors (or even just one), and they rate it +. Boom, it goes straight to #1 position in the ranking, as its rating is 100%!

That’s an interesting problem. What does Bayesian Rating recommend as a solution?

Bayesian Rating is using the Bayesian Average. This is a mathematical term that calculates a rating of an item based on the “believability” of the votes. The greater the certainty based on the number of votes, the more the Bayesian rating approximates the plain, unweighted rating. When there are very few votes, the bayesian rating of an item will be closer to the average rating of all items.

Use this equation:

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)

Now, I’m no expert in Bayesian statistics, but I do know Bayes’s Theorem, and this isn’t it. I suspect you can get there from here, by assuming that all these distributions are normal, the prior is also normal, and their variances are particular carefully chosen values… but this is make-believe math. The result, in fact, is a reasonable formula, but it’s not correct Bayesian reasoning.

What’s even funnier, though, is How Not to Sort’s answer:

CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

Say what: We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the “real” fraction of positive ratings is at least what? Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by:

\left(\hat{p}+\frac{z_{\alpha/2}^2}{2n}\pm z_{\alpha/2}\sqrt{[\hat{p}(1-\hat{p}) + z_{\alpha/2}^2/4n]/n}\right)/(1+z_{\alpha_2}^2/n).

Now the Wilson Score Confidence Interval is an interesting beast, and not irrelevant. It’s worthwhile seeing how we get there, since it’s completely unstated in How Not.

Suppose that I ask you to compute rankings of a large number of items M, given that each item i has been rated


times, and of those ratings,


were positive. I do not give you any more information for computing your rankings, which is good, because otherwise you could end up doing something crazy like MovieLens, which uses raters’ past ratings to predict future ratings by detecting similarities between movies and between raters.

The obvious assumption to make, then, is that the probability that a rater rates item i positively is some constant, unknown


, and if an infinite number of raters were available,


would converge to


. Unfortunately, we only have finitely many ratings, so instead we observe a binomial distribution:

P[k_i = \kappa | N_i, p_i] = \mathrm{Binom}(\kappa;N_i,p_i) = p_i^\kappa(1-p_i)^{N_i-\kappa} {N_i\choose\kappa}

. Our goal, then, is to somehow estimate


given the data we have. This is the standard case for Bayesian Estimation.

Bayesian Estimation works by constructing a probability distribution for the parameter we seek to estimate by multiplying together a Conditional Probability (the binomial above) and a Prior probability. In this case, the Prior is some probability density function, call it D(p). D is the probability density of getting any particular


. It’s a normalized probability distribution, and since all


have to be between 0 and 1, D is only nonzero between 0 and 1. Note that by adopting any such prior, we are assuming that all the items we are rating are drawn from some single distribution. This is perfectly acceptable, since we have no information that allows us to distinguish between these items. In this case, Bayes’s Theorem tells us that

P[p_i = q | N_i, k_i] = D(q)\cdot\mathrm{Binom}(k_i;N_i,q)

. This is called the posterior distribution.

To get the Wilson Confidence Interval noted above, we have to take D() to be a “flat prior”, i.e. D(q) = 1 for all q in the domain (0 to 1). A flat prior means that we have no reason to expect any value for


to be more likely than any other value. Given a flat prior, the posterior distribution is the same as the conditional distribution, i.e. a binomial. The confidence interval formula actually just computes percentiles into the binomial distribution.

So given this information, what does How Not recommend?

I would pick 0.05.

In other words, replace each set of ratings by the 5th percentile estimate. This choice seems fairly arbitrary, and far from the best estimator for


(that would be the 50th percentile). Where does it come from? By choosing the 5th percentile, we produce lower estimators for wider distributions, which correspond to those with fewer ratings. The effect, then, is similar to choosing a prior in which smaller values are much more probable than larger values, but it’s hard to solve backwards to figure out exactly what prior we’re effectively using. We might also be able to reproduce Bayesian Rating‘s result, approximately, by some particular choice of Normal prior, centered at the mean of all ratings.

If you have a strong feeling about what the prior probability distribution D should be, then you’re pretty much done. Bayes’s Theorem gives you the Posterior distribution on


, and all you have to do is pull out a value from it. The most popular way to do this is Maximum Likelihood Estimation, which, as the name implies, means just picking the value that gives you the highest posterior probability density. Symbolically, MLE means choosing

\bar{p_i} = \arg\max_q D(q)P[k_i = \kappa|N_i,q]


What if you don’t have a strong feeling about D? Then we need to estimate D somehow. Sometimes when you have a hammer, you really want to smash stuff, so we can simply apply Bayesian Estimation again, this time to D itself. To do so, we have to introduce a hyperprior E, that is, a prior probability distribution whose domain consists of all possible probability distributions. The conditional probability is now the probability, given D, that the entire dataset would be generated:

P[\{k_i\} | \{N_i\}, D] = \prod_{i=1}^M P[k_i | N_i, D] \\= \prod_{i=1}^M \int_0^1 P[k_i | N_i, p_i] D(p_i)dp_i


We then choose D by MLE again:

\bar{D} = \arg\max_\Delta E(\Delta)\prod_{i=1}^M \int_0^1 \mathrm{Binom}(k_i;N_i,p_i)\Delta(p_i)dp_i

. We can then use


as our prior to compute



Of course, we have now just gotten ourselves back to the same problem: we have to choose a hyperprior. Hyperpriors are pretty cool, though, because they allow us to make some very general statements about what kinds of priors we find acceptable. For example, the hyperprior could enforce smooth priors, or weak (i.e. nearly flat) priors, or (truncated) Normal priors, or any other class of priors that fits with your intuition about your particular rating problem.

We also have the option of choosing a totally flat hyperprior, meaning that E(d) is a constant regardless of its argument, and all prior distributions are equally likely a priori. This simplifies the MLE… but only slightly. What we’re left with is unfortunately not a very friendly expression, and I have no reason to expect that a closed-form solution exists in general (and many hours’ worth of failed attempts). In fact, the product-of-integrals formulation looks to me a bit like a continuum analogue of the Satisfiability problem, which is NP-hard.

Given that the flat-hyperprior MLE problem is hard, we can take one step back and perform what I call “Large Likelihood Estimation”: pick a value for
D that achieves a high posterior probability. For example, a decent choice might be

D(q) = \sum_{i=1}^M\frac{1}{M}\delta\left(q-\frac{k_i}{N_i}\right)

. This prior simply says that each rating should be one of the observed ratings. As such, it’s unfortunately rather trivial, equivalent to a flat prior on this dataset. However, it’s a good starting point for better priors, which can be found by optimizing the likelihood function while varying the coefficients and positions of the delta functions, using something like Nonlinear Conjugate Gradients. In fact, I have a hunch that this may actually produce the globally optimal prior.

Anyway, the point of this ramble is that something as simple as interpreting +/- ratings can wind up tremendously complicated. It’s complicated even though we actually made some rather strong simplifying assumptions: ratings are binomial,


is completely independent from


, all items are interchangeable… imagine the complexity if everything were on a 5-star scale!


Last night was my first rehearsal with VoiceLab. It was lovely, but also shocking. (People who know me will not be surprised to learn that I misremembered the time and arrived precisely an hour late.)

VoiceLab rehearsal felt exactly like Chorallaries rehearsal. OK, not exactly, but really, really familiar. Both groups have about 16 members, one of whom is the Music Director (runs rehearsal, gives musical guidance) and one of whom is the President (takes care of administrative duties, including planning the group retreat). The parallels seem to continue into less critical roles like Librarian. The personalities in the group make me feel right at home. Both tend to shuffle people among parts just a little for each song, assign percussion pretty much arbitrarily, and seem to have similar auditions processes, internal and external. Both sing arrangements in a similar style, of similar songs, written by the membership. In fact, two of the songs for which I was given sheet music at VoiceLab are songs that I already know well because the Chorallaries performed them!

There are a few differences. The biggest one is also the subtlest: almost every VoiceLab member seems to have been in an a cappella group for years as an undergraduate, and so we are collectively tremendously experienced. There are many things that take years to learn in a cappella, simple things like holding a dissonant note while listening to the rest of the chord, creating dramatic dynamics without drowning out the soloist, and breathing without breaking the flow. For VoiceLab, this stuff is child’s play.

The experience shows in the arrangements, too. Inexperienced arrangers (myself included) often write arrangements that are both too simple to be musically interesting and so faithful to the original performance that they are impossible to sing. VoiceLab has many experienced arrangers, and I am tremendously impressed with the songs I’ve learned so far. Between the musical experience of the group and the restrained style of arrangement, we were able to learn new music from scratch in a matter of minutes. Part of this is simply the maturity of the group; it’s a welcome relief to be in rehearsals that aren’t constantly being derailed.

VoiceLab practices once a week, half as often as the Chorallaries. (Undergraduate a cappella seemed to take as much time as a varsity sport, impossible for Real People.) Rehearsals are Wednesdays, which is also Karaoke night at the Harvard pub (curiously named The Queen’s Head). We went there after rehearsal and… they’re really great people.

This was a good idea.

Recipe for Applesauce


  • 5 red delicious apples

Try to eat one apple. Discover that they’re yucky, grainy and kind of mushy, maybe from sitting in the basement refrigerator for way too long. Hypothesize that substandard apples might make acceptable applesauce. Wash 4 apples, chop into eighths, and remove cores. Do not peel. Place into a pot, with a bit of water to avoid burning at the lower contact surface. Bring to low boil until apples are completely mushy, about 20 minutes. Place mushy apples into blender. Do not remove skins. Blend until blendy. Risk waking neighbors by blending at midnight.

Makes a bit over 3 cups of sweet, syrupy applesauce, with an unusually fine texture (except for the tiny flakes of peel that the blender couldn’t cut) and deep ruddy color.

Best served hot.


Last night I went out to Karaoke with a few people from VoiceLab. We went to Do Re Mi in Allston, which appears to be a Korean Karaoke place. In Asian-style Karaoke, one arrives with a group of friends and rents a (soundproof) room, at some hourly rate. Each room has a Karaoke machine. At Do Re Mi, each room also came with several enormous binders, with labels like “English by Title” or “Filipino, Thai, Vietnamese, and Other Languages by Artist”. The total catalogue could easily have exceeded 100,000 songs.

Do Re Mi doesn’t serve food, or permit alcohol, but you can bring any other food or drink you like into your room. The setup is really quite pleasant, with decent mics (SM58s) and big plasma screens. The song catalogue must be updated very frequently, since it already had 2008’s big hits like Coldplay’s Viva la Vida.

It was a fun experience, made better by the great snacks and voices. It ended up costing about $7/(person hr), with about 9 people in the room (more than 12 would probably be too crowded, but maybe they have bigger rooms). If you happen to find yourself with a small crowd of talented pop singers and have a few hours to spend, I recommend it.


You read it here first: I have joined VoiceLab, Harvard’s only (?) graduate student a cappella group. There were auditions, and they let me in, which is nice of them.

I know very little about the group, but it seems to be really fun, talented people with great taste in music.

My first concert will be May 1, 2009 at 8 PM in Paine Hall.


Atlantic Powdercoating finished with my bicycle in mid-January, but I couldn’t pick it up until Friday. They’re only open standard business hours, and they’re in Milford, CT, so I had to miss a day of work to go get it.

The frame and fork look beautiful. They truly look like like brand-new parts. I wish I’d had the foresight to take more pictures of this bicycle before, so that I could show a comparison, but the change is dramatic.

I took a lot of pictures of the restoration work. It’s only half-done, or maybe less, but I’ll try to put together some kind of photo-essay at some point.