Imagine a person, looking ordinary and dressed in street clothes, seated in front of a glossy black glass table. Perhaps it is a woman. She sits calmly, waiting to have a conversation with you. You sit down as well, and she places her hands on the table in front of her.
The table reacts, lighting up with the laser-chromatic glow of a multi-touch surface. Ten circular outlines track the contact points of her fingers. Her fingertips slide across the Teflon surface in a complex pattern, tracing a deliberate course through constantly evolving local color fields, and from a speaker below the surface you hear “Hey! How’ve you been!” She smiles.
What I am imagining is a parametric human speech synthesizer. Most people don’t need help speaking, but for those with dysphonia the best we currently offer are text-based synthesizers controlled by a computer keyboard. That’s fine for basic communication, but it will never really sound natural. True speech is brimming with inflections and accents, created by endless subtle variations in pitch, timbre, volume, and timing. Written language alone does not approach the bandwidth of actual speech, not even using every character in the phonetic alphabet. It is only with the latest, or perhaps the next, generation of multi-touch surfaces that we finally have the ability to capture over a dozen different simultaneous degrees of freedom, which I think might be enough for decent speech control.
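To make the idea concrete, here is a minimal sketch of how those degrees of freedom might be assigned. Everything here is hypothetical (the parameter names, the finger assignments, and the frequency ranges are my own illustrative choices, not a real device's API): each finger contributes an (x, y) position, so ten fingers give twenty simultaneous continuous controls, which a synthesizer could map onto pitch, volume, and formant frequencies.

```python
from dataclasses import dataclass

@dataclass
class Touch:
    x: float  # position across the surface, normalized to 0..1
    y: float  # position up the surface, normalized to 0..1

def touch_to_params(touches):
    """Map up to five touches to speech-synthesis parameters.

    A toy assignment: the first finger controls fundamental pitch,
    the second controls volume, and the remaining three control
    formant frequencies F1-F3 (i.e. timbre). A real instrument would
    use all ten fingers and far subtler mappings.
    """
    params = {}
    if len(touches) > 0:  # finger 1: fundamental frequency, 80-400 Hz
        params["pitch_hz"] = 80 + touches[0].y * 320
    if len(touches) > 1:  # finger 2: loudness, 0..1
        params["volume"] = touches[1].y
    # fingers 3-5: formants, spanning rough vowel frequency ranges
    formant_ranges = [(250, 900), (600, 2500), (1700, 3500)]
    for i, (lo, hi) in enumerate(formant_ranges):
        if len(touches) > 2 + i:
            params[f"formant_f{i+1}_hz"] = lo + touches[2 + i].y * (hi - lo)
    return params

# Example: five fingers resting at the vertical midpoint of the surface
touches = [Touch(x=0.1 * i, y=0.5) for i in range(5)]
print(touch_to_params(touches))
```

The point of the sketch is only to show the bandwidth argument in code: even this crude five-finger mapping yields five independent, continuously varying parameters updated on every frame, something no keyboard-driven text synthesizer can express.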
It turns out there’s a word for this class of device. What I’m describing could be called a 21st-century Voder. The Voder was developed at Bell Labs in the 1930s and had 15 keys, a wrist pad, and a foot pedal. With modern voice models and predictive text AI, we should be able to do far better.
I don’t expect a device of this sort to be built any time soon, although the fundamentals are pretty much in place. The market is too small, the equipment too expensive, and the learning curve too steep.
That’s too bad, because it would be totally sweet to watch in action.