Mad Dog 21/21: Her Master’s Voice
March 28, 2016 Hesh Wiener
Someday there may be a portrait of Siri, the invisible aide of Apple’s users; an image of Google’s nameless virtual servant; an illustration of Amazon Echo’s helpful Alexa; or a graphic depicting Cortana, Microsoft’s attentive assistant. By contrast, IBM’s Watson, which talked to Bob Dylan, has an icon: a Haringesque homosexual planet.
None of these characters has achieved the immortality of Nipper the terrier, its attention drawn before 1900 by the recorded voice of its master. A four-ton statue of the duped pooch still adorns the Arnoff Building in Albany, New York.
Of all these characters, the only one with an interesting history, other than Nipper, of course, is Siri. Siri didn’t originate at Apple, where she now lives. Her roots include an eponymous app, sold by Apple in its App Store before its original producer, Siri Inc, born in Menlo Park, California, was acquired by the App Store’s proprietor. In addition to the technologies perfected by its creator, Siri uses speech recognition technology from Nuance, a Burlington, Massachusetts, firm with a complex history that includes a liaison with IBM and its ViaVoice speech-to-text effort. A third source of Siri’s superior vocalization is Susan Bennett, affiliated with Atlanta’s voiceover specialists GM Voices. Bennett recorded the large collection of words and phrases that, when sliced, diced, and concatenated, gave Siri its vocal personality.
The Apple Siri project was so hush-hush that Bennett herself didn’t know her work was part of a talking and listening iPhone 4S until 2011, when, the story goes, a friend called her to say that a talking iPhone sure sounded a lot like Susan. That turned out to be the case. Today, five years later, Siri’s voice relies on other sources of vocalization; it nevertheless incorporates one excellent aspect of its roots. Bennett wasn’t born in Atlanta. Long before she built a performance career in Georgia, long before she graduated from Brown University in Providence, Susan Bennett grew up in Clinton, New York, about an hour’s drive from the office of America’s most distinctive IBM analyst. That is why Siri knows Syracuse is seeracuse, not saracuse.
The original Siri app, not yet fully integrated with iOS, provided a connection between iPhone users and apps that booked restaurant tables, bought movie tickets, called taxis and exercised Google Maps, which at the time was primitive compared to its current state. But by 2011, just five years ago, Apple had absorbed Siri and infused it with the skills and ambition that characterize all Apple offerings. Siri became a way to initiate just about every possible service an iPhone could provide, from reading email to adding calendar appointments, from performing simple math calculations to checking the weather locally or pretty much anywhere on the planet. And Siri promises to keep on growing in scope and power as Apple adds features that make it a booster for apps that are of particular value to users with an iPad, an Apple Watch, or an Apple TV.
All the while, Siri has been learning additional languages, dialects and, increasingly, even regional language variations that don’t quite merit the rubric dialect. So, while some features have originated in English-speaking locales, they are not confined to their linguistic birthplace. Chances are good that Siri will enjoy a rich Chinese vocabulary, possibly a few, as it acquires dialects to supplement a core of Mandarin. It may not be long before Siri-based concepts and features that originate in what is bound to become a huge Chinese user base inspire new functionality that enriches the Apple experience everywhere.
Google, long aware that speech processing was a strategic imperative, was quick to spot the brilliance of the Siri voice interface. It put a stupendous effort into its own collection of voice technologies, initially giving the biggest boost to its navigation system but also providing support for Android texting apps, phone calling, weather reporting and more. Lots of apps gain power and versatility by making use of Google’s extraordinary searching and question answering capabilities. Like Apple, Google has also expanded the number of languages that are supported by its voice command and text-to-speech capabilities.
The power and versatility of the Google speech system may best be illustrated by examples of its integration with various apps. A user can wake up the service by saying, “Okay, Google,” and then asking a question like how much is a Polish zloty (and in the USA get an answer of 25 cents or whatever its current value). If you want the weather anywhere, just ask. If a phone has a flashlight function, you can ask the phone to turn it on or off. You can dictate a text or a Gmail but not an email sent by a third-party email app (although you can open that app, after which you must type the message). And of course Google understands all the basics associated with maps and navigation, like “navigate” to get a route by car, on foot or via public transit to most anywhere, or how to find local shops, restaurants, theaters and museums, or to find out how long it will take to travel to a particular place.
Microsoft’s Cortana system is similar, but it remains a source of frustration to Microsoft, as there simply are not that many phones, tablets, or computers using it. Usage is not only the starting point to build voice search revenue; it is also the source of experience that enables the computerized intelligence system behind the voice input/output service to amass extra knowledge. As navigation apps guide users through traffic, they spot bottlenecks and use the information to quickly adjust guidance given to subsequent travelers. This on-the-job fast learning isn’t perfect, but it is a lot better than a rigid set of directions that pushes travelers into jams. So if Cortana doesn’t become a lot more popular, its key rivals, Siri and Google, will pull ahead, possibly so far that in the end Cortana will just fade away, embarrassing Microsoft.
The speech system that is rising and which doesn’t compete with the portable apps that are interwoven with navigation is Alexa, the voice of Amazon’s Echo device and the related apps that surround it. Alexa is a homebody, and its electronic package is designed to work a lot better while sitting on a table, desk, or shelf than the hardware in any phone, tablet or laptop. Echo has an array of seven microphones and clever noise-cancelling software that allows the device to pick out voice commands against a background that may include music, audio and video dramas, appliance noises and lots of other distracting audio. The Echo’s Alexa system is built to process commands that control entertainment and environmental devices. It can find and play songs, manage smart lighting, adjust thermostats, control door locks, set alarm clocks and operate smart appliances. Companies that make smart home equipment are able to get help from Amazon so their devices can work in conjunction with other smart gadgets in an Echo-equipped home.
For now, Amazon is a competitor of companies whose smart home controllers are sold by home improvement shops, installed by communications carriers and promoted by alarm and security firms. But there is no reason Amazon must remain a rival of these companies. If Amazon can come up with a marketing plan that makes its Echo a wise and profitable choice for the communications providers that currently have prominence in the smart home business, it could really take off. Amazon would likely get added revenue from media content it sells, and carriers would gain by virtue of their enhanced services.
Voice commands would be a very attractive addition to the range of input and output options available to users of smart home technology. Currently, owners of smart home systems have to use tablet consoles, phone apps or computer programs to manage their homes. These apps are okay but not very impressive. Customers are content but not particularly thrilled with the way their smart homes are controlled. Adding an Echo with Alexa services would be a welcome upgrade . . . if Amazon’s technology could be made to work well with the installed systems.
However, Amazon will have to get its technology to cooperate with the systems in the installed base, particularly the parts of those systems that manage security apparatus. Unless Amazon’s Echo can integrate with existing household systems that provide fire alarms, intrusion detection and other security features backed by 7×24 monitoring, chances are the Echo will do poorly. It will fail, more or less the way Amazon’s phone failed to catch on.
Google may have an interest in the home system market, too. Its Nest group with an excellent thermostat and other devices could expand by either cooperating or competing with Echo/Alexa. For now, Echo can interact with Nest, but neither has shown that it can become a dramatically better hub. Neither has become the iPhone of home environmental management.
Apple isn’t a big player in this area yet, but in its quest to grow broader as well as taller, the company might try to create a smart home hub that redefines that realm, much the way its other products have brought significant advancement to their market segments.
Nor would it be prudent for any observer to ignore the potential of Microsoft, which is desperate to overcome the setbacks it has endured in consumer markets. With all its wealth and prowess, the Redmond, Washington company has repeatedly failed to create an iconic phone, a market-changing tablet or a game console that redefined amusement devices. The way things have been going, it is beginning to look like Microsoft’s future is mainly about grabbing territory that IBM is leaving or losing: glass houses and business computing clouds.
IBM, perhaps ironically, sounds more progressive than Microsoft these days, even if its financial results indicate that talk is a lot cheaper than industrial innovation. Still, even if what IBM seems to be saying about its Watson technologies is only partially true, Big Blue could well have technology that can enliven and animate many information-based services. For instance, if Watson’s medical folk can bring an ear and a voice into operating theatres, giving surgeons a bit of real-time help the way navigation advice dispensed by Google Maps helps drivers, a prominent place in medical history awaits. Or if Watson’s smart city technology, whether talking or not, can monitor water quality and ward off a tragedy like that in Flint, Michigan, Big Blue could become the agent of health and safety the public always wants, sometimes expects but doesn’t always receive.
Notwithstanding its ambitious promotional efforts, though, IBM seems to be missing the boat. If it doesn’t soon show some real services for ordinary people, its potential market, like bored Bob Dylan in that IBM Watson ad, is going to just walk away. Right now anyone can affordably get the best advice Siri has to offer, and quite a few already do so. Many others, perhaps even more, have Google riding along as they drive, finding a course around traffic jams and, when asked, suggesting a place to get roadside chow or placing a hands-free phone call home. A few people simplify visits to an unfamiliar city with coaching by Cortana. And a small but significant and growing number have homes managed with the help of networked tablets that make our dwellings safer nodes on the Internet of Things.
People who care about IBM, particularly shareholders glancing at the exit, want to know what Warren Buffett must wonder amidst his worst nightmares: Where is Watson in all this? Buffett and IBM buffs may be saying, as Alexander Graham Bell once said under very different circumstances, “Come here, Mr. Watson, I want you.”