Speech Server 2004 R2 On Tap from Microsoft
by Alex Woodie
Support for more languages, more flexible deployment, and better performance are the main benefits users will see with Microsoft Speech Server 2004 Release 2, which the company announced yesterday. While the first release of MSS 2004 only supported English for automated speech recognition (ASR) and text-to-speech (TTS) generation, R2 adds French and Spanish recognition and generation capabilities, as well as more efficient ASR and TTS engines, and a new "all-in-one server deployment" option.
Microsoft Speech Server 2004 provides a platform for customers and third-party vendors to develop and deploy speech-enabled applications on top of a Windows Server 2003 and ASP.NET Web infrastructure. These applications can take many forms. One example is an interactive voice response (IVR) system that offloads common, low-value tasks, such as fetching the balance of a checking account over a telephone. But MSS 2004 and its associated software development kit go beyond IVR and also enable the development of "multimodal" applications that merge visual and audio elements--a class of mobile applications that has yet to be fully explored or exploited, and which Microsoft chairman and chief software architect Bill Gates touched on in a recent speech and essay.
Microsoft redeveloped the ASR engine in MSS 2004 for R2, and the new engine delivers much better performance in identifying words, particularly when the server is programmed to identify a large set of words. Microsoft says the better ASR performance is a result of new grammar state caching optimization, process recycling optimization, decreased memory use, and a higher degree of speech recognition accuracy.
Microsoft's new ASR engine also supports U.S. Spanish and Canadian French, in addition to U.S. English. (In case you're wondering, U.S. Spanish is considered a "bilingual" form of Spanish that's spoken by people who are raised and educated in a predominantly non-Spanish speaking country, and it's distinct from the "monolingual" Spanish spoken in countries such as Spain and Mexico, according to Georgia State University.) The Speechify TTS also supports U.S. Spanish and Canadian French.
Microsoft OEMs its TTS engine from ScanSoft, whose popular Speechify software delivers voice inflection and generates remarkably natural-sounding speech, both male and female, for various languages. You can sample the various voices, including "Javier," "Jill," "Felix," and "Kyoko," at this ScanSoft Web page.
The new "all-in-one server deployment" option in MSS 2004 R2 means Enterprise Edition customers are no longer required to install the software on multiple, connected servers, a change that should please Microsoft's larger customers that are trying to rein in their sprawling server farms. Microsoft still recommends a distributed deployment when connecting MSS 2004 R2 to more than 48 telephone ports for an IVR application, or up to 96 ports for a touch-tone application.
Pricing for MSS 2004 R2 Standard Edition, which runs on servers with up to 4 CPUs and supports up to 24 telephone ports, is unchanged and starts at $7,999 per CPU. The Enterprise Edition, which runs on servers with up to 8 CPUs and supports an unlimited number of ports, starts at $17,999 per CPU. These are only suggested prices, as Microsoft's partners--companies like Gold Systems, Softel, and Intervoice, which build out-of-the-box applications based on the MSS 2004 R2 platform--may offer their own incentives.
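For readers sizing a purchase, the per-CPU arithmetic above can be sketched in a few lines of Python. This is only an illustration built from the suggested starting prices and edition limits quoted in this article; the names `Edition` and `starting_price` are made up for the example, and actual quotes from Microsoft or its partners may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edition:
    """Edition limits and suggested starting price as quoted in the article."""
    name: str
    max_cpus: int
    max_ports: Optional[int]  # None means no stated port limit
    price_per_cpu: int        # suggested starting price in U.S. dollars


# Figures taken from the article; partner pricing may vary.
STANDARD = Edition("Standard", max_cpus=4, max_ports=24, price_per_cpu=7999)
ENTERPRISE = Edition("Enterprise", max_cpus=8, max_ports=None, price_per_cpu=17999)


def starting_price(edition: Edition, cpus: int, ports: int) -> int:
    """Return the suggested starting price for one server, or raise
    ValueError if the request exceeds the edition's stated limits."""
    if cpus < 1 or ports < 0:
        raise ValueError("cpus must be at least 1 and ports cannot be negative")
    if cpus > edition.max_cpus:
        raise ValueError(f"{edition.name} Edition supports up to {edition.max_cpus} CPUs")
    if edition.max_ports is not None and ports > edition.max_ports:
        raise ValueError(f"{edition.name} Edition supports up to {edition.max_ports} ports")
    return edition.price_per_cpu * cpus


if __name__ == "__main__":
    print(starting_price(STANDARD, cpus=2, ports=24))
    print(starting_price(ENTERPRISE, cpus=4, ports=96))
```

Under these assumptions, a two-CPU Standard Edition server handling 24 ports starts at $15,998, while a four-CPU Enterprise Edition server starts at $71,996.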
Last week, in an e-mail essay and in a speech he gave at the Microsoft CEO Summit 2005 conference in Redmond, Washington, Bill Gates touched on the problem of "information overload" and on possible ways that technologies like text-to-speech and speech recognition can address it. Both the essay and the speech were titled "The New World of Work."
"You should be able to listen to your e-mail, or read your voicemail," Gates wrote in his essay. "Project notifications, meetings, business applications, contacts, and schedules should be accessible within a single consistent view, whether you're at your desk, down the hall, on the road or working at home."