Volume 1: No. 13
IBM has introduced a $7,200 VoiceType system that recognizes any 7,000 slowly spoken words. The key board and software were developed with Dragon Systems (Newton, MA). [Los Angeles Times. San Jose Mercury, 5/26.]
The leaders in commercial speaker-independent speech recognition appear to be Probe Research Inc. (Cedar Knolls, NJ) and Voice Information Associates Inc. (Lexington, MA). Of course, their performance can't match the speaker-trained, isolated-word results of Dragon Systems Inc. (Newton, MA) and Kurzweil Applied Intelligence Inc. (Boston, MA). DragonDictate will recognize a 30,000-word vocabulary at 40 wpm (95% accuracy) for $9,000. [A $3,100 system is also available. One market is computer professionals with repetitive strain injuries.] Hitachi is currently trying to develop neural-network systems, while NEC is focusing on Japanese-to-English speech translation. [Business Week, 6/3.]
Several labs (BBN, CMU, MIT Lab for CS, SRI, Unisys, MIT Lincoln Labs, and Dragon Systems) have been developing "travel reservation" systems as a technology-driving project under DARPA sponsorship, and companies like AT&T, TI, and INRS/Bell Canada have joined the competition. A database of task-related speech samples is available, and the companies often share training data. The airline reservation database is limited to a few cities, carriers, and tasks -- about 1,200 words total -- so claims about recognition accuracy are highly optimistic.
SRI has recently gotten good press about its prototype Decipher system. (E.g., Don Clark, SF Chronicle, 6/13.) SRI is also working on applications in language instruction, telephone-based banking, and perhaps control of "smart car" dashboard functions. Phonetic recognition from connected speech is quite good, but the semantic templates are still very limited. I've seen the system demoed. Performance was often poor with untrained speakers (things like "When can I go to ..." mistaken for "When Continental ..."), but seemed reliable when the developers queried the database. Patti Price has added a breathing model to eliminate some of the end-of-sentence artifacts, but chair squeaks and extraneous conversation are sometimes picked up as syllables. Additional work on semantics will have a quick payoff, but I doubt that speech recognition technology will support unrestricted vocabulary any time soon.
One problem with current telephone banking is that anyone with your checking account number and social security number can call up and get your account history. Personal ID numbers (PINs) are more secure than SSNs, but not all banks or customers want to use them. I'm sure there's going to be a big demand for voice-identification systems, but I haven't heard much about voiceprint identification research.