For over five decades, researchers have worked toward computer speech recognition. While real-time recognition of unconstrained human language has not yet been achieved, the technology has made significant progress. As microprocessor technology continues to advance, speech recognition products are becoming faster, better, and cheaper.
“Advances in hardware speed and algorithm capability have made improved automatic speech recognition technology possible,” said Dr. John C. Kelly, interim chairperson of the Department of Electrical Engineering at North Carolina Agricultural and Technical State University (NC A&T). “Unfortunately, though the technology components are in place, unconstrained speech recognition systems have advanced very little, due to the complexity and redundancy of language.” (April 2001, The Use of Multi-Modal Technology to Advance Automatic Speech Recognition, a project sponsored by the Defense Advanced Research Projects Agency (DARPA).)
Speech recognition systems are divided into two main categories: speaker-dependent systems, which are trained to recognize a single user's voice, and speaker-independent systems, which must work across a variety of speakers.
There are a number of speech synthesis systems on the market. In artificial speech generation there is a trade-off between intelligibility and naturalness. Currently, the industry has placed its emphasis on naturalness, with the unfortunate consequence that even high-end systems are sometimes hard to understand. Speech synthesis is generally regarded as a secondary and much less complex issue when compared to speech recognition and understanding.
The current trends in the industry are toward speaker-independent systems, software-based systems, and extensive use of post-processing. Spoken language systems that interpret spontaneous speech are also emerging. Speech recognition systems are widely used by telephone companies, by banks, and as dictation systems in many offices. These applications are highly constrained and sometimes require the user to pause between each word (isolated speech recognition). Other, much more challenging applications have been fielded. Over the years, the Navy has purchased simulators that must operate under the most adverse conditions for speech recognition -- high noise, high stress, high personnel turnover, a high need for accuracy, real-time performance of about 300 msec, and out-of-phraseology speech.
Noise - Speech recognizers are susceptible to coherent and random noise within the bandwidth of human speech. Since speaker-independent systems use speaker models built to operate with a variety of speakers, those models are more susceptible to noisy environments than speaker-dependent systems.
Out-of-Phraseology Speech - Speech recognizers are not yet capable of understanding unconstrained human speech. Accordingly, applications are developed based on constrained vocabularies. But users often say words that are not in the legal vocabulary. Since the speech recognizer will try to make the best match, undetected out-of-phraseology speech could be processed with chaotic results. The challenge is to detect out-of-phraseology speech and reject it before it is post-processed.
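The detect-and-reject step described above can be sketched as a simple confidence filter applied to the recognizer's best match before post-processing. This is only an illustration: the vocabulary, the confidence scores, and the threshold below are all hypothetical, and a real recognizer would supply its own hypothesis scores.

```python
# Sketch: screening a recognizer's best-match hypothesis before
# post-processing. All names, scores, and the threshold are hypothetical.

LEGAL_VOCABULARY = {"engage", "disengage", "report", "status"}
REJECT_THRESHOLD = 0.70  # tuning parameter: trades false rejects for false accepts

def screen_hypothesis(word, confidence):
    """Return the word if it is safe to post-process, else None (rejected)."""
    if word not in LEGAL_VOCABULARY:
        return None  # out-of-phraseology: not in the legal vocabulary
    if confidence < REJECT_THRESHOLD:
        return None  # legal word, but the acoustic match is too weak
    return word

# A recognizer always returns its best match, so an out-of-phraseology
# utterance may still come back as a legal word with low confidence:
print(screen_hypothesis("engage", 0.93))  # accepted
print(screen_hypothesis("banana", 0.91))  # rejected: out of vocabulary
print(screen_hypothesis("report", 0.42))  # rejected: below threshold
```

Raising the threshold rejects more out-of-phraseology speech but also rejects more legitimate commands, so in practice the cutoff must be tuned against the application's tolerance for each kind of error.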
Speech recognition systems are not capable of perfect recognition. In general, performance is a trade-off between speed and accuracy. The major risk associated with these systems is that the speech input will be misrecognized and that erroneous input will be processed by the system. What would be considered a minor annoyance with a dictation system could be disastrous for a pilot using speech recognition in the cockpit.
The Cambridge University Speech Research page contains information on speech recognition, coding, synthesis, related conferences and web sites, and more.
Speech at Carnegie Mellon University is dedicated to speech technology research, development, and deployment, and provides a vehicle for making the group's work available online. CMU holds a historic position in computational speech research and continues to test the limits of the art.
The Center for Spoken Language Research at the University of Colorado, Boulder focuses on research and education in areas of human communication technology. The center conducts many research projects, some funded by the Defense Advanced Research Projects Agency (DARPA), on natural language understanding, dialog modeling, multi-modal speech recognition technology, and related areas.
DARPA speech recognition/synthesis projects include the Human Language Systems (HLS) Program, which aims to create usable computer systems that can read and hear; moreover, these systems will be able to understand what they read or hear in the context of a specific task. The overall objective of the program is to improve the readiness of military forces and the affordability of systems by providing dramatic new technology for system interaction and use. The primary task domain for Human Language Systems will be military Command, Control, Communications, Computers, and Intelligence (C4I), with special emphasis on JTF crisis management planning and execution. By 1998, the goal was to provide an easy-to-use dialog interaction capability for crisis decision support. This capability would enable effective operation of a geographically dispersed JTF staff and designated worldwide functional experts in domains such as logistics, meteorological forecasting, and medicine.
The first hands-free LAN access points were installed aboard the USS Rentz in August 1996, allowing sailors to access the ship's LAN from hands-free computers at their job sites. Repair technicians can request remote supply, parts, or technical data while staying at their repair sites.
The USS Princeton and USS John Paul Jones are in the process of installing hands-free LAN access points aboard their ships to become hands-free beta test sites, along with the USS Rentz and the Aegis Training Center. The hands-free computers will house Interactive Electronic Technical Manuals (IETMs), such as the Aegis Fire Control Transmitter IETM, which help sailors do their jobs more efficiently and quickly.
Copyright © 2003 voice-commands.com. All Rights Reserved.