Robust Speech Recognition in Embedded Systems and PC Applications provides a link between the technology and application worlds. Speech recognition technology is now good enough for a number of applications, and the core technology is well established around hidden Markov models, so many of the differences between systems found in the field come down to implementation variants. This book distinguishes between embedded systems and PC-based applications. Embedded applications are usually cost-sensitive and require very simple, optimized methods to be viable. Author Jean-Claude Junqua reviews the problems of robust speech recognition, summarizes the current state of the art while offering some perspectives, and covers the complementary technologies needed to build an application, such as dialog and user interface technologies.

The book is divided into five chapters. The first reviews the main difficulties encountered in automatic speech recognition when the type of communication is unknown. The second focuses on environment-independent/adaptive speech recognition approaches and on the mainstream methods applicable to noise-robust speech recognition. The third discusses several critical technologies that help make an application usable; it also offers recommendations for designing prompts, generating user feedback and developing speech user interfaces. The fourth reviews several techniques that are particularly useful for embedded systems or for decreasing computational complexity, and presents case studies of embedded applications and PC-based systems. Finally, the fifth provides an outlook for robust speech recognition, emphasizing the areas the author sees as most promising.
Table of Contents
List of Figures
List of Tables
About the Author
Preface
Acknowledgements
Sources of Variability and Distortion in the Communication Process
Introduction
Speaker/Task Variability
- Introduction
- Age group differences
- Speaking rate
- Dialects and pronunciation variants
- Native versus non-native speakers
- Speaker issues: Unexpected behavior and task-induced variability
Acoustic Environment
- Reverberation
- Environmental noise and Lombard reflex
Transducers and Transmission Channels
- Variable transducers and speech recognition
- Speech transmission over the telephone or wireless connections
Assessing the Sensitivity of a Recognizer to Different Sources of Variability
Environment-independent Adaptive Speech Recognition: A Review of the State of the Art
Introduction
Environment-independent ASR
- Introduction
- Towards environment-independent features
- State-of-the-art acoustic modeling
Environment-adaptive ASR
- Introduction
- Adaptation modes and selection criteria for an adaptation method
- Bayesian adaptation: Maximum a posteriori estimation
- Transformation-based adaptation
- Adaptation techniques based on clustering and model selection
- Corrective adaptation
- Feature-based adaptation
- Noise robust methods
Confidence Measures, Dialog Modeling and User Interface
Introduction
Representing and Ignoring Out-of-vocabulary Events
Use of Confidence Measures for the Rejection of Incorrect Data
- Introduction
- Utterance verification
- Confidence measures and unsupervised adaptation
- Dealing with utterance verification for embedded systems
Dialog Modeling
- The different types of dialog
- Issues in dialog modeling
- Prompts and dialog feedback
- Directed dialogs versus conversational systems
- Mixing modalities
- Telephone-based dialogs
User Interface Issues for Speech-enabled Applications
- Introduction
- A few design recommendations for a Speech User Interface (SUI)
- Designing for errors
From Cost-sensitive Embedded Applications to PC-based Systems
Embedded versus PC-based Applications
Design and Evaluation of Embedded Systems
Algorithms for Embedded Applications and for Decreasing Computational Complexity
- Floating-point to fixed-point conversion
- Decreasing the complexity of the Gaussian computation step
Some Case Studies
- Introduction
- From voice dialing to car navigation
- Computer telephony applications
Application Programming Interfaces (APIs)
- Introduction
- The Speech Application Programming Interface (SAPI)
- The Java Speech Application Programming Interface (JSAPI)
- The Voice Extensible Markup Language (VoiceXML)
Future Outlook for Robust ASR
Introduction
Promising Directions
- Variable time-frequency analysis and use of multiple feature sets
- Use of temporal and spectral information
- ASR with partial information and multi-stream recognition
- Separate modeling of different sources of information
- Environment-adaptive ASR and language/task adaptation
- Rapid speaker adaptation
- Use of prosody information in ASR
- Pronunciation modeling and pronunciation adaptation
- Conversational systems and multimodal interfaces for interactive software agents
Index

Hardcover; 204 pages.