AI Assistants (The MIT Press Essential Knowledge series)

By Roberto Pieraccini

The Voice in the Machine: Building Computers That Understand Speech

By Roberto Pieraccini

The Voice in the Machine is the first and only book for a general, nontechnical audience about the vision, history, technology, and business of voice recognition by computers.

Stanley Kubrick’s 1968 film 2001: A Space Odyssey famously featured HAL, a computer with the ability to hold lengthy conversations with his fellow space travelers. More than forty years later, we have advanced computer technology that Kubrick never imagined, but we do not have computers that talk and understand speech as HAL did. Is it a failure of our technology that we have not gotten much further than an automated voice that tells us to “say or press 1”? Or is there something fundamental in human language and speech that we do not yet understand deeply enough to be able to replicate in a computer?

In The Voice in the Machine, Roberto Pieraccini examines six decades of work in science and technology to develop computers that can interact with humans using speech, as well as the industry that has arisen around the quest for these technologies. He shows that although today’s speech-understanding computers may not have HAL’s capacity for conversation, they have capabilities that make them usable in many applications and are on a fast track of improvement and innovation. Pieraccini describes the evolution of speech recognition and speech understanding from waveform methods, to artificial intelligence approaches, to statistical learning and modeling of human speech based on a rigorous mathematical model, specifically hidden Markov models (HMMs). He details the development of dialog systems, the ability to produce speech, and the process of bringing talking machines to the market. Finally, he asks a question that only the future can answer: will we end up with HAL-like computers or something completely unexpected?

Books with chapters to which I contributed

Data-Driven Methods for Adaptive Spoken Dialogue Systems: Computational Learning for Conversational Interfaces

Springer

Multilingual Natural Language Processing Applications: From Theory to Practice

By Daniel Bikel, Imed Zitouni

Spoken Language Understanding: Systems for Extracting Semantic Information from Speech

By Gokhan Tur, Renato De Mori

Speech Technology: Theory and Applications

Springer

Text, Speech and Dialogue: 12th International Conference, TSD 2009, Pilsen, Czech Republic, September 13-17, 2009. Proceedings (Lecture Notes in ... / Lecture Notes in Artificial Intelligence)

Springer

Recent Trends in Discourse and Dialogue (Text, Speech and Language Technology)

Springer

Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics

Springer

Perception in Multimodal Dialogue Systems: 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based ... / Lecture Notes in Artificial Intelligence)

Springer

Spoken Multimodal Human-Computer Dialogue in Mobile Environments (Text, Speech and Language Technology)

Springer

Speech Recognition and Understanding: Recent Advances, Trends and Applications (NATO ASI Series / NATO ASI Subseries F)

Springer

Speech Recognition and Coding: New Advances and Trends (NATO ASI Series / Computer and Systems Sciences)

Springer

Have you talked to a machine lately? Asked Alexa to play a song, asked Siri to call a friend, asked Google Assistant to make a shopping list? This volume in the MIT Press Essential Knowledge series offers a nontechnical and accessible explanation of the technologies that enable these popular devices. Roberto Pieraccini, drawing on more than thirty years of experience at companies including Bell Labs, IBM, and Google, describes the developments in such fields as artificial intelligence, machine learning, speech recognition, and natural language understanding that allow us to outsource tasks to our ubiquitous virtual assistants.

Pieraccini describes the software components that enable spoken communication between humans and computers, and explains why it's so difficult to build machines that understand humans. He explains speech recognition technology; problems in extracting meaning from utterances in order to execute a request; language and speech generation; the dialog manager module; and interactions with social assistants and robots. Finally, he considers the next big challenge in the development of virtual assistants: building in more intelligence, so that they can do more than communicate in natural language and have the capacity to know us better, predict our needs more accurately, and perform complex tasks with ease.