McGill researchers shed light on complex neural processes that help us understand speech

The speed at which the brain decodes the rapid flow of acoustic information in speech is remarkable. To achieve this, the brain relies heavily on context. One hypothesis is that cerebral mechanisms continuously predict which words are most likely at any given moment while we listen to someone speak. These predictions are the fruit of our education, life experiences and mental representations of our environment, including of the person with whom we’re having a conversation. When a prediction is correct, the brain rapidly registers the information in the speech stream and does not need to spend additional metabolic resources updating and improving its internal predictive models.

While the neuroscience of language is a very active field of research, it has been constrained by the practicalities of brain imaging scanners and by the complexity of the brain signals evoked by natural, continuous speech: these signals are difficult to decode and to attribute to the speech being perceived versus the influence of context. For these reasons, researchers have typically presented words separately, in relatively slow and short sequences, to isolate the brain signals induced by each. This has limited our capacity to fully understand how context influences perception in a natural, moment-to-moment fashion.

Now, researchers at McGill University have developed a new approach to discover how the brain implements these sophisticated functions to extract meaningful information from speech. Their findings were published in a recent edition of the journal Neuron.

Combining artificial neural networks with neuroimaging

The researchers used TED-LIUM, an open-access resource containing the transcripts of 1,500 TED talks. Peter Donhauser, a PhD student at McGill and the study’s co-author, developed the methodology to temporally align the transcripts with the corresponding audio. He then presented this corpus (a lexicon of about 10,500 unique words) to an artificial neural network (ANN) inspired by current artificial-intelligence technology for natural speech processing. The ANN served as a proxy for the internal predictive models implemented in the human brain. Donhauser then took the novel step of combining the prediction outcomes of the ANN with the neurophysiological brain signals recorded at the millisecond time scale with magnetoencephalography (MEG) from human participants while they listened to the same audio clips.
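The article does not describe the analysis code, but the general idea of scoring each word by how predictable it is given its context can be illustrated with a small sketch. The snippet below is a stand-in under stated assumptions: it uses an off-the-shelf GPT-2 language model from the Hugging Face "transformers" library (not the network trained on TED-LIUM in the study) to compute, for each word, its surprisal (how unexpected it was) and the entropy of the prediction (how uncertain the context left the model). Word-by-word values of this kind, time-aligned to the audio, are the sort of quantities that can then be related to MEG recordings.

    # Illustrative sketch only: the study trained its own network on TED-LIUM;
    # here GPT-2 stands in as the predictive model, to show how word-by-word
    # surprisal and uncertainty (entropy) can be computed over a transcript.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def word_predictions(text):
        """Return (token, surprisal, entropy) for every token after the first."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            log_probs = torch.log_softmax(model(ids).logits, dim=-1)
        results = []
        for t in range(ids.shape[1] - 1):
            dist = log_probs[0, t]                       # prediction for token t+1
            next_id = ids[0, t + 1]
            surprisal = -dist[next_id].item()            # -log p(word | context): "surprise"
            entropy = -(dist.exp() * dist).sum().item()  # uncertainty of the prediction
            results.append((tokenizer.decode(next_id), surprisal, entropy))
        return results

    for tok, s, h in word_predictions("The speed at which the brain decodes speech is remarkable"):
        print(f"{tok!r:>15}  surprisal={s:5.2f}  entropy={h:5.2f}")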

“With our approach, we showed that when context is uncertain, there are brain processes that enhance the sensitivity to incoming speech signals so they can be decoded properly,” explains Dr. Sylvain Baillet, Professor of Neurology and Neurosurgery at McGill and the paper’s co-author. “It’s like turning up the volume of your radio when you expect something important. These brain processes happen at a fast time scale, which we call the ‘theta’ rate (about 4-8 times per second). When the surprise caused by an incoming speech utterance is high, slower brain processes come into play, which we call the ‘delta’ rate (fewer than 4 times per second) and which are stronger in brain regions immediately adjacent to the primary auditory areas. We believe these signals convey prediction errors that are eventually used to update and correct our internal speech representation models.”
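The quote refers to two frequency ranges of brain activity. As a purely illustrative sketch (the sampling rate, filter design and synthetic data below are assumptions, not the study’s analysis pipeline), a recorded signal can be separated into the delta and theta bands with standard band-pass filters:

    # Illustrative only: band-pass filtering a synthetic time series into the
    # delta (<4 Hz) and theta (4-8 Hz) ranges discussed above, using SciPy.
    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 600.0                               # assumed sampling rate, in Hz
    t = np.arange(0, 10, 1 / fs)             # 10 seconds of data
    signal = np.random.default_rng(0).standard_normal(t.size)  # noise stand-in for an MEG channel

    def bandpass(x, low, high, fs, order=4):
        b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
        return filtfilt(b, a, x)

    delta = bandpass(signal, 0.5, 4.0, fs)   # slower "prediction error" timescale
    theta = bandpass(signal, 4.0, 8.0, fs)   # faster "sensitivity/gain" timescale
    print(delta.std(), theta.std())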

The researchers were surprised to find that the brain regions involved in the mechanisms of processing contextual uncertainty and surprise in speech were confined around the auditory cortex and not further distributed across the brain. “This actually makes sense as speech processing requires neural circuits to rapidly decode language contents,” explains Donhauser. “Our study does not mean all language processing occurs within a few square centimetres of cortex — we know it is largely distributed across the brain — but the important mechanisms of contextual gain control and adaptation are close to where audio signals reach the cortex. As such, these mechanisms make sure the rest of the brain receives the information it needs to make sense of spoken language.”

The researchers plan to use the same approach to refine their findings with respect to subtle neurolinguistic parameters such as syntax and semantics. They also hope to investigate the mechanisms of speech disorders and the unique experience of the multilingual brain.

“We believe our discovery is important because speech is a distinctive feature of the human brain,” notes Dr. Baillet. “By advancing our knowledge of how the brain processes such sophisticated information we better understand the fabric of human nature. Furthermore, speech disorders affect a very large number of people. We hope our research will inspire new approaches and specialized studies to help those who live with such disabilities.”

“Two Distinct Neural Timescales for Predictive Speech Processing,” by Peter Donhauser and Sylvain Baillet, was published in the journal Neuron. DOI: https://doi.org/10.1016/j.neuron.2019.10.019

January 22, 2020