In order to get a better picture of our surroundings, the brain has to integrate information from different senses, but how does it know which signals to combine? New research involving scientists from the Max Planck Institute for Biological Cybernetics, the Bernstein Center for Computational Neuroscience Tübingen, the University of Oxford, and the University of Bielefeld has demonstrated that humans exploit the correlation between the temporal structures of signals to decide which of them to combine and which to keep segregated.
This research is about to be published in Current Biology.
Multisensory signals originating from the same distal event are often similar in nature. Think of fireworks on New Year’s Eve, an object falling and bouncing on the floor, or the footsteps of a person walking down the street. The temporal structures of such visual and auditory events almost always overlap (i.e., they correlate), and we often effortlessly assume an underlying unity between our visual and auditory experiences. In fact, the similarity of the temporal structure of multiple unisensory signals, rather than merely their temporal coincidence as was previously thought, provides a potentially powerful cue for the brain to determine whether or not multiple sensory signals have a common cause.
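To make the idea of correlated temporal structure concrete, here is a minimal sketch. It assumes each stream is reduced to a train of 0/1 events in fixed time bins and that similarity is measured with the Pearson correlation coefficient. Both choices are illustrative assumptions, as are the names `pearson` and `random_train` and all of the numbers; the article does not say how the researchers generated or scored their stimuli.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def random_train(n_bins, n_events, rng):
    """Hypothetical stimulus: n_events ones placed in random time bins."""
    train = [0] * n_bins
    for i in rng.sample(range(n_bins), n_events):
        train[i] = 1
    return train


rng = random.Random(0)                        # fixed seed for repeatability
visual = random_train(200, 20, rng)           # stream of flashes
auditory_same = list(visual)                  # beeps that share the flashes' timing
auditory_other = random_train(200, 20, rng)   # beeps with independent timing

r_same = pearson(visual, auditory_same)       # identical structure, r close to 1
r_other = pearson(visual, auditory_other)     # unrelated structure, lower r

print(f"shared timing:      r = {r_same:.2f}")
print(f"independent timing: r = {r_other:.2f}")
```

In this toy setting, a high correlation is the kind of evidence that would favor a common cause, while a low one would favor keeping the signals apart.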
Cesare Parise from the Max Planck Institute for Biological Cybernetics in Tübingen and the Bernstein Center for Computational Neuroscience Tübingen and his colleagues set out to examine the role of signal correlation in multisensory integration by asking people to localize a stream of beeps and flashes. Participants were seated in front of a large screen on which sounds (streams of noise bursts) and images (streams of blurred blobs) were presented from different spatial locations. On some trials only visual or auditory stimuli were presented, while on other trials visual and auditory stimuli were presented in combination.
Critically, on combined audiovisual trials, the temporal structure of the visual and auditory stimuli could either be correlated or not. Participants were required to report the spatial position of these stimuli by moving a cursor controlled by a graphics tablet. In line with previous studies, participants were more precise when the auditory and visual streams were presented together than when they were presented in isolation. Notably, precision was even higher when auditory and visual streams were correlated, and closely approached the theoretical maximum.
These results demonstrate that humans optimally combine multiple sensory signals only when they correlate in time. Previous research has demonstrated that optimal integration only occurs when the brain is sure that the signals have a common underlying cause. These results therefore demonstrate that the brain uses the statistical correlation between the sensory signals to infer whether they have a common physical cause, and hence whether they provide redundant information that should be integrated.
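The article does not spell out how the theoretical maximum is computed. In the cue-combination literature, optimal integration is commonly modeled as maximum-likelihood combination of independent Gaussian estimates, and the sketch below assumes that standard model. The function names and the single-cue values are hypothetical and are not data from the study.

```python
import math


def combined_sd(sd_visual, sd_auditory):
    """Predicted SD of the optimally combined location estimate.

    Assumes independent Gaussian noise on each cue:
    var_AV = var_V * var_A / (var_V + var_A)
    """
    var_v = sd_visual ** 2
    var_a = sd_auditory ** 2
    return math.sqrt(var_v * var_a / (var_v + var_a))


def cue_weights(sd_visual, sd_auditory):
    """Weights proportional to each cue's reliability (1 / variance)."""
    rel_v = 1.0 / sd_visual ** 2
    rel_a = 1.0 / sd_auditory ** 2
    total = rel_v + rel_a
    return rel_v / total, rel_a / total


# Hypothetical single-cue localization errors (arbitrary units).
sd_v, sd_a = 2.0, 4.0

best_sd = combined_sd(sd_v, sd_a)
w_v, w_a = cue_weights(sd_v, sd_a)

print(f"predicted combined SD: {best_sd:.2f}")
print(f"visual weight: {w_v:.2f}, auditory weight: {w_a:.2f}")
```

Under this model the combined estimate is never less precise than the better single cue. That makes the prediction a natural benchmark against which measured audiovisual precision can be compared.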
The researchers suggest the brain has evolved this ability to combine potentially related information from different senses so it can effectively pick its way through the noisy environments of everyday life.
“It’s why at a noisy cocktail party you can tell who is speaking with which voice,” says Parise. “Our eyes and ears are continually taking in sensory information and our brains make sense of it all by merging together sights and sounds with similar temporal structures.”
Although multisensory integration is a pervasive aspect of sensory processing, little is known about its low-level statistical determinants for signals with complex dynamic temporal patterns. This research highlights the role of a key organizational principle for multisensory perceptual grouping. What at first glance appears to be a logical fallacy, namely inferring causation from correlation, turns out to be the rule in perception.