Noise-canceling headphones have excelled at creating a quiet listening experience, but selectively letting specific environmental sounds through remains a challenge. The latest Apple AirPods Pro, for instance, can automatically adjust sound levels during conversations, but they don't let users choose whom to listen to or when.
A team at the University of Washington has developed an artificial intelligence system called "Target Speech Hearing" that offers a solution. It lets headphone wearers "enroll" a speaker by looking at them for three to five seconds. Once enrolled, the system filters out all other environmental sounds and plays back only the enrolled speaker's voice in real time, even as the listener moves around and no longer faces the speaker.
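One way to picture this enroll-then-filter flow is as a two-stage pipeline: a few seconds of audio captured during enrollment are condensed into a compact speaker embedding, and a small real-time network conditioned on that embedding then decides, frame by frame, which sound to pass through. The sketch below illustrates the idea in PyTorch; the class names, layer sizes, and architecture are illustrative assumptions, not the researchers' released code.

```python
# Hypothetical sketch of an "enroll, then filter" pipeline.
# Module names, sizes, and architecture are assumptions for illustration,
# not the released Target Speech Hearing code.
import torch
import torch.nn as nn


class EnrollmentEncoder(nn.Module):
    """Maps a few seconds of binaural audio to a fixed-size speaker embedding."""

    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=16, stride=8), nn.ReLU(),
            nn.Conv1d(64, emb_dim, kernel_size=16, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time -> one embedding per clip
        )

    def forward(self, binaural: torch.Tensor) -> torch.Tensor:
        # binaural: (batch, 2 channels, samples)
        return self.net(binaural).squeeze(-1)  # (batch, emb_dim)


class TargetSpeechFilter(nn.Module):
    """Keeps the enrolled voice and attenuates everything else, one frame at a time."""

    def __init__(self, frame: int = 256, emb_dim: int = 128):
        super().__init__()
        self.mask_net = nn.Sequential(
            nn.Linear(2 * frame + emb_dim, 512), nn.ReLU(),
            nn.Linear(512, 2 * frame), nn.Sigmoid(),  # per-sample gain mask
        )

    def forward(self, frame_in: torch.Tensor, speaker_emb: torch.Tensor) -> torch.Tensor:
        # frame_in: (batch, 2, frame) binaural frame; speaker_emb: (batch, emb_dim)
        flat = frame_in.flatten(1)
        mask = self.mask_net(torch.cat([flat, speaker_emb], dim=1))
        return (flat * mask).view_as(frame_in)


if __name__ == "__main__":
    enc, filt = EnrollmentEncoder(), TargetSpeechFilter()
    enrollment_clip = torch.randn(1, 2, 16000 * 4)   # ~4 s of binaural audio
    speaker_emb = enc(enrollment_clip)               # one-time enrollment
    noisy_frame = torch.randn(1, 2, 256)             # streaming playback frame
    cleaned = filt(noisy_frame, speaker_emb)         # target voice kept, rest attenuated
    print(cleaned.shape)
```

Conditioning the filter on a fixed embedding, rather than on the wearer's head direction, is what would let such a system keep tracking the voice after the listener looks away.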
The researchers presented their findings on May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems. The code for the proof-of-concept device is available for others to build on, though the device itself is not commercially available.
“We tend to think of AI now as web-based chatbots that answer questions,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices, you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.”
To use the system, a person wearing off-the-shelf headphones fitted with microphones taps a button while directing their head toward a speaker. The sound waves from the speaker’s voice reach the microphones on both sides of the headset simultaneously, within a 16-degree margin of error. This signal is sent to an onboard embedded computer, where machine learning software identifies and isolates the speaker’s voice. The system then continues to filter out other sounds, even as the user and speaker move around. The more the speaker talks, the better the system becomes at focusing on their voice.
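The directional cue behind enrollment can be checked with a simple inter-microphone delay estimate: a voice coming from straight ahead arrives at the left and right microphones at nearly the same instant, while a voice off to the side reaches one microphone first. The snippet below sketches such a check in Python; the sample rate, microphone spacing, and delay-estimation method are assumptions chosen for illustration, with only the 16-degree figure taken from the article.

```python
# Hypothetical sketch of the enrollment trigger: accept a speaker only when
# their voice reaches both earcup microphones at nearly the same time.
# Sample rate, mic spacing, and method are assumptions, not the published system's.
import numpy as np

SAMPLE_RATE = 48_000       # Hz (assumed)
MIC_SPACING = 0.18         # metres between left/right earcup mics (assumed)
SPEED_OF_SOUND = 343.0     # m/s
MAX_ANGLE_DEG = 16.0       # acceptance cone, per the article

# Largest inter-microphone delay (in samples) still consistent with the cone.
MAX_DELAY = int(round(
    MIC_SPACING / SPEED_OF_SOUND * np.sin(np.radians(MAX_ANGLE_DEG)) * SAMPLE_RATE
))


def estimate_delay(left: np.ndarray, right: np.ndarray, max_lag: int = 32) -> int:
    """Delay (in samples) of the right channel relative to the left, via cross-correlation."""
    lags = range(-max_lag, max_lag + 1)
    scores = [np.dot(left[max_lag:-max_lag],
                     right[max_lag + lag:len(right) - max_lag + lag]) for lag in lags]
    return list(lags)[int(np.argmax(scores))]


def speaker_is_in_front(left: np.ndarray, right: np.ndarray) -> bool:
    """True when the dominant voice arrives at both mics nearly simultaneously."""
    return abs(estimate_delay(left, right)) <= MAX_DELAY


if __name__ == "__main__":
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
    voice = np.sin(2 * np.pi * 220 * t)          # stand-in for a voice signal
    shifted = np.roll(voice, 3)                  # ~3-sample inter-mic delay
    print(speaker_is_in_front(voice, shifted))   # True: within the acceptance cone
```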
The team tested the system on 21 subjects, who on average rated the clarity of the enrolled speaker's voice nearly twice as high as that of the unfiltered audio.
This development builds on the team’s previous “semantic hearing” research, which allowed users to select specific sound classes, like birds or voices, and cancel out other noises.
Currently, the Target Speech Hearing system can enroll only one speaker at a time, and it can enroll a speaker only when no other loud voice is coming from the same direction as the target. If the sound quality is unsatisfactory, the user can re-enroll the speaker to improve clarity.
The team aims to expand the system’s application to earbuds and hearing aids, potentially revolutionizing how we interact with our auditory environment in noisy settings.