Noise Cancellation: State of the Art

At 2Hz we’ve spent the last 1.5 years building a disruptive Noise Cancellation technology powered by Deep Neural Networks, and I must say it has been a tough road.

In this article I’ll share with you the current state of the art for Noise Cancellation.

Before I start I want to clarify what exactly I mean by Noise Cancellation.

When I say Noise Cancellation I mean suppressing the noise that travels from the caller to the other end, as well as the noise that reaches the caller from the other end. Imagine you are in a subway and you call a friend who is at the airport. By Noise Cancellation we mean suppressing the subway noise before it is sent to your friend (while you might still hear it yourself) and also suppressing the airport noise coming from their environment to you.

Active Noise Cancellation (ANC), by contrast, refers to suppressing unwanted noise coming to your ears from the external environment surrounding you. For Active Noise Cancellation you typically need headphones (such as Bose QuietComfort).

Active Noise Cancellation

In this article we will focus only on Noise Cancellation and not Active Noise Cancellation.

Traditional Approach to Noise Cancellation

Traditionally Noise Cancellation has been implemented most effectively on the edge device: phones, laptops, conferencing systems, etc. This is an intuitive approach since it’s the edge device that captures the user’s voice in the first place. Once it captures the voice it can filter the noise out and send the result to the other end of the call.

I can imagine that the mobile phone call experience was quite bad 10 years ago, especially when the call happened in a noisy environment. This is still the case with some mobile phones; however, more modern phones are equipped with multiple mics which help suppress noise during calls.

Typically they have two or more mics (iPhone has 4). The first mic is placed at the front/bottom of the phone, near the user’s mouth, so it directly captures the user’s voice. The second mic is placed as far as possible from the first mic, usually on the top/back of the phone. Both mics capture the surrounding sounds. Since the first one is closer to the mouth it captures more voice energy, while the second one captures more noise. When you subtract one from the other you get clean voice (well, almost clean).
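The subtraction idea can be illustrated with a toy sketch. It assumes an idealized model that is not part of any real phone's pipeline: both mics pick up identical noise with no delay, and the far mic picks up only a fraction of the voice (`leak`, a hypothetical parameter). The function name `two_mic_subtract` and all signal values are illustrative assumptions.

```python
import math
import random

def two_mic_subtract(primary, secondary):
    """Subtract the far (noise reference) mic from the near (voice) mic, sample by sample."""
    return [p - s for p, s in zip(primary, secondary)]

# Toy signals; every value here is an illustrative assumption, not a measurement.
random.seed(0)
sample_rate = 8000
n = 800
voice = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(n)]
noise = [random.gauss(0.0, 0.5) for _ in range(n)]

leak = 0.2  # hypothetical fraction of the voice that reaches the far mic
primary = [v + z for v, z in zip(voice, noise)]           # near mic: full voice + noise
secondary = [leak * v + z for v, z in zip(voice, noise)]  # far mic: little voice + same noise

cleaned = two_mic_subtract(primary, secondary)

# In this idealized model the noise cancels and the voice is scaled by (1 - leak).
residual = max(abs(c - (1 - leak) * v) for c, v in zip(cleaned, voice))
```

In practice the noise at the two mics differs in level and arrival time, so a plain subtraction like this leaves noise behind, which is one reason the result is only "almost clean".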

Pixel 2016 model. The two important mics are marked in yellow.

This sounds easy but it isn’t. There are many situations where this tech fails. Imagine when the person doesn’t speak and all the mics get is noise. Or imagine that the person is actively shaking/turning the phone while they speak (e.g. when running). Handling these situations is tricky. Engineers have to implement lots of workarounds to support these situations.

Running while speaking

However, despite all of these difficulties, we can say for sure that these multiple-mic solutions work quite well today. Qualcomm is the best and biggest provider of such technology and they are doing a great job. Modern phones suppress noise quite successfully in most situations.

For this solution to work, a certain form factor is required. The distance between the first and second mics must meet a minimum requirement. In the case of a phone, when the user holds it between ear and mouth to talk, it works well.

However, this is just a single form factor, which in the long term is going to go away. In the world of wearables (smart watch, mic on your chest), and even in today’s world of laptops, tablets and Alexa, this form factor doesn’t exist. Users talk to their devices from different angles and from different distances, and in all of these situations we have no viable solution. Noise Cancellation just fails.

Modern Mac laptops have multiple built-in mics. These mics are trying to detect the direction of your voice and perform noise cancellation. However this tech is just broken. Even high end laptops with 7 mics on them fail miserably to perform this task.
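How these laptops detect direction is not something this article can speak to. One common textbook ingredient is estimating the arrival-time difference between two mics by cross-correlation and converting it to an angle under a far-field assumption. The sketch below shows that idea only; the function names, mic spacing and sample rate are all hypothetical.

```python
import math
import random

def estimate_delay(ref, delayed, max_lag):
    """Return the lag in [-max_lag, max_lag] (samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    n = len(ref)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_to_angle_deg(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Far-field model: sin(angle) = c * delay / spacing, clamped to a valid range."""
    ratio = speed_of_sound * lag / (sample_rate * mic_spacing_m)
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))

# Toy setup: the same random source reaches mic B three samples after mic A.
random.seed(1)
source = [random.gauss(0.0, 1.0) for _ in range(400)]
true_delay = 3
mic_a = source
mic_b = [0.0] * true_delay + source[:-true_delay]

estimated = estimate_delay(mic_a, mic_b, max_lag=10)
# Hypothetical geometry: 16 kHz sampling, mics 10 cm apart.
angle = delay_to_angle_deg(estimated, sample_rate=16000, mic_spacing_m=0.1)
```

With a clean, noise-free source the delay is easy to recover. With reverberation, several talkers or strong background noise the correlation peak becomes ambiguous, which hints at why this is hard on real devices.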

Qualcomm recently announced a chip that they hope to integrate into every device out there. They claim that it works for any form factor, far mics, headsets and perhaps laptops as well.

Quite ambitious.

Deep Neural Networks based Noise Cancellation

With recent developments in Machine Learning, voice technologies have received a significant boost. This boost has enabled different types of voice technologies (e.g. speech recognition) to pass a quality tipping point and start becoming usable in real life.

Noise cancellation is another promising area for this boost. Today multiple mic or voice interface hardware vendors (or maybe all of them) such as Knowles, Synaptics, ADI are investing in building AI capable chips, such as this one: http://www.knowles.com/IA8508

Knowles ML-optimized Audio Processor

We can imagine that more and more AI powered noise cancellation tech will go into the new generation chips.

2Hz Approach

At 2Hz we’ve built what we believe is the most advanced software based Noise Cancellation technology. All powered by Deep Neural Networks and implemented in Software. No additional mic required.

This technology not only matches the existing traditional DSP-only 2-mic algorithms but also outperforms them in suppressing Non-Stationary noises (like a signal, a siren, etc.), which have traditionally been a big pain for the DSP-only approach. It also outperforms them in low SNR (high noise-to-voice ratio) situations.
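For readers unfamiliar with the term, SNR is usually expressed in decibels as ten times the base-10 logarithm of the ratio of voice power to noise power. The sketch below computes it for two toy cases; the signals and the helper names `power` and `snr_db` are illustrative assumptions, not 2Hz measurements or code.

```python
import math

def power(samples):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(voice, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_voice / P_noise)."""
    return 10.0 * math.log10(power(voice) / power(noise))

# Toy voice: a unit-amplitude sine over whole periods, so its power is 0.5.
voice = [math.sin(2 * math.pi * k / 40) for k in range(400)]

# Toy noise: an alternating +/- square wave, so its power is amplitude squared.
quiet_noise = [0.1 * (1 if k % 2 == 0 else -1) for k in range(400)]
loud_noise = [1.0 * (1 if k % 2 == 0 else -1) for k in range(400)]

snr_high = snr_db(voice, quiet_noise)  # about 17 dB: voice clearly dominates
snr_low = snr_db(voice, loud_noise)    # about -3 dB: noise is stronger than the voice
```

A negative value such as the second case is what "low SNR" means in practice: the noise carries more energy than the voice.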

And again, this is all done in software.

We can embed it into chips, laptops and mobile phones, as well as run it on servers, disrupting the industry’s status quo.

Check these audio samples to see how 2Hz performs on various Non-Stationary and Low SNR situations:

In the next articles we will provide more insights into our Deep Learning-DSP approach. Stay tuned.

When we started 2Hz our goal was to build a technology able to mute background noise entirely so that only the human voice gets through. We are getting to this goal by improving SNR (Signal to Noise Ratio) 1 dB at a time.

Ping us if you have questions: contacts@2hz.ai
