We are thrilled to announce the 2Hz Voice Library – a C software library implementing next-generation voice enhancement algorithms.
The library is designed to be integrated into a wide range of devices and applications – headsets, smartphones, mobile apps, media servers, cars, radio devices, etc. Effectively, anything that interacts with a human voice.
2Hz Voice Library implements two features: Voice Activity Detection and Noise Suppression.
Both algorithms are powered by 2Hz’s specially designed Deep Neural Network and significantly outperform existing solutions on the market.
A real-life demonstration of noise suppression technology is quite challenging, since you need to find a way to simulate natural-sounding noise in different physical setups – a conference room, a video conference, in front of a large audience, etc.
There is one particular test, well known in the automotive industry, for simulating high-wind noise. One person talks into the microphone (mic) while another person blows directly into it, using their mouth or perhaps a pen or straw to create even more air pressure.
In this article we will explain why, in most cases, this test doesn’t make sense for MEMS microphones.
At 2Hz we are building the world’s best Noise Suppression technology. And we’ve achieved incredible quality.
We use Deep Learning in our approach, and our latest algorithm increases the MOS score by 1.2 points on average on a large dataset of noisy speech. This is a remarkable result – so far we haven’t seen any other technology come close to it.
We’re happy to share today that we are bringing this technology to YOUR FINGERTIPS.
For the last 4 months we have been baking Krisp (krisp.ai) – an app designed for laptops which, once installed, upgrades your laptop’s microphone and speaker and adds a magical “Mute Noise” button to use during conference calls.
You can use Krisp with any conferencing app you prefer, out of the box.
The cool thing is: you can mute both the noise going from you to the conference participants and the noise coming from them to you. Bi-directional mute.
PLC (Packet Loss Concealment) is a well-known problem in voice communications, familiar to every telecommunications user in the world. Everyone – literally everyone – who has used a VoIP app or a cellular phone has experienced “chopped voice”. When network conditions are bad, our voice cuts out and sounds annoying and funny. Early Skype users remember this very well: we sound like “he e ey how aaaaaare yoo ooo”.
In this article we demonstrate how our Deep Neural Network (DNN) powered PLC algorithm (krispNet-PLC) compares to existing state of the art PLC technologies.
2Hz is committed to developing technologies which improve Voice Audio Quality in Real Time Communications.
One contributor to poor voice quality is the legacy infrastructure powered by the 8kHz-sampling-based G.711 codec. While most of our phones can capture wideband audio (sampled at up to 48kHz), the codecs used by cellular networks downsample audio to 8kHz (lowband audio).
8kHz-sampled audio can capture the frequency range to which the human ear is most sensitive – by the Nyquist theorem, frequencies up to 4kHz. However, our voice still sounds like it’s “coming from a tunnel” and is not pleasant enough. This is because the higher frequencies of our voice are absent from the audio.
Artificial Bandwidth Expansion (we call it HD Voice Playback) refers to the idea of upsampling lowband audio to wideband audio in a way that improves voice quality. This technique has been around for many years. For example, you can use the open-source ffmpeg tool to perform artificial expansion: ffmpeg up-samples the audio to 16kHz, but it doesn’t enrich it. The end result still sounds like it’s coming from a tunnel.
In this article we describe Deep Learning based HD Voice Playback. We call our designed DNN krispNet. The full article can be found here:
We discussed traditional multi-mic noise cancellation in the previous post. Such technologies can only be applied on user devices (phones, laptops) where multiple mics are available.
In this post we will discuss the challenges of running noise cancellation technology on the server side.
When we built a fully software-based noise cancellation technology at 2hz.ai, a profound question came up — why can’t we run this technology on the server side rather than on phones or laptops?
There is a big value proposition here for Communications Service Provider companies: independent of which devices their users are using, all of these conversations can be noise-cancelled at the backend.
See, when a new iPhone X with better noise cancellation comes out — it doesn’t have much impact on a service provider such as Twilio, RingCentral, Fuze or WebEx, because the iPhone X is only a fraction of their overall device population. But if they could noise-cancel (denoise) every communication independent of user devices — there is big value in that.
Even more: when you are in the backend you have access to both legs of a call, and you can denoise both of them. So you not only make your users’ lives “noise-free” but potentially also the lives of all the other users they are talking to (users outside your network).
Sounds like a no-brainer. However, it isn’t as simple as it sounds. Let’s now talk about some challenges.