HD Voice Playback with Deep Learning

2Hz is committed to developing technologies that improve voice audio quality in real-time communications.

One contributor to poor voice quality is legacy infrastructure built around the G.711 codec, which samples audio at 8kHz. While most of our phones can capture wideband audio (sampled at up to 48kHz), the codecs used by cellular networks downsample it to 8kHz (lowband audio).
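To make the downsampling loss concrete, here is a minimal sketch (the signal, tone frequencies, and use of NumPy/SciPy are illustrative assumptions, not part of the original pipeline). It shows that once audio is resampled to 8kHz, any content above 4kHz is gone:

```python
import numpy as np
from scipy.signal import resample_poly

# Hypothetical 48 kHz capture: a 1 kHz tone (within the 8 kHz band)
# plus a 6 kHz tone (above the 4 kHz Nyquist limit of 8 kHz audio).
fs_wide = 48000
t = np.arange(fs_wide) / fs_wide                      # 1 second of audio
wideband = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)

# Downsample 48 kHz -> 8 kHz (factor 6), as a G.711-style path would.
# resample_poly applies an anti-aliasing filter before decimation.
lowband = resample_poly(wideband, up=1, down=6)

# Inspect the spectrum of the 8 kHz signal.
spectrum = np.abs(np.fft.rfft(lowband))
freqs = np.fft.rfftfreq(len(lowband), d=1 / 8000)

peak_1k = spectrum[np.argmin(np.abs(freqs - 1000))]   # survives
alias_2k = spectrum[np.argmin(np.abs(freqs - 2000))]  # where 6 kHz would alias to
```

After the conversion the 1kHz tone is intact while the 6kHz tone is removed by the anti-aliasing filter (rather than folding down to 2kHz) — exactly the high-frequency content whose absence makes voice sound muffled.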

Audio sampled at 8kHz can only represent frequencies up to 4kHz (the Nyquist limit). That range covers the frequencies the human ear is most sensitive to, yet voice still sounds like it is “coming from a tunnel” and is not pleasant to listen to. This is because the higher frequencies of our voice are absent from the audio.

Artificial Bandwidth Expansion (we call it HD Voice Playback) refers to upsampling lowband audio to wideband audio in a way that improves voice quality. The technique has been around for many years. For example, you can use the open-source tool ffmpeg to perform the expansion: it upsamples the audio to 16kHz, but it does not enrich it, so the result still sounds like it is coming from a tunnel.
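A quick way to see why plain resampling does not help: interpolation only fills in samples between existing ones, so the newly available 4–8kHz band stays empty. A minimal sketch (the synthetic tones and the use of NumPy/SciPy are assumptions for illustration):

```python
import numpy as np
from scipy.signal import resample_poly

# Hypothetical lowband input: two tones at 8 kHz sampling,
# standing in for narrowband speech.
fs_low = 8000
t = np.arange(fs_low) / fs_low
lowband = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2500 * t)

# Plain 2x upsampling to 16 kHz, analogous to what a resampler does.
upsampled = resample_poly(lowband, up=2, down=1)

# Compare energy below and above 4 kHz in the 16 kHz signal.
spectrum = np.abs(np.fft.rfft(upsampled))
freqs = np.fft.rfftfreq(len(upsampled), d=1 / 16000)
high_band_peak = spectrum[freqs > 4000].max()
low_band_peak = spectrum[freqs <= 4000].max()
```

The 4–8kHz band remains essentially silent after upsampling: the missing high frequencies have to be synthesized, which is what a learned model like krispNet is for.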

In this article we describe a Deep Learning based approach to HD Voice Playback. We call our DNN krispNet. The full article can be found here: