SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis


ArXiv: arXiv

GitHub: Click me

Authors

    Tianren Gao*, Bohan Zhai*, Flora Xue, Daniel Rothchild, Bichen Wu, Joseph Gonzalez, and Kurt Keutzer (UC Berkeley)

* Equal contribution.

Abstract

    Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize because each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x to 214x fewer multiply-accumulate operations (MACs). Code, trained models, and generated audio are publicly available at: Link
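
The abstract measures cost in MACs (multiply-accumulate operations). As a rough illustration of how such a count and a reduction factor can be computed, here is a minimal sketch for a single 1D convolution layer. The helper names `conv1d_macs` and `reduction_factor` and all layer shapes are hypothetical, chosen only for the arithmetic; they are not the WaveGlow or SqueezeWave configurations.

```python
def conv1d_macs(in_channels, out_channels, kernel_size, out_length, groups=1):
    """Count multiply-accumulates for one 1D convolution layer (bias ignored).

    Each of the out_channels * out_length output values is a dot product over
    (in_channels // groups) * kernel_size inputs.
    """
    return (in_channels // groups) * out_channels * kernel_size * out_length


def reduction_factor(baseline_macs, reduced_macs):
    """How many times fewer MACs the reduced layer needs than the baseline."""
    return baseline_macs / reduced_macs


# Hypothetical shapes, for illustration only (not taken from the paper).
baseline = conv1d_macs(in_channels=256, out_channels=256, kernel_size=3, out_length=2000)

# Halving the channel width cuts MACs by 4x (input and output channels both shrink),
# and an 8x shorter temporal length cuts them by a further 8x.
reduced = conv1d_macs(in_channels=128, out_channels=128, kernel_size=3, out_length=250)

print(baseline, reduced, reduction_factor(baseline, reduced))
```

In this toy case the two changes multiply, which is why modest reductions in width and sequence length can compound into a large overall saving.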

Audio samples

Each utterance below has one sample per model: Baseline, SqueezeWave 128L, SqueezeWave 128S, SqueezeWave 64L, SqueezeWave 64S.

LJ0015
LJ0051
LJ0063
LJ0072
LJ0079
LJ0094
LJ0096
LJ0102
LJ0153
LJ00173