Conference Paper

Advancing Neural Speech Codecs: Integrating Psychoacoustic Models for Enhanced Speech Quality

Torres, A., Rosa, R. L., Rodriguez, D. Z., & Saadi, M.

SoftCOM 2024, Split, Croatia

Publisher: IEEE

DOI: 10.23919/softcom62040.2024.10722025

Abstract

Voice quality is a vital component of mobile communication systems, and the advent of neural speech codecs has significantly transformed the speech compression landscape. Traditional codecs, which rely on fixed signal processing pipelines, often falter at lower bit rates, leading to degraded voice quality. Neural speech codecs, by contrast, provide both enhancement and compression with negligible latency. We propose integrating a psychoacoustic model into the existing structure, which comprises a convolutional encoder, a decoder, and a residual vector quantizer. The model is trained with a combination of reconstruction and adversarial losses, targeting high-quality speech generation. Simulations are conducted to study and optimize the performance of the proposed model. Performance is assessed using MUSHRA measures and POLQA scores, offering a comprehensive understanding of voice quality, and reconstruction quality is additionally compared against other models. Our results show substantial improvements in voice quality from integrating the psychoacoustic model into the neural speech codec, with the proposed model outperforming the conventional approach at lower bit rates. The work also provides a basis for future research and development in speech codec enhancement strategies.
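The abstract names a residual vector quantizer but gives no details of it. As a rough illustration of the general technique only, and not the authors' implementation, the following Python sketch quantizes a latent vector in stages, with each stage coding the residual left by the previous one. The function names, the toy dimensions, and the randomly drawn codebooks are all assumptions made for this example; in a trained codec the codebooks would be learned.

```python
import random

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage codes the residual of the previous one."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen entry so the next stage sees only what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Hypothetical toy setup: 3 stages, 8 entries per stage, 4-dimensional latents.
# Codebooks are random here purely for illustration (assumption, not learned).
rng = random.Random(0)
codebooks = [
    [[rng.gauss(0.0, scale) for _ in range(4)] for _ in range(8)]
    for scale in (1.0, 0.5, 0.25)
]
latent = [0.3, -1.2, 0.7, 0.1]
codes = rvq_encode(latent, codebooks)   # one index per stage
approx = rvq_decode(codes, codebooks)   # coarse reconstruction of latent
```

In this toy configuration each latent vector is transmitted as three indices of 3 bits each (8 entries per codebook), that is, 9 bits per vector. Dropping later stages lowers the bit rate at the cost of a coarser reconstruction, which is the usual reason this kind of quantizer is used in low-bit-rate codecs.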