DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model

(Submitted to Interspeech 2025)

1. Abstract

Dysarthric speech reconstruction (DSR) aims to convert dysarthric speech into comprehensible speech while maintaining the speaker’s identity. Despite significant advances, existing methods often struggle with low speech intelligibility and poor speaker similarity. In this study, we introduce a novel diffusion-based DSR system that leverages a latent diffusion model to enhance the quality of speech reconstruction. Our model comprises: (i) a speech content encoder that restores phoneme embeddings using pre-trained self-supervised learning (SSL) speech foundation models; (ii) a speaker identity encoder that preserves speaker identity through an in-context learning mechanism; (iii) a diffusion-based speech generator that reconstructs speech from the restored phoneme embeddings and the preserved speaker identity. In evaluations on the widely used UASpeech corpus, our proposed model shows notable improvements in speech intelligibility and speaker similarity.

2. Proposed Model Architecture
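
As a rough illustration of how the three components named in the abstract fit together, the following toy sketch traces only the data flow: content encoding, speaker encoding, then iterative generation from noise. It is not the authors' implementation. Every function name, the mean-pooled speaker summary, and the simple "nudge toward a target" update are assumptions made for illustration; a real system would use a pre-trained SSL model, an in-context prompt mechanism, and a trained latent diffusion denoiser in their place.

```python
import random
from typing import List

Vector = List[float]


def content_encoder(dysarthric_frames: List[Vector]) -> List[Vector]:
    """Stand-in for component (i). A real system would map dysarthric
    speech to restored phoneme embeddings with a pre-trained SSL model;
    here the frames are simply copied (assumption)."""
    return [list(frame) for frame in dysarthric_frames]


def speaker_encoder(prompt_frames: List[Vector]) -> Vector:
    """Stand-in for component (ii). Mean pooling over a speaker prompt is
    an assumption; the paper describes an in-context learning mechanism."""
    dim = len(prompt_frames[0])
    count = len(prompt_frames)
    return [sum(frame[d] for frame in prompt_frames) / count for d in range(dim)]


def toy_denoiser(latent: List[Vector], content: List[Vector],
                 speaker: Vector) -> List[Vector]:
    """Hypothetical denoising step: moves each latent frame halfway toward
    (content + speaker). A trained network would replace this."""
    updated = []
    for latent_frame, content_frame in zip(latent, content):
        target = [c + s for c, s in zip(content_frame, speaker)]
        updated.append([z + 0.5 * (t - z) for z, t in zip(latent_frame, target)])
    return updated


def diffusion_generator(content: List[Vector], speaker: Vector,
                        num_steps: int = 20, seed: int = 0) -> List[Vector]:
    """Stand-in for component (iii): start from Gaussian noise and refine
    it iteratively, conditioned on content and speaker identity."""
    rng = random.Random(seed)
    latent = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in content]
    for _ in range(num_steps):
        latent = toy_denoiser(latent, content, speaker)
    return latent


def reconstruct(dysarthric_frames: List[Vector],
                prompt_frames: List[Vector]) -> List[Vector]:
    """End-to-end data flow: encode content, encode speaker, generate."""
    content = content_encoder(dysarthric_frames)
    speaker = speaker_encoder(prompt_frames)
    return diffusion_generator(content, speaker)


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0]]   # toy "dysarthric" features
    prompt = [[0.5, 0.5], [1.5, 1.5]]   # toy speaker prompt
    for frame in reconstruct(frames, prompt):
        print([round(value, 3) for value in frame])
```

The sketch keeps the generator conditioned on both inputs at every refinement step, which mirrors the abstract's description of reconstruction "based on" the restored content and the preserved speaker identity; everything beyond that wiring is a placeholder.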

3. Comparison with Different Baseline Systems

FS2-DSR: Uses a speaker encoder to extract a global timbre embedding, together with a multi-speaker mel-based decoder.
CoLM-DSR: Excluding the influence of multi-modal input, it uses an LM-based generator with a speech codec prompt.
Diff-DSR: Our complete proposed diffusion-based system.

3.1 Speaker: M12

3.1.1 Text: Left

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.1.2 Text: Juliet

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.1.3 Text: Whiskey

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.1.4 Text: Many

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.1.5 Text: Golf

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.1.6 Text: Watch

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.2 Speaker: F02

3.2.1 Text: Paragraph

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.2.2 Text: Word

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.2.3 Text: When

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.2.4 Text: Seven

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.2.5 Text: Foxtrot

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.3 Speaker: M16

3.3.1 Text: Copy

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.3.2 Text: Bravo

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.3.3 Text: Kilo

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.3.4 Text: Oscar

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.3.5 Text: Tango

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.3.6 Text: Upward

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.4 Speaker: F04

3.4.1 Text: Bulrush

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.4.2 Text: Juliet

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.4.3 Text: Quebec

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.4.4 Text: Uniform

Original	FS2-DSR	CoLM-DSR	Diff-DSR

3.4.5 Text: Victor

Original	FS2-DSR	CoLM-DSR	Diff-DSR