DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model

(Submitted to Interspeech 2025)

1. Abstract

Dysarthric speech reconstruction (DSR) aims to convert dysarthric speech into comprehensible speech while maintaining the speaker's identity. Despite significant advancements, existing methods often struggle with low speech intelligibility and poor speaker similarity. In this study, we introduce a novel diffusion-based DSR system that leverages a latent diffusion model to enhance the quality of speech reconstruction. Our model comprises: (i) a speech content encoder that restores phoneme embeddings using pre-trained self-supervised learning (SSL) speech foundation models; (ii) a speaker identity encoder that performs speaker-aware identity preservation through an in-context learning mechanism; and (iii) a diffusion-based speech generator that reconstructs speech from the restored phoneme embeddings and the preserved speaker identity. Evaluations on the widely used UASpeech corpus show that our proposed model notably improves both speech intelligibility and speaker similarity.
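The abstract describes the diffusion-based generator only at a high level. As a hedged illustration, the following sketch shows the generic latent-diffusion mechanics such a generator builds on. It covers DDPM-style forward noising of a clean latent (the training side) and deterministic DDIM sampling (eta = 0) conditioned on a content-plus-speaker vector (the inference side). Several pieces are assumptions for illustration and do not come from DiffDSR. These are the linear beta schedule, the element-wise-sum fusion in the hypothetical `hypothetical_condition`, and the hypothetical `oracle_denoiser`. The oracle denoiser stands in for a trained noise-prediction network by treating the condition as the known clean latent.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(z0, eps, alpha_bar):
    """Training-side forward noising: z_t = sqrt(a_bar) * z_0 + sqrt(1 - a_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(z0, eps)]


def hypothetical_condition(phoneme_emb, speaker_emb):
    """HYPOTHETICAL fusion of a restored phoneme embedding and a speaker embedding.
    An element-wise sum is assumed purely for illustration."""
    return [p + s for p, s in zip(phoneme_emb, speaker_emb)]


def oracle_denoiser(z_t, alpha_bar, cond):
    """HYPOTHETICAL stand-in for a trained noise predictor eps_theta(z_t, t, cond).
    It treats `cond` as the clean latent and returns the exact noise in z_t;
    a real model would have to learn this mapping from data."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(z - a * c) / b for z, c in zip(z_t, cond)]


def ddim_sample(cond, alpha_bars, stride=50, seed=0):
    """Deterministic DDIM sampling (eta = 0) starting from Gaussian noise in latent space."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in cond]
    steps = list(range(len(alpha_bars) - 1, -1, -stride))  # e.g. 999, 949, ..., 49
    for i, t in enumerate(steps):
        ab_t = alpha_bars[t]
        # After the last scheduled step, jump to the clean level (alpha_bar = 1).
        ab_prev = alpha_bars[steps[i + 1]] if i + 1 < len(steps) else 1.0
        eps_hat = oracle_denoiser(z, ab_t, cond)
        # Estimate the clean latent, then re-noise it to the previous (less noisy) level.
        z0_hat = [(zt - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
                  for zt, e in zip(z, eps_hat)]
        z = [math.sqrt(ab_prev) * x + math.sqrt(1.0 - ab_prev) * e
             for x, e in zip(z0_hat, eps_hat)]
    return z


if __name__ == "__main__":
    cond = hypothetical_condition([0.5, -1.0, 0.25], [0.1, 0.2, -0.3])
    z = ddim_sample(cond, linear_alpha_bars(1000))
    print([round(v, 6) for v in z])  # → [0.6, -0.8, -0.05]
```

Because the oracle predicts the noise exactly, the sampler recovers the conditioning latent. With a learned network, the output would instead be a sample whose quality depends on how well the network was trained. In a real latent diffusion system, the final latent would then be decoded to a waveform, for example by a vocoder. That step is omitted here.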

2. Proposed Model Architecture

3. Comparison with Baseline Systems

3.1 Speaker: M12

3.1.1 Text: Left

Original FS2-DSR CoLM-DSR Diff-DSR

3.1.2 Text: Juliet

Original FS2-DSR CoLM-DSR Diff-DSR

3.1.3 Text: Whiskey

Original FS2-DSR CoLM-DSR Diff-DSR

3.1.4 Text: Many

Original FS2-DSR CoLM-DSR Diff-DSR

3.1.5 Text: Golf

Original FS2-DSR CoLM-DSR Diff-DSR

3.1.6 Text: Watch

Original FS2-DSR CoLM-DSR Diff-DSR

3.2 Speaker: F02

3.2.1 Text: Paragraph

Original FS2-DSR CoLM-DSR Diff-DSR

3.2.2 Text: Word

Original FS2-DSR CoLM-DSR Diff-DSR

3.2.3 Text: When

Original FS2-DSR CoLM-DSR Diff-DSR

3.2.4 Text: Seven

Original FS2-DSR CoLM-DSR Diff-DSR

3.2.5 Text: Foxtrot

Original FS2-DSR CoLM-DSR Diff-DSR

3.3 Speaker: M16

3.3.1 Text: Copy

Original FS2-DSR CoLM-DSR Diff-DSR

3.3.2 Text: Bravo

Original FS2-DSR CoLM-DSR Diff-DSR

3.3.3 Text: Kilo

Original FS2-DSR CoLM-DSR Diff-DSR

3.3.4 Text: Oscar

Original FS2-DSR CoLM-DSR Diff-DSR

3.3.5 Text: Tango

Original FS2-DSR CoLM-DSR Diff-DSR

3.3.6 Text: Upward

Original FS2-DSR CoLM-DSR Diff-DSR

3.4 Speaker: F04

3.4.1 Text: Bulrush

Original FS2-DSR CoLM-DSR Diff-DSR

3.4.2 Text: Juliet

Original FS2-DSR CoLM-DSR Diff-DSR

3.4.3 Text: Quebec

Original FS2-DSR CoLM-DSR Diff-DSR

3.4.4 Text: Uniform

Original FS2-DSR CoLM-DSR Diff-DSR

3.4.5 Text: Victor

Original FS2-DSR CoLM-DSR Diff-DSR