🎙️ Phonological Vector-based Speech Editing Demo

Demonstration for the paper [b]=[d]-[t]+[p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic. This demo reproduces Experiment 2: Scale of Phonological Vectors, illustrating the controllability of speech editing by phonological vectors.
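
The arithmetic in the paper title can be illustrated with a toy sketch. The vectors below are random stand-ins built to be perfectly additive, not actual S3M representations, and the names voiced, alveolar, and labial are hypothetical components introduced only for this illustration.

```python
import numpy as np

# Toy stand-ins for phone representations (assumption: real S3M features
# are only approximately additive; here additivity holds by construction).
rng = np.random.default_rng(0)
dim = 8
voiced = rng.normal(size=dim)    # hypothetical "voicing" component
alveolar = rng.normal(size=dim)  # hypothetical place component for [t], [d]
labial = rng.normal(size=dim)    # hypothetical place component for [p], [b]

phones = {
    "t": alveolar,
    "d": alveolar + voiced,
    "p": labial,
    "b": labial + voiced,
}

# [d] - [t] isolates the voicing direction; adding it to [p] gives [b].
voicing_vector = phones["d"] - phones["t"]
predicted_b = voicing_vector + phones["p"]

print(np.allclose(predicted_b, phones["b"]))
```

In this toy setting the difference [d]-[t] recovers the voicing component exactly, which is the idealized version of what the paper title expresses.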

Upload or record audio, or pick one of the example words. Then inspect the spectrogram, select the time window, choose a phonological vector to apply, and hit Run. (The example words include a 0.25 s margin before the start and after the end of the word.)

Hyperparameters

  • Start / Stop (s): Time range (in seconds) over which the phonological vector is applied. Use the input spectrogram to identify the target phone's boundaries.
  • Lambda: Strength of the phonological vector, from -5 to 5. Positive values strengthen the selected feature; negative values strengthen the opposite feature.
  • Vocos training dataset: Training corpus used for the vocoder (Vocos) that resynthesizes the modified representation back to audio.
  • Vector extraction method: How phonological vectors are estimated from S3M (self-supervised speech model) representations. The options differ in the training dataset and/or the way the vectors are calculated.
  • Phonological feature: The phonological vector to add within the selected time window.
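
How Start / Stop, Lambda, and the phonological vector interact can be sketched as follows. This is a minimal illustration, not the demo's actual implementation: the function name apply_phonological_vector, the (frames, dim) feature layout, and the 50 frames-per-second rate are all assumptions, and the Vocos resynthesis step is left out.

```python
import numpy as np

def apply_phonological_vector(features, vector, start_s, stop_s, lam,
                              frame_rate_hz=50.0):
    """Add lam * vector to every frame inside [start_s, stop_s).

    features: array of shape (num_frames, dim), one row per frame.
    vector:   array of shape (dim,), the phonological vector.
    frame_rate_hz: assumed frame rate of the representation (hypothetical
                   default; check the model actually used).
    """
    features = np.asarray(features, dtype=np.float64)
    vector = np.asarray(vector, dtype=np.float64)
    if features.ndim != 2 or vector.shape != (features.shape[1],):
        raise ValueError("expected features (frames, dim) and vector (dim,)")
    if stop_s < start_s:
        raise ValueError("stop_s must not be smaller than start_s")

    # Convert the time window in seconds to frame indices, clipped to the clip.
    num_frames = features.shape[0]
    first = int(np.clip(np.floor(start_s * frame_rate_hz), 0, num_frames))
    last = int(np.clip(np.ceil(stop_s * frame_rate_hz), 0, num_frames))

    # Positive lam pushes toward the feature, negative lam away from it.
    edited = features.copy()
    edited[first:last] += lam * vector
    return edited

# Usage: 2 s of dummy features, edit the window 0.5 s to 1.0 s with lam = -2.
dummy = np.zeros((100, 4))
direction = np.array([1.0, 0.0, 0.0, 0.0])
out = apply_phonological_vector(dummy, direction, 0.5, 1.0, lam=-2.0)
print(out[24, 0], out[25, 0], out[49, 0], out[50, 0])
```

Frames outside the window are left untouched, so only the selected phone is shifted along the chosen direction before the vocoder turns the edited representation back into audio.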
