Audio samples related to Tacotron, an end-to-end speech synthesis system by Google.

(March 2017) Tacotron: Towards End-to-End Speech Synthesis

(November 2017) Uncovering Latent Style Factors for Expressive Speech Synthesis

(December 2017) Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

(March 2018) Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

(March 2018) Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

(June 2018) Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

(July 2018) Predicting Expressive Speaking Style From Text in End-to-End Speech Synthesis

(August 2018) Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis

(October 2018) Hierarchical Generative Modeling for Controllable Speech Synthesis

(November 2018) Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization

(April 2019) Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation

(June 2019) Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis

(July 2019) Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning

(September 2019) Semi-Supervised Generative Modeling for Controllable Speech Synthesis

(October 2019) Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

(February 2020) Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis

(February 2020) Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior

(October 2020) Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling

(October 2020) Parallel Tacotron: Non-Autoregressive and Controllable TTS

(November 2020) Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis

(March 2021) PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS

(March 2021) Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

(November 2021) Speaker Generation

(March 2022) Real Time Spectrogram Inversion on Mobile Phone

(October 2022) Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

(October 2022) Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation

(September 2024) Zero-shot Cross-lingual Voice Transfer for TTS

(October 2024) Very Attentive Tacotron: Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech

(December 2024) Long-Form Speech Generation with Spoken Language Models

(July 2025) SequenceLayers: Sequence Processing and Streaming Neural Networks Made Easy

Tacotron (/täkōˌträn/): An end-to-end speech synthesis system by Google