CHiVE: Examples of side-by-side comparison

These audio samples accompany the paper "CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network" by Wan, V., Chan, C.-a., Kenter, T., Vit, J., and Clark, R. A., in Proceedings of the Thirty-sixth International Conference on Machine Learning (ICML 2019), 2019. They illustrate the side-by-side comparison between CHiVE and a baseline non-hierarchical model (Section 4.1, Table 1 in the paper).

Examples where CHiVE was preferred

Five examples in which the audio generated from the sentence prosody embedding of the hierarchical CHiVE model was preferred over the audio produced by the non-hierarchical baseline.
Baseline CHiVE

Examples where the baseline was preferred

Five examples in which the audio generated from the sentence prosody embedding of the non-hierarchical baseline was preferred over the audio produced by the hierarchical CHiVE model.
Baseline CHiVE