Paper: arXiv
Authors: Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Yonghui Wu
Ground Truth: | |||
Speaker 1 (male): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 1 (male): Phone Level Energy Control | |||
Speaker 1 (male): Word Level Energy Control | |||
Speaker 2 (male): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 2 (male): Phone Level Energy Control | |||
Speaker 2 (male): Word Level Energy Control | |||
Speaker 3 (female): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 3 (female): Phone Level Energy Control | |||
Speaker 3 (female): Word Level Energy Control |
Ground Truth: | |||
Speaker 1 (male): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 1 (male): Phone Level Energy Control | |||
Speaker 1 (male): Word Level Energy Control | |||
Speaker 2 (male): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 2 (male): Phone Level Energy Control | |||
Speaker 2 (male): Word Level Energy Control | |||
Speaker 3 (female): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 3 (female): Phone Level Energy Control | |||
Speaker 3 (female): Word Level Energy Control |
Ground Truth: | |||
Speaker 1 (male): Copy Synthesis | z = [0, -1, 0] | z = [0, 0, 0] | z = [0, +1, 0] |
Speaker 1 (male): Phone Level Duration Control | |||
Speaker 1 (male): Word Level Duration Control | |||
Speaker 2 (male): Copy Synthesis | z = [0, -1, 0] | z = [0, 0, 0] | z = [0, +1, 0] |
Speaker 2 (male): Phone Level Duration Control | |||
Speaker 2 (male): Word Level Duration Control | |||
Speaker 3 (female): Copy Synthesis | z = [0, -1, 0] | z = [0, 0, 0] | z = [0, +1, 0] |
Speaker 3 (female): Phone Level Duration Control | |||
Speaker 3 (female): Word Level Duration Control |
Ground Truth: | |||
Speaker 1 (male): Copy Synthesis | z = [0, -1, 0] | z = [0, 0, 0] | z = [0, +1, 0] |
Speaker 1 (male): Phone Level Duration Control | |||
Speaker 1 (male): Word Level Duration Control | |||
Speaker 2 (male): Copy Synthesis | z = [0, -1, 0] | z = [0, 0, 0] | z = [0, +1, 0] |
Speaker 2 (male): Phone Level Duration Control | |||
Speaker 2 (male): Word Level Duration Control | |||
Speaker 3 (female): Copy Synthesis | z = [0, -1, 0] | z = [0, 0, 0] | z = [0, +1, 0] |
Speaker 3 (female): Phone Level Duration Control | |||
Speaker 3 (female): Word Level Duration Control |
Ground Truth: | |||
Speaker 1 (male): Copy Synthesis | z = [0, 0, -1] | z = [0, 0, 0] | z = [0, 0, +1] |
Speaker 1 (male): Phone Level F0 Control | |||
Speaker 1 (male): Word Level F0 Control | |||
Speaker 2 (male): Copy Synthesis | z = [0, 0, -1] | z = [0, 0, 0] | z = [0, 0, +1] |
Speaker 2 (male): Phone Level F0 Control | |||
Speaker 2 (male): Word Level F0 Control | |||
Speaker 3 (female): Copy Synthesis | z = [0, 0, -1] | z = [0, 0, 0] | z = [0, 0, +1] |
Speaker 3 (female): Phone Level F0 Control | |||
Speaker 3 (female): Word Level F0 Control |
Ground Truth: | |||
Speaker 1 (male): Copy Synthesis | z = [0, 0, -1] | z = [0, 0, 0] | z = [0, 0, +1] |
Speaker 1 (male): Phone Level F0 Control | |||
Speaker 1 (male): Word Level F0 Control | |||
Speaker 2 (male): Copy Synthesis | z = [0, 0, -1] | z = [0, 0, 0] | z = [0, 0, +1] |
Speaker 2 (male): Phone Level F0 Control | |||
Speaker 2 (male): Word Level F0 Control | |||
Speaker 3 (female): Copy Synthesis | z = [0, 0, -1] | z = [0, 0, 0] | z = [0, 0, +1] |
Speaker 3 (female): Phone Level F0 Control | |||
Speaker 3 (female): Word Level F0 Control |
Ground Truth: | |||
Speaker 1 (male): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 1 (male): Phone Level Silence Control | |||
Speaker 1 (male): Word Level Silence Control | |||
Speaker 2 (male): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 2 (male): Phone Level Silence Control | |||
Speaker 2 (male): Word Level Silence Control | |||
Speaker 3 (female): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 3 (female): Phone Level Silence Control | |||
Speaker 3 (female): Word Level Silence Control |
Ground Truth: | |||
Speaker 1 (male): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 1 (male): Word Level Silence Control | |||
Speaker 2 (male): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 2 (male): Word Level Silence Control | |||
Speaker 3 (female): Copy Synthesis | z = [-1, 0, 0] | z = [0, 0, 0] | z = [+1, 0, 0] |
Speaker 3 (female): Word Level Silence Control |
Speaker 1 (male) Phone Level | |||
Speaker 1 (male) Word Level |
Speaker 1 (male) Phone Level | |||
Speaker 1 (male) Word Level |