Audio samples for CHiVE-BERT

Audio samples to go with Improving the Prosody of RNN-based English Text-To-Speech Synthesis by Incorporating a BERT Model, T. Kenter, M. Sharma and R. A. J. Clark, INTERSPEECH 2020.

@inproceedings{kenter2020improvingprosodywithbert,
  title = {Improving the Prosody of RNN-based English Text-To-Speech Synthesis by Incorporating a BERT Model},
  author = {Tom Kenter and Manish Sharma and Rob Clark},
  booktitle = {INTERSPEECH 2020},
  pages = {4412--4416},
  year = {2020},
}

CHiVE-BERT preferred

Baseline CHiVE-BERT
"I am looking for diet cat food."

An example sentence with diet cat food, as referred to in the introduction.

Baseline CHiVE-BERT
"Where do you find traditional and live action role playing games in the espionage genre?"

An example of an error in the dependency parse being audible in the prosody generated by the baseline. The dependency parse analyzes the sentence as if playing is a verb (i.e. there is a live action role, who is playing games).

Baseline CHiVE-BERT
"High-speed rail is public transport by rail at speeds of at least 200 km/h."

Another example of an error in the dependency parse. The dependency parse analyzes at speeds ... 200 km/h as being a modifier of rail rather than transport by rail.

Baseline CHiVE-BERT
"When was the last time you took 10 seconds of silence to think of people who have helped you?"

A somewhat longer question, where CHiVE-BERT seems a bit better at maintaining a consistent prosody throughout.

Baseline CHiVE-BERT
"Sorry, what was that?"

A short question. The prosody yielded by CHiVE-BERT sounds a bit better at the end, but note that there already is a (subtle) difference at the start (Sorry) too.

Baseline preferred

Baseline CHiVE-BERT
"Ask one of your so-called experts."

The CHiVE-BERT model has trouble with the so-called.

Baseline CHiVE-BERT
"Why not just tax profits and capital gains and adjust rates as necessary?"

Here the CHiVE-BERT model seems to treat adjust rates as a noun compound.

Baseline CHiVE-BERT
"It is natural to ask how large is the dimension of the solution?"

The CHiVE-BERT model seems to pick up on the sentence structure (that of an affirmative sentence) but misses the question mark, which the prosody of the baseline does reflect in a more pronounced way.

Just as a reference, here is the same sentence with the first two words swapped. This is not entirely grammatical, perhaps, but it signals a bit more strongly that the sentence is a question. We see that the prosody of both models sounds rather similar:

"Is it natural to ask how large is the dimension of the solution?"
And, finally, to be complete, the sentence as a fully grammatical question. The prosody, again, is very similar here:
"Is it natural to ask how large the dimension of the solution is?"