'Hard lines' dataset used for evaluating CHiVE-BERT

This page presents the 'hard lines' dataset that was used for evaluation in "Improving the Prosody of RNN-based English Text-To-Speech Synthesis by Incorporating a BERT Model", T. Kenter, M. Sharma and R. A. J. Clark, INTERSPEECH 2020.

If you use the dataset in your work, please cite the paper:

@inproceedings{kenter2020improvingprosodywithbert,
  title = {Improving the Prosody of RNN-based English Text-To-Speech Synthesis by Incorporating a BERT Model},
  author = {Tom Kenter and Manish Sharma and Rob Clark},
  booktitle = {INTERSPEECH 2020},
  pages = {4412--4416},
  year = {2020},
}

The dataset is a plain text file with one sentence per line. It can be viewed and downloaded here.
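
Since the format is one sentence per line, reading the file takes only a few lines of code. The following is a minimal sketch, not part of the original release: the function names are hypothetical, and both the UTF-8 encoding and the skipping of blank lines are assumptions. The demo sentences are made up and are not taken from the dataset.

```python
from pathlib import Path


def parse_hard_lines(text: str) -> list[str]:
    """Split raw file contents into sentences, one per line."""
    # Assumption: surrounding whitespace and blank lines carry no meaning.
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_hard_lines(path: str) -> list[str]:
    """Read a local copy of the dataset (UTF-8 encoding is an assumption)."""
    return parse_hard_lines(Path(path).read_text(encoding="utf-8"))


# In-memory demo with made-up sentences (not taken from the dataset).
sample = "First made-up sentence.\n\nSecond made-up sentence.\n"
sentences = parse_hard_lines(sample)
print(len(sentences))
```

With a downloaded copy saved locally, say under the hypothetical name `hard_lines.txt`, calling `load_hard_lines("hard_lines.txt")` would return the sentences as a list of strings, ready to be fed to a TTS system one at a time.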

It is composed of so-called 'hard lines' that typically contain compound nouns, compound adjectives or long proper names. These are phrases that some TTS systems have trouble rendering correctly. The dataset was constructed from scratch to cover these kinds of examples and to get a better understanding of how our models were performing on these lines, which do not typically occur very often in standard datasets.

We found that using a targeted dataset like this helped in developing our models. By making this dataset available, we hope to support the use of targeted and openly available test sets, and to make comparison between models more consistent and straightforward.

This work is licensed under a Creative Commons Attribution 4.0 International License.