
Blizzard Challenge 2018 Rules





  • A registration fee of 600 GBP is payable by all participants who wish to submit synthetic speech for evaluation, to offset the costs of running the challenge, including paying local assistants and listeners. The fee must be paid by Friday 27th April 2018. You can pay this fee using Edinburgh University's online payments system at where you should register for the event called 'Blizzard Challenge 2018'. After doing this, you will receive a confirmation email from the epay system. Please forward this email to to notify us that you have paid. If you are absolutely unable to use the online payments system, please contact for assistance with a bank transfer. However, we strongly prefer the epay system because it reduces the costs and admin work for us. If you must pay by bank transfer, please contact us in plenty of time (at least 4 weeks before the payment deadline); an additional administration fee of 150 GBP will be added for any payments not made using the epay system.


  • Each participant must try to recruit at least ten volunteer listeners. If possible, these should be people who have some professional knowledge of synthetic speech.


  • Each participant should try to recruit as many naive listeners (with no professional knowledge of synthetic speech) as possible. They do not have to be native speakers.
  • The organisers would also appreciate assistance in advertising both the Challenge and the listening test as widely as possible (e.g., to your students or colleagues).
  • We are also seeking assistance with conducting a listening test using children. These can be native speakers, or learners of English as an additional language. Children with reading ages of either 4, 5, or 6 years (approximately) would be most appropriate, but slightly older children would also be useful. Please contact to discuss this. We may be able to offer some financial support with the costs, or reduce the entry fee if you are also submitting synthetic speech.


All participants will have access to the following material after signing the license:

  • An estimated 6.5 hours of speech from one native British English female professional speaker (this is identical to the 2017 data, and includes the 5 hours released in 2016) from 56 children's audiobooks
  • Publisher's text for all speech material, as either text or PDF files
  • Cleaned-up text and sentence-level alignments with the audio, for part of the material
  • A shared repository of cleaned-up text and alignments contributed by other participants: Blizzard Challenge 2016-8 Git Repository

All participants are expected to make a contribution to this shared repository.

The original material was very kindly provided by Usborne Publishing.


Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an individual entry and a joint system) should contact the organisers in advance to agree this. We will try to accommodate all reasonable requests, provided the listening test remains manageable.


Build a voice from the provided data, suitable for reading children's audiobooks. There is just a single task, designated as 2018-EH1.


  • "External data" is defined as data, of any type, that is not part of the provided database.
  • You are allowed to use external data in any way you wish, subject to any exclusions given in these rules.
  • Use of external data is entirely optional.
  • You must use the provided audio files
  • You must not use any additional speech data from the same speaker
  • You may exclude any parts of the provided databases if you wish.
  • Use of any provided segmentations, transcriptions or labels is optional.
  • If you are in any doubt about how to apply these rules, please contact the organisers immediately.


  • The exact nature of the test set will not be revealed in advance, but it is likely to include sentences, paragraphs and short book-length texts from a similar domain to the provided corpus, as well as texts from other domains.
  • Synthetic speech may be submitted at any standard sampling rate (but always at 16 bits per sample). Waveforms will not be downsampled for the listening test.
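As an illustration of the 16-bit requirement above, the following Python sketch checks the format of a waveform before submission. It is not an official validation tool: the function names `wav_format` and `is_16_bit` are hypothetical, and it assumes that submissions are uncompressed PCM WAV files, which these rules do not state explicitly.

```python
import wave


def wav_format(path):
    """Read the header of a PCM WAV file and return its basic format.

    Assumption: the file is uncompressed PCM WAV; the standard-library
    `wave` module raises wave.Error for other encodings.
    """
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "bits_per_sample": wav.getsampwidth() * 8,  # sample width is in bytes
            "channels": wav.getnchannels(),
        }


def is_16_bit(path):
    """Return True if the file stores 16 bits per sample, as the rules require."""
    return wav_format(path)["bits_per_sample"] == 16
```

The sampling rate is reported but deliberately not checked, since any standard rate is accepted.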


  • Any examples that you submit for evaluation will be retained by the Blizzard organisers for future use.
  • With your submission of the test sentences, you must include a statement of whether you give the organisers permission to publicly distribute your waveforms and the corresponding listening test results in anonymised form. In the past, all participants have agreed to this and we strongly encourage you to give this consent.


Formal listening tests will be conducted to evaluate the synthetic speech submitted. Whilst the task is to synthesise speech suitable for reading an audiobook to children, the listening test will likely also evaluate the performance of the voice in terms of naturalness and intelligibility on other types of material (i.e., as in most previous Blizzard Challenges).


  • Each participant will be expected to submit a six-page paper (using the Interspeech 2018 template) describing their entry for review. Please email your paper to .
  • Papers should describe the system, as well as the use of:
    • external data, if any (e.g., other speech or text corpora)
    • existing tools, software and models (e.g., text analysers, Festival, HTS, word2vec, ...)
  • One of the authors of each accepted paper should present it at the Blizzard 2018 Workshop
  • In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate easy cross-system comparisons (e.g., is it unit selection? does it predict prosody?).


  • This is a challenge designed to answer scientific questions, not a competition. Therefore, we rely on your honesty in preparing your entry.

SynSIG is a Special Interest Group of ISCA, the International Speech Communication Association.

SynSIG 1998-2019