Blizzard Challenge 2017

Apple and Google have generously provided financial support to the Blizzard Challenge 2017

Read these first

This year, there are two distinct parts to the Blizzard Challenge. Teams may enter either part, or both. The first part follows the standard approach of previous years and comprises the single hub task (2017-EH1), which requires teams to build an end-to-end text-to-speech system. The second part is novel and is designed to be accessible to the wider machine learning community; it comprises two spoke tasks (2017-ES1 and 2017-ES2).

New: the Blizzard Machine Learning Challenge

Speech synthesis as a machine learning problem: exploring new types of acoustic models

In the HMM era, by taking a unified view of both Automatic Speech Recognition (ASR) and Text-to-Speech (TTS), it was possible to develop various types of new ASR and TTS techniques, e.g., cross-lingual speaker adaptation, adaptive training for TTS, use of prosody in ASR, etc. We expect that by once again taking a unified view in the current DNN era, it will be possible to develop new types of acoustic modeling techniques that are useful for both ASR and TTS.

The series of Blizzard Challenges has helped us measure progress in TTS. But to achieve competitive performance, a lot of time has to be spent on skilled tasks such as updating the lexicon, removing inappropriate audio files, segmenting and aligning audio files, detecting alignment errors, etc. This may make the Blizzard Challenge unattractive to Machine Learning (ML) researchers from other fields.

We therefore propose a spin-off challenge that does not involve these speech-specific tasks, and allows participants to concentrate on the acoustic modeling task, framed as a straightforward ML problem, with a fixed data set.

The organizers will provide the data as corresponding sequences of linguistic features, speech features and speech waveforms. Participants must train a model to predict speech features from linguistic features (or to predict speech waveforms directly from linguistic features, as done in WaveNet), and then use that model to make predictions for a test set of previously unseen linguistic features.
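
As a rough illustration of the task shape only, the sketch below trains a frame-level model that maps linguistic feature vectors to speech feature vectors and then predicts outputs for unseen inputs. Everything in it is an assumption made for illustration: the data is synthetic, the feature dimensions (3 inputs, 2 outputs) are arbitrary, and a plain linear model trained by stochastic gradient descent stands in for whatever acoustic model a participant would actually use. It does not reflect the format of the released data.

```python
import random


def predict(weights, bias, x):
    """Predict one frame of speech features from one frame of linguistic features."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mse(weights, bias, inputs, targets):
    """Mean squared error over all frames and all output dimensions."""
    total, count = 0.0, 0
    for x, y in zip(inputs, targets):
        for p, t in zip(predict(weights, bias, x), y):
            total += (p - t) ** 2
            count += 1
    return total / count


def train(inputs, targets, epochs=200, lr=0.05):
    """Fit a linear frame-level model with stochastic gradient descent."""
    n_in, n_out = len(inputs[0]), len(targets[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    bias = [0.0] * n_out
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            pred = predict(weights, bias, x)
            for j in range(n_out):
                err = pred[j] - y[j]
                for i in range(n_in):
                    weights[j][i] -= lr * err * x[i]
                bias[j] -= lr * err
    return weights, bias


# Synthetic stand-in data (hypothetical): 50 training frames, 10 test frames.
rng = random.Random(0)
true_w = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
train_x = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
train_y = [[sum(w * xi for w, xi in zip(row, x)) for row in true_w] for x in train_x]
test_x = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]

# Train on the paired sequences, then predict speech features for unseen inputs.
weights, bias = train(train_x, train_y)
test_predictions = [predict(weights, bias, x) for x in test_x]
```

In a real entry, the predicted speech features would be passed to a vocoder (or the model would generate waveforms directly) to produce the audio that is submitted for the listening test.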

The organizers will evaluate submissions using a listening test, as in the main challenge.

Registration

Register by emailing blizzard@festvox.org. We need to know your team name, the name of the main contact person, your affiliation, and contact details including email address, postal address and phone number. Please specify which task(s) you plan to submit entries for.

Data download

The speech + text data comes from professional audiobooks produced by Usborne Publishing.

  • 2017-EH1
    • About 6.5 hours of British English speech data from a single female talker, which comprises 5 hours of speech already released for the 2016 challenge plus the audio from 6 additional books that were used for test material in 2016.
    • Processed versions, such as alignments, are shared via the Blizzard Challenge 2016-7 Git Repository.
  • 2017-ES1 and 2017-ES2
    • About 4 hours of British English speech data (waveforms) from a single female talker, which is a cleaned-up version of the data used in the 2016 challenge, along with linguistic features and speech features.

Download links (including the online license form) can be found via http://www.cstr.ed.ac.uk/projects/blizzard/2017/usborne_blizzard2017

MD5 checksums:

  • blizzard_release_2017_v2.zip = 21c3f4ddcd724417632b96ef99deec20
  • blizzard_machine_learning_challenge_2017-ES1.zip = d59998653f450d0bd9cd4084334f130e
  • blizzard_machine_learning_challenge_2017-ES2.zip = 1e88ba7edb8af1f88710318ceee69075
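
One way to check a download against the checksums listed above is sketched here using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` are hypothetical, and the sketch assumes the archives keep the file names shown in the list.

```python
import hashlib
import os

# Checksums copied from the list above.
EXPECTED_MD5 = {
    "blizzard_release_2017_v2.zip": "21c3f4ddcd724417632b96ef99deec20",
    "blizzard_machine_learning_challenge_2017-ES1.zip": "d59998653f450d0bd9cd4084334f130e",
    "blizzard_machine_learning_challenge_2017-ES2.zip": "1e88ba7edb8af1f88710318ceee69075",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, filename=None):
    """Return True if the file at `path` matches the published checksum.

    `filename` selects the entry in EXPECTED_MD5; it defaults to the
    base name of `path`.
    """
    name = filename or os.path.basename(path)
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, calling `verify("blizzard_release_2017_v2.zip")` from the download directory returns `True` only if the archive is intact; a `False` result suggests the download should be repeated.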

Development tools

Questionnaire

Mailing list

There is a mailing list for discussion and announcements for the challenge:

 blizzard-discuss@festvox.org

Participants must join the list by sending a message to majordomo@festvox.org with the following line in the body of the message:

 subscribe blizzard-discuss

Once you are a member, you will be able to send messages to blizzard-discuss@festvox.org.

Timeline

The timeline shown on this web page is the official one and supersedes those shown in announcements. It is subject to change, but we will try to follow it as closely as possible. Note that we will not consider any requests from participants to change the synthetic speech submission date or the paper submission date!

  Dec  8    2016  -  2017-EH1 database released
  Jan  ?    2017  -  2017-ES1 and 2017-ES2 database released
  Mar 29    2017  -  test sentences released to participants
  Apr 17    2017  -  participants submit their output, plus questionnaire (by midnight UTC)
  Apr       2017  -  evaluation systems go live
  Jun       2017  -  end of evaluation period
  Jun 15    2017  -  expected release of results for 2017-ES1 and 2017-ES2
  Jun 29    2017  -  deadline to submit ASRU papers for 2017-ES1 and 2017-ES2
  Jun 29    2017  -  expected release of results for 2017-EH1
  Jul  3    2017  -  deadline to submit Blizzard Workshop papers for 2017-EH1
  Jul 31    2017  -  notification of paper acceptance for 2017-EH1
  Aug 20-24 2017  -  Interspeech 2017, Stockholm, Sweden
  Aug 25    2017  -  Blizzard Challenge workshop, Stockholm - for task 2017-EH1
  Aug 28-   2017  -  EUSIPCO 2017, Kos, Greece
  Aug 31    2017  -  notification of paper acceptance for 2017-ES1 and 2017-ES2
  Dec 16-20 2017  -  ASRU 2017
                      will include the workshop for tasks 2017-ES1 and 2017-ES2

Workshop

Information on the two workshops can be found here:

Any questions?

  • Please contact blizzard@festvox.org if you have any questions

Previous challenges


SynSIG is a Special Interest Group of ISCA, the International Speech Communication Association.