Blizzard Challenge 2010 Rules

From SynSIG

DATABASE ACCESS

REGISTRATION FEE

  • A registration fee of 500 GBP (approx. 750 USD) is due, to offset the costs of running the challenge, including paying local assistants and undergraduate listeners. The fee is fixed, regardless of how many hub or spoke tasks you participate in.
  • The fee must be paid by Friday 26th March 2010.
  • You can pay this fee using Edinburgh University's online payments system: go to https://www.epay.ed.ac.uk/browse/extra_info.asp?compid=1&modid=2&prodid=246&deptid=24 and register for the event called 'Blizzard Challenge 2010'. After doing this, please also email blizzard@festvox.org to notify us that you have paid.
  • If you are unable to use the online payments system, please contact blizzard@festvox.org for assistance with other methods of payment. We strongly prefer the epay system because it reduces our costs and administrative work. If you must pay by bank transfer, please contact us in plenty of time (several weeks before the payment deadline).

EXPERT LISTENERS

  • Each participant is expected to provide at least ten speech experts as listeners for the evaluation tests. Native speakers of English and/or Mandarin (as appropriate) are preferred where possible. The organisers would also appreciate assistance in advertising the Challenge as widely as possible (e.g., to your students or colleagues).

BUILDING VOICES

  • Participants may submit entries for any combination of tasks, subject to the following restrictions:
  • For each language in which you are participating, you must complete at least one of the hub tasks.
  • If you complete a hub task for a language, you may then submit entries for any number of the spoke tasks for that language.
  • You are encouraged to attempt both languages (remembering that, in past Challenges, the best-performing systems were not generally from native-speaker teams!).
  • A single participant may not submit multiple entries for any task, because the listening test would become unmanageable.
  • Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an individual entry and a joint system) should contact the organisers in advance to agree this. We will try to accommodate all reasonable requests, provided the listening test remains manageable.
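
The participation restrictions above can be checked mechanically. The following is a minimal sketch, not an official tool: the task IDs are those listed in these rules, while the function name `check_entry` and the wording of the messages are illustrative assumptions. It does not model the exception for joint projects or consortia, which is agreed with the organisers case by case.

```python
from collections import Counter

# Task IDs as listed in these rules, grouped by language.
HUB_TASKS = {"English": {"EH1", "EH2"}, "Mandarin": {"MH1", "MH2"}}
SPOKE_TASKS = {"English": {"ES1", "ES2", "ES3"}, "Mandarin": {"MS1", "MS2"}}


def check_entry(tasks):
    """Return a list of rule violations for one participant's task IDs.

    `check_entry` is a hypothetical helper, not part of the Challenge.
    """
    problems = []
    counts = Counter(tasks)
    known = set().union(*HUB_TASKS.values(), *SPOKE_TASKS.values())

    for task in sorted(counts):
        if task not in known:
            problems.append(f"unknown task: {task}")
        elif counts[task] > 1:
            # One participant may not submit multiple entries for a task.
            problems.append(f"{task} entered more than once")

    for language, hubs in HUB_TASKS.items():
        spokes_entered = sorted(set(counts) & SPOKE_TASKS[language])
        # Spoke tasks for a language require at least one hub task for it.
        if spokes_entered and not set(counts) & hubs:
            problems.append(
                f"{language}: spoke task(s) {', '.join(spokes_entered)} "
                "entered without a hub task"
            )
    return problems


print(check_entry(["EH1", "ES1", "ES3"]))  # hub plus spokes for English
print(check_entry(["MS1"]))                # Mandarin spoke without a hub
```

An empty list means the proposed combination satisfies the restrictions modelled here.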

Hub tasks for English

  • Task EH1: build a voice from the UK English 'rjs' database (4014 utterances). You may use either the 16kHz or 48kHz versions, but the submitted wav files must be at 16kHz sampling rate.
  • Task EH2: build a voice from the ARCTIC portion of the UK English 'roger' database (1132 utterances), optionally using the provided hand-corrected labels. You may use either the 16kHz or 48kHz versions, but the submitted wav files must be at 16kHz sampling rate.

Spoke tasks for English

  • Task ES1: build voices from the first 100 utterances of the 'roger' database. You may use voice conversion, speaker adaptation techniques or any other technique you like. You may use either the 16kHz or 48kHz versions, but the submitted wav files must be at 16kHz sampling rate.
  • Task ES2: build a voice from the 'rjs' database suitable for synthesising speech to be heard in the presence of additive noise. The evaluation of this task will focus on intelligibility only. We will not consider naturalness or speaker similarity. You may enter the same voice as task EH1 if you wish, although specially-designed voices are strongly encouraged. You may use either the 16kHz or 48kHz versions, but the submitted wav files must be at 16kHz sampling rate.
  • Task ES3: the same as EH1, but you must submit 48kHz sampling rate wav files.
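
Before submitting, it may help to confirm that each wav file has the sampling rate its task requires. The sketch below uses Python's standard `wave` module; the required rates are taken from the task descriptions above, while the function name `wav_rate_matches` is an illustrative assumption. It only reads the rate recorded in the file header and does not check any other property of the file.

```python
import wave

# Required sampling rate of submitted wav files, per English task,
# as stated in the task descriptions above.
REQUIRED_RATE_HZ = {
    "EH1": 16000,
    "EH2": 16000,
    "ES1": 16000,
    "ES2": 16000,
    "ES3": 48000,
}


def wav_rate_matches(path, task):
    """Return True if the wav file at `path` has the rate required for `task`.

    `wav_rate_matches` is a hypothetical helper, not part of the Challenge.
    """
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == REQUIRED_RATE_HZ[task]
```

For example, `wav_rate_matches("example.wav", "ES3")` would return `True` only for a 48kHz file (the file name here is made up).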

Hub tasks for Mandarin

  • Task MH1: build a voice from the full Mandarin database (5884 utterances).
  • Task MH2: build a voice from utterances 5085 to 5884 of the full Mandarin database (800 utterances).

Spoke tasks for Mandarin

  • Task MS1: build a voice from the first 100 of the utterances used in MH2, i.e. utterances 5085 to 5184.
  • Task MS2: build a voice from the full Mandarin database suitable for synthesising speech to be heard in the presence of additive noise. The evaluation of this task will focus on intelligibility only. We will not consider naturalness or speaker similarity. You may enter the same voice as task MH1 or MH2 if you wish, although specially-designed voices are strongly encouraged.
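
The utterance ranges for the Mandarin tasks can be written down explicitly, which also confirms the counts quoted above (5884 - 5085 + 1 = 800, and 5184 - 5085 + 1 = 100). This is a sketch under one assumption that the rules do not state: that utterances are numbered consecutively from 1 to 5884. The names `SUBSETS` and `may_use` are illustrative.

```python
# Assumption: utterances in the full database are numbered 1..5884.
# Python ranges exclude the end value, hence the +1 on each upper bound.
SUBSETS = {
    "MH1": range(1, 5885),     # full database
    "MH2": range(5085, 5885),  # utterances 5085 to 5884
    "MS1": range(5085, 5185),  # first 100 of the MH2 utterances
    "MS2": range(1, 5885),     # full database
}


def may_use(task, utterance_number):
    """Return True if the utterance belongs to the data provided for `task`."""
    return utterance_number in SUBSETS[task]


for task, numbers in SUBSETS.items():
    # Print first utterance, last utterance and size of each subset.
    print(task, numbers[0], numbers[-1], len(numbers))
```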

USE OF EXTERNAL DATA

  • "External data" is defined as data, of any type, that is not part of the provided database.
  • You are allowed to use external data in any way you wish.
  • Use of external data is entirely optional.
  • For the UK English tasks, you should consider the Mandarin database to be external data.
  • For the Mandarin tasks, you should consider the UK English database to be external data.
  • For tasks EH2 and ES1, you should consider the 'rjs' data from task EH1 to be external data.
  • For tasks EH1, ES2 and ES3, you should consider the 'roger' data from task EH2 to be external data.
  • For tasks ES1, MH2 and MS1, in which you build a voice from a subset of a database, you must not use the remainder of that database at all, for any purpose; you must pretend it does not exist. For task EH2, you must not use the remainder of the larger 'roger' database distributed in previous Blizzard Challenges.
  • You may exclude any parts of the provided databases if you wish.
  • Use of the provided labels is optional.
  • If you are in any doubt about how to apply these rules, please contact the organisers immediately.
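
One way to read the rules above is as a lookup from task and data source to a status. The sketch below is an interpretation, not an official statement of the rules: the labels "provided", "external" and "forbidden", the corpus keys, and the function name `classify` are all assumptions made for illustration. In case of doubt, the text of the rules and the organisers take precedence.

```python
# Database from which each task's voice is built, per the task descriptions.
TASK_CORPUS = {
    "EH1": "rjs", "ES2": "rjs", "ES3": "rjs",
    "EH2": "roger", "ES1": "roger",
    "MH1": "mandarin", "MS2": "mandarin",
    "MH2": "mandarin", "MS1": "mandarin",
}

# Tasks that use only part of a database; the remainder must not be used.
SUBSET_TASKS = {"EH2", "ES1", "MH2", "MS1"}


def classify(task, corpus, in_task_subset=True):
    """Classify a data source for a task (hypothetical helper).

    `corpus` is "rjs", "roger", "mandarin", or any other label for data
    from outside the Challenge. `in_task_subset` says whether the data
    lies inside the portion of the database that the task is built from.
    """
    if corpus != TASK_CORPUS[task]:
        return "external"   # allowed, optional
    if task in SUBSET_TASKS and not in_task_subset:
        return "forbidden"  # remainder of the task's own database
    return "provided"


print(classify("EH1", "roger"))                        # external
print(classify("ES1", "roger", in_task_subset=False))  # forbidden
print(classify("MH1", "mandarin"))                     # provided
```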

SYNTHESISING THE TEST EXAMPLES

  • No manual intervention is allowed during synthesis. This includes, but is not limited to:
    • "Prompt sculpting"
    • Altering existing entries in your lexicon (however, you are allowed to add new words)
    • Using different subsets of the database to generate different test sentences or sentence types within a single task, unless this is a fully automatic part of your system.

RETENTION OF SUBMITTED SYNTHETIC SPEECH SAMPLES

  • Any examples that you submit for evaluation will be retained by the Blizzard organisers for future use.
  • You must include in your submission of the test sentences a statement of whether you give the organisers permission to publicly distribute your waveforms and the corresponding listening test results in anonymised form. In the past, all participants have agreed to this and we strongly encourage you to give this consent.

LISTENING TEST

  • The listening test design is likely to be similar to that used in the 2009 Challenge. Depending on the number of entries for each task, the organisers may only be able to evaluate certain subsets of the synthesised sentences or certain system configurations.

PAPER

  • Each participant will be expected to submit a six-page paper describing their entry for review.
  • One of the authors of each accepted paper should present it at the Blizzard 2010 Workshop, which will be a satellite of SSW7 and Interspeech 2010 in Japan. The workshop will be in the Kyoto area.
  • In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate easy cross-system comparisons (e.g. is it unit selection? does it predict prosody?).

HOW ARE THESE RULES ENFORCED?

  • This is a challenge designed to answer scientific questions, not a competition. Therefore, we rely on your honesty in preparing your entry.

SynSIG is a Special Interest Group of ISCA, the International Speech Communication Association.
