Blizzard Challenge 2020 Rules

Latest revision as of 11:52, 10 February 2020

DATABASE ACCESS

  • After registration and completion of the required licenses, download passwords will be issued, as described on the main Blizzard 2020 page.

REGISTRATION FEE

  • A registration fee of 600 GBP is payable by all participants who wish to submit synthetic speech for evaluation, to offset the costs of running the challenge, including paying local assistants and listeners. The fee must be paid by Monday 4th May 2020. Each fee covers the entry from one participating team; that entry can comprise either or both of the tasks below.
  • Do not, under any circumstances, pay the fee until the organisers have accepted your request to participate. Even then, we strongly recommend paying only after you have decided to submit synthetic speech (e.g., after completing your own internal evaluation prior to submission), because we cannot issue refunds.
  • Payment of the fee does not guarantee that your system will be included in the evaluation. We may exclude very low quality entries, at our discretion, to prevent them skewing the listening test (and wasting listener effort). Dealing with such entries still consumes resources, and therefore the entry fee will not be refunded.
  • You should pay this fee using Edinburgh University's online payments system at https://www.epay.ed.ac.uk/conferences-and-events/college-of-science-and-engineering/school-of-informatics/informatics-events where you should register for the event called 'Blizzard Challenge 2020' (not yet available). After doing this, you will receive a confirmation email from the epay system. Please forward this email to blizzard@festvox.org to notify us that you have paid. If you are absolutely unable to use the online payments system, please contact blizzard@festvox.org for assistance with a bank transfer. However, we strongly prefer the epay system because it reduces the costs and admin work for us. If you must pay by bank transfer, please contact us in plenty of time (at least 4 weeks before the payment deadline); an additional administration fee of 150 GBP will be added for any payments not made using the epay system.

LISTENERS

  • Each participant must try to recruit at least ten volunteer listeners. If possible, these should be people who have some professional knowledge of synthetic speech.

NAIVE LISTENERS

  • Each participant should try to recruit as many naive listeners (with no professional knowledge of synthetic speech) as possible. They do not have to be native speakers.
  • The organisers would also appreciate assistance in advertising both the Challenge and the listening test as widely as possible (e.g., to your students or colleagues).

MATERIALS PROVIDED

All participants will have access to the following material after signing the license:

  • An estimated 9.5 hours of speech from one native Mandarin speaker, plus 3 hours of speech from one native Shanghainese speaker. All material is segmented into individual sentence-level files.
  • Text transcriptions for all material, plus unaligned phonetic transcriptions for the Shanghainese material.

THE CHALLENGE

Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an individual entry and a joint system) should contact the organisers in advance to agree on this. We will try to accommodate all reasonable requests, provided the listening test remains manageable.

Tasks

Participants may submit an entry for either or both tasks.

  • 2020-MH1 -- Hub task: build a voice from the provided Mandarin data
  • 2020-SS1 -- Spoke task: build a voice from the provided Shanghainese data

USE OF EXTERNAL DATA

  • "External data" is defined as data, of any type, that is not part of the provided database.
  • You are allowed to use external data in any way you wish, subject to any exclusions or limitations given in these rules
  • Use of external data is entirely optional
  • You must use the provided audio files
  • You must use no more than 100 hours of audio for each task, including the provided data
    • e.g., if you start from a pre-trained model for task 2020-MH1 and use all 9.5 hours of the provided Mandarin data, that model must have been trained on no more than 90.5 hours of audio
    • You must not use any additional speech data from the same speakers
  • You may exclude any parts of the provided databases if you wish.
  • There is no limitation on the amount of external non-audio data you may use (e.g., text, dictionaries)
  • Use of any provided transcriptions is optional.
  • If you are in any doubt about how to apply these rules, please contact the organisers for clarification
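
The 100-hour limit can be checked with simple arithmetic. The following is a minimal sketch, not an official tool: the function names are hypothetical, the provided-data durations are the estimates given under MATERIALS PROVIDED, and it assumes the limit is inclusive ("no more than 100 hours").

```python
# Estimated hours of provided audio per task (from MATERIALS PROVIDED).
PROVIDED_HOURS = {"2020-MH1": 9.5, "2020-SS1": 3.0}

# Per-task limit on total audio, including the provided data.
MAX_TOTAL_HOURS = 100.0


def external_audio_budget(task, provided_hours_used=None):
    """Return the hours of external audio still allowed for a task.

    provided_hours_used defaults to the full provided database; pass a
    smaller value if you exclude part of it (the rules permit this).
    """
    provided = PROVIDED_HOURS[task]
    used = provided if provided_hours_used is None else provided_hours_used
    if not 0 < used <= provided:
        raise ValueError("provided_hours_used must be in (0, provided hours]")
    return MAX_TOTAL_HOURS - used


def within_limit(task, external_hours, provided_hours_used=None):
    """Return True if the external audio fits within the remaining budget."""
    budget = external_audio_budget(task, provided_hours_used)
    return 0 <= external_hours <= budget


if __name__ == "__main__":
    for task in PROVIDED_HOURS:
        print(task, external_audio_budget(task))
```

Remember that audio used to train any pre-trained model you start from counts towards `external_hours`, and that additional speech from the same speakers is not allowed regardless of the budget.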

SYNTHESISING THE TEST EXAMPLES

  • The exact nature of the test set will not be revealed in advance but is likely to include both sentences and paragraphs.
  • Synthetic speech may be submitted at any standard sampling rate (but always at 16 bits per sample). Waveforms will not be downsampled for the listening test.
  • Registered participants can download the test set from: COMING LATER
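
Before submitting, it may help to confirm that each waveform is stored at 16 bits per sample. The sketch below uses Python's standard `wave` module and assumes submissions are uncompressed PCM WAV files, which these rules do not state; the function name `check_submission_wav` is hypothetical. It only reports the sampling rate, because the rules do not list which rates count as standard.

```python
import wave


def check_submission_wav(path):
    """Report the sampling rate and bit depth of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sample_width_bytes = wav.getsampwidth()  # bytes per sample
        return {
            "sample_rate_hz": wav.getframerate(),
            "bits_per_sample": 8 * sample_width_bytes,
            "is_16_bit": sample_width_bytes == 2,
        }
```

For example, `check_submission_wav("utt_0001.wav")["is_16_bit"]` should be `True` for a compliant file (the file name here is illustrative only).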

RETENTION OF SUBMITTED SYNTHETIC SPEECH SAMPLES

  • Any examples that you submit for evaluation will be retained by the Blizzard organisers for future use.
  • When you submit the test sentences, you must include a statement of whether you give the organisers permission to publicly distribute your waveforms and the corresponding listening test results in anonymised form. In the past, all participants have agreed to this, and we strongly encourage you to give this consent.

LISTENING TEST

Formal listening tests will be conducted to evaluate the synthetic speech submitted. The listening test will likely evaluate the performance of the voice in terms of naturalness and intelligibility on various types of material (i.e., as in most previous Blizzard Challenges).

USE OF RESULTS

The Blizzard Challenge is a scientific exercise. You may use the results only for scientific research purposes. Specifically, you may NOT use the results (e.g., your team's ranking) for any commercial purposes, including but not limited to advertising products or services.

PAPER

  • Each participant MUST submit a six-page paper (using the Interspeech 2020 template) describing their entry for review. Please email your paper to blizzard@festvox.org .
    • If you are unable to comply with this requirement, do not enter the challenge.
  • Papers should describe the system, as well as the use of:
    • external data, if any (e.g., other speech or text corpora)
    • existing tools, software and models (e.g., text analysers, Festival, HTS, Merlin, WaveNet, Tacotron, word2vec, ...)
  • One of the AUTHORS of each accepted paper MUST present it at the Blizzard 2020 Workshop
    • If you are unable to comply with this requirement, do not enter the challenge.
  • In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate cross-system comparisons (e.g., is it unit selection? does it predict prosody?)

HOW ARE THESE RULES ENFORCED?

  • This is a challenge, which is designed to answer scientific questions, and not a competition. Therefore, we rely on your honesty in preparing your entry.

SynSIG is a Special Interest Group of ISCA, the International Speech Communication Association.