LREC-COLING 2024, main conference

An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4z75r8zo8ueo

Abstract

Data availability is crucial for advancing artificial intelligence applications, including voice-based technologies. As demand for content creation grows, particularly in social media, translation and text-to-speech (TTS) technologies have become essential tools. The performance of TTS technologies depends heavily on the quality of the training data, which underlines the mutual dependence of data availability and technological progress. To address this need, this paper introduces an end-to-end tool for generating high-quality datasets for TTS models. The contributions of this work include: the integration of language-specific phoneme distribution into sample selection, automation of the recording process, automated and human-in-the-loop quality assurance of recordings, and processing of recordings to meet specified formats. Through these features, the proposed application aims to streamline the dataset creation process for TTS models, thereby facilitating advancements in voice-based technologies.

Details

Paper ID
lrec2024-main-0093
Pages
pp. 1043-1051
BibKey
gunduz-etal-2024-automated
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 to 25 May 2024

Authors

  • Ahmet Gunduz
  • Kamer Ali Yuksel
  • Kareem Darwish
  • Golara Javadi
  • Fabio Minazzi
  • Nicola Sobieski
  • Sébastien Bratières
