
The Greedy Algorithm and its Application to the Construction of a Continuous Speech Database

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/4ewtb496pq75

Abstract

Databases containing varied linguistic features can be built by condensing large corpora; in this work we need to cover a set of phonetic units with a minimal set of natural phonetic sentences. To this end, we compare three set-covering methods: the greedy method, its inverse, which we call the spitting method, and the pair exchange method. Each method is defined with several criteria guiding the selection of sentences; these criteria relate to the number of units in a sentence, to its length, and to the rarity of its units. A first experiment shows that the pair exchange method does not guarantee a total covering. The greedy and spitting methods perform comparably; nevertheless, the greedy method is slightly better and, above all, less time-consuming. Applying the spitting method to a greedy cover improves performance by removing about 10% redundancy. So does the pair exchange method, but it is more time-consuming. Most of the criteria guiding selection are sensitive to sentence length. The performance of the criteria obtained for a total covering does not necessarily carry over to a partial covering.
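The two-stage idea in the abstract (build a greedy cover, then remove redundant sentences from it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy corpus, and the single selection criterion used here (number of newly covered units, with ties broken by sentence id) are assumptions made for illustration, and the paper's length and rarity criteria are not modelled.

```python
from typing import Dict, List, Set


def greedy_cover(sentences: Dict[str, Set[str]], targets: Set[str]) -> List[str]:
    """Repeatedly pick the sentence covering the most still-uncovered units."""
    uncovered = set(targets)
    chosen: List[str] = []
    while uncovered:
        # Assumed criterion: count of newly covered units.
        # Iterating over sorted ids makes ties go to the smallest id.
        best = max(sorted(sentences), key=lambda s: len(sentences[s] & uncovered))
        gain = sentences[best] & uncovered
        if not gain:
            break  # the remaining units occur in no sentence
        chosen.append(best)
        uncovered -= gain
    return chosen


def spit_redundant(
    chosen: List[str], sentences: Dict[str, Set[str]], targets: Set[str]
) -> List[str]:
    """Drop each sentence whose removal leaves the covered targets unchanged."""
    kept = list(chosen)
    # Units that the initial cover actually reaches.
    needed = targets & set().union(*(sentences[s] for s in kept))
    for s in list(chosen):
        others = [t for t in kept if t != s]
        covered = set().union(*(sentences[t] for t in others))
        if needed <= covered:
            kept = others  # s was redundant
    return kept


# Hypothetical toy corpus: sentence id -> set of phonetic units it contains.
corpus = {
    "s1": {"b", "c", "d", "e"},
    "s2": {"a", "b", "c"},
    "s3": {"d", "e", "f"},
}
targets = {"a", "b", "c", "d", "e", "f"}

cover = greedy_cover(corpus, targets)
reduced = spit_redundant(cover, corpus, targets)
print(cover, reduced)
```

In this toy case the greedy pass selects all three sentences, because s1 covers the most units at the first step, and the removal pass then discards s1, since s2 and s3 together already cover every unit.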

Details

Paper ID
lrec2002-main-265
Pages
N/A
BibKey
francois-boeffard-2002-greedy
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29 May 2002 to 31 May 2002

Authors

  • Hélène François

  • Olivier Boëffard
