SegWin: a Tool for Segmenting, Annotating, and Controlling the Creation of a Database of Spoken Italian Varieties
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)
Abstract
Several initiatives have recently been proposed to fill the gap in the availability of annotated speech corpora of Italian regional varieties. A first step is the national project AVIP (Archivio delle Varietà di Italiano Parlato, Spoken Italian Varieties Archive), whose main challenge is methodological: finding annotation strategies and developing suitable software tools to cope with the inadequacy of linguistic models for Italian accent variation. In essence, these strategies adopt an iterative labelling process in which the description of each variety is reached through successive refinement stages, without losing the information from intermediate stages. To satisfy these requirements, a dedicated software system called SegWin has been developed at Politecnico di Bari. SegWin:
• “guides” the human transcribers through the annotation phases by means of a kind of “scheduled procedure”;
• allows information to be added incrementally at any stage of the database creation;
• monitors and checks the consistency of the database during every stage of its creation.
The system has been used extensively by all the partners of the AVIP project and is continuously updated to meet the project's needs. This paper describes the main characteristics of SegWin in relation to the aspects listed above.
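As an illustration only, the idea of iterative labelling with preserved intermediate stages and a consistency check at every step can be sketched as follows. This is not SegWin's actual implementation: the names `Segment`, `AnnotatedFile`, `add_stage` and `check_stage`, and the specific consistency rules (positive segment length, no overlap, no segment past the end of the recording), are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """A labelled stretch of signal (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    label: str


def check_stage(segments, duration):
    """Return a list of consistency problems (empty if the stage is valid)."""
    problems = []
    previous_end = 0.0
    for i, seg in enumerate(segments):
        if seg.start >= seg.end:
            problems.append(f"segment {i} has non-positive length")
        if seg.start < previous_end:
            problems.append(f"segment {i} overlaps its predecessor")
        if seg.end > duration:
            problems.append(f"segment {i} ends after the recording")
        previous_end = max(previous_end, seg.end)
    return problems


@dataclass
class AnnotatedFile:
    """One recording with its successive labelling stages."""
    duration: float
    # Stages are only ever appended, so earlier ones are never overwritten.
    stages: list = field(default_factory=list)

    def add_stage(self, name, segments):
        """Append a new labelling stage after checking its consistency."""
        problems = check_stage(segments, self.duration)
        if problems:
            raise ValueError("; ".join(problems))
        self.stages.append((name, tuple(segments)))
```

In this sketch a refined labelling is stored alongside the coarser one it refines, so the information from every intermediate stage remains available, and an inconsistent stage is rejected before it enters the database.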