Volume: I, II, III, IV, V, VI





Introductory Messages


Antonio Zampolli, Introduction of the conference Chairman

Khalid Choukri, Message from ELRA’s CEO


Panel Summaries


Steven Bird, Hans Uszkoreit, Gary Simons, The Open Language Archives Community


Session SO1: Large Projects-Initiatives For Speech Corpora


Emanuela Cresti, Massimo Moneglia, Fernanda Bacelar do Nascimento, Antonio Moreno Sandoval, Jean Veronis, Philippe Martin, Khalid Choukri, Valerie Mapelli, Daniele Falavigna, Antonio Cid, Claude Blum, The C-ORAL-ROM Project. New methods for spoken language archives in a multilingual romance corpus

Hisao Kuwabara, Shuichi Itahashi, Mikio Yamamoto, Toshiyuki Takezawa, Satoshi Nakamura, Kazuya Takeda, The Present Status of Speech Database in Japan: Development, Management, and Application to Speech Research.

Asunción Moreno, Oren Gedge, Henk van den Heuvel, Harald Höge, Sabine Horbach, Patricia Martin, Elisabeth Pinto, Antonio Rincón, Franco Senia, Rafid Sukkar, SpeechDat across all America: SALA II

Christoph Draxler, Florian Schiel, Three New Corpora at the Bavarian Archive for Speech Signals – and a First Step Towards Distributed Web-Based Recording.


Session MMO1: Tools & Annotations



Christoph Müller, Michael Strube, An API for Discourse-level Access to XML-encoded Corpora.

Jean-Claude Martin, Michael Kipp, Annotating and Measuring Multimodal Behaviour – Tycoon Metrics in the Anvil Tool.

P. Wittenburg, U. Mosel, A. Dwyer, Methods of Language Documentation in the DOBES project.

Niels Ole Bernsen, Laila Dybkjær, Mykola Kolodnytsky, The NITE Workbench. A Tool for Annotation of Natural Interactivity and Multimodal Data


Session WO1: LRs Platforms & Standards



Shoichiro Hara, Hisashi Yasunaga, Resource Sharing System for Humanity Researches

Nancy Ide, Laurent Romary, Standards for Language Resources.

Valentin Tablan, Cristian Ursu, Kalina Bontcheva, Hamish Cunningham, Diana Maynard, Oana Hamza, Tony McEnery, Paul Baker, Mark Leisher, A Unicode-based Environment for Creation and Use of Language Resources.

Georgios Petasis, Vangelis Karkaletsis, Georgios Paliouras, Ion Androutsopoulos, Constantine D. Spyropoulos, Ellogon: A New Text Engineering Platform.


Session WO2: Acquisition Of Lexical Information



Gregory Grefenstette, Yan Qu, David A. Evans, Expanding lexicons by inducing paradigms and validating attested forms.

Ulrich Heid, Bettina Säuberlich, Arne Fitschen, Using Descriptive Generalisations in the Acquisition of Lexical Data for Word Formation

Manolis Maragoudakis, Katia Kermanidis, Nikos Fakotakis, George Kokkinakis, Combining Bayesian and Support Vector Machines Learning to automatically complete Syntactical Information for HPSG-like Formalisms.

Heiki-Jaan Kaalep, Kadri Muischnek, Using the Text Corpus to Create a Comprehensive List of Phrasal Verbs


Session EO1: Information Retrieval & Information Extraction Evaluation



Thierry Poibeau, Dominique Dutoit, Sophie Bizouard, Evaluating resource acquisition tools for Information Extraction.

Natalia V. Loukachevitch, Boris V. Dobrov, Evaluation of Thesaurus on Sociopolitical Life as Information-Retrieval Tool.

Carol Peters, Martin Braschler, The Importance of Evaluation for Cross-Language System Development: the CLEF Experience.

Bernardo Magnini, Matteo Negri, Roberto Prevete, Hristo Tanev, Towards Automatic Evaluation of Question/Answering Systems


Session SO2: Speech To Speech Translation



Harald Höge, Project Proposal TC-STAR - Make Speech to Speech Translation Real

Hideki Kashioka, Translation Unit Concerning Timing of Simultaneous Translation.

Toshiyuki Takezawa, Eiichiro Sumita, Fumiaki Sugaya, Hirofumi Yamamoto, Seiichi Yamamoto, Toward a Broad-coverage Bilingual Corpus for Speech Translation of Travel Conversations in the Real World.

Shigeki Matsubara, Akira Takagi, Nobuo Kawaguchi, Yasuyoshi Inagaki, Bilingual Spoken Monologue Corpus for Simultaneous Machine Interpretation Research.

Robert E. Frederking, Alan W Black, Ralf D. Brown, John Moody, Eric Steinbrecher, Field Testing the Tongues Speech-to-Speech Machine Translation System.

Erica Costantini, Susanne Burger, Fabio Pianesi, NESPOLE!’s Multilingual and Multimodal Corpus.


Session MMO2: Multimodal Lexicons & Corpora



Tokunaga Takenobu, Okumura Manabu, Saitô Suguru, Tanaka Hozumi, Constructing a lexicon of action.

P. Wittenburg, St. Levinson, S. Kita, H. Brugman, Multimodal Annotations in Gesture and Sign Language Studies.

Craig Martell, FORM: An Extensible, Kinematically-based Gesture Annotation Scheme.

Oliver Lemon, Alexander Gruenstein, Language Resources for Multi-Modal Dialogue Systems.

Angelika Salmen, Multi-Modal Menus And Traffic Interaction. Timing As A Crucial Factor For User Driven Mode Decisions.

Florian Schiel, Silke Steininger, Ulrich Türk, The SmartKom Multimodal Corpus at BAS.


Session WO3: Acquisition Of Lexical Information



Pierrette Bouillon, Vincent Claveau, Cécile Fabre, Pascale Sébillot, Acquisition of Qualia Elements from Corpora - Evaluation of a Symbolic Learning Method

Birte Lönneker, Building Concept Frames based on Text Corpora

Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier, A Domain Adaptive Approach to Automatic Acquisition of Domain Relevant Terms and their Relations with Bootstrapping

Smaranda Muresan, Judith Klavans, A Method for Automatically Building and Evaluating Dictionary Resources.

Enrique Alfonseca, Suresh Manandhar, Improving an Ontology Refinement Method with Hyponymy Patterns.

Dominic Widdows, Beate Dorow, Chiu-Ki Chan, Using Parallel Corpora to enrich Multilingual Lexical Resources.


Session EO2: Evaluation Methodologies



Ariadna Font Llitjós, Alan W Black, Evaluation and collection of proper name pronunciations online.

Janienke Sturm, Ilse Bakx, Bert Cranen, Jacques Terken, Fusi Wang, Usability Evaluation of a Dutch Multimodal System for Train Timetable Information.

Rickard Domeij, Ola Knutsson, Kerstin Severinson Eklundh, Different Ways of Evaluating a Swedish Grammar Checker.

Marianne Starlander, Andrei Popescu-Belis, Corpus-based Evaluation of a French Spelling and Grammar Checker.

Cătălina Barbu, Error analysis in anaphora resolution.

Barbara Di Eugenio, Michael Glass, Michael J. Scott, The binomial cumulative distribution function, or, is my system better than yours?


Session SP1: Speech Resources



Achim F. Müller, Janez Stergar, Bogomir Horvat, Designing Prosodic Databases for Automatic Modeling of Slovenian Language in a Multilingual TTS System.

Robert Modic, Bojan Petek, A Contrastive Acoustic-Phonetic Analysis of Slovenian and English Diphthongs.

Ivan Kopeček, Karel Pala, Databases of Heterogeneous Segments for Concatenative Speech Synthesis.

Fabio Tamburini, Automatic detection of prosodic prominence in continuous speech.

Michelina Savino, Mario Refice, Domenico Daleno, Methods and Tools for Prosodic Analysis of a Spoken Italian Corpus.

Hartmut R. Pfitzinger, Reducing Segmental Duration Variation by Local Speech Rate Normalization of Large Spoken Language Resources.

A. Benabbou, N. Chenfour, A. Mouradi, Study and quantification of the declination for the Arabic speech synthesis system PARADIS.

Fumiaki Sugaya, Toshiyuki Takezawa, Genichiro Kikui, Seiichi Yamamoto, Proposal of a very-large-corpus acquisition method by cell-formed registration.

Dorota Iskra, Beate Grosskopf, Krzysztof Marasek, Henk van den Heuvel, Frank Diehl, Andreas Kiessling, SPEECON – Speech Databases for Consumer Devices: Database Specification and Validation.

Stefan Eickeler, Martha Larson, Wolff Rüter, Joachim Köhler, Creation of an Annotated German Broadcast Speech Database for Spoken Document Retrieval.

Nelleke Oostdijk, Wim Goedertier, Frank van Eynde, Louis Boves, Jean-Pierre Martens, Michael Moortgat, Harald Baayen, Experiences from the Spoken Dutch Corpus Project.








Session MMP1: Multimodal Resources And Tools



Laila Dybkjær, Niels Ole Bernsen, Natural Interactivity Resources – Data, Annotation Schemes and Tools.

Claudia Soria, Niels Ole Bernsen, Niels Cadée, Jean Carletta, Laila Dybkjær, Stefan Evert, Ulrich Heid, Amy Isard, Mykola Kolodnytsky, Christoph Lauer, Wolfgang Lezius, Lucas P.J.J. Noldus, Vito Pirrelli, Norbert Reithinger, Andreas Vögele, Advanced Tools for the Study of Natural Interactivity.

Steven Bird, Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee, Beth Randall, Salim Zayat, TableTrans, MultiTrans, InterTrans and TreeTrans: Diverse Tools Built on the Annotation Graph Toolkit.

Silke Steininger, Florian Schiel, Angelika Glesner, User-State Labeling Procedures For The Multimodal Data Collection Of SmartKom.

Hennie Brugman, Harriet Spenke, Markus Kramer, Alexander Klassmann, Multimedia Annotation with Multilingual Input Methods and Search Support.


Session WP1: Corpora & Corpus Tools



Tamás Váradi, The Hungarian National Corpus.

Florence Duclaye, François Yvon, Olivier Collin, Using the Web as a Linguistic Resource for Learning Reformulations Automatically.

António Branco, José Leitão, João Silva, Luís Gomes, Nexing Corpus: a corpus of verbal protocols on syllogistic reasoning.

Hatem Ghorbel, Giovanni Coray, André Linden, SAM: System for Multi-criteria Text Alignment.

Kristina Nilsson, Lars Borin, Living off the land: The Web as a source of practice texts for learners of less prevalent languages.

Michael Moortgat, Richard Moot, Using the Spoken Dutch Corpus for type-logical grammar induction.

Gabriela Cavaglià, Measuring corpus homogeneity using a range of measures for inter-document distance.

Eva Anna Lenz, Angelika Storrer, Converting a Corpus into a Hypertext: An Approach Using XML Topic Maps and XSLT.

Primož Jakopin, The feasibility of a complete text corpus.

Marko Tadić, Building the Croatian National Corpus.

Serge Sharoff, Meaning as use: exploitation of aligned corpora for the contrastive study of lexical semantics.

Mitsuo Shimohata, Eiichiro Sumita, Automatic paraphrasing based on parallel corpus for normalization.

Dan Tufiş, Ana-Maria Barbu, Lexical token alignment: experiments, results and applications.

Reinhard Rapp, A Part-of-Speech-Based Search Algorithm for Translation Memories.

José Miguel Aguilar Río, Compiling an Interactive Literary Translation Web Site for Education Purposes.

Kenji Matsumoto, Hideki Tanaka, Automatic Alignment of Japanese and English Newspaper Articles using an MT System and a Bilingual Company Name Dictionary.

Lars Ahrenberg, Mikael Andersson, Magnus Merkel, A System for Incremental and Interactive Word Linking.

Eugenio Picchi, Eva Sassolini, Ouafae Nahli, Sebastiana Cucurullo, M. Isabel Vargas, Italian Arabic linguistic tools.

Keita Tsuji, Beatrice Daille, Kyo Kageura, Extracting French-Japanese Word Pairs from Bilingual Corpora based on Transliteration Rules.

Lynne Bowker, Peter Bennison, Translation Tracking System: A tool for managing translation archives.

Andrew Finch, Ezra Black, Ringo Wathelet, Beyond Tag Trigrams: New Local Features for Tagging.

Leonardo Lesmo, Vincenzo Lombardo, Transformed Subcategorization Frames in Chunk Parsing.

Caroline Hagège, Claude Roux, A Robust and Flexible Platform for Dependency Extraction.


Session EP1: Evaluation



Keiji Yasuda, Fumiaki Sugaya, Toshiyuki Takezawa, Seiichi Yamamoto, Masuzo Yanagida, Automatic machine translation selection scheme to output the best result.

Jean-Yves Antoine, Caroline Bousquet-Vernhettes, Jérôme Goulian, Mohamed Zakaria Kurdi, Sophie Rosset, Nadine Vigouroux, Jeanne Villaneau, Predictive and objective evaluation of speech understanding: the “challenge” evaluation campaign of the I3 speech workgroup of the French CNRS.

Nuria Bel, Javier Caminero, Luis Hernández, Montserrat Marimón, José F. Morlesín, Josep M. Otero, José Relaño, M. Carmen Rodríguez, Pedro M. Ruz, Daniel Tapias, Design and Evaluation of a SLDS for E-Mail Access through the Telephone.

Mohamed-Zakaria Kurdi, Mohamed Ahafhaf, Toward an objective and generic Method for Spoken Language Understanding Systems Evaluation: an extension of the DCR method.

Marcela Charfuelán, Luis Hernández Gómez, Cristina Esteban López, Holmer Hemsen, An XML-based tool for evaluation of SLDS.

Nicole Beringer, Katerina Louka, Victoria Penide-Lopez, Uli Türk, End-to-End Evaluation of Multimodal Dialogue Systems – can we Transfer Established Methods?

Andrej Žgank, Zdravko Kačič, Bogomir Horvat, Preliminary Evaluation of Slovenian Mobile Database PoliDat.

Henk van den Heuvel, Khalid Choukri, Harald Höge, Give me a bug: a framework for a bug report service.

Michael Kluck, Christa Womser-Hacker, Inside the Evaluation Process of the Cross-Language Evaluation Forum (CLEF): Issues of Multilingual Topic Creation and Multilingual Relevance Assessment.

Martine Hurault-Plantet, Laura Monceaux, Cooperation between black box and glass box approaches for the evaluation of a question answering system.

Koji Eguchi, Kazuko Kuriyama, Noriko Kando, Sensitivity of IR systems Evaluation to Topic Difficulty.

Véronique Gendner, Gabriel Illouz, Michèle Jardino, Laura Monceaux, Patrick Paroubek, Isabelle Robba, Anne Vilnat, A Protocol for Evaluating Analyzers of Syntax (PEAS).

Diana Santos, Caroline Gasperin, Evaluation of parsed corpora: Experiments in user-transparent and user-visible evaluation.

Kiyoaki Shirai, Construction of a Word Sense Tagged Corpus for SENSEVAL-2 Japanese Dictionary Task.

Diana Raileanu, Paul Buitelaar, Spela Vintar, Jörg Bay, Evaluation Corpora for Sense Disambiguation in the Medical Domain.

Enrique Alfonseca, Suresh Manandhar, Proposal for Evaluating Ontology Refinement Methods.

Aristomenis Thanopoulos, Nikos Fakotakis, George Kokkinakis, Comparative Evaluation of Collocation Extraction Metrics.

Javier Caminero, Joaquín González-Rodríguez, Javier Ortega-García, Daniel Tapias, Pedro M. Ruz, Mercedes Solá, A Multilingual Speaker Verification System: Architecture and Performance Evaluation.


Session SO3: Dialogue-Conversation Evaluation



Roldano Cattoni, Morena Danieli, Vanessa Sandrini, Claudia Soria, ADAM: The SI-TAL Corpus of Annotated Dialogues.

Helen Wright Hastie, Rashmi Prasad, Marilyn Walker, Automatic Evaluation: Using a DATE Dialogue Act Tagger for User Satisfaction and Task Completion Prediction.

Pascale Nicolas, Sabine Letellier-Zarshenas, Igor Schadle, Jean-Yves Antoine, Jean Caelen, Towards a large corpus of spoken dialogue in French that will be freely available: the "Parole Publique" project and its first realisations.

John Garofolo, Jonathan G. Fiscus, Alvin Martin, David Pallett, Mark Przybocki, NIST Rich Transcription 2002 Evaluation: A Preview.


Session MMO3: Collection & Indexing Of Multimodal LR



Stefan Rapp, Michael Strube, An Iterative Data Collection Approach for Multimodal Dialogue Systems.

Jean-Claude Martin, Jean-Hugues Réty, Nelly Bensimon, Multimodal and Adaptative Pedagogical Resources.

Horacio Saggion, Hamish Cunningham, Diana Maynard, Kalina Bontcheva, Oana Hamza, Cristian Ursu, Yorick Wilks, Extracting Information for Automatic Indexing of Multimedia Material.

Emiko Suzuki, Kyoko Kakihana, Japanese and American Sign Language Dictionary System for Japanese and English Users.


Session WO4: General Issues On Lexicons



P. Wittenburg, W. Peters, S. Drude, Analysis of Lexical Structures from Field Linguistics and Language Engineering.

Sue Atkins, Nuria Bel, Francesca Bertagna, Pierrette Bouillon, Nicoletta Calzolari, Christiane Fellbaum, Ralph Grishman, Alessandro Lenci, Catherine MacLeod, Martha Palmer, Gregor Thurmair, Marta Villegas, Antonio Zampolli, From Resources to Applications. Designing the Multilingual ISLE Lexical Entry.

Marta Villegas, Nuria Bel, From DTD to relational dB. An automatic generation of a lexicographical station out of ISLE guidelines.

Carole Tiberius, How to build a multilingual inheritance-based lexicon.


Session WO5: Syntactic Annotation



Csaba Oravecz, Péter Dienes, Efficient Stochastic Part-of-Speech Tagging for Hungarian.

Katia Lida Kermanidis, Nikos Fakotakis, George Kokkinakis, DELOS: An Automatically Tagged Economic Corpus for Modern Greek.

Adam Meyers, Ralph Grishman, Michiko Kosaka, Formal Mechanisms for Capturing Regularizations.

Qiang Zhou, Elliott Franco Drabek, Fuji Ren, Annotating the functional chunks in Chinese sentences.


Session EO3: Written Systems Evaluation



Hidetsugu Nanba, Manabu Okumura, Some Examinations of Intrinsic Methods for Summary Evaluation Based on the Text Summarization Challenge (TSC).

Horacio Saggion, Dragomir Radev, Simone Teufel, Wai Lam, Stephanie M. Strassel, Developing Infrastructure for the Evaluation of Single and Multi-document Summarization Systems in a Cross-lingual Environment.

Walter Daelemans, Véronique Hoste, Evaluation of Machine Learning Methods for Natural Language Processing Tasks.

Tristan Van Rullen, Philippe Blache, An evaluation of different symbolic shallow parsing techniques.







Panel Summaries



Tony McEnery, Ethical and legal issues in corpus construction.

Joseph Mariani, Mark T. Maybury, Fabio Pianesi, John Prange, Bernd Reuse, Phil Rubin, Daniel Tapias, Giovanni Battista Varile, Charles Wayne, Antonio Zampolli, Language Resources and Evaluation: International Strategy Panel.


Keynote Speeches



Kishore Papineni, Machine Translation Evaluation: N-grams to the Rescue.

Mark T. Maybury, Multimodal Systems, Resources and Evaluation.


Session SO4: Annotation Tools For Speech LRs


Ton van der Wouden, Heleen Hoekstra, Michael Moortgat, Bram Renmans, Ineke Schuurman, Syntactic Analysis in the Spoken Dutch Corpus (CGN).

Véronique Gendner, Comparative study of oral and written French automatically tagged with morpho-syntactic information.

Jeska Buhmann, Johanneke Caspers, Vincent J. van Heuven, Heleen Hoekstra, Jean-Pierre Martens, Marc Swerts, Annotation of prominent words, prosodic boundaries and segmental lengthening by non-expert transcribers in the Spoken Dutch Corpus.


Session WO6: Semantic Lexicons



Charles J. Fillmore, Collin F. Baker, Hiroaki Sato, Seeing Arguments through Transparent Structures.

Nilda Ruimy, Monica Monachini, Raffaella Distante, Elisabetta Guazzini, Stefano Molino, Marisa Ulivieri, Nicoletta Calzolari, Antonio Zampolli, CLIPS, a Multi-level Italian Computational Lexicon: a Glimpse to Data.

Sanni Nimb, Adverbs in Semantic Lexica for NLP - The extension of the Danish SIMPLE lexicon with Time Adverbs.


Session WO7: LRs For Minority Languages



Monica Ward, Issues in the design, construction and use of Language Resources (LR) for Endangered Languages (ELs).

Angelo Dalli, Creation and Evaluation of Extensible Language Resources for Maltese.

Paul Baker, Andrew Hardie, Tony McEnery, Hamish Cunningham, Rob Gaizauskas, EMILLE, A 67-Million Word Corpus of Indic Languages: Data Collection, Mark-up and Harmonisation.


Session WO8: Written Corpora



Tony Rose, Mark Stevenson, Miles Whitehead, The Reuters Corpus Volume 1 - from Yesterday's News to Tomorrow's Language Resources.

George Mikros, Quantitative parameters in corpus design: Estimating the optimum text size in Modern Greek language.

Nancy Ide, Randi Reppen, Keith Suderman, The American National Corpus: More Than the Web Can Provide.


Session WO9: Treebanks



Eva Hajičová, Ivona Kučerová, Argument/Valency Structure in PropBank, LCS Database and Prague Dependency Treebank: A Comparative Pilot Study.

Igor Boguslavsky, Ivan Chardin, Svetlana Grigorieva, Nikolai Grigoriev, Leonid Iomdin, Leonid Kreidlin, Nadezhda Frid, Development of a Dependency Treebank for Russian and its Possible Applications in NLP.

Owen Rambow, Cassandre Creswell, Rachel Szekely, Harriet Taber, Marilyn Walker, A Dependency Treebank for English.


Session SP2: Speech Varieties And Multilingual ASR



I. Hernáez, E. Navas, J. Sánchez, I. Madariaga, I. Gaminde, X. Zalbide, BIZKAIFON: A sound archive of dialectal varieties of spoken Basque.

Chai Wutiwiwatchai, Patcharika Cotsomrong, Sinaporn Suebvisai, Supphanat Kanokphara, Phonetically Distributed Continuous Speech Corpus for Thai Language.

Laura Docío-Fernández, Carmen García-Mateo, Acoustic Modeling and Training of a Bilingual ASR System when a Minority Language is Involved.

Algimantas Rudzionis, Vytautas Rudzionis, Lithuanian Speech Database LTDIGITS.

Konstantin Biatov, Joachim Köhler, Methods and Tools for Speech Data Acquisition exploiting a Database of German Parliamentary Speeches and Transcripts from the Internet.

Rainer Siemund, Barbara Heuft, Khalid Choukri, Ossama Emam, Emmanuel Maragoudakis, Herbert Tropf, Oren Gedge, Shaunie Shammass, Asunción Moreno, Albino Nogueiras Rodriguez, Imed Zitouni, Dorota Iskra, OrienTel - Multilingual access to interactive communication services for the Mediterranean and the Middle East.

Mónica Caballero, José B. Mariño, Asunción Moreno, Multidialectal Spanish Modeling for ASR.

N. Minematsu, Y. Tomiyama, K. Yoshimoto, K. Shimizu, S. Nakagawa, M. Dantsuji, S. Makino, English Speech Database Read by Japanese Learners for CALL System Development.

Masaki Murata, Hitoshi Isahara, Automatic extraction of differences between spoken and written languages, and automatic translation from the written to the spoken language.

Karl Weilhammer, Uwe Reichel, Florian Schiel, Multi-Tier Annotations in the Verbmobil Corpus.

Christopher Cieri, Stephanie Strassel, The DASL Project: a Case Study in Data Re-Annotation and Re-Use.


Session MMP2: Resources Of The Sign Languages



Thomas Hanke, iLex - A tool for Sign Language Lexicography and Corpus Analysis.

Atsuko Koizumi, Hirohiko Sagawa, Masaru Takeuchi, An Annotated Japanese Sign Language Corpus.


Session WP2: Lexicons



Marie-Jeanne Derouin, Dr. André Le Meur, Report on the Revision of the Lexicographical Standard ISO 1951 Presentation/Representation of Entries in Dictionaries.

Pius ten Hacken, Word Formation and the Validation of Lexical Resources.

Catherine Macleod, Lexical Annotation for Multi-word Entries Containing Nominalizations.

Markéta Straňáková-Lopatková, Zdenĕk Žabokrtský, Valency Dictionary of Czech Verbs: Complex Tectogrammatical Annotation.

Gábor Prószéky, Márton Miháltz, Automatism and User Interaction: Building a Hungarian WordNet.

Maya Ando, Jun Okamoto, Shun Ishizaki, Extraction of Associative Attributes from Nouns and Quantitative Expression of Prototype Concept.

Mike Maxwell, Resources for Morphology Learning and Evaluation.

Daniel Jung, Humans as Corpus - Language Learning Strategies in Virtually Mediated Authentic Environments.

Timothy Baldwin, Slaven Bilac, Ryo Okumura, Takenobu Tokunaga, Hozumi Tanaka, Enhanced Japanese Electronic Dictionary Look-up.

Anna Braasch, Current Developments of STO - the Danish Lexicon Project for NLP and HLT Applications.

Rita Marinelli, Adriana Roventini, Proper Names In A Semantic Database.

Klára Osolsobĕ, Karel Pala, Radek Sedláček, Marek Veber, A Procedure for Word Derivational Processes Concerning Lexicon Extension in Highly Inflected Languages

Dominique Dutoit, Pierre Nugues, An Algorithm to Find Words from Definitions.

Heli Uibo, Experimental Two-Level Morphology of Estonian.

Kazutaka Takao, Kenji Imamura, Hideki Kashioka, Comparing and Extracting Paraphrasing Words with 2-Way Bilingual Dictionaries.

Roberto Navigli, Paola Velardi, Automatic Adaptation of WordNet to Domains.


Session WP3: Tools & Components



Paola Monachesi, Alexis Dimitriadis, Rob Goedemans, Anne-Marie Mineur, A unified system for accessing typological databases.

Jong-Hoon Oh, Saim Shin, Yong-Seok Choi, Key-Sun Choi, Word Sense Disambiguation with Information Retrieval Technique.

Antonio Molina, Ferran Pla, Encarna Segarra, Lidia Moreno, Word Sense Disambiguation using Statistical Models and WordNet.

A. Lavelli, F. Pianesi, E. Maci, I. Prodanof, L. Dini, G. Mazzini, SiSSA: An Infrastructure for Developing NLP Applications.

Daan Broeder, Freddy Offenga, Don Willems, Metadata Tools Supporting Controlled Vocabulary Services.

Claire Grover, Scott McDonald, Donnla Nic Gearailt, Vangelis Karkaletsis, Dimitra Farmakiotou, Georgios Samaritakis, Georgios Petasis, Maria Teresa Pazienza, Michele Vindigni, Frantz Vichot, Francis Wolinski, Multilingual XML-Based Named Entity Recognition for E-Retail Domains.

Nordine Fourour, Emmanuel Morin, Béatrice Daille, Incremental Recognition and Referential Categorization of French Proper Names.

Paloma Martínez, Ana García-Serrano, Alberto Ruiz-Cristina, Integrating Spanish Linguistic Resources in a Web Site Assistant.

Ana M. García-Serrano, Luis Rodrigo-Aguado, Javier Calle, Natural Language Dialogue in a Virtual Assistant Interface.

F. de Vriend, P.A. Coppen, W. Haeseryn, Using Grammatical Description as a Metalanguage Resource.

Pascale Bernard, Josette Lecomte, Jacques Dendien, Jean-Marie Pierrel, Computerized linguistic resources of the research laboratory ATILF for lexical and textual analysis: Frantext, TLFi, and the software Stella.

Dimitra Farmakiotou, Vangelis Karkaletsis, Ioannis Koutsias, George Petasis, Constantine D. Spyropoulos, PatEdit: An Information Extraction Pattern Editor for Fast System Customization.

Gerhard Heyer, Uwe Quasthoff, Christian Wolff, Information Extraction from Text Corpora: Using Filters on Collocation Sets.

Jason Baldridge, John Dowding, Susana Early, Leo: an Architecture for Sharing Resources for Unification-Based Grammars.

Amalia Todirascu, Eric Kow, Laurent Romary, Towards Reusable NLP Components.

Jakub Piskorski, Witold Drożdżyński, Oliver Scherf, Feiyu Xu, A Flexible XML-based Regular Compiler for Creation and Conversion of Linguistic Resources.

Alex Alsina, Toni Badia, Gemma Boleda, Stefan Bott, Àngel Gil, Martí Quixal, Oriol Valentín, CATCG: a general purpose parsing tool applied.







Session D1: Demos


Dieter Maas, Rita Nuebel, Catherine Pease, Paul Schmidt, Bilingual Indexing for Information Retrieval with AUTINDEX.

Andrei Popescu-Belis, Susan Armstrong, Gilbert Robert, Electronic Dictionaries - from Publisher Data to a Distribution Server: the DicoPro, DicoEast and RERO Projects.

Jesús Cardeñosa, Edmundo Tovar, Carolina Gallardo, The UNL System.

Dragomir R. Radev, Hong Qi, Harris Wu, Weiguo Fan, Evaluating Web-based Question Answering Systems.

Charles J. Fillmore, Collin F. Baker, Hiroaki Sato, The FrameNet Database and Software Tools.




Session SO5: Speech Variabilities & Multilingual ASR



Oren Gedge, Christophe Couvreur, Klaus Linhard, Shaunie Shammass, Ami Moyal, Database Adaptation for Speech Recognition in Cross-Environmental Conditions.

Alexander Raake, Does the Content of Speech Influence its Perceived Sound Quality?

Sebastian Möller, Ergina Kavallieratou, Diagnostic Assessment of Telephone Transmission Impact on ASR Performance and Human-to-Human Speech Quality.




Session WO10: Ontologies



Daniela Alderuccio, Luciana Bordoni, An ontology-based approach in the literary research: two case-studies.

Adán Cassán, Sergi Cervell, Mireia Colom, Rafael Marín, Josep M. Merenciano, Gema Pérez, Lluís Valentín, BDCon: A Spanish knowledge database.

Antonio Moreno Ortiz, Victor Raskin, Sergei Nirenburg, New Developments in Ontological Semantics.


Session WO11: Specialised Written Corpora



Martin Wynne, The Language Resource Archive of the 21st Century.

Almudena Ballester, Ángel Martín Municio, Fernando Pardos, Jordi Porta Zamorano, Rafael J. Ruiz Ureña, Fernando Sánchez León, Combining statistics on n-grams for automatic term recognition.

Simone Teufel, Noemie Elhadad, Collection and linguistic processing of a large-scale corpus of medical articles.


Session WO12: Coreference



Massimo Poesio, Tomonori Ishikawa, Sabine Schulte im Walde, Renata Vieira, Acquiring Lexical Knowledge for Anaphora Resolution.

Felix Sasaki, Claudia Wegener, Andreas Witt, Dieter Metzing, Jens Pönninghaus, Co-reference annotation and resources: A multilingual corpus of typologically diverse languages.

Violeta Seretan, Dan Cristea, The Use of Referential Constraints in Structuring Discourse.


Session EO4: MT Evaluation



Eduard Hovy, Margaret King, Andrei Popescu-Belis, Computer-Aided Specification of Quality Models for Machine Translation Evaluation.

Martin Rajman, Anthony Hartley, Automatic Ranking of MT Systems.

Michelle Vanni, Keith Miller, Scaling the ISLE Framework: Use of Existing Corpus Resources for Validation of MT Evaluation Metrics across Languages.


Session SO6: Phonetic Lexicons



Petr Pollák, Václav Hanžl, Tool for Czech Pronunciation Generation Combining Fixed Rules with Pronunciation Lexicon and Lexicon Management Tool

Vincent Vandeghinste, Lexicon Optimization: Maximizing Lexical Coverage in Speech Recognition through Automated Compounding.

Stefan Schaden, A Database for the Analysis of Cross-Lingual Pronunciation Variants of European City Names.

Rudolf Muhr, Robert Höldrich, Eva Wächter-Kollpacher, The Pronouncing Dictionary of Austrian German and the other Major Varieties of German - A Phonetic Resources Database on the Pronunciation of German.

Ingunn Amdal, Torbjørn Svendsen, Evaluation of Pronunciation Variants in the ASR Lexicon for Different Speaking Styles.

Matej Rojc, Zdravko Kačič, Darinka Verdonik, Design and Implementation of the Slovenian Phonetic and Morphology Lexicons for the Use in Spoken Language Applications.


Session WO13: Issues On LRs Infrastructures



Daan Broeder, Peter Wittenburg, Thierry Declerck, Laurent Romary, LREP: A Language Repository Exchange Protocol.

Joanne Capstick, Hans Uszkoreit, Wolfgang Wahlster, Thierry Declerck, Gregor Erbach, Anthony Jameson, Brigitte Jorg, Reinhard Karger, Tillmann Wegst, COLLATE: Competence Center in Speech and Language Technology.

Matthias Denecke, Signatures, Typed Feature Structures and RDFS.

P. Wittenburg, W. Peters, D. Broeder, Metadata Proposals for Corpora and Lexica.

Christopher Cieri, Mark Liberman, Language Resource Creation and Distribution at the Linguistic Data Consortium: A Progress Report.

Christopher Cieri, Mark Liberman, TIDES Language Resources: A Resource Map for Translingual Information Access.


Session WO14: Lexicons


Matthieu Constant, Methods for Constructing Lexicon-Grammar Resources: The Example of Measure Expressions.

Juliana Galvani Greghi, Ronaldo Teixeira Martins, Maria das Graças Volpe Nunes, DIADORIM - A Lexical Database for Brazilian Portuguese.

Sabine Schulte im Walde, A Subcategorisation Lexicon for German Verbs induced from a Lexicalised PCFG.

Takano Ogino, Hitoshi Isahara, Kazuhiro Kobayashi, The Valence Patterns of Japanese Verbs Extracted From The EDR Corpus.

Hans C. Boas, Bilingual FrameNet Dictionaries for Machine Translation.

Masayuki Asahara, Ryuichi Yoneda, Akiko Yamashita, Yasuharu Den, Yuji Matsumoto, Use of XML and Relational Databases for Consistent Development and Maintenance of Lexicons and Annotated Corpora.


Session WO15: Semantic Tagging



Hiroyuki Shinnou, Learning of word sense disambiguation rules by Co-training, checking co-occurrence of features.

Katja Markert, Malvina Nissim, Towards a Corpus Annotated for Metonymies: the Case of Location Names.

Constantin Orăsan, Richard Evans, Assessing the difficulty of finding people in texts.

Luisa Bentivogli, Emanuele Pianta, Opportunistic Semantic Tagging.

Rada F. Mihalcea, Bootstrapping Large Sense Tagged Corpora.

Katerina Pastra, Diana Maynard, Oana Hamza, Hamish Cunningham, Yorick Wilks, How feasible is the reuse of grammars for Named Entity Recognition?







Panel Summaries



Nicoletta Calzolari, Ralph Grishman, Martha Palmer, Standards & best practice for multilingual computational lexicons: ISLE MILE and more


Keynote Speeches



James Pustejovsky, Creating Domain-specific Information Servers.

Gianni Lazzari, Speech to Speech Translation: Present and Future Challenges.


Session SO7: Tools For Spoken LRs



Hélène François, Olivier Boëffard, The Greedy Algorithm and its Application to the Construction of a Continuous Speech Database.

Ricardo Ribeiro, Luís Oliveira, Isabel Trancoso, Morphosyntactic Disambiguation for TTS Systems.

Jean-Pierre Martens, Diana Binnenpoorte, Kris Demuynck, Ruben Van Parys, Tom Laureys, Wim Goedertier, Jacques Duchateau, Word Segmentation in the Spoken Dutch Corpus.

Akinobu Lee, Tatsuya Kawahara, Kazuya Takeda, Masato Mimura, Atsushi Yamada, Akinori Ito, Katsunobu Itou, Kiyohiro Shikano, Continuous Speech Recognition Consortium - an Open Repository for CSR Tools and Models.


Session WO16: Applications Based On Written LRs



Silja Huttunen, Roman Yangarber, Ralph Grishman, Diversity of Scenarios in Information extraction.

René Schneider, n-grams of Seeds: A Hybrid System for Corpus-Based Text Summarization.

Sanda Harabagiu, Finley Lacatusu, Paul Morarescu, Multidocument Summarization with GISTexter.

Alessandro Lenci, Roberto Bartolini, Nicoletta Calzolari, Ana Agua, Stephan Busemann, Emmanuel Cartier, Karine Chevreau, José Coch, Multilingual Summarization by Integrating Linguistic Resources in the MLIS-MUSI Project.


Session WO17: Semantic Lexicons



Adriana Roventini, Marisa Ulivieri, Nicoletta Calzolari, Integrating Two Semantic Lexicons, SIMPLE and ItalWordNet: What Can We Gain?

Nabil Hathout, From WordNet to CELEX: acquiring morphological links from dictionaries of synonyms.

Claudia Kunze, Lothar Lemnitzer, GermaNet - representation, visualization, application.

Jerker Järborg, Dimitrios Kokkinakis, Maria Toporowska Gronostaj, Lexical and Textual Resources for Sense Recognition and Description.


Session WO18: Syntactic Annotation



Ted Briscoe, John Carroll, Robust Accurate Statistical Annotation of General Text.

Erhard W. Hinrichs, Sandra Kübler, Frank H. Müller, Tylman Ule, A Hybrid Architecture for Robust Parsing of German.

Zdeněk Žabokrtský, Petr Sgall, Sašo Džeroski, A Machine Learning Approach to Automatic Functor Assignment in the Prague Dependency Treebank.

Roberto Bartolini, Alessandro Lenci, Simonetta Montemagni, Vito Pirrelli, The Lexicon-Grammar Balance in Robust Parsing of Italian.


Session EO5: Lexical Evaluation



Darren Pearce, A Comparative Evaluation of Collocation Extraction Techniques.

Romaric Besançon, Martin Rajman, Evaluation of a Vector Space Similarity Measure in a Multilingual Framework.

Thierry Hamon, Olivier Hû, How to evaluate necessary cooperative systems of terminology building?

Judita Preiss, Anna Korhonen, Ted Briscoe, Subcategorization Acquisition as an Evaluation Method for WSD.


Session SP3: Annotation Tools: From Speech Segments To Dialogues



Doroteo Torre Toledano, Luis A. Hernández Gómez, HMMs for Automatic Phonetic Segmentation.

Tom Laureys, Kris Demuynck, Jacques Duchateau, Patrick Wambacq, An Improved Algorithm for the Automatic Segmentation of Speech Corpora.

Thorsten Trippel, Dafydd Gibbon, Annotation Driven Concordancing: the PAX Toolkit.

K. López de Ipiña, N. Ezeiza, G. Bordel, Automatic Morphological Segmentation for Continuous Speech Recognition of Basque.

Carlos D. Martínez-Hinarejos, Emilio Sanchís, Fernando García-Granada, Pablo Aibar, A Labelling Proposal to Annotate Dialogues.

Claudia Sassen, Dafydd Gibbon, Enhanced Dialogue Markup for Crisis Talk Scenario Resources.

Petra Geutner, Frank Steffens, Dietrich Manstetten, Design of the VICO Spoken Dialogue System: Evaluation of User Expectations by Wizard-of-Oz Experiments.

Laurence Devillers, Sophie Rosset, Hélène Bonneau-Maynard, Lori Lamel, Annotations for Dynamic Diagnosis of the Dialog State.

Steve Whittaker, Marilyn Walker, Johanna Moore, Fish or Fowl: A Wizard of Oz Evaluation of Dialogue Strategies in the Restaurant Domain.


Session WP4: Corpus Annotation



Nigel Collier, Koichi Takeuchi, PIA-Core: Semantic Annotation through Example-based Learning.

Tilly Dutilh, Truus Kruyt, Implementation and Evaluation of PAROLE PoS in a National Context.

Kiril Ribarov, Old Sources and Modern Procedures: Computer Processing of Old-Church Slavonic.

Susanne Salmon-Alt, Renata Vieira, Nominal Expressions in Multilingual Corpora: Definites and Demonstratives.

Chung-hye Han, Na-Rae Han, Eon-Suk Ko, Martha Palmer, Development and Evaluation of a Korean Treebank and its Application to NLP.

Sabine Brants, Silvia Hansen, Developments in the TIGER Annotation Scheme and their Realization in the Corpus.

X. Artola, A. Díaz de Ilarraza, N. Ezeiza, K. Gojenola, G. Hernández, A. Soroa, A Class Library for the Integration of NLP Tools: Definition and implementation of an Abstract Data Type Collection for the manipulation of SGML documents in a context of stand-off linguistic annotation

Špela Vintar, Paul Buitelaar, Bärbel Ripplinger, Bogdan Sacaleanu, Diana Raileanu, Detlef Prescher, An Efficient and Flexible Format for Linguistic and Semantic Annotation.

Nadia Mana, Ornella Corazzari, The Lexico-semantic Annotation of an Italian Treebank.

Scott Cotton, Steven Bird, An integrated framework for treebanks and multilayer annotations.

Paul Clough, Robert Gaizauskas, S. L. Piao, Building and annotating a corpus for the study of journalistic text reuse.

Gosse Bouma, Geert Kloosterman, Querying Dependency Treebanks in XML.

Toshifumi Tanabe, Yasuo Koyama, Kenji Yoshimura, Kosho Shudo, Modal Expressions in Natural Language Sentence and Their Similarity.

Susana Afonso, Eckhard Bick, Renato Haber, Diana Santos, "Floresta Sintá(c)tica": A treebank for Portuguese.

Ilona Steiner, Laura Kallmeyer, VIQTORYA -- A Visual Query Tool for Syntactically Annotated Corpora.

Aoife Cahill, Josef van Genabith, TTS - A Treebank Tool Suite.

Serge A. Yablonsky, Corpora as Object-Oriented System. From UML-notation to Implementation.

Harris Papageorgiou, Prokopis Prokopidis, Voula Giouli, Iason Demiros, Alexis Konstantinidis, Stelios Piperidis, Multi-level XML-based Corpus Annotation.

Kiril Simov, Petya Osenova, Milena Slavcheva, Sia Kolkovska, Elisaveta Balabanova, Dimitar Doikoff, Krassimira Ivanova, Alexander Simov, Milen Kouylekov, Building a Linguistically Interpreted Corpus of Bulgarian: the BulTreeBank.

Atsushi Fujii, Katsunobu Itou, Tetsuya Ishikawa, Producing a Large-scale Encyclopedic Corpus over the Web.


Session WP5: Components & Systems



Chikashi Nobata, Satoshi Sekine, Hitoshi Isahara, Ralph Grishman, Summarization System Integrated with Named Entity Tagging and IE pattern Discovery.

Min-Yen Kan, Judith L. Klavans, Kathleen R. McKeown, Using the Annotated Bibliography as a Resource for Indicative Summarization.

Bolette S. Pedersen, Patrizia Paggio, Semantic Lexical Resources Applied to Content-based Querying - the OntoQuery Project.

Anna Sågvall Hein, Eva Forsbom, Jörg Tiedemann, Per Weijnitz, Ingrid Almqvist, Leif-Jöran Olsson, Sten Thaning, Scaling Up an MT Prototype for Industrial Use - Databases and Data Flow.

Barry Schiffman, Building a Resource for Evaluating the Importance of Sentences.

Constantin Orasan, Ramesh Krishnamurthy, A corpus-based investigation of junk emails.

Constantin Orasan, Building annotated resources for automatic text summarisation.

Andrea Bozzi, LAperLA: an integrated graphical-linguistic System for old printed Latin Texts.

Elaine Uí Dhonnchadha, A Two-level Morphological Analyser and Generator for Irish using Finite-State Transducers.

Nabil Hathout, Ludovic Tanguy, Webaffix: Discovering Morphological Links on the WWW.

Hannah Kermes, Stefan Evert, YAC - A Recursive Chunker for Unrestricted German Text.

Xavier Carreras, Lluís Padró, A Flexible Distributed Architecture for Natural Language Analyzers.

Satoshi Sekine, Kiyoshi Sudo, Chikashi Nobata, Extended Named Entity Hierarchy.

Yllias Chali, Experiments in Topic Detection.







Session WP6: LRs & Projects


Alejandro Bia, Manuel Sánchez Quero, Building ancient Spanish dictionaries for spell-checking of DL texts.

Choy-Kim Chuah, Zaharin Yusoff, Computational Linguistics at Universiti Sains Malaysia.

Carole Tiberius, Dunstan Brown, Greville Corbett, A typological database of agreement.

Fabio Tamburini, A dynamic model for reference corpora structure definition.

Yong-Ju Lee, Bong-Wan Kim, Yongnam Um, Speech Information Technology & Industry Promotion Center in Korea: Activities and Directions.

Catia Cucchiarini, Elisabeth D'Halleweyn, Lisanne Teunissen, A Human Language Technologies Platform for the Dutch language: awareness, management, maintenance and distribution.

D. Binnenpoorte, F. De Vriend, J. Sturm, W. Daelemans, H. Strik, C. Cucchiarini, A Field Survey for Establishing Priorities in the Development of HLT Resources for Dutch.

Michael Rosner, The Future of Maltilex.


Session TP1: Terminology



Lorna Balkan, Ken Miller, Birgit Austin, Anne Etheridge, Myriam Garcia Bernabé, Pam Miller, ELSST: a broad-based Multilingual Thesaurus for the Social Sciences.

Marianne Dabbadie, Widad Mustafa El Hadi, Ismaïl Timimi, Terminological Enrichment for non-Interactive MT Evaluation.

Judit Feliu, Jorge Vivaldi, M. Teresa Cabré, Towards an Ontology for a Human Genome Knowledge Base.

Olivier Ferret, Christian Fluhr, Françoise Rousseau-Hans, Jean-Luc Simoni, Building domain specific lexical hierarchies from corpora.

James Dowdall, Michael Hess, Neeme Kahusk, Kaarel Kaljurand, Mare Koit, Fabio Rinaldi, Kadri Vider, Technical Terminology as a Critical Resource.

Sussi Olsen, Lemma selection in domain specific computational lexica - some specific problems.

Jörg Tiedemann, MatsLex - a Multilingual Lexical Database for Machine Translation.


Session SO8: Annotation Frameworks & Tools



Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee, Creating Annotation Tools with the Annotation Graph Toolkit.

Jan-Torsten Milde, Ulrike Gut, The TASX-environment: an XML-based toolset for time aligned speech corpora.

Christophe Laprun, Jonathan G. Fiscus, John Garofolo, Sylvain Pajot, A Practical Introduction to ATLAS.


Session WO19: Multi Word Expressions & Metaphors



Nicoletta Calzolari, Charles J. Fillmore, Ralph Grishman, Nancy Ide, Alessandro Lenci, Catherine MacLeod, Antonio Zampolli, Towards Best Practice for Multiword Expressions in Computational Lexicons.

Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond, Timothy Baldwin, Ivan A. Sag, Dan Flickinger, Multiword expressions: linguistic precision and reusability.

Antonietta Alonge, Margherita Castelli, Which way should we go? Metaphoric expressions in lexical resources.


Session WO20: Machine Translation



Taro Watanabe, Mitsuo Shimohata, Eiichiro Sumita, Statistical Machine Translation on Paraphrased Corpora.

Mathieu Lafourcade, Christian Boitet, UNL Lexical Selection with Conceptual Vectors.

Satoshi Shirai, Kazuhide Yamamoto, Francis Bond, Hozumi Tanaka, Towards a Thesaurus of Predicates.


Session WO21: Treebanks



Julia Hockenmaier, Mark Steedman, Acquiring Compact Lexicalized Grammars from a Cleaner Treebank.

Alexandra Kinyon, Carlos A. Prolo, Identifying Verb Arguments and their Syntactic Function in the Penn Treebank.

Paul Kingsbury, Martha Palmer, From TreeBank to PropBank.


Session WO22: Coreference



Cătălina Barbu, Richard Evans, Ruslan Mitkov, A corpus based investigation of morphological disagreement in anaphoric relations.

Dan Cristea, Oana-Diana Postolache, Gabriela-Eugenia Dima, Cătălina Barbu, AR-Engine - a framework for unrestricted co-reference resolution.

Daisuke Kawahara, Sadao Kurohashi, Kôiti Hasida, Construction of a Japanese Relevance-tagged Corpus.


Session SO9: Emotional & Specific Databases



Parham Mokhtari, Nick Campbell, Automatic Detection of Acoustic Centres of Reliability for Tagging Paralinguistic Information in Expressive Speech.

Vladimir Hozjan, Zdravko Kacic, Objective analysis of emotional speech for English and Slovenian Interface emotional speech databases.

Vladimir Hozjan, Zdravko Kacic, Asunción Moreno, Antonio Bonafonte, Albino Nogueiras, Interface Databases: Design and Collection of a Multilingual Emotional Speech Database.

Nick Campbell, Recording techniques for capturing natural every-day speech.

Laura Pecchia, Giuseppe Cappelli, Elisabetta Guazzini, Linguistic and Computational Problems for the Creation of an Italian Children's Corpus of Spoken Language.

Hiromichi Kawanami, Tsuyoshi Masuda, Tomoki Toda, Kiyohiro Shikano, Designing speech database with prosodic variety for expressive TTS system.

Nobuo Kawaguchi, Shigeki Matsubara, Kazuya Takeda, Fumitada Itakura, Multi-Dimensional Data Acquisition for Integrated Acoustic Information Research.


Session WO23: Corpus Analysis, Annotation, Representation



Irena Spasić, Goran Nenadić, Sophia Ananiadou, Tuning Context Features with Genetic Algorithms.

Steve Cassidy, XQuery as an Annotation Query Language: a Use Case Analysis.

Adán Cassán, Sergi Cervell, Mireia Colom, Rafael Marín, Josep M. Merenciano, Gema Pérez, Lluís Valentín, A step forward to hypertext.

Xiaoyi Ma, Haejoong Lee, Steven Bird, Kazuaki Maeda, Models and Tools for Collaborative Annotation.

Nigel Collier, Koichi Takeuchi, Chikashi Nobata, Junichi Fukumoto, Norihiro Ogata, Progress on Multi-lingual Named Entity Annotation Guidelines using RDF (S).

Brian Mitchell, Robert Gaizauskas, A Comparison of Machine Learning Algorithms for Prepositional Phrase Attachment.

R. Muñoz, R. Mitkov, M. Palomar, J. Peral, R. Evans, L. Moreno, C. Orasan, M. Saiz-Noeda, A. Ferrández, C. Barbu, P. Martínez-Barco, A. Suárez, Bilingual alignment of anaphoric expressions.


Session WO24: Applications Based On Written LRs



A. Cappelli, M. N. Catarsi, P. Michelassi, L. Moretti, M. Baglioni, F. Turini, M. Tavoni, Knowledge Mining and Discovery for Searching in Literary Texts.

Richard F. E. Sutcliffe, Kieran White, Searching via Keywords or Concept Hierarchies - Which is Better?

Ganesh Ramesh, Amit Bagga, A Text-based Method for Detection and Filtering of Commercial Segments in Broadcast News.

Nadjet Bouayad-Agha, Richard Power, Donia Scott, Anja Belz, PILLS: Multilingual generation of medical information documents with overlapping content.

Masumi Narita, Kazuya Kurokawa, Takehito Utsuro, A Web-based English Abstract Writing Tool Using a Tagged E-J Parallel Corpus.

Jimmy Lin, The Web as a Resource for Question Answering: Perspectives and Challenges.

Philippe Langlais, Marie Loranger, Guy Lapalme, Translators at work with TRANSTYPE: Resource and Evaluation


Session TO1: Terminology



Udo Hahn, Stefan Schulz, Towards Very Large Ontologies for Medical Language Processing.

Maria Rzewuska, Terminology Resources in the Context of a Major Translation Project.

Klaus-Dirk Schmitz, Subject-field-specific Ontologies and Terminologies for the Web Community.

Goran Nenadić, Irena Spasić, Sophia Ananiadou, Automatic Acronym Acquisition and Term Variation Management within Domain-Specific Texts.

Antonio S. Valderrábanos, Alexander Belskis, Luis Iraola Moreno, Multilingual Terminology Extraction and Validation.

Le An Ha, Learning description of term patterns using glossary resources.