Multimodal Resources and Multimodal Systems Evaluation
Motivation and Aims
Individual organizations and countries have been investing in the creation of resources and in methods for the evaluation of resources, technologies, products and applications. This is evident in the US DARPA HLT programme, the EU HLT programme under FP5-IST, the German MTI programme, the Francophone AUF programme and others. The European 6th Framework Programme (FP6), scheduled to start in 2003, includes multilingual and multisensorial communication among its major R&D issues. Substantial mutual benefits can be expected from addressing these issues through international cooperation. Nowhere is this more important than in the relatively new areas of multimedia (i.e., text, audio, video), multimodal (visual, auditory, tactile), and multicodal (language, graphics, gesture) communication.
Multimodal resources concern the capture and annotation of multiple modalities such as speech, hand gesture, gaze, facial expression, body posture, graphics, etc. Until recently, only a handful of researchers were engaged in the development of multimodal resources and their application in systems. Even so, most have focused on a limited set of modalities, used custom annotation schemes, and worked within a particular application domain.
The primary purpose of this one-day workshop (feeding into a subsequent half-day Multimodal Roadmap workshop) will be to report and discuss multimodal resources, annotation standards, tools and methods, and evaluation metrics/methods, as well as to strategize jointly about the way forward.
Workshop Agenda
The workshop will be a mix of short presentations and facilitated sessions, with the intent of jointly identifying grand challenge problems, developing a shared understanding of and plan for multimedia resources and applications, and identifying methods for facilitating the creation of multimedia resources. The workshop will consist of a morning session (08:00 to 13:30) and an afternoon session (14:30 to 20:00), with a focus on multimodal resources, annotation and evaluation. A common repository of illustrative multimodal video samples will be built prior to the workshop. Workshop participants will be encouraged to annotate some of them using their own coding scheme or tool and to report results at the workshop. Elements of the workshop will include:
- Individual presentations of submitted video samples, annotations, and coding schemes
- Collective annotation exercise (comparison and integration of individual annotations)
- Group and (possibly) breakout discussions regarding annotation/coding schemes, tools and methods for the creation and annotation of multimodal resources (e.g., text, speech, gaze, gesture, facial expressions), and issues in collective annotation (e.g., integration of or conflicts between several coding schemes, integration of precise manual annotation methods, multilingual issues); an illustrative annotation sketch follows this list
- Short reflections/statements by selected participants in key multimodal corpora areas, grounded in the annotation exercise
- Creation of possible courses of action for individuals, organizations, and governments; this will flow into the half-day Multimodal Roadmap workshop the following day.
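To make the annotation exercise concrete, the following is a minimal sketch, in Python, of what a tier-based annotation of a short video clip might look like. The tier names, labels, and time values are purely hypothetical and do not correspond to any particular coding scheme or tool; the sketch only illustrates the general idea of time-aligned tiers, one per modality.

```python
# Minimal illustrative sketch of a tier-based multimodal annotation
# for a short (~1 minute) video clip. Tier names, labels, and time
# values (seconds) are hypothetical; they do not prescribe any
# particular coding scheme or annotation tool.

from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # segment onset, in seconds from clip start
    end: float     # segment offset, in seconds
    label: str     # category or transcription assigned by the coder

# One tier per modality; each tier is an ordered list of segments.
annotation = {
    "speech":  [Segment(0.0, 2.3, "put that there"),
                Segment(3.1, 4.0, "yes, the red one")],
    "gesture": [Segment(0.8, 1.9, "deictic/point-right-hand"),
                Segment(3.0, 3.8, "deictic/point-left-hand")],
    "gaze":    [Segment(0.0, 2.5, "at-screen"),
                Segment(2.5, 4.0, "at-interlocutor")],
}

# A typical integration question for the collective exercise:
# which gestures temporally overlap which speech segments?
for g in annotation["gesture"]:
    for s in annotation["speech"]:
        if g.start < s.end and s.start < g.end:
            print(f"gesture '{g.label}' overlaps speech '{s.label}'")
```

Comparing how different coding schemes segment and label the same clip, and how their tiers align in time, is exactly the kind of integration question the collective annotation exercise is intended to surface.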
Topics to be addressed in the workshop include, but are not limited to:
- Guidelines, standards, specifications, models and best practices for multimedia and multimodal LR;
- Methods, tools, and procedures for the acquisition, creation, management, access, distribution, and use of multimedia and multimodal LR;
- Methods for the extraction and acquisition of knowledge (e.g. lexical information, modality modelling) from multimedia and multimodal LR;
- Integration of multiple modalities in LR (speech, vision, language);
- Ontological aspects of the creation and use of multimodal LR;
- Machine learning for and from multimedia (i.e., text, audio, video), multimodal (visual, auditory, tactile), and multicodal (language, graphics, gesture) communication;
- Exploitation of multimodal LR in different types of applications (information extraction, information retrieval, meeting transcription, multisensorial interfaces, translation, summarisation, www services, etc.);
- Multimodal information presentation;
- Multimedia and multimodal metadata descriptions of LR;
- Applications enabled by multimedia and multimodal LR;
- Benchmarking of systems and products; use of multimodal corpora for the evaluation of real systems;
- Processing and evaluation of mixed spoken, typed, and cursive (e.g., pen-based) language;
- Evaluation of multimodal document retrieval systems (including detection, indexing, filtering, alerting, question answering, etc.);
- Automated multimodal fusion and/or multimodal generation (e.g., coordinated speech, gaze, gesture, facial expressions).
Submissions
This workshop will consist primarily of working sessions. However, presentations and participation in the workshop will be based on an assessment of a 4-page extended position statement that addresses one or more of the fundamental multimodal roadmap and/or multimodal resource issues posed by the workshop. Submissions must be in English, no more than 4 pages long, and in single-column format. The first page should include the title; the names and affiliations of the authors; the full address of the first author (or a contact person), including phone, fax, email and URL; and 5 keywords.
Submissions should be sent electronically in Word (preferred), PDF, or ASCII text format to Jean-Claude Martin and Paula MacDonald, to arrive no later than 15 February 2002.
Demonstrations of multimodal LR and related tools will be considered as well; please send a 2-page demonstration outline.
Authors willing to participate in the collective annotation exercise in the morning are encouraged to consult www.limsi.fr/Individu/martin/lrec2002/, where they can submit one or more short (approx. 1 minute) videos with accompanying annotations.
Authors are encouraged to send, as soon as possible, a brief email indicating their intention to participate, including their contact information and the topic they intend to address in their submission. Proceedings of the workshop will be printed.
Important Dates
| Call for papers/invitation | 17th December 2001 |
| Submission deadline | 15th February 2002 |
| Notification, stylesheets available | 1st March 2002 |
| Camera-ready paper due | 15th March 2002 |
| Workshop program due | 3rd April 2002 |
| Proceedings due | 20th April 2002 |
| Workshop | 1st June 2002 |
Organizers
Mark T. Maybury
Information Technology Division
The MITRE Corporation, 3K-205
202 Burlington Road
Bedford, MA 01730
Phone: +1(781) 271-7230
Fax: +1 (781) 271-2780
Email: maybury@mitre.org
Web site: www.mitre.org/resources/centers/it
Jean-Claude Martin
LIMSI-CNRS/LINC-IUT de Montreuil
B.P. 133
F-91403 Orsay (France)
Phone: +33 6 84 21 62 05
Fax: +33 1 69 85 80 88
Email: martin@limsi.fr
Web site: www.limsi.fr/Individu/martin
Programme Committee
| Mark Maybury | The MITRE Corporation (Co-Chair) (USA) |
| Jean-Claude Martin | LIMSI-CNRS/LINC-University Paris 8 (Co-Chair) (France) |
| Lisa Harper | The MITRE Corporation (USA) |
| Catherine Pelachaud | University of Rome "La Sapienza" (Italy) |
| Michael Kipp | DFKI (Germany) |
| Wolfgang Wahlster | DFKI (Germany) |
| Oliviero Stock | IRST (Italy) |
| Harry Bunt | Tilburg University (The Netherlands) |
| Antonio Zampolli | Consiglio Nazionale delle Ricerche (Italy) |
| Steven Krauwer | ELSNET |
| Niels Ole Bernsen | Natural Interactive Systems Laboratory, University of Southern Denmark (Odense, Denmark) |
| Laila Dybkjaer | Natural Interactive Systems Laboratory, University of Southern Denmark (Odense, Denmark) |
Workshop Registration Fees
The registration fees for the workshop are:
- If you are not attending LREC: 140 EUR
- If you are attending LREC: 90 EUR
These fees include a coffee break and the Proceedings of the workshop.
Related Activities
Recently, several projects, initiatives and organisations have addressed multimodal resources with a federative approach:
- At LREC2000, a workshop addressed the issue of multimodal corpora, focusing on meta-descriptions and large corpora http://www.mpi.nl/world/ISLE/events/LREC%202000/LREC2000.htm
- NIMM is a working group on Natural Interaction and Multimodality under the IST-ISLE project (http://isle.nis.sdu.dk/). Since 2001, NIMM has been conducting a survey of multimodal resources, coding schemes and annotation tools; currently, more than 60 corpora are described in the survey. The ISLE project is carried out both in Europe and in the USA (http://www.ldc.upenn.edu/sb/isle.html).
- In November 2001, ELRA (European Language Resources Association) conducted a survey of multimodal corpora including marketing aspects (http://www.icp.inpg.fr/ELRA/).
- In November 2001, a Working Group at the Dagstuhl Seminar on Multimodal Fusion and Coordination received 28 completed questionnaires from participating researchers; 21 announced their intention to collect and annotate multimodal corpora in the future. (http://www.dfki.de/~wahlster/Dagstuhl_Multi_Modality/)
- Several recent surveys have focused specifically on multimodal annotation coding schemes and tools (COCOSDA, LDC, MITRE).
Other recent initiatives in the United States include:
- NIST Automatic Meeting Transcription Project (http://www.nist.gov/speech/test_beds/mr_proj): The National Institute of Standards and Technology (NIST) held an all-day workshop entitled "Automatic Meeting Transcription Data Collection and Annotation" on 2 November 2001. "The workshop addressed issues in data collection and annotation approaches, data sharing, common annotation standards and tools, and distribution of corpora. ... To collect data representative of what might be expected in a functional meeting room of the future, [NIST has] created a media- and sensor-enriched conference room containing a variety of cameras and microphones."
- ATLAS (http://www.nist.gov/speech/atlas): Also at NIST, "ATLAS (Architecture and Tools for Linguistic Analysis Systems) is a recent initiative involving NIST, LDC and MITRE. ATLAS addresses an array of application needs spanning corpus construction, evaluation infrastructure, and multimodal visualisation."
- TALKBANK (http://www.talkbank.org): TALKBANK is funded by the National Science Foundation (NSF). Its goal "is to foster fundamental research in the study of human and animal communication. TalkBank will provide standards and tools for creating, searching, and publishing primary materials via networked computers." One of the six sub-groups is concerned with communication by gesture and sign.
A Call for Proposals dedicated to Multimodality was launched within the IST Programme in autumn 2001. We hope that respondents to this call will also be interested in participating in the LREC 2002 conference.
Starting in 2003, the European 6th Framework Programme (FP6) will include multilingual and multisensorial communication as a primary R&D focus. Technology evaluation is a specific item in the presentation of the Integrated Project instrument.
Until now, the collection and annotation of multimodal corpora has largely been carried out on an individual basis: researchers and teams typically develop custom coding schemes and tools within narrow task domains. As a result, there is a distinct lack of shared knowledge and understanding of how to compare the various coding schemes and tools, which makes it difficult to build on the results and experiences of others. Given that the annotation of corpora (particularly multimodal corpora) is very costly, we anticipate a growing need for tools and methodologies that enable the collaborative building and sharing of multimodal resources.