Evaluations of multimedia information processing systems

At a time when everything is being digitized and multifaceted information is transmitted through many different media (sound, text, video, etc.), it has become essential to adopt shared benchmarks for assessing the reliability of automated information processing systems. Such benchmarks help industrial integrators and consumers alike to choose among the various technological options. They also give developers and researchers the opportunity to compare their approaches during major international evaluation campaigns, by quantifying how well each system performs a specific task.

LNE organizes evaluations of information processing systems that make use of various types of data, namely:

  • text: machine translation, document classification, layout analysis and summarization, named entity recognition, question answering, etc.
  • speech: automatic speech recognition, language and speaker identification, spoken term detection, translation, etc.
  • video and image: object recognition, face detection, person tracking, optical character recognition, etc.
  • measurements output by sensors used in robotics or for self-driving vehicles.

The evaluation data distributed by LNE are prepared in partnership with associations specializing in linguistics and language resources.

Evaluations made available to researchers and public bodies

As a trusted third party, LNE conducts open evaluations within government-funded projects. In these evaluations, the systems of all campaign participants are assessed at the same time and on the same dataset.

This method aims to ensure fairness among participants while guaranteeing the reproducibility, repeatability and accuracy of the measurements. The goals are threefold:

  • Evaluate the performance of competing systems through use of consistent evaluation methodologies,
  • Sketch out the state of the art across different information technology fields,
  • Promote research and wider use of information technologies in everyday life.

Error analysis for developers and integrators

LNE also offers developers and integrators a set of tools for conducting analyses that reveal the strengths and possible areas for improvement of the systems studied. These analyses use data close to those of the intended final application. The focus is on understanding the factors that cause system performance to vary. This approach has two objectives:

  • Highlight system strengths,
  • Help developers identify the data on which processing could be improved.

Evaluations that rely heavily on preliminary research

As a complement to its evaluation services, LNE conducts research on evaluation protocols. To this end, the laboratory develops evaluation tools, performs corpus analyses and participates in international standardization activities.

Special emphasis is placed on carrying out complex evaluations in the area of multimedia and multimodal information processing. To that end, LNE has developed new metrics, methods for non-trivial evaluations, and systems for annotating complex data.

Sample projects

Quaero (www.quaero.org)

Quaero is a joint research and industrial innovation program, funded by OSEO, focused on technologies for the automatic analysis, classification and use of multimedia and multilingual documents. Its 32 partners collaborate on research and on building advanced application demonstrators and prototypes, as well as innovative services for accessing multimedia documentary information such as speech, images, video and music. The consortium is composed of French and German institutions from both the public and private sectors.
Within this program, LNE has organized and conducted evaluation campaigns on:

  • Speech recognition
  • Speaker recognition and tracking
  • Named entity detection in written and spoken documents
  • Translation
  • Optical character recognition

The results of these campaigns are used to establish and adjust the Quaero program's research and development priorities.

DEFI-REPERE (www.defi-repere.fr)

DEFI-REPERE is an evaluation project on multimedia person recognition in television content, funded by France's defense procurement agency (DGA). It is associated with France's National Research Agency's system development project of the same name.
The project's objective is to build a person recognition system for audiovisual broadcasts. The information sources used are:

  • the image in which individuals are visible,
  • embedded text in which people's names appear,
  • the soundtrack in which the speaker's voice is recognizable,
  • the speech content in which a person's name is spoken.

Each year, LNE measures the performance of these systems through a dedicated evaluation campaign based on French-language television broadcasts: news programs, debates and entertainment shows.

VERA - Advanced Error Analysis for Speech Recognition

The VERA project, funded by France's National Research Agency, aims to develop a generic methodology and tools to locate and diagnose automatic speech recognition (ASR) errors. New measures will be derived to give a differentiated view of the various types of errors, depending on the application. The objective is to study these errors in depth in order to obtain a precise diagnosis and to advance the state of the art in ASR performance.
As part of this project, LNE proposes, among other things, new metrics that take into account the relative significance of transcription errors as a function of the targeted application.
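The VERA metrics themselves are not described on this page. Purely as an illustration of the idea that transcription errors need not all count equally, the sketch below weights each reference word in a word error rate (WER) computation. The function name `weighted_wer`, the per-word weights supplied by the caller, and the unit cost for insertions are all assumptions made for this example; this is not LNE's metric.

```python
from typing import Dict, List


def weighted_wer(ref: List[str], hyp: List[str],
                 weight: Dict[str, float], default: float = 1.0) -> float:
    """Hypothetical weighted WER (illustrative assumption, not the VERA metric).

    Substituting or deleting a reference word costs that word's weight;
    inserting a hypothesis word costs `default`. The total cost is divided
    by the summed reference weights, so with all weights equal to 1 this
    reduces to the standard WER (S + D + I) / N.
    """
    w = [weight.get(tok, default) for tok in ref]
    total = sum(w)
    if total <= 0:
        raise ValueError("reference must have a positive total weight")
    n, m = len(ref), len(hyp)
    # d[i][j] = minimal weighted cost of aligning ref[:i] with hyp[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + w[i - 1]      # delete every reference word
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + default       # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if ref[i - 1] == hyp[j - 1] else w[i - 1]
            d[i][j] = min(d[i - 1][j - 1] + sub,     # match or substitution
                          d[i - 1][j] + w[i - 1],    # deletion
                          d[i][j - 1] + default)     # insertion
    return d[n][m] / total


ref = "the meeting is at noon".split()
print(weighted_wer(ref, "the meeting is at moon".split(), {}))               # plain WER
print(weighted_wer(ref, "the meeting is at moon".split(), {"noon": 3.0}))    # key word weighted
```

With uniform weights, misrecognizing "noon" and misrecognizing "the" cost the same (1/5 each); giving "noon" a weight of 3 makes that error cost 3/7 while an error on "the" costs only 1/7, which is the kind of application-dependent contrast described above.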

FaBiole - Reliability in Vocal Biometrics

This project, also funded by France's National Research Agency, lies within the field of voice comparison, which consists of expressing how likely it is that two voice recordings were spoken by the same person. Evaluating this task is non-trivial because it requires understanding the conditions under which an audio recording contains sufficient information about the speaker. FaBiole's objective is to identify the speaker-specific information present in each of the two recordings and then to measure how consistent that information is.
Upon completion of this project, LNE will organize an international campaign based on a set of new metrics and a new method for choosing the recordings to be compared.
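The new FaBiole metrics are not specified here. For context only, a widely used way to score systems that output a likelihood ratio (LR) for each pair of recordings is the log-likelihood-ratio cost, Cllr. The sketch below implements that standard measure; it is not presented as the metric set of the planned campaign, and the function name `cllr` is our own.

```python
import math
from typing import Sequence


def cllr(target_lrs: Sequence[float], nontarget_lrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost in bits (standard Cllr, shown for illustration).

    target_lrs:    LRs produced for same-speaker pairs.
    nontarget_lrs: LRs produced for different-speaker pairs.
    Lower is better; a system that always outputs LR = 1 scores exactly 1.
    """
    if not target_lrs or not nontarget_lrs:
        raise ValueError("need at least one same-speaker and one different-speaker pair")
    if any(lr <= 0 for lr in list(target_lrs) + list(nontarget_lrs)):
        raise ValueError("likelihood ratios must be strictly positive")
    # Same-speaker pairs are penalized when the LR is small.
    c_tar = sum(math.log2(1 + 1 / lr) for lr in target_lrs) / len(target_lrs)
    # Different-speaker pairs are penalized when the LR is large.
    c_non = sum(math.log2(1 + lr) for lr in nontarget_lrs) / len(nontarget_lrs)
    return 0.5 * (c_tar + c_non)


print(cllr([1.0], [1.0]))       # uninformative system
print(cllr([9.0], [1 / 9]))     # evidence points the right way
print(cllr([0.1], [10.0]))      # misleading evidence
```

Cllr rewards not only correct decisions but well-calibrated strength of evidence: a confident LR in the wrong direction is penalized far more than a cautious one, which relates to the question of when a recording carries enough speaker information to support a strong conclusion.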

IMM - Multilingual Multimedia Integration

This project is being carried out under the aegis of the SystemX Technological Research Institute (IRT). It falls within the field of multimedia information processing and aims to develop a platform capable of synthesizing information from video and text data. In this context, LNE conducts evaluations according to three distinct protocols (evaluation on a priori and a posteriori corpora, and usability testing), allowing developers from the project's seven participating companies to identify the strengths and areas for improvement of the platform's systems. These evaluations also enable end users to assess the project's progress.

SVA - Simulation for self-driving vehicle safety

Within the framework of France's industrial renewal plan (Ecological mobility sector), the SVA project addresses systems that take over all or part of the driving task. The main objectives are to characterize, model and simulate self-driving vehicle safety. To run these simulations, LNE is developing characterization methods for on-board sensors and evaluating the driving decision-making algorithms that rely on the information these sensors collect.

Sample publications

