LNE develops standards and carries out evaluations of intelligent systems in order to provide its customers with reliable benchmarks and results, allowing them to qualify their systems and make pragmatic, reasoned decisions.
LNE performs functional testing and performance measurement of AI systems, enabling developers to optimise the development process until a viable product is obtained.
To determine the performance level of a technology, it is necessary to develop metrics. In addition to those relating to the overall performance of the system, specific metrics associated with its various components help identify areas for improvement. These metrics can be used to assess the relevance of the technological choices made to advance the effectiveness of the solution, in particular when progress is assessed against the investments made, so as to estimate their impact.
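As a minimal illustration of the distinction between system-level and component-level metrics, the sketch below computes precision, recall and F1 score both for an end-to-end system and for one of its components. The counts are hypothetical and the binary-detection framing is an assumption for illustration, not a description of any specific LNE evaluation protocol.

```python
# Hypothetical illustration: the same metric computed at system level and
# at component level can show where an intelligent system needs improvement.

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true-positive, false-positive
    and false-negative counts (guarding against empty denominators)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical counts for the end-to-end system and for its detection component.
overall = precision_recall_f1(tp=80, fp=20, fn=10)
detector = precision_recall_f1(tp=95, fp=30, fn=5)

print(overall)   # system-level metrics
print(detector)  # component-level metrics: a high recall but lower precision
                 # here would point at the detector's false alarms
```

Comparing the two triples indicates whether the weak link is the component itself or its integration into the full processing chain.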
Qualification of intelligent systems is therefore imperative for development and certification purposes. It makes it possible to:
Indeed, the evaluation allows developers to identify the characteristics that differentiate their technology from those of their competitors. A developer who has performed well in an evaluation campaign can not only guarantee customers that their systems comply with a set of quality requirements, but also demonstrate, for marketing purposes, that the system has stood out from the competition through its effectiveness.
LNE provides its customers with reliable benchmarks and results to pragmatically choose the AI solution to be adopted by their companies among existing technologies.
The evaluation problem is largely new and has a metrological specificity: the capability of intelligent systems must be measured mainly at the functional level and lies above all in their adaptability, which is specific to the notion of intelligence. It is therefore not only a question of quantifying functions and performance, but also of validating and characterising operating environments (areas of use).
Given the wide variety of environments to which the system must be submitted, the customer does not have the means to perform all the tests required to meet their needs. Nor can they rely solely on the developer, who will be tempted to narrow the field of evaluation to the cases that show the product in its best light. Clients wishing to rely on a third-party arbitrator may find it advantageous to turn to LNE, which has several distinctive advantages: as a public agency, independent of any particular interest, its opinions are impartial, and the intellectual property of the elements entrusted to it (processes and data to be tested) is protected. This neutrality is reinforced by its strict specialisation in the evaluation profession.
LNE provides objective quantitative criteria to assist its customers in making an informed choice of artificial intelligence technology from the existing offers. It thus brings its expertise to:
After acquiring the technological solution, LNE supports its clients in:
LNE, through rigorous measurement of technological progress, enables funding bodies to estimate the impact of investments made.
LNE, as a trusted third party evaluator, assists public bodies with project management by:
The evaluation campaigns organised by LNE are multiannual projects that provide a common framework in which teams developing competing approaches can be compared. These campaigns are an essential means of organisation and motivation, maintaining exchanges between the various participants, generating an important ripple effect and making it possible to overcome scientific or technological barriers, improve performance and support the rise in TRL (Technology Readiness Level) of the systems concerned.
Data is the key to AI evaluation and development. LNE is experienced in building large, high-quality, structured and labeled datasets. These can be based on customer data or provided by LNE's partners, business experts in the various fields covered by its evaluations. LNE ensures that their confidentiality and ownership are respected.
LNE organises evaluations of AI systems that use different types of data:
Depending on the customer's needs, LNE can perform physical tests in real but controlled environments, virtual tests in fully simulated environments and mixed tests combining real and simulated stimulations.
The tests in real environments are carried out in anechoic and reverberant rooms, climatic chambers (temperature, humidity, pressure), and under salt-spray or solar-radiation exposure, in order to analyse the influence of environmental conditions on the performance of intelligent systems. LNE can also perform vibration, shock and constant-acceleration tests to evaluate the behaviour of systems under extreme conditions and precisely determine their operating limits.
For the evaluation of autonomous systems moving in open and changing environments, given the almost infinite number of configurations with which the system could be confronted, LNE participates in the development of virtual test environments allowing system validation by simulation. This virtualization of the characterization of intelligent systems eliminates the prohibitive costs that would be generated by conducting all tests in real environments.
In order to develop its evaluation resources and maintain its own skills, LNE also carries out well-targeted research projects, alone or in the framework of public and private partnerships, and ensures the transfer of its results where appropriate. LNE research topics generally focus on:
LNE also participates in the major transversal challenges of AI by developing references to explain, guarantee and certify intelligent systems and to enable the development of standards and regulations. In particular, LNE participates in the AFNOR commission on artificial intelligence, the AFNOR strategic information and digital communication committee and UNM section 81 on industrial robotics.
These standards enable manufacturers to know exactly what the regulatory expectations are before an intelligent system is put on the market. They reassure consumers about the product, particularly through an ethical and responsible approach to artificial intelligence.
The OPEROSE project is part of the Ecophyto II plan supported by the Ministry of Agriculture and Food and the Ministry for the Ecological and Solidarity Transition. Conducted by LNE and Irstea, it aims to organise the evaluation campaigns of the Challenge ROSE, measuring the performance and technological maturity of robotised solutions for the automatic weeding of crops. The whole treatment chain is taken into account during the evaluation, from the detection of weeds and plants of interest to the effective weeding action. Testing environments consist of both databases and actual agricultural parcels.
This European project aims to improve the evaluation methodologies of companion robots. In addition to its research on the reliability of decision support systems, the Laboratory conducted environmental tests (temperature, humidity, vibration, etc.) to measure the impact of the environment on robot performance.
The project concerns the comparison of voices in the forensic field, in relation to national security or forensic issues. The objective of the project is to develop an accreditation methodology and establish objective measurement standards that will facilitate voice comparison processing in police services and enhance the admissibility of evidence in court.
The ALLIES project, funded by the ANR, aims to develop metrics and protocols for the evaluation of translation and diarization systems capable of self-learning and self-evaluation. Some of the metrics developed in this project will allow these intelligent systems to measure their own performance improvement. Other metrics will compare the different existing algorithms and determine the most promising approaches. The project also aims to develop a European platform dedicated to the development and evaluation of these systems, in a replicable research approach. The evaluation data and protocols will thus be made public.
The SVA project, within the framework of the New Industrial France programme (ecological mobility sector), aims principally to develop tools for qualifying the performance and safety of autonomous vehicles through virtual testing. In particular, LNE is working on methods for characterising on-board sensors and for evaluating driving decision-making algorithms on the basis of the information collected.
This project led by IRT SystemX concerned the field of multimedia information processing and made it possible to develop a platform capable of synthesizing information from video and text data. In this context, LNE organised evaluations according to three different protocols (corpus a priori, corpus a posteriori and usage test) which enabled the developers of the seven companies in the project to highlight the strengths and areas for improvement of the platform's systems.
This project, funded by the ANR, concerned the field of voice comparison: expressing the likelihood that two voice recordings were made by the same person. At the end of the project, LNE organised an international evaluation campaign of systems performing this non-trivial task.
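In forensic comparison, such a statement is usually expressed as a likelihood ratio: how much more probable the observed evidence is under the same-speaker hypothesis than under the different-speaker hypothesis. The sketch below is a deliberately simplified, hypothetical illustration of a score-based likelihood ratio, assuming comparison scores follow Gaussian distributions under each hypothesis; it does not describe the project's or LNE's actual methodology.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def likelihood_ratio(score, same_mean, same_std, diff_mean, diff_std):
    """Score-based likelihood ratio: probability density of the observed
    comparison score under the same-speaker hypothesis, divided by its
    density under the different-speaker hypothesis."""
    return (gaussian_pdf(score, same_mean, same_std)
            / gaussian_pdf(score, diff_mean, diff_std))

# Hypothetical calibration: same-speaker scores ~ N(2, 1),
# different-speaker scores ~ N(-2, 1).
lr = likelihood_ratio(1.5, 2.0, 1.0, -2.0, 1.0)
print(lr)  # LR > 1 supports the same-speaker hypothesis, LR < 1 the opposite
```

An LR of 100, for instance, means the score is one hundred times more probable if the two recordings come from the same speaker, which is the kind of quantified statement courts can weigh as evidence.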
The VERA project, funded by the ANR, aimed to develop a methodology and generic tools to enable the precise localization and diagnosis of errors in automatic speech recognition (ASR) systems in order to improve their performance.
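The usual starting point for diagnosing ASR errors is the word error rate, obtained by aligning the system's hypothesis with a reference transcript. The sketch below is a generic Levenshtein-based WER computation for illustration, not the specific tooling developed in the VERA project.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: minimum number of substitutions, insertions and
    deletions needed to turn the hypothesis into the reference, divided
    by the number of reference words (Levenshtein distance over words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat",
                      "the cat sit on mat"))  # 2 errors / 6 words ≈ 0.33
```

Tracing back through the alignment table identifies which words were substituted, inserted or deleted, which is precisely the kind of error localization the project aimed to make systematic.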
The DEFI-REPERE was an evaluation campaign in the field of recognition of persons in audiovisual programmes, financed by the DGA and the ANR. The recognition systems evaluated used multimedia information such as:
The performance of these systems was evaluated by LNE through evaluation campaigns. The test corpora used by LNE for the evaluation contain audiovisual programmes in French, including news, debates and entertainment programmes.
Quaero was a federative research and industrial innovation programme on automatic analysis, classification and use technologies for multimedia and multilingual documents, funded by Bpifrance (formerly Oséo). It brought together 32 French and German partners who worked together to develop automatic systems for processing information contained in multimedia documents (spoken language, images, video and music). In this context, LNE organised evaluation campaigns for: