Abstract

PARTAGES (Advanced Development of Digital Commons for Generative Artificial Intelligence in Healthcare) is a project coordinated by the Health Data Hub. A winner of the “Digital Commons for Generative AI” call for projects under the France 2030 plan, it aims to accelerate and democratize the use of large language models (LLMs) for healthcare professionals.

Its goal is to build national momentum that fosters the emergence of open generative AI solutions in healthcare and their adoption across the academic, research, and industrial healthcare ecosystem.

A major national project

With a budget of €9.4 million, PARTAGES is supported by a unique consortium of 32 partners mobilized on a national scale:

  • 10 research teams (CNRS, Inria, universities),
  • 20 public and private healthcare institutions (AP-HP, Institut Curie, Centre Léon Bérard, Ramsay Santé, ELSAN, 12 university hospitals...),
  • DeepTech companies specializing in AI, including Mistral AI and reciTAL.

A four-step approach

1. Developing medical LLMs

Building an open-source corpus of French medical text data to train, evaluate, and distribute multiple open-source medical language models.

PARCOMED - PARTAGES Corpus of Open MEdical Documents available here

PARCOMED research only - PARTAGES Corpus of Open MEdical Documents (for research purposes only) available here

2. Creating an open database of fictional medical reports

Creation and open-data release of a unique corpus of over 5,000 fictional medical reports, including 1,450 annotated reports, covering 20 specialties. The effort involved more than 100 residents and young physicians, and the corpus will be used in particular to train specialized models.

PARHAF - an open French corpus of human-authored clinical reports of fictional patients available here

3. Developing models for targeted use cases

Using these resources, PARTAGES is developing seven specialized AI models designed for high-impact use cases in research, innovation, and the healthcare system.

4. Establishing a national federated evaluation platform

Development of a sovereign federated evaluation platform, enabling the evaluation of algorithms on real-world data within a secure regulatory framework. It will be deployed in 20 healthcare facilities but may be used by any facility wishing to access it.

Practical use cases for healthcare

PARTAGES addresses eight priority use cases focused on the analysis, structuring, and generation of medical reports:

  • Data augmentation through the generation of fictional reports
  • Automatic pseudonymization of medical reports
  • Automated medical coding for hospital medical information departments (DIM) based on medical reports
  • Automatic summarization of medical reports
  • Generation of clinical cases for medical training
  • Identification of tumor biomarkers in oncology
  • Analysis of treatment response in oncology
  • Automatic detection in infectious diseases, particularly to combat antibiotic resistance

An organization structured into 8 work packages

  • WP1 – Coordination and dissemination: The first work package is dedicated to the overall coordination of the PARTAGES project and to the dissemination and promotion of its results. It ensures sound project governance, consistency across the work packages, and adherence to the timeline, scientific objectives, and regulatory requirements.

  • WP2 – Data production and quality control: The second work package establishes a rigorous methodology for producing and using project data (primarily fictional medical records) and ensures quality control of all raw datasets used to train the project's models. It also supports data quality control at the healthcare-facility level for model evaluation.

  • WP3 – Evaluation: The third work package develops and implements a common evaluation methodology for the foundation models and all use cases, and analyzes the evaluation results.

  • WP4 – Foundation models: The fourth work package develops all the foundation models used by the use cases, including fine-tuning the general-purpose generative LLM for the French-language medical domain and developing BERT-style encoder models (Bidirectional Encoder Representations from Transformers).

  • WP5 – Technical infrastructure: The fifth work package establishes the necessary technical infrastructure, including the creation, adaptation, and documentation of the federated validation platform, in which each partner healthcare facility serves as a node.

  • WP6 – Recruitment and annotation: The sixth work package manages the recruitment and supervision of healthcare professionals (senior residents and junior physicians) who create the corpus of 5,000 fictional patient records and carry out annotation tasks.

  • WP7 – Legal framework: The seventh work package oversees the project's legal matters, including the implementation of the contractual framework and the monitoring of work involving the local use of healthcare facility reports.

  • WP8 – Use-case models: The eighth and final work package covers the development of models for the specific use cases identified above.

PARTAGES stakeholders

Academic and research institutions

  • LISN
  • GREYC
  • LIG
  • LIMICS
  • LIA
  • LS2N
  • LIS
  • UNESS
  • Bordeaux Population Health
  • CNRS
  • Inria

Hospitals and healthcare institutions

  • AP-HP
  • Centre Léon Bérard
  • CHU de Rouen
  • GCS HOURAA
  • Institut Gustave Roussy
  • Institut Curie
  • Hôpitaux Saint-Joseph & Marie-Lannelongue
  • Hôpital Foch
  • CHU de Toulouse
  • CHU de Bordeaux
  • CHU Amiens-Picardie
  • CHU Brest
  • CHU de Lille
  • CHRU de Nancy
  • CHU de La Réunion

Industry and private partners

  • ELSAN
  • Ramsay Santé
  • reciTAL
  • Mistral AI

Public coordination

  • Health Data Hub