Authors: Arun Sujenthiran, Lucia Groizard, Melissa Estevez, Cornelius Thaiss, Natalia Viani, Kathi Seidl-Rathkopf
Contributors: Maria Alvarellos, Adam Manhi, Emma Salib, Amanda White
Artificial intelligence (AI) is transforming the way researchers use electronic health records (EHRs) to generate real-world data (RWD) and real-world evidence (RWE) for clinical studies and scientific discovery. Large language models (LLMs) now make it possible to extract meaningful clinical information from unstructured EHR text at a scale and speed far beyond traditional manual abstraction. This capability has the potential to accelerate clinical research, support regulatory decisions, and improve patient care.
However, the data within EHRs are complex, inconsistently documented, and often ambiguous. Furthermore, LLMs can behave unpredictably, be sensitive to input variations, and reinforce biases present in source data. Without proper oversight, these challenges could undermine data quality, reduce the reliability of downstream analyses, and erode public and patient trust, especially when working with sensitive health data. While these risks are well recognised, the costs of limited adoption are often overlooked: vast amounts of longitudinal patient data would otherwise remain inaccessible because manual curation does not scale. Responsible AI deployment therefore represents not only a technical advance but an ethical imperative to maximise patient benefit, underscoring the need for high-quality, transparent evaluation frameworks.
To address this gap, Flatiron Health has developed the Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) framework. VALID provides a structured, multi-dimensional approach to assessing the accuracy, reliability, and fitness-for-purpose of LLM-extracted clinical information. In the UK, Flatiron Health applies these principles within a robust governance approach built on standards that reflect a commitment to responsible innovation. By embedding structured, high-quality frameworks such as VALID in health research, the UK can set a global benchmark for trustworthy use of LLMs in healthcare to improve patient care and outcomes.
This white paper outlines why high-quality, well-governed health data are essential for safely developing and deploying LLMs in health research. It provides practical recommendations for ensuring that LLMs are adopted safely, responsibly, and to their full potential within the UK health data ecosystem.