Francesca Ieva is an Associate Professor of Statistics at the MOX – Modeling and Scientific Computing lab of the Department of Mathematics, and co-director of the Health Data Science Center at Human Technopole.
Break the ice by telling us a bit about yourself, and about your educational and professional path.
After graduating in Mathematical Engineering in 2008, I obtained my PhD in Mathematical Models and Methods for Engineering in 2012. During my postdoc, I spent a period abroad at the MRC Biostatistics Unit in Cambridge. I then spent three years as a researcher in the Department of Mathematics at the University of Milan (Statale), before returning “home” to the Politecnico in 2016.
I love sports (especially volleyball), music… and my job! I like the idea of doing something useful for society. That is why I have always worked in statistical learning in the biomedical field: developing statistical and Machine Learning methods for the analysis of complex medical data, with the goal of supporting clinicians and healthcare governance in decisions that affect our health.
What kinds of clinical data does someone in your field work with?
The data we work with include all information related to an individual, both before and after they become a patient. These range from omics data (genomics, transcriptomics, proteomics, metabolomics), to imaging (PET/CT, MRI and fMRI) and the quantitative features that can be extracted from it (known as radiomics), to clinical data contained in Electronic Health Records, which record prescriptions, procedures undergone by the patient, administered drugs, and so on. There is also health-related information from the biosensors and apps that people now use daily to monitor their habits. In short, all the structured and unstructured information that describes a person’s health status and care pathway.
Once all these types of data have been collected, why is it useful to analyze them?
It is useful because data analysis lets us learn things about the phenomenon under observation that a doctor cannot easily deduce from directly examining a patient. Images, genomic fingerprints, and patterns of disease progression contain informative elements that clinical practice, focused on a single patient, cannot clearly capture. Data analysis also gives clinicians a kind of augmented view of reality: a broader understanding of the clinical and biological phenomenon, within which each individual case can be placed as part of a wider biological variability.
Being able to do this starting from a rich and diverse source of information that integrates different modalities is undoubtedly a unique opportunity, because it is only from the combination of clinical and molecular traits that true personalized care can emerge.
What tools are used to perform data analysis?
We are talking about statistical tools and mathematical or stochastic models that first allow us to describe the phenomenon and its variability, then to capture and summarize its key features, and finally to perform inference. These tools come from statistics, Machine Learning, mathematical modeling, biomedical engineering, and computer science, and increasingly require significant computational capabilities, knowledge of different programming languages, and interdisciplinary skills across multiple domains.
Analyzing clinical data is certainly useful and important, but what specific challenges does the use of healthcare data present?
Unfortunately, many challenges remain today, and they are often more regulatory than technological in nature. First, there are practical difficulties related to the still relatively low level of digitalization of healthcare systems in our country, and to the complexity and extreme heterogeneity of the data that need to be integrated for the same individual. There are issues related to data collection methods, software systems, and temporality (a patient may return multiple times and needs to be followed longitudinally, possibly across different institutions…). Another challenge is the lack of studies specifically designed to collect “complete” patient data. Omics data are collected only when a specific disease calls for them, so the cohorts we work on often represent a particular subset tied to a specific clinical question. And even when properly constructed datasets are available, methods capable of handling and modeling multimodal, complex, large-scale data, and of training models on them, are not yet widespread.
The most significant challenges, however, are regulatory. The fragmentation of responsibilities in the healthcare system does not help. To date, no shared, uniform solution has been found across the entire country for protecting privacy while also enabling research. Decisions vary considerably from one ethics committee to another, and there is a lack of clear guidelines on informed consent and its use. Finally, citizens and communities still have very limited awareness of the importance of these issues.
Despite these difficulties, what are the prospects and opportunities offered by data analysis today, both in general and in healthcare?
I believe that today data are a language of the reality we live in. Just as we would never think of engaging with the world without knowing how to speak a language, which we learn from early childhood, I believe it is no longer possible to think about the world and our place in it without proper literacy in the language of data. Since we ourselves are data and producers of data, data analysis inevitably becomes a powerful tool for an enhanced understanding of reality. This is even more true in healthcare, where digitalization and the discovery of complex patterns not easily detectable by the human mind can make a difference in prevention, diagnosis, care, and treatment: that is, across the entire healthcare process.
I therefore believe it is important to continue raising awareness about how proper data handling, appropriate protection, and the ability to use data for research purposes are essential for a better healthcare system, both in the services it provides and in its ability to ensure a better quality of life.