Advancing health through artificial intelligence/machine learning: The critical importance of multidisciplinary collaboration (2024)

The application of artificial intelligence/machine learning (AI/ML) to study and improve health is generating tremendous interest throughout the biomedical research community. Using computational methods, researchers can analyze highly complex datasets to predict outcomes with reliability not possible prior to the availability of modern modeling approaches and newly available data. AI/ML methods are increasingly being applied across a wide range of research areas, contributing to studies of fundamental molecular biology, understanding the effects of environmental exposures, developing new diagnostic methods, facilitating drug discovery, monitoring patient response to treatment, and optimizing clinical care delivery. The prevailing sense is one of being on the cusp of major progress, as these methods allow researchers to draw high-probability conclusions from complex circumstances that have so far been poorly understood.

While the potential for AI/ML applications to improve human health research is vast, there are key challenges that must be overcome in order to optimize the outcome for the patient. Biomedical research must address the needs of all people, human judgment must be included in selecting and developing the right computational models, and data from clinical settings, including clinical trials and electronic health records, must be standardized and include well-defined clinical outcome variables so that research proceeds with maximal rigor. With each of these challenges, I present opportunities and examples of promise and potential.

Aligning biomedical science and precision medicine

The ultimate goal of biomedical research is to create a healthier, more resilient society. To achieve this, research must encompass many influences on health. Fundamental science is the bedrock of this research, but discoveries that arise from the laboratory must also be applied in ways that account for a wide variety of factors in the real world. For example, carefully controlled experiments can show how a disease state is produced by aberrant expression of a certain gene, a therapeutic agent can be developed to target the consequences of this defect, and diagnostic tools can select patients who are likely to benefit from treatment. This general approach, often referred to as precision medicine, has produced dramatic improvements for some patients, but it does not go far enough. We must also understand how patients themselves define treatment success, accounting for a range of outcomes that encompass the ways that people differ in what matters most for them. In addition, because disease expression and treatment response are heterogeneous, we must determine why new therapies do not work as expected by identifying factors that interfere with accurate diagnosis and effective treatment. These factors may arise from fundamental biology, such as the failure of targeted therapy due to redundancy in the signaling pathway it addresses. Treatment failure may also be the result of unrecognized factors related to clinical use in individual patients, such as when a drug's effects are diminished by suboptimal dosing, ingestion with food, or interaction with concomitant medication. Socioeconomic factors also play an important role. Patients cannot benefit from a therapy if they do not have access to it or if their personal and social circumstances make it difficult to receive an accurate diagnosis in the first place.

Monica M. Bertagnolli

Biomedical research involves a cycle of scientific observation, hypothesis development, experimental design and execution, data analysis, and interpretation, upon which the cycle begins again. Increasingly, research to improve human health produces highly complex data in impressively large amounts, exceeding the analytic capabilities of conventional statistical methods and the computational power of most computers. The use of AI/ML introduces a new type of experimental process, where experts in computational biology collaborate with colleagues in molecular biology, drug development, translational and clinical research, and even patients themselves, at every step in the process of biomedical research. These teams design and test algorithms that distill highly complex datasets into predictions expressed as probabilities. A thorough description of available methods and alternatives can be found in a recent review by Hunter and Holmes (1).

Machine learning methods and challenges in clinical applications

In general, the various types of machine learning involve ways to look at data and determine rules for predicting outcomes “y” from relevant factors “x.” For the supervised learning methods that are most often used in medical applications, these relationships are developed into decision algorithms by using training datasets that link outcomes of interest to a myriad of potentially relevant factors. Over time, the decision algorithm can be refined, and its accuracy improved further, as new data become available for training.
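As a minimal sketch of the supervised learning idea described above, the following fits a logistic regression by gradient descent so that factors "x" map to a probability of outcome "y." The training values, feature meanings (a scaled biomarker level and scaled age), learning rate, and number of passes are all hypothetical choices made for illustration; they do not come from any study.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit weights w and bias b so sigmoid(w.x + b) approximates P(y=1 | x)."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the average gradient over the training set
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of the outcome for one set of factors x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical toy training set: x = (scaled biomarker, scaled age), y = 1 if the outcome occurred
train_x = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.4), (0.7, 0.8), (0.8, 0.6), (0.9, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(train_x, train_y)
low_risk = predict_proba(w, b, (0.15, 0.2))
high_risk = predict_proba(w, b, (0.85, 0.8))
print(f"low-risk example: {low_risk:.2f}, high-risk example: {high_risk:.2f}")
```

Refining the algorithm as new data become available corresponds, in this sketch, to appending rows to the training set and calling `train_logistic` again.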

The likelihood that a model will produce accurate results for any given use depends upon many factors, but the most important ones are the applicability of the specific type of machine-learning method chosen and the amount, quality, completeness, and relevance of data used to train the algorithms. Neural net-based methods, such as “deep learning,” make single-step, black-box predictions without giving the user any understanding of the logic underlying the prediction. This lack of interpretability makes it more difficult for these methods to gain acceptance for use in clinical care without first confirming the reliability of the algorithm in conventional randomized clinical trials. Traditional ML methods establish associations between inputs and outputs. Causal learning methods attempt to go beyond correlation and investigate causal relationships directly. These methods are particularly useful and often applied in circumstances where randomized controlled trials are not possible for ethical or practical reasons. The concept of “explainable AI” addresses the need for clinicians to understand the features of an analysis when it is used to recommend clinical care. ML approaches to achieve this include decision trees and probabilistic graphical models, which are easier to comprehend and whose decisions and insights can be more readily applied to clinical scenarios.
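To illustrate why a decision tree is easier to comprehend than a black-box model, here is a sketch of the simplest possible tree, a single-split "stump," whose learned rule can be read aloud. The feature names (`systolic_bp`, `age`), the six patient rows, and the labels are hypothetical and chosen only to make the example small.

```python
def fit_stump(rows, labels, feature_names):
    """Return the single threshold rule that misclassifies the fewest training rows."""
    best = None
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for positive_if_above in (True, False):
                errors = sum(
                    ((row[j] > threshold) == positive_if_above) != bool(y)
                    for row, y in zip(rows, labels)
                )
                if best is None or errors < best["errors"]:
                    best = {"feature": name, "index": j, "threshold": threshold,
                            "positive_if_above": positive_if_above, "errors": errors}
    return best

def explain(rule):
    """Render the learned rule as a sentence a clinician can inspect."""
    op = ">" if rule["positive_if_above"] else "<="
    return f"predict positive if {rule['feature']} {op} {rule['threshold']}"

def predict(rule, row):
    """Apply the rule to one patient row, returning 1 (positive) or 0 (negative)."""
    return int((row[rule["index"]] > rule["threshold"]) == rule["positive_if_above"])

# Hypothetical training data: (systolic_bp, age) with a made-up binary outcome
features = ("systolic_bp", "age")
rows = [(118, 64), (122, 41), (125, 70), (150, 45), (160, 58), (145, 66)]
labels = [0, 0, 0, 1, 1, 1]

rule = fit_stump(rows, labels, features)
print(explain(rule))
```

A full decision tree repeats this split search within each resulting subgroup. The point of the sketch is only that the output is a rule that can be checked against clinical knowledge, unlike a set of network weights.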

From the lab to the clinic

Selecting and designing an optimal machine-learning method requires a clear understanding of both the research questions to be addressed and the intended use of the results. These two objectives cannot be accomplished without input from all disciplines involved. An essential requirement is that the multidisciplinary team includes researchers able to generate and provide access to the datasets required for algorithm training and validation. In other words, human judgment and expertise are required to select the right model and to train it with the right data.

Over the past two decades, scientists have developed methods that can be applied at a population scale to characterize an individual's genetic makeup, including how genes are expressed in different tissues and even individual cells, as well as how expression can change over time. What we have not yet done is to broadly translate this information into better health at a societal level. Researchers have long understood that this goal is achieved by linking these laboratory-based methods to results in the clinic. One program designed to do this through a multidisciplinary approach, generating and analyzing highly complex data with AI/ML methods, is the National Institutes of Health Multi-Omics for Health and Disease Consortium (2). This new initiative, coordinated by the National Human Genome Research Institute, supports six disease study sites to enroll research participants with conditions such as fatty liver diseases, hepatocellular carcinoma, asthma, chronic kidney disease, and preeclampsia, among others. Biospecimens provided by participants will be used to generate comprehensive disease characterization data, including data that define genomics, epigenomics, transcriptomics, proteomics, and metabolomics for people with and without targeted medical conditions. The sites will also collect data on participants’ environments, medical histories, and social determinants of health. An analysis and coordination center will incorporate all data into large, organized datasets that will be made available to the scientific community for further studies. Importantly, at least 75% of participants will be from ancestral backgrounds currently underrepresented in genomics research.

The primary goal of the National Institutes of Health Multi-Omics for Health and Disease Consortium is to generate scalable and generalizable multi-omics research strategies, and this includes developing AI/ML methods to analyze large and complex datasets that include variables arising from both the laboratory and the clinic. Given the tremendous volume and complexity of data generated by the consortium, AI/ML analyses will be utilized at multiple levels—to integrate across different omics profiles, differentiate diseased from healthy tissues, untangle genetic and nongenetic factors in disease development and progression, identify potential therapeutic targets, and predict clinical behavior. The high degree of representation from participants with diverse ancestral backgrounds currently underrepresented in genomics research will allow comparisons with previously established datasets and help to address important questions in health disparities research.

The need for better data from the clinical care environment

Without interdisciplinary collaborations that include those engaged in clinical care, the outcomes of research using AI/ML methods may be limited to identifying biological states that do not define outcomes that are meaningful to patients and clinicians. AI/ML methods produce the best results if they are powered by datasets that have careful clinical annotation, use standardized data elements to define clinical variables, and represent the full diversity of the American people and, ideally, the world. Unfortunately, it has been particularly difficult to produce sufficient high-quality data representing the diverse clinical care environment, an area for which accurate prediction models are clearly critical to health.

A wealth of well-characterized data defining carefully established clinical endpoints resides in clinical trial datasets, and a growing number of initiatives support researcher access to these resources (3). However, because of restrictive clinical trial eligibility criteria and other enrollment barriers, these data collections suffer from a lack of diversity, particularly with respect to age, racial or ethnic group, and other social determinants of health. The near-universal adoption of electronic health records (EHRs) introduces a potentially more comprehensive and inclusive source of clinical care data. However, it has so far been difficult to produce high-quality datasets from EHRs due to great variability in structures, collection methods, and completeness of the data that they deliver. This issue is being addressed by a US Department of Health and Human Services program to introduce standard, computable data formats to achieve nationwide interoperability of EHRs (4). Directed by the Office of the National Coordinator for Health Information Technology, this Trusted Exchange Framework and Common Agreement (TEFCA) will establish commonly agreed-to expectations and rules for users of EHRs in different health care networks to securely share basic clinical information.
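The variability in EHR structures described above can be made concrete with a small sketch of harmonizing records from two sites into one shared, computable format. Everything here is hypothetical: the two site layouts, the field names, and the `StandardRecord` schema are invented for illustration and are not drawn from TEFCA or from any real EHR standard.

```python
from dataclasses import dataclass
from typing import Optional

LB_TO_KG = 0.45359237  # exact definition of the international pound in kilograms

@dataclass
class StandardRecord:
    """Hypothetical common schema that every site's data is mapped into."""
    patient_id: str
    weight_kg: Optional[float]
    sex: Optional[str]

def from_site_a(raw: dict) -> StandardRecord:
    """Hypothetical site A stores weight in pounds and sex as 'M' or 'F'."""
    weight = raw.get("wt_lb")
    return StandardRecord(
        patient_id=str(raw["id"]),
        weight_kg=round(weight * LB_TO_KG, 1) if weight is not None else None,
        sex={"M": "male", "F": "female"}.get(raw.get("sex")),
    )

def from_site_b(raw: dict) -> StandardRecord:
    """Hypothetical site B stores weight in kilograms and sex as a full word."""
    sex = raw.get("gender")
    return StandardRecord(
        patient_id=str(raw["patient"]),
        weight_kg=raw.get("weight_kg"),
        sex=sex.lower() if sex else None,
    )

records = [
    from_site_a({"id": 17, "wt_lb": 154.0, "sex": "F"}),
    from_site_b({"patient": "B-203", "weight_kg": 81.5, "gender": "Male"}),
    from_site_a({"id": 18, "sex": "M"}),  # weight missing: kept as None, not guessed
]
for record in records:
    print(record)
```

Missing values are carried through as `None` rather than filled in, so that incompleteness stays visible to whoever trains a model on the combined dataset.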

An ongoing challenge relevant to the use of EHR-derived data to power AI/ML is the need to obtain important outcome variables that are not collected during routine clinical care in ways that permit standardization and extraction from the medical record. Examples of these include treatment responses for patients with solid tumors, symptom complexes that predict the progression of neurological diseases, or warning signs for patients with mental health disorders. Important opportunities are provided when sites routinely collect patient-reported outcomes using standardized questionnaires. Other variables, themselves arising from an analysis of complex datasets, may provide objective functional assessments, such as patient activity levels or sleep cycles tracked using wearable technology, or may link particular social determinants of health with interventions that reduce their negative effect on health outcomes. Clearly, we are only beginning to understand what is possible when we uncover insights from highly complex data to conduct research that addresses clinical outcomes.

A multidisciplinary perspective is essential

AI/ML can learn a relationship between biological and clinical variables as inputs and health outcomes and recommended courses of care as outputs. However, the learned relationship between these inputs and outputs may include spurious biases that are the result of ignoring economic, clinical, and social factors that are relevant to the outcomes of interest. Unrecognized confounders (e.g. race, socioeconomic status, and health care utilization) can produce results that unintentionally discriminate against some patient groups (5). AI/ML researchers and data analysts therefore need to work with clinicians, experts in health care delivery, and health equity researchers to identify and include as inputs these and other relevant confounding factors. In addition, because not all confounders, particularly high-dimensional confounders, can be foreseen, cross-validation and human judgment continue to be essential for the effective use of AI/ML-driven analysis results to guide clinical care.
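A small numerical sketch shows how an ignored confounder can reverse an apparent relationship. The counts below are invented for illustration, with disease severity standing in for any unrecognized confounder. Pooled across all patients, the treated group appears to do worse, yet within each severity stratum the treated group does better, because treatment was given mostly to sicker patients.

```python
# Invented counts: (stratum, treated?) -> (patients with good outcome, total patients)
counts = {
    ("low_severity", True): (90, 100),
    ("low_severity", False): (320, 400),
    ("high_severity", True): (200, 400),
    ("high_severity", False): (30, 100),
}
strata = ("low_severity", "high_severity")

def pooled_rate(treated):
    """Good-outcome rate ignoring the confounder (all strata lumped together)."""
    good = sum(g for (_, t), (g, _) in counts.items() if t == treated)
    total = sum(n for (_, t), (_, n) in counts.items() if t == treated)
    return good / total

def stratum_rate(stratum, treated):
    """Good-outcome rate within one level of the confounder."""
    good, total = counts[(stratum, treated)]
    return good / total

# Difference in good-outcome rate, treated minus untreated
pooled_effect = pooled_rate(True) - pooled_rate(False)
stratified_effects = {
    s: stratum_rate(s, True) - stratum_rate(s, False) for s in strata
}

print(f"pooled effect: {pooled_effect:+.2f}")
for s, effect in stratified_effects.items():
    print(f"{s} effect: {effect:+.2f}")
```

A model trained without the severity variable would learn the pooled, misleading association. This is one reason the relevant factors need to be identified by people who know the clinical setting and then supplied as inputs.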

Conclusion

The more we examine human biology and behavior, the more complexity we uncover. By allowing researchers to address this complexity, AI/ML approaches are moving biomedical research forward as never before. The excitement is warranted, as is the heightened responsibility: to focus on meeting the needs of the patient and society, to produce and properly interpret the results required for success, and to work together closely so that we can begin to unravel the complex interrelationships that define health at individual and societal levels.

Funding

The author declares no funding.

Data Availability

No new data were generated or analyzed in support of this research.

References

1. Hunter, Holmes. 2023.

Author notes

Present address: National Institutes of Health, Bethesda, Maryland 20892, USA.

Competing Interest: The author declares no competing interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
