Achieving Data Diversity Through AI in Drug Development

October 18, 2023

AI and machine learning could transform drug discovery, but first, practitioners must overcome ethical challenges en route to medicines for all.

Pictured: AI, digital technology concept/iStock, Phongsak Sangkhamanee

The coming decades could see an astounding transformation in drug discovery due to AI. But experts say it’s critical to deal now with potential ethical and social justice landmines that may be encountered on the way to this new era.

“The big pivot with AI-based drug development is the ability to dramatically scale up our ability to find potential targets,” said computer scientist Suchi Saria, director of the Machine Learning and Healthcare Lab at Johns Hopkins University. But this brave new world will be “fraught with ethical issues,” she told BioSpace. “It’s so expensive to design drugs and take them to market, so our trials tend to be homogenous. We pick simple, narrow patient populations. And the system needs to be more open, to reflect real-world populations.”

Speaking with BioSpace, experts raised questions, including: how can data truly be rendered anonymous to protect patient privacy? How rigorous is the current consent system? And how do scientists and clinicians develop robust patient sets that don’t exclude members of underrepresented groups or the socially disadvantaged?

First, the good news: AI has the potential to help researchers fine-tune who might respond well to a given medication, increasing fairness and equity. This will make it more likely that “the person likely to respond to a drug will actually be covered for it,” said Kim Branson, global head of Artificial Intelligence and Machine Learning at GSK.

Currently, there are often many options to treat a condition, and insurance companies may not want to foot the bill for the most expensive medication without proof it is the best one for the patient, Branson told BioSpace. Knowing which patients will respond to a medication will benefit the socioeconomically disadvantaged—49% of whom reported that their insurance company refused to cover at least one drug in a given year, according to a 2020 report by NPR. It will also be critical for the pharmaceutical companies themselves. “In some cases, we simply can’t make enough of a drug for everybody anyway,” Branson said.

He cited a recent Phase II trial of GSK’s bepirovirsen for hepatitis B where about 10% of patients responded so well they experienced a functional cure (markers of the virus were below the lower limit of detection). The study relied on millions of data points, including clinical data, lab tests and genotyping the virus, and the company plans to keep sifting data through machine learning algorithms to better understand who might benefit. “Strong evidence has shown that drug targets with genetic validation and biomarkers of response are more likely to succeed, and AI has a key role in complex biomarker discovery,” Branson said.

Another benefit could be slashing the staggering cost of drug discovery, thus bringing more medications to market more easily, at a lower cost. Right now, the cost of developing and bringing a new drug to market is about $2.3 billion. Each successful drug has to carry on its back the cost of many failures. All of society bears that cost, and AI may help ease the burden.

“Improving these odds could potentially open the door for more affordable drug pricing,” said Daphne Koller, a computer scientist, a former professor at Stanford and CEO and founder of Insitro, a machine learning–enabled drug discovery firm that is working with companies like Gilead and Bristol Myers Squibb.

Protecting Privacy: A Critical Concern

Legal protections like the Health Insurance Portability and Accountability Act (HIPAA) in the U.S. and General Data Protection Regulation (GDPR) in Europe prevent outside access to sensitive patient data. Even so, healthcare data breaches are a huge concern, and according to The HIPAA Journal, have been trending upward for the last fourteen years.

A 2021 study in BMC Medical Ethics noted that individuals can be identified in data repositories of both public and private institutions, even if data has been scrubbed and anonymized. In one 2018 study, an algorithm was able to re-identify over 85% of 4,720 adults and nearly 70% of 2,427 children in a cohort, even after removal of protected health information. And in England in 2015, Google DeepMind and the Royal Free Hospital in London signed a deal to copy 1.6 million medical records to be fed to a DeepMind AI. Ultimately, the deal was outed, and Google backed off.
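One way to see why scrubbed records can still be traced to individuals is to count how many records are unique on a handful of remaining attributes. The sketch below is an illustration only, not the method used in the studies cited above: the function name `unique_fraction`, the field names and the four sample records are all hypothetical and made up for this example.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose combination of
    quasi-identifier values appears exactly once in the dataset."""
    if not records:
        return 0.0
    # Build one key per record from the chosen attributes.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    # A record is singled out if nobody else shares its key.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical records with names and direct identifiers already removed.
records = [
    {"birth_year": 1980, "sex": "F", "postcode": "AB1"},
    {"birth_year": 1980, "sex": "F", "postcode": "AB1"},
    {"birth_year": 1975, "sex": "M", "postcode": "AB1"},
    {"birth_year": 1990, "sex": "F", "postcode": "CD2"},
]

share_all = unique_fraction(records, ["birth_year", "sex", "postcode"])
share_sex = unique_fraction(records, ["sex"])
```

In this toy data, adding attributes to the key raises the share of records that stand alone, which is the basic reason removing names does not by itself guarantee anonymity.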

Data sharing and patient privacy do not have to be incompatible, writes computer engineer Mihaela van der Schaar, who runs a lab at the University of Cambridge. Her lab is developing synthetic patient records that mimic real-world records and can be used for machine learning. The novel framework, called ‘Anonymization through Data Synthesis using Generative Adversarial Networks,’ produces artificial data but conserves the properties of the real datasets. To remove any inherent bias in the actual data—including under-representation of historically marginalized groups—that would be transferred to synthetic data, the lab is developing DECAF, a synthetic data generator that removes biased edges in the artificial data.

Even when patients consent to share their data, they may not understand all the implications. “Patient consent forms should be provided to all patients before any type of screening,” Koller told BioSpace. “These forms should clearly outline how the patients’ data might be used and offer them the option to opt out of data sharing, should that be their preference.” Koller said it’s important for healthcare providers to ensure that “patients fully understand what they are agreeing to when they sign these consent forms.”

How to Be Fair and Represent All

FDA research found that about 10% of drugs approved between 2014 and 2019 showed differences in exposure and/or response across racial and ethnic groups, said Abdoul Jalil Djiberou Mahamadou, a Stanford University and GSK postdoctoral fellow in AI and biomedical ethics. Clinical trials for drugs approved during this period over-represented white participants and vastly under-represented Black, Asian and other ethnic groups. “If you have biased data, you will have a biased outcome,” he told BioSpace.

One of the most trusted and popular data pools comes from the UK Biobank, for which half a million residents of that country have volunteered—but, Mahamadou said, “that is a high-income country.” And 94% of the bank’s data is from whites, 2.3% from Asians and 1.5% from Black or Black British people. “If [you] are using AI to develop new drugs, you need representative data, and you also need it from low- and middle-income countries,” he said.
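A simple way to quantify this kind of skew is to divide each group's share of a dataset by its share of the population the drug is meant to serve. The sketch below uses the UK Biobank percentages quoted above; the `reference_share` figures are placeholders chosen purely for illustration, not real census or global population data, and the function name is hypothetical.

```python
def representation_ratios(dataset_share, reference_share):
    """Divide each group's share of the dataset by its share of a
    reference population. A ratio below 1 means the group is
    under-represented relative to that reference."""
    return {
        group: dataset_share[group] / reference_share[group]
        for group in dataset_share
        if reference_share.get(group, 0) > 0
    }


# Percentages quoted in the article for the UK Biobank.
dataset_share = {"White": 94.0, "Asian": 2.3, "Black or Black British": 1.5}

# PLACEHOLDER reference shares, assumed for illustration only.
reference_share = {"White": 80.0, "Asian": 10.0, "Black or Black British": 5.0}

ratios = representation_ratios(dataset_share, reference_share)
under_represented = sorted(g for g, r in ratios.items() if r < 1.0)
```

The result depends entirely on the reference population chosen, which is why the choice of comparison group matters as much as the arithmetic.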

One solution is to collect new data, “but that can take years and be expensive,” Mahamadou said. “You also need to understand that data gathered is a reflection of the culture and society.” As recently as 2021 in Sudan, for instance, according to the United Nations Conference on Trade and Development, there was no legislation about privacy and ethics. “So, you might only want to collect data in countries that have strict permissions in place,” Mahamadou continued. Or, he said, ethicists could provide principles for ethical data collection in countries with no legislation. Above all, Mahamadou said, we cannot rely only on data scientists; we must bring in clinicians and ethicists who understand the country and culture where data is being gathered.

Databanks are deeply aware of these issues and are trying to address them. In the U.K., a project called Our Future Health is recruiting participants as broadly as possible, Koller said. “They’ve set up recruitment centers in urban pharmacies. And the All of Us project in the U.S. has made similar efforts to diversify the patient population.” She said that certain data are already diverse because collecting them is required as part of the standard of care in nearly every part of the world, for example histopathology from patients with solid tumors. And at GSK, Branson said, “We try to buy data globally. We want to make medicine for all of humanity.”

Jill Neimark is a freelance science writer based in Macon, Georgia. Reach her at jillneimark.com.
