Herophilus Publishes General Method for Detecting Relevant Signals in Machine Learning Analysis of Complex Biological Datasets

The method, published in Cell Patterns, is applicable to any dataset with hierarchical structure, making it broadly useful for machine learning analysis of large-scale, real-world datasets.

SAN FRANCISCO--(BUSINESS WIRE)-- Herophilus, a leading biotechnology company developing neurotherapeutics to cure complex brain diseases, today announced the publication of research that describes a new statistical method to identify and analyze the effects of potentially confounding variables on machine learning models for complex biological datasets.

The ability of machine learning (ML) to extract scientific insights from high-dimensional datasets is often limited by confounding variables that bias the models. Determining the influence of confounders is particularly challenging for complex bioscience datasets, which tend to be organized in nested hierarchies that preclude the use of traditional methods, such as linear regression, to correct for the effects of nuisance variables. Though tools exist to mitigate known confounders, scientists lack a general method to identify which variables in a set of potential confounders require debiasing.

In “Hierarchical confounder discovery in the experiment–machine learning cycle,” published in Cell Patterns, the authors define a new nonparametric statistical method for scoring the effect of a potential confounder, called the “Rank-to-Group” (RTG) score. RTG scoring is robust to outlier noise and can identify the source of a confounding effect even in non-linear structures. The method is applicable both to raw data and to the results of ML models.
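To make the idea of a rank-based, nonparametric confounder score concrete, here is a minimal Python sketch. It is an illustrative stand-in and not the RTG definition from the paper, which this release does not spell out. The function name `rank_group_score`, the choice of Euclidean distance, and the pairwise-comparison statistic are all assumptions made for this example. For each sample, it asks how often a sample from the same group (for instance, the same batch or donor) is nearer than a sample from a different group.

```python
import math


def rank_group_score(points, groups):
    """Illustrative rank-based grouping score (NOT the published RTG definition).

    For each sample, compare its distances to same-group samples against its
    distances to other-group samples, and record the fraction of such pairs in
    which the same-group sample is nearer (ties count as half). The result is
    the mean over samples: values near 0.5 suggest the variable does not
    organize the data, and 1.0 means every sample sits closest to its own group.
    """
    per_sample = []
    for i, p in enumerate(points):
        same, other = [], []
        for j, q in enumerate(points):
            if j == i:
                continue
            d = math.dist(p, q)  # Euclidean distance (an assumed choice)
            (same if groups[j] == groups[i] else other).append(d)
        if not same or not other:
            continue  # the comparison is undefined for this sample
        wins = sum(
            1.0 if s < o else 0.5 if s == o else 0.0
            for s in same
            for o in other
        )
        per_sample.append(wins / (len(same) * len(other)))
    if not per_sample:
        raise ValueError("need at least two groups, one with two or more samples")
    return sum(per_sample) / len(per_sample)


# Hypothetical toy data: two tight clusters of two samples each.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
aligned = rank_group_score(points, ["a", "a", "b", "b"])  # labels follow the clusters
crossed = rank_group_score(points, ["a", "b", "a", "b"])  # labels cut across the clusters
```

Because the score depends only on the ordering of distances, a single extreme outlier cannot dominate it, which is the general property the release attributes to rank-based scoring. The same function could be applied to raw measurements or to the embeddings produced by a model, once per candidate confounder.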

“RTG scoring is a broadly useful tool to analyze high-dimensional datasets with complex, potentially nested, sources of bias – which standard methods for bias identification can’t address. This approach enables a virtuous cycle of experimental design, data collection, and model building for the reduction of bias in data and thus strengthens the use of machine learning in discovery science,” said Sean Escola, M.D., Ph.D., co-founder of Herophilus.

“Herophilus is focused on the discovery and development of curative therapeutics for brain disease, but we maintain a serious commitment to advancing the tools of foundational scientific inquiry for the benefit of all,” said Saul Kato, Ph.D., co-founder and CEO of Herophilus. “The next wave of ML research is moving beyond strict model performance into considerations of reliability, interpretability, and bias. RTG scoring has become part of our everyday use of ML for doing interpretable science, and we felt it merited sharing with the community.”

About Herophilus

Herophilus is a San Francisco-based neurotherapeutics company focused on curing complex brain diseases. The company’s scaled discovery platform combines brain organoid science, systems neuroscience approaches, robotic automation, and advanced machine learning techniques to generate multi-modal “deep phenotypes,” which are used to identify novel therapeutic targets and novel treatments for disorders including neurodevelopmental, psychiatric, and neurodegenerative diseases. To learn more, visit www.herophilus.com.

Contacts

Thermal for Herophilus
Joanne Lin
press@herophilus.com

Source: Herophilus
