Knowing that you want to integrate AI is one thing—but how do you actually do it? For biotech companies kickstarting their generative AI journey, here are four things to consider.
Pictured: A developer builds an AI system/Nicole Bean for BioSpace
2023 was the year AI, particularly generative AI, entered the mainstream. ChatGPT attracted over 180 million users, highlighting a dramatic increase in consumer acceptance of this emerging technology. Now, business leaders are ready to embrace generative AI and integrate it into their processes.
One industry ripe for technological disruption is the biotech sector, where AI, specifically machine learning (ML), offers solutions to longstanding challenges in protein design that will enable researchers to bring bio-based products to market more quickly and at lower cost. Take biopharma, where companies invest an average of $280-$380 million on R&D to develop one drug, with preclinical costs accounting for an estimated 43% of total capitalized costs. Any tool that can accelerate this process and make it more efficient will have a substantial positive impact on the business.
But knowing that you want to integrate AI is one thing; how do you actually do it? As a co-founder of biotech startup Cradle, I help customers identify protein engineering projects that are compatible with Cradle’s generative ML approach. This enables them to reach target product profiles in less time and with fewer experiments, dramatically reducing the time and cost needed to bring new products to market.
For biotech companies kickstarting their generative AI journey, here are four things to consider:
Whether to Build or Buy
When looking to adopt any new technology, companies face a dilemma: whether to build out a team and technical capability in-house or outsource this part of the business to third-party experts.
Building your own ML technology allows you to leverage a wide range of publicly available generative models and develop custom models to predict experimental outcomes specific to your business. However, establishing and maintaining such infrastructure requires considerable time from ML engineers, who are currently highly compensated and sought after.
More importantly, while public-domain ML models can readily be used for a first round of predictions, this so-called “zero-shot” approach has limited impact compared to the value you gain from further training a model on project-specific datasets. This is a non-trivial effort, as the scarcity of relevant public databases leaves you to benchmark models on internal data alone. Moreover, you need to be mindful of how predictions and generated protein sequences reach end users in your organization. How will computational and experimental teams interact so that communication bottlenecks do not form and requests from the lab do not go unserviced?
Alternatively, buying ML solutions allows you to get started quickly and access up-to-date models. This route also bypasses the need to invest in hiring ML engineers and building infrastructure from scratch, which is particularly beneficial for early-stage startups. Recent examples include Recursion’s partnership with Roche/Genentech and Exscientia’s partnership with Bristol Myers Squibb.
Of course, not all commercially available ML services are created equal. It’s important to understand how core models have been developed and benchmarked, and whether the solution supports federated learning from smaller project-specific datasets. Moreover, giving an external party access to your data requires rigorous data security measures and clarity over IP ownership.
Beware of Imposter Syndrome When it Comes to Your Data
It’s a common misconception that harnessing generative AI requires tens of thousands of data points in a fully automated wet lab process with a flawless laboratory information management system (LIMS). Fewer than 10% of organizations have access to this type of fully automated high-throughput experimentation; much more common are laboratories that produce smaller datasets, often subject to round-to-round variation.
The great news is that with the advancement of large ML models that have been trained on billions of protein sequences, ML access has been democratized, benefiting labs with lower throughput. At Cradle, we’ve found that even with relatively low throughput, such as 96-well plate assays, ML models can learn enough to substantially enhance the protein design process over multiple rounds.
While waiting for the “perfect” dataset may be tempting, progress can still be made without it as long as you follow consistent protocols to control variance and implement internal controls across rounds. Think of it like this: the ML model of your experiment is the car that you take for test drives on your experimental road. Don’t postpone your drive until you can race F1 cars on the Monza circuit. Rather, start by figuring out which car performs best in supporting your weekly errands (and don’t cut corners).
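One common way to apply internal controls across rounds is to express each variant's reading relative to a control measured in the same round. The following is a minimal sketch under stated assumptions: the data layout, the `normalize_to_control` helper and the `wild_type` control name are all hypothetical and chosen for illustration, not a description of Cradle's pipeline.

```python
from statistics import mean


def normalize_to_control(measurements, control_name="wild_type"):
    """Express each variant's mean reading relative to the same-round control.

    measurements: {round_id: {sample_name: [replicate readings]}}
    Assumes every round includes replicates of the control (hypothetical layout).
    """
    normalized = {}
    for round_id, plate in measurements.items():
        control_mean = mean(plate[control_name])
        normalized[round_id] = {
            name: mean(readings) / control_mean
            for name, readings in plate.items()
            if name != control_name
        }
    return normalized


# Round 2 raw signal is twice as high overall (e.g. a new reagent batch),
# but relative to the control the variant performs the same in both rounds.
measurements = {
    1: {"wild_type": [100.0, 100.0], "variant_a": [150.0, 150.0]},
    2: {"wild_type": [200.0, 200.0], "variant_a": [300.0, 300.0]},
}
normalized = normalize_to_control(measurements)
print(normalized)
```

Normalizing in this way keeps round-to-round drift in the raw signal from being mistaken for a change in variant performance.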
Manage Expectations of Your Model
Instead of assuming that a model will function perfectly on the first try, understand that the most significant improvement often stems from proper configuration, including fine-tuning with your experimental data. Many forms of AI and ML exist, each designed to perform a specific function, so it’s crucial to match the appropriate model with the task you intend to accomplish.
Remember, ML is a tool, and managing your team’s expectations is essential. By setting realistic expectations around the importance of calibrating a model with your project-specific data, you help your team remain focused and adaptable rather than discouraged by initial setbacks.
Define What Success Looks Like
Lastly, if you’re considering implementing AI in your R&D process, have a clear definition of success, which will allow you to measure progress effectively and adjust course if necessary. Think: What are you looking to achieve? What are the key cost drivers? Track research efficiency through metrics such as hit rate and hit magnitude to determine how many data points and experimental rounds are needed, and monitor mutation novelty to bolster your IP position and avoid project stagnation.
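As a rough illustration of tracking these metrics per round, the sketch below assumes two working definitions that your team may well define differently: hit rate is the fraction of tested variants that beat a baseline, and hit magnitude is the best variant's activity divided by that baseline. The `Variant` class, the `round_metrics` function and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    round_id: int
    activity: float  # measured activity, in the same units as the baseline


def round_metrics(variants, baseline):
    """Return {round_id: (hit_rate, hit_magnitude)} under the assumed definitions."""
    by_round = {}
    for v in variants:
        by_round.setdefault(v.round_id, []).append(v.activity)

    metrics = {}
    for round_id, activities in sorted(by_round.items()):
        hits = [a for a in activities if a > baseline]
        hit_rate = len(hits) / len(activities)        # fraction of variants beating baseline
        hit_magnitude = max(activities) / baseline    # fold improvement of the best variant
        metrics[round_id] = (hit_rate, hit_magnitude)
    return metrics


# Illustrative numbers only, not real experimental data.
variants = [
    Variant(1, 0.8), Variant(1, 1.2), Variant(1, 0.9), Variant(1, 1.5),
    Variant(2, 1.6), Variant(2, 2.0), Variant(2, 0.7), Variant(2, 1.8),
]
metrics = round_metrics(variants, baseline=1.0)
for round_id, (hit_rate, hit_magnitude) in metrics.items():
    print(f"round {round_id}: hit rate {hit_rate:.0%}, best variant {hit_magnitude:.1f}x baseline")
```

Plotting these two numbers round over round gives stakeholders a simple view of whether each additional experiment is still paying off.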
While exceptional results are the goal, it’s important to acknowledge that Rome wasn’t built in a day, and deriving value from any ML or AI project demands patience and effort. Ensure that you outline initial steps and create a reporting system that engages senior stakeholders in the AI journey, highlighting its long-term value.
Integrating AI into your biotech company is a worthy investment, but you must commit to doing it in a way that makes sense for your company.
Elise de Reus is a co-founder of Cradle, a generative AI platform that helps scientists design and engineer proteins. She works closely with R&D teams throughout the biotech industry to onboard new projects onto Cradle’s platform. Elise previously engineered microbes in high throughput at Zymergen and Perfect Day. She holds a Ph.D. in fungal synthetic biology.