The breakthrough of CRISPR technology in the past two decades has allowed biologists to refine the manipulation of DNA, to slice and dice it in order to create organisms tailored to particular purposes. That free-wheeling editing of genes, though, produces a new problem: how to organize all the complexity of the different edited pieces of DNA.
That’s especially important for the multi-hundred-billion-dollar portion of the drug market called biologics, basically engineered proteins that can achieve a particular purpose. If you’re going to engineer new proteins through CRISPR, you need to do it in a systematic way, which is a fairly demanding combinatorial problem.
Hence, some smart young biotechs are turning to deep learning, a form of artificial intelligence that loves combinatorial problems.
Biotech firm Absci, which went public last year, was founded a decade ago by CEO Sean McClain, who came up with a novel way to engineer E. coli cells as factories for producing custom proteins that a drug maker would want, such as monoclonal antibodies that can fight viruses. You could say McClain is the Elon Musk of protein manufacturing.
Greater manufacturing capability engendered a new problem: What to make, exactly.
Shortly before going public, Absci bought another startup, Denovium, a three-year-old firm pioneering deep learning to analyze all the many combinations of proteins that McClain’s cells can churn out.

“We’ve built a very large library of these genetic parts, and we can snap them together combinatorially,” explained Absci chief technologist Matthew Weinstock in a meeting with ZDNet via Zoom. “And which sequence of DNA is best to produce this protein is the problem of codon optimization, and it’s a very big challenge.”
“If we have a million to a billion different cell lines, we need a screening capability that allows us to go through them to fish out the needles from the haystack, to find [which] genetic designs are the right ones.”
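To get a feel for why codon optimization is combinatorial, consider that most amino acids can be written in DNA by more than one three-letter codon, so the number of DNA sequences that encode the same protein multiplies with every residue. The sketch below is only an illustration of that counting argument using the standard genetic code; it is not Absci's method, and the function name `count_encodings` and the example peptide are hypothetical.

```python
from math import prod

# Number of synonymous codons for each amino acid (one-letter code)
# in the standard genetic code. The 61 sense codons are split among
# the 20 amino acids; the 3 stop codons are not counted here.
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Count the distinct DNA sequences that encode `protein`.

    Hypothetical helper for illustration: it multiplies the number of
    synonymous codons available at each position.
    """
    return prod(CODON_DEGENERACY[aa] for aa in protein)


if __name__ == "__main__":
    # Made-up three-residue peptide: Met (1 codon), Lys (2), Leu (6).
    print(count_encodings("MKL"))  # 1 * 2 * 6 = 12
```

Counting the candidates is the easy part. Deciding which one a given cell will actually express best is the hard part Weinstock describes, and simple arithmetic like this says nothing about it.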
Not only is the manufacture of proteins a combinatorial challenge, but so is the determination of which protein will work as a biologic for a given disease, the fundamental question of drug discovery.
“We can randomize the protein sequence itself and ask what protein sequence is the best for binding to this particular target,” said Weinstock.
Weinstock, who has a PhD in biochemistry from the University of Utah, had previously run the development of next-gen therapeutics at startup Synthetic Genomics, Inc. There, he met up with Gregory J. Hannum, a PhD in bioengineering from UC San Diego. Hannum would go on to found Denovium in order to build deep learning tools.
Following the acquisition a year ago, Hannum became co-lead of AI research at Absci, along with his Denovium co-founder, Ariel Schwartz.
“Biology is one of the most complex problems that the planet has,” said Hannum in the same interview with ZDNet.
“It’s essentially a self-bootstrapped system, billions of years in the making that, if we could just understand what all the different letters are, and what their combinations were, we’d have tremendous power to engineer new drugs and help humanity in new ways.”
The field of biology has built “beautiful databases” by wet-lab observation, notes Hannum, such as the UniProt database or Universal Protein Resource, which is maintained by a consortium of research centers around the world, and which is funded by a gaggle of government offices, including the U.S.’s National Institutes of Health and National Science Foundation.
Despite those beautiful databases, and despite basic analysis with techniques such as Hidden Markov Models, a third of all proteins remain a mystery in terms of their function.
To try to resolve the mystery, Denovium built one giant model to tackle all proteins at once.
“Rather than have hundreds of thousands of small models, we built one deep learning model that can go straight f