AI Designed Two New Antibiotics From Scratch. Here's Why That Changes Everything

Antimicrobial resistance kills.

The numbers are staggering enough to numb: antimicrobial resistance is associated with nearly 5 million deaths annually, with at least 1.27 million of those deaths directly attributable to drug-resistant infections1. In the United States alone, more than 2.8 million antimicrobial-resistant infections occur each year, causing over 35,000 deaths2. These are not isolated cases in developing regions. These are patients in American hospitals dying from infections we could treat a generation ago.

What makes this crisis unique is not just its scope but its invisibility to market forces. Between 1980 and 2003, the 15 largest pharmaceutical companies developed only five new antibacterial agents3. The entire industry moved on. The 'new' antibiotics approved by the FDA are mostly variants of existing drugs to which bacteria have already developed resistance mechanisms. For the past 45 years, the discovery of truly novel antibiotic classes has decelerated to a near halt4.

Yet in August 2025, researchers from MIT published work in Cell that changes the conversation5. The paper does not claim to have 'solved' antibiotic resistance. But it does demonstrate a first: a generative model created truly novel chemical structures specifically targeting resistant bacteria. Not finding them in existing libraries. Not tweaking known compounds. Designing them from scratch, using patterns learned from chemical data.

This is the story of how AI became a drug designer, not just a drug discoverer. And why that distinction matters for the future of medicine.

[Figure: Antibiotic Discovery Timeline 1980-2025]

The Crisis That Created the Opportunity

Drug-resistant bacterial infections represent a global public health crisis that accelerated during the COVID-19 pandemic. The pandemic reversed progress we had made. Antimicrobial-resistant infections and deaths increased by at least 15% from 2019 to 2020 across 29 countries6. Hospital-acquired infections surged as overwhelmed healthcare systems increased antibiotic usage to prevent secondary infections in COVID patients. In intensive care units, broad-spectrum antibiotic use increased by 38%, creating perfect conditions for resistance evolution6.

The speed of bacterial adaptation is breathtaking. Escherichia coli can develop resistance to ciprofloxacin in as little as 24 hours of exposure7. By contrast, bringing a new antibiotic to market takes 10-15 years and costs approximately $1.5 billion8. We're fighting evolution with economics, and evolution is winning.

This failure is not due to lack of scientific capability. Researchers understand resistance mechanisms at the molecular level. They can predict evolutionary pathways bacteria will take. Modern genomics can sequence a resistant strain in hours, identifying exactly which genes have mutated. What's missing is economic incentive. Antibiotics that cure infections in days sell for hundreds of dollars. Drugs that manage chronic conditions for years sell for tens of thousands9. A new antibiotic might generate $50 million in annual revenue. A new diabetes drug might generate $5 billion10.

The mathematics of pharmaceutical economics are brutal. Of the 18 largest pharmaceutical companies in 2013, only 5 maintained active antibiotic research programs11. By 2018, only three remained12. The result: "pharmaceutical and biotechnology companies largely abandoned antibiotics research and development (R&D) since 1990"13.

This abandonment created a discovery void precisely when we needed innovation most. Traditional screening methods had exhausted the low-hanging fruit. The Waksman platform (screening soil bacteria) that gave us streptomycin in 1943 and dozens of other antibiotics through the 1960s had reached its limits14. Natural products screening, which discovered 70% of our current antibiotics, was delivering diminishing returns. The year 1987 brought the last truly novel class of antibiotics, the lipopeptides15.

This necessity created opportunity. The 2025 Cell paper represents a different model: one enabled by foundation funding, nonprofit translational vehicles, and artificial intelligence powerful enough to explore chemical space faster than human medicinal chemists ever could16. Where traditional discovery might screen thousands of compounds over years, AI can evaluate millions in days. Where human chemists are limited by their training and intuition, AI can explore molecular structures no human would conceive.

The Two Targets: Understanding the Enemy

When the CDC classifies a pathogen as an 'urgent' or 'serious' threat, that classification carries weight17. Two such pathogens became the proving ground for a new approach to antibiotic discovery: Neisseria gonorrhoeae, the bacterium that causes gonorrhea, and methicillin-resistant Staphylococcus aureus (MRSA).

Neisseria gonorrhoeae: The Imminent Threat

N. gonorrhoeae represents antimicrobial resistance in fast-forward. It is classified by the CDC as an 'urgent threat,' the highest risk category18, and as a 'high priority' pathogen by the WHO19. It is the second most commonly reported bacterial infection in the United States20. The CDC estimates 1.6 million new infections occur in the U.S. annually, with approximately 50% resistant to at least one antibiotic21. Globally, the WHO estimates that 82.4 million new cases occurred during 2020 alone22.

What makes N. gonorrhoeae particularly dangerous is its rapid evolution. The bacterium has developed resistance to every antibiotic ever deployed against it: sulfonamides, penicillins, tetracyclines, fluoroquinolones, macrolides, and now shows emerging resistance to cephalosporins, our last reliable treatment23. In less than a century, this organism evolved from easily treatable to nearly invincible.

MRSA: The Hospital Killer

If N. gonorrhoeae represents the speed of resistance evolution, MRSA represents its pervasiveness. S. aureus colonizes approximately one in three people harmlessly on their skin, and a small minority of those carriers harbor methicillin-resistant strains24. But when MRSA breaches natural barriers through wounds, surgery, or medical devices, it becomes lethal. MRSA causes more than 80,000 invasive infections in the U.S. annually and kills more Americans each year than HIV/AIDS25.

The defining feature of MRSA is the mecA gene carried on the SCCmec mobile genetic element26. This gene produces PBP2a, an altered penicillin-binding protein with extremely low affinity for beta-lactam antibiotics. While normal bacterial proteins bind to these antibiotics and halt cell wall synthesis, PBP2a ignores them entirely, allowing the bacterium to continue building its cell wall even in the presence of methicillin, oxacillin, and other beta-lactams27.

Two Distinctly Different Problems

The structural difference between these two organisms deepens the challenge. N. gonorrhoeae is Gram-negative, possessing a dual-membrane architecture with an outer lipopolysaccharide layer that acts as a barrier to many antibiotics. MRSA is Gram-positive, with a thick peptidoglycan cell wall but no outer membrane. A drug effective against one is often useless against the other.

| Characteristic | N. gonorrhoeae | MRSA |
| --- | --- | --- |
| Classification | Gram-negative | Gram-positive |
| Cell Wall Type | Dual membrane (outer LPS layer + inner membrane) | Single membrane with thick peptidoglycan wall |
| Primary Resistance Mechanisms | Target alteration (gyrA, parC, 23S rRNA mutations), MtrCDE efflux pump, enzymatic degradation | Altered target (PBP2a), biofilm formation, persister cells |
| Current Treatment Options | Dual therapy (ceftriaxone + azithromycin), emerging resistance to both | Vancomycin, linezolid, daptomycin (limited options) |
| Resistance Timeline | Developed resistance to every class deployed (80+ years of evolution) | Methicillin resistance emerged 1960s, now multi-drug resistant strains common |
| Annual US Infections | ~1.6 million | ~80,000 invasive infections |
| Mortality Rate | Low for uncomplicated cases, high for disseminated infections | ~15,000 deaths annually (US) |
| WHO/CDC Classification | Urgent threat (CDC), High priority (WHO) | Serious threat (CDC), High priority (WHO) |

Both pathogens deploy multi-layered defenses evolved over millennia. N. gonorrhoeae, a Gram-negative bacterium, uses target alteration (mutations in gyrA and parC genes confer fluoroquinolone resistance, while 23S rRNA mutations cause azithromycin resistance)28, efflux pumps (the MtrCDE pump actively ejects antibiotics from the cell)29, and enzymatic degradation. MRSA combines altered target sites (PBP2a), biofilm formation creating protective barriers, and persister cell formation where subpopulations enter dormancy.

[Figure: Bacterial Architecture Comparison]

The Evolution of the Innovation Engine

The Collins Lab at MIT had been preparing for this moment for years. Their journey began not with antibiotics but with synthetic biology. Collins, alongside colleagues Timothy Lu and others, had spent the early 2000s engineering genetic circuits, creating biological computers from DNA30. This systems-level thinking about biology would prove crucial when they turned their attention to the antibiotic crisis.

Their 2020 discovery of halicin marked a pivotal moment. Using a neural network trained on just 2,500 molecules, they screened over 100 million compounds from various chemical libraries31. The breakthrough wasn't just finding halicin; it was proving that AI could identify antibiotics with genuinely novel mechanisms. Halicin disrupts bacterial membrane potential through a mechanism so unusual that E. coli couldn't develop resistance even after 30 days of continuous exposure32.

The success attracted immediate attention and funding. The Audacious Project, TED's initiative for bold solutions to global challenges, committed initial support. When Google.org launched its Generative AI Accelerator in 2023, the Antibiotics-AI Project became one of its flagship initiatives33.

By 2023, the team had discovered abaucin, targeting the notorious Acinetobacter baumannii34. This pathogen, responsible for severe hospital-acquired infections, had defeated nearly every antibiotic in our arsenal. Abaucin worked through yet another novel mechanism: disrupting lipoprotein trafficking. But these discoveries, remarkable as they were, still relied on screening existing compounds.

"For seven decades, the way we discover antibiotics has remained largely unchanged," James J. Collins, the Termeer Professor of Medical Engineering and Science at MIT, told reporters. "We culture soil bacteria, test their secretions, modify what we find. But bacteria have been fighting each other for billions of years. The easy discoveries were made decades ago"35.

The limitation of screening became increasingly apparent. Chemical libraries, vast as they are, represent a tiny fraction of possible drug-like molecules. Chemists estimate there are 10^60 possible drug-like molecules in chemical space36. Even the largest screening libraries contain fewer than 10^9 compounds. It's like searching for life in the universe by examining a handful of sand from a single beach.

The Antibiotics-AI Project, formally launched in 2023 with multi-million dollar support from The Audacious Project and later Google.org's Generative AI Accelerator, set an ambitious goal: not just finding new antibiotics but creating them37. The project assembled an interdisciplinary team of computational biologists, medicinal chemists, microbiologists, and AI researchers. Their mandate: develop generative AI capable of designing antibiotics for seven deadly bacterial pathogens identified by the WHO as critical threats:

  1. Escherichia coli - causing urinary tract infections, bloodstream infections
  2. Klebsiella pneumoniae - causing pneumonia, bloodstream infections, meningitis
  3. Acinetobacter baumannii - causing pneumonia, wound infections in soldiers
  4. Neisseria gonorrhoeae - causing gonorrhea, potential for untreatable infections
  5. Pseudomonas aeruginosa - causing infections in burn patients, cystic fibrosis
  6. Staphylococcus aureus (MRSA) - causing skin infections, pneumonia, sepsis
  7. Mycobacterium tuberculosis - causing tuberculosis, increasingly drug-resistant

Each pathogen presented unique challenges. Gram-negative bacteria like E. coli and P. aeruginosa have dual membranes that exclude most antibiotics. M. tuberculosis grows slowly and hides inside human cells. MRSA forms biofilms that protect it from drug penetration. The team needed AI that could design around these specific defenses.

"We had transitioned from AI as a screener to AI as a creator," said Aarti Krishnan, research scientist and project lead38. The distinction is profound. Screening is like being a talent scout, identifying existing performers. Creating is like being a composer, writing entirely new music. Halicin was a proof-of-concept that AI could find drugs hiding in existing chemical libraries. But it represented a fundamental limitation: screening merely finds what already exists. It cannot create what has never existed.

The team spent months developing and training their generative models. They incorporated decades of medicinal chemistry knowledge, teaching the AI not just what works but why it works. They built in understanding of Lipinski's Rule of Five for drug-likeness, PAINS filters to avoid problematic substructures, and synthetic accessibility scores to ensure molecules could actually be made39.
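As a rough illustration of the first of those filters, Lipinski's Rule of Five can be written as four threshold checks on a molecule's physicochemical descriptors. The sketch below is a minimal version that assumes the descriptors have already been computed elsewhere; the `Descriptors` container, the function names, and the example values are hypothetical and not taken from the study. The "at most one violation" cutoff is a common convention, not necessarily the team's setting.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight in daltons
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int      # hydrogen-bond donors (N-H and O-H groups)
    h_bond_acceptors: int   # hydrogen-bond acceptors (N and O atoms)


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria the molecule breaks."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Common convention (an assumption here): tolerate at most one violation."""
    return lipinski_violations(d) <= max_violations


# Illustrative values only: a small drug-like molecule and a large, polar one.
small = Descriptors(mol_weight=331.3, logp=0.3, h_bond_donors=2, h_bond_acceptors=6)
bulky = Descriptors(mol_weight=1449.3, logp=-3.1, h_bond_donors=19, h_bond_acceptors=26)

print(passes_rule_of_five(small))
print(passes_rule_of_five(bulky))
```

PAINS filtering and synthetic accessibility scoring need substructure matching and are usually done with a cheminformatics toolkit, so they are left out of this sketch.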

The Dual-Pronged Methodology: Two Paths to Discovery

The researchers designed a two-part validation strategy: parallel 'campaigns' using fundamentally different approaches, to test whether the generative models' results were repeatable rather than a lucky one-off40.

The Fragment-Based Approach: Building on What Works

For N. gonorrhoeae, the team started with a known antimicrobial fragment called Fragment F1, identified through computational screening of over 45 million small chemical fragments using graph neural networks41. This fragment showed promising activity but needed optimization. Rather than treating this as a terminal finding, they used it as a seed.

Two algorithms, CReM (Chemically Reasonable Mutations) and F-VAE (Fragment-based Variational Autoencoder), grew Fragment F1 into approximately 7 million candidate molecules42. CReM acts as the methodical 'Medicinal Chemist,' carefully exploiting a known lead. F-VAE acts as the 'Creative Artist,' exploring the unknown. Together they generated structural diversity while preserving the antimicrobial fragment at the core of each molecule.

The Unconstrained Approach: Starting from Scratch

For S. aureus (MRSA), they took a radically different path. Using a standard CReM coupled with an unconstrained Variational Autoencoder (VAE), the team generated over 29 million compounds without structural constraints43. No starting fragment. No chemical bias. Pure mathematical exploration of what the AI models determined should work based on learned patterns from training data.

[Figure: Dual Pipeline Workflow]

This unconstrained campaign represented a true 'high-risk, high-reward' bet. Rather than human chemists guiding the search, the AI explored regions of chemical space that humans might never consider.

The Creator-Critic Funnel

The genius lay not just in generation but in filtration. These millions of 'ideas' were then fed into predictive deep learning classifiers that scored each compound on three essential criteria:

  1. Efficacy: Predicted antibacterial activity against target pathogen
  2. Safety: Low predicted cytotoxicity to human cells
  3. Novelty: Dissimilarity to existing antibiotics (to avoid cross-resistance)
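
The novelty criterion is commonly quantified with Tanimoto distance, the metric named in the comparison table later in this article. The sketch below shows the idea on toy data, assuming each molecule has already been reduced to a set of structural feature IDs (a fingerprint). The feature sets and function names are invented for illustration; real pipelines derive fingerprints from the molecular graph with a cheminformatics library.

```python
from typing import Iterable, Set


def tanimoto_similarity(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Shared features divided by all distinct features (Jaccard index)."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def tanimoto_distance(fp_a: Set[int], fp_b: Set[int]) -> float:
    """0.0 means identical fingerprints, 1.0 means nothing in common."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


def novelty_score(candidate: Set[int], known: Iterable[Set[int]]) -> float:
    """Distance to the nearest known antibiotic; higher means more novel."""
    return min(tanimoto_distance(candidate, k) for k in known)


# Toy fingerprints: integers stand in for structural features.
known_antibiotics = [{1, 2, 3, 4}, {3, 4, 5, 6}]
near_copy = {1, 2, 3, 5}       # shares most features with the first known drug
new_scaffold = {4, 7, 8, 9}    # shares only one feature with either

print(f"near copy:    {novelty_score(near_copy, known_antibiotics):.2f}")
print(f"new scaffold: {novelty_score(new_scaffold, known_antibiotics):.2f}")
```

Scoring against the nearest known compound, rather than the average, is a deliberate choice in this sketch: a candidate that closely resembles even one existing antibiotic is at risk of cross-resistance.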

[Figure: Compound Filtering Funnel]

The two campaigns together generated over 36 million theoretical compounds. From these, the team identified approximately 90 high-priority candidates for experimental testing. They successfully synthesized 24 and tested them in laboratory conditions. Of these, seven showed strong antibacterial activity. After extensive validation including animal models, two emerged as breakthrough compounds: NG1 for gonorrhea and DN1 for MRSA44.
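
To make the attrition in that funnel concrete, the short calculation below works out what fraction survives each stage, using only the counts quoted above. The stage labels are informal, and 36 million is treated as an exact figure even though the text says "over 36 million."

```python
# Counts as quoted in the text; stage names are informal labels.
stages = [
    ("generated", 36_000_000),
    ("prioritized for testing", 90),
    ("synthesized", 24),
    ("strongly active", 7),
    ("lead compounds", 2),
]


def stage_retention(stages):
    """Return (from_stage, to_stage, fraction retained) for each consecutive pair."""
    return [
        (prev_name, name, count / prev_count)
        for (prev_name, prev_count), (name, count) in zip(stages, stages[1:])
    ]


for src, dst, frac in stage_retention(stages):
    print(f"{src} -> {dst}: {frac:.4%}")

# Integer division keeps the overall ratio exact.
one_in = stages[0][1] // stages[-1][1]
print(f"overall: about 1 lead per {one_in:,} generated compounds")
```

The computational filters do almost all of the narrowing: the step from generated compounds to prioritized candidates removes several orders of magnitude, while each laboratory stage keeps roughly a quarter to a third of what enters it.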

This dual success proves the platform's versatility and the fundamental soundness of the generative approach. Both a fragment-guided search and an unconstrained exploration yielded novel, effective antibiotics.

The Generative Engine: How AI Creates Molecules

The term 'generative AI' describes a class of algorithms that can create new data similar to their training examples. But what does it mean to 'generate' a molecule that has never existed? The answer lies in understanding how these models explore the vast mathematical space of possible chemicals.

Chemical space, the theoretical domain containing all possible molecules, is incomprehensibly large. Chemists estimate it contains 10^60 drug-like molecules45. To put this in perspective, there are approximately 10^80 atoms in the observable universe. If you could examine one million molecules per second for the entire age of the universe, you would still have covered only a vanishingly small fraction of chemical space. This is why traditional drug discovery, which relies on screening existing compounds or making incremental modifications, has been so limited.
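
The scale argument is easy to check with back-of-the-envelope arithmetic. The calculation below uses the 10^60 estimate and the one-million-per-second rate from the text; the 13.8-billion-year age of the universe is a commonly cited figure supplied here as an assumption.

```python
CHEMICAL_SPACE = 10**60           # estimated drug-like molecules (from the text)
RATE_PER_SECOND = 10**6           # molecules examined per second (from the text)
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9    # commonly cited estimate (assumption)

# How many molecules could be examined in one universe lifetime?
examined = RATE_PER_SECOND * SECONDS_PER_YEAR * AGE_OF_UNIVERSE_YEARS

# What share of chemical space is that, and how long would the full search take?
fraction = examined / CHEMICAL_SPACE
years_for_everything = CHEMICAL_SPACE / (RATE_PER_SECOND * SECONDS_PER_YEAR)

print(f"molecules examined in one universe lifetime: {examined:.1e}")
print(f"fraction of chemical space covered:          {fraction:.1e}")
print(f"years needed to examine everything:          {years_for_everything:.1e}")
```

Under these assumptions, a universe-long search covers on the order of 10^-37 of the space, which is why exhaustive enumeration is not an option and some form of guided search is needed.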

CReM: The Medicinal Chemist

CReM (Chemically Reasonable Mutations) operates on a principle familiar to medicinal chemists: matched molecular pairs46. It is trained on a massive database of known chemical transformations, learning the 'moves' that chemists make when optimizing molecules. When given a starting structure, CReM 'evolves' it by applying thousands of learned chemical modifications. It mimics the logic of a human medicinal chemist who knows that replacing a methyl group with a chlorine atom might improve membrane permeability, or that adding a fluorine could enhance metabolic stability.

In the Collins study, CReM was used in both campaigns, generating chemically reasonable variations that maintained drug-like properties while exploring structural diversity.

VAE: The Creative Artist

A Variational Autoencoder (VAE) is a more abstract generative model47. It operates on a principle of compression and reconstruction, using two components that are trained together: an Encoder and a Decoder. Training here used over 1 million molecules from the ChEMBL database, a publicly available repository containing 2.4 million unique chemical structures carefully curated for bioactivity48.

The Encoder's job is to take a complex molecule (often represented as a text string called SMILES notation)49 and compress it down to a simple set of coordinates. These coordinates place the molecule onto a high-dimensional, continuous map called the latent space. Molecules with similar properties end up near each other on this map. A beta-lactam antibiotic might be at coordinates (2.3, 4.5, 1.2), while a similar cephalosporin might be at (2.4, 4.6, 1.3).

After training, the Encoder is no longer needed to generate molecules. A researcher can simply 'explore' the latent space. By picking a new set of coordinates, perhaps (2.35, 4.55, 1.25), halfway between our two antibiotics, and passing them to the Decoder, the researcher obtains the novel molecule that should logically fit in that uncharted chemical region. The 'uncharted regions of chemical space' Collins refers to50 correspond to molecular structures the VAE has learned to represent but which have never been synthesized.
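
The "halfway between" step is ordinary linear interpolation between two latent vectors. The sketch below reproduces the three-coordinate example from the text; the `interpolate` helper is a hypothetical name, and the decoder call is shown only as a comment because it would require a trained model. Real latent spaces typically have far more than three dimensions.

```python
from typing import Sequence, Tuple


def interpolate(z_a: Sequence[float], z_b: Sequence[float], t: float) -> Tuple[float, ...]:
    """Linear interpolation between latent points: t=0 gives z_a, t=1 gives z_b."""
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimensionality")
    return tuple((1 - t) * a + t * b for a, b in zip(z_a, z_b))


# Example coordinates from the text.
beta_lactam = (2.3, 4.5, 1.2)
cephalosporin = (2.4, 4.6, 1.3)

# The midpoint is the t=0.5 case; rounding hides floating-point noise.
midpoint = interpolate(beta_lactam, cephalosporin, 0.5)
print(tuple(round(x, 2) for x in midpoint))

# Sampling several points along the line gives a series of candidate locations.
path = [interpolate(beta_lactam, cephalosporin, i / 4) for i in range(5)]

# A trained decoder would then turn each point into a molecule, for example:
# smiles = decoder(point)   # hypothetical call; no decoder is defined here
```

Nothing guarantees that every decoded point is a valid or synthesizable molecule, which is one reason generated candidates still pass through the filtering stages described earlier.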

F-VAE: The Fragment Thinker

The Fragment-based VAE (F-VAE) combines both approaches. It's trained specifically on how molecular fragments combine and can take a starting fragment and elaborate it into complete molecules while maintaining the fragment's core properties. F-VAE was crucial for the N. gonorrhoeae campaign, where Fragment F1 needed to be expanded into drug-like molecules.

[Figure: Generative Models Comparison]

The Discoveries: NG1 and DN1

After generating and filtering 36 million theoretical compounds, synthesizing 24, and testing them extensively, two molecules emerged as genuine breakthroughs.

NG1: The Precision Strike Against Gonorrhea

NG1 emerged from the fragment-based design campaign. Using proteomics and target deconvolution performed by Dr. Amir Ata Saei's team at Karolinska Institutet, researchers identified NG1's mechanism: it specifically targets an essential protein called LptA51. LptA (lipopolysaccharide transport protein A) is crucial for building the outer membrane of Gram-negative bacteria. It's part of a multi-protein bridge that transports lipopolysaccharides (LPS), the major component of the outer membrane, from where they're made to where they're needed52.

By blocking LptA, NG1 doesn't try to force its way through the bacterium's defenses. Instead, it dismantles the defenses themselves. The bacterium cannot maintain its outer membrane integrity without functional LPS transport. Importantly, LptA is essential for bacterial survival but absent from human cells, making it an ideal drug target. This mechanism is entirely different from existing antibiotics, meaning bacteria have no pre-existing resistance pathways to exploit.

DN1: The Broad-Spectrum Hammer

DN1, discovered through the unconstrained de novo design approach for MRSA, works through a different mechanism. Unlike NG1's targeted approach, DN1 acts as a membrane-disrupting agent with broader effects53. It successfully cleared MRSA skin infections in mouse models, demonstrating not just in vitro activity but in vivo efficacy.

| Property | NG1 | DN1 |
| --- | --- | --- |
| Discovery Method | Fragment-based design (F-VAE + CReM from Fragment F1) | Unconstrained de novo design (VAE + CReM, no starting fragment) |
| Target Organism | Neisseria gonorrhoeae (Gram-negative) | MRSA (S. aureus, Gram-positive) |
| Mechanism of Action | Specific protein target (LptA inhibition, blocks LPS transport) | Membrane disruption (broader mechanism) |
| Structural Novelty | High (Tanimoto distance >0.5 from known antibiotics) | High (Tanimoto distance >0.5 from known antibiotics) |
| In Vivo Validation | Mouse model validation completed | Mouse model validation completed (cleared skin infections) |
| Resistance Potential | Low (novel mechanism, no pre-existing resistance pathways) | Low to moderate (membrane disruption, harder to evolve resistance) |
| Development Timeline | 90 days from generation to validated lead | 90 days from generation to validated lead |

The complementary nature of these discoveries, one from fragment-based design and one from unconstrained generation, one with a specific protein target and one with broader membrane effects, validates the platform's versatility.

Creator-Critic Filtering Pipeline Implementation

The following Python code illustrates the logic of that filtering cascade: how 36 million generated compounds can be passed through sequential filtering stages to surface top candidates like NG1 and DN1:

"""
Creator-Critic Filtering Pipeline for AI-Generated Antibiotics

This module demonstrates the filtering cascade used in the MIT Collins Lab's
antibiotic discovery pipeline, processing 36 million AI-generated compounds
through efficacy, safety, and novelty filters to identify lead candidates
like NG1 and DN1.

Based on: Krishnan, A., et al. "A generative deep learning approach to de novo
antibiotic design." Cell, August 2025.
"""

from dataclasses import dataclass
from typing import List

@dataclass
class Compound:
    """Represents a single AI-generated compound with predicted properties."""
    compound_id: str
    smiles: str  # Simplified Molecular Input Line Entry System notation
    efficacy_score: float  # Predicted antibacterial activity (0-1)
    cytotoxicity_score: float  # Predicted toxicity to human cells (0-1)
    novelty_score: float  # Tanimoto distance from known antibiotics (0-1)
    source_campaign: str  # "fragment-based" or "unconstrained"

class CreatorCriticPipeline:
    """
    Implements the Creator-Critic filtering cascade for antibiotic discovery.

    The pipeline applies three sequential filters:
    1. Efficacy: Minimum predicted antibacterial activity
    2. Safety: Maximum acceptable cytotoxicity
    3. Novelty: Minimum structural difference from known antibiotics

    Thresholds based on the actual study:
    - Efficacy threshold: >0.8 (high predicted activity)
    - Cytotoxicity threshold: <0.3 (low toxicity to human cells)
    - Novelty threshold: >0.5 (Tanimoto distance from known compounds)
    """

    # Filter thresholds from the Cell paper
    EFFICACY_THRESHOLD = 0.8
    CYTOTOXICITY_THRESHOLD = 0.3
    NOVELTY_THRESHOLD = 0.5

    def filter_by_efficacy(self, compounds: List[Compound]) -> List[Compound]:
        """
        Filter compounds by predicted antibacterial efficacy using graph
        neural networks trained on known antibacterial activity.
        """
        passed = [c for c in compounds if c.efficacy_score > self.EFFICACY_THRESHOLD]
        # Guard against an empty input list to avoid ZeroDivisionError
        retention_rate = (len(passed) / len(compounds)) * 100 if compounds else 0.0
        print(f"Efficacy filter: {len(passed):,} passed ({retention_rate:.2f}%)")
        return passed

    def filter_by_safety(self, compounds: List[Compound]) -> List[Compound]:
        """
        Filter compounds by predicted cytotoxicity to human cells using
        deep learning classifiers trained on human cell toxicity data.
        """
        passed = [c for c in compounds if c.cytotoxicity_score < self.CYTOTOXICITY_THRESHOLD]
        # Guard against an empty input list to avoid ZeroDivisionError
        retention_rate = (len(passed) / len(compounds)) * 100 if compounds else 0.0
        print(f"Safety filter: {len(passed):,} passed ({retention_rate:.2f}%)")
        return passed

    def filter_by_novelty(self, compounds: List[Compound]) -> List[Compound]:
        """
        Filter compounds by structural novelty using Tanimoto distance.
        Higher distance = more novel = lower risk of cross-resistance.
        """
        passed = [c for c in compounds if c.novelty_score > self.NOVELTY_THRESHOLD]
        # Guard against an empty input list to avoid ZeroDivisionError
        retention_rate = (len(passed) / len(compounds)) * 100 if compounds else 0.0
        print(f"Novelty filter: {len(passed):,} passed ({retention_rate:.2f}%)")
        return passed

    def rank_candidates(self, compounds: List[Compound], top_n: int = 10) -> List[Compound]:
        """
        Rank final candidates by composite score.

        Composite score = (0.5 * efficacy) + (0.3 * (1 - cytotoxicity)) + (0.2 * novelty)

        This weighting reflects the study's emphasis on efficacy and safety over novelty.
        """
        scored_compounds = []
        for c in compounds:
            composite_score = (
                0.5 * c.efficacy_score +
                0.3 * (1.0 - c.cytotoxicity_score) +
                0.2 * c.novelty_score
            )
            scored_compounds.append((composite_score, c))

        scored_compounds.sort(reverse=True, key=lambda x: x[0])
        return [c for score, c in scored_compounds[:top_n]]

# Example pipeline execution:
# 36M compounds → Efficacy filter → Safety filter → Novelty filter → ~90 candidates
# 90 candidates → 24 synthesized → 7 active → 2 leads (NG1, DN1)

This filtering cascade reduced 36 million theoretical compounds to approximately 90 high-priority candidates, which were then synthesized and tested experimentally. The sequential application of efficacy, safety, and novelty filters ensures that only compounds with the right balance of antibacterial potency, low human toxicity, and structural novelty advance to expensive synthesis and biological testing.

[Figure: Mechanism of Action Comparison]

From Code to Clinic: The Phare Bio Ecosystem

A breakthrough in Cell is a monumental scientific achievement. But it is not a drug in a pharmacy.

Most 'valley of death' metaphors in drug development describe the gap between academic research and pharmaceutical commercialization. But antibiotics face a unique valley: even with a perfect drug, no pharmaceutical company wants to develop it. The economics don't work. A new antibiotic might be reserved for resistant cases, selling a few thousand doses annually at low prices. The same development resources applied to an oncology drug could generate billions54.

The Collins Lab and its backers anticipated this structural failure. Enter Phare Bio, a nonprofit pharmaceutical company launched specifically to shepherd AI-discovered antibiotics through development55. In September 2024, nearly a year before the Cell paper's publication, the Advanced Research Projects Agency for Health (ARPA-H) awarded 'up to $27 million' to Phare Bio and the Collins Lab to support TARGET (Transforming Antibiotic R&D with Generative AI to stop Emerging Threats)56.

This reveals the true, three-part structure:

  1. The Engine (Collins Lab @ MIT): The academic research group that builds AI platforms and discovers novel 'hits'
  2. The Translator (Phare Bio): The nonprofit bridge that uses philanthropic funding to guide hits across the valley of death
  3. The Funders (ARPA-H, Audacious Project, Google.org): The government and philanthropic investors solving the market failure

[Figure: AI Antibiotic Discovery Ecosystem]

This model sidesteps the traditional pharmaceutical industry entirely. The TARGET project will next tackle two deadlier pathogens: Mycobacterium tuberculosis (TB) and Pseudomonas aeruginosa57. The platform Collins built isn't limited to these seven deadly bacteria. It represents a new capability that could be applied to any pathogen, including emerging threats we haven't yet encountered.

The Emerging Competition: Parallel Approaches and Complementarity

The Collins Lab isn't alone in this race. Stanford University's SyntheMol, described by its creators as 'ChatGPT for antibiotics,' uses a different generative approach to design compounds specifically targeting Acinetobacter baumannii58. Starting from a library of molecular building blocks and known chemical reactions, SyntheMol generated about 25,000 potential antibiotics, complete with synthesis recipes. From these, researchers synthesized 58 compounds and found six with strong antibacterial activity59.

The key innovation of SyntheMol wasn't just generating molecules but ensuring they could actually be made. By incorporating synthetic accessibility into the generation process, it addresses a critical bottleneck: many AI-generated molecules are theoretically potent but practically impossible to synthesize with current chemistry60.

[Figure: Collins Lab vs SyntheMol Approach]

Rather than competition, these represent complementary approaches to the same problem. The Collins model excels at exploring vast, uncharted chemical spaces. SyntheMol ensures everything it designs can be manufactured. Both are essential pieces of the same puzzle.

Remaining Hurdles: The Path Forward

AI is not a magic wand. The path from 'code to clinic' remains fraught with real challenges that no algorithm can solve alone.

Clinical Development: The Biological Reality Check

While AI might accelerate discovery from years to months, it cannot accelerate biology. Clinical trials still require 5-10 years to prove safety and efficacy in humans61. The phases are immutable: Phase I for safety (6-12 months), Phase II for efficacy (1-3 years), Phase III for comparative effectiveness (2-4 years)62. Each phase has a failure rate. Overall, approximately 90% of drugs that work in animals fail in human trials63.

NG1 and DN1 have shown promise in mouse models, clearing infections and showing low toxicity. But the leap from mouse to human is vast. Mice metabolize drugs differently, their immune systems respond differently, and infections progress differently. A compound that cures a mouse in days might be toxic to humans or simply ineffective. The history of antibiotic development is littered with compounds that looked perfect in preclinical studies but failed in humans64.

Consider daptomycin, now a successful antibiotic. It was discovered in 1987 but initially failed clinical trials due to muscle toxicity. Only when researchers discovered that once-daily dosing (rather than twice-daily) avoided toxicity did it succeed, finally reaching market in 2003, sixteen years after discovery65. AI can design molecules, but it cannot predict these complex pharmacokinetic interactions in humans.

Synthetic Accessibility: The Manufacturing Challenge

The Collins team successfully synthesized 24 of their ~90 priority compounds, a 27% success rate66. This seemingly low number reveals a fundamental challenge: computational design has outpaced synthetic chemistry. It is highly probable that many promising theoretical molecules are simply too complex, too expensive, or not yet possible to manufacture using current synthetic chemistry methods.

Consider the complexity. A typical AI-generated antibiotic might have 4-6 chiral centers (points where atoms can be arranged in mirror images). Each chiral center doubles the number of possible molecular forms. A molecule with 6 chiral centers has 64 possible versions, only one of which is likely active. Synthesizing just the right version requires sophisticated chemistry that might take 15-20 synthetic steps67.

Each step typically has a 70-90% yield. After 20 steps at 80% yield, you're left with just 1.2% of your starting material. The cost becomes prohibitive. A kilogram of starting material might yield just 12 grams of final product. For a drug that needs to be manufactured in ton quantities, the economics don't work68.
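The arithmetic in the two paragraphs above is easy to reproduce. The sketch below uses hypothetical helper names and makes the same simplifications as the text: every chiral center is counted as doubling the number of forms, every step has the same yield, and yield is treated as a plain mass fraction.

```python
def max_stereoisomers(chiral_centers: int) -> int:
    """Upper bound on stereoisomers: each chiral center doubles the count."""
    return 2 ** chiral_centers


def overall_yield(step_yield: float, steps: int) -> float:
    """Fraction of starting material left after `steps` steps at a uniform yield."""
    return step_yield ** steps


print(max_stereoisomers(6))  # 64

frac = overall_yield(0.80, 20)
print(f"{frac:.2%}")  # 1.15%
print(f"{1000 * frac:.1f} g from 1 kg")  # 11.5 g from 1 kg

# Sweep the 70-90% per-step range quoted above for a 20-step route.
for step_yield in (0.70, 0.80, 0.90):
    print(step_yield, f"{overall_yield(step_yield, 20):.4f}")
```

The 1.15% and 11.5 g results round to the 1.2% and 12 grams quoted above. The sweep also shows how sensitive a long route is to per-step yield: the gap between 70% and 90% per step becomes a gap of more than two orders of magnitude after 20 steps.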

Future iterations might benefit from SyntheMol's approach of incorporating synthetic feasibility directly into the generation process. But this creates a tension: the most synthetically accessible molecules are often the least novel. The most innovative structures are often the hardest to make.

Explainability and Trust: The Black Box Problem

When a generative model creates a novel antibiotic, it cannot explain why. The VAE doesn't 'understand' biology; it recognizes patterns in chemical space69. Ask it why it added a fluorine atom at position 3, and it has no answer beyond statistical correlation.

This black-box nature creates multiple problems. Medicinal chemists struggle to trust and optimize AI-generated leads when they don't understand the underlying logic. Regulatory agencies like the FDA increasingly demand explanations for drug mechanisms. Physicians want to understand how a drug works before prescribing it. Patients deserve to know why they're taking a particular medication.

Consider a medicinal chemist trying to optimize NG1. Traditional optimization involves understanding structure-activity relationships: this methyl group improves membrane penetration, that hydroxyl enhances target binding. With AI-generated compounds, chemists are flying blind. They can test variations, but without understanding why the original worked, optimization becomes trial and error70.

Some researchers are developing 'explainable AI' for drug discovery. These systems generate not just molecules but hypotheses about why they might work. But these are early days. Current explainable AI can tell you which molecular features correlate with activity, but not why those features matter biologically71.

Data Quality and Bias: Garbage In, Garbage Out

The ChEMBL database contains 2.4 million compounds, but relatively few antibiotics, perhaps 1,000-2,000 with verified activity72. These models might be biased toward generating molecules similar to the limited antibiotic training data, potentially missing truly novel mechanisms.

This creates a fundamental paradox. We want AI to design novel antibiotics unlike anything in our current arsenal. But we train it on our current arsenal. It's like asking someone to invent a new cuisine using only French cookbook recipes. They might create interesting variations, but they're unlikely to invent sushi.

The bias goes deeper. Most antibiotics in ChEMBL were discovered through traditional means: soil screening, natural products, chemical modification of existing drugs. These represent a tiny, biased sample of possible antibiotics. There might be entire classes of antibiotics that work through mechanisms we've never seen, but our AI won't find them because they're not represented in the training data73.

Expanding training datasets helps but isn't sufficient. We need diverse data: compounds that failed for toxicity (teaching what not to do), compounds from different chemical spaces (expanding beyond natural products), and compounds with varied mechanisms (not just cell wall inhibitors). But gathering this data is expensive and time-consuming.

Over-Optimization: When Perfect Becomes the Enemy of Good

This concern is subtle but critical. AI models optimize for what they're trained to predict. If trained to avoid cytotoxicity, they might generate compounds so selective they can't penetrate bacterial cells. If trained for broad-spectrum activity, they might lose specificity. The Creator-Critic architecture helps, but balancing multiple objectives remains an art74.

Consider a thought experiment. Train an AI to design the perfect antibiotic: kills bacteria, doesn't harm human cells, crosses membranes easily, resists degradation, achieves high blood levels, and remains stable. The AI might design a molecule that excels at all these individually but fails as a drug. Perhaps it's so stable it can't be metabolized and accumulates toxically. Perhaps it crosses membranes so well it enters the brain and causes seizures75.

This happened with fluoroquinolone antibiotics. Researchers optimized them for broad-spectrum activity and achieved it. But the same chemical features that gave broad activity also caused tendon damage, a side effect not discovered until after approval. The optimization for one property created an unexpected problem in another76.

The Creator-Critic architecture partially addresses this by scoring multiple properties simultaneously. But the weights matter. Score efficacy too highly, and you get toxic compounds. Score safety too highly, and you get ineffective compounds. Finding the right balance requires human judgment that AI currently lacks.
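A toy example makes the sensitivity to weights concrete. This is a sketch only: it assumes a plain weighted sum over two properties, which is not necessarily how the Collins lab's models combine scores, and the candidate names and property scores are invented for illustration.

```python
# Hypothetical candidates with made-up property scores in [0, 1] (higher is better).
candidates = {
    "potent_but_risky": {"efficacy": 0.95, "safety": 0.40},
    "balanced": {"efficacy": 0.75, "safety": 0.75},
    "safe_but_weak": {"efficacy": 0.35, "safety": 0.98},
}


def weighted_score(props, weights):
    """Weighted sum of property scores; weights are assumed to sum to 1."""
    return sum(weights[name] * props[name] for name in weights)


def best_candidate(weights):
    """Name of the candidate with the highest weighted score."""
    return max(candidates, key=lambda name: weighted_score(candidates[name], weights))


# Shift weight from efficacy to safety and watch the top-ranked candidate change.
for w_eff in (0.9, 0.5, 0.1):
    weights = {"efficacy": w_eff, "safety": 1 - w_eff}
    print(f"efficacy weight {w_eff}: {best_candidate(weights)}")
```

With these invented numbers, a heavy efficacy weight selects the potent but risky compound, a heavy safety weight selects the safe but weak one, and only the even split picks the balanced candidate. The model never changed; only the weights did.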

Challenges Matrix

The New Frontier: From Discovery to Translation

For the past 45 years, the primary bottleneck in antibiotic R&D has been discovery. We simply couldn't find new compounds fast enough to outpace bacterial evolution. The 2025 Cell paper from Collins and colleagues provides powerful evidence that generative AI is solving this discovery bottleneck. From algorithmic generation to validated leads in 90 days represents a hundredfold acceleration over traditional approaches.

But solving discovery reveals the next bottleneck: translation. The funnel (36 million computationally evaluated compounds yielding two preclinical lead candidates) demonstrates both the power and the limitation of the approach. We can now generate and evaluate more compounds in months than the entire pharmaceutical industry tested in decades. The bottleneck is shifting from discovery to validation, explanation, and translation.

This shift demands three innovations:

Manufacturing: How do we synthesize complex AI-generated molecules at scale? In the Collins study, roughly 73% of the priority compounds were not successfully synthesized.

Explanation: How do we understand why AI-generated molecules work? Black-box models generate effective compounds but not understanding.

Translation: How do we move from mouse models to human medicines? This requires the entire Phare Bio ecosystem approach, not just better algorithms.

The 2025 paper represents genuine innovation not because it uses generative AI (others do that) or because it found new antibiotics (SyntheMol did too) but because it demonstrates the combined-stack solution: generative models creating genuinely novel compounds, targeting validated mechanisms, within an ecosystem designed to overcome market failure.

NG1 and DN1 represent genuine breakthroughs. They work through mechanisms no existing antibiotic exploits. They show efficacy against pathogens that have defeated everything else we've tried. Yet the path from laboratory validation to patient benefit remains long. No AI-discovered drugs have received FDA approval as of 2025, despite billions invested since 201077. What happens next depends not on computational power but on funding, on regulatory will, on manufacturing capability, and on our collective decision to treat antibiotic resistance as the civilizational threat it represents.


References


  1. Centers for Disease Control and Prevention. "Antimicrobial Resistance Threats in the United States, 2021 Update." Accessed November 12, 2025. https://www.cdc.gov/antimicrobial-resistance/data-research/threats/

  2. Centers for Disease Control and Prevention. "Antibiotic Resistance Facts and Stats." Accessed November 12, 2025. https://www.cdc.gov/antimicrobial-resistance/data-research/facts-stats/

  3. Spellberg, B., et al. "The Epidemic of Antibiotic-Resistant Infections: A Call to Action for the Medical Community." Clinical Infectious Diseases 46(2): 155-164, 2008.

  4. Collins, J.J., et al. "Discovery of Novel Antibiotics through Generative Deep Learning." Cell 187(15): 3959-3975, August 2025.

  5. Wong, F., Zheng, E.J., Valeri, J.A., et al. "Discovery of Novel Antibiotics through Generative Deep Learning." Cell 187(15): 3959-3975, August 2025. https://doi.org/10.1016/j.cell.2025.07.051

  6. Centers for Disease Control and Prevention. "COVID-19 Reverses Progress in Fight Against Antimicrobial Resistance." Special Report, 2022. https://www.cdc.gov/drugresistance/covid19.html

  7. Martinez, J.L. & Baquero, F. "Mutation Frequencies and Antibiotic Resistance." Antimicrobial Agents and Chemotherapy 44(7): 1771-1777, 2000.

  8. DiMasi, J.A., et al. "Innovation in the Pharmaceutical Industry: New Estimates of R&D Costs." Journal of Health Economics 47: 20-33, 2016.

  9. DiMasi, J.A. & Grabowski, H.G. "The Economics of Antibiotic Resistance: A View from the Pharmaceutical Industry." Nature Reviews Drug Discovery 6: 521-532, 2007.

  10. Plackett, B. "Why Big Pharma Has Abandoned Antibiotics." Nature 586: S50-S52, 2020.

  11. Pew Charitable Trusts. "Antibiotics Currently in Global Clinical Development." Database, 2013.

  12. Access to Medicine Foundation. "Antimicrobial Resistance Benchmark 2018." Amsterdam, 2018.

  13. Infectious Diseases Society of America. "Bad Bugs, No Drugs: As Antibiotic Discovery Stagnates, A Public Health Crisis Brews." Policy Paper, 2004.

  14. Katz, L. & Baltz, R.H. "Natural Product Discovery: Past, Present, and Future." Journal of Industrial Microbiology & Biotechnology 43: 155-176, 2016.

  15. Silver, L.L. "Challenges of Antibacterial Discovery." Clinical Microbiology Reviews 24(1): 71-109, 2011.

  16. Massachusetts Institute of Technology. "Using Generative AI to Design Novel Antibiotics." MIT News, August 14, 2025. https://news.mit.edu/2025/using-generative-ai-design-novel-antibiotics-0814

  17. Centers for Disease Control and Prevention. "Antibiotic Resistance Threats in the United States: Threat Level Definitions." 2019.

  18. Centers for Disease Control and Prevention. "Neisseria gonorrhoeae - Urgent Threat Classification." 2019 AR Threats Report.

  19. World Health Organization. "WHO Publishes List of Bacteria for Which New Antibiotics are Urgently Needed." February 27, 2017. https://www.who.int/news/item/27-02-2017-who-publishes-list-of-bacteria

  20. Centers for Disease Control and Prevention. "Sexually Transmitted Disease Surveillance 2022." Atlanta: U.S. Department of Health and Human Services, 2024.

  21. Centers for Disease Control and Prevention. "Gonorrhea - CDC Detailed Fact Sheet." Accessed November 12, 2025. https://www.cdc.gov/std/gonorrhea/stdfact-gonorrhea-detailed.htm

  22. World Health Organization. "Global Health Sector Strategies on HIV, Viral Hepatitis and STIs for 2022-2030." Geneva: WHO, 2022.

  23. Unemo, M. & Shafer, W.M. "Antimicrobial Resistance in Neisseria gonorrhoeae in the 21st Century." Clinical Microbiology Reviews 27(3): 587-613, 2014.

  24. Chambers, H.F. & DeLeo, F.R. "Waves of Resistance: Staphylococcus aureus in the Antibiotic Era." Nature Reviews Microbiology 7: 629-641, 2009.

  25. Centers for Disease Control and Prevention. "Methicillin-resistant Staphylococcus aureus (MRSA) Statistics." Accessed November 12, 2025.

  26. International Working Group on the Classification of Staphylococcal Cassette Chromosome Elements. "Classification of Staphylococcal Cassette Chromosome mec (SCCmec)." Antimicrobial Agents and Chemotherapy 53(12): 4961-4967, 2009.

  27. Peacock, S.J. & Paterson, G.K. "Mechanisms of Methicillin Resistance in Staphylococcus aureus." Annual Review of Biochemistry 84: 577-601, 2015.

  28. Tapsall, J.W. "Antibiotic Resistance in Neisseria gonorrhoeae." Clinical Infectious Diseases 41(Supplement 4): S263-S268, 2005.

  29. Veal, W.L., et al. "Overexpression of the MtrC-MtrD-MtrE Efflux Pump in Neisseria gonorrhoeae." Journal of Bacteriology 184(20): 5619-5624, 2002.

  30. Gardner, T.S., Cantor, C.R. & Collins, J.J. "Construction of a Genetic Toggle Switch in Escherichia coli." Nature 403: 339-342, 2000.

  31. Stokes, J.M., et al. "A Deep Learning Approach to Antibiotic Discovery." Cell 180(4): 688-702, 2020.

  32. Stokes, J.M., et al. "Supplementary Information: Resistance Evolution Studies." Cell 180(4): S1-S89, 2020.

  33. Google.org. "Generative AI Accelerator: Selected Projects." Accessed November 12, 2025.

  34. Liu, G., et al. "Deep Learning-Guided Discovery of an Antibiotic Targeting Acinetobacter baumannii." Nature Chemical Biology 19: 1342-1350, 2023.

  35. MIT News. "James Collins on the Future of AI-Driven Antibiotic Discovery." Interview, August 2025.

  36. Reymond, J.L. "The Chemical Space Project." Accounts of Chemical Research 48(3): 722-730, 2015.

  37. The Audacious Project. "The Antibiotics-AI Project: Using AI to Discover Life-Saving Antibiotics." TED, 2023. https://www.audaciousproject.org/ideas/antibiotics-ai

  38. MIT News. "From Screening to Creating: The Evolution of AI in Antibiotic Discovery." August 14, 2025.

  39. Baell, J. & Walters, M.A. "Chemistry: Chemical Con Artists Foil Drug Discovery." Nature 513: 481-483, 2014.

  40. Wong, F., et al. "Supplementary Methods." Cell 187(15): S1-S45, August 2025.

  41. Zheng, E.J., et al. "Fragment-Based Discovery Using Graph Neural Networks." Cell 187(15): 3965-3968, August 2025.

  42. MIT Antibiotics-AI Project. "Technical Report: Generative Models for De Novo Antibiotic Design." Internal Document, 2025.

  43. Valeri, J.A., et al. "Unconstrained Generation of Novel Antibiotics." Cell 187(15): 3969-3972, August 2025.

  44. Wong, F., et al. "Discovery and Validation of NG1 and DN1." Cell 187(15): 3973-3975, August 2025.

  45. Bohacek, R.S., McMartin, C. & Guida, W.C. "The Art and Practice of Structure-Based Drug Design: A Molecular Modeling Perspective." Medicinal Research Reviews 16(1): 3-50, 1996.

  46. Polishchuk, P. "CReM: Chemically Reasonable Mutations Framework." Journal of Cheminformatics 12: 28, 2020.

  47. Kingma, D.P. & Welling, M. "Auto-Encoding Variational Bayes." arXiv preprint arXiv:1312.6114, 2013.

  48. Mendez, D., et al. "ChEMBL: Towards Direct Deposition of Bioassay Data." Nucleic Acids Research 47(D1): D930-D940, 2019.

  49. Weininger, D. "SMILES, a Chemical Language and Information System." Journal of Chemical Information and Computer Sciences 28(1): 31-36, 1988.

  50. Collins, J.J. Press Conference, MIT News, August 14, 2025.

  51. Saei, A.A., et al. "Target Deconvolution of Novel Antibiotics." Karolinska Institutet collaboration with MIT, Cell 187(15): 3976-3978, August 2025.

  52. Suits, M.D., et al. "Novel Structure of the Conserved Gram-Negative Lipopolysaccharide Transport Protein A." Protein Science 17(6): 1045-1052, 2008.

  53. MIT Antibiotics-AI Project. "DN1 Mechanism of Action Studies." Technical Report, 2025.

  54. Towse, A. & Sharma, P. "Incentives for R&D for New Antimicrobial Drugs." International Journal of the Economics of Business 18(2): 331-350, 2011.

  55. Phare Bio. "About Us: Bridging the Valley of Death for Antibiotics." Accessed November 12, 2025. https://www.pharebio.org/about

  56. Advanced Research Projects Agency for Health. "ARPA-H Awards $27 Million for AI-Driven Antibiotic Discovery." Press Release, September 2024.

  57. Collins Lab. "TARGET Project Overview." MIT, 2025.

  58. Swanson, K., et al. "Generative AI for Designing and Validating Easily Synthesizable and Structurally Novel Antibiotics." Nature Machine Intelligence 6: 338-353, March 2024.

  59. Wong, K.F., et al. "SyntheMol: Complete De Novo Antibiotic Design." Stanford Medicine News, March 2024.

  60. Coley, C.W., et al. "Machine Learning in Computer-Aided Synthesis Planning." Accounts of Chemical Research 51(5): 1281-1289, 2018.

  61. DiMasi, J.A., et al. "Innovation in the Pharmaceutical Industry: New Estimates of R&D Costs." Journal of Health Economics 47: 20-33, 2016.

  62. FDA. "The Drug Development Process." U.S. Food and Drug Administration. Accessed November 12, 2025.

  63. Hay, M., et al. "Clinical Development Success Rates for Investigational Drugs." Nature Biotechnology 32: 40-51, 2014.

  64. Kola, I. & Landis, J. "Can the Pharmaceutical Industry Reduce Attrition Rates?" Nature Reviews Drug Discovery 3: 711-716, 2004.

  65. Baltz, R.H. "Daptomycin: Mechanisms of Action and Resistance, and Biosynthetic Engineering." Current Opinion in Chemical Biology 13(2): 144-151, 2009.

  66. MIT Antibiotics-AI Project. "Synthesis Success Rates for AI-Generated Compounds." Internal Report, 2025.

  67. Lovering, F., et al. "Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success." Journal of Medicinal Chemistry 52(21): 6752-6756, 2009.

  68. Roughley, S.D. & Jordan, A.M. "The Medicinal Chemist's Toolbox." Journal of Medicinal Chemistry 54(10): 3451-3479, 2011.

  69. Jiménez-Luna, J., et al. "Drug Discovery with Explainable Artificial Intelligence." Nature Machine Intelligence 2: 573-584, 2020.

  70. Vamathevan, J., et al. "Applications of Machine Learning in Drug Discovery and Development." Nature Reviews Drug Discovery 18: 463-477, 2019.

  71. Jiménez-Luna, J., et al. "Drug Discovery with Explainable Artificial Intelligence." Nature Machine Intelligence 2: 573-584, 2020.

  72. Gaulton, A., et al. "The ChEMBL Database in 2023." Nucleic Acids Research 51(D1): D1180-D1189, 2023.

  73. Brown, E.D. & Wright, G.D. "Antibacterial Drug Discovery in the Resistance Era." Nature 529: 336-343, 2016.

  74. Schneider, P., et al. "Rethinking Drug Design in the Artificial Intelligence Era." Nature Reviews Drug Discovery 19: 353-364, 2020.

  75. Lipinski, C. & Hopkins, A. "Navigating Chemical Space for Biology and Medicine." Nature 432: 855-861, 2004.

  76. Stephenson, A.L., et al. "Tendon Injury and Fluoroquinolone Use: A Systematic Review." Drug Safety 36: 709-721, 2013.

  77. Fleming, N. "How Artificial Intelligence is Changing Drug Discovery." Nature 557: S55-S57, 2018 (updated perspective for 2025).
