Carbon-3B: A 3B DNA Foundation Model That Matches Evo2-7B at 150x the Speed
Carbon-3B matches Evo2-7B on sequence recovery, variant-effect prediction, and motif-perturbation discrimination while generating DNA over 150 times faster.
Pragmatic advisory and empirical analysis for machine learning in genomics, proteins, and drug discovery — what holds up on a held-out test set, what doesn't, and what it means if you build with it. The writing here is the practice thinking in the open.
The pitch for genomic foundation models is that one pretrained network now beats task-specific tools across the board, from regulatory annotation to clinical variant interpretation.
A 1-billion-parameter model conditioned on RNA chemical-probing reactivity folds a held-out transcript to an F1 of 0.987 against an experimentally guided reference.
ESMFold predicts atomic-level structure from a single sequence. No multiple sequence alignment, no database search, no Evoformer churning over homologs.
Fewer than one in a thousand neurons in a large language model can predict whether the model is about to hallucinate, and they encode something unexpected: not factual errors, but a tendency toward compliance over truth.
Reinforcement learning works when you can check the answer. A chess engine wins or loses. A code-generation model passes or fails the test suite.
Vision-language models can describe a scene in paragraph-length detail and still fail to tell you whether a red cube is in front of or behind a blue cylinder.
Sequence models, variant-effect prediction, and the work to read non-coding DNA.
Single-sequence folding, protein language models, and structure without alignment.
Contrastive screening, protein–ligand interaction, and design that survives the wet lab.
What models encode, where they fail, and how evaluation and oversight hold up.
rewire.it works with biotech, pharma, and research teams deciding whether and how to adopt a given AI-in-bio method — feasibility reviews, evaluation design, and held-out benchmarking. Independent, candid about where machine learning helps and where it doesn't.