Carbon-3B: A 3B DNA Foundation Model That Matches Evo2-7B at 150x the Speed
2026-05-29T00:00:00Z • 25 min read
#genomics #foundation-models #dna-language-models #tokenization #transformers #variant-effect-prediction
Carbon-3B matches Evo2-7B on sequence recovery, variant-effect prediction, and motif-perturbation discrimination while generating DNA over 150 times faster.

Why Language Models Still Can't Spell: The Case for Morphologically-Aware Tokenization
2024-09-03T13:47:30+01:00 • 38 min read
#nlp #tokenization #morphology #linguistics #ai #language-models
Language models can write poetry but struggle with basic spelling. Discover why current tokenization breaks language, and how morphology-aware approaches fix it.