Here is the sentence that should reorganize how you think about this entire field: the most accurate biomolecular structure predictor available, AlphaFold3, ships its model weights under a custom non-commercial license, and you cannot simply download them. Academic and non-commercial researchers must request the parameters through a Google form, granted at DeepMind's sole discretion, with no redistribution permitted and outputs barred from training competing models.[1]
That is not a footnote to the AlphaFold3 story. For anyone who has to ship something, it is the story.
The reversal is sharper when you look at where the field came from. In January 2022, DeepMind quietly relicensed the AlphaFold2 weights from non-commercial to CC BY 4.0, opening them to commercial use with attribution.[2] Two years later, when AlphaFold3 shipped, the weights went the other way: research-only, no redistribution, no commercial use.[1] One model relicensed up, into the open. Its successor shipped down, behind a gate.
The protein structure prediction subfield now contains fifteen distinct foundation models.[1] Most surveys rank them by accuracy. That is the wrong first axis for a practitioner, because the accuracy leader is the one model a commercial team legally cannot deploy. The axis that actually constrains your work is the one almost nobody leads with: who is allowed to run the weights, for what, and at what cost. This guide leads with that.
Two ledgers
Keep two ledgers while reading any structure-prediction paper. The first records capability: what a model can demonstrably do on a benchmark its authors chose. The second records validity: whether the number survives a held-out test set, an honest baseline, and the failure modes you actually hit in production. The two diverge constantly. A model can post a state-of-the-art DockQ and still hand you a confidently wrong interface for the one complex you care about. AlphaFold3 tops the first ledger and, for a for-profit lab, fails the second on licensing alone.
The field splits cleanly into two technical lineages and one licensing fault line that cuts across both.
The lineages descend from two 2021 papers. AlphaFold2 introduced the Evoformer trunk, a stack of 48 attention blocks over a multiple-sequence-alignment representation and a residue-residue pair representation, feeding a structure module that emits 3D coordinates.[2] The same summer, RoseTTAFold introduced a "three-track" network that jointly processes 1D sequence, 2D residue-pair, and 3D coordinate information, with information flowing between all three tracks.[3] Almost everything since is a descendant, an open reimplementation, or a single-sequence shortcut around the expensive MSA step.
The fault line is licensing. On one side sit the permissively licensed models you can put in a commercial pipeline today: AlphaFold2, AlphaFold-Multimer, ESMFold, OpenFold, Boltz-1, Boltz-2, Chai-1, Protenix, OmegaFold, RoseTTAFold2, RoseTTAFold All-Atom (with pipeline caveats), and RoseTTAFold2NA. On the other side sit the restricted ones: AlphaFold3 and HelixFold3, both non-commercial, plus the original RoseTTAFold v1 weights.[1, 4, 3] The capability frontier and the open frontier are not the same models, and the gap between them is where most of the practical decisions get made.
Five failure modes, before any benchmark
Before the models, the failure modes. These are the things that move a number from the capability ledger to the validity ledger, and every model below inherits some subset. Read the benchmark tables later with these in mind.
Overconfident wrong interfaces. AlphaFold-Multimer can produce confident but wrong interfaces and false-positive contacts.[5] The confidence head is not a lie detector. A high ipTM on a protein-protein complex is evidence, not proof.
Hallucinated geometry. AlphaFold3 can produce confident-looking but spurious structures, including chirality and clashing-atom errors, and stochastic diffusion can yield hallucinations in unstructured regions.[1] Boltz-1 documents the same failure class: overlapping or stacked identical chains in large complexes, explicitly noted as shared with AlphaFold3.[6]
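One cheap screen for the stacked-chain failure is to check whether two chains occupy nearly the same space. The sketch below is a hypothetical helper, not part of Boltz or AlphaFold3: it takes per-chain atom coordinates (however you parsed them) and flags chain pairs whose centroids sit closer than a cutoff or that share many atoms within a clash distance. All three thresholds are illustrative assumptions to tune on your own complexes, and the all-pairs distance loop is deliberately naive.

```python
import itertools
import math

Coord = tuple[float, float, float]

def centroid(atoms: list[Coord]) -> Coord:
    """Mean position of a chain's atoms."""
    n = len(atoms)
    return (sum(a[0] for a in atoms) / n,
            sum(a[1] for a in atoms) / n,
            sum(a[2] for a in atoms) / n)

def clash_fraction(a: list[Coord], b: list[Coord], clash_dist: float) -> float:
    """Fraction of atoms in chain a that have some atom of chain b within clash_dist (Å)."""
    hits = sum(1 for p in a if any(math.dist(p, q) < clash_dist for q in b))
    return hits / len(a)

def flag_overlapping_chains(chains: dict[str, list[Coord]],
                            centroid_cutoff: float = 2.0,
                            clash_dist: float = 1.5,
                            max_clash_frac: float = 0.2) -> list[tuple[str, str]]:
    """Chain pairs that look stacked on top of each other (hypothetical thresholds)."""
    flagged = []
    for (id_a, a), (id_b, b) in itertools.combinations(chains.items(), 2):
        too_close = math.dist(centroid(a), centroid(b)) < centroid_cutoff
        clashing = clash_fraction(a, b, clash_dist) > max_clash_frac
        if too_close or clashing:
            flagged.append((id_a, id_b))
    return flagged
```

A flagged pair is a prompt to inspect the model by eye or re-sample, not proof of a hallucination; genuinely intertwined homo-oligomers can trip a loose centroid cutoff.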
MSA-depth dependence. AlphaFold2 accuracy depends on MSA depth, and the model has limited accuracy for orphan sequences that lack deep alignments.[2] This is the recurring tax across the entire MSA-based family. A glowing benchmark on well-aligned targets tells you nothing about your orphan protein.
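A quick way to see whether a target sits in the shallow-MSA regime is to compute an effective sequence count, Neff, before trusting any benchmark. The sketch below uses one common convention: each alignment row is down-weighted by the number of rows within 80% identity of it. The threshold, the identity definition (ignoring gap columns), and the input format (equal-length aligned rows with `-` gaps, not raw A3M with lowercase insertions) are assumptions; AlphaFold2 does not report this exact number.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Identity over columns where neither aligned sequence has a gap ('-')."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)

def neff(alignment: list[str], threshold: float = 0.8) -> float:
    """Sum over rows of 1 / (number of rows at >= threshold identity, itself included)."""
    total = 0.0
    for i, seq in enumerate(alignment):
        similar = 1 + sum(
            1 for j, other in enumerate(alignment)
            if j != i and pairwise_identity(seq, other) >= threshold
        )
        total += 1.0 / similar
    return total
```

Three identical rows count roughly as one effective sequence, which is the point: a thousand near-duplicates are not a deep alignment. Compare Neff across your own targets rather than reading any single value as a verdict.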
Disordered regions. AlphaFold2 has limited accuracy for intrinsically disordered regions, which land in the low-pLDDT bucket, and AlphaFold3 is also weaker there.[2, 1] If your biology lives in a disordered region, a low confidence score is telling you something true: do not trust the coordinates.
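AlphaFold2's PDB output stores per-residue pLDDT in the B-factor field, so a first-pass triage needs nothing beyond the standard library. The sketch reads CA atoms from PDB-format text using the format's fixed columns (B-factor in columns 61 to 66) and reports the fraction of residues in each confidence band the AlphaFold Protein Structure Database uses: above 90, 70 to 90, 50 to 70, and 50 or below. Whether another tool writes pLDDT into the same field, and on the same 0 to 100 scale, is an assumption to check per tool.

```python
def ca_plddt(pdb_text: str) -> list[float]:
    """Per-residue pLDDT from the B-factor field (columns 61-66) of CA ATOM records."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values

def confidence_bands(plddts: list[float]) -> dict[str, float]:
    """Fraction of residues per band: >90, >70 to 90, >50 to 70, <=50."""
    counts = {"very_high": 0, "confident": 0, "low": 0, "very_low": 0}
    for value in plddts:
        if value > 90:
            counts["very_high"] += 1
        elif value > 70:
            counts["confident"] += 1
        elif value > 50:
            counts["low"] += 1
        else:
            counts["very_low"] += 1
    n = len(plddts) or 1  # avoid dividing by zero on an empty model
    return {band: count / n for band, count in counts.items()}
```

A large `very_low` fraction is the disordered-region signal described above: it tells you where not to read coordinates, not that the folded domains elsewhere are wrong.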
No ensembles. AlphaFold2 and AlphaFold3 predict mostly static single conformations, not conformational ensembles or dynamics.[2, 1] Boltz-2 is the partial exception, training on molecular-dynamics data toward ensembles, and even there the authors note it fails to capture large binding-induced conformational changes.[7]
The numbers in the tables below are real. So are these five.
The master table
Start with the map.
| Model | Released | Architecture | Params | License (weights) | Commercial? | Source |
|---|---|---|---|---|---|---|
| AlphaFold2 | 2021 | Evoformer + structure module | ~93M | CC BY 4.0 | Yes | [2] |
| AlphaFold-Multimer | 2021-10 | AF2 adapted for complexes | undisclosed | CC BY 4.0 | Yes | [5] |
| AlphaFold3 | 2024-05 | Pairformer + diffusion, all-atom | undisclosed | custom non-commercial | No | [1] |
| Boltz-1 | 2024-11 | AF3-style, MSA + Pairformer + diffusion | undisclosed | MIT | Yes | [6] |
| Boltz-2 | 2025-06 | AF3-style + affinity module | undisclosed | MIT | Yes | [7] |
| Chai-1 | 2024-09 | AF3-like + 3B PLM track | undisclosed | Apache-2.0 | Yes | [8] |
| ESMFold | 2022 | ESM-2 LM + folding head, no MSA | ~15B (with ESM-2 15B) | MIT | Yes | [9] |
| HelixFold3 | 2024-08 | AF3-style in PaddlePaddle | undisclosed | non-commercial | No | [4] |
| OmegaFold | 2022-07 | OmegaPLM + Geoformer, no MSA | ~670M (PLM) | Apache-2.0 | Yes | [10] |
| OpenFold | 2022 | AF2 reimplementation, PyTorch | ~93M | Apache-2.0 | Yes | [11] |
| Protenix | 2025-01 | AF3 reproduction, PyTorch | 368M (v1) / 464M (v2) | Apache-2.0 | Yes | [12] |
| RoseTTAFold | 2021-07 | three-track network | undisclosed | mixed by version | v1 weights: No | [3] |
| RoseTTAFold All-Atom | 2023-10 | three-track + atom-bond graph | undisclosed | BSD (weights) | Yes, with caveats | [13] |
| RoseTTAFold2NA | 2022-09 | three-track + nucleic acids | ~67M | MIT | Yes | [14] |
| trRosetta / trRosetta2 | 2020 | 2D ResNet + Rosetta refinement | undisclosed | MIT code, Rosetta-gated pipeline | Pipeline: gated | [15] |
Table 1: The 15 protein structure prediction foundation models. "Undisclosed" means the developers did not publish a parameter count, not that the model is small. Architecture, date, and license details per the model papers.[1, 2, 5, 6, 7, 8, 9, 4, 10, 11, 12, 3, 13, 14, 15]
Two things jump out. First, parameter counts are mostly undisclosed. DeepMind never published a total for AlphaFold2 (the commonly cited ~93M is inferred from the released weights), AlphaFold-Multimer, or AlphaFold3.[2, 5, 1] The open reimplementations are more transparent: Protenix-v1 is 368M parameters and Protenix-v2 is 464M.[12] Second, the commercial-use column is binary and it does not track accuracy. The "No" rows are among the most capable models in the table.
Two eras: Evoformer, then diffusion
The cleanest way to organize fifteen models is by architecture generation, and there are two.
The Evoformer era runs from AlphaFold2 (2021) through its direct descendants. AlphaFold2 is an end-to-end network with two stages: the Evoformer trunk, which runs attention over an MSA representation and a pair representation with triangle attention for geometric consistency, feeds a structure module that produces 3D coordinates as per-residue rotations and translations.[2] It is not a transformer language model; it is a specialized geometric architecture of roughly 93M parameters per model.[2] At CASP14 it reached a median backbone accuracy of around 0.96 Å RMSD and a median domain GDT_TS of 92.4 out of 100, the result that ended structure prediction as an open research question for most single-domain proteins.[2]
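The "per-residue rotations and translations" are rigid frames: each residue gets a rotation R and a translation t, and its atom positions follow from applying that frame to fixed local coordinates, x_global = R·x_local + t. The toy below shows only that geometric step. The local coordinates are made-up placeholders, not AlphaFold2's idealized backbone geometry, and the real structure module also predicts side-chain torsion angles that this sketch ignores.

```python
import math

Vec = tuple[float, float, float]
Mat = tuple[Vec, Vec, Vec]  # 3x3 rotation, row-major

def rotation_z(angle_rad: float) -> Mat:
    """Rotation about the z axis, used here only to build an example frame."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def apply_frame(rotation: Mat, translation: Vec, local: Vec) -> Vec:
    """x_global = R @ x_local + t."""
    return tuple(
        sum(rotation[i][k] * local[k] for k in range(3)) + translation[i]
        for i in range(3)
    )

# Hypothetical local backbone positions with CA at the frame origin (placeholders only).
LOCAL_BACKBONE: dict[str, Vec] = {
    "N": (-0.5, 1.4, 0.0),
    "CA": (0.0, 0.0, 0.0),
    "C": (1.5, 0.0, 0.0),
}

def place_residue(rotation: Mat, translation: Vec) -> dict[str, Vec]:
    """Global coordinates of one residue's backbone atoms under a predicted frame."""
    return {name: apply_frame(rotation, translation, xyz)
            for name, xyz in LOCAL_BACKBONE.items()}
```

Because the frame is rigid, bond lengths inside the residue are preserved no matter what rotation the network outputs; only the residue's placement and orientation change.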
The diffusion era begins with AlphaFold3 in May 2024.[1] The architecture changed in two places. The Evoformer became a Pairformer with reduced reliance on MSA processing. The structure module was replaced by a diffusion module that predicts raw atom coordinates directly through a multiscale denoising process.[1] Instead of predicting residue frames and reconstructing atoms from them, AlphaFold3 denoises every atom directly, which is what lets a single model handle proteins, DNA, RNA, ligands, ions, and modified residues in one pass.[1]
Figure 1: The two architecture eras. The diffusion era opens with AlphaFold3 and is immediately mirrored by open reimplementations because the original shipped with gated weights. Lineage and architecture per AlphaFold2,[2] AlphaFold3,[1] RoseTTAFold,[3] ESMFold,[9] Boltz-1,[6] Chai-1,[8] Protenix,[12] RoseTTAFold All-Atom,[13] RoseTTAFold2NA.[14]
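The diffusion module's "multiscale denoising" can be sketched in a few lines: start every atom at a high-noise random position, then step the noise level down, each time moving toward a denoiser's estimate of the clean coordinates. The toy below uses a deterministic Euler sampler in the style of Karras et al.'s EDM formulation and an oracle denoiser that simply returns known target coordinates. Both are stand-ins: AlphaFold3's denoiser is a trained network conditioned on the trunk, with its own noise schedule and stochastic steps, and the `sigma_max`/`sigma_min` values here are arbitrary. The sketch shows the shape of the loop, not AlphaFold3's implementation.

```python
import random
from collections.abc import Callable

Coords = list[list[float]]

def noise_schedule(sigma_max: float, sigma_min: float, steps: int) -> list[float]:
    """Geometric decay from sigma_max to sigma_min (steps >= 2), then a final 0.0."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (steps - 1))
    return [sigma_max * ratio**i for i in range(steps)] + [0.0]

def euler_sample(denoise: Callable[[Coords, float], Coords], n_atoms: int,
                 sigmas: list[float], seed: int = 0) -> Coords:
    """Deterministic Euler sampler: x <- x + (sigma_next - sigma) * (x - x0_hat) / sigma."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, sigmas[0]) for _ in range(3)] for _ in range(n_atoms)]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoise(x, sigma)  # estimate of the clean all-atom coordinates
        x = [[xi + (sigma_next - sigma) * (xi - di) / sigma for xi, di in zip(xa, da)]
             for xa, da in zip(x, x0_hat)]
    return x
```

With a perfect denoiser the final step to zero noise lands exactly on the target; with a learned one, errors in `x0_hat` at low noise are where spurious local geometry can creep in.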
Why does this matter to a practitioner and not just an architecture historian? Because the diffusion models are the only ones that fold a protein, its nucleic-acid partners, and its bound ligand jointly, and because the open reimplementations are how you get that capability without AlphaFold3's licensing handcuffs. Everything downstream follows from that one shift.
The openness matrix, the part that actually decides things
This is the centerpiece. The license on the weights, not the code, is what governs your deployment. A model can publish Apache-2.0 source and still gate the weights, which is precisely what AlphaFold3 does.[1] License compliance here is not a single checkbox; it is a dependency graph. Read the weights license, then read the pipeline's bundled dependencies.
| Model | Code | Weights | Weights downloadable? | Commercial use | Notable trap | Source |
|---|---|---|---|---|---|---|
| AlphaFold2 | Apache-2.0 | CC BY 4.0 | Yes | Permitted (attribution) | Relicensed from CC BY-NC in Jan 2022 | [2] |
| AlphaFold-Multimer | Apache-2.0 | CC BY 4.0 | Yes | Permitted | Protein-only complexes; no ligands or NA | [5] |
| AlphaFold3 | Apache-2.0 | custom non-commercial | No, gated request form | Forbidden | Outputs may not train competing models | [1] |
| Boltz-1 / Boltz-2 | MIT | MIT | Yes | Permitted | None at the weights level | [6, 7] |
| Chai-1 | Apache-2.0 | Apache-2.0 | Yes | Permitted | Launched non-commercial Sep 2024, relicensed Nov 2024 | [8] |
| ESMFold | MIT | MIT | Yes | Permitted | Successor ESM3 is non-commercial; do not conflate | [9] |
| OpenFold / Protenix | Apache-2.0 | Apache-2.0 | Yes | Permitted | None at the weights level | [11, 12] |
| OmegaFold | Apache-2.0 | Apache-2.0 | Yes | Permitted | Repo dormant since 2022 | [10] |
| HelixFold3 | non-commercial | non-commercial | Yes (but NC) | Forbidden (free tier) | README says CC BY-NC-SA, LICENSE file is custom; ambiguity unresolved | [4] |
| RoseTTAFold v1 | MIT | Rosetta-DL, non-commercial | Yes (but NC) | Forbidden (v1 weights) | RF2 and RFAA are permissive; only v1 is restricted | [3] |
| RoseTTAFold All-Atom | BSD | BSD | Yes | Permitted, with caveats | Bundled PDB100 templates are CC BY-NC-SA; SignalP6 separately licensed | [13] |
| RoseTTAFold2NA | MIT | MIT | Yes | Permitted | None at the weights level | [14] |
| trRosetta / trRosetta2 | MIT | open | Yes | Code yes, full pipeline gated | Rosetta/PyRosetta refinement needs a paid commercial license | [15] |
Table 2: Openness matrix. The headline license often lies about the deployable license. Per-model license terms from the model papers and repositories.[1, 2, 5, 6, 7, 8, 9, 4, 10, 11, 12, 3, 13, 14, 15]
The traps matter. RoseTTAFold All-Atom ships a permissive BSD license on its own code and weights, but the default pipeline pulls in a PDB100 template database under CC BY-NC-SA and a SignalP6 dependency under academic terms, so a fully commercial deployment is not as simple as the headline license suggests.[13] trRosetta is MIT at the network level, but the structure-building step depends on Rosetta/PyRosetta, which is free for academic use and requires a paid commercial license for for-profit work.[15] HelixFold3's own documentation contradicts itself: the README says CC BY-NC-SA while the LICENSE file is a custom non-commercial license, and the discrepancy is unresolved.[4] When the license is ambiguous, treat it as restrictive.
The cleanest commercial story belongs to the Boltz family: MIT on both code and weights, with no trap at the weights level.[6, 7] Permissive code does not imply permissive weights, and permissive weights do not imply a permissive pipeline. Read all three.
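The "read all three" rule is easy to encode as a checklist, so that an internal pipeline refuses to ship a model whose code, weights, or bundled dependencies fall outside an allow-list. The sketch below is bookkeeping, not a legal determination: the license strings are transcribed from the openness matrix above, and the `COMMERCIAL_OK` allow-list is a hypothetical policy standing in for whatever your own counsel has cleared.

```python
from dataclasses import dataclass, field

# Hypothetical policy: licenses your organization has cleared for commercial use.
COMMERCIAL_OK = {"MIT", "Apache-2.0", "BSD", "CC BY 4.0"}

@dataclass
class ModelLicense:
    code: str
    weights: str
    pipeline_deps: list[str] = field(default_factory=list)  # bundled data and tools

    def blockers(self) -> list[str]:
        """Every component whose license is not on the allow-list."""
        parts = [("code", self.code), ("weights", self.weights)]
        parts += [("pipeline", dep) for dep in self.pipeline_deps]
        return [f"{part}: {lic}" for part, lic in parts if lic not in COMMERCIAL_OK]

# Transcribed from the openness matrix; re-check each repo's LICENSE file before relying on it.
MODELS = {
    "Boltz-2": ModelLicense("MIT", "MIT"),
    "ESMFold": ModelLicense("MIT", "MIT"),
    "AlphaFold3": ModelLicense("Apache-2.0", "custom non-commercial"),
    "HelixFold3": ModelLicense("non-commercial", "non-commercial"),
    "RoseTTAFold All-Atom": ModelLicense(
        "BSD", "BSD",
        ["CC BY-NC-SA (PDB100 templates)", "SignalP6 (academic terms)"],
    ),
}
```

Usage is a single loop, for example `{name: lic.blockers() for name, lic in MODELS.items()}`; an empty list means every recorded component is on the allow-list, which is exactly the three-way check the prose recommends.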
Capability over time
Two charts frame the trajectory. The first shows when each family arrived; structure prediction went from a single 2020 method to a dense cluster of AlphaFold3-class models in 2024 and 2025.
Figure 2: Release timeline by family. Note the density of releases after May 2024, when the open reimplementations of AlphaFold3 arrived in quick succession. Dates from the model papers.[1, 2, 5, 6, 7, 8, 9, 4, 10, 11, 12, 3, 14, 15]
The second shows the parameter spread, which is wide and partly a fiction of disclosure. Where counts exist, they range from RoseTTAFold2NA at ~67M to ESMFold's ~15B total when paired with the ESM-2 15B language model.[14, 9] Most AlphaFold3-class models simply do not report a number.
Figure 3: Disclosed parameter counts on a log scale. The gaps are the point: most diffusion-era models do not publish a total. Counts from the model papers and repositories.[2, 9, 10, 12, 14]
A note on reading benchmarks: a benchmark number is a hypothesis about real-world performance, not a measurement of it. Several of the "beats AF3" claims below are self-reported by a model's own authors on their own test splits.[12] Treat them as directional.
The accuracy reference points
Here are the headline numbers that anchor the field, each on a different benchmark, which is itself a warning that cross-model comparison is harder than it looks. The metric and the test set stay attached to every claim, because a DockQ on 17 heterodimers and a DockQ on 541 recent-PDB structures[6] are not the same claim.
| Model | Benchmark | Headline result | Source |
|---|---|---|---|
| AlphaFold2 | CASP14 (2020) | median domain GDT_TS 92.4, ~0.96 Å backbone RMSD | [2] |
| AlphaFold-Multimer | 17 template-free heterodimers | medium accuracy (DockQ ≥ 0.49) on 13/17, high (≥ 0.8) on 7/17, vs 9 and 4 for prior SOTA | [5] |
| AlphaFold3 | protein-other-molecule interactions | ~50% improvement over prior methods, ~2x for some categories | [1] |
| Boltz-1 | recent-PDB test, top-1 | mean LDDT ~0.716, LDDT-PLI ~0.580; on par with AF3 and Chai-1 | [6] |
| Boltz-2 | CASP16 affinity, FEP+, MF-PCBA | beat all CASP16 entrants out of the box; Pearson R ~0.66; EF 18.4 at 0.5% | [7] |
| Chai-1 | PoseBusters / multimer DockQ | ~77% ligand success (AF3 ~76%); 69.8% DockQ (AF-Multimer 67.7%) | [8] |
| ESMFold | CAMEO / CASP14 | TM-score ~0.83 / ~0.68; up to 60x faster than MSA pipelines | [9] |
| OmegaFold | antibodies / orphans | RMSD 2.12 Å vs AF2 2.98 Å; orphan TM-score 0.73 vs AF2 0.60 | [10] |
| RoseTTAFold2NA | protein-NA complexes (n=224) | mean lDDT 0.73; 13/14 on held-out vs 1/14 for AF+docking | [14] |
Table 3: Headline benchmarks with metric and test set attached. This is a map of what each paper chose to report, not a leaderboard. Results per the model papers.[1, 2, 5, 6, 7, 8, 9, 10, 14]
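Because several rows lean on DockQ, it helps to see what the number combines. DockQ (Basu and Wallner, 2016) averages the fraction of native contacts (Fnat) with two RMSDs squashed into [0, 1] by fixed scales: 1.5 Å for interface RMSD and 8.5 Å for the ligand-chain RMSD. Thresholds of 0.23, 0.49, and 0.80 separate incorrect, acceptable, medium, and high quality, matching the medium and high cut-offs in the AlphaFold-Multimer row. The sketch assumes you already have Fnat, iRMS, and LRMS from a structural comparison tool; it does not compute them.

```python
def scaled_rms(rms: float, d: float) -> float:
    """Map an RMSD (Å) into (0, 1]; equals 0.5 when rms == d."""
    return 1.0 / (1.0 + (rms / d) ** 2)

def dockq(fnat: float, irms: float, lrms: float) -> float:
    """DockQ = mean of Fnat, scaled interface RMSD (d=1.5 Å), scaled ligand RMSD (d=8.5 Å)."""
    return (fnat + scaled_rms(irms, 1.5) + scaled_rms(lrms, 8.5)) / 3.0

def dockq_class(score: float) -> str:
    """Quality bins used with DockQ."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"
```

The averaging is why a model can post a respectable DockQ with a poor contact fraction if its RMSDs are small, which is one more reason to keep the metric attached to each claim.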
The CASP14 GDT_TS of 92.4 is the one number in this whole field that earned its reputation under genuinely blind conditions; it approached experimental accuracy and was far ahead of every other group.[2] That is validity, not just capability. Everything after is about extending it to complexes, ligands, and nucleic acids, and about doing it under a license you can use.
Two readings of the rest of that table. First, the open AlphaFold3 reimplementations cluster around AlphaFold3, not above it, despite the marketing. Boltz-1's authors are careful: across protein-protein DockQ and protein-ligand metrics, the differences from AlphaFold3 and Chai-1 sit within 95% bootstrap confidence intervals, with AlphaFold3 holding a slight edge on overall mean lDDT attributed to its extra RNA/DNA distillation data.[6] Read those claims as "competitive," not "decisively better." Second, the most sobering number is RoseTTAFold2NA's: across 224 monomeric protein-nucleic-acid complexes the mean lDDT is a respectable 0.73, but only about 29% of predictions reach lDDT above 0.8.[14] Nucleic-acid complex prediction is still hard, and the confidence estimate is how you filter the usable predictions from the rest.
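The "within 95% bootstrap confidence intervals" check can be reproduced on your own targets. Below is a minimal paired percentile bootstrap over per-target scores for two models. Resampling whole targets (not residues) and using the percentile method are assumptions of this sketch; the Boltz-1 authors' exact procedure may differ. If the interval for the mean difference straddles zero, "competitive" is the honest word.

```python
import random
import statistics

def paired_bootstrap_ci(scores_a: list[float], scores_b: list[float],
                        n_boot: int = 10_000, alpha: float = 0.05,
                        seed: int = 0) -> tuple[float, float]:
    """Percentile CI for mean(scores_a - scores_b), resampling targets with replacement."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per target")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Pairing matters: both models are scored on the same targets, so resampling the per-target differences keeps target difficulty from inflating the interval.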
The models, by what you would actually use them for
Single-chain protein folding: AlphaFold2, ESMFold, OmegaFold, OpenFold
If you need a monomer structure and you have homologs, AlphaFold2 is still the open default. Its weights moved to CC BY 4.0 in January 2022, which permits commercial use with attribution, and the AlphaFold Protein Structure Database already holds more than 200 million precomputed structures, so for many targets you do not even run inference.[2] AlphaFold3 has superseded it as the state of the art, but it remains the practical workhorse because AlphaFold3's weights are gated.[2]
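Checking the AlphaFold Protein Structure Database before running inference can be scripted. The sketch assumes the database's public REST endpoint at `https://alphafold.ebi.ac.uk/api/prediction/{accession}`, which returns a JSON list of entries, and assumes each entry carries a `pdbUrl` field; both should be checked against the current API documentation, since file versions and fields have changed over time. Treating HTTP 404 as "no precomputed structure" is also an assumption. Only `fetch_pdb_urls` touches the network.

```python
import json
import urllib.error
import urllib.request

AFDB_API = "https://alphafold.ebi.ac.uk/api/prediction/{accession}"  # assumed endpoint

def prediction_url(accession: str) -> str:
    """API URL for a UniProt accession."""
    return AFDB_API.format(accession=accession.strip().upper())

def pdb_urls(payload: str) -> list[str]:
    """Model download URLs from an API response body (assumed: JSON list with 'pdbUrl')."""
    entries = json.loads(payload)
    return [entry["pdbUrl"] for entry in entries if "pdbUrl" in entry]

def fetch_pdb_urls(accession: str, timeout: float = 30.0) -> list[str]:
    """PDB URLs for an accession, or [] if the database has no entry."""
    try:
        with urllib.request.urlopen(prediction_url(accession), timeout=timeout) as resp:
            return pdb_urls(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumed meaning: nothing precomputed for this accession
            return []
        raise
```

For example, `fetch_pdb_urls("P69905")` would look up human hemoglobin subunit alpha; an empty list is your cue to run AlphaFold2 or OpenFold yourself.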
OpenFold is the trainable, memory-efficient PyTorch reimplementation of AlphaFold2 from the AlQuraishi lab. It matches AF2 accuracy when trained from scratch, ships Apache-2.0 on code, weights, and training data, and exists precisely so you can retrain and fine-tune, which the original AF2 release does not support cleanly.[11] If you need to adapt the architecture rather than just call it, this is the one.
When you lack homologs, MSAs stop helping. ESMFold replaces the alignment step with embeddings from the ESM-2 protein language model (up to 48 layers, 15B parameters), folding directly from a single sequence.[9] It is MIT-licensed, runs up to 60x faster than MSA-based predictors (a 384-residue protein in ~14.2 s on a single V100), and produced the ESM Metagenomic Atlas of more than 617 million predicted structures, over 600 million of them computed in roughly two weeks on about 2000 GPUs.[9] The accuracy trade is real: CAMEO TM-score ~0.83 versus ~0.88 for the full AF2 pipeline.[9] You buy speed and orphan-protein coverage with a few points of accuracy. One trap: the ESMFold weights are MIT, but the successor ESM3 from EvolutionaryScale is non-commercial. Do not conflate them.[9]
OmegaFold is the other single-sequence option, pairing a ~670M-parameter language model (OmegaPLM) with a geometry-focused Geoformer.[10] On antibodies it reported lower error than AlphaFold2 (RMSD 2.12 Å vs 2.98 Å, p=0.0017) and on orphan proteins a higher TM-score (0.73 vs 0.60, p=0.0238).[10] It is Apache-2.0 and commercially usable. The honest caveat: the repository has been dormant since 2022, the preprint was never peer-reviewed, and the niche has largely been taken over by ESMFold.[10] Treat it as a fast MSA-free baseline rather than a maintained tool.
Complexes and all-atom prediction: the AlphaFold3-class fight
This is where the licensing fault line bites hardest, because the accuracy reference and the deployable options diverge.
AlphaFold3 is the reference. It replaced the Evoformer with a Pairformer trunk feeding a diffusion module that predicts raw atom coordinates, and it handles proteins, DNA, RNA, ligands, ions, and modifications in one model.[1] DeepMind reports roughly 50% improvement over prior methods for protein interactions with other molecule types, with accuracy roughly doubled for some categories; a December 2024 Addendum later corrected some reported metrics, which is worth knowing if you cite the original numbers.[1, 16] And you almost certainly cannot use it. The weights are non-commercial, gated behind a discretionary request form, non-redistributable, and barred from training competing models; commercial structure prediction is handled internally by Isomorphic Labs.[1] Its real contribution to most teams is the architecture, which the open ecosystem promptly reimplemented.
Boltz-2 is the model most teams should reach for. It is an AlphaFold3-style architecture extended with a binding affinity module, MIT-licensed on code and weights, and as of mid-2026 the most widely adopted open AlphaFold3 alternative.[7] I will give it its own section below, because affinity prediction is a genuinely different capability from structure.
Boltz-1, its predecessor, was the proof of concept: the first fully open, commercially usable model to approach AlphaFold3-level accuracy.[6] On the recent-PDB test set its top-1 mean LDDT was ~0.716, with protein-protein and protein-ligand metrics on par with AlphaFold3 and Chai-1 inside the bootstrap confidence intervals.[6] It shares AlphaFold3's failure class: hallucinated overlapping or stacked chains in large complexes, which the Boltz-1x variant's inference-time steering largely fixes.[6] For most new work, go straight to Boltz-2.
Chai-1 from Chai Discovery is an AlphaFold3-like model with a twist: it adds a track of embeddings from a ~3B-parameter protein language model to enable strong single-sequence prediction.[8] It edges AlphaFold3 on PoseBusters protein-ligand success (~77% vs ~76%) and exceeds AlphaFold-Multimer 2.3 on multimer DockQ (~69.8% vs ~67.7%), notably even in single-sequence mode.[8] The licensing history is the cautionary tale: it launched in September 2024 under a restrictive non-commercial license and was relicensed to Apache-2.0 around late November 2024.[8] The current weights are commercially usable; verify the LICENSE in the repo for the version you pull, because the history is not uniform.
Protenix, from ByteDance, is a full PyTorch reproduction of AlphaFold3, Apache-2.0 on both code and weights.[12] Protenix-v1 (368M) was described as the first fully open-source model to surpass AlphaFold3 across multiple benchmark sets, and v2 (464M) adds gains on antibody-antigen complexes.[12] That "surpass AF3" claim is self-reported on the developers' own benchmarks, so weight it accordingly, but the transparency is real: this is one of the few models in the field with a published parameter count.[12]
HelixFold3 from Baidu is another full AlphaFold3 reimplementation, this one in PaddlePaddle, and on some axes it is excellent: it surpasses AlphaFold3 on SAbDab antibody-antigen DockQ and reaches over 80% success with five specified epitope residues, with ligand success comparable to AlphaFold3.[4] But the weights are non-commercial. The README claims CC BY-NC-SA while the actual LICENSE file is a custom non-commercial license, and the contradiction is unresolved.[4] Capable, restricted, and ambiguously documented; for commercial work, skip it.
Protein-nucleic-acid and mixed assemblies: the RoseTTAFold lineage
The Baker lab's three-track architecture is the other root of the field, and its descendants specialize.
RoseTTAFold (the original, Science 2021) approached AlphaFold2-level accuracy on CASP14 while being far cheaper, and folds a single structure in roughly ten minutes on a gaming PC.[3] The licensing is version-dependent, and this is the trap: the original v1 trained weights are under the non-commercial Rosetta-DL license, while RoseTTAFold2 (MIT) and RoseTTAFold All-Atom (BSD) are permissive.[3] If you cite "RoseTTAFold is open," specify the version.
RoseTTAFold All-Atom introduced an atom-bond graph representation that treats ligands, ions, and modifications as flexible atoms and bonds rather than residue templates, enabling covalent modifications and mixed assemblies.[13] On the CAMEO blind docking benchmark it outperforms classical docking tools such as AutoDock Vina while preserving correct bond lengths, ring planarity, and chirality.[13] The model code and weights are BSD, but as the openness matrix shows, the default pipeline drags in non-commercial template data and an academically licensed dependency.[13]
RoseTTAFold2NA is the focused, lightweight choice for protein-nucleic-acid complexes: ~67M parameters, pure MIT, no weights-level trap.[14] It was trained on a deliberately small set, about 26,128 protein clusters, 1,632 RNA clusters, and 1,556 protein-nucleic-acid complex clusters, and on a 14-case held-out test it correctly identified 13/14 complexes against 1/14 for AlphaFold-plus-docking baselines.[14] Be honest about its ceiling, though: only about 29 to 30% of its complex predictions reach lDDT above 0.8, so the confidence estimate is not optional; it is how you filter usable models from the rest.[14]
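Whether the confidence estimate actually works as a filter is testable on any benchmark where you know the answers. The sketch below takes hypothetical pairs of (predicted confidence, observed lDDT) and reports, for a chosen confidence cut-off, how many predictions survive and what fraction of the survivors clear the lDDT 0.8 bar used above, next to the unfiltered base rate. The input format and any cut-off you pass are assumptions; calibrate them on complexes resembling your own.

```python
def filter_report(pairs: list[tuple[float, float]], conf_cutoff: float,
                  good_lddt: float = 0.8) -> dict[str, float]:
    """Does keeping only confident predictions raise the share of good models?"""
    if not pairs:
        raise ValueError("need at least one (confidence, observed lDDT) pair")
    kept = [observed for conf, observed in pairs if conf >= conf_cutoff]
    base_rate = sum(observed > good_lddt for _, observed in pairs) / len(pairs)
    precision = sum(observed > good_lddt for observed in kept) / len(kept) if kept else 0.0
    return {
        "kept_fraction": len(kept) / len(pairs),
        "base_rate": base_rate,
        "precision_after_filter": precision,
    }
```

If `precision_after_filter` is barely above `base_rate`, the confidence score is not separating good models from bad ones on your data, whatever it does on the paper's benchmark.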
The historical baseline: trRosetta
trRosetta (2020) predates the end-to-end era. It is a 2D dilated ResNet that predicts inter-residue distances and orientations, then hands those restraints to Rosetta energy minimization.[15] It substantially outperformed the prior leading methods on CASP13 and CAMEO in 2019 and 2020.[15] It is foundational and historical now, superseded by RoseTTAFold and AlphaFold2, and its full pipeline carries the Rosetta commercial-license gate.[15] Use it to understand the lineage, not to fold proteins in 2026.
Boltz-2: structure plus affinity
Boltz-2 deserves its own section because it does something the others do not. It is an AlphaFold3-style diffusion structure model extended with an affinity module that predicts protein-small-molecule binding, both binary binder classification and continuous regression.[7] The reported result should make a drug-discovery team pay attention: it is described as the first AI model to approach free-energy perturbation (FEP) accuracy while being at least 1000x cheaper.[7] On the CASP16 affinity challenge of 140 protein-ligand pairs it outperformed all participants out of the box, on a FEP+ four-target subset it reached Pearson R ~0.66, and for hit discovery it reports an enrichment factor of 18.4 at 0.5% on MF-PCBA.[7] If your question is "will this compound bind, and how tightly," Boltz-2 is the only model in this survey that answers it directly.[7]
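An enrichment factor such as "18.4 at 0.5%" answers a screening question: among the top-scoring fraction of a library, how much richer in true binders is the list than the library as a whole? The sketch computes EF from model scores and binary activity labels under that standard definition (hit rate in the top fraction divided by the overall hit rate). Rounding the top-fraction size to the nearest whole compound, with a floor of one, is a choice made here; MF-PCBA's exact evaluation protocol may differ.

```python
def enrichment_factor(scores: list[float], is_active: list[bool], fraction: float) -> float:
    """EF at a top fraction: hit rate in the top-ranked slice / hit rate overall."""
    if len(scores) != len(is_active):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("no actives, so EF is undefined")
    n_top = max(1, round(fraction * n_total))  # at least one compound in the slice
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    return (actives_in_top / n_top) / (n_actives / n_total)
```

A random ranking gives an EF near 1, and the ceiling at a given fraction is set by how rare actives are, so compare EFs only across libraries with similar active rates.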
Keep the validity ledger open here too. Boltz-2's structure accuracy is modestly behind AlphaFold3 on the hardest complexes and antibody-antigen cases, and the authors say so plainly.[7] Affinity performance varies widely across assay types and protein families (it is weaker on GPCRs, for instance), and the authors say the applicability domain needs further study.[7] It also fails to capture large binding-induced conformational changes.[7] None of that diminishes the achievement; it bounds it. As of mid-2026 it is the leading openly licensed AlphaFold3 alternative with affinity prediction, with no successor announced.[7]
A decision procedure
Strip away the leaderboards and the choice is mostly mechanical.
Figure 4: A task-and-license decision tree. The license branch is decisive for anyone shipping a product. Per-model licenses from Table 2.[1, 2, 5, 6, 7, 8, 4, 11, 12, 3, 14]
In prose, the same logic:
Single-chain protein, homologs available, fully open: AlphaFold2, or check the AlphaFold DB first since your structure may already be computed.[2] Need to retrain or fine-tune? OpenFold.[11]
Single-chain protein, orphan or antibody, or you need speed at scale: ESMFold (MIT, fastest, metagenome-scale).[9] OmegaFold if you specifically want its antibody numbers and accept an unmaintained repo.[10]
Complexes, ligands, DNA/RNA, and you need to ship commercially: Boltz-2 for structure plus affinity, Chai-1 or Protenix as alternatives. All three are permissively licensed.[7, 8, 12]
Protein-nucleic-acid complexes specifically, lightweight and MIT: RoseTTAFold2NA, with the confidence filter applied.[14]
Non-commercial research only, want the accuracy ceiling: AlphaFold3, via the gated request form, or the free AlphaFold Server for small jobs.[1]
Need binding affinity, not just structure: Boltz-2 is currently the only open model in this table that predicts it.[7]
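The same routing can be written as a small function, which is one way to make the license branch explicit inside an internal tool. This is a hypothetical helper that mirrors the lines above and nothing more: the task labels are invented for the sketch, and the returned names are suggestions to verify against the current licenses, not a ranking.

```python
def suggest_models(task: str, commercial: bool, has_homologs: bool = True) -> list[str]:
    """Mirror of the task-and-license decision list; task labels are hypothetical."""
    if task == "affinity":
        return ["Boltz-2"]  # the only open model here that predicts binding affinity
    if task == "monomer":
        if has_homologs:
            return ["AlphaFold DB lookup", "AlphaFold2", "OpenFold (to retrain)"]
        return ["ESMFold", "OmegaFold (unmaintained)"]
    if task == "protein-nucleic-acid":
        return ["RoseTTAFold2NA (filter by confidence)", "Boltz-2"]
    if task == "all-atom complex":
        open_models = ["Boltz-2", "Chai-1", "Protenix"]
        # The gated, non-commercial reference is only an option off the commercial path.
        return open_models if commercial else ["AlphaFold3 (gated)"] + open_models
    raise ValueError(f"unknown task: {task!r}")
```

Keeping the `commercial` flag as an explicit argument means nobody can reach the AlphaFold3 branch by accident from a product code path.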
What is not known, stated plainly
The honest gaps are large enough to change decisions.
Parameter counts are mostly undisclosed. AlphaFold2's ~93M is inferred, not published. AlphaFold-Multimer, AlphaFold3, Boltz-1, Boltz-2, Chai-1 (beyond its ~3B language-model track), the original RoseTTAFold, and RoseTTAFold All-Atom report no total; RoseTTAFold2NA's ~67M is the lone published figure in its family.[2, 5, 1, 6, 7, 8, 3, 13, 14] This is not a minor omission; it makes inference-cost and hardware planning a matter of measurement rather than reading a spec sheet.
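When a paper omits the parameter count, you can recover it from the released checkpoint by summing tensor sizes, then turn it into a first estimate of weight memory. The sketch works on a plain mapping from tensor name to shape, so it does not depend on any framework; how you obtain the shapes (for example, by loading the checkpoint in the framework it ships in) is left open. Weight memory is only a floor: activations, especially pair representations that grow with the square of the number of tokens, usually dominate at inference.

```python
import math

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}

def count_params(shapes: dict[str, tuple[int, ...]]) -> int:
    """Total parameters from tensor shapes; a scalar shape () counts as one."""
    return sum(math.prod(shape) for shape in shapes.values())

def weight_memory_gb(n_params: int, dtype: str = "fp32") -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9
```

Applied to a published figure, Protenix-v1's 368M parameters come to roughly 0.74 GB of weights in bf16 and twice that in fp32, which tells you little about peak memory on a large complex but rules out nothing on a modern GPU.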
Benchmark superiority claims over AlphaFold3 are frequently self-reported. Protenix's "first to surpass AF3" and several HelixFold3 wins come from the developers' own evaluations on their own splits.[12, 4] Independent, held-out evaluation against an honest baseline is a standard the field has not uniformly met. Always beat a trivial baseline first, and trust independent reproductions over author claims.
Training-data composition is partially disclosed at best, and cutoffs are getting stale. AlphaFold3, Chai-1, HelixFold3, and RoseTTAFold All-Atom all describe their data only partially.[1, 8, 4, 13] Cutoffs also differ in ways that affect fair comparison: AlphaFold3, Boltz-1, and HelixFold3 train to a 2021-09-30 PDB cutoff, while Boltz-2 trains to 2023-06-01, so a like-for-like comparison has to control for what each model could have seen.[1, 6, 4, 7] When you cannot see the training set, you cannot fully reason about leakage into a test set.
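Controlling for training cutoffs is mostly date arithmetic: keep only test structures released after the latest cutoff among the models you are comparing. The cutoff dates below are the ones quoted in the paragraph above. The test-entry format (an identifier mapped to a release date) is a hypothetical stand-in for whatever metadata your benchmark carries, and a release-date filter alone does not catch near-duplicate sequences released later, which need a separate similarity filter.

```python
from datetime import date

# PDB training cutoffs as quoted in this section.
CUTOFFS = {
    "AlphaFold3": date(2021, 9, 30),
    "Boltz-1": date(2021, 9, 30),
    "HelixFold3": date(2021, 9, 30),
    "Boltz-2": date(2023, 6, 1),
}

def fair_test_set(entries: dict[str, date], models: list[str]) -> list[str]:
    """IDs released strictly after every compared model's training cutoff."""
    latest = max(CUTOFFS[model] for model in models)
    return sorted(pid for pid, released in entries.items() if released > latest)
```

Adding Boltz-2 to a comparison pushes the shared cutoff forward by almost two years, which shrinks the usable test set; that shrinkage is the price of a like-for-like number.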
License documentation is sometimes self-contradictory. HelixFold3's README and LICENSE disagree.[4] RoseTTAFold's licensing varies by version in ways that headlines flatten.[3] Permissive code does not imply permissive weights, and permissive weights do not imply a permissive pipeline. Read all three.
The bottom line
The field's center of gravity has moved. The capability frontier is the gated AlphaFold3, but the deployment frontier is a cluster of open AlphaFold3 reimplementations (Boltz, Chai, and Protenix) that reproduce the architecture without the lock.[1, 6, 7, 8, 12] For a practitioner, that is the good news buried under the licensing problem: the most useful models in this table are also the ones you are allowed to run.
Three things you can act on today. Use the open AlphaFold3 reproductions as your default for commercial all-atom work, with Boltz-2 for structure plus affinity.[7] Match the model to the MSA you actually have: AlphaFold2 with deep alignments, ESMFold without.[2, 9] And treat every confidence score as a hypothesis, because AlphaFold-Multimer's confident wrong interfaces, AlphaFold3's and Boltz-1's stacked-chain hallucinations, and RoseTTAFold2NA's roughly 70% of complexes below the lDDT 0.8 line are not edge cases; they are the working conditions.[5, 1, 6, 14]
Pick by task and license first. Read the LICENSE file rather than the README. Let the benchmarks break ties. The model that wins your benchmark is not always the model you are allowed to ship.
References
1. Abramson, Adler, Dunger et al., "Accurate structure prediction of biomolecular interactions with AlphaFold 3," Nature 630, 493-500 (2024). https://www.nature.com/articles/s41586-024-07487-w
2. Jumper et al., "Highly accurate protein structure prediction with AlphaFold," Nature 596, 583-589 (2021). https://doi.org/10.1038/s41586-021-03819-2
3. Baek M. et al., "Accurate prediction of protein structures and interactions using a three-track neural network," Science 373, 871-876 (2021). https://www.science.org/doi/10.1126/science.abj8754
4. PaddleHelix Team, Baidu, "Technical Report of HelixFold3 for Biomolecular Structure Prediction," arXiv:2408.16975 (2024). https://arxiv.org/abs/2408.16975
5. Evans et al., "Protein complex prediction with AlphaFold-Multimer," bioRxiv (2021, updated 2022), DOI: 10.1101/2021.10.04.463034. https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2
6. Wohlwend, Corso, Passaro et al., "Boltz-1: Democratizing Biomolecular Interaction Modeling," bioRxiv (2024), DOI: 10.1101/2024.11.19.624167. https://www.biorxiv.org/content/10.1101/2024.11.19.624167
7. Passaro, Corso, Wohlwend et al., "Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction," bioRxiv (2025), DOI: 10.1101/2025.06.14.659707. https://www.biorxiv.org/content/10.1101/2025.06.14.659707v1
8. Chai Discovery team, "Chai-1: Decoding the molecular interactions of life," bioRxiv (2024), DOI: 10.1101/2024.10.10.615955. https://doi.org/10.1101/2024.10.10.615955
9. Lin et al., "Evolutionary-scale prediction of atomic-level protein structure with a language model," Science 379, 1123-1130 (2023). https://www.science.org/doi/10.1126/science.ade2574
10. Wu et al., "High-resolution de novo structure prediction from primary sequence," bioRxiv (2022), DOI: 10.1101/2022.07.21.500999. https://doi.org/10.1101/2022.07.21.500999
11. Ahdritz, G. et al., "OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization," Nature Methods 21, 1514-1524 (2024). https://www.nature.com/articles/s41592-024-02272-z
12. ByteDance Protenix Team, "Protenix - Advancing Structure Prediction Through a Comprehensive AlphaFold3 Reproduction," bioRxiv (2025). https://www.biorxiv.org/content/10.1101/2025.01.08.631967v1
13. Krishna R., Wang J., Ahern W. et al., "Generalized biomolecular modeling and design with RoseTTAFold All-Atom," Science 384, eadl2528 (2024). https://www.science.org/doi/10.1126/science.adl2528
14. Baek, McHugh, Anishchenko, Jiang, Baker, DiMaio, "Accurate prediction of protein-nucleic acid complexes using RoseTTAFoldNA," Nature Methods (2023). https://www.nature.com/articles/s41592-023-02086-5
15. Yang J., Anishchenko I., Park H., Peng Z., Ovchinnikov S., Baker D., "Improved protein structure prediction using predicted interresidue orientations," PNAS 117, 1496-1503 (2020). https://doi.org/10.1073/pnas.1914677117
16. Addendum to "Accurate structure prediction of biomolecular interactions with AlphaFold 3," Nature (2024). https://www.nature.com/articles/s41586-024-08416-7