Which open model is the best AlphaFold3 alternative for commercial work?

Boltz-2, Chai-1, and Protenix are the three open AlphaFold3-class models with permissive licenses. Boltz-1 and Boltz-2 are MIT-licensed for both code and weights, and Boltz-2 adds binding-affinity prediction; Chai-1 and Protenix are Apache-2.0. All approach AlphaFold3-level accuracy without the non-commercial restriction on AlphaFold3's weights.

Do I still need MSAs in 2025?

It depends on the target. MSA-based models like AlphaFold2 remain the most accurate for single chains with deep homolog coverage. For orphan proteins, antibodies, or proteome-scale jobs where building alignments is the bottleneck, single-sequence models like ESMFold trade some accuracy for large speedups.

What if I need protein-nucleic-acid structures but not a heavy all-atom model?

RoseTTAFold2NA is a roughly 67-million-parameter, MIT-licensed model purpose-built for protein-DNA, protein-RNA, and RNA-only complexes. It is far lighter than AlphaFold3-class models, though accuracy is uneven and you should filter by its confidence estimate.

Which model predicts binding affinity, not just structure?

Boltz-2 is the first openly licensed model to jointly predict structure and protein-small-molecule binding affinity. It reports approaching free-energy perturbation accuracy at more than 1000x lower cost, and it is MIT-licensed for commercial use.

Why can't I use AlphaFold3 weights in a commercial pipeline?

AlphaFold3's source code is CC BY-NC-SA 4.0, and the model weights are released under a separate custom non-commercial, research-only Terms of Use. Researchers must request them through a Google form, with access granted at DeepMind's sole discretion, and may not redistribute them or use outputs to train competing models. Commercial structure prediction is handled internally by Isomorphic Labs.

Fifteen Ways to Fold a Protein

The most capable biomolecular structure predictor available, AlphaFold3, ships its model weights under a custom non-commercial license, and you cannot simply download them. Academic and non-commercial researchers must request the parameters through a Google form, with access granted at DeepMind's sole discretion, no redistribution permitted, and outputs barred from training competing models.¹ The model reported at least a 50% improvement over prior methods for predicting how proteins interact with other molecule types, with accuracy roughly doubled for some interaction categories,¹ yet most teams still cannot put it into anything they ship.

The reversal is sharper when you look at where the field came from. In January 2022, DeepMind relicensed the AlphaFold2 weights from non-commercial to CC BY 4.0, opening them to commercial use with attribution.² More than two years later, when AlphaFold3 shipped, the weights went the other way: research-only, gated, no redistribution.¹ One model was relicensed into the open. Its successor launched behind a gate.

So the question that actually matters when you sit down to fold something is not "what is the best model." It is "what am I folding, and what can I ship." Those two questions have different answers, and the gap between them is where most teams waste a quarter.

This is a survey of fifteen protein structure prediction foundation models, organized the way you would actually pick one: by task first, then by license. The set is AlphaFold2, AlphaFold-Multimer, AlphaFold3, Boltz-1, Boltz-2, Chai-1, ESMFold, HelixFold3, OmegaFold, OpenFold, Protenix, RoseTTAFold, RoseTTAFold All-Atom, RoseTTAFold2NA, and trRosetta. Where a lineage has a later successor that is not yet a standalone, evaluated release, such as OpenFold3 under the OpenFold project, it is mentioned as context rather than counted. Underneath the choices sit two real stories: an architectural one, the move from the Evoformer-era alignment models of 2020 through 2022 to the diffusion-era all-atom models of 2024 onward, and a licensing one, the split between AlphaFold2's open weights and AlphaFold3's gated ones. Both matter, but they are supporting context. The decision is the spine.

The master comparison

Here is every model in scope, with the facts that drive a decision. Each model is introduced once here; the rest of the article refers back to this table.

Model	Released	Params	Modality	License (weights)	Commercial?
AlphaFold2	2021²	~93M²	Single chain (MSA)	CC BY 4.0²	Yes
AlphaFold-Multimer	2021.10³	undisclosed³	Protein-protein	CC BY 4.0³	Yes
AlphaFold3	2024.05¹	undisclosed¹	All-atom	Custom non-commercial¹	No
ESMFold	2022⁴	~15B total⁴	Single chain (no MSA)	MIT⁴	Yes
OmegaFold	2022.07⁵	~670M (PLM)⁵	Single chain (no MSA)	Apache-2.0⁵	Yes
OpenFold	2022⁶	~93M⁶	Single chain (MSA)	Apache-2.0⁶	Yes
Boltz-1	2024.11⁷	undisclosed⁷	All-atom	MIT⁷	Yes
Boltz-2	2025.06⁸	undisclosed⁸	All-atom + affinity	MIT⁸	Yes
Chai-1	2024.09⁹	undisclosed⁹	All-atom	Apache-2.0⁹	Yes
Protenix	2025.01¹⁰	368M (v1)¹⁰	All-atom	Apache-2.0¹⁰	Yes
HelixFold3	2024.08¹¹	undisclosed¹¹	All-atom	Custom non-commercial¹¹	No
RoseTTAFold	2021.07¹²	undisclosed¹²	Single chain + PPI	v1 non-commercial¹²	v1 no
RoseTTAFold All-Atom	2023.10¹³	undisclosed¹³	All-atom	BSD (caveats)¹³	Mostly
RoseTTAFold2NA	2022.09¹⁴	~67M¹⁴	Protein-nucleic acid	MIT¹⁴	Yes
trRosetta / trRosetta2	2020¹⁵	undisclosed¹⁵	Single chain (MSA)	MIT code¹⁵	Code yes

Two columns deserve your attention before any benchmark does: license and modality. A model that wins every benchmark and that you cannot deploy is, for a commercial team, not a candidate. Keep that filter active while reading.

The decision, by task

Start with what is on your screen. The honest decision tree has six branches, and the license warning lives at the branch tips, not as a footnote.

Decision tree for choosing a structure prediction model by task and license, warning on the non-commercial models. Decision logic synthesized from the per-model facts; license and capability claims for each leaf are cited in the corresponding section below.¹,²,³,⁴,⁸,⁹,¹¹,¹⁴

Single chain, deep MSA available

This is the classic case and the one with the clearest answer. If your target has many homologs and you can afford to build a multiple sequence alignment, AlphaFold2 remains the workhorse. On CASP14 it reached a median backbone accuracy of about 0.96 angstrom RMSD on the C-alpha atoms and a median domain GDT_TS of 92.4 out of 100, far ahead of every other group and approaching experimental accuracy.² Its Evoformer trunk is a stack of 48 blocks using attention over an MSA representation and a residue-residue pair representation.² The weights moved from non-commercial to CC BY 4.0 in January 2022, so commercial use is allowed with attribution.²
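For readers less familiar with the two metrics quoted above, here is a minimal sketch of how they are computed from per-residue C-alpha deviations. It assumes the predicted and experimental structures are already superimposed and paired residue by residue; the official GDT_TS procedure searches over many superpositions for each cutoff, so this fixed-superposition version can only match or undershoot it. The function names and the toy deviations are illustrative and not taken from any CASP tooling.

```python
import math

def rmsd(deviations):
    """Root-mean-square deviation over per-residue C-alpha distances (angstrom)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

def gdt_ts(deviations, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average, over the four cutoffs, of the percentage of residues within each cutoff."""
    n = len(deviations)
    fractions = [sum(d <= c for d in deviations) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy per-residue deviations in angstrom (made up for illustration).
toy = [0.4, 0.7, 0.9, 1.5, 3.0, 9.0]
print(f"RMSD {rmsd(toy):.2f} A, GDT_TS {gdt_ts(toy):.1f}")
```

Note how the single 9 angstrom outlier dominates the RMSD while barely moving GDT_TS, which is one reason the two numbers are usually reported together.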

If you need to retrain, fine-tune, or audit the pipeline rather than just run it, reach for OpenFold instead. It is a trainable, memory-efficient PyTorch reimplementation of AlphaFold2 that matches AF2 accuracy on CAMEO and CASP-style validation when trained from scratch.⁶ It is Apache-2.0 for both code and weights, with no non-commercial restriction.⁶ OpenFold's training set, OpenProteinSet, ships about 400,000 MSAs and template hit files on the AWS open-data registry, which is why it is the model people actually fork.⁶ Its successor, OpenFold3, extends the same open posture to AF3-style cofolding, but as a project still maturing toward a fully evaluated release it sits outside the fifteen here.

trRosetta is the historical root of this branch and the honest "do not use this today" entry. It predicts inter-residue distances and orientation angles with a dilated ResNet of about 61 residual blocks, then folds with Rosetta energy minimization.¹⁵ It beat the AF1-era contact methods in 2019 and 2020,¹⁵ but it is two-stage rather than end-to-end and is superseded by RoseTTAFold and AlphaFold2.¹⁵ Worth knowing the lineage; not worth running.

Single chain, no MSA: orphans, antibodies, proteome scale

When alignment building is the bottleneck, or when your target has few homologs, the MSA itself becomes the problem rather than the solution. To see why that matters, look at what the alignment costs. A standard pipeline builds its MSA from UniRef30 at about 46 GB, BFD at about 272 GB, plus PDB100 structure templates at more than 100 GB, then runs a search over all of it before the network ever sees the sequence.¹² RoseTTAFold folds a single structure in roughly ten minutes on one GPU once that alignment exists.¹² The alignment is the slow part. Two models drop it entirely.
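To make the storage cost concrete, the sketch below adds up the database sizes quoted above and checks whether a volume has room for them. The sizes are the approximate figures from this paragraph, the template set is only bounded from below, and the helper names are hypothetical; it is a pre-flight check, not part of any model's installer.

```python
import shutil

# Approximate sizes in GB, as quoted in the text above.
DATABASES_GB = {"UniRef30": 46, "BFD": 272, "PDB100 templates": 100}

def required_gb(databases=DATABASES_GB):
    """Lower bound on disk needed; the template set is 'more than 100 GB'."""
    return sum(databases.values())

def has_room(path=".", databases=DATABASES_GB):
    """True if the volume holding `path` has at least the required free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb(databases)

print(f"Need at least {required_gb()} GB; enough free here: {has_room()}")
```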

ESMFold is the one most people reach for. It replaces the MSA input with embeddings from a pretrained ESM-2 protein language model of up to 48 layers and 15 billion parameters.⁴ The folding head itself is only about 90 to 94 million parameters; the bulk is the language model.⁴ It folds a 384-residue protein in about 14.2 seconds on a single V100, up to 60 times faster than MSA-based predictors,⁴ and the team used it to predict more than 617 million metagenomic structures.⁴ The catch is accuracy: CAMEO average TM-score is about 0.83 and CASP14 about 0.68, below the full AlphaFold2 pipeline's roughly 0.88 and 0.85.⁴ It is MIT-licensed, including the weights.⁴

OmegaFold targets the same niche with a roughly 670-million-parameter language model called OmegaPLM feeding a geometry-aware transformer.⁵ On antibodies it reported RMSD of 2.12 angstrom versus AlphaFold2's 2.98 (p=0.0017), and on orphan proteins a TM-score of 0.73 versus 0.60 (p=0.0238).⁵ It is Apache-2.0,⁵ and runs about 10x faster than MSA-based pipelines.⁵ The caveat is maintenance: the repository has been dormant since 2022 and the niche has largely shifted to ESMFold.⁵ Treat it as a fast MSA-free baseline, not an actively developed tool.

The numbers make the trade concrete.

Model	CAMEO TM-score	CASP14 TM-score	Speed vs MSA pipelines	Orphan / antibody evidence
AlphaFold2 (full MSA)	~0.88⁴	~0.85⁴	baseline (alignment-heavy)²	weaker on orphans (shallow MSA)²
ESMFold (single-seq)	~0.83⁴	~0.68⁴	up to 60x faster⁴	folds orphans by design⁴
OmegaFold (single-seq)	not reported⁵	not reported⁵	~10x faster⁵	orphan TM 0.73 vs 0.60; antibody RMSD 2.12 vs 2.98⁵

Benchmark figures as reported by each model's primary publication; CAMEO and CASP14 are standard blind-assessment sets, and cross-paper comparisons should be read with care because evaluation sets and dates differ.
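A back-of-the-envelope calculation shows what the trade looks like at scale. The sketch below uses only the figures quoted above: ESMFold's 14.2 seconds per 384-residue protein on one V100, and the reported TM-scores. The 20,000-sequence proteome and the assumption that every sequence costs the same as a 384-residue one are hypothetical simplifications; real runtime grows with sequence length.

```python
# Figures quoted in the text and table above.
SECONDS_PER_PROTEIN = 14.2  # ESMFold, 384 residues, single V100
TM_SCORES = {
    "AlphaFold2": {"CAMEO": 0.88, "CASP14": 0.85},
    "ESMFold": {"CAMEO": 0.83, "CASP14": 0.68},
}

def wall_clock_hours(n_proteins, seconds_each=SECONDS_PER_PROTEIN, n_gpus=1):
    """Hours to fold n_proteins, assuming uniform cost and perfect scaling across GPUs."""
    return n_proteins * seconds_each / 3600 / n_gpus

def tm_gap(benchmark):
    """TM-score given up by dropping the MSA, per the reported numbers."""
    return round(TM_SCORES["AlphaFold2"][benchmark] - TM_SCORES["ESMFold"][benchmark], 2)

# Hypothetical proteome of 20,000 sequences.
print(f"{wall_clock_hours(20_000):.1f} hours on one GPU")
print(f"TM-score gap: CAMEO {tm_gap('CAMEO')}, CASP14 {tm_gap('CASP14')}")
```

The gap is small on CAMEO and large on CASP14, which is the numerical form of the point made below: the single-sequence trade is cheap on easier targets and expensive on hard ones.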

The practical read on this branch: single-sequence models do not beat a deep MSA on a well-studied family. They win when the alignment is shallow or absent, when you fold a whole proteome, or when latency is the constraint. That is a narrower claim than the hype around them suggested, and it is the honest one. Use ESMFold for the permissive license and the speed; keep OmegaFold as a fallback if you specifically want its antibody and orphan numbers and accept an unmaintained repo.

Protein-protein complexes

For protein-only complexes with known stoichiometry, AlphaFold-Multimer is the open default. It is the AlphaFold2 Evoformer and structure module retrained for multimeric inputs.³ On a benchmark of 17 template-free heterodimers it reached at least medium accuracy (DockQ at or above 0.49) on 13 of 17 targets and high accuracy (DockQ at or above 0.8) on 7 of 17, versus 9 and 4 for the prior state of the art.³ Its weights are CC BY 4.0, so it is commercially usable.³ Be explicit about its failure modes: it requires known stoichiometry, struggles with antibody-antigen and weak transient interactions, and can produce confident but wrong interfaces.³
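The DockQ thresholds in that result map onto the conventional quality bands, which the sketch below encodes. The 0.49 and 0.8 cutoffs are the ones quoted above; the 0.23 lower band is the conventional "acceptable" cutoff, the same one the Boltz-1 figures later in this article use. The scores in the example are made up for illustration.

```python
def dockq_class(score):
    """Map a DockQ score in [0, 1] to its conventional quality band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("DockQ is defined on [0, 1]")
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"

def summarize(scores):
    """Return (targets at least medium, targets high), the two counts the benchmark reports."""
    classes = [dockq_class(s) for s in scores]
    at_least_medium = sum(c in ("medium", "high") for c in classes)
    return at_least_medium, classes.count("high")

# Hypothetical per-target scores, not from any paper.
toy_scores = [0.91, 0.85, 0.62, 0.50, 0.31, 0.10]
print(summarize(toy_scores))
```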

If you want stronger multimer numbers and can run a single-sequence mode, Chai-1 reports a DockQ acceptable rate of about 69.8%, exceeding AlphaFold-Multimer 2.3's roughly 67.7%, and does so even without MSAs.⁹ More on Chai-1 in the all-atom branch; it doubles as a strong complex predictor.

All-atom: protein plus nucleic acid plus ligand

This is the branch AlphaFold3 defined and the one where the open ecosystem caught up fastest. The task is a single model that folds proteins, DNA, RNA, small-molecule ligands, ions, and modifications together.

AlphaFold3 sets the accuracy bar. It replaced the Evoformer with a Pairformer trunk of 48 blocks feeding a diffusion module that predicts raw atom coordinates directly.¹ Its protein-ligand accuracy exceeds classical docking on PoseBusters without input pose information, its protein-nucleic-acid accuracy beats nucleic-acid-specific predictors, and its antibody-antigen accuracy markedly exceeds AlphaFold-Multimer.¹ The headline doubling of accuracy on some interaction categories is real.¹ The license is the problem. The code is CC BY-NC-SA 4.0; the weights sit under a custom non-commercial, research-only Terms of Use, must be requested from Google, and cannot be redistributed, and outputs cannot be used to train competing models.¹ For a commercial team, AlphaFold3 is a reference, not a dependency.

That gap is exactly why open reimplementations exist, and they are good.

Two eras of structure prediction: the Evoformer/MSA era and the diffusion/all-atom era, nodes colored by license. Lineage and license coloring synthesized from the per-model architecture and license facts cited throughout this article.¹,²,⁷,⁹,¹¹,¹²,¹³

Boltz-1 was the first fully open, commercially usable model to approach AlphaFold3-level accuracy.⁷ It uses the AlphaFold3 architecture, an MSA module and Pairformer trunk feeding a diffusion module, and trains on PDB structures released before 30 September 2021, the same cutoff as AlphaFold3.⁷ On a recent-PDB test set its top-1 results were a mean LDDT of about 0.716, a fraction of about 0.625 of interfaces with DockQ above 0.23, and a fraction of about 0.545 of ligands with RMSD below 2 angstrom, on par with AlphaFold3 and Chai-1 within bootstrap confidence intervals.⁷ It is MIT-licensed for both code and weights.⁷ That single licensing fact is why Boltz spread so fast.

Boltz-2 extends the lineage and is the current open state of the art.⁸ Beyond structure, it adds a binding-affinity module with dual heads for binary binding likelihood and continuous affinity regression.⁸ Its authors report it as the first AI model to approach free-energy perturbation accuracy while being at least 1000 times cheaper, with a Pearson R of about 0.66 on the FEP+ 4-target subset and an out-of-the-box win on the CASP16 affinity challenge across 140 protein-ligand pairs.⁸ On hit discovery it reports an enrichment factor of 18.4 at 0.5% on MF-PCBA.⁸ It is MIT-licensed for model, weights, and code.⁸ If your question is "will this molecule bind, and how tightly," this is the open answer.
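An enrichment factor such as the 18.4 at 0.5% quoted above measures how much denser the true binders are in the top-ranked slice of a screen than in the library as a whole. The sketch below computes it under the usual definition; the library, the scores, and the active positions are invented for illustration and have nothing to do with MF-PCBA or Boltz-2's actual outputs.

```python
def enrichment_factor(scores, is_active, fraction):
    """Hit rate in the top `fraction` of ranked compounds divided by the overall hit rate."""
    n = len(scores)
    n_top = max(1, round(fraction * n))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(active for _, active in ranked[:n_top])
    hits_all = sum(is_active)
    if hits_all == 0:
        raise ValueError("no actives in the library; enrichment is undefined")
    return (hits_top / n_top) / (hits_all / n)

# Hypothetical screen: 1000 compounds, 10 actives, higher score = predicted tighter binder.
active_positions = {2, 50, 120, 300, 410, 520, 640, 750, 860, 990}
scores = [1000 - i for i in range(1000)]
is_active = [i in active_positions for i in range(1000)]

ef = enrichment_factor(scores, is_active, fraction=0.005)
print(f"EF at 0.5%: {ef:.1f}")
```

In this toy screen one of the five top-ranked compounds is active against a 1% base rate, so the top slice is twenty times richer in actives than a random pick would be.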

Chai-1 is the third open AlphaFold3-class model, Apache-2.0 for code and weights.⁹ On PoseBusters protein-ligand it reports about 77% success, edging AlphaFold3's roughly 76%,⁹ and a CASP15 monomer C-alpha LDDT of about 0.849, above ESM3-98B's roughly 0.801.⁹ Its trick for strong single-sequence prediction is an extra input track of embeddings from a roughly 3-billion-parameter protein language model.⁹ Note the licensing history: at its September 2024 launch the weights were non-commercial, and they were relicensed to Apache-2.0 in late November 2024, so verify the LICENSE in the repo for the version you pull.⁹

Protenix, from ByteDance, is an Apache-2.0 reproduction of AlphaFold3 with a disclosed parameter count: 368 million for v1 base.¹⁰ It is described as the first fully open-source model to surpass AlphaFold3 across multiple benchmark sets, with that superiority self-reported by the developers.¹⁰ It uses the AF3 pipeline of MSA and template embeddings, a Pairformer trunk, and a diffusion module.¹⁰ Default checkpoints use the 2021-09-30 cutoff to match AF3's training window.¹⁰

HelixFold3, from Baidu, is the open reimplementation to watch the license on. The architecture is an AlphaFold3-style Pairformer-plus-diffusion stack in PaddlePaddle.¹¹ On antibody-antigen it surpasses AF3 in DockQ and success rate, exceeding 80% success with five specified epitope residues, and on PoseBusters ligands it is comparable to AF3 with over 90% stereochemistry pass rates.¹¹ But the license is non-commercial: the README cites CC BY-NC-SA while the actual LICENSE file is a custom non-commercial license, an unresolved discrepancy, with commercial use requiring a separate paid Baidu offering.¹¹ Strong model, restricted license. The same shape as AlphaFold3.

The Baker lab's RoseTTAFold All-Atom belongs in this branch too. It extends the three-track RoseTTAFold design with an atom-bond graph representation that treats ligands, ions, and modifications as flexible atoms and bonds rather than residue templates.¹³ On the CAMEO blind protein-ligand docking benchmark it outperforms classical docking such as AutoDock Vina and preserves ligand chemistry.¹³ The license is a BSD variant on code and weights, but with a real caveat: the bundled PDB100 template database is CC BY-NC-SA 4.0 and a runtime dependency.¹³ So the model is commercially usable but the default pipeline pulls in a restricted dependency. For broad complex prediction it is generally regarded as exceeded by AlphaFold3 and the Boltz and Chai reimplementations.¹³

Protein-nucleic acid, lightweight

If your task is specifically protein-DNA, protein-RNA, or RNA-only complexes and you do not want to stand up an AlphaFold3-class model, RoseTTAFold2NA is the purpose-built, lightweight choice. It is about 67 million parameters,¹⁴ a three-track network trained on PDB structures published before 30 May 2020 across roughly 26,128 protein clusters, 1,632 RNA clusters, and 1,556 protein-nucleic-acid complex clusters.¹⁴ On a 14-case held-out test it correctly identified 13 of 14 complexes versus 1 of 14 for AlphaFold-plus-docking baselines.¹⁴ It is MIT-licensed.¹⁴ Be honest about its ceiling: only about 29 to 30% of its complex predictions reach LDDT above 0.8, so the confidence estimate is needed to filter usable models.¹⁴ Use it as a fast first pass, not a final answer.
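In practice, "filter by the confidence estimate" means discarding predictions below a cutoff before anyone looks at them. The sketch below shows the shape of that step. The record layout, the `predicted_lddt` field name, and the 0.8 default cutoff are assumptions for illustration; how you extract a per-model confidence from RoseTTAFold2NA's output files depends on the version you run, and the right cutoff should be calibrated on your own targets.

```python
def filter_confident(predictions, min_confidence=0.8):
    """Keep predictions at or above the cutoff, most confident first."""
    kept = [p for p in predictions if p["predicted_lddt"] >= min_confidence]
    return sorted(kept, key=lambda p: p["predicted_lddt"], reverse=True)

# Hypothetical batch of complex predictions.
batch = [
    {"id": "complex_a", "predicted_lddt": 0.62},
    {"id": "complex_b", "predicted_lddt": 0.88},
    {"id": "complex_c", "predicted_lddt": 0.81},
    {"id": "complex_d", "predicted_lddt": 0.45},
]

usable = filter_confident(batch)
print([p["id"] for p in usable], f"{len(usable)}/{len(batch)} kept")
```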

Need binding affinity

If the deliverable is a binding number rather than a structure, the branch collapses to one open answer: Boltz-2, covered above, with its affinity module approaching FEP accuracy at more than 1000x lower cost and an MIT license.⁸ No other model in this survey jointly predicts structure and affinity under an open license. That is a genuinely narrow field, and worth saying plainly.

The licensing split, stated plainly

Strip away the architecture talk and one practical question remains: can you put this model into the thing you are building. The answer splits cleanly, and not along the lines a leaderboard would suggest. The table below is the center of this survey. Read it before the benchmark numbers, because for most teams it filters the candidate set first.

Model	Code license	Weights license	Commercial use?	Notable trap
AlphaFold2	Apache-2.0	CC BY 4.0²	Yes, with attribution²	Relicensed from CC BY-NC to CC BY 4.0 in Jan 2022²
AlphaFold-Multimer	Apache-2.0	CC BY 4.0³	Yes³	Needs known stoichiometry; no ligands or nucleic acids³
AlphaFold3	CC BY-NC-SA 4.0	Custom non-commercial¹	No¹	Outputs may not be used to train competing models; weights gated, no redistribution¹
ESMFold	MIT	MIT⁴	Yes⁴	Successor ESM3 line is under different, partly non-commercial terms; do not assume ESMFold's MIT license carries over
OpenFold	Apache-2.0	Apache-2.0⁶	Yes⁶	None at the weights level⁶
Boltz-1	MIT	MIT⁷	Yes⁷	None at the weights level; superseded by Boltz-2⁸
Boltz-2	MIT	MIT⁸	Yes⁸	None at the weights level; param count undisclosed⁸
Chai-1	Apache-2.0	Apache-2.0⁹	Yes (since Nov 2024)⁹	Launched non-commercial Sep 2024, relicensed Apache-2.0 Nov 2024⁹
Protenix	Apache-2.0	Apache-2.0¹⁰	Yes¹⁰	Benchmark wins over AF3 are self-reported¹⁰
HelixFold3	Custom non-commercial¹¹	Custom non-commercial¹¹	No (free tier)¹¹	README says CC BY-NC-SA, LICENSE file is custom; treat as restrictive¹¹
RoseTTAFold (v1)	MIT (code)	Rosetta-DL, non-commercial¹²	No (weights)¹²	RF2 and RFAA are permissive; only v1 weights are restricted¹²
RoseTTAFold2NA	MIT	MIT¹⁴	Yes¹⁴	None at the weights level; inference databases carry own terms¹⁴
RoseTTAFold All-Atom	BSD-style	BSD-style¹³	Mostly¹³	Bundled PDB100 templates are CC BY-NC-SA; SignalP6 dependency is academic-only¹³
OmegaFold	Apache-2.0	Apache-2.0⁵	Yes⁵	Repo dormant since 2022⁵
trRosetta / trRosetta2	MIT	Open¹⁵	Code yes; pipeline no¹⁵	Rosetta/PyRosetta refinement needs a paid commercial license¹⁵

Openness and commercial-use matrix. "Commercial use?" answers the narrow question of whether the model's own weights permit it; the "Notable trap" column flags the per-model gotcha that the headline license hides. Sources cited inline per row.

Several patterns in that table are worth stating out loud, because they trip up real procurement decisions.

First, the headline model is the restricted one. AlphaFold3 weights are not openly downloadable; non-commercial researchers must request them through a Google form, with access granted at DeepMind's sole discretion, while commercial structure prediction is handled internally by Isomorphic Labs.¹ That is the single most consequential licensing fact in the field right now.

Second, "open source" can hide a non-commercial dependency. trRosetta's network and weights are MIT, but the full folding pipeline depends on Rosetta and PyRosetta, which are free for academia but require a paid license for for-profit use, so a commercial deployment of the complete pipeline is gated despite the MIT code.¹⁵ RoseTTAFold All-Atom has BSD-licensed code and weights, but its default pipeline pulls in a CC BY-NC-SA template database.¹³ A permissive license on the model file is necessary but not sufficient.

Third, licenses change, sometimes in your favor. Chai-1 launched in September 2024 under a restrictive non-commercial license, then Chai Discovery relicensed both code and weights to Apache-2.0 around late November 2024, which permits broad commercial use.⁹ AlphaFold2 made the same kind of move earlier, from CC BY-NC to CC BY in January 2022.² Always check the LICENSE file at the version you are actually running.

Fourth, the same lab can ship across the fault line. The Baker lab's original RoseTTAFold distributes restricted, non-commercial weights under the Rosetta-DL license, but RoseTTAFold2 is MIT, RoseTTAFold2NA is MIT,¹⁴ and RoseTTAFold All-Atom is BSD-style.¹³ Within one lineage, only the v1 weights are the trap.

The clean cases, then, are the permissively licensed rows: AlphaFold2 and AlphaFold-Multimer under CC BY 4.0,²³ ESMFold under MIT,⁴ OmegaFold and OpenFold under Apache-2.0,⁵⁶ Boltz-1 and Boltz-2 under MIT,⁷⁸ Chai-1 and Protenix under Apache-2.0,⁹¹⁰ and RoseTTAFold2NA under MIT.¹⁴ The restricted cases are AlphaFold3,¹ HelixFold3,¹¹ and the original RoseTTAFold v1.¹² The traps live in the caveats above. Read the dependencies, not just the LICENSE file.
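The "necessary but not sufficient" rule can be written down as a check over the whole pipeline rather than the model file alone. The sketch below is one way to do that, using a handful of rows and traps from the table above. The dictionary layout, the allow-list, and the function name are hypothetical, and an allow-list of license names is a screening aid, not legal advice.

```python
# Licenses this sketch treats as commercially usable (an assumption; confirm with counsel).
COMMERCIAL_OK = {"MIT", "Apache-2.0", "CC BY 4.0", "BSD-style"}

# Weights license plus runtime dependencies, taken from the table and caveats above.
PIPELINES = {
    "Boltz-2": {"weights": "MIT", "deps": {}},
    "AlphaFold3": {"weights": "Custom non-commercial", "deps": {}},
    "RoseTTAFold All-Atom": {
        "weights": "BSD-style",
        "deps": {"PDB100 templates": "CC BY-NC-SA 4.0", "SignalP6": "academic-only"},
    },
    "trRosetta": {
        "weights": "MIT",
        "deps": {"Rosetta/PyRosetta": "paid license for commercial use"},
    },
}

def commercial_blockers(model):
    """List every component of the pipeline whose license is not on the allow-list."""
    entry = PIPELINES[model]
    components = {"weights": entry["weights"], **entry["deps"]}
    return [(name, lic) for name, lic in components.items() if lic not in COMMERCIAL_OK]

for name in PIPELINES:
    blockers = commercial_blockers(name)
    print(name, "clean" if not blockers else f"blocked by {blockers}")
```

The useful property is that a model with a permissive weights license still shows up as blocked when a default dependency is not, which is exactly the trRosetta and RoseTTAFold All-Atom situation.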

Failure modes you inherit no matter which row you pick

Before the benchmark numbers, the failures behind them. These are shared across the family, and a high benchmark score on a curated set tells you nothing about whether your target hits one of them. Read the table in the next section with these in mind.

A high confidence score is not a correctness guarantee. AlphaFold-Multimer can produce confident but wrong interfaces and false-positive contacts.³ The confidence head is evidence, not proof.

The diffusion models hallucinate geometry. Boltz-1 produces overlapping and stacked identical chains in large complexes, a failure class it explicitly notes is shared with AlphaFold3.⁷ The newer Boltz-1x variant's inference-time steering largely fixes it, but the failure mode is real.⁷

Accuracy tracks MSA depth in the alignment-based family. AlphaFold2 has limited accuracy for orphan sequences that lack deep alignments,² so a glowing benchmark on well-aligned targets says little about your low-homology one. This is the recurring tax across the MSA-based models, and the reason the single-sequence branch exists at all.

Disordered regions stay hard. AlphaFold2 and AlphaFold3 predict mostly static single conformations, not ensembles or dynamics, and degrade on intrinsically disordered regions.¹,² If your biology lives in a disordered region, a low confidence score there is telling you something true.

None of this is an edge case. It is the working condition. Wet-lab validation, not the confidence head, remains the arbiter.

The benchmark picture

Numbers across these models come from different papers, test sets, and metrics, so cross-model comparison is approximate. Treat the figures below as directional, and weight self-reported comparisons accordingly. With that stated, here is what each model reports on its own headline benchmark.

Model	Headline benchmark	Reported result	License class
AlphaFold2	CASP14 median GDT_TS²	92.4 / 100; ~0.96 Å C-alpha RMSD²	Permissive
AlphaFold-Multimer	17 heterodimers, DockQ >= 0.49³	13/17 medium, 7/17 high³	Permissive
AlphaFold3	Protein-other interactions¹	~50%+ improvement, ~2x on some¹	Restricted
ESMFold	CAMEO / CASP14 TM-score⁴	~0.83 / ~0.68; up to ~60x faster⁴	Permissive
OmegaFold	Antibody RMSD vs AF2⁵	2.12 Å vs 2.98 Å (p=0.0017)⁵	Permissive
OpenFold	CAMEO/CASP vs AF2⁶	matches AF2 from scratch⁶	Permissive
Boltz-1	Recent-PDB top-1 mean LDDT⁷	~0.716, on par with AF3⁷	Permissive
Boltz-2	FEP+ affinity Pearson R⁸	~0.66, ~1000x cheaper than FEP⁸	Permissive
Chai-1	PoseBusters success⁹	~77% vs AF3 ~76%⁹	Permissive
Protenix	Multiple AF3 benchmarks¹⁰	reported to surpass AF3 (self-reported)¹⁰	Permissive
HelixFold3	SAbDab antibody-antigen¹¹	surpasses AF3; >80% with epitope hints¹¹	Restricted
RoseTTAFold2NA	14-case held-out¹⁴	13/14 vs 1/14 baseline¹⁴	Permissive

Two patterns stand out. First, the open AlphaFold3 reimplementations cluster around AlphaFold3 itself, with differences often inside reported confidence intervals.⁷⁹ Second, "surpasses AF3" claims for Protenix and on specific tasks for HelixFold3 are self-reported by the developers and benchmarked on their own chosen sets; treat them as promising rather than settled.¹⁰¹¹ The benchmark winner and the model you can deploy are frequently not the same row, which is the entire reason to read by task and license rather than by score.
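"Inside the confidence interval" has a concrete meaning that is easy to check on your own targets. The sketch below runs a paired bootstrap over per-target score differences between two models and asks whether the resulting interval excludes zero. It is a generic percentile bootstrap under stated assumptions (paired targets, the mean as the statistic, 2000 resamples, a fixed seed), not the exact procedure any of the cited papers used, and the per-target scores are invented.

```python
import random
import statistics

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    low_index = round(alpha / 2 * n_resamples)
    high_index = n_resamples - low_index - 1
    return means[low_index], means[high_index]

def difference_is_resolved(scores_a, scores_b):
    """True if the paired mean difference has a 95% interval that excludes zero."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    low, high = bootstrap_ci(diffs)
    return not (low <= 0.0 <= high)

# Hypothetical per-target LDDT for two models on the same eight targets.
model_a = [0.71, 0.80, 0.65, 0.90, 0.55, 0.77, 0.69, 0.83]
model_b = [0.73, 0.78, 0.66, 0.88, 0.58, 0.75, 0.70, 0.82]
print(bootstrap_ci([a - b for a, b in zip(model_a, model_b)]))
print(difference_is_resolved(model_a, model_b))
```

When the interval straddles zero, a headline gap of a point or two between two rows of the table is not evidence that one model is better on your kind of target.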

What changed under the hood: Evoformer to diffusion

The architectural shift maps cleanly onto a timeline, which is worth seeing before drawing conclusions.

Release timeline of the fifteen models, 2020-2025, colored by license, with the Evoformer/diffusion divider at mid-2024. Release dates and license categories per the per-model facts; individual dates and licenses are cited in the sections above.¹⁻¹⁵

The first era, roughly 2020 to 2022, was built on multiple sequence alignments and the Evoformer. AlphaFold2's Evoformer was 48 blocks of attention over an MSA representation and a pair representation, feeding a structure module that produced coordinates as per-residue rotations and translations.² RoseTTAFold reframed the same idea as a three-track network passing information between 1D sequence, 2D residue-pair, and 3D coordinate representations, approaching AF2-level accuracy on CASP14 while being faster.¹² The whole era leaned on evolutionary information; accuracy tracked MSA depth, and orphan sequences suffered.²

The MSA-free models were the first crack in that assumption. ESMFold swapped the alignment for a protein language model's embeddings,⁴ and OmegaFold did the same with OmegaPLM.⁵ They traded accuracy for speed and independence from alignment building.

The second era, 2024 onward, replaced the Evoformer with a Pairformer and the structure module with diffusion. AlphaFold3's Pairformer has 48 blocks and reduced reliance on MSA processing, feeding a diffusion module that denoises raw atom coordinates directly.¹ That architecture is what unlocked unified all-atom prediction across proteins, nucleic acids, ligands, and modifications in one model.¹ Every open AlphaFold3-class model here, Boltz-1, Boltz-2, Chai-1, Protenix, and HelixFold3, copies that Pairformer-plus-diffusion shape.⁷⁸⁹¹⁰¹¹ The architecture became a commodity within a year. What did not become a commodity was the original's weights.
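To give a feel for what "denoises raw atom coordinates directly" means, here is a toy deterministic diffusion sampler in plain Python. It starts every atom at random noise and, at each step, asks a denoiser for its best guess at the clean coordinates, then keeps only a shrinking fraction of the remaining noise. Everything specific here is an assumption for illustration: the noise schedule is made up, and the `toy_denoiser` simply returns a fixed target where a real model would run its trained network conditioned on the trunk's output. This is not AlphaFold3's sampler, nor any of the reimplementations'.

```python
import random

def toy_denoiser(noisy_coords, sigma, target):
    """Stand-in for the trained network: returns an estimate of the clean coordinates.

    A real model predicts this from the noisy coordinates, the noise level,
    and the trunk's conditioning; this placeholder just returns `target`.
    """
    return target

def sample(target, sigmas=(160.0, 40.0, 10.0, 2.5, 0.6, 0.0), seed=0):
    """Run a deterministic sampler from pure noise down to noise level zero."""
    rng = random.Random(seed)
    # Start from Gaussian noise at the largest noise level, one xyz triple per atom.
    coords = [[rng.gauss(0.0, sigmas[0]) for _ in range(3)] for _ in target]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        clean = toy_denoiser(coords, sigma, target)
        keep = sigma_next / sigma  # fraction of the remaining noise carried forward
        coords = [
            [c + keep * (x - c) for x, c in zip(atom, clean_atom)]
            for atom, clean_atom in zip(coords, clean)
        ]
    return coords

# Three "atoms" of a made-up target structure, in angstrom.
target = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
print(sample(target))
```

The point of the toy is the control flow: no frames, torsions, or residue templates appear anywhere, only atom positions, which is why the same loop can carry ligands, ions, and nucleic acids alongside protein atoms.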

What we do not know

Being explicit about the gaps is part of a fair survey.

Parameter counts are undisclosed for most of these models. AlphaFold3, AlphaFold-Multimer, Boltz-1, Boltz-2, Chai-1, HelixFold3, RoseTTAFold, and RoseTTAFold All-Atom publish no total count.¹,³,⁷,⁸,⁹,¹¹,¹²,¹³ We have firm totals only for AlphaFold2 at about 93 million,² OpenFold at the same scale,⁶ RoseTTAFold2NA at about 67 million,¹⁴ and Protenix-v1 at 368 million.¹⁰ For the language-model-based predictors we have component counts rather than clean totals: ESMFold's folding head at about 90 to 94 million on top of its language model,⁴ and OmegaFold's OmegaPLM at roughly 670 million.⁵ Scaling claims for the undisclosed models cannot be independently checked.

Disclosed parameter counts on a log scale; most models publish no total. Only published numbers are plotted; the list of undisclosed models is the more telling data point. Sources: AlphaFold2,² RoseTTAFold2NA,¹⁴ OpenFold,⁶ ESMFold,⁴ Protenix,¹⁰ OmegaFold,⁵ Chai-1.⁹

Cross-model benchmark comparison is shaky. These numbers come from different papers, test sets, cutoffs, and metrics. The "surpasses AlphaFold3" claims from Protenix and HelixFold3 are self-reported on developer-chosen benchmarks.¹⁰¹¹ AlphaFold3 itself published a December 2024 addendum correcting some reported metrics.¹ Independent head-to-head evaluation on a single shared test set is what would settle the ranking, and it does not yet exist for this full set.

The shared failure modes laid out earlier, the confident-but-wrong interfaces, the stacked-chain hallucinations, the MSA-depth dependence, and the weakness on disordered regions, are underexamined precisely because no shared benchmark stresses them in one place.¹,²,³,⁷ They are the working conditions, not edge cases, and wet-lab validation remains the arbiter.

How to actually choose

Strip it to the decisions that hold up.

If you have a single chain with deep homolog coverage and need maximum accuracy, run AlphaFold2; it is open and still the single-chain leader.² If you need to fork, retrain, or audit, use OpenFold.⁶

If alignment building is your bottleneck, or you are folding orphans or working at proteome scale, use ESMFold for the permissive license and the speed, with OmegaFold as a fallback.⁴⁵

If you need protein-protein complexes and want open weights, AlphaFold-Multimer is the default,³ with Chai-1 a stronger and also open alternative.⁹

If you need all-atom prediction across proteins, nucleic acids, and ligands and you can ship commercially, the answer is one of the open AlphaFold3 reimplementations: Boltz-2 if you also want affinity,⁸ Chai-1 or Protenix otherwise.⁹¹⁰ AlphaFold3 and HelixFold3 are more capable on specific tasks but non-commercial.¹,¹¹

If you need protein-nucleic-acid structures and want something light, use RoseTTAFold2NA, with its confidence filter on.¹⁴

If you need a binding number, use Boltz-2.⁸
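The six decisions above are small enough to encode as a lookup, which can be handy as a guard in a pipeline configuration. The sketch below is one hypothetical encoding: the task keys, the tuple layout, and the `choose` function are names invented here, and the picks simply restate the list above with the non-commercial models kept in a separate slot.

```python
# task -> (commercially usable picks in order of preference, non-commercial options)
CHOICES = {
    "single_chain_deep_msa": (["AlphaFold2", "OpenFold"], []),
    "single_chain_no_msa": (["ESMFold", "OmegaFold"], []),
    "protein_protein": (["AlphaFold-Multimer", "Chai-1"], []),
    "all_atom": (["Boltz-2", "Chai-1", "Protenix"], ["AlphaFold3", "HelixFold3"]),
    "protein_nucleic_acid_light": (["RoseTTAFold2NA"], []),
    "binding_affinity": (["Boltz-2"], []),
}

def choose(task, commercial=True):
    """Return candidate models for a task; restricted models appear only for non-commercial use."""
    if task not in CHOICES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(CHOICES)}")
    usable, restricted = CHOICES[task]
    return list(usable) if commercial else list(usable) + list(restricted)

print(choose("all_atom"))
print(choose("all_atom", commercial=False))
```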

The model that wins the benchmark and the model you can deploy are usually different rows in the table. AlphaFold3 is the most capable predictor published, and for a commercial team it is mostly a reference point. The open reimplementations closed the accuracy gap inside a year; the license gap is the one that still decides projects. The frontier is gated. The usable frontier, increasingly, is not. Pick by what you are folding and what you can ship, and the fifteen models collapse into a short, defensible shortlist.

References

1. Abramson, Adler, Dunger et al., "Accurate structure prediction of biomolecular interactions with AlphaFold 3," Nature 630, 493-500 (2024). https://www.nature.com/articles/s41586-024-07487-w
2. Jumper et al., "Highly accurate protein structure prediction with AlphaFold," Nature 596, 583-589 (2021). https://doi.org/10.1038/s41586-021-03819-2
3. Evans et al., "Protein complex prediction with AlphaFold-Multimer," bioRxiv (2021, updated 2022). https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2
4. Lin et al., "Evolutionary-scale prediction of atomic-level protein structure with a language model," Science 379(6637), 1123-1130 (2023). https://www.science.org/doi/10.1126/science.ade2574
5. Wu et al., "High-resolution de novo structure prediction from primary sequence," bioRxiv (2022). https://doi.org/10.1101/2022.07.21.500999
6. Ahdritz et al., "OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization," Nature Methods 21, 1514-1524 (2024). https://www.nature.com/articles/s41592-024-02272-z
7. Wohlwend, Corso, Passaro et al., "Boltz-1: Democratizing Biomolecular Interaction Modeling," bioRxiv (2024). https://www.biorxiv.org/content/10.1101/2024.11.19.624167
8. Passaro, Corso, Wohlwend et al., "Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction," bioRxiv (2025). https://www.biorxiv.org/content/10.1101/2025.06.14.659707v1
9. Chai Discovery team, "Chai-1: Decoding the molecular interactions of life," bioRxiv (2024). https://doi.org/10.1101/2024.10.10.615955
10. ByteDance Protenix Team, "Protenix - Advancing Structure Prediction Through a Comprehensive AlphaFold3 Reproduction," bioRxiv (2025). https://www.biorxiv.org/content/10.1101/2025.01.08.631967v1
11. PaddleHelix Team, Baidu, "Technical Report of HelixFold3 for Biomolecular Structure Prediction," arXiv:2408.16975 (2024). https://arxiv.org/abs/2408.16975
12. Baek et al., "Accurate prediction of protein structures and interactions using a three-track neural network," Science 373, 871-876 (2021). https://www.science.org/doi/10.1126/science.abj8754
13. Krishna, Wang, Ahern et al., "Generalized biomolecular modeling and design with RoseTTAFold All-Atom," Science 384, eadl2528 (2024). https://www.science.org/doi/10.1126/science.adl2528
14. Baek, McHugh, Anishchenko, Jiang, Baker, DiMaio, "Accurate prediction of protein-nucleic acid complexes using RoseTTAFoldNA," Nature Methods (2023). https://www.nature.com/articles/s41592-023-02086-5
15. Yang, Anishchenko, Park, Peng, Ovchinnikov, Baker, "Improved protein structure prediction using predicted interresidue orientations," PNAS 117(3), 1496-1503 (2020). https://doi.org/10.1073/pnas.1914677117
