In preparation

Benchmarks

An independent, held-out evaluation of foundation models for sequence biology — built to measure what survives a domain shift, not what tops a clean leaderboard.

Coming soon

Public leaderboards reward models that fit the test. This benchmark is being built to do the opposite: measure where models break when they meet the messy, shifting reality of biological data.

It is in preparation. If you would like to hear when the first results are published, or have a model or task you would like covered, get in touch.

hello@rewire.it →

Held-out test sets

Evaluation on data the models could not have memorised — low-homology splits, time-based holdouts, and genuinely external references.
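
As a rough illustration of one of these ideas, a time-based holdout can be sketched in a few lines. This is not the benchmark's actual protocol, which has not been published; the record layout, the `deposited` field, and the function name `time_based_split` are all assumptions made for the example.

```python
from datetime import date

def time_based_split(records, cutoff):
    """Split records into (train, held_out) around a cutoff date.

    Anything deposited on or after the cutoff is treated as held out,
    on the assumption that a model trained before then never saw it.
    """
    train = [r for r in records if r["deposited"] < cutoff]
    held_out = [r for r in records if r["deposited"] >= cutoff]
    return train, held_out

# Hypothetical records: the ids and dates are made up for illustration.
records = [
    {"id": "seq_a", "deposited": date(2021, 3, 1)},
    {"id": "seq_b", "deposited": date(2022, 11, 15)},
    {"id": "seq_c", "deposited": date(2023, 6, 30)},
    {"id": "seq_d", "deposited": date(2024, 1, 9)},
]

train, held_out = time_based_split(records, cutoff=date(2023, 1, 1))
train_ids = [r["id"] for r in train]
held_out_ids = [r["id"] for r in held_out]
```

A date cutoff alone does not rule out near-duplicates of older sequences, which is why it is listed alongside low-homology splits rather than instead of them.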

Domain-shift robustness

How accuracy moves when a model is taken off its training distribution: different assays, organisms, and clinical sites.
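
One simple way to put a number on this, shown here only as a sketch, is the gap between accuracy on in-distribution data and accuracy on a shifted set. The function names and the toy predictions are assumptions for illustration; the benchmark's own metrics may differ.

```python
def accuracy(pairs):
    """Fraction of (prediction, label) pairs that match."""
    return sum(pred == label for pred, label in pairs) / len(pairs)

def shift_gap(in_distribution, shifted):
    """Accuracy lost when moving from in-distribution to shifted data.

    A positive value means the model does worse off-distribution.
    """
    return accuracy(in_distribution) - accuracy(shifted)

# Toy (prediction, label) pairs, invented for the example.
in_distribution = [(1, 1), (0, 0), (1, 1), (0, 0), (1, 0)]  # 4 of 5 correct
shifted = [(1, 1), (0, 1), (1, 0), (0, 0)]                  # 2 of 4 correct

gap = shift_gap(in_distribution, shifted)
```

Reporting the gap per shift type (assay, organism, site) rather than as one pooled number keeps the failure modes distinguishable.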

Hardware & inference cost

Parameters, memory, and throughput — the operational reality of running a model, not just its headline accuracy.

Uncertainty & graceful degradation

Whether a model knows when it is wrong, and how its confidence behaves on the hardest, most novel inputs.
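
A common way to check whether confidence tracks correctness is expected calibration error (ECE): group predictions into confidence bins and compare each bin's average confidence with its accuracy. The sketch below is a minimal equal-width-bin version under that assumption; it is offered as an example of the idea, not as the metric this benchmark will report.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE.

    confidences: predicted probabilities in [0, 1] for the chosen class.
    correct:     1 if the prediction was right, 0 otherwise.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(ok for _, ok in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of the data.
        ece += (len(members) / n) * abs(avg_conf - acc)
    return ece

# Toy inputs, invented for the example.
ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```

Computing the same quantity separately on the most novel inputs shows whether calibration holds where it matters most.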