Synthetic datasets designed for training state-of-the-art protein structure prediction models. Supercharge your models and go beyond the PDB.
Here are the world-leading models that already use synthetic data in their training curriculum.
Foundational co-folding models
Demonstrated significant performance advances by training on large-scale self-distillation datasets, especially for biologics/antibodies.
Genesis Molecular AI
Technical report extensively mentions the data-scaling benefits observed when training on PDB-based synthetic data.
Read PEARL Technical Report
Boltz.bio
The Boltz2 pre-print mentions three separate molecular dynamics datasets used in model training.
Read Boltz2 pre-print
Large-scale model inference with confidence filtering for high-quality synthetic structures
Extended molecular dynamics simulations capturing protein dynamics and conformational states
Join our waitlist for early access to pre-made packages and custom generation