Groundbreaking AI Decodes Complex Genomes: Evo 2 Trained on Trillions of Bases

Evo 2, an open-source AI system, has been trained on genomes from all three domains of life: bacteria, archaea, and eukaryotes. After training on trillions of base pairs of DNA, the model has developed internal representations of key features in even the most complex genomes, including those of humans.
Evo 2 builds on Evo, an earlier AI system that, when prompted with sequences from a cluster of related genes, could correctly identify the next gene in the sequence or suggest a completely novel protein. That system was limited to bacterial genomes, however, because bacteria tend to cluster related genes together, a feature largely absent from the more complex genomes of eukaryotes.
To move beyond that limitation, the team behind Evo built Evo 2, an open-source model trained on trillions of base pairs of DNA. That training let it develop internal representations of the intricate features found in complex genomes, including regulatory DNA and splice sites, elements that can be difficult for humans to identify.
Source: Ars Technica


