Scientists have unveiled ESM3, an artificial intelligence (AI) model capable of designing entirely new proteins that do not exist in nature. The model marks a significant advance in protein engineering, with implications for fields ranging from medicine to environmental science.
The research team, made up of former Meta scientists now at EvolutionaryScale, demonstrated the model’s capabilities by creating a novel fluorescent protein. The engineered protein shares only 58% of its sequence with the most similar known fluorescent protein, showing that the model can generate sequences far removed from those found in nature.
ESM3: A Language Model for Proteins
ESM3 is a large language model (LLM) akin to OpenAI’s GPT-4, but specifically tailored for protein design. The model was trained on a dataset of 2.78 billion proteins, covering their sequences, structures, and functions. During training, the researchers masked random pieces of this information and had the model predict the missing data, which honed ESM3’s predictive capabilities.
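The masking step described above can be illustrated with a toy sketch. It is not EvolutionaryScale's training code: it covers only the amino-acid sequence (ESM3 also uses structure and function data), and the `mask_sequence` helper, the `_` mask symbol, and the fixed 15% mask fraction are all assumptions made for illustration.

```python
import random

MASK = "_"  # hypothetical placeholder for a hidden residue


def mask_sequence(sequence, mask_fraction=0.15, seed=None):
    """Hide a random subset of residues.

    Returns the masked sequence and a dict mapping each hidden
    position to the residue a model would be asked to predict.
    """
    if not sequence:
        return "", {}
    rng = random.Random(seed)
    # Mask at least one residue, but never more than the sequence holds.
    n_masked = min(len(sequence), max(1, round(len(sequence) * mask_fraction)))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    residues = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = residues[pos]  # remember the true residue
        residues[pos] = MASK          # hide it from the model
    return "".join(residues), targets


if __name__ == "__main__":
    # Illustrative 20-residue fragment, not a sequence from the study.
    masked, targets = mask_sequence("MSKGEELFTGVVPILVELDG", seed=0)
    print(masked)
    print(targets)
```

A model trained this way sees the masked string as input and is scored on how well it recovers the entries in `targets`.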
This new model builds upon previous work, including ESMFold, which focused on predicting unknown microbial protein structures. While other AI models, such as AlphaFold from Alphabet’s DeepMind, have made strides in protein structure prediction, ESM3 goes a step further by generating entirely new proteins with specific functions.
From Prediction to Creation
ESM3’s ability to create novel proteins stems from its analysis of 771 billion unique pieces of information on protein structure, function, and sequence. In a demonstration of its capabilities, the model generated 96 proteins with potential fluorescent properties. The researchers then selected one with minimal similarity to natural fluorescent proteins for further refinement.
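The idea of picking the candidate least similar to known proteins can be sketched as follows. The article does not describe the researchers' actual selection procedure, so this is only an assumption-laden illustration: the function names are hypothetical, the sequences are toy strings, and identity is computed position by position on sequences assumed to be pre-aligned and of equal length (real comparisons would first run a sequence alignment).

```python
def percent_identity(a, b):
    """Percentage of positions at which two pre-aligned sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def closest_identity(candidate, references):
    """Identity between a candidate and its most similar reference."""
    return max(percent_identity(candidate, ref) for ref in references)


def most_novel(candidates, references):
    """Candidate whose nearest reference is the least similar to it."""
    return min(candidates, key=lambda c: closest_identity(c, references))


if __name__ == "__main__":
    # Toy data, not sequences from the study.
    references = ["AAAAAAAAAA", "CCCCCCCCCC"]
    candidates = ["AAAAAAAAAC", "AAAAACCCCC", "ACDEFGHIKL"]
    pick = most_novel(candidates, references)
    print(pick, closest_identity(pick, references))
```

Measuring each candidate against its nearest known relative, rather than against an average, matches how a figure such as "58% of its sequence" is usually reported.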
Through iterative improvements guided by ESM3, the team developed “esmGFP,” a green fluorescent protein unlike any found in nature. By the researchers’ estimate, natural evolution would have needed roughly 500 million years to produce a protein this different from known fluorescent proteins.
Implications and Future Applications
The potential applications of ESM3 are broad. EvolutionaryScale suggests that the technology could transform fields such as drug discovery and the development of new chemicals for plastic degradation. The ability to design proteins with specific functions on demand could accelerate research across multiple scientific disciplines.
While the full version of ESM3 will be made available to commercial researchers, a smaller version has been released under a non-commercial license. This move aims to balance the advancement of scientific research with potential commercial applications.
As with any breakthrough technology, ESM3 has limitations and needs further validation. Scientists emphasize that while AI models like ESM3 can significantly speed up protein design and structure prediction, experimental verification remains crucial. As research in this field continues to evolve, ESM3 and similar AI models may pave the way for advances in biotechnology, medicine, and environmental science.