Today, CarperAI is releasing OpenELM, an open-source library combining large language models with evolutionary algorithms for code synthesis.
ELM stands for Evolution Through Large Models, a technique from a recent OpenAI paper demonstrating that large language models can act as intelligent mutation operators in an evolutionary algorithm, enabling the generation of diverse, high-quality code in domains not seen in the language model’s training set.
The initial release of OpenELM, version 0.0.1, includes:
- An implementation of the basic ELM setup, including MAP-Elites over generated code, driven either by a diff model or by prompt-engineering an existing language model (see the sketch after this list).
- The Sodarace 2D environment, along with several other baseline environments.
- A sandbox using Flask and a Docker container to safely run code generated by language models.
- Benchmarking of mutation LLMs using a toy environment.
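To give a flavor of how these pieces fit together, here is a minimal MAP-Elites sketch in Python. It is a toy illustration, not OpenELM’s actual API: the mutation, evaluation, and behavior-descriptor functions below are stand-ins for what would, in ELM, be an LLM-driven mutation and sandboxed execution of generated code.

```python
import random
import string

# Toy stand-ins: in ELM, mutation would call a language model (e.g. a diff
# model) and evaluation would run the generated code in a sandbox.
def mutate(solution: str) -> str:
    """Stand-in mutation operator: randomly replace one character."""
    i = random.randrange(len(solution))
    return solution[:i] + random.choice(string.ascii_lowercase) + solution[i + 1:]

def evaluate(solution: str) -> float:
    """Stand-in fitness function: count the vowels."""
    return float(sum(c in "aeiou" for c in solution))

def behavior_descriptor(solution: str) -> tuple:
    """Map a solution to a discrete niche; MAP-Elites keeps one elite per niche."""
    return (solution[0], solution[-1])

def map_elites(seeds: list, iterations: int = 1000) -> dict:
    # The archive maps each behavior niche to its best-known (elite) solution.
    archive = {}
    for s in seeds:
        archive[behavior_descriptor(s)] = (evaluate(s), s)
    for _ in range(iterations):
        # Sample a random elite from the archive and mutate it.
        _, parent = random.choice(list(archive.values()))
        child = mutate(parent)
        niche, fitness = behavior_descriptor(child), evaluate(child)
        # The child replaces the elite of its niche only if it is fitter.
        if niche not in archive or fitness > archive[niche][0]:
            archive[niche] = (fitness, child)
    return archive

archive = map_elites(["hello", "world"])
print(f"{len(archive)} niches filled")
```

The key difference from a standard evolutionary algorithm is the archive: instead of a single population competing on fitness alone, MAP-Elites keeps the best solution found in each behavioral niche, which encourages diversity as well as quality.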
In addition, we are releasing an open-source diff model, fine-tuned from Salesforce’s CodeGen 350M code synthesis model on GitHub diffs and released under an MIT license. A diff model is an autoregressive language model trained on edits to a piece of code, formatted in Unified Diff Format. Given a section of code and a description of the desired change (such as a commit message), a diff model suggests an intelligent change to the code that fits the description, marking the lines added, changed, and deleted in diff format. This diff model will let you more easily generate intelligent code suggestions in ELM.
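As a hand-written illustration (not actual model output), given a buggy snippet and a commit message like “Fix bug in add function”, a diff model might produce an edit in Unified Diff Format such as:

```diff
--- original.py
+++ fixed.py
@@ -1,4 +1,4 @@
 def add(a, b):
-    return a - b
+    return a + b
 
 result = add(1, 2)
```

Lines prefixed with `-` are removed, lines prefixed with `+` are added, and unprefixed lines are unchanged context, so an edit can be applied directly to the original file.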
If you are interested in joining the OpenELM project, check out our Discord or Twitter.
Find out more about how it works below!
Evolutionary algorithms and open-endedness
Evolutionary algorithms (EAs) are a family of population-based optimization algorithms inspired by biological evolution. These algorithms start with a population of candidate solutions to a problem (often called “individuals”) and repeatedly apply evolutionary operators such as mutation, crossover, and selection to generate new populations of solutions.
Over time, the average quality of the solutions in the population increases: the “fittest” individuals are more likely to be selected for reproduction, and their offspring inherit their “fitness”. Evolutionary algorithms therefore rely on a pre-defined fitness function that evaluates the performance or quality of each individual in the population.
This search technique is gradient-free and makes very few assumptions about the structure of the fitness landscape, which makes evolutionary algorithms powerful optimizers for domains where fitness can be evaluated efficiently and the evolutionary operators can explore the search space effectively.
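To make this loop concrete, here is a toy evolutionary algorithm in Python (an illustrative sketch, not OpenELM code): a population of real-valued individuals, a fixed fitness function, truncation selection, and Gaussian mutation. Crossover is omitted for brevity.

```python
import random

def fitness(x: float) -> float:
    """Toy fitness landscape: maximized at x = 3."""
    return -(x - 3.0) ** 2

def evolve(pop_size: int = 50, generations: int = 100, mutation_scale: float = 0.1) -> float:
    # Start from a random population of candidate solutions ("individuals").
    population = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest half of the population.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Mutation: offspring are noisy copies of the selected parents.
        offspring = [p + random.gauss(0, mutation_scale) for p in parents]
        population = parents + offspring
    return max(population, key=fitness)

best = evolve()
print(f"best x = {best:.3f}, fitness = {fitness(best):.5f}")
```

Note that nothing here differentiates the fitness function; selection pressure alone drives the population toward better solutions, which is why EAs apply even when gradients are unavailable.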
One fundamental open problem in the evolutionary algorithms community is that of open-endedness. This field seeks to create algorithmic systems that produce never-ending innovation, just as biological evolution is capable of seemingly endless creativity and complexity. Of course, truly endless innovation seems out of reach for AI for the foreseeable future, but creating open-ended artifacts of ever greater complexity has the potential to unlock powerful new generative algorithms. Crucially, open-endedness requires the ability to search outside the distribution of previous experience, which is typically difficult for deep learning models. A recent paper from OpenAI called Evolution Through