This prototype synthesizes verified code with an LLM.
Using Monte Carlo Tree Search (MCTS), it explores the space of possible generations of a verified program, and at every step it checks that it is on the right track by calling the verifier.
This prototype uses Dafny.
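As a rough illustration of the step-by-step verifier check, here is a minimal sketch (not the project's actual code; the `dafny verify` invocation and all function names are assumptions) of how a candidate Dafny snippet can be scored by calling the verifier. A score of this kind is the signal that guides the tree search over generation steps.

```python
# Hypothetical sketch: score a candidate Dafny program by calling the verifier.
# Assumes a Dafny 4.x CLI is available on PATH as `dafny verify`.
import os
import subprocess
import tempfile

def verify_dafny(program_text: str, timeout_s: int = 30) -> bool:
    """Write the candidate to a temporary .dfy file and ask Dafny to verify it."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".dfy", delete=False) as f:
        f.write(program_text)
        path = f.name
    try:
        result = subprocess.run(
            ["dafny", "verify", path],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.returncode == 0
    finally:
        os.unlink(path)

def score_candidate(partial_program: str) -> float:
    """Reward signal for the search: 1.0 if the snippet verifies, else 0.0."""
    return 1.0 if verify_dafny(partial_program) else 0.0
```

In the search itself, only generation prefixes that the verifier accepts keep getting expanded, so branches that go wrong are pruned early.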
Logs for example runs can be found in the log directory.
Scroll to the very end of a log to see a chosen solution.
Note that the linked solution is optimal for the problem.
With this technique, weaker models that might not know the target language all that well can compete with stronger models.
We can also use PPO training to positively reinforce snippets that succeed and negatively reinforce snippets that fail.
The model after PPO can solve the prompts without backtracking!
For example, see the log for solving the problem fact after PPO training on another problem, opt0.
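For reference, here is a minimal sketch of how verifier outcomes could be turned into PPO rewards. The helper names and reward values are illustrative assumptions, not the project's training code; the resulting triples would then feed a PPO trainer such as the one in Hugging Face's trl library.

```python
# Hypothetical sketch: assigning PPO rewards from verifier outcomes.
# All names and reward values here are illustrative, not the project's code.

def ppo_reward(verified: bool) -> float:
    """Positive reward for a snippet that verifies, negative for one that fails."""
    return 1.0 if verified else -1.0

def make_ppo_batch(
    prompt: str, snippets: list[tuple[str, bool]]
) -> list[tuple[str, str, float]]:
    """Turn (snippet, verifier outcome) pairs into (query, response, reward)
    triples, each of which is one PPO training example."""
    return [(prompt, snippet, ppo_reward(ok)) for snippet, ok in snippets]

# Example: one failing and one verified candidate body for a factorial method.
batch = make_ppo_batch(
    "method Fact(n: nat) returns (r: nat)",
    [("ensures false { }", False), ("{ r := 1; }", True)],
)
```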
Running
This project relies on GPU access. It has been tested on a multi-GPU machine with two NVIDIA A100s.
Setup
Using mamba (or equivalently conda):
mamba create --name llm-verified python=3.10
mamba activate llm-verified
pip install -r requirements.txt
Execution
Pick an LLM in model.py and a prompt in prompts.py, then run the main script.
To run the PPO trainer instead (slow!), use the corresponding training script.
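Purely for illustration, picking a model and a prompt might look like the following; the checkpoint name, prompt text, and variable names are assumptions, not what model.py and prompts.py actually contain.

```python
# Hypothetical illustration of selecting a Hugging Face model and a prompt.
# The checkpoint and prompt below are placeholders, not the project's defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "codellama/CodeLlama-7b-Instruct-hf"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# device_map="auto" spreads the model across available GPUs (needs accelerate).
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

PROMPT = (
    "Write a Dafny method that computes the factorial of n, "
    "with requires/ensures clauses, and make sure it verifies."
)
```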
TODOs
- Support other verifiers in addition to Dafny:
  - Coq
- Support other LLM infrastructures in addition to Hugging Face:
- Support test cases.
- Design a steerable interaction to give human or tool feedback to the LLM.
- Design a reinforcement learning scheme whereby the LLM learns from its own trials.
- Evaluate whether the model degrades after PPO training on some tasks, even unrelated ones.
- Force the PPO solution to converge to a known optimal one, using it entirely for training rather than discovery.
- Get wandb to work.
Credits
- The montecarlo lib