2025-01-06
This document is a summary of my personal experiences using generative models while programming over the past year. It has not been a passive process. I have intentionally sought ways to use LLMs while programming to learn about them. The result has been that I now regularly use LLMs while working and I consider their benefits net-positive on my productivity. (My attempts to go back to programming without them are unpleasant.)
Along the way I have found oft-repeated steps that can be automated, and a few of us are working on building those into a tool specifically for Go programming: sketch.dev. It’s very early but so far the experience has been positive.
Background
I am typically curious about new technology. It took very little experimentation with LLMs for me to want to see if I could extract practical value. There is an allure to a technology that can (at least some of the time) craft sophisticated responses to challenging questions. It is even more exciting to watch a computer attempt to write a piece of a program as requested, and make solid progress.
The only technological shift I have experienced that feels similar to me happened in 1995, when we first configured our LAN with a usable default route. We replaced the shared computer in the other room running Trumpet Winsock with a machine that could route a dialup connection, and all at once I had The Internet on tap. Having the internet all the time was astonishing, and felt like the future. Probably far more to me in that moment than to many who had been on the internet longer at universities, because I was immediately dropped into high internet technology: web browsers, JPEGs, and millions of people. Access to a powerful LLM feels like that.
So I followed this curiosity, to see if a tool that can generate something mostly not wrong most of the time could be a net benefit in my daily work. The answer appears to be yes, generative models are useful for me when I program. It has not been easy to get to this point. My underlying fascination with the new technology is the only way I have managed to figure it out, so I am sympathetic when other engineers claim LLMs are “useless.” But as I have been asked more than once how I can possibly use them effectively, this post is my attempt to describe what I have found so far.
Overview
There are three ways I use LLMs in my day-to-day programming:
- Autocomplete. This makes me more productive by doing a lot of the more-obvious typing for me. It turns out the current state of the art can be improved on here, but that’s a conversation for another day. Even the standard products you can get off the shelf are better for me than nothing. I convinced myself of that by trying to give them up. I could not go a week without getting frustrated by how much mundane typing I had to do before having a FIM model. This is the place to experiment first.
- Search. If I have a question about a complex environment, say “how do I make a button transparent in CSS” I will get a far better answer asking any consumer-based LLM, o1, sonnet 3.5, etc, than I do using an old fashioned web search engine and trying to parse the details out of whatever page I land on. (Sometimes the LLM is wrong. So are people. The other day I put my shoe on my head and asked my two year old what she thought of my hat. She dealt with it and gave me a proper scolding. I can deal with LLMs being wrong sometimes too.)
- Chat-driven programming. This is the hardest of the three. This is where I get the most value of LLMs, but also the one that bothers me the most. It involves learning a lot and adjusting how you program, and on principle I don’t like that. It requires at least as much messing about to get value out of LLM chat as it does to learn to use a slide rule, with the added annoyance that it is a non-deterministic service that is regularly changing its behavior and user interface. Indeed, the long-term goal in my work is to replace the need for chat-driven programming, to bring the power of these models to a developer in a way that is not so off-putting. But as of now I am dedicated to approaching the problem incrementally, which means figuring out how to do best with what we have and improve it.
As this is about the practice of programming, this has been a fundamentally qualitative process that is hard to write about with quantitative rigor. The closest I will get to data is to say: it appears from my records that for every two hours of programming I do now, I accept more than 10 autocomplete suggestions, use LLM for a search-like task once, and program in a chat session once.
The rest of this is about extracting value from chat-driven programming.
Why use chat at all?
Let me try to motivate this for the skeptical. A lot of the value I personally get out of chat-driven programming is I reach a point in the day when I know what needs to be written, I can describe it, but I don’t have the energy to create a new file, start typing, then start looking up the libraries I need. (I’m an early-morning person, so this is usually any time after 11am for me, though it can also be any time I context-switch into a different language/framework/etc.) LLMs perform that service for me in programming. They give me a first draft, with some good ideas, with several of the dependencies I need, and often some mistakes. Often, I find fixing those mistakes is a lot easier than starting from scratch.
This means chat-based programming may not be for you. I am doing a particular kind of programming, product development, which could be roughly described as trying to bring programs to a user through a robust interface. That means I am building a lot, throwing away a lot, and bouncing around between environments. Some days I mostly write typescript, some days mostly Go. I spent a week in a C++ codebase last month exploring an idea, and just had an opportunity to learn the HTTP server-side events format. I am all over the place, constantly forgetting and relearning. If you spend more time proving your optimization of a cryptographic algorithm is not vulnerable to timing attacks than you do writing the code, I don’t think any of my observations here are going to be useful to you.
Chat-based LLMs do best with exam-style questions
Give an LLM a specific objective and all the background material it needs so it can craft a well-contained code review packet and expect it to adjust as you question it. There are two major elements to this:
- Avoid creating a situation with so much complexity and ambiguity that the LLM gets confused and produces bad results. This is why I have had little success with chat inside my IDE. My workspace is often messy, the repository I am working on is by default too large, it is filled with distractions. One thing humans appear to be much better than LLMs at (as of January 2025) is not getting distracted. That is why I still use an LLM via a web browser, because I want a blank slate on whic