
AI Blindspots – Blindspots in LLMs I’ve noticed while AI coding by rahimnathwani

19 Comments

  • ezyang (Post Author)
    Posted March 19, 2025 at 6:34 pm

    Hi Hacker News! One of the things about this blog that has gotten a bit unwieldy as I've added more entries is that it's a sort of undifferentiated pile of posts. I want some sort of organization system but I haven't found one that's good. Very open to suggestions!

  • datadrivenangel
    Posted March 19, 2025 at 6:44 pm

    Almost all of these are good things to consider with human coders as well. Product managers take note!

    https://ezyang.github.io/ai-blindspots/requirements-not-solu…

  • fizx
    Posted March 19, 2025 at 6:48 pm

    The community seems rather divided as to whether these limitations are intrinsic, or whether we can solve them with today's tech plus more training, heuristics, and workarounds.

  • mystified5016
    Posted March 19, 2025 at 7:07 pm

    Recently I've been writing a resume/hire-me website. I'm not a stellar writer, but I'm alright, so I've been asking various LLMs to review it by just dropping the HTML file in.

    Every single one has completely ignored the "Welcome to nginx!" header at the top of the page. I'd left it in half as a joke to amuse myself, but I expected it would get some kind of reaction from the LLMs, even if just an "it seems you may have forgotten this line."

    Kinda weird. I even tried guiding them into seeing it without explicitly mentioning it and I could not get a response.

  • antasvara
    Posted March 19, 2025 at 7:16 pm

    This highlights a thing I've seen with LLMs generally: they make different mistakes than humans. This makes catching the errors much more difficult.

    What I mean by this is that we have thousands of years of experience catching human mistakes. As such, we're really good at designing systems that catch (or work around) human mistakes and biases.

    LLMs, while impressive and sometimes less mistake-prone than humans, make errors in a fundamentally different manner. We just don't have the intuition and understanding of the way that LLMs "think" (in a broad sense of the word). As such, we have a hard time designing systems that account for this and catch the errors.

  • teraflop
    Posted March 19, 2025 at 8:00 pm

    > I had some test cases with hard coded numbers that had wobbled and needed updating. I simply asked the LLM to keep rerunning the test and updating the numbers as necessary.

    Why not take this a step farther and incorporate this methodology directly into your test suite? Every time you push a code change, run the new version of the code and use it to automatically update the "expected" output. That way you never have to worry about failures at all!
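
    For context, the workflow being joked about here is essentially unguarded snapshot ("golden file") testing. A minimal Python sketch of what that looks like, with hypothetical file and function names, where a flag silently rewrites the expected output instead of letting the test fail:

    ```python
    # golden_test.py: hypothetical example of the "auto-update the expected output" antipattern
    import json
    import os

    def compute_result():
        # Stand-in for the real code under test.
        return {"total": 42, "items": [1, 2, 3]}

    def test_against_golden():
        golden_path = "expected_output.json"
        actual = compute_result()

        if os.environ.get("UPDATE_GOLDEN") == "1" or not os.path.exists(golden_path):
            # Blindly overwrite the "expected" values with whatever the code produced.
            # Run this way, the test can never fail, which is exactly the joke above.
            with open(golden_path, "w") as f:
                json.dump(actual, f, indent=2)
            return

        with open(golden_path) as f:
            expected = json.load(f)
        assert actual == expected
    ```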

  • Mc91
    Posted March 19, 2025 at 8:01 pm

    One thing I do is go to Leetcode, look up the optimal big-O time and space solution, then give the LLM the Leetcode medium/hard problem, constrain it to that optimal time/space complexity, and suggest the method (bidirectional BFS). I ask for the solution in some fairly mainstream modern language (although not JavaScript, Java, or Python). I also say to make it as compact as possible. Sometimes I reiterate that.

    It's just a function usually, but it does not always compile. I'd set this as a low bar for programming. We haven't even gotten into classes, architecture, badly-defined specifications and so on.

    LLMs are useful for programming, but I'd want them to clear this low hurdle first.
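
    For readers who haven't seen the technique named above, here is a rough bidirectional BFS sketch. It is written in Python purely for brevity (the commenter excludes Python from their own tests), and the word-ladder problem and names are illustrative rather than the commenter's exact prompt. Searching from both ends and meeting in the middle is what makes this much faster than a one-sided BFS:

    ```python
    from string import ascii_lowercase

    def ladder_length(begin: str, end: str, word_list: list[str]) -> int:
        """Shortest word-ladder length via bidirectional BFS (0 if unreachable)."""
        words = set(word_list)
        if end not in words:
            return 0
        front, back, steps = {begin}, {end}, 1
        while front and back:
            # Always expand the smaller frontier; this roughly cuts the search
            # from O(b^d) to O(b^(d/2)).
            if len(front) > len(back):
                front, back = back, front
            next_front = set()
            for word in front:
                for i in range(len(word)):
                    for c in ascii_lowercase:
                        cand = word[:i] + c + word[i + 1:]
                        if cand in back:          # the two searches meet
                            return steps + 1
                        if cand in words:
                            words.remove(cand)    # mark visited
                            next_front.add(cand)
            front, steps = next_front, steps + 1
        return 0

    print(ladder_length("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"]))  # 5
    ```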

  • logicchains
    Posted March 19, 2025 at 8:18 pm

    I found Gemini Flash Thinking Experimental is almost unusable in an agent workflow because it'll eventually accidentally remove a closing bracket, breaking compilation, and be unable to identify and fix the issue even with many attempts. Maybe it has trouble counting/matching braces due to fewer layers?
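
    One cheap guard against this particular failure mode (not something the commenter describes, just an assumed workaround) is to have the agent loop reject any edit whose bracket nesting doesn't balance before the compiler is ever invoked. A minimal Python sketch:

    ```python
    def brackets_balanced(source: str) -> bool:
        # Deliberately naive sanity check for an agent loop: reject generated edits
        # whose (), [], {} nesting doesn't balance. It ignores the complication of
        # brackets inside string literals and comments.
        pairs = {")": "(", "]": "[", "}": "{"}
        stack = []
        for ch in source:
            if ch in "([{":
                stack.append(ch)
            elif ch in pairs:
                if not stack or stack.pop() != pairs[ch]:
                    return False
        return not stack

    print(brackets_balanced("fn main() { if (x) { y(); } }"))  # True
    print(brackets_balanced("fn main() { if (x) { y(); } "))   # False: a closing brace was dropped
    ```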

  • taberiand
    Posted March 19, 2025 at 8:32 pm

    Based on the list, LLMs are at a "very smart junior programmer" level of coding – though with a much broader knowledge base than you'd expect from even a senior. They lack bigger-picture thinking, and default to doing what is asked of them instead of what needs to be done.

    I expect the models will continue improving though, I feel like most of it comes down to the ephemeral nature of their context window / the ability to recall and attach relevant information to the working context when prompted.

  • dataviz1000
    Posted March 19, 2025 at 8:35 pm

    Are you using Cursor? I'm using GitHub Copilot in VS Code and I'm wondering if I will get more efficiency from a different coding assistant.

  • boredtofears
    Posted March 19, 2025 at 8:38 pm

    Great read, I can definitely confirm a lot of these myself. Would be nice to see this aggregated into some kind of "best practices" document (although hard to say how quickly it'd be out of date).

  • submeta
    Posted March 19, 2025 at 9:25 pm

    > Preparatory refactoring

    > Current LLMs, without a plan that says they should refactor first, don’t decompose changes in this way. They will try to do everything at once.

    Just today I learned this the hard way. I had created an app for my spouse and myself for sharing and reading news articles, some of them behind paywalls.

    Using Cursor, I have a FastAPI backend and a React frontend. When I added extracting the article text as markdown and then summarizing it, both using OpenAI, and tasked Cursor with it, the chaos began. Cursor (with the help of Claude 3.7) tackled everything at once and then some. It started writing a module for using OpenAI, then it also changed the frontend to show not only the title and URL but also the extracted markdown and the summary; in doing so it screwed up my UI, deleted some rows in my database, and came up with a module for interacting with OpenAI that did not work. The extraction was broken, the summary as well.

    All of this despite me having detailed cursorrules.

    That's when I realized: divide and conquer. Ask it to write one function that works, then one class where the function becomes a method, test it, then move on to the next function, until every piece is working and I can glue them together.
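
    A minimal sketch of that divide-and-conquer flow, with invented names and placeholder logic standing in for the real OpenAI calls: one function first, then a thin class around it, with a tiny check at each step before anything gets glued into the API or UI:

    ```python
    # Step 1: one function that works on its own (placeholder logic standing in
    # for the real OpenAI summarization call), testable in isolation.
    def summarize(text: str, max_sentences: int = 3) -> str:
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        return ". ".join(sentences[:max_sentences]) + "."

    # Step 2: only once step 1 is tested, wrap it in a class the backend can use.
    class ArticleSummarizer:
        def __init__(self, max_sentences: int = 3):
            self.max_sentences = max_sentences

        def summarize(self, text: str) -> str:
            return summarize(text, self.max_sentences)

    # Step 3: a tiny check after every step, before gluing pieces together.
    assert summarize("One. Two. Three. Four.") == "One. Two. Three."
    assert ArticleSummarizer(max_sentences=2).summarize("One. Two. Three.") == "One. Two."
    ```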

  • colonCapitalDee
    Posted March 19, 2025 at 9:36 pm

    > Preparatory Refactoring says that you should first refactor to make a change easy, and then make the change. The refactor change can be quite involved, but because it is semantics preserving, it is easier to evaluate than the change itself.

    > In human software engineering, a common antipattern when trying to figure out what to do is to jump straight to proposing solutions, without forcing everyone to clearly articulate what all the requirements are. Often, your problem space is constrained enough that once you write down all of the requirements, the solution is uniquely determined; without the requirements, it’s easy to devolve into a haze of arguing over particular solutions.

    > When you’re learning to use a new framework or library, simple uses of the software can be done just by copy pasting code from tutorials and tweaking them as necessary. But at some point, it’s a good idea to just slog through reading the docs from top-to-bottom, to get a full understanding of what is and is not possible in the software.

    > The Walking Skeleton is the minimum, crappy implementation of an end-to-end system that has all of the pieces you need. The point is to get the end-to-end system working first, and only then start improving the various pieces.

    > When there is a bug, there are broadly two ways you can try to fix it. One way is to randomly try things based on vibes and hope you get lucky. The other is to systematically examine your assumptions about how the system works and figure out where reality mismatches your expectations.

    > The Rule of Three in software says that you should be willing to duplicate a piece of code once, but on the third copy you should refactor. This is a refinement on DRY (Don’t Repeat Yourself) accounting for the fact that it might not necessarily be obvious how to eliminate a duplication, and waiting until the third occurrence might clarify.

    These are lessons that I've learned the hard way (for some definition of "learned"; these things are simple but not easy), but I've never seen them phrased so succinctly and accurately before. Well done OP!
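
    As a concrete (invented) Python illustration of the Rule of Three quoted above: the first two near-duplicates are tolerated, and only when a third caller needs the same logic is the shared shape clear enough to extract:

    ```python
    # First and second copies: the duplication is tolerated, because the right
    # abstraction isn't obvious from two examples.
    def load_users(path):
        with open(path) as f:
            return [line.strip().split(",") for line in f if line.strip()]

    def load_orders(path):
        with open(path) as f:
            return [line.strip().split(",") for line in f if line.strip()]

    # When a third caller needs the same logic, the shared shape ("read a
    # delimited file into rows") is finally clear, so now extract the helper.
    def load_rows(path, delimiter=","):
        with open(path) as f:
            return [line.strip().split(delimiter) for line in f if line.strip()]

    def load_products(path):
        return load_rows(path)
    ```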

  • admiralrohan
    Posted March 19, 2025 at 9:54 pm

    Even in the age of vibe coding, I always try to learn as much as possible.

    For example, yesterday I was working with the animation library Motion, which I had never worked with before. I used the code suggested by the AI, but at least picked up 2-3 basic animation concepts while reviewing the code.

    It's the kind of unfocused, passive learning I've always tried, even before AI.

  • akomtu
    Posted March 19, 2025 at 10:13 pm

    LLMs aren't AI. They are more like librarians with eidetic memory: they can discuss in depth any book in the library, but sooner or later you notice that they don't really understand what they are talking about.

    One easy test for AI-ness is an optimization problem. Give it a relatively small but complex program, e.g. a GPU shader on shadertoy.com, and tell it to optimize it. The output is clearly defined: it's an image or an animation. It's also easy to measure how much it has improved the framerate. What's good is that this task doesn't allow the typical LLM bullshitting: if the result doesn't compile or doesn't draw a correct image, you'll see it.

    The thing is, the current generation of LLMs will blunder at this task.

  • kleton
    Posted March 19, 2025 at 10:49 pm

    Most of these points are applicable to the current top models, but he frequently references Claude Sonnet, which is not even above the fold on the leaderboard.

  • lukev
    Posted March 19, 2025 at 10:55 pm

    This is exceptionally useful advice, and precisely the way we should be talking about how to engage with LLMs when coding.

    That said, I take issue with "Use Static Types".

    I've actually had more success with Claude Code using Clojure than with TypeScript (the other thing I tried).

    Clojure emphasizes small, pure functions to a high degree, whereas (sometimes) fully understanding a strong type might involve reading several files. If I'm really good with my prompting and make sure I have good example data for the entity types at each boundary point, it feels like it does a better job.

    My intuition is that LLMs are fundamentally context-based, so they are naturally suited to an emphasis on functions over pure data, versus requiring understanding of a larger type/class hierarchy to perform well.

    But it took me a while to figure out how to build these prompts and agent rules. An LLM programming in a dynamic language without a human supervising the high-level code structure and data model is a recipe for disaster.

  • oglop
    Posted March 19, 2025 at 10:59 pm

    I just talk to an LLM like it's a person who is smart, meaning I expect it to be confidently wrong now and then, but I don't have to worry about hurting its feelings. They are remarkably similar to people, though others seem not to think so, so maybe it's a case of some people finding them easier to work with than others do. I wonder what drives that. Maybe it's the difference between a person who thinks life unfolds before them vs. a person who views life as a bundle, with each day a fold making up your experience; through this stack you discern the structure which is your life, which sure seems to be how these things work.

  • torginus
    Posted March 19, 2025 at 11:31 pm

    I have one more – LLMs are terrible at counting and arithmetic. If your code gen relies on cutting off the first two words of a constant string, you'd better check whether you really need to cut off 12 characters like the LLM says. If it adds 2 numbers, the result might be suspect. If you need it to decode a byte sequence, where getting the numbers from exactly the right position is essential... you get the idea.

    It took me a day to debug my LLM-generated code – and of course, like all long and fruitless debugging sessions, this one started with me assuming that it couldn't possibly get this wrong. Yet it did.
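
    A tiny illustration of the string-cutting case described above (an assumed example, not the commenter's actual code): the safer move is to derive the cut point from the string itself rather than trusting arithmetic the LLM did in its head:

    ```python
    GREETING = "Hello there, general Kenobi"

    # What an LLM might hard-code: "drop the first two words" turned into a magic number.
    risky = GREETING[12:]   # actually yields " general Kenobi": the count is off by one

    # Safer: derive the cut point from the string itself.
    words = GREETING.split(" ", 2)
    safe = words[2] if len(words) > 2 else ""
    print(safe)             # "general Kenobi"
    ```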
