
Are LLMs able to notice the “gorilla in the data”? by finding_theta

24 Comments

  • zozbot234
    Posted February 8, 2025 at 5:47 pm

    That's done on purpose. The AI can't easily tell whether the drawing might be intended to be one of a human or a gorilla, so when in doubt it doesn't want to commit either way and just ignores the topic altogether. It's just another example of AI ethics influencing its behavior for alignment purposes.

  • badlibrarian
    Posted February 8, 2025 at 5:52 pm

    [flagged]

  • johnfn
    Posted February 8, 2025 at 5:56 pm

    GPT can't "see" the results of the scatterplot (unless prompted with an image); it only sees the code it wrote. If a human had the same constraints, I doubt they'd identify that there was a gorilla there.

    Take a screenshot of the scatterplot and feed it into multimodal GPT, and it does a fine job of identifying it.

    EDIT:

    Sorry, as a few people pointed out, I missed the part where the author did feed a PNG into GPT. I kind of jumped to conclusions when it worked fine for me. I still maintain that the article's conclusion ("Your AI Can't See Gorillas") is overly broad, given that I had no trouble getting it to see one.

    But I wonder why the author had trouble. My suspicion is that the AI got stuck on summary statistics because the previous messages in the chat were all about summary statistics.
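
    For anyone who wants to reproduce that test, here is a minimal sketch of sending a plot screenshot to a multimodal model via the OpenAI Python package; the file name, prompt, and model choice are illustrative assumptions, not details from the article:

        # Sketch: ask a multimodal model what a scatter-plot image shows.
        # Assumes the `openai` package and OPENAI_API_KEY in the environment;
        # "gorilla.png" is a hypothetical screenshot of the plot.
        import base64
        from openai import OpenAI

        client = OpenAI()

        with open("gorilla.png", "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("utf-8")

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "What do you notice about this scatter plot?"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }],
        )
        print(response.choices[0].message.content)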

  • Retr0id
    Posted February 8, 2025 at 5:56 pm

    I'm not sure I'd be able to tell it was supposed to be a gorilla specifically, without context.

  • amelius
    Posted February 8, 2025 at 5:59 pm

    Can it draw the unicorn yet?

    https://gpt-unicorn.adamkdean.co.uk/

  • GaggiX
    Posted February 8, 2025 at 6:01 pm

    If you give the graph to the model as an image, it will easily see the monkey ("I see a drawing of a monkey outlined with red and blue dots."); if you give it the coordinates instead, it will struggle with it much more, just as a human would.

  • sw1sh
    Posted February 8, 2025 at 6:13 pm

    I got "The scatter plot appears to be arranged to resemble the character 'Pepe the Frog,' a popular internet meme … " lol

    Not sure whether multimodal embeddings have such good pattern-recognition accuracy in this case; probably most of the information goes into attending to plot-related features, like its labels and ticks.

  • mkoubaa
    Posted February 8, 2025 at 6:19 pm

    Asimov forgot to warn us of Artificial Stupidity

  • runjake
    Posted February 8, 2025 at 6:19 pm

    Only tangentially related to this story: I've been trying for months to train YOLO models to recognize my Russian Blue cat, with its assorted white spots, as a cat rather than a dog or a person.

    However, it refuses to cooperate. It's maddening.

    As a result, I receive "There is a person at your front door" notifications at all hours of the night.
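
    A minimal sketch of what that fine-tuning loop can look like, assuming the `ultralytics` package and a small custom dataset labeled with the cat; the file names and hyperparameters are illustrative:

        # Sketch: fine-tune a pretrained YOLO model on labeled images of the cat.
        # Assumes the `ultralytics` package; "cat_dataset.yaml" is a hypothetical
        # dataset config pointing at the labeled images.
        from ultralytics import YOLO

        model = YOLO("yolov8n.pt")  # start from pretrained COCO weights
        model.train(data="cat_dataset.yaml", epochs=50, imgsz=640)

        # Run detection on a new frame from the doorbell camera.
        results = model("front_door.jpg")
        for box in results[0].boxes:
            print(results[0].names[int(box.cls)], float(box.conf))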

  • ultimoo
    Posted February 8, 2025 at 6:22 pm

    Reminded me of the classic attention test: https://m.youtube.com/watch?v=vJG698U2Mvo

  • talles
    Posted February 8, 2025 at 6:42 pm

    Is "seeing the gorilla" a reference borrowed from this work? https://www.youtube.com/watch?v=UtKt8YF7dgQ

  • lxe
    Posted February 8, 2025 at 6:44 pm

    If you give a blind researcher this task, they might have trouble seeing the gorillas as well.

    Also the prompt matters. To a human, literally everything they see and experience is "the prompt", so to speak. A constant barrage of inputs.

    To the AI, it's just the prompt and the text it generates.

  • wodenokoto
    Posted February 8, 2025 at 6:45 pm

    I love that gorilla test. It happens on my team all the time: people start with the assumption that the data is “good” and then deep-dive.

    Is there a blog post that focuses on just the gorilla test that I can share with my team? I'm not even interested in the LLM part.

  • albert_e
    Posted February 8, 2025 at 6:45 pm

    Recently we read about how DeepSeek reasoning models exhibited an “Aha!” moment when analyzing a complex problem, where they find a deeper pattern/insight that provides a breakthrough.

    I feel we also need models to be able to have a “Wait, what?” moment.

  • forgotusername6
    Posted February 8, 2025 at 6:54 pm

    I recently had a similar experience with ChatGPT and a gorilla. I was designing a rather complicated algorithm, so I wrote out all the steps in words. I then asked ChatGPT to verify that it made sense. It said it was well thought out, logical, etc. My colleague didn't believe it was really reading it properly, so I inserted a step in the middle, “and then a gorilla appears”, and asked it again. Sure enough, it again came back saying it was well thought out, etc. When I questioned it on the gorilla, it merely replied that it thought it was meant to be there, that it was a technical term or a codename for something…

  • cjbgkagh
    Posted February 8, 2025 at 6:55 pm

    Seems like the specific goalpost of a gorilla was chosen in order to obtain the outcome needed to write the paper they wanted, and it's rather uninteresting compared to determining at what point the AI starts to see shapes in the data. Could the AI see a line, a curve, a square, or an umbrella? If the AI can't see a square, why would we expect it to see a gorilla?
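
    That simpler-shapes probe is easy to set up. A minimal sketch that generates a square-shaped “dataset” to hand to a model as raw coordinates; the column names, sizes, and noise level are arbitrary choices:

        # Sketch: generate scatter data along the perimeter of a unit square,
        # lightly jittered, to probe when a model starts "seeing" shapes.
        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(0)

        def square(n=400, noise=0.01):
            t = rng.uniform(0, 4, n)   # position along the perimeter
            side = t.astype(int)       # which edge: 0..3
            frac = t - side            # fraction along that edge
            x = np.select([side == 0, side == 1, side == 2, side == 3],
                          [frac, np.ones(n), 1 - frac, np.zeros(n)])
            y = np.select([side == 0, side == 1, side == 2, side == 3],
                          [np.zeros(n), frac, np.ones(n), 1 - frac])
            return pd.DataFrame({"x": x + rng.normal(0, noise, n),
                                 "y": y + rng.normal(0, noise, n)})

        square().to_csv("square.csv", index=False)  # feed this to the model as text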

  • tonetegeatinst
    Posted February 8, 2025 at 7:13 pm

    "The core value of EDA…"

    Another subtle joke about chip design and layout strikes again.

  • svilen_dobrev
    Posted February 8, 2025 at 7:19 pm

    Is this the opposite of people seeing/searching for dicks here or there?

  • shortrounddev2
    Posted February 8, 2025 at 7:24 pm

    Do this in reverse and ask it to generate ASCII art for you.

  • mrbonner
    Posted February 8, 2025 at 7:24 pm

    Is it just me, or are we officially in the new territory of trolling LLMs?

  • mariofilho
    Posted February 8, 2025 at 7:27 pm

    I uploaded the image to Gemini 2.0 Flash Thinking 01 21 and asked:

    “Here is a steps vs bmi plot. What do you notice?”

    Part of the answer:

    “Monkey Shape: The most striking feature of this plot is that the data points are arranged to form the shape of a monkey. This is not a typical scatter plot where you'd expect to see trends or correlations between variables in a statistical sense. Instead, it appears to be a creative visualization where data points are placed to create an image.”

    Gemini 2.0 Pro without thinking didn't see the monkey.
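
    For reference, a minimal sketch of running the same test through the google-generativeai Python package; the model string, API-key handling, and file name are assumptions:

        # Sketch: send the plot image to a Gemini model with the same question.
        # Assumes the `google-generativeai` package and a valid API key;
        # "steps_vs_bmi.png" is a hypothetical screenshot of the plot.
        import google.generativeai as genai
        from PIL import Image

        genai.configure(api_key="YOUR_API_KEY")  # placeholder key
        model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")

        img = Image.open("steps_vs_bmi.png")
        response = model.generate_content(
            [img, "Here is a steps vs bmi plot. What do you notice?"])
        print(response.text)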

  • notnmeyer
    Posted February 8, 2025 at 7:28 pm

    Maybe I don't get it, but can we conclusively say that the gorilla wasn't “seen”, as opposed to deemed irrelevant to the questions being asked?

    “Look at the scatter plot again” is anthropomorphizing the LLM and expecting it to infer a fairly odd intent.

    A query like “does the scatter plot visualization look like any real-world objects?” might have produced the result the author was fishing for.

    If it were the opposite situation, where you were trying to answer “real” questions and the LLM kept suggesting “the visualized data looks like Notorious B.I.G.”, we'd all be here laughing at a different post about the dumb LLM.

  • zmgsabst
    Posted February 8, 2025 at 7:32 pm

    But both models did see the gorilla when prompted with it…?

    ChatGPT:

    > It looks like the scatter plot unintentionally formed an artistic pattern rather than a meaningful representation of the data.

    Claude:

    > Looking at the scatter plot more carefully, I notice something concerning: there appear to be some unlikely or potentially erroneous values in the data. Let me analyze this in more detail.

    > Ah, now I see something very striking that I missed in my previous analysis – there appears to be a clear pattern in the data points that looks artificial. The data points form distinct curves and lines across the plot, which is highly unusual for what should be natural, continuous biological measurements.

    Given the context of asking for quantitative analysis, and their general beaten-into-submission attitude where they defer to you (e.g., your assertion that this is a real dataset), I'm not sure what conclusion we're supposed to draw.

    That if you lie to the AI, it’ll believe you…?

    Neither was prompted that this is potentially adversarial data — and AI don’t generally infer social context very well. (A similar effect occurs with math tests.)

  • hinkley
    Posted February 8, 2025 at 7:35 pm

    Boring.

    I don’t even like AI and I still will tell you this whole premise is bullshit.

    ChatGPT got:

    > It looks like the scatter plot unintentionally formed an artistic pattern rather than a meaningful representation of the data.

    Claude drew a scatter plot with points that are so fat that it doesn’t look like a gorilla. It looks like two graffiti artists fighting over drawing space.

    It’s a resolution problem.

    What happens if you give Claude the picture ChatGPT generated?
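
    The resolution point is testable: marker size alone can decide whether the shape survives. A minimal sketch with placeholder data standing in for the article's steps/bmi columns:

        # Sketch: plot the same points at two marker sizes. Fat markers smear
        # shapes into blobs; small markers keep outlines legible.
        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(0)
        steps = rng.uniform(0, 15000, 800)  # placeholder for the real column
        bmi = rng.uniform(15, 35, 800)      # placeholder for the real column

        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
        ax1.scatter(steps, bmi, s=100, alpha=0.7)
        ax2.scatter(steps, bmi, s=2)
        ax1.set_title("s=100 (blob)")
        ax2.set_title("s=2 (legible)")
        fig.savefig("marker_size_comparison.png", dpi=150)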
