
I used o3 to find a remote zeroday in the Linux SMB implementation by zielmicha

19 Comments

  • zison
    Posted May 24, 2025 at 2:38 pm

    Very interesting. Is the bug it found exploitable in practice? Could this have been found by syzkaller?

  • zielmicha
    Posted May 24, 2025 at 2:43 pm

    (To be clear, I'm not the author of the post, the title just starts with "How I")

  • mdaniel
    Posted May 24, 2025 at 4:20 pm

    Notable:

    > o3 finds the kerberos authentication vulnerability in 8 of the 100 runs

    And I'd guess this only became a blog post because the author already knew about the vuln and was just curious to see if the intern could spot it too, given a curated subset of the codebase.

  • Retr0id
    Posted May 24, 2025 at 6:23 pm

    The article cites a signal-to-noise ratio of ~1:50. The author is clearly deeply familiar with this codebase and is thus well-positioned to triage the signal from the noise. Automating this part will be where the real wins are, so I'll be watching this closely.

  • Hilift
    Posted May 24, 2025 at 6:31 pm

    Does the vulnerability exist in other implementations of SMB?

  • logifail
    Posted May 24, 2025 at 6:42 pm

    My understanding is that ksmbd is a kernel-space SMB server "developed as a lightweight, high-performance alternative" to the traditional (user-space) Samba server…

    Q1: Who is using ksmbd in production?

    Q2: Why?

  • iandanforth
    Posted May 24, 2025 at 6:47 pm

    The most interesting and significant bit of this article for me was that the author ran this search for vulnerabilities 100 times for each of the models. That's significantly more computation than I've historically been willing to expend on most of the problems that I try with large language models, but maybe I should let the models go brrrrr!
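    The compounding effect of repeated runs is easy to make concrete. A minimal sketch, assuming runs are independent and using the ~8%-per-run hit rate the article reports for this bug (the function name is illustrative):

    ```python
    # If a model finds a given bug in a fraction p of independent runs
    # (o3 found this one in 8 of 100 runs, so p ~= 0.08), the chance of
    # at least one hit across n runs is 1 - (1 - p)**n.
    def p_at_least_one_hit(p: float, n: int) -> float:
        return 1.0 - (1.0 - p) ** n

    for n in (1, 10, 100):
        print(f"{n:3d} runs -> {p_at_least_one_hit(0.08, n):.3f}")
    ```

    At an 8% per-run rate, ten runs already give better-than-even odds of a hit, which is one way to read the author's choice to run each model 100 times.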

  • mezyt
    Posted May 24, 2025 at 6:54 pm

    Meanwhile, as a maintainer, I've been reviewing more than a dozen false-positive slop CVEs filed against my library, and not a single one found an actual issue. This article is probably going to make my situation worse.

  • jobswithgptcom
    Posted May 24, 2025 at 6:58 pm

    Wow, interesting. I've been hacking on a tool, https://diffwithgpt.com, with a similar angle: indexing git changelogs with Qwen and having it raise risks for backward-compat issues, including security, when upgrading k8s etc.

  • empath75
    Posted May 24, 2025 at 7:21 pm

    Given the value of finding zero days, pretty much every intelligence agency in the world is going to pour money into this if it can reliably find them with just a few hundred API calls. Especially if you can fine-tune a model on lots of examples, which I don't think OpenAI etc. are going to do with any public API.

  • akomtu
    Posted May 24, 2025 at 7:22 pm

    This made me think that the near future will be LLMs trained specifically on Linux or another large project. The source code is a small part of the dataset fed to LLMs. More interesting is the runtime data flow, similar to what we observe in a debugger. Looking at the codebase alone is like trying to understand a waterfall by looking at the equations that describe the water flow.

  • KTibow
    Posted May 24, 2025 at 7:40 pm

    > With o3 you get something that feels like a human-written bug report, condensed to just present the findings, whereas with Sonnet 3.7 you get something like a stream of thought, or a work log.

    This is likely because the author didn't give Claude a scratchpad or space to think, essentially forcing it to mix its thoughts with its report. I'd be interested to see if using the official thinking mechanism gives it enough space to get differing results.

  • nxobject
    Posted May 24, 2025 at 7:43 pm

    A small thing, but I found the author's project-organization practices useful – creating individual .prompt files for the system prompt, background information, and auxiliary instructions [1], and then running them through `llm`.

    It reveals how good LLM use, like any other engineering tool, requires good engineering thinking – methodical, and oriented around thoughtful specifications that balance design constraints.

    [1] https://github.com/SeanHeelan/o3_finds_cve-2025-37899
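    A rough sketch of that layout, for the curious. Only the .prompt-file split and the use of the `llm` CLI come from the linked repo; the file contents below are invented placeholders:

    ```shell
    # Hypothetical contents; the repo splits its prompt into files like these.
    printf 'You are a security auditor hunting memory-safety bugs.\n' > system.prompt
    printf 'ksmbd is an in-kernel SMB3 server; sessions are refcounted.\n' > background.prompt
    printf 'Prefer reporting no bug over reporting a false positive.\n' > instructions.prompt

    # Assemble the user-facing pieces into one message; with the `llm` CLI you
    # would then pipe the code under audit in, e.g.:
    #   cat background.prompt instructions.prompt sess.c | llm -m o3 -s "$(cat system.prompt)"
    cat background.prompt instructions.prompt > user.prompt
    wc -l < user.prompt
    ```

    Keeping each concern in its own file makes it cheap to swap one piece (say, the background section) between runs without touching the rest.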

  • dehrmann
    Posted May 24, 2025 at 7:50 pm

    Are there better tools for finding this? It feels like the sort of thing static analysis should reliably find, but it's in the Linux kernel, so you'd think either coding standards or tooling around these sorts of C bugs would be mature.

  • firesteelrain
    Posted May 24, 2025 at 8:08 pm

    I really hope this is legit and not what keeps happening to curl.

    [1] https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-f…

  • ape4
    Posted May 24, 2025 at 8:21 pm

    Seems we need something like kernel modules, but with memory protection.

  • martinald
    Posted May 24, 2025 at 8:54 pm

    I think this is the biggest near-term alignment problem with LLMs. They are getting scarily good at this.

    I recently found a pretty serious security vulnerability, with virtually no effort, in a very niche open-source server I sometimes use. I'm worried that there is a huge long tail of software out there that was never worth probing for vulnerabilities manually for nefarious ends, but that could cause really serious problems once that search is automated.

  • dboreham
    Posted May 24, 2025 at 9:00 pm

    I feel like our jobs are reasonably secure for a while because the LLM didn't immediately say "SMB implemented in the kernel, are you f-ing joking!?"

  • simonw
    Posted May 24, 2025 at 9:02 pm

    There's a beautiful little snippet here that perfectly captures how most of my prompt development sessions go:

    > I tried to strongly guide it to not report false positives, and to favour not reporting any bugs over reporting false positives. I have no idea if this helps, but I’d like it to help, so here we are. In fact my entire system prompt is speculative in that I haven’t ran a sufficient number of evaluations to determine if it helps or hinders, so consider it equivalent to me saying a prayer, rather than anything resembling science or engineering. Once I have ran those evaluations I’ll let you know.


© 2025 HackTech.info. All Rights Reserved.
