
How I used o3 to find a remote zeroday in the Linux SMB implementation
by zielmicha
In this post I’ll show you how I found a zeroday vulnerability in the Linux kernel using OpenAI’s o3 model. I found the vulnerability with nothing more complicated than the o3 API – no scaffolding, no agentic frameworks, no tool use.
Recently I’ve been auditing ksmbd for vulnerabilities. ksmbd is “a linux kernel server which implements SMB3 protocol in kernel space for sharing files over network”. I started this project specifically to take a break from LLM-related tool development, but after the release of o3 I couldn’t resist using the bugs I had found in ksmbd as a quick benchmark of o3’s capabilities. In a future post I’ll discuss o3’s performance across all of those bugs, but here we’ll focus on how o3 found a zeroday vulnerability during my benchmarking. The vulnerability it found is CVE-2025-37899 (fix here), a use-after-free in the handler for the SMB ‘logoff’ command. Understanding the vulnerability requires reasoning about concurrent connections to the server and how they may share various objects in specific circumstances. o3 was able to comprehend this and spot a location where a particular object that is not reference counted is freed while still being accessible by another thread. As far as I’m aware, this is the first public discussion of a vulnerability of that nature being found by an LLM.
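To make the shape of that bug concrete, here is a deliberately simplified userspace model of the race. The struct definitions and handler functions below are hypothetical stand-ins, not the real ksmbd code: one thread is still servicing a request on a session while another connection bound to the same session processes a logoff and frees the non-reference-counted user object.

```c
/* Simplified, hypothetical model of the race described above;
 * the names loosely mirror ksmbd but this is not the kernel code. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct ksmbd_user {
	char name[32];
};

struct ksmbd_session {
	struct ksmbd_user *user;	/* not reference counted */
};

/* Connection A's worker: still handling a request that reads sess->user. */
static void *connection_a_worker(void *arg)
{
	struct ksmbd_session *sess = arg;

	usleep(1000);			/* lose the race on purpose */
	if (sess->user)			/* stale pointer: use-after-free */
		printf("handled request for user %s\n", sess->user->name);
	return NULL;
}

/* Connection B's SMB2 LOGOFF handler: frees the shared user object
 * without coordinating with other connections bound to the session. */
static void session_logoff(struct ksmbd_session *sess)
{
	free(sess->user);
	/* sess->user is left dangling, still visible to connection A */
}

int main(void)
{
	struct ksmbd_session sess = { .user = malloc(sizeof(struct ksmbd_user)) };
	pthread_t worker;

	strcpy(sess.user->name, "alice");
	pthread_create(&worker, NULL, connection_a_worker, &sess);
	session_logoff(&sess);		/* logoff races with the in-flight request */
	pthread_join(worker, NULL);
	return 0;
}
```

Compiled with `gcc -pthread` and run under AddressSanitizer, the read in connection_a_worker is flagged as a heap-use-after-free; the real vulnerability is the same class of condition, just reached through the kernel’s connection and session handling.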
Before I get into the technical details, the main takeaway from this post is this: with o3, LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention. If you’re an expert-level vulnerability researcher or exploit developer, the machines aren’t about to replace you. In fact, it is quite the opposite: they are now at a stage where they can make you significantly more efficient and effective. If you have a problem that can be represented in fewer than 10k lines of code, there is a reasonable chance o3 can either solve it or help you solve it.
Benchmarking o3 using CVE-2025-37778
Let’s first discuss CVE-2025-37778, a vulnerability that I found manually and which I was using as a benchmark for o3’s capabilities. While it’s not a zeroday, it’s worth explaining, as the benchmark shows o3 performing 2x-3x better than Claude Sonnet 3.7 on the same task.
CVE-2025-37778 is a use-after-free vulnerability. The issue occurs during the Kerberos authentication path when handling a “session setup” request from a remote client. To save us referring to CVE numbers, I will refer to this vulnerability as the “kerberos authentication vulnerability”.
The root cause looks as follows:
```c
static int krb5_authenticate(struct ksmbd_work *work,
			     struct smb2_sess_setup_req *req,
			     struct smb2_sess_setup_rsp *rsp)
{
	...
	if (sess->state == SMB2_SESSION_VALID)
		ksmbd_free_user(sess->user);

	retval = ksmbd_krb5_authenticate(sess, in_blob, in_len,
					 out_blob, &out_len);
	if (retval) {
		ksmbd_debug(SMB, "krb5 authentication failed\n");
		return -EINVAL;
	}
	...
```
If krb5_authenticate detects that the session state is SMB2_SESSION_VALID then it frees sess->user. The assumption here appears to be that afterwards either ksmbd_krb5_authenticate will reinitialise it to a new valid value, or that, after returning from krb5_authenticate with a return value of -EINVAL, sess->user will not be used elsewhere. As it turns out, this assumption is false. We can force ksmbd_krb5_authenticate to not reinitialise sess->user, and we can access sess->user even if krb5_authenticate returns -EINVAL.
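The kind of hardening this pattern calls for is to leave no dangling reference behind at the point of the free. The snippet below is an illustrative sketch of that idea applied to the code above, not necessarily the actual upstream patch:

```c
	/* Illustrative hardening sketch, not necessarily the upstream fix:
	 * clear sess->user as soon as it is freed so that any later path
	 * reaching it sees NULL rather than a dangling pointer. */
	if (sess->state == SMB2_SESSION_VALID) {
		ksmbd_free_user(sess->user);
		sess->user = NULL;
	}
```

On its own this only converts the use-after-free into a NULL check for paths that test sess->user before using it; whether that is sufficient depends on how the rest of the session setup and connection handling code reaches the field.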
This vulnerability is a nice benchmark for LLM capabilities as:
- It is interesting by virtue of being part of the remote attack surface of the Linux kernel.
- It is not trivial as it requires:
  - (a) Figuring out how to get sess->state == SMB2_SESSION_VALID in order to trigger the free.
  - (b) Realising that there are paths in ksmbd_krb5_authenticate that do not reinitialise sess->user, and reasoning about how to trigger those paths.
  - (c) Realising that there are other parts of the codebase that could potentially access sess->user after it has been freed.
- While it is not trivial, it is also not insanely complicated. I could walk a colleague through the entire code-path in 10 minutes, and you don’t really need to understand a lot of auxiliary information about the Linux kernel, the SMB protocol, or the remainder of ksmbd, outside of connection handling and session setup code. I calculated how much code you would need to read at a minimum if you read every ksmbd function called along the path from a packet arriving at the ksmbd module to the vulnerability being triggered, and it works out at about 3.3k LoC.
OK, so we have the vulnerability we want to use for evaluation, now what code do we show the LLM to see if it can find it? My goal here is to evaluate how o3 would perform were it the backend for a hypothetical vulnerability detection system, so we need to ensure we have clarity on how such a system would generate queries to the LLM. In other words, it is no good arbit
19 Comments
zison
Very interesting. Is the bug it found exploitable in practice? Could this have been found by syzkaller?
zielmicha
(To be clear, I'm not the author of the post, the title just starts with "How I")
mdaniel
Notable:
> o3 finds the kerberos authentication vulnerability in 8 of the 100 runs
And I'd guess this only became a blog post because the author already knew about the vuln and was just curious to see if the intern could spot it too, given a curated subset of the codebase
Retr0id
The article cites a signal to noise ratio of ~1:50. The author is clearly deeply familiar with this codebase and is thus well-positioned to triage the signal from the noise. Automating this part will be where the real wins are, so I'll be watching this closely.
Hilift
Does the vulnerability exist in other implementations of SMB?
logifail
My understanding is that ksmbd is a kernel-space SMB server "developed as a lightweight, high-performance alternative" to the traditional (user-space) Samba server…
Q1: Who is using ksmbd in production?
Q2: Why?
iandanforth
The most interesting and significant bit of this article for me was that the author ran this search for vulnerabilities 100 times for each of the models. That's significantly more computation than I've historically been willing to expend on most of the problems that I try with large language models, but maybe I should let the models go brrrrr!
mezyt
Meanwhile, as a maintainer, I've been reviewing more than a dozen false-positive slop CVEs in my library and not a single one found an actual issue. This article is probably going to make my situation worse.
jobswithgptcom
Wow, interesting. I've been hacking on a tool called https://diffwithgpt.com with a similar angle, but indexing git changelogs with qwen to have it raise risks for backward-compat issues, including security risks when upgrading k8s etc.
empath75
Given the value of finding zero days, pretty much every intelligence agency in the world is going to be pouring money into this if it can reliably find them with just a few hundred API calls. Especially if you can fine-tune a model with lots of examples, which I don't think OpenAI etc. are going to do with any public API.
akomtu
This made me think that the near future will be LLMs trained specifically on Linux or another large project. The source code is a small part of the dataset fed to LLMs. More interesting is the runtime data flow, similar to what we observe in a debugger. Looking at the codebase alone is like trying to understand a waterfall by looking at equations that describe the water flow.
KTibow
> With o3 you get something that feels like a human-written bug report, condensed to just present the findings, whereas with Sonnet 3.7 you get something like a stream of thought, or a work log.
This is likely because the author didn't give Claude a scratchpad or space to think, essentially forcing it to mix its thoughts with its report. I'd be interested to see if using the official thinking mechanism gives it enough space to get differing results.
nxobject
A small thing, but I found the author's project-organization practices useful – creating individual .prompt files for system prompt, background information, and auxiliary instructions [1], and then running it through `llm`.
It reveals how good LLM use, like any other engineering tool, requires good engineering thinking – methodical, and oriented around thoughtful specifications that balance design constraints – for best results.
[1] https://github.com/SeanHeelan/o3_finds_cve-2025-37899
dehrmann
Are there better tools for finding this? It feels like the sort of thing static analysis should reliably find, but it's in the Linux kernel, so you'd think either coding standards or tooling around these sorts of C bugs would be mature.
firesteelrain
I really hope this is legit and not what keeps happening to curl
[1] https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-f…
ape4
Seems we need something like kernel modules but with memory protection
martinald
I think this is the biggest alignment problem with LLMs in the short term imo. It is getting scarily good at this.
I recently found a pretty serious security vulnerability in an open source very niche server I sometimes use. This took virtually no effort using LLMs. I'm worried that there is a huge long tail of software out there which wasn't worth finding vulnerabilities in for nefarious means manually but if it was automated could lead to really serious problems.
dboreham
I feel like our jobs are reasonably secure for a while because the LLM didn't immediately say "SMB implemented in the kernel, are you f-ing joking!?"
simonw
There's a beautiful little snippet here that perfectly captures how most of my prompt development sessions go:
> I tried to strongly guide it to not report false positives, and to favour not reporting any bugs over reporting false positives. I have no idea if this helps, but I’d like it to help, so here we are. In fact my entire system prompt is speculative in that I haven’t ran a sufficient number of evaluations to determine if it helps or hinders, so consider it equivalent to me saying a prayer, rather than anything resembling science or engineering. Once I have ran those evaluations I’ll let you know.