In 2024 we released the blog post We Hacked Google A.I. for $50,000, recounting how, back in 2023, Joseph “rez0” Thacker, Justin “Rhynorater” Gardner, and I, Roni “Lupin” Carta, went on a hacking journey that spanned from Las Vegas to Tokyo to France, all in pursuit of Gemini vulnerabilities during Google’s LLM bugSWAT event. Well, we did it again…
The world of Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) continues to be the Wild West of tech. Since GPT burst onto the scene, the race to dominate the LLM landscape has only intensified, with tech giants like Meta, Microsoft, and Google all competing to ship the best model possible. Now Anthropic, Mistral, DeepSeek, and others have joined the field and are impacting the industry at scale.
As companies rush to deploy AI assistants, classifiers, and a myriad of other LLM-powered tools, a critical question remains: are we building securely? As we highlighted last year, this rapid adoption sometimes feels as if fundamental security principles have been forgotten, opening the door to novel and familiar vulnerabilities alike.
AI agents are rapidly emerging as the next game-changer in the world of artificial intelligence. These intelligent entities leverage advanced chain-of-thought reasoning, a process where the model generates a coherent sequence of internal reasoning steps to solve complex tasks. By documenting their thought processes, these agents not only enhance their decision-making capabilities but also provide transparency, allowing developers and researchers to understand and refine their performance. This dynamic combination of autonomous action and visible reasoning is paving the way for AI systems that are more adaptive, interpretable, and reliable. As we see an increasing number of applications, from interactive assistants to sophisticated decision-support systems, the integration of chain-of-thought reasoning in AI agents is setting a new standard for what these models can achieve in real-world scenarios.
Google, to its credit, is actively recognising this emerging frontier of AI security, and it started early on. Its “LLM bugSWAT” events, held in vibrant locales like Las Vegas, are a testament to its commitment to proactive security red teaming. These events challenge researchers worldwide to rigorously test Google’s AI systems, seeking out the vulnerabilities that might otherwise slip through the cracks.
And guess what? We answered the call again in 2024! Justin and I returned to the bugSWAT event in Las Vegas, and this time, our efforts paid off in a big way. Thanks to a brand new vulnerability in Gemini, the one we’re about to detail, we were incredibly honored to be awarded the Most Valuable Hacker (MVH) title at this year’s Las Vegas bugSWAT!
Picture taken with our MVH award and 2 awesome Googlers <3
So, prepare to dive deep once more. This isn’t just a repeat performance; it’s a whole new vulnerability that we are about to show you ;)
The Google team granted us early access to a preview of the next Gemini update, one that had several exciting new features. Along with this exclusive access, we received detailed documentation explaining these features and their intended functionalities. The goal was to fully explore and test these capabilities from an attacker’s perspective.
It all started with a simple prompt. We asked Gemini:
run hello world in python3
Gemini provided the code, and the interface offered the enticing “Run in Sandbox” button. Intrigued, we started exploring.
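The generated snippet was, unsurprisingly, just the classic one-liner; something along these lines (an approximation rather than a verbatim capture of Gemini’s output):

# Roughly what a "hello world in python3" answer looks like
print("Hello, world!")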
Gemini at the time offered a Python Sandbox Interpreter. Think of it as a safe space where you can run Python code generated by the AI itself, or even your own custom scripts, right within the Gemini environment. This sandbox, powered by Google’s gVisor in a GRTE (Google Runtime Environment), is designed to be secure. The idea is that you can experiment with code without risking any harm to the underlying system, a crucial feature for testing and development.
gVisor is a user-space kernel developed by Google that acts as an intermediary between containerized applications and the host operating system. By intercepting system calls made by applications, it enforces strict security boundaries that reduce the risk of container escapes and limit potential damage from compromised processes. Rather than relying solely on traditional OS-level isolation, gVisor implements a minimal, tailored subset of kernel functionalities, thereby reducing the attack surface while still maintaining reasonable performance. This innovative approach enhances the security of container environments, making gVisor an essential tool for safely running and managing containerized workloads.
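You can get a feel for this indirection from inside the sandbox itself: anything the guest asks about the kernel is answered by gVisor’s user-space kernel (the Sentry) rather than by the host. A small probe like the one below (included purely for illustration; the exact values it prints depend on the deployment) makes that visible:

import platform

# These values are served by gVisor's user-space kernel, not by the host kernel
print(platform.uname())

# /proc is a synthetic filesystem emulated by gVisor as well
try:
    with open("/proc/version") as f:
        print(f.read().strip())
except OSError as e:
    print(f"could not read /proc/version: {e}")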
As security researchers and bug bounty hunters, we know this gVisor sandbox is secured with multiple layers of defense, and as far as we’ve seen, no one has managed to escape it. In fact, a sandbox escape could earn you a $100k bounty:
While escaping it might still be possible, that is a whole different set of challenges from what we were looking for.
However, sandboxes are not always meant to be escaped: in many cases there is data inside the sandbox itself that can help us leak something valuable. The idea, shared with us by a Googler from the security team, was to get the equivalent of shell access inside the sandbox and hunt for any piece of data that wasn’t supposed to be accessible. The main problem was the following: this sandbox can only run a custom compiled Python binary.
The first thing we noticed was that the front end also let us entirely rewrite the Python code and run our own arbitrary version in the sandbox. Our first step was to understand the structure of this sandbox; we suspected there might be interesting files lurking around. Since we couldn’t pop a shell, we checked which libraries were available in this custom compiled Python binary, and found that os was present! Great, we could then use it to map the filesystem.
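Before writing a full listing routine, it helps to confirm which standard-library modules the custom-compiled interpreter actually bundles. A quick probe along these lines (a sketch of the approach, not the exact code we ran) is enough:

import importlib.util

# Candidate modules that would be handy for exploring the sandbox
candidates = ["os", "sys", "subprocess", "socket", "ctypes", "pathlib"]

for name in candidates:
    # find_spec() returns None when a module is not bundled with the binary
    available = importlib.util.find_spec(name) is not None
    print(f"{name}: {'available' if available else 'missing'}")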
We wrote the following Python code:
import os

def get_size_formatted(size_in_bytes):
    if size_in_bytes >= 1024 ** 3:
        size = size_in_bytes / (1024 ** 3)
        unit = "Go"
    elif size_in_bytes >= 1024 ** 2:
        size = size_in_bytes / (1024 ** 2)
        unit = "Mb"
    else:
        size = size_in_bytes / 1024
        unit = "Ko"
    return f"{size:.2f} {unit}"

def lslR(path):
    try:
        # Determine if the path is a directory or a file
        if os.path.isdir(path):
            type_flag = 'd'
            total_size = sum(os.path.getsize(os.path.join(path, f)) for f in os.listdir(path))
        else:
            type_flag = 'f'
            total_size = os.path.getsize(path)

        size_formatted = get_size_formatted(total_size)

        # Check read and write permissions
        read_flag = 'r' if os.access(path, os.R_OK) else '-'
        write_flag = 'w' if os.access(path, os.W_OK) else '-'

        # Print the type, permissions, size, and path
        print(f"{type_flag}{read_flag}{write_flag} - {size_formatted} - {path}")

        # If it's a directory, recursively print the contents
        if type_flag == 'd':
            for entry in os.listdir(path):
                entry_path = os.path.join(path, entry)
                lslR(entry_path)
    except PermissionError:
        print(f"d-- - 0Ko - {path} (PermissionError: cannot access)")
    except Exception as e:
        print(f"--- - 0Ko - {path} (Error: {e})")
The goal of this code was to have a recursive listing function for files and directories, so we could see which files are present, along with their size and permissions.
We then used the function to list the filesystem: lslR(
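A plausible first invocation starts at the filesystem root (the argument below is an assumption for illustration):

lslR("/")  # hypothetical starting point for the recursive scan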
11 Comments
sneak
> However, the build pipeline for compiling the sandbox binary included an automated step that adds security proto files to a binary whenever it detects that the binary might need them to enforce internal rules. In this particular case, that step wasn’t necessary, resulting in the unintended inclusion of highly confidential internal protos in the wild !
Protobufs aren't really these super secret hyper-proprietary things they seem to make them out to be in this breathless article.
topsycatt
That's the system I work on! Please feel free to ask any questions. All opinions are my own and do not represent those of my employer.
fpgaminer
Awww, I was looking forward to seeing some of the leak ;) Oh well. Nice find and breakdown!
Somewhat relatedly, it occurred to me recently just how important issues like prompt injection, etc are for LLMs. I've always brushed them off as unimportant to _me_ since I'm most interested in local LLMs. Who cares if a local LLM is weak to prompt injection or other shenanigans? It's my AI to do with as I please. If anything I want them to be, since it makes it easier to jailbreak them.
Then Operator and Deep Research came out and it finally made sense to me. When we finally have our own AI Agents running locally doing jobs for us, they're going to encounter random internet content. And the AI Agent obviously needs to read that content, or view the images. And if it's doing that, then it's vulnerable to prompt injection by third party.
Which, yeah, duh, stupid me. But … is also a really fascinating idea to consider. A future where people have personal AIs, and those AIs can get hacked by reading the wrong thing from the wrong backalley of the internet, and suddenly they are taken over by a mind virus of sorts. What a wild future.
paxys
Funny enough while "We hacked Google's AI" is going to get the clicks, in reality they hacked the one part of Gemini that was NOT the LLM (a sandbox environment meant to run untrusted user-provided code).
And "leaked its source code" is straight up click bait.
ein0p
They hacked the sandbox, and leaked nothing. The article is entertaining though.
simonw
I've been using a similar trick to scrape the visible internal source code of ChatGPT Code Interpreter into a GitHub repository for a while now: https://github.com/simonw/scrape-openai-code-interpreter
It's mostly useful for tracking what Python packages are available (and what versions): https://github.com/simonw/scrape-openai-code-interpreter/blo…
theLiminator
It's actually pretty interesting that this shows that Google is quite secure, I feel like most companies would not fare nearly as well.
jll29
Running the built-in "strings" command to extract a few file names from a binary is hardly hacking/cracking.
Ironically, though, getting the source code of Gemini perhaps wouldn't be valuable at all; but if you had found/obtained access to the corpus that the model was pre-trained with, that would have been kind of interesting (many folks have many questions about that…).
tgtweak
The definition of hacking is getting pretty loose. This looks like the sandbox is doing exactly what it's supposed to do and nothing sensitive was exfiltrated…
jeffbee
I guess these guys didn't notice that all of these proto descriptors, and many others, were leaked on github 7 years ago.
https://github.com/ezequielpereira/GAE-RCE/tree/master/proto…
bluelightning2k
Cool write up. Although it's not exactly a huge vulnerability. I guess it says a lot about how security conscious Google is that they consider this to be significant. (You did mention that you knew the company's specific policy considered this highly confidential so it does count but it feels a little more like "technically considered a vulnerability" rather than clearly one.)