R1 Computer Use by mountainriver

10 Comments

  • Post Author
    mountainriver
    Posted February 6, 2025 at 8:02 pm

    Hey HN,

    We are working to apply the ideas of R1 to computer use. The primary struggle is creating reliable neural reward models since hard-verification rewards are not available at scale in GUI interactions.

    Our team is currently deep in the weeds of collecting reasoning annotation data for GUI interfaces to train a reliable reward model.

    We would love all thoughts, feedback, and collaborations!
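One standard way to train a reward model from human preference annotations like these is a pairwise (Bradley-Terry) objective, as used for RLHF reward models: score the annotator-preferred trajectory higher than the rejected one. A minimal sketch of just the loss in plain Python; the network that produces the scalar scores for GUI trajectories is assumed, not shown:

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).
    The loss shrinks as the reward model scores the annotator-preferred
    trajectory higher than the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Wider separation between chosen and rejected scores -> lower loss.
assert bradley_terry_loss(2.0, 0.0) < bradley_terry_loss(0.5, 0.0)
# Equal scores (indifference) give loss -log(0.5) = log(2).
assert abs(bradley_terry_loss(1.0, 1.0) - math.log(2.0)) < 1e-9
```

This is what makes annotation data usable where hard verification isn't: the signal is a relative judgment between two trajectories rather than a ground-truth check.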

  • Post Author
    llama-mini
    Posted February 6, 2025 at 8:46 pm

    It seems like a placeholder for now? No content, right?

  • Post Author
    falcor84
    Posted February 6, 2025 at 8:50 pm

    > @software{r1_computer_use,
    >   title = {R1-Computer-Use: Reasoning-First Computer Interaction},
    >   author = {Barker, Patrick},
    >   year = {2025},
    >   url = {https://github.com/agentsea/r1-computer-use},
    > }

    Sorry to be a party-pooper, but does it really make sense to add a citation when you don't have fully working code yet, let alone a paper about it?

  • Post Author
    fkyoureadthedoc
    Posted February 6, 2025 at 8:59 pm

    This is the type of post some VP at my company sees and starts telling people that R1 can use computer and then I have to be like "well actually" to 25 people.

    Computer use is pretty exciting stuff in general though, good luck

  • Post Author
    refulgentis
    Posted February 6, 2025 at 9:04 pm

    Free advice (though worth less than free, because A) it's unsolicited and B) it's saying "don't do it")

    TL;DR:

    – Turns out that if you do UXR, even if computer use is 100% successful in action execution, and there's no latency, people don't use it. (Interesting to me: the core demo was buying airline tickets, and so is OpenAI's. No one would defer to a computer on that, for humanist / design reasons.)

    – You're not going to be able to out-do the model companies at building models; they have too much funding.

    – Try writing GUI-based integration tests. Then imagine an LLM, miraculously, always chooses the right route. Does the UX look good?

    – Note that the reasoning models are worse at tool calling. It's very, very, VERY stark when you have Claude next to o1/4o. OpenAI also owns up to this in the o3-mini paper, though it's not under a blaring red headline or phrased that straightforwardly.

    – Why is that? You're fighting against the current when you try to teach a next-token predictor to throw a bunch of text out there in <think>, then generate perfectly correct JSON/Python/whatever given N tools.

    CLI, though….
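The failure mode being described — a long reasoning trace followed by a tool call that must be byte-perfect JSON — can be made concrete. A minimal sketch, assuming a hypothetical output format where the model emits a `<think>…</think>` block followed by a JSON tool call (the `name`/`arguments` shape is an assumption, loosely modeled on common tool-calling APIs):

```python
import json
import re

def extract_tool_call(output: str) -> dict:
    """Strip a <think>...</think> block, then parse the remainder as a
    JSON tool call. Raises ValueError/JSONDecodeError on malformed
    output -- exactly the failure long reasoning traces make likelier."""
    payload = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL).strip()
    call = json.loads(payload)
    if "name" not in call or "arguments" not in call:
        raise ValueError(f"missing tool-call fields: {call}")
    return call

out = (
    '<think>The user wants me to open the file, so I should click it.</think>\n'
    '{"name": "click", "arguments": {"x": 100, "y": 200}}'
)
call = extract_tool_call(out)
assert call["name"] == "click"
```

Every extra token of free-form reasoning is another chance for the model to drift out of the strict format the parser demands, which is one way to read the Claude-vs-o1 gap above.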

  • Post Author
    crazygringo
    Posted February 6, 2025 at 9:07 pm

    I can't wait for something like this to be built.

    People have tons of workflows that involve a lot of clicks and typing in response to data that are too difficult or one-off to automate with fragile macros.

    But if my computer can quickly realize that I'm deleting every odd-numbered page of a PDF, or renaming every file to add a prefix, or following each link on a website and saving an image… and then just instantly automate the next 100 times… that's going to be huge!

  • Post Author
    iiJDSii
    Posted February 6, 2025 at 9:24 pm

    What does your perception layer look like? Are you using raw screenshots? GUI snapshots? In some earlier experiments I found that vision is very difficult for these, and snapshots are incomplete.

  • Post Author
    mkagenius
    Posted February 6, 2025 at 9:38 pm

    Training a base model just for computer use seems like overkill, as a normal reasoning model like o3 for planning plus a vision model like gemini-flash is good enough[1] without being trained specifically for computer use.

    But if you still want to try this path, Google has made the ScreenQA dataset (RICO) available[2] along with bounding boxes.

    1. A framework to use local/hosted models for android use/control – https://github.com/BandarLabs/clickclickclick

    2. https://github.com/google-research-datasets/screen_qa
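The planner-plus-grounder split described above can be sketched as a simple loop: a reasoning model proposes the next UI action in natural language, and a vision model grounds it to screen coordinates. A minimal, stubbed sketch; the `plan`/`ground` callables stand in for real calls to hosted models (o3, gemini-flash, etc.) and are assumptions, not any particular API:

```python
from typing import Callable, Tuple

def run_step(
    instruction: str,
    screenshot: bytes,
    plan: Callable[[str, bytes], str],
    ground: Callable[[str, bytes], Tuple[int, int]],
) -> Tuple[str, Tuple[int, int]]:
    """One iteration of a two-model loop: the reasoning model proposes
    the next UI action as text, and the vision model grounds that text
    to (x, y) coordinates on the screenshot."""
    action = plan(instruction, screenshot)   # e.g. "tap the Settings icon"
    x, y = ground(action, screenshot)        # e.g. (412, 960)
    return action, (x, y)

# Stubbed models so the sketch runs; real code would call hosted models.
action, coords = run_step(
    "open settings",
    b"\x89PNG...",  # placeholder screenshot bytes
    plan=lambda inst, img: "tap the Settings icon",
    ground=lambda act, img: (412, 960),
)
assert coords == (412, 960)
```

The appeal of this decomposition is that neither model needs computer-use-specific training: the planner only reasons in text, and the grounder only localizes UI elements.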

  • Post Author
    emregucerr
    Posted February 6, 2025 at 9:39 pm

    I wonder how good R1 is at counting pixels from a screenshot. What enabled Claude and OpenAI's CUA to develop computer use was being able to give precise x-y coordinates for a click location.

    Also, how big of a gain is reasoning for computer use? I feel like reasoning unlocks a lot when there is a single complex question, but not so much when taking actions in a long-term plan.
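On the coordinate question: one common trick is to have the vision model emit clicks on a fixed normalized grid rather than raw pixels, then rescale to the actual screen. A minimal sketch, assuming a 0..1000 grid (the exact scale varies by model and is an assumption here):

```python
from typing import Tuple

def to_screen(
    norm_x: int, norm_y: int, width: int, height: int, scale: int = 1000
) -> Tuple[int, int]:
    """Map coordinates emitted on a fixed 0..scale grid to actual screen
    pixels, so the model never has to know the real resolution."""
    return (round(norm_x * width / scale), round(norm_y * height / scale))

# A click at (500, 500) on the normalized grid is the screen center.
assert to_screen(500, 500, 1920, 1080) == (960, 540)
```

This sidesteps "counting pixels" somewhat: the model localizes relative to the image it saw, and the harness handles the resolution-dependent arithmetic.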

  • Post Author
    3s
    Posted February 6, 2025 at 9:49 pm

    Are people concerned about the privacy implications of computer use at all? This is why I haven't been using Claude computer use personally. Somehow the idea of sending everything I do on my computer to a random third party seems creepy. There are a lot of applications of AI (Rewind comes to mind) that I simply cannot accept the idea of sharing my screen with.

© 2025 HackTech.info. All Rights Reserved.