AI tools are spotting errors in research papers by kgwgk

35 Comments

  • Post Author
    more_corn
    Posted March 8, 2025 at 3:05 pm

    This is great to hear. A good use of AI, if the false positives can be controlled.

  • Post Author
    topaz0
    Posted March 8, 2025 at 3:16 pm

    This is such a bad idea. Skip the first section and read the "false positives" section.

  • Post Author
    crazygringo
    Posted March 8, 2025 at 3:27 pm

    This actually feels like an amazing step in the right direction.

    If AI can help spot obvious errors in published papers, it can do it as part of the review process. And if it can do it as part of the review process, authors can run it on their own work before submitting. It could massively raise the quality level of a lot of papers.

    What's important here is that it's part of a process involving the experts themselves: the authors and the peer reviewers. They can easily dismiss false positives, and, more importantly, they get warnings about statistical mistakes or other aspects of the paper that aren't their primary area of expertise but can contain gotchas.

  • Post Author
    epidemiology
    Posted March 8, 2025 at 3:33 pm

    AI tools are hopefully going to eat lots of manual scientific research. This article looks at error spotting, but if you follow the path of getting better and better at error spotting to its conclusion, you essentially reproduce the work entirely from scratch. So AI study generation is really where this is going.

    All my work could honestly be done instantaneously with better data harmonization & collection along with better engineering practices. Instead, it requires a lot of manual effort. I remember my professors talking about how they used to calculate linear regressions by hand back in the old days. Hopefully a lot of the data cleaning and study setup done now will sound similar to a future generation of scientists who use AI tools to run and check these basic programmatic and statistical tasks.

  • Post Author
    sega_sai
    Posted March 8, 2025 at 3:38 pm

    As a researcher, I say it is a good thing. Provided it gives a small number of errors that are easy to check, it is a no-brainer. I would say it is more valuable for authors, though, to spot obvious issues.
    I don't think it will drastically change research, but it is an improvement over a spell check or running Grammarly.

  • Post Author
    simonw
    Posted March 8, 2025 at 3:42 pm

    "YesNoError is planning to let holders of its cryptocurrency dictate which papers get scrutinized first."

    Sigh.

  • Post Author
    yosito
    Posted March 8, 2025 at 3:51 pm

    While I don't doubt that AI tools can spot some errors that would be tedious for humans to look for, they are also responsible for far more errors. That's why proper understanding and application of AI is important.

  • Post Author
    tomrod
    Posted March 8, 2025 at 3:56 pm

    I know academics who use it to make sure their arguments are grounded, after a meaningful draft. This helps them lay out their arguments more clearly, and IMO it is no worse than the companies that used motivated graduate students to review the grammar and coherence of papers written by non-native speakers.

  • Post Author
    webdoodle
    Posted March 8, 2025 at 3:59 pm

    The push for AI is about controlling the narrative. By giving AI the editorial review process, it can control the direction of science, media and policy. Effectively controlling the course of human evolution.

    On the other hand, I'm fully supportive of going through ALL of the rejected scientific papers to look for editorial bias, censorship, propaganda, etc.

  • Post Author
    bookofjoe
    Posted March 8, 2025 at 4:08 pm
  • Post Author
    TZubiri
    Posted March 8, 2025 at 4:29 pm

    Didn't this YesNoError thing start as a memecoin?

  • Post Author
    huijzer
    Posted March 8, 2025 at 4:33 pm

    I think incentives are the real problem in science; improving them is what matters, and tools aren’t gonna fix that.

  • Post Author
    gusgus01
    Posted March 8, 2025 at 4:38 pm

    I'm extremely skeptical of the value of this. I've already seen hours wasted responding to baseless claims that are lent credence by AI "reviews" of open-source codebases. The claims would have happened anyway, but these text generators know how to hallucinate in the correct verbiage to convince lay people and amateurs, and they are more annoying to deal with.

  • Post Author
    RainyDayTmrw
    Posted March 8, 2025 at 4:52 pm

    Perhaps our collective memories are too short? Did we forget what curl just went through with AI-confabulated bug reports[1]?

    [1]: https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-f…

  • Post Author
    InkCanon
    Posted March 8, 2025 at 4:56 pm

    This sounds way, way outside how LLMs work. They can't count the R's in strarwberrrrrry, but they can cross-reference multiple tables of data? Is there something else going on here?

  • Post Author
    lfsh
    Posted March 8, 2025 at 5:10 pm

    I am using JetBrains' AI to do code analysis (find errors).

    While it sometimes spots something I missed, it also gives a lot of confident 'advice' that is just wrong or not useful.

    Current AI tools are still sophisticated search engines. They cannot reason or think.

    So while I think it could spot some errors in research papers, I am still very sceptical that it is useful as a trusted source.

  • Post Author
    _tom_
    Posted March 8, 2025 at 5:19 pm

    This basically turns research papers as a whole into a big generative adversarial network.

  • Post Author
    sfink
    Posted March 8, 2025 at 5:20 pm

    Don't forget that this is driven by present-day AI. Which means people will assume that it's checking for fraud and incorrect logic, when actually it's checking for self-consistency and consistency with training data. So it should be great for typos, misleading phrasing, and cross-checking facts and diagrams, but I would expect it to do little for manufactured data, plausible but incorrect conclusions, and garden variety bullshit (claiming X because Y, when Y only implies X because you have a reasonable-sounding argument that it ought to).

    Not all of that is out of reach. Making the AI evaluate a paper in the context of a cluster of related papers might enable spotting some "too good to be true" things.

    Hey, here's an idea: use AI for mapping out the influence of papers that were later retracted (whether for fraud or error, it doesn't matter). Not just via citation, but have it try to identify the no longer supported conclusions from a retracted paper, and see where they show up in downstream papers. (Cheap "downstream" is when a paper or a paper in a family of papers by the same team ever cited the upstream paper, even in preprints. More expensive downstream is doing it without citations.)
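    A toy sketch of that cheap "downstream" pass (the citation graph here is made-up data; real edges could come from a citation database such as OpenAlex, and identifying which no-longer-supported conclusions actually propagate is the hard part that isn't modeled):

        from collections import deque

        # Toy citation graph: paper_id -> list of papers it cites (placeholder data).
        citations = {
            "A": [],
            "B": ["A"],
            "C": ["B"],
            "D": ["A", "C"],
        }
        retracted = {"A"}

        # Invert the edges so we can walk from a retracted paper to everything citing it.
        cited_by = {}
        for paper, refs in citations.items():
            for ref in refs:
                cited_by.setdefault(ref, []).append(paper)

        def downstream_of(paper_ids):
            """All papers that directly or transitively cite any paper in paper_ids."""
            seen, queue = set(), deque(paper_ids)
            while queue:
                for citer in cited_by.get(queue.popleft(), []):
                    if citer not in seen:
                        seen.add(citer)
                        queue.append(citer)
            return seen

        print(downstream_of(retracted))  # {'B', 'C', 'D'}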

  • Post Author
    BurningFrog
    Posted March 8, 2025 at 5:41 pm

    In the not so far future we should have AIs that have read all the papers and other info in a field. They can then review any new paper as well as answering any questions in the field.

    This then becomes the first sanity check for any paper author.

    This should save a lot of time and effort, improve the quality of papers, and root out at least some fraud.

    Don't worry, many problems will remain :)

  • Post Author
    surferbayarea
    Posted March 8, 2025 at 5:46 pm

    Here are 2 examples from the Black Spatula project where we were able to detect major errors:
    https://github.com/The-Black-Spatula-Project/black-spatula-p…
    https://github.com/The-Black-Spatula-Project/black-spatula-p…

    Something to note: this didn't even require a complex multi-agent pipeline. Single-shot prompting was able to detect these errors.
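    For anyone curious, the single-shot approach is roughly this shape (illustrative only: the model name and prompt wording here are not our exact pipeline, and an API key must be set in the environment):

        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        def check_paper(paper_text: str) -> str:
            # One model call over the whole paper, asking for concrete, checkable issues.
            prompt = (
                "You are checking a scientific paper for errors. List numbered, "
                "concrete issues (arithmetic, units, statistics, internal "
                "inconsistencies), quoting the exact passage for each. "
                "If you find no clear issue, say so.\n\n" + paper_text
            )
            response = client.chat.completions.create(
                model="gpt-4o",  # illustrative model choice
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content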

  • Post Author
    systemstops
    Posted March 8, 2025 at 5:48 pm

    When people start building tools like this to analyze media coverage of historic events, it will be a game changer.

  • Post Author
    jongjong
    Posted March 8, 2025 at 5:54 pm

    I expect that for truly innovative research, it might flag the innovative parts of the paper as a mistake if they're not fully elaborated upon… E.g. if the author assumed that the reader possesses certain niche knowledge.

    With software design, I find many mistakes where AI says things that are incorrect because it parrots common blanket statements and ideologies without actually checking, from first principles, whether the statement applies in this case… Once you take the discussion down to first principles, it quickly acknowledges its mistake, but you had to have this deep insight in order to take it there… Someone trying to learn from AI would not get this insight from it; instead they would be taught a dumbed-down, cartoonish, wordcel version of reality.

  • Post Author
    delusional
    Posted March 8, 2025 at 6:02 pm

    Reality check: yesnoerror, the only part of the article that actually seems to involve any published AI reviewer comments, is just checking arxiv papers. Their website claims that they "uncover errors, inconsistencies, and flawed methods that human reviewers missed," but arxiv is of course famously NOT a peer-reviewed journal. At best they are finding "errors, inconsistencies, and flawed methods" in papers that human reviewers haven't looked at.

    Let's then see whether we can uncover any "errors, inconsistencies, and flawed methods" on their own website. The "status" display is pure made-up garbage; there's no network traffic related to it that would actually allow it to show a real status. The "RECENT ERROR DETECTIONS" list shows a single paper from today, but the queue you see when you click "submit a paper" lists the last completed paper as the 21st of February. The front page tells us it found some math issue in a paper titled "Waste tea as absorbent for removal of heavy metal present in contaminated water", but if we navigate to that paper[1] the math error suddenly disappears. Most of the comments are also worthless, talking about minor typographical issues or misspellings that do not matter, but of course they still categorize those as an "error".

    It's the same garbage as every time with crypto people.

    [1]: https://yesnoerror.com/doc/82cd4ea5-4e33-48e1-b517-5ea3e2c5f…

  • Post Author
    EigenLord
    Posted March 8, 2025 at 6:42 pm

    The role of LLMs in research is an ongoing, well, research topic of interest of mine. I think it's fine so long as 1. a pair of human eyes has validated the generated outputs, and 2. the "ownership rule" holds: the human researcher is prepared to defend and own anything the AI model does on their behalf, implying that they have digested and understood it as well as anything else they may have read or produced in the course of conducting their research.
    Rule #2 avoids the notion of crypto-plagiarism. If you prompted for a certain output, your thought, in a manner of speaking, was the cause of that output. If you agree with it, you should be able to use it.
    In this case, using AI to fact-check is kind of ironic, considering its hallucination issues. However, infallibility is the mark of omniscience; it's pretty unreasonable to expect these models to be flawless. They can still play a supplementary role in the review process, a second line of defense for peer reviewers.

  • Post Author
    robwwilliams
    Posted March 8, 2025 at 7:14 pm

    Great start, but it will definitely require supervision by experts in the field. I routinely use Claude 3.7 to flag errors in my submissions. Here is a prompt I used yesterday:

    “This is a paper we are planning to submit to Nature Neuroscience. Please generate a numbered list of significant errors with text tags I can use to find the errors and make corrections.”

    It gave me a list of 12 errors, of which Claude labeled three as “inconsistencies”, “methods discrepancies”, and “contradictions”. When I asked Claude to reconsider, it said “You are right, I apologize” in each of these three instances.
    Nonetheless it was still a big win for me and caught a lot of my dumb mistakes.

    Claude 3.7 running in standard mode does not use its context window very effectively. I suppose I could have demanded that Claude “internally review (wait: think again)” each serious error it initially thought it had encountered. I’ll try that next time. Exposure of the chain of thought would help.
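    For what it's worth, running that same prompt programmatically looks roughly like this with the anthropic Python SDK (the model string is illustrative and may differ, and ANTHROPIC_API_KEY must be set):

        import anthropic

        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

        def review_draft(draft_text: str) -> str:
            prompt = (
                "This is a paper we are planning to submit to Nature Neuroscience. "
                "Please generate a numbered list of significant errors with text tags "
                "I can use to find the errors and make corrections.\n\n" + draft_text
            )
            message = client.messages.create(
                model="claude-3-7-sonnet-20250219",  # illustrative model string
                max_tokens=4000,
                messages=[{"role": "user", "content": prompt}],
            )
            # The reply is a numbered list of issues with text tags to search for.
            return message.content[0].text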

  • Post Author
    lifeisstillgood
    Posted March 8, 2025 at 8:30 pm

    It’s a nice idea, and I would love to be able to use it for my own company reports (spotting my obvious errors before sending them to my boss’s boss).

    But the first thing I noticed was the two approaches highlighted – one a small-scale approach that does not publish first but approaches the authors privately, and the other that publishes first, has no human review, and has its own cryptocurrency.

    I don’t think anything speaks more clearly about the current state of the world and the choices in our political space.

  • Post Author
    ysofunny
    Posted March 8, 2025 at 9:00 pm

    The top two links at this moment are:

    > AI tools are spotting errors in research papers: inside a growing movement (nature.com)

    and

    > Kill your Feeds – Stop letting algorithms dictate what you think (usher.dev)

    so we shouldn't let the feed algorithms influence our thoughts, but also, AI tools need to tell us when we're wrong.

  • Post Author
    forum-soon-yuck
    Posted March 8, 2025 at 9:19 pm

    [dead]

  • Post Author
    TheRealPomax
    Posted March 8, 2025 at 9:55 pm

    Oh look, an actual use case for AI. Very nice.

  • Post Author
    mac-mc
    Posted March 8, 2025 at 10:33 pm

    Now they need to do it for their own outputs to spot their own hallucination errors.

  • Post Author
    rosstex
    Posted March 8, 2025 at 10:41 pm

    Why not just skip the human and have AI write, evaluate and submit the papers?

  • Post Author
    YeGoblynQueenne
    Posted March 8, 2025 at 10:46 pm

    Needs more work.

    >> Right now, the YesNoError website contains many false positives, says Nick Brown, a researcher in scientific integrity at Linnaeus University. Among 40 papers flagged as having issues, he found 14 false positives (for example, the model stating that a figure referred to in the text did not appear in the paper, when it did). “The vast majority of the problems they’re finding appear to be writing issues,” and a lot of the detections are wrong, he says.

    >> Brown is wary that the effort will create a flood for the scientific community to clear up, as well as fuss about minor errors such as typos, many of which should be spotted during peer review (both projects largely look at papers in preprint repositories). Unless the technology drastically improves, “this is going to generate huge amounts of work for no obvious benefit”, says Brown. “It strikes me as extraordinarily naive.”

  • Post Author
    timoth3y
    Posted March 8, 2025 at 11:02 pm

    Perhaps this is a naive question from a non-academic, but why isn't deliberately falsifying data, or using AI tools or Photoshop to create images, career-ending?

    Wouldn't a more direct system be one in which journals refused submissions if one of the authors had committed deliberate fraud in a previous paper?

  • Post Author
    dbg31415
    Posted March 9, 2025 at 12:18 am

    Can we get it to fact check politicians and Facebook now?
