AI tools are spotting errors in research papers by kgwgk

35 Comments

  • Post Author
    more_corn
    Posted March 8, 2025 at 3:05 pm

    This is great to hear. A good use of AI, if the false positives can be controlled.

  • Post Author
    topaz0
    Posted March 8, 2025 at 3:16 pm

    This is such a bad idea. Skip the first section and read the "false positives" section.

  • Post Author
    crazygringo
    Posted March 8, 2025 at 3:27 pm

    This actually feels like an amazing step in the right direction.

    If AI can help spot obvious errors in published papers, it can do it as part of the review process. And if it can do it as part of the review process, authors can run it on their own work before submitting. It could massively raise the quality level of a lot of papers.

    What's important here is that it's part of a process involving the experts themselves: the authors and the peer reviewers. They can easily dismiss false positives, and, more importantly, they get warnings about statistical mistakes or other aspects of the paper that aren't their primary area of expertise but can contain gotchas.

  • Post Author
    epidemiology
    Posted March 8, 2025 at 3:33 pm

    AI tools are hopefully going to eat lots of manual scientific research. This article looks at error spotting, but if you follow the path of getting better and better at error spotting to its conclusion, you essentially reproduce the work entirely from scratch. So AI study generation is really where this is going.

    All my work could honestly be done instantaneously with better data harmonization & collection along with better engineering practices. Instead, it requires a lot of manual effort. I remember my professors talking about how they used to calculate linear regressions by hand back in the old days. Hopefully a lot of the data cleaning and study setup done now will sound similar to a future generation of scientists who use AI tools to run and check these basic programmatic and statistical tasks.

  • Post Author
    sega_sai
    Posted March 8, 2025 at 3:38 pm

    As a researcher, I say it is a good thing. Provided it gives a small number of errors that are easy to check, it is a no-brainer. I would say it is more valuable for authors, though, to spot obvious issues.
    I don't think it will drastically change research, but it is an improvement over a spell check or running Grammarly.

  • Post Author
    simonw
    Posted March 8, 2025 at 3:42 pm

    "YesNoError is planning to let holders of its cryptocurrency dictate which papers get scrutinized first."

    Sigh.

  • Post Author
    yosito
    Posted March 8, 2025 at 3:51 pm

    While I don't doubt that AI tools can spot some errors that would be tedious for humans to look for, they are also responsible for far more errors. That's why proper understanding and application of AI is important.

  • Post Author
    tomrod
    Posted March 8, 2025 at 3:56 pm

    I know academics who use it to make sure their arguments are grounded, after a meaningful draft. This helps them lay out their arguments more clearly, and IMO it is no worse than the companies that used motivated graduate students to review the grammar and coherence of papers written by non-native speakers.

  • Post Author
    webdoodle
    Posted March 8, 2025 at 3:59 pm

    The push for AI is about controlling the narrative. By giving AI the editorial review process, it can control the direction of science, media and policy. Effectively controlling the course of human evolution.

    On the other hand, I'm fully supportive of going through ALL of the rejected scientific papers to look for editorial bias, censorship, propaganda, etc.

  • Post Author
    bookofjoe
    Posted March 8, 2025 at 4:08 pm
  • Post Author
    TZubiri
    Posted March 8, 2025 at 4:29 pm

    Didn't this YesNoError thing start as a memecoin?

  • Post Author
    huijzer
    Posted March 8, 2025 at 4:33 pm

    I think incentives are the real problem in science; improving them is what matters, and tools aren’t gonna fix that.

  • Post Author
    gusgus01
    Posted March 8, 2025 at 4:38 pm

    I'm extremely skeptical of the value of this. I've already seen hours wasted responding to baseless claims that are lent credence by AI "reviews" of open-source codebases. The claims would have happened anyway, but these text generators know how to hallucinate in the correct verbiage to convince lay people and amateurs, and they are more annoying to deal with.

  • Post Author
    RainyDayTmrw
    Posted March 8, 2025 at 4:52 pm

    Perhaps our collective memories are too short? Did we forget what curl just went through with AI-confabulated bug reports[1]?

    [1]: https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-f…

  • Post Author
    InkCanon
    Posted March 8, 2025 at 4:56 pm

    This sounds way, way outside how LLMs work. They can't count the R's in strarwberrrrrry, but they can cross-reference multiple tables of data? Is there something else going on here?

  • Post Author
    lfsh
    Posted March 8, 2025 at 5:10 pm

    I am using JetBrains' AI to do code analysis (find errors).

    While it sometimes spots something I missed, it also gives a lot of confident 'advice' that is just wrong or not useful.

    Current AI tools are still sophisticated search engines. They cannot reason or think.

    So while I think it could spot some errors in research papers, I am still very sceptical that it is useful as a trusted source.

  • Post Author
    _tom_
    Posted March 8, 2025 at 5:19 pm

    This basically turns research papers as a whole into a big generative adversarial network.

  • Post Author
    sfink
    Posted March 8, 2025 at 5:20 pm

    Don't forget that this is driven by present-day AI. Which means people will assume that it's checking for fraud and incorrect logic, when actually it's checking for self-consistency and consistency with training data. So it should be great for typos, misleading phrasing, and cross-checking facts and diagrams, but I would expect it to do little for manufactured data, plausible but incorrect conclusions, and garden variety bullshit (claiming X because Y, when Y only implies X because you have a reasonable-sounding argument that it ought to).

    Not all of that is out of reach. Making the AI evaluate a paper in the context of a cluster of related papers might enable spotting some "too good to be true" things.

    Hey, here's an idea: use AI for mapping out the influence of papers that were later retracted (whether for fraud or error, it doesn't matter). Not just via citation, but have it try to identify the no longer supported conclusions from a retracted paper, and see where they show up in downstream papers. (Cheap "downstream" is when a paper or a paper in a family of papers by the same team ever cited the upstream paper, even in preprints. More expensive downstream is doing it without citations.)
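    A toy sketch of that cheap "downstream" pass (the citation graph here is made-up data; real edges could come from a citation database such as OpenAlex, and identifying which no-longer-supported conclusions actually propagate is the hard part that isn't modeled):

        from collections import deque

        # Toy citation graph: paper_id -> list of papers it cites (placeholder data).
        citations = {
            "A": [],
            "B": ["A"],
            "C": ["B"],
            "D": ["A", "C"],
        }
        retracted = {"A"}

        # Invert the edges so we can walk from a retracted paper to everything citing it.
        cited_by = {}
        for paper, refs in citations.items():
            for ref in refs:
                cited_by.setdefault(ref, []).append(paper)

        def downstream_of(paper_ids):
            """All papers that directly or transitively cite any paper in paper_ids."""
            seen, queue = set(), deque(paper_ids)
            while queue:
                for citer in cited_by.get(queue.popleft(), []):
                    if citer not in seen:
                        seen.add(citer)
                        queue.append(citer)
            return seen

        print(downstream_of(retracted))  # {'B', 'C', 'D'}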

  • Post Author
    BurningFrog
    Posted March 8, 2025 at 5:41 pm

    In the not so far future we should have AIs that have read all the papers and other info in a field. They can then review any new paper as well as answering any questions in the field.

    This then becomes the first sanity check for any paper author.

    This should save a lot of time and effort, improve the quality of papers, and root out at least some fraud.

    Don't worry, many problems will remain :)

  • Post Author
    surferbayarea
    Posted March 8, 2025 at 5:46 pm

    Here are 2 examples from the Black Spatula project where we were able to detect major errors:
    https://github.com/The-Black-Spatula-Project/black-spatula-p…
    https://github.com/The-Black-Spatula-Project/black-spatula-p…

    Something to note: this didn't even require a complex multi-agent pipeline. Single-shot prompting was able to detect these errors.
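    For anyone curious, the single-shot approach is roughly this shape (illustrative only: the model name and prompt wording here are not our exact pipeline, and an API key must be set in the environment):

        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        def check_paper(paper_text: str) -> str:
            # One model call over the whole paper, asking for concrete, checkable issues.
            prompt = (
                "You are checking a scientific paper for errors. List numbered, "
                "concrete issues (arithmetic, units, statistics, internal "
                "inconsistencies), quoting the exact passage for each. "
                "If you find no clear issue, say so.\n\n" + paper_text
            )
            response = client.chat.completions.create(
                model="gpt-4o",  # illustrative model choice
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content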

  • Post Author
    systemstops
    Posted March 8, 2025 at 5:48 pm

    When people start building tools like this to analyze media coverage of historic events, it will be a game changer.

  • Post Author
    jongjong
    Posted March 8, 2025 at 5:54 pm

    I expect that for truly innovative research, it might flag the innovative parts of the paper as a mistake if they're not fully elaborated upon… E.g. if the author assumed that the reader possesses certain niche knowledge.

    With software design, I find many mistakes where AI says things that are incorrect because it parrots common blanket statements and ideologies without actually checking, from first principles, whether the statement applies in this case… Once you take the discussion down to first principles, it quickly acknowledges its mistake, but you had to have this deep insight in order to take it there… Someone trying to learn from AI would not get this insight from it; instead they would be taught a dumbed-down, cartoonish, wordcel version of reality.

  • Post Author
    delusional
    Posted March 8, 2025 at 6:02 pm

    Reality check: yesnoerror, the only part of the article that actually seems to involve any published AI reviewer comments, is just checking arxiv papers. Their website claims that they "uncover errors, inconsistencies, and flawed methods that human reviewers missed," but arxiv is of course famously NOT a peer-reviewed journal. At best they are finding "errors, inconsistencies, and flawed methods" in papers that human reviewers haven't looked at.

    Let's then see whether we can uncover any "errors, inconsistencies, and flawed methods" on their own website. The "status" display is pure made-up garbage; there's no network traffic related to it that would actually allow it to show a real status. The "RECENT ERROR DETECTIONS" list shows a single paper from today, but the queue you see when you click "submit a paper" lists the last completed paper as the 21st of February. The front page tells us it found some math issue in a paper titled "Waste tea as absorbent for removal of heavy metal present in contaminated water", but if we navigate to that paper[1] the math error suddenly disappears. Most of the comments are also worthless, talking about minor typographical issues or misspellings that do not matter, but of course they still categorize those as an "error".

    It's the same garbage as every time with crypto people.

    [1]: https://yesnoerror.com/doc/82cd4ea5-4e33-48e1-b517-5ea3e2c5f…

  • Post Author
    EigenLord
    Posted March 8, 2025 at 6:42 pm

    The role of LLMs in research is an ongoing, well, research topic of interest of mine. I think it's fine so long as 1. a pair of human eyes has validated the generated outputs, and 2. the "ownership rule" holds: the human researcher is prepared to defend and own anything the AI model does on their behalf, implying that they have digested and understood it as well as anything else they may have read or produced in the course of conducting their research.
    Rule #2 avoids the notion of crypto-plagiarism. If you prompted for a certain output, your thought, in a manner of speaking, was the cause of that output. If you agree with it, you should be able to use it.
    In this case, using AI to fact-check is kind of ironic, considering its hallucination issues. However, infallibility is the mark of omniscience; it's pretty unreasonable to expect these models to be flawless. They can still play a supplementary role in the review process, a second line of defense for peer reviewers.

  • Post Author
    robwwilliams
    Posted March 8, 2025 at 7:14 pm

    Great start, but it will definitely require supervision by experts in the field. I routinely use Claude 3.7 to flag errors in my submissions. Here is a prompt I used yesterday:

    “This is a paper we are planning to submit to Nature Neuroscience. Please generate a numbered list of significant errors with text tags I can use to find the errors and make corrections.”

    It gave me a list of 12 errors, of which Claude labeled three as “inconsistencies”, “methods discrepancies”, and “contradictions”. When I asked Claude to reconsider, it said “You are right, I apologize” in each of these three instances.
    Nonetheless it was still a big win for me and caught a lot of my dumb mistakes.

    Claude 3.7 running in standard mode does not use its context window very effectively. I suppose I could have demanded that Claude “internally review (wait: think again)” each serious error it initially thought it had encountered. I’ll try that next time. Exposure of the chain of thought would help.
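    For what it's worth, running that same prompt programmatically looks roughly like this with the anthropic Python SDK (the model string is illustrative and may differ, and ANTHROPIC_API_KEY must be set):

        import anthropic

        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

        def review_draft(draft_text: str) -> str:
            prompt = (
                "This is a paper we are planning to submit to Nature Neuroscience. "
                "Please generate a numbered list of significant errors with text tags "
                "I can use to find the errors and make corrections.\n\n" + draft_text
            )
            message = client.messages.create(
                model="claude-3-7-sonnet-20250219",  # illustrative model string
                max_tokens=4000,
                messages=[{"role": "user", "content": prompt}],
            )
            # The reply is a numbered list of issues with text tags to search for.
            return message.content[0].text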

  • Post Author
    lifeisstillgood
    Posted March 8, 2025 at 8:30 pm

    It’s a nice idea, and I would love to be able to use it for my own company reports (spotting my obvious errors before sending them to my boss’s boss).

    But the first thing I noticed was the two approaches highlighted – one a small-scale approach that does not publish first but approaches the authors privately, and the other that publishes first, has no human review, and has its own cryptocurrency.

    I don’t think anything speaks more clearly about the current state of the world and the choices in our political space.

  • Post Author
    ysofunny
    Posted March 8, 2025 at 9:00 pm

    The top two links at this moment are:

    > AI tools are spotting errors in research papers: inside a growing movement (nature.com)

    and

    > Kill your Feeds – Stop letting algorithms dictate what you think (usher.dev)

    so we shouldn't let the feed algorithms influence our thoughts, but also, AI tools need to tell us when we're wrong.

  • Post Author
    forum-soon-yuck
    Posted March 8, 2025 at 9:19 pm

    [dead]

  • Post Author
    TheRealPomax
    Posted March 8, 2025 at 9:55 pm

    Oh look, an actual use case for AI. Very nice.

  • Post Author
    mac-mc
    Posted March 8, 2025 at 10:33 pm

    Now they need to do it for their own outputs to spot their own hallucination errors.

  • Post Author
    rosstex
    Posted March 8, 2025 at 10:41 pm

    Why not just skip the human and have AI write, evaluate and submit the papers?

  • Post Author
    YeGoblynQueenne
    Posted March 8, 2025 at 10:46 pm

    Needs more work.

    >> Right now, the YesNoError website contains many false positives, says Nick Brown, a researcher in scientific integrity at Linnaeus University. Among 40 papers flagged as having issues, he found 14 false positives (for example, the model stating that a figure referred to in the text did not appear in the paper, when it did). “The vast majority of the problems they’re finding appear to be writing issues,” and a lot of the detections are wrong, he says.

    >> Brown is wary that the effort will create a flood for the scientific community to clear up, as well as fuss about minor errors such as typos, many of which should be spotted during peer review (both projects largely look at papers in preprint repositories). Unless the technology drastically improves, “this is going to generate huge amounts of work for no obvious benefit”, says Brown. “It strikes me as extraordinarily naive.”

  • Post Author
    timoth3y
    Posted March 8, 2025 at 11:02 pm

    Perhaps this is a naive question from a non-academic, but why isn't deliberately falsifying data, or using AI tools or Photoshop to create images, career-ending?

    Wouldn't a more direct system be one in which journals refused submissions if one of the authors had committed deliberate fraud in a previous paper?

  • Post Author
    dbg31415
    Posted March 9, 2025 at 12:18 am

    Can we get it to fact check politicians and Facebook now?
