
I genuinely don’t understand why some people are still bullish about LLMs by ksec

32 Comments

  • Post Author
    encypherai
    Posted March 27, 2025 at 9:42 pm

    We've had the opposite experience, especially with o3-mini using Deep Research for market research & topic deep-dive tasks. The sources that are pulled have never been 404 for us, and typically have been highly relevant to the search prompt. It's been a huge time-saver. We are just scratching the surface of how good these LLMs will become at research tasks.

  • Post Author
    retrac
    Posted March 27, 2025 at 9:44 pm

    You're using them wrong. Everyone is, though, so I can't fault you specifically. A chatbot is just about the worst possible application of these technologies.

    Of late, deaf tech forums have been taken over by debates over which language model works best for speech transcription. (Multimodal language models are the state of the art in machine transcription. Everyone seems to forget that when complaining they can't cite sources for scientific papers yet.) The debates have reached the point where it's become annoying how much space the topic takes up, just like it has here on HN.

    But then I remember, oh yeah, there was no such thing as live machine transcription ten years ago. And now there is. And it's going to continue to get better. It's already good enough to be very useful in many situations. I have elsewhere complained about the faults of AI models for machine transcription – in particular, when they make mistakes they tend to hallucinate something superficially grammatical and coherent instead – but for the occasional phrase in an audio transcription that's sometimes tolerable. In many cases you still want a human transcriber, but the cost of that means the demand for transcription can never be fully met.

    It's a revolutionary technology. I think in a few years I'm going to have glasses that continuously narrate the sounds around me and transcribe speech, and it's going to be so good I can probably "pass" as a hearing person in some contexts. It's hard not to get a bit giddy and carried away sometimes.

  • Post Author
    MostlyStable
    Posted March 27, 2025 at 9:53 pm

    My experience (almost exclusively Claude) has just been so different that I don't know what to say. Some of her examples are the kinds of things I wouldn't expect LLMs to be particularly good at, so I wouldn't use them for those; for others, she says it just doesn't work for her, and that experience is so different from mine that I don't know how to respond.

    I think that there are two kinds of people who use AI: people who are looking for the ways in which AIs fail (of which there are still many) and people who are looking for the ways in which AIs succeed (of which there are also many).

    A lot of what I do is relatively simple one-off scripting. Code that doesn't need to deal with edge cases, won't be widely deployed, and whose outputs are very quickly and easily verifiable.

    LLMs are almost perfect for this. It's generally faster than looking up syntax/documentation myself, and when it's wrong it's easy to tell and correct.
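
    A made-up example of the category (file names hypothetical) – the output is trivially verifiable by eye:

        # Normalize image extensions in the current directory to lowercase.
        from pathlib import Path

        for p in Path(".").glob("*.JPG"):
            p.rename(p.with_suffix(".jpg"))

        # Easy to verify: this list should be empty afterwards.
        print(list(Path(".").glob("*.JPG")))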

    Look for the ways that AI works, and it can be a powerful tool. Try and figure out where it still fails, and you will see nothing but hype and hot air.
    Not every use case is like this, but there are many.

    -edit- Also, when she says "none of my students has ever invented references that just don't exist"…all I can say is "press X to doubt"

  • Post Author
    latemedium
    Posted March 27, 2025 at 9:53 pm

    My experience is starkly different. Today I used LLMs to:

    1. Write python code for a new type of loss function I was considering

    2. Perform lots of annoying CSV munging ("split this CSV into 4 equal parts", "convert paths in this column into absolute paths", "combine these and then split into 4 distinct subsets based on this field…" – they're great for that; a sketch is below)

    3. Expedite some basic shell operations like "generate softlinks for 100 randomly selected files in this directory"

    4. Generate some summary plots of the data in the files I was working with

    5. Not to mention extensive use in Cursor & GH Copilot

    The tool (Claude 3.7 mostly, integrated with my shell so it can execute shell commands and run Python locally) worked great in all cases. Yes, I could've done most of it myself, but I personally hate CSV munging and bulk file manipulations, and it's super nice to delegate that stuff to an LLM agent.
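
    A rough sketch of what the generated code for tasks 2 and 3 tends to look like (pandas assumed; file and directory names are hypothetical placeholders):

        import random
        from pathlib import Path

        import pandas as pd

        # Task 2: split a CSV into 4 roughly equal parts.
        df = pd.read_csv("data.csv")
        n = len(df)
        for i in range(4):
            df.iloc[i * n // 4:(i + 1) * n // 4].to_csv(f"data_part{i}.csv", index=False)

        # Task 3: create softlinks for 100 randomly selected files.
        files = random.sample([p for p in Path("source_dir").iterdir() if p.is_file()], 100)
        Path("links").mkdir(exist_ok=True)
        for f in files:
            (Path("links") / f.name).symlink_to(f.resolve())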

    edit: formatting

  • Post Author
    jongjong
    Posted March 27, 2025 at 9:58 pm

    People who don't work in tech have no idea how hard it is to do certain things at scale. Skilled tech people are severely underappreciated.

    From a sub-tweet:

    >> no LLM should ever output a url that gives a 404 error. How hard can it be?

    As a developer, I'm just imagining a server having to call up all the URLs to check that they still exist (and the extra cost/latency incurred there)… And if any URLs are dead, getting the AI to re-generate a different variant of the response until you find one that contains no missing links.
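
    To make the cost concrete, here's a naive sketch of what "no 404s, ever" would force the server to do (requests assumed; generate_response is a hypothetical stand-in for the model call):

        import re
        import requests

        def links_ok(text, timeout=5.0):
            # Every URL in a response costs a network round-trip to verify.
            for url in re.findall(r"https?://\S+", text):
                try:
                    r = requests.head(url, allow_redirects=True, timeout=timeout)
                    if r.status_code >= 400:
                        return False
                except requests.RequestException:
                    return False
            return True

        # ...and every failed check means paying for a full re-generation.
        prompt = "..."                          # the user's question
        response = generate_response(prompt)    # hypothetical LLM call
        while not links_ok(response):
            response = generate_response(prompt)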

    And no, you can't do it from the client side either… It would just be confusing if you removed invalid URLs from the middle of the AI's sentence without re-generating the sentence.

    You almost need to get the LLM to engineer/pre-process its own prompts in a way which guesses what the user is thinking in order to produce great responses…

    Worse than that, though… A fundamental problem of 'prompt engineering' is that people (especially non-tech people) often don't fully understand what they're asking for. Contradictions in requirements are extremely common. When building software especially, people often have only a vague idea of what they want… They strongly believe they have a perfectly clear idea, but once you scope out the feature in detail, mapping out complex UX interactions, all the necessary tradeoffs and limitations rise to the surface, and suddenly they realize they were asking for something they don't want.

    It's hard to understand your own needs precisely; even harder to communicate them.

  • Post Author
    airstrike
    Posted March 27, 2025 at 9:59 pm

    I wrote an AI assistant which generates working spreadsheets with formulas and working presentations with neatly laid out elements and styles. It's a huge productivity gain relative to starting from a blank page.

    I think LLMs work best when they are used as a "creative" tool. They're good for the brainstorming part of a task, not for the finishing touches.

    They are too unreliable to be put in front of your users. People don't want to talk to unpredictable chatbots. Yes, they can be useful in customer service chats because you can put them on rails and map natural language to predetermined actions. But generally speaking I think LLMs are most effective when used _by_ someone who's piloting them instead of wrapped in a service offered _to_ someone.

    I do think we've squeezed 90%+ of what we could from current models. Throwing more dollars of compute at training or inference won't make much difference. The next "GPT moment" will come from some sufficiently novel approach.

  • Post Author
    joegibbs
    Posted March 27, 2025 at 9:59 pm

    Because it’s not a scientific research tool; it’s a most-likely-next-text generator. It doesn’t keep a database of ingested information with source URLs. There are plenty of scientific research tools, but something that just outputs text based on your input is no good for that.

    I’m sure that in the future there will be a really good search tool that utilises an LLM, but for now a plain model just isn’t designed for that. There are a ton of other uses for them, so I don’t think that we should discount them entirely based on their ability to output citations.

  • Post Author
    GaggiX
    Posted March 27, 2025 at 10:05 pm

    I use them every day and they work great. I even made a command (using Claude – actually, Claude made everything in that script) that calls Gemini from the terminal, so I can ask shell-related questions directly there, just by typing: ai "how can I convert a webp to a png". The system prompt asks it to be brief, to use markdown (it displays nicely), notes that most questions are related to Linux, and provides information about my OS (uname -a). The last code block is also copied to the clipboard. Super useful – I imagine there are plenty of similar utilities online.
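
    A minimal sketch of such a utility (assuming the google-generativeai package and xclip; the model name and prompt wording are guesses at what the comment describes):

        import platform
        import re
        import subprocess
        import sys

        import google.generativeai as genai

        genai.configure(api_key="...")  # in practice, read from the environment
        model = genai.GenerativeModel("gemini-1.5-flash")  # model name is a guess

        question = " ".join(sys.argv[1:])
        system = ("Be brief and answer in markdown. Most questions are about "
                  f"Linux. OS info: {platform.uname()}")
        reply = model.generate_content([system, question]).text
        print(reply)

        # Copy the last code block to the clipboard, as described above.
        blocks = re.findall(r"```(?:\w*)\n(.*?)```", reply, re.DOTALL)
        if blocks:
            subprocess.run(["xclip", "-selection", "clipboard"],
                           input=blocks[-1].encode())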

  • Post Author
    saaaaaam
    Posted March 27, 2025 at 10:15 pm

    I’ve used Claude today to:

    Write code to pull down a significant amount of public data using an open API. (That took about 30 seconds – I just gave it the swagger file and said “here’s what I want”)

    Got the data (an hour or so), cleaned the data (barely any time – I gave it some samples, it wrote the code), used the cleaned data to query another API, combined the data sources, pulled down a bunch of PDFs relating to the data, had the AI write code to use tesseract to extract data from the PDFs, and used that to build a dashboard. That's a mini product for my users.
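
    The tesseract step, roughly (pdf2image and pytesseract assumed; paths are hypothetical):

        from pathlib import Path

        import pytesseract
        from pdf2image import convert_from_path

        pages_text = []
        for pdf in Path("pdfs").glob("*.pdf"):
            for page in convert_from_path(str(pdf)):  # one PIL image per page
                pages_text.append(pytesseract.image_to_string(page))
        Path("extracted.txt").write_text("\n".join(pages_text))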

    I also had a play with Mistral’s OCR and have tested a few things using that against the data. When I was out walking my dogs I thought about that more, and have come up with a nice workflow for a problem I had, which I’ll test in more detail next week.

    That was all while doing an entirely different series of tasks, on calls, in meetings. I literally checked the progress a few times and wrote a new prompt or copy/pasted some stuff in from dev tools.

    For the calls I was on, I took the recordings of those calls, passed them into my local instance of Whisper, fed the transcript into Claude with a prompt I use to extract action points, pasted those into a Google doc, and circulated them.
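
    That pipeline, sketched (openai-whisper assumed; the prompt is paraphrased and the Claude call is left hypothetical):

        import whisper

        model = whisper.load_model("base")  # model size is a guess
        transcript = model.transcribe("call_recording.mp3")["text"]

        prompt = ("Extract the action points from this call transcript, "
                  "grouped by owner:\n\n" + transcript)
        # send_to_claude(prompt)  # hypothetical: paste into Claude or use the API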

    One of the calls was an interview with an expert. The transcript + another prompt has given me the basis for an article (bulleted narrative + key quotes) – I will refine that tomorrow, and write the article, using a detailed prompt based on my own writing style and tone.

    I needed to gather data for a project I'm involved in, so I had Claude write a handful of scrapers for me (HTML source > "here is what I need"; a sketch below).
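
    The scrapers follow the usual requests + BeautifulSoup shape (the URL and selectors are hypothetical placeholders):

        import requests
        from bs4 import BeautifulSoup

        html = requests.get("https://example.org/listing", timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        records = [
            {"title": row.select_one("h2").get_text(strip=True),
             "link": row.select_one("a")["href"]}
            for row in soup.select("div.result")
        ]
        print(records)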

    I downloaded two podcasts I need to listen to – but only need to listen to five minutes of each – fed them into Whisper, then found the exact bits I needed and read the extracts rather than listening to tedious podcast waffle.

    I turned an article I’d written into an audio file using elevenlabs, as a test for something a client asked me about earlier this week.

    I achieved about three times as much today as I would have done a year ago. And finished work at 3pm.

    So yeah, I don’t understand why people are so bullish about LLMs. Who knows?

  • Post Author
    doctoboggan
    Posted March 27, 2025 at 10:20 pm

    Why do people who don't like using LLMs keep insisting they are useless for the rest of us? If you don't like to use them, then simply don't use them.

    I use them almost daily in my job and get tremendous use out of them. I guess you could accuse me of lying, but what do I stand to gain from that?

    I've also seen people claim that only people who don't know how to code, or people building super simple, done-a-million-times apps, can get value out of LLMs. I don't believe that applies to my situation, but even if it did, so what? I do real work for a real company delivering real value, and the LLM delivers value to me. It's really as simple as that.

  • Post Author
    harrall
    Posted March 27, 2025 at 10:21 pm

    I am neither bullish nor bearish. An LLM is a tool.

    It's a hammer — sometimes it works well. It summarizes the user reviews on a site… cool, not perfect, but useful.

    And like every tool, it is useless for 90% of life's situations.

    And I know when it's useful because I've already tried a hammer on 1000 things and have figured out what I should be using a hammer on.

  • Post Author
    belter
    Posted March 27, 2025 at 10:26 pm

    "Yes, I have tried Gemini, and actually it was even worse in that it frequently refuses to even search for a source and instead gives me instructions for how to do it myself. Stopped using it for that reason."

    Thank you Sabine. Every time I have mentioned that Gemini is the worst, and not even worthy of consideration, I have been bombarded with downvotes and told I am using it wrong.

  • Post Author
    throwawa14223
    Posted March 27, 2025 at 10:26 pm

    My experience mirrors hers. Asking questions is worthless because the answers are either 404 links, instructions on how to use a search engine, or just flat-out wrong. The code it generates compiles maybe one time out of ten, and when it does, the implementation is usually poor.

    When I evaluate LLMs against areas where I possess professional expertise, I become convinced they produce the Gell-Mann amnesia effect for any area I don't know.

  • Post Author
    islewis
    Posted March 27, 2025 at 10:27 pm

    > I genuinely don't understand why some people are still bullish about LLMs.

    I don't believe OP's thesis is properly backed by the rest of the tweet, which seems to boil down to "LLMs can't properly cite links".

    If LLMs performing poorly on an arbitrary, small-scoped test case makes you bearish on the whole field, I don't think that falls on the LLMs.

  • Post Author
    crazygringo
    Posted March 27, 2025 at 10:28 pm

    If there's one common thread across LLM criticisms, it's that they're not perfect.

    These critics don't seem to have learned the lesson that the perfect is the enemy of the good.

    I use ChatGPT all the time for academic research. Does it fabricate references? Absolutely, maybe about a third of the time. But has it pointed me to important research papers I might never have found otherwise? Absolutely.

    The rate of inaccuracies and falsehoods doesn't matter. What matters is whether it's saving you time and increasing your productivity. Verifying the accuracy of its statements is easy, while finding that knowledge in the first place is hard. The net balance is a huge positive.

    People are bullish on LLMs because they can save you days' worth of work, like every day. My research productivity has gone way up with ChatGPT — asking it to explain ideas, related concepts, relevant papers, and so forth. It's amazing.

  • Post Author
    simonw
    Posted March 27, 2025 at 10:31 pm

    The most interesting thing about this post is how it reinforces how terrible the usability of LLMs still is today:

    "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error. I Google for the alleged quote, it doesn't exist. They reference a scientific publication, I look it up, it doesn't exist."

    To experienced LLM users that's not surprising at all – providing citations, sources for quotes, and useful URLs are all things that they are demonstrably terrible at.

    But it's a computer! Telling people "this advanced computer system cannot reliably look up facts" goes against everything computers have been good at for the last 40+ years.

  • Post Author
    puppycodes
    Posted March 27, 2025 at 10:33 pm

    Here's one simple reason:

    I have a very specific, esoteric question like: "What material is both electrically conductive and good at blocking sound?" I could type this into Google and sift through the titles and short descriptions of websites and eventually maybe find an answer, or I can put the question to an LLM and instantly get an answer that I can then research further to confirm.

    This is significantly faster, more informative, more efficient, and a more rewarding experience.

    As others have said, it's a tool. A tool is only as good as how you use it. If you expect to build a house by yelling at your tools, I wouldn't be bullish either.

  • Post Author
    xiphias2
    Posted March 27, 2025 at 10:36 pm

    "By my personal estimate currently GPT 4o DeepResearch is the best one."

    If the o3-based, 3-month-old strongest model is the best one, that's proof that there were quite significant improvements in the last 2 years.

    I can't name any other technology that improved as much in 2 years.

    o1 and o1 pro helped me with filing tax returns and answered questions that (probably quite bad) tax accountants (and less smart models) weren't able to (of course I read the referenced laws; I don't trust the output either).

  • Post Author
    meowface
    Posted March 27, 2025 at 10:37 pm

    AI coding overall still seems to be underrated by the average developer.

    They try to add a new feature or change some behavior in a large existing codebase, it does something dumb, and they write it off as a waste of time for that use case. And that's understandable. But if they had tweaked the prompt just a bit, it might actually have done it flawlessly.

    It requires patience and learning the best way to guide it and iterate with it when it does something silly.

    Although you undoubtedly will lose some time re-attempting prompts and fixing mistakes and poor design choices, on net I believe the frontier models can currently make development much more productive in almost any codebase.

  • Post Author
    jwrallie
    Posted March 27, 2025 at 10:38 pm

    The author mentioned Gemini sometimes refusing to do something.

    I’ve recently been using Gemini (mostly 2.0 Flash) a lot, and I’ve noticed it will sometimes challenge me to try doing something by myself. Maybe it’s something in my system prompt or the way I worded the request itself. I am a long-time user of 4o, so it felt annoying at first.

    Since my purpose was to learn how to do something, I tried, open-mindedly, to comply with the request, and I can say that… it’s been a really great experience in terms of retention of knowledge. Even when I make mistakes, Gemini will point them out and explain them nicely.

  • Post Author
    BSOhealth
    Posted March 27, 2025 at 10:40 pm

    Like others here, I use it to code (no longer a professional engineer, but keep side projects).

    As soon as LLMs were introduced into the IDE, it began to feel like LLM autocomplete was almost reading my mind. With some context built up over a few hundred lines of initial architecture, autocomplete now sees around the same corners I am. It’s more than just “solve this contrived puzzle” or “write snake”. It combines the subject-matter use case (informed by variable and type naming) with the underlying architecture and sometimes produces really breathtaking and productive results. Like I said, it took some time, but when it happened, it was pretty shocking.

  • Post Author
    kriro
    Posted March 27, 2025 at 10:45 pm

    She's a scientist. In that area LLMs are quite useful in my opinion and part of my daily workflow: quick scripts that use APIs to get data, cleaning and converting the data, quickly working with Polars data frames, dumb daily stuff like "take this data from my CSV file and turn it into a LaTeX table"… but most importantly, freeing up time from tedious administrative tasks (let's not go into detail here).

    Also great for brainstorming and quickly drafting grant proposals. For anything prototyped or quickly glued together, I'll go for LLMs (or LLM agents). They are no substitute for your own brain though.

    I'm also curious about the hallucinated sources. I've recently read some papers on using LLM agents to conduct structured literature reviews, and they do it quite well and fairly reproducibly. I'm quite willing to build some LLM agents to reproduce my literature-review process in the near future, since it's fairly algorithmic: check for surveys and reviews on the topic, scan them for interesting papers, check sources of sources, go through A-tier conference proceedings for the last X years and find relevant papers. Rinse, repeat.

    I'm mostly bullish because of LLM agents, not because of using stock models with the default chat interface.

  • Post Author
    marcuschong
    Posted March 27, 2025 at 10:50 pm

    The key to LLM productivity, it seems to me, is grounding. Let me give you my latest example, from something I've been working on.

    I just updated my company's commercial PPT. ChatGPT helped me:
    – Deep Research great examples and references for such presentations.
    – Restructure my argument and slides according to some articles I found in the previous step and thought were pretty good.
    – Come up with copy for each slide.
    – Iterate on new ideas as I progressed.

    Now, without proper context and grounding, LLMs wouldn't be so helpful at this task, because they don't know my company, clients, product, and strategy, and would be generic at best. The key: I provided it with my support-portal documentation and a brain dump with key strategic information about my company that I recorded to text on ChatGPT. Those are two bits of info I always keep around, so ChatGPT can help me with many tasks in the company.

    From that grounding to the final PPT, it's pretty much a trivial and boring transformation task that would otherwise have cost me many, many hours.

  • Post Author
    mcv
    Posted March 27, 2025 at 11:15 pm

    I've been saying the same thing, though in less detail. AI is so driven by hype at the moment that it's unavoidable that it's going to collapse at some point. I'm not saying the current crop of AI is useless; there are plenty of useful applications, but it's clear lots of people expect more from it than it's capable of, and everybody is investing in it just because everybody else is.

    But even if it does work, you still need to double-check everything it does.

    Anyway, my RPG group is going to try roleplaying with AI generated content (not yet as GM). We'll see how it goes.

  • Post Author
    nipponese
    Posted March 27, 2025 at 11:26 pm

    ChatGPT has a share button. This is like a bug report with no repro steps.

    If people aren't linking the conversation, it's really hard to take the complaint seriously.

  • Post Author
    prennert
    Posted March 27, 2025 at 11:29 pm

    I have been using Claude this week for the first time on a _slightly_ bigger SwiftUI project than the few lines of bash or SQL I'd used it for before. I have never used Swift before, but I am amazed how much Claude could do. It feels to me like we are at the point where anyone can generate small tools for themselves with low effort. Maybe not production-ready, but good enough for personal use. It feels like it should be good enough to empower the average user to break out of having to rely on pre-made apps for small things. Kind of like bash for the average Joe.

    What worked:

    – generated a mostly working PoC with minimal input, hallucinating the UI layout, color scheme, etc. This is amazing because it did not bombard me with detailed questions; it just carried on and provided me with a baseline that I could then fine-tune

    – it corrected build issues when I simply copy-pasted the errors from Xcode
    – got APIs working
    – added debug code when it could not fix an issue after a few rounds

    – resolved an API issue after I pointed it to a TypeScript SDK for the API (I literally gave it a link to the file and told it: try to use this to work out where the problem is)
    – it produces code very fast

    What is not working great yet:

    – it started off with one large file and soon crashed because it hit a timeout when regenerating the file. I needed to ask it to split the file up into a typical project structure

    – some logic I asked it to implement explicitly got changed at some point during an unrelated task. To prevent this in the future, I asked it to mark this code part as important and to change it only on explicit request. I don’t know yet how long this code will stay protected

    – by the time enough context got built up, usage warnings popped up in Claude

    – only so many files are supported atm

    So my takeaway is that it is very good at translating, i.e. API docs into code, errors into fixes. There is also a fine line between providing enough context and running out of tokens.

    I am planning to continue my project to see how far I can push it. As I am getting close to the token limit now, I am thinking of structuring my app in a Claude-friendly way:

    – clear internal APIs, kind of like header files, so that I can tell Claude what functions it can use without allowing it to change them or needing to tokenize the full source code (a sketch of this follows the list)

    – adversarial testing. I don’t have tests yet, but I am thinking of asking one dedicated instance of Claude to generate tests. I will use other Claude instances for coding and provide them with failing test outputs like I do now with build errors. I hope it will fix itself similarly.
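
    A sketch of the first idea in Python terms, since that's easier to show here (the actual project is Swift, where a protocol would play this role; names are hypothetical). Claude gets only the stub, so it can call the API without being able to change it:

        from typing import Protocol

        # Claude sees only this interface, never the implementation behind it.
        class ClipStore(Protocol):
            def save_clip(self, name: str, data: bytes) -> None: ...
            def list_clips(self) -> list[str]: ...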

  • Post Author
    bluebarbet
    Posted March 27, 2025 at 11:48 pm

    Perhaps attitudes to this new phenomenon are correlated with propensity to skepticism in general.

    I will cite myself as Exhibit A. I am the sort of person who takes almost nothing at face value. To me, physiotherapy, and oenology, and musicology, and bed marketing, and mineral-water benefits, and very many other such things, are all obviously pseudoscience, worthy of no more attention than horoscopes. If I saw a ghost I would assume it was a hallucination caused by something I ate.

    So it seems like no coincidence that I reflexively ignore the AI babble at the top of search results. After all, an LLM is a language-rehashing machine which (as we all know by now) does not understand facts. That's terribly relevant.

    I remember reading, a couple of years back, about some Very Serious Person (i.e. a credible voice, I believe some kind of scientist) who, after a three-hour conversation with ChatGPT, had become convinced that the thing was conscious. Rarely have I rolled my eyes so hard. It occurred to me then that skepticism must be (even) less common a mindset than I assumed.

  • Post Author
    gilbetron
    Posted March 28, 2025 at 12:20 am

    I get so confused on this. I play around, test, and mess with LLMs all the time and they are miraculous. Just amazing, doing things we dreamed about for decades. I mean, I can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out. It talks to me like a person. It generates really cool images. It helps me write code. And just tons of other stuff that astounds me.

    And people just sit around, unimpressed, and complain that … what … it isn't a perfect superintelligence that understands everything perfectly? This is the most amazing technology I've experienced as a 50+ year old nerd that has been sitting deep in tech for basically my whole life. This is the stuff of science fiction, and while there totally are limitations, the speed at which it is progressing is insane. And people are like, "Wah, it can't write code like a Senior engineer with 20 years of experience!"

    Crazy.

  • Post Author
    slackfan
    Posted March 28, 2025 at 12:26 am

    LLMs are like any tool, you get what you put in. If you are frustrated with the results, maybe you need to think about what you're doing.

    300/5290 functions decompiled and analyzed in less than three hours from a huge codebase. By next weekend, a binary whose source code had been lost will have tests running on a platform it wasn't designed for.

  • Post Author
    rglover
    Posted March 28, 2025 at 12:28 am

    Never underestimate the momentum of a poorly understood idea that appears as magic to the average person. Once the money starts flowing, that momentum will increase until the idea hits a brick wall that its creators can't gaslight people about.

    I hope that realization happens before "vibe coding" is accepted as standard practice by software teams (especially when you consider the poor quality of software before the LLM era). If not, it's only a matter of time before we refer to the internet as "something we used to enjoy."

  • Post Author
    mkoubaa
    Posted March 28, 2025 at 12:33 am

    I'm just glad I have something that can write logic in any unix shell script for me. It's the right combination of thing I don't have to do often, thing that doesn't fit my mental model, thing that works differently based on the platform, and thing I just can't be bothered to master.
