
TL;DR of Deep Dive into LLMs Like ChatGPT by Andrej Karpathy by oleg_tarasov

10 Comments

  • Post Author
    dzogchen
    Posted February 10, 2025 at 7:46 am

    For a model to be ‘fully’ open source you need more than the model itself and a way to run it. You also need the data and the program that can be used to train it.

    See The Open Source AI Definition from OSI: https://opensource.org/ai

  • Post Author
    bluelightning2k
    Posted February 10, 2025 at 8:25 am

    Great write up of what is presumably a truly great lecture. Debating trying to follow the original now.

  • Post Author
    albert_e
    Posted February 10, 2025 at 8:38 am

    OT: What is a good place to discuss the original video — once it has dropped out of the HN front-page?

    I am going through the video myself — roughly halfway through — and have a few things to bring up.

    Here they are now that we have a fresh opportunity to discuss:

    1 – MATH and LLMs

    I am curious why many of the examples Andrej chose to pose to the LLM were "computational" questions — for instance "what is 2+2" or some numerical puzzles that needed algebraic thinking and then some addition/subtraction/multiplication (example at 1:50 mins about buying Apples and Oranges).

    I can understand that these abilities of LLMs are becoming powerful and useful too — but in my mind these are not the "basic" abilities of a next-token predictor.

    I would have appreciated a clearer distinction for prompts that showcase the core LLM ability — generating text that is generally grammatically correct and grounded in facts and context — without necessarily needing a working memory, the assignment of values to algebraic variables, arithmetic, etc.

    I would welcome any good references to discussion of the mathematical abilities of LLMs and the wisdom of trying to make them do math — versus simply recognizing when math is needed, generating the necessary Python/expressions, and letting the tools handle it.
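    A minimal sketch of that tool-use pattern, assuming a hypothetical convention in which the model wraps arithmetic in `<calc>` tags for an external evaluator (the tag name and helper functions are invented for illustration, not any real API):

    ```python
    import ast
    import operator

    # Safe evaluator for the simple arithmetic expressions the model might emit.
    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def eval_expr(node):
        if isinstance(node, ast.Expression):
            return eval_expr(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](eval_expr(node.left), eval_expr(node.right))
        raise ValueError("unsupported expression")

    def answer_with_tool(model_output: str) -> str:
        # Hypothetical convention: the model emits the *expression* inside
        # <calc>...</calc> instead of predicting the result digits token by token.
        if "<calc>" in model_output:
            expr = model_output.split("<calc>")[1].split("</calc>")[0]
            result = eval_expr(ast.parse(expr, mode="eval"))
            return model_output.replace(f"<calc>{expr}</calc>", str(result))
        return model_output

    print(answer_with_tool("3 apples at $2 and 2 oranges at $3 cost $<calc>3*2 + 2*3</calc>."))
    # -> 3 apples at $2 and 2 oranges at $3 cost $12.
    ```

    The point is that the model only has to *recognize* that arithmetic is needed and produce the expression; the deterministic tool does the computation.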

    2 – META

    While Andrej briefly acknowledges the "meta" situation, where LLMs are used to create training data for, and to judge the outputs of, newer LLMs, there is not much discussion of it here.

    There are just many more examples of how LLMs are used to mitigate hallucinations, e.g. by preparing Q&A training sets with "correct" answers.

    I am curious to know more about the limitations / perils of using LLMs to train/evaluate other LLMs.

    I kind of feel that this is a bit like the Manhattan project and atomic weapons — in that early results and advances are being looped back immediately into the development of more powerful technology. (A smaller fission charge at the core of a larger fusion weapon — to be very loose with analogies)

    <I am sure I will have a few more questions as I go through the rest of the video and digest it>

  • Post Author
    est
    Posted February 10, 2025 at 9:11 am

    I have read many articles about LLMs, and understand how they work in general, but one thing always bothers me: why didn't other models work as well as the SOTA ones? What's the history and reasoning behind the current model architecture?

  • Post Author
    khazhoux
    Posted February 10, 2025 at 9:12 am

    I'm still seeking an answer to what DeepSeek really is, especially in the context of their $5M versus ChatGPT's >$1B (source: the internet). What did they do versus not do?

  • Post Author
    miletus
    Posted February 10, 2025 at 9:37 am
  • Post Author
    EncomLab
    Posted February 10, 2025 at 11:39 am

    It would be great if the hardware issues were discussed more – too little is made of the distinction between silicon-substrate, fixed-threshold, voltage-moderated, brittle networks of solid-state switches and protein-substrate, variable-threshold, chemically moderated, plastic networks of biological switches.

    To be clear, neither possesses any magical "woo" outside of physics that gives one or the other some secret magical properties – but these are not arbitrary meaningless distinctions in the way they are often discussed.

  • Post Author
    thomasahle
    Posted February 10, 2025 at 11:52 am

    I find Meta’s approach to hallucinations delightfully counterintuitive. Basically they (and presumably OpenAI and others):

       - Extract a snippet of training data.
       - Generate a factual question about it using Llama 3.
       - Have Llama 3 generate an answer.
       - Score the response against the original data.
       - If incorrect, train the model to recognize and refuse incorrect responses.
    

    In a way this is obvious in hindsight, but it goes against an ML engineer's natural tendency when detecting a wrong answer: teaching the model the right answer.

    Instead of teaching the model to recognize what it doesn't know, why not teach it the right answers using those same examples? Of course, the idea is to "connect the unused uncertainty neuron", which makes sense for out-of-context generalization. But we can at least appreciate why this wasn't an obvious thing to do for generation-1 LLMs.

  • Post Author
    sylware
    Posted February 10, 2025 at 12:00 pm

    It is sad to see that much attention given to LLMs in comparison to the other types of AIs, like those doing math (strapped to a formal solver), folding proteins, etc.

    We had a talk about those physics AIs using those math AIs to design rigorous mathematical models to fit fundamental-physics data.

  • Post Author
    wolfhumble
    Posted February 10, 2025 at 12:14 pm

    I haven't watched the video, but was wondering about the Tokenization part from the TL;DR:

    "|" "View" "ing" "Single"

    Just looking at the text being tokenized in the linked article, it looked (to me) like the text was "I View", but the "I" is actually a pipe "|".

    From Step 3 in the link that @miletus posted in the Hacker News comment: https://x.com/0xmetaschool/status/1888873667624661455 the text that is being tokenized is:

    |Viewing Single (Post From) . . .

    The capitals used (View, Single) also make more sense when seeing this part of the sentence.
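    A toy greedy longest-match tokenizer (not the actual GPT tokenizer, which uses learned BPE merges; the vocabulary below is invented) shows how "|Viewing Single" can come apart at exactly those boundaries:

    ```python
    def greedy_tokenize(text: str, vocab: set) -> list:
        """Split text into the longest vocabulary entries, left to right.
        Real BPE tokenizers work by merge rules, but the subword effect is similar."""
        tokens = []
        i = 0
        while i < len(text):
            for j in range(len(text), i, -1):
                if text[i:j] in vocab:
                    tokens.append(text[i:j])
                    i = j
                    break
            else:
                tokens.append(text[i])  # fall back to a single character
                i += 1
        return tokens

    # Hypothetical vocabulary: common subwords, but not the whole word "Viewing".
    vocab = {"|", "View", "ing", " Single"}
    print(greedy_tokenize("|Viewing Single", vocab))
    # -> ['|', 'View', 'ing', ' Single']
    ```

    Because "Viewing" is not in the vocabulary but "View" and "ing" are, the word splits into two tokens — which is also why the pipe "|" is easy to misread as an "I" when the tokens are displayed side by side.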


© 2025 HackTech.info. All Rights Reserved.
