
Deep Learning Is Not So Mysterious or Different by wuubuu

10 Comments

  • Post Author
    cgdl
    Posted March 17, 2025 at 5:20 pm

Agreed, but PAC-Bayes and other descendants of VC theory are probably not the best explanation. The notion of algorithmic stability provides a (much) more compelling one. See [1] (particularly Sections 11 and 12).

    [1] https://arxiv.org/abs/2203.10036

  • Post Author
    uoaei
    Posted March 17, 2025 at 5:31 pm

    [flagged]

  • Post Author
    TechDebtDevin
    Posted March 17, 2025 at 5:41 pm

    Anyone who wants to demystify ML should read: The StatQuest Illustrated Guide to Machine Learning [0] By Josh Starmer.

To this day I haven't found a teacher who can express complex ideas as clearly and concisely as Starmer does. It's written in an almost children's-book-like format that is very easy to read and understand. He also just published a book on neural networks that is just as good. Highly recommended even if you are already an expert, as it will give you great ways to teach and communicate complex ideas in ML.

    [0]: https://www.goodreads.com/book/show/75622146-the-statquest-i…

  • Post Author
    getnormality
    Posted March 17, 2025 at 5:49 pm

    > rather than restricting the hypothesis space to avoid overfitting, embrace a flexible hypothesis space, with a soft preference for simpler solutions that are consistent with the data. This principle can be encoded in many model classes, and thus deep learning is not as mysterious or different from other model classes as it might seem.

    How does deep learning do this? The last time I was deeply involved in machine learning, we used a penalized likelihood approach. To find a good model for data, you would optimize a cost function over model space, and the cost function was the sum of two terms: one quantifying the difference between model predictions and data, and the other quantifying the model's complexity. This framework encodes exactly a "soft preference for simpler solutions that are consistent with the data", but is that how deep learning works? I had the impression that the way complexity is penalized in deep learning was more complex, less straightforward.
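The penalized-likelihood setup described here can be made concrete with a small sketch. This is a hypothetical illustration (not from the article): the cost is a data-misfit term plus a complexity term, here the classic L2 (ridge) penalty, which in deep learning typically shows up as "weight decay" applied during gradient descent rather than as a closed-form solve.

```python
import numpy as np

# Hypothetical illustration of the penalized-likelihood idea:
#   cost(w) = ||Xw - y||^2  +  lam * ||w||^2
# i.e. data-misfit term + complexity term.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=50)

lam = 1.0  # strength of the "soft preference for simpler solutions"

# Closed-form ridge solution: argmin_w ||Xw - y||^2 + lam * ||w||^2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Deep learning usually applies the same penalty as "weight decay"
# folded into each gradient step instead of solving in closed form.
print(w_ridge)
```

The penalty shrinks the weights toward zero relative to the unpenalized fit, which is one way of encoding the "soft preference for simpler solutions" the article describes.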

  • Post Author
    inciampati
    Posted March 17, 2025 at 5:57 pm

An interesting example in which "deep" networks are necessary is discussed in this fascinating and popular recent paper on RNNs [1]. Although the minGRU and minLSTM models they propose don't explicitly model ordered state dependencies, they can learn them as long as the network is deep enough (depth >= 3):

    > Instead of explicitly modelling dependencies on previous states to capture long-range dependencies, these kinds of recurrent models can learn them by stacking multiple layers.

    [1] https://arxiv.org/abs/2410.01201
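For intuition, here is a minimal, hypothetical sketch of the minGRU recurrence as [1] describes it: because neither the gate nor the candidate state depends on the previous hidden state, each layer alone cannot express ordered state dependencies, and stacking layers is what recovers them. The weight shapes and depth-3 stacking here are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of the minGRU recurrence from [1]:
#   z_t   = sigmoid(W_z @ x_t)   (update gate; no dependence on h_{t-1})
#   h~_t  = W_h @ x_t            (candidate state; no dependence on h_{t-1})
#   h_t   = (1 - z_t) * h_{t-1} + z_t * h~_t
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def min_gru_layer(xs, W_z, W_h, h0):
    h, out = h0, []
    for x in xs:  # written sequentially for clarity; the paper scans in parallel
        z = sigmoid(W_z @ x)
        h = (1 - z) * h + z * (W_h @ x)
        out.append(h)
    return np.stack(out)

rng = np.random.default_rng(0)
d = 4
hs = rng.normal(size=(10, d))  # a toy length-10 input sequence
for _ in range(3):  # depth >= 3, as the comment notes
    hs = min_gru_layer(hs, rng.normal(size=(d, d)),
                       rng.normal(size=(d, d)), np.zeros(d))
print(hs.shape)
```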

  • Post Author
    YesBox
    Posted March 17, 2025 at 6:42 pm

    I wish I had the time to try this:

    1.) Grab many GBs of text (books, etc).

    2.) For each word, for each next $N words, store distance from current word, and increment count for word pair/distance.

    3.) For each word, store most frequent word for each $N distance. [a]

4.) Create a prediction algorithm that determines the next word (or set of words) to output from any user input. Basically this would compare word pair/distance counts and find the most probable next set of word(s).

    How close would this be to GPT 2?

    [a] You could go one step further and store multiple words for each distance, ordered by frequency
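The steps above can be sketched in a few lines. This is a hypothetical toy version (the tiny corpus and `N = 3` window are made up for illustration): count (word, distance, next-word) triples, then predict by picking the most frequent follower.

```python
from collections import defaultdict

# Toy version of steps 1-4 above: count (word, distance, next_word)
# triples over a corpus, then predict the most frequent follower.
N = 3
counts = defaultdict(int)  # (word, distance, next_word) -> count

text = "the cat sat on the mat the cat ran".split()  # stand-in for GBs of text
for i, w in enumerate(text):
    for d in range(1, N + 1):
        if i + d < len(text):
            counts[(w, d, text[i + d])] += 1

def predict_next(word):
    # Most frequent word at distance 1; None if the word was never seen.
    candidates = {nxt: c for (w, d, nxt), c in counts.items()
                  if w == word and d == 1}
    return max(candidates, key=candidates.get) if candidates else None

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once
```

As to the question: this captures only fixed-distance co-occurrence statistics, whereas GPT-2's attention conditions on the entire preceding context, so a model like this would fall well short even of GPT-2.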

  • Post Author
    EncomLab
    Posted March 17, 2025 at 6:44 pm

    The implication that any software is "mysterious" is problematic – there is no "woo" here – the exact state of the machine running the software may be determined at every cycle. The exact instruction and the data it executed with may be precisely determined, as can the next instruction. The entire mythos of any software being a "black box" is just so much advertising jargon, perpetuated by tech bros who want to believe they are part of some Mr. Robot self-styled priestly class.

  • Post Author
    rottc0dd
    Posted March 17, 2025 at 6:49 pm

If anyone wants to delve into machine learning, one of the superb resources I have found is Stanford's "Probability for Computer Scientists" (https://www.youtube.com/watch?v=2MuDZIAzBMY&list=PLoROMvodv4…).

It delves into the theoretical underpinnings of probability theory and ML, IMO better than any other course I have seen. (Yeah, Andrew Ng is legendary, but his course demands some mathematical familiarity with linear algebra topics.)

    And of course, for deep learning, 3b1b is great for getting some visual introduction (https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQ…).

  • Post Author
    buffalobuffalo
    Posted March 17, 2025 at 7:26 pm

    When I was first getting into Deep Learning, learning the proof of the universal approximation theorem helped a lot. Once you understand why neural networks are able to approximate functions, it makes everything built on top of them much easier to understand.
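A small concrete instance of that intuition (my own illustration, not from the proof itself): a single hidden layer of ReLUs builds piecewise-linear functions, and just two hidden units already reproduce f(x) = |x| exactly, since |x| = relu(x) + relu(-x).

```python
import numpy as np

# Two ReLU hidden units reproduce f(x) = |x| exactly:
#   |x| = relu(x) + relu(-x)
def relu(x):
    return np.maximum(x, 0.0)

W1 = np.array([[1.0], [-1.0]])  # hidden-layer weights
w2 = np.array([1.0, 1.0])       # output-layer weights

def net(x):
    return w2 @ relu(W1 @ np.array([x]))

xs = np.linspace(-2, 2, 9)
print(max(abs(net(x) - abs(x)) for x in xs))  # exact up to float precision
```

The universal approximation theorem generalizes this picture: with enough hidden units, such piecewise-linear (or sigmoidal) building blocks can approximate any continuous function on a compact set.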

  • Post Author
    talles
    Posted March 17, 2025 at 7:29 pm

Correct me if I'm wrong, but an artificial neuron is just good old linear regression followed by an activation function to make it non-linear. Make a network out of them and cool stuff happens.
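That's essentially right, and it fits in a few lines. The weights and inputs below are made-up numbers for illustration: with the identity activation this is exactly a linear-regression prediction; swapping in tanh/sigmoid/ReLU is what makes it non-linear.

```python
import numpy as np

# A single artificial neuron: a linear model (w @ x + b)
# passed through a nonlinearity.
def neuron(x, w, b, activation=np.tanh):
    return activation(w @ x + b)

x = np.array([0.5, -1.0, 2.0])   # inputs (made up)
w = np.array([0.1, 0.4, -0.2])   # weights (made up)
b = 0.05                          # bias

# Identity activation -> exactly linear regression's prediction.
print(neuron(x, w, b, activation=lambda t: t))  # w @ x + b = -0.70
print(neuron(x, w, b))                          # tanh squashes it
```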
