
Are polynomial features the root of all evil? (2024) by Areibman

14 Comments

  • creata
    Posted April 22, 2025 at 5:05 pm

    A well-known related paper that I didn't see mentioned in the article (although Trefethen was mentioned) is "Six Myths of Polynomial Interpolation and Quadrature".

    https://people.maths.ox.ac.uk/trefethen/mythspaper.pdf

  • ComplexSystems
    Posted April 22, 2025 at 5:24 pm

    Great article! Very curious how this orthogonalization + regularization idea could be extended to other kinds of series, such as Fourier series.
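
    For what it's worth, a minimal sketch of what that extension might look like, assuming numpy (the frequencies, target function, and ridge penalty below are arbitrary illustrative choices, not from the article):

        import numpy as np

        def fourier_features(x, n_freq):
            # Design matrix [1, cos(2*pi*k*x), sin(2*pi*k*x)] for k = 1..n_freq, x in [0, 1]
            cols = [np.ones_like(x)]
            for k in range(1, n_freq + 1):
                cols.append(np.cos(2 * np.pi * k * x))
                cols.append(np.sin(2 * np.pi * k * x))
            return np.column_stack(cols)

        rng = np.random.default_rng(0)
        x = rng.uniform(0, 1, 200)
        y = np.exp(np.sin(4 * np.pi * x)) + 0.1 * rng.normal(size=x.size)

        X = fourier_features(x, n_freq=50)    # deliberately over-parameterized (101 columns)
        lam = 1e-3                            # ridge penalty, playing the role of the article's regularization
        w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    The basis is already orthogonal on [0, 1], so the same "orthogonal basis + regularization" recipe carries over directly.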

  • SkyBelow
    Posted April 22, 2025 at 5:32 pm

    One thought I had while looking into regression recently is to consider the model created with a given regularization coefficient not as a line on a 2-dimensional graph, but as a slice of a surface in a 3-dimensional graph where the third dimension is the regularization coefficient.

    In my case the model was logistic regression and the lines were the classification boundaries, but the thought is largely the same: view it as a 3D shape formed by the boundary lines, and treat the hilltops as areas where entire classification boundaries disappear once the regularization coefficient grows large enough to eliminate them. Impractical to do on models of any size, and only useful when looking at two features at a time, but a fun consideration.

    More on topic with the article: how well does this work when considering multiple features and the different combinations of them? Instead of sigma(n => 50) of x^n, what happens if you have sigma(n => 50) of sigma(m => 50) of (x^n)*(y^m)? Probably less than 50 in the second example; maybe it is fair to have n and m go up to 7 so there are 49 total terms, comparable to the original 50, instead of the 2500 terms you would get if they both went to 50.
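
    A rough sketch of that feature blow-up for two variables, assuming numpy (the caps 50 and 7 mirror the numbers above; everything else is illustrative):

        import numpy as np
        from itertools import product

        def poly2_features(x, y, max_n, max_m):
            # All cross terms x^n * y^m for 0 <= n <= max_n, 0 <= m <= max_m
            return np.column_stack([x**n * y**m
                                    for n, m in product(range(max_n + 1), range(max_m + 1))])

        rng = np.random.default_rng(0)
        x, y = rng.uniform(0, 1, (2, 100))

        full = poly2_features(x, y, 50, 50)   # 51*51 = 2601 columns: the combinatorial blow-up
        small = poly2_features(x, y, 7, 7)    # 8*8 = 64 columns, roughly comparable to degree 50 in 1D
        print(full.shape, small.shape)        # (100, 2601) (100, 64)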

  • FabHK
    Posted April 22, 2025 at 5:38 pm

    This article is much better, more informative, and more factual than I'd have expected from the title. Note that it's part of an 8-article series.

    Worth a read if you're ever fitting functions.

  • xg15
    Posted April 22, 2025 at 6:01 pm

    Great article and clever use of linkbait!

  • PaulHoule
    Posted April 22, 2025 at 6:25 pm

    See also https://en.wikipedia.org/wiki/Chebyshev_polynomials

    They're the kind of math that is non-obvious and a bit intricate, yet if you knew the basics of how they worked and you were bored, you could sit down with pencil and paper and derive everything about them. Then you wouldn't be bored anymore.
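
    A minimal sketch of that pencil-and-paper exercise, assuming numpy only for vectorized evaluation (the recurrence T_0 = 1, T_1 = x, T_{n+1} = 2x*T_n - T_{n-1} is all you need):

        import numpy as np

        def chebyshev_T(n, x):
            # Evaluate T_0..T_n at points x via the recurrence; returns shape (n + 1, len(x))
            x = np.asarray(x, dtype=float)
            T = np.empty((n + 1, x.size))
            T[0] = 1.0
            if n >= 1:
                T[1] = x
            for k in range(1, n):
                T[k + 1] = 2 * x * T[k] - T[k - 1]
            return T

        x = np.linspace(-1, 1, 5)
        T = chebyshev_T(4, x)
        # Sanity check against the identity T_n(cos(theta)) = cos(n*theta)
        print(np.allclose(T[3], np.cos(3 * np.arccos(x))))   # True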

  • tc4v
    Posted April 22, 2025 at 6:38 pm

    Good article, but I am very bothered by the "standard basis"… it's called the canonical basis in math. I don't think "standard" is the right name in any context.

  • petters
    Posted April 22, 2025 at 6:48 pm

    This part about double descent is really good: https://alexshtf.github.io/2025/03/27/Free-Poly.html#fnref:2

  • ForceBru
    Posted April 22, 2025 at 7:01 pm

    In another post (https://alexshtf.github.io/2025/03/27/Free-Poly.html) the author fits a degree-10000 (ten thousand!) polynomial using the Legendre basis. The polynomial _doesn't overfit_, demonstrating double descent. "What happened to our overfitting from ML 101 textbooks? There is no regularization. No control of the degree. But “magically” our high degree polynomial is not that bad!"

    So… are _all_ introductions to machine learning just extremely wrong here? I feel like I've seen tens of reputable books and courses that introduce overfitting and generalization using severe overfitting and terrible generalization of high-degree polynomials in the usual basis (1,x,x^2,…). Seemingly everyone warns of the dangers of high-degree polynomials, yet here the author just says "use another basis" and proves everyone wrong? Mind blown, or is there a catch?
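
    Roughly what that might look like with numpy's Legendre tools; the degree is lowered here just to keep the sketch quick, and whether this matches the post's exact setup is an assumption:

        import numpy as np
        from numpy.polynomial import legendre

        rng = np.random.default_rng(0)
        x = rng.uniform(-1, 1, 100)
        y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

        degree = 1000                                 # far more parameters than the 100 data points
        V = legendre.legvander(x, degree)             # Legendre design matrix, shape (100, 1001)
        coef, *_ = np.linalg.lstsq(V, y, rcond=None)  # lstsq returns the minimum-norm solution

        x_test = np.linspace(-1, 1, 500)
        y_hat = legendre.legvander(x_test, degree) @ coef   # evaluate the fit on a dense grid

    The minimum-norm solution in an orthogonal basis is what keeps the over-parameterized fit well-behaved, which is the double-descent effect the post demonstrates.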

  • programjames
    Posted April 22, 2025 at 7:18 pm

    This is why I believe a numerical methods course should be a requirement for any AI majors.

  • fancyfredbot
    Posted April 22, 2025 at 7:53 pm

    Wanted to add my voice to the chorus of appreciation for this article (actually a series of 8). Very informative and engaging.

  • ziofill
    Posted April 22, 2025 at 8:02 pm

    In this case wouldn't a Fourier-type approach work better? At least there's no risk that the function blows up, and it would possibly need fewer parameters?

  • constantcrying
    Posted April 22, 2025 at 8:31 pm

    I completely disagree with the conclusion of the article. The reason the examples worked so well is an arbitrary choice that went completely unmentioned.

    The interval was chosen as 0 to 1. This single fact is what made this feasible. Had the interval been chosen as 0 to 10, a degree-100 polynomial would have to compute 10^100, which would have led to drastic numerical errors.

    The article totally fails to give any of the totally legitimate and very important reasons why high-degree polynomials are dangerous. It is absurd to say that well-known numerical problems do not exist just because you found one example where they did not occur.
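
    A minimal sketch of the scaling issue, assuming numpy (the degree and grid are illustrative):

        import numpy as np

        x = np.linspace(0, 10, 50)
        degree = 100

        raw = np.vander(x, degree + 1, increasing=True)          # monomials on [0, 10]
        scaled = np.vander(x / 10, degree + 1, increasing=True)  # same data rescaled to [0, 1]

        print(raw.max())     # ~1e100: one column dwarfs all others, so fitting is numerically hopeless
        print(scaled.max())  # 1.0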

  • constantcrying
    Posted April 22, 2025 at 8:35 pm

    It should also be noted that this kind of fitting is very closely related to integration, or in other words to calculating the mean.

    By using the "right" kind of polynomial basis you can get a polynomial approximation which also tells you the mean and variance of the function under a random variable.
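
    A minimal sketch of that connection, assuming numpy and the Legendre basis on [-1, 1] (the test function is arbitrary): for X ~ Uniform(-1, 1) and f ≈ sum of c_k*P_k, we have E[f(X)] = c_0 and Var[f(X)] = sum over k >= 1 of c_k^2 / (2k + 1).

        import numpy as np
        from numpy.polynomial import legendre

        f = lambda x: np.exp(np.sin(3 * x))            # arbitrary test function

        x = np.linspace(-1, 1, 2000)
        c = legendre.legfit(x, f(x), deg=30)           # Legendre-basis approximation of f

        k = np.arange(1, c.size)
        mean_est = c[0]                                # E[f(X)] for X ~ Uniform(-1, 1)
        var_est = np.sum(c[1:] ** 2 / (2 * k + 1))     # Var[f(X)]

        samples = f(np.random.default_rng(0).uniform(-1, 1, 1_000_000))
        print(mean_est, samples.mean())                # the two pairs should agree closely
        print(var_est, samples.var())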
