Polars Cloud: The Distributed Cloud Architecture to Run Polars Anywhere by neilfrndes

17 Comments

  • Post Author
    LaurensBER
    Posted March 7, 2025 at 9:24 pm

    This is very impressive and definitely fills a huge hole in the whole data frame ecosystem.

    I've been quite impressed with the Polars team and after using Pandas for years, Polars feels like a much needed fresh wind. Very excited to give this a go sometime soon!

  • Post Author
    ydjje
    Posted March 7, 2025 at 9:27 pm

    [flagged]

  • Post Author
    0cf8612b2e1e
    Posted March 7, 2025 at 9:32 pm

    I’ll bite: what’s the pitch vs. Dask/Spark/Ray/etc.?

    I am admittedly a tough sell when the workstation under my desk has 192GB of RAM.

  • Post Author
    __mharrison__
    Posted March 7, 2025 at 9:44 pm

    Really excited for the Polars team. I've always been impressed by their work and responsiveness to issues I've filed in the past. The world is lifted when there is good competition like this.

  • Post Author
    TheAlchemist
    Posted March 7, 2025 at 9:49 pm

    Having switched from Pandas to Polars recently, I find this quite interesting, and I expect it will be excellent performance-wise.

  • Post Author
    whalesalad
    Posted March 7, 2025 at 9:53 pm

    Never understood these kinds of cloud tools that deal with big data. You are paying enormous ingress/egress fees to do this.

  • Post Author
    Starlord2048
    Posted March 7, 2025 at 9:56 pm

    I can appreciate the pain points you guys are addressing.

    The "diagonal scaling" approach seems particularly clever – dynamically choosing between horizontal and vertical scaling based on the query characteristics rather than forcing users into a one-size-fits-all model. Most real-world data workloads have mixed requirements, so this flexibility could be a major advantage.

    I'm curious how the new streaming engine with out-of-core processing will compare to Dask, which has been in this space for a while but hasn't quite achieved the adoption of pandas/PySpark despite its strengths.

    The unified API approach also tackles a real issue. The cognitive overhead of switching between pandas for local work and PySpark for distributed work is higher than most people acknowledge. Having a consistent mental model regardless of scale would be a productivity boost.

    Anyway, I would love to apply for the early access and try it out. I'd be particularly interested in seeing benchmark comparisons against Ray, Dask, and Spark for different workload profiles. Also curious about the pricing model and the cold start problem that plagues many distributed systems.

  • Post Author
    tfehring
    Posted March 7, 2025 at 9:57 pm

    This is really cool, not sure how I missed it. I assume catalog support will be added fairly quickly. But ironically I think the biggest barrier to adoption will be the lack of an off-ramp to a FOSS solution that companies can self-host. Obviously Polars itself is FOSS, but it understandably seems like there's no way to self-host a backend to point a `pc.ComputeContext` to. That will make it an especially tough sell for companies that are already on Spark. I wonder how much they'll focus on startups vs. trying to get bigger companies to switch, and whether they'll try a Spark compatibility layer like DataFusion Comet (https://github.com/apache/datafusion-comet).

  • Post Author
    whyho
    Posted March 7, 2025 at 10:01 pm

    How does this integrate into existing services like AWS Glue? I fear that despite Polars being good (or better), it will lack adoption if it cannot easily be integrated.

  • Post Author
    melvinroest
    Posted March 7, 2025 at 10:06 pm

    I just got into data analysis recently (former software engineer) and tried out pandas vs. Polars. I like Polars way more because it feels like SQL, but sane, and it's faster. It's clear about what it tries to do. I didn't really have that with pandas.

  • Post Author
    efxhoy
    Posted March 7, 2025 at 10:12 pm

    Looks great! Can I run it on my own bare metal cluster? Will I need to buy a license?

  • Post Author
    marxisttemp
    Posted March 7, 2025 at 10:23 pm

    What does this project have to do with Serbia? They’re based in the Netherlands. They must have made a mistake when registering their domain name.

  • Post Author
    unit149
    Posted March 7, 2025 at 11:25 pm

    [dead]

  • Post Author
    marquisdepolis
    Posted March 8, 2025 at 12:08 am

    This is very interesting. Clearly there's a major pain point here to be addressed, especially the delta between local pandas work and distributed PySpark work!

    Would love to test this out and do benchmarks against us, Dask, Spark, Ray, etc., which have been our primary testing ground. Full disclosure: I work at Bodo, which has similar-ish aspirations (https://github.com/bodo-ai/Bodo), but FOSS all the way.

  • Post Author
    noworriesnate
    Posted March 8, 2025 at 12:12 am

    Every time I build something complex with dataframes in either R or Python (Pandas, I haven't used Polars yet), I end up really wishing I could have statically typed dataframes. I miss the security of knowing that when I change common code, the compiler will catch if I break a totally different part of the dashboard for instance.

    I'm aware of Pandera[1], which has support for Polars as well, but while it's nice, it doesn't cause the code to fail to compile; it only fails at runtime. To me this is the Achilles' heel of analysis in both Python and R.

    Does anybody have ideas on how this situation could be improved?

    [1] https://pandera.readthedocs.io/en/stable/

  • Post Author
    c7THEC2DDFVV2V
    Posted March 8, 2025 at 1:16 am

    Who covers egress costs?

  • Post Author
    Larrikin
    Posted March 8, 2025 at 3:28 am

    As a hobbyist, I describe Polars as pandas if it had been planned for humans to use. It's great to use; I just hate running into issues trying to use it. I wish them luck.

© 2025 HackTech.info. All Rights Reserved.