
Kaggle and the Wikimedia Foundation are partnering on open data by xnx

7 Comments

  • Post Author
    toomuchtodo
    Posted April 16, 2025 at 5:38 pm

    This sounds good to take the ML/AI consumption load off Wikimedia infra?

  • Post Author
    0cf8612b2e1e
    Posted April 16, 2025 at 5:58 pm

    I wish the Kaggle site were better. Unnecessary amounts of JS to browse a forum.

  • Post Author
    riyanapatel
    Posted April 16, 2025 at 6:03 pm

    I like and appreciate the concept of Kaggle, but I agree that its UI keeps me from taking the time to explore its capabilities. I hope this new partnership helps structure data for me.

  • Post Author
    sfx77
    Posted April 16, 2025 at 6:12 pm

    How are they going to reconcile the recent co-opting of data? Wikipedia is not reliable.

  • Post Author
    ashvardanian
    Posted April 16, 2025 at 6:20 pm

    It's a good start, but I was hoping for more data. Currently, it's only around 114 GB across 2 languages (<https://www.kaggle.com/datasets/wikimedia-foundation/wikiped…>):

      - English: "Size of uncompressed dataset: 79.57 GB chunked by max 2.15GB."
      - French: "Size of the uncompressed dataset: 34.01 GB chunked by max 2.15GB."
    

    In 2025, the standards for ML datasets are quite high.

  • Post Author
    bk496
    Posted April 16, 2025 at 7:06 pm

    It would be cool if all the HTML tables on Wikipedia were put under individual datasets

  • Post Author
    bilsbie
    Posted April 16, 2025 at 7:07 pm

    Wasn’t this data always available?


© 2025 HackTech.info. All Rights Reserved.