
Kaggle and the Wikimedia Foundation are partnering on open data by xnx
Kaggle is hosting Wikimedia Enterprise’s beta release of structured data in both French and English. Kaggle is home to a vast trove of open and accessible data, with more than 461,000 freely accessible datasets. Researchers, students and machine learning practitioners use this data to explore, train, learn and c…
7 Comments
toomuchtodo
So this is meant to take the ML/AI consumption load off Wikimedia's infrastructure? Sounds good.
0cf8612b2e1e
I wish the Kaggle site were better. Unnecessary amounts of JS to browse a forum.
riyanapatel
I like the concept of Kaggle and appreciate it, but I agree that aspects of the UI keep me from taking the time to explore its capabilities. I'm hoping this new partnership makes the data more structured for me.
sfx77
How are they going to reconcile the recent co-opting of data? Wikipedia is not reliable.
ashvardanian
It's a good start, but I was hoping for more data. Currently, it's only around 114 GB across 2 languages (<https://www.kaggle.com/datasets/wikimedia-foundation/wikiped…>):
In 2025, the standards for ML datasets are quite high.
bk496
It would be cool if all the HTML tables on Wikipedia were published as individual datasets.
bilsbie
Wasn’t this data always available?