
I'm not sure if Wikipedia is OK with that, considering the load it generates [1]. I am currently doing some research on Wikipedia, and for my purposes I use the official dumps site at https://dumps.wikimedia.org/

[1] http://en.wikipedia.org/robots.txt



Load management is a big problem that we're looking to fix in the future. Wikipedia's API definitely struggles when rendering old revisions of medium-to-large pages, but we try our best to respect their own API etiquette guidelines: http://www.mediawiki.org/wiki/API:Etiquette.

In the future, we'd love to combine the Wikipedia API (for real-time edit updates) with a locally stored copy of the Wikipedia history database (all 2+ TB of it) to improve performance and decrease load on their servers.
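To make the etiquette point concrete, here is a minimal sketch of what a "polite" revision request could look like. It is not the project's actual code. It assumes the standard MediaWiki `maxlag` parameter and the `Retry-After` response header, sends a descriptive User-Agent, and issues requests one at a time. The names `revision_query`, `polite_get`, and the `fetch` callable, along with the contact string, are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical identifier; a real client should name itself and give a contact.
USER_AGENT = "ExampleHistoryViewer/0.1 (contact@example.org)"

# fetch(url, params, headers) -> (decoded JSON body, response headers).
# Injected so any HTTP library can be plugged in (assumption, not a real API).
Fetch = Callable[[str, Dict[str, str], Dict[str, str]], Tuple[dict, Dict[str, str]]]


def revision_query(title: str, limit: int = 5, maxlag: int = 5) -> Dict[str, str]:
    """Build query parameters for a page's most recent revisions."""
    return {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user",
        "rvlimit": str(limit),
        "format": "json",
        # Ask the server to refuse the request when replication lag is high.
        "maxlag": str(maxlag),
    }


def polite_get(fetch: Fetch, params: Dict[str, str], max_tries: int = 3,
               sleep: Callable[[float], None] = time.sleep) -> dict:
    """Issue one request at a time, backing off when the API reports maxlag."""
    headers = {"User-Agent": USER_AGENT}
    for _ in range(max_tries):
        body, resp_headers = fetch(API_URL, params, headers)
        if body.get("error", {}).get("code") != "maxlag":
            return body
        # Wait as long as the server asks; fall back to 5 seconds.
        sleep(float(resp_headers.get("Retry-After", 5)))
    raise RuntimeError("servers still lagged after %d tries" % max_tries)
```

With the `requests` library, `fetch` could be a small wrapper around `requests.get(url, params=params, headers=headers)` that returns `(r.json(), r.headers)`. Old revisions that are already in a local dump would not need to go through this path at all.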


FWIW, the wikisphere thinks this is pretty cool :-) New ways of reusing our stuff are always of interest.



