
For clarity, the Reddit datasets are not released by Reddit itself, but scraped through the API. (More context/examples of what I do with the data: http://minimaxir.com/2015/10/reddit-bigquery/ )


Okay, but reddit does also release some public data sets: https://github.com/reddit/public-data-sets


Ah, right, forgot about those. (Although those are traffic aggregates and wouldn't be affected by changes in the score ranking.)


Does this mean you (or I if I want to do some analytics on Reddit data) will need to completely rescrape the site after scores are recomputed?


If you wanted to compare raw scores for submissions before the change to those after the change, yes.

Otherwise, it shouldn't matter.


Using an API is not scraping. Scraping is taking a loosely structured document with no agreed-upon interface and extracting data from it.

A relevant comparison would be Craigslist, which will use legal force to prevent you from using data you scrape off the site.
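
To illustrate the API-versus-scraping distinction above, here is a minimal sketch. The JSON payload, the HTML snippet, and all field and class names are made up for illustration and are not Reddit's or Craigslist's actual formats.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: field names are illustrative only.
API_RESPONSE = '{"title": "Example post", "score": 42}'

# Hypothetical HTML page carrying the same data with no agreed-upon interface.
HTML_PAGE = (
    '<div class="post"><a class="title">Example post</a>'
    '<span class="score">42 points</span></div>'
)


def score_from_api(payload):
    """Read the score from a named, structured field."""
    return json.loads(payload)["score"]


class ScoreScraper(HTMLParser):
    """Pull the score out of markup by relying on its current layout."""

    def __init__(self):
        super().__init__()
        self._in_score = False
        self.score = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "score") in attrs:
            self._in_score = True

    def handle_data(self, data):
        if self._in_score:
            # Fragile: breaks if the site rewords "42 points".
            self.score = int(data.split()[0])
            self._in_score = False


def score_from_html(page):
    scraper = ScoreScraper()
    scraper.feed(page)
    return scraper.score
```

Both functions recover the same number, but the first depends on a field the provider has agreed to expose, while the second depends on markup that can change without notice.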



