
What corpus did you get your term frequencies from?


If I understand the blog correctly, their corpus consists of all the articles they have processed so far. Maybe they have some additional source, e.g. a more general collection of web pages.
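To make that reading concrete, here is a rough sketch of a corpus that grows as articles are processed. It is only an illustration and not the blog's actual code: the `RunningCorpus` class, the whitespace tokenizer, and the smoothed IDF formula are all my own assumptions.

```python
import math
from collections import Counter


class RunningCorpus:
    """Hypothetical helper: document frequencies over the articles seen so far."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()  # term -> number of articles containing it

    def add(self, text):
        # Count each term once per article (naive whitespace tokenizer, an assumption).
        self.num_docs += 1
        self.doc_freq.update(set(text.lower().split()))

    def idf(self, term):
        # Smoothed inverse document frequency; unseen terms do not divide by zero.
        return math.log((1 + self.num_docs) / (1 + self.doc_freq[term.lower()])) + 1.0


corpus = RunningCorpus()
for article in ["the cat sat", "the dog barked", "a cat purred"]:
    corpus.add(article)
```

A term that shows up in most of the processed articles (here "the") gets a lower weight than one that shows up in only a few ("dog"). A more general background collection could be folded in the same way, by calling `add` on those pages first.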



