There was (is?) a DARPA project called "Memex" that was built to crawl the hidden web. It produced many tools for things like crawling with authentication, automatic registration, machine-learning detection of search forms, automatic pagination detection, and much more: https://github.com/darpa-i2o/memex-program-index

