
The other day I thought how cool it would be to have a Web service that could crawl your site and auto-categorize all of your pages (or at least help you do it). As ever, it turns out someone is on the case ;-) Nice work! I think there's definitely a wider audience for this technology.
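As a rough illustration of the "help you do it" case, here is a minimal sketch that assigns a page to whichever category description it shares the most vocabulary with, using cosine similarity over bag-of-words counts. The category names, keyword lists, and the `categorize` function are hypothetical, made up for this example; a real service would crawl the pages and most likely use a trained classifier rather than hand-written keywords.

```python
import math
import re
from collections import Counter

# Hypothetical seed categories, each described by a handful of keywords.
CATEGORIES = {
    "programming": "code python java compiler bug library api",
    "startups": "founder funding investor startup equity revenue",
    "science": "research experiment physics biology study data",
}


def tokenize(text):
    """Lowercase and keep runs of letters; deliberately naive (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(page_text, categories=CATEGORIES):
    """Return the category whose keywords best match the page, or None if none overlap."""
    page_vec = Counter(tokenize(page_text))
    scores = {
        name: cosine(page_vec, Counter(tokenize(keywords)))
        for name, keywords in categories.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(categorize("We fixed a compiler bug in our Python library"))
    print(categorize("The founder closed a funding round with two investors"))
```

A page that shares no keywords with any category comes back as `None`; in a "help you do it" workflow, those pages could be queued for a human to label.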


> I think there's definitely a wider audience for this technology.

What audiences do you see for this technology?

Also, how would you expand the audience for this technology?

Possible options:

* Auto-crawl content and automagically organize it, without involving the content owner. (The Google approach.)

* Build a turnkey solution where people can upload their content and get the index returned to them. (An API approach.)

* Talk to businesses directly and make one-on-one deals. (An enterprise/B2B approach.)


There are lots of uses for this, but my main advice is not to lose these three things:

1. Advantage of relevancy within specific domains. PageRank was a huge value-add to relevancy over other search engines, but indexing the whole internet is now too ambitious. HN is a great corpus because the content is already vetted by a community. Integrating content from other specialized communities can add density and relevancy.

2. Ease of use in integration. The less configuration this API needs, the better. Autotagging, done well, is very useful. I have a lot of ideas around this if you'd like to chat some time.

3. An easy-to-use interface. Combining browsable, faceted search with NLP is, I think, the sweet spot between getting lots of relevant results and allowing for discovery.
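Points 2 and 3 can be made a bit more concrete with a small sketch: autotag each document with its top TF-IDF terms, then count tags across the corpus to get browsable facets, and filter on one facet the way a faceted UI would. The stopword list, `k=3`, and the function names `autotag`, `facet_counts`, and `filter_by_facet` are illustrative assumptions, not any particular product's API. Real autotagging would add stemming (note that "clusters" and "clustering" stay separate here), phrase detection, and a far larger stopword list.

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase, keep runs of letters, and drop stopwords (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def autotag(docs, k=3):
    """Tag each document with its k highest-scoring TF-IDF terms (ties broken alphabetically)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    all_tags = []
    for toks in tokenized:
        tf = Counter(toks)
        # Terms present in every document get idf = log(1) = 0 and are never used as tags.
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        ranked = sorted((t for t, s in scores.items() if s > 0), key=lambda t: (-scores[t], t))
        all_tags.append(ranked[:k])
    return all_tags


def facet_counts(tags_per_doc):
    """Count how many documents carry each tag (the numbers shown next to facets)."""
    return Counter(tag for tags in tags_per_doc for tag in set(tags))


def filter_by_facet(docs, tags_per_doc, facet):
    """Return the documents carrying the selected facet, i.e. one click of faceted browsing."""
    return [doc for doc, tags in zip(docs, tags_per_doc) if facet in tags]


if __name__ == "__main__":
    docs = [
        "Mahout clustering on Hadoop clusters",
        "Weka classifiers for text classification",
        "Clustering text with Weka",
    ]
    tags = autotag(docs)
    print(tags)
    print(facet_counts(tags).most_common(3))
    print(filter_by_facet(docs, tags, "text"))
```

Because the facets come straight from the autotagger, the content owner configures nothing; the trade-off is that tag quality, and therefore browsing quality, depends entirely on the scoring.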


Mostly #1, but I agree with all three, especially as a way to turn a managed topic domain into a transferable form of domain knowledge.


I think that'd be quite easy, to be honest :) I've been working on a concept similar to that app; it's hackable in 3-5 weeks, especially with great tools such as Weka or Mahout.


As with all NLP/machine learning, it's trivial to do if you know the tools and already have the data -- but pretty difficult to do well.


It's hard to build a complete custom solution, although it's possible to let people use certain wrapped components and explain how to get the best results with particular methods.

Again, one person may be satisfied with a given result, but it may not work as well for someone else; take clustering or topic extraction, for instance. You need some idea of what you want to get and what the possible outcomes of the investigation are, and then dig into the data to get what you need. A generic approach will give a broad result set that may need additional effort before it is useful in the real world.
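To make the point about generic approaches concrete, here is a minimal k-means sketch over bag-of-words vectors using cosine similarity. It is a naive illustration (first-k seeding, no stopwords or stemming, a fixed iteration cap), not how Weka or Mahout implement clustering. The output depends entirely on choices like `k`, tokenization, and seeding, which is exactly the kind of tuning that needs some knowledge of what you want to get out of the data.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; deliberately naive tokenization."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average of several sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def kmeans(texts, k, max_iter=20):
    """Cluster texts into k groups; returns one cluster label per text."""
    vecs = [vectorize(t) for t in texts]
    centroids = [dict(v) for v in vecs[:k]]  # naive seeding: the first k documents
    labels = None
    for _ in range(max_iter):
        # Assign each document to its most similar centroid (ties go to the lowest index).
        new_labels = [max(range(k), key=lambda i: cosine(v, centroids[i])) for v in vecs]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Recompute each centroid from its current members; keep it if the cluster emptied.
        for i in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == i]
            if members:
                centroids[i] = centroid(members)
    return labels


if __name__ == "__main__":
    texts = [
        "weka java machine learning",
        "hadoop mahout cluster",
        "java weka classifier",
        "mahout hadoop scaling",
    ]
    print(kmeans(texts, k=2))
```

On toy data with disjoint vocabularies the clusters come out clean; on real pages the same code yields broad, overlapping groups until the representation and `k` are tuned for the domain.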



