Hacker News

> because it conflicts with the major thing vendors want

Maybe, but that doesn't explain why there are no startups on the case, since they wouldn't have this conflict of interest.



You'd think it wouldn't be that hard to build a personal search engine on top of Lucene or Elasticsearch, and on one level it isn't. But there are two very hard problems.

(1) Performance. Back in the day people frequently turned full text indexing off on Windows because it slowed down their computer too much. People won't be happy with the overhead of a search engine that is always scanning tens of GBs of documents.

(2) Search quality. People are familiar with Google being an effective search engine, but they've certainly tried a number of others with terrible relevance scoring and have probably learned that it is not worth trying the search on a web site or in the help of an application. In the case of Elasticsearch the default similarity is BM25, but that has two tunable parameters (k1 and b). There are other similarities you could use, but most of them have tunable parameters too. It makes a real difference what you choose, and there is a methodology for tuning them that is now built into Elasticsearch.

https://www.elastic.co/guide/en/elasticsearch/reference/curr...
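To make the "two tunable parameters" concrete, here is a rough sketch of BM25 scoring in plain Python. It is not Lucene's or Elasticsearch's actual code: the function name `bm25_scores`, the pre-tokenized input, and the exact IDF form are my assumptions for illustration. The defaults k1=1.2 and b=0.75 are the usual ones; k1 controls how quickly repeated occurrences of a term stop adding to the score, and b controls how strongly long documents are penalized.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    Hypothetical helper for illustration; `docs` is a list of token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Smoothed inverse document frequency (assumed form, always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # b scales the length penalty; k1 sets term-frequency saturation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    ["tax", "return", "2019"],
    ["tax"] + ["filler"] * 20,
    ["holiday", "photos"],
]
default_scores = bm25_scores(["tax"], docs)
no_length_norm = bm25_scores(["tax"], docs, b=0.0)
```

With the defaults, the short document outranks the long one even though both mention "tax" once; with b=0 the two tie. That is the kind of difference that shows up when personal files range from one-line notes to 300-page PDFs.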

I talked to about 20 vendors of full text search products and found that only 2 of them regularly evaluated the quality of their results, and 1 of those did it only to get some advertising by being on the TREC leaderboard. They told me over and over again that customers didn't care about search quality, they just wanted to see a list of 350+ data sources that the product could index.
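For anyone wondering what "evaluating the quality of the results" looks like in practice, a minimal sketch: collect human relevance judgments for a set of queries, then score each ranked result list against them. The function names and the judgment format below are hypothetical, and precision@k and nDCG are just two common metrics, not necessarily what any particular vendor used.

```python
import math

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that were judged relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k

def ndcg_at_k(ranked_ids, gains, k):
    """Normalized discounted cumulative gain over the top-k results.

    `gains` maps doc id -> graded relevance (0 = irrelevant); unjudged
    documents are treated as irrelevant.
    """
    def dcg(values):
        # Results further down the list are discounted logarithmically.
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(doc_id, 0) for doc_id in ranked_ids[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

# Hypothetical judgments for one query.
judged = {"a": 3, "b": 1}
p = precision_at_k(["a", "x", "b", "y"], {"a", "b"}, 4)
perfect = ndcg_at_k(["a", "b"], judged, 2)
swapped = ndcg_at_k(["b", "a"], judged, 2)
```

Averaging a metric like this over a fixed query set before and after each parameter change is the basic loop; without it, tuning is guesswork.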


> They told me over and over again that customers didn't care about search quality, they just wanted to see a list of 350+ data sources that the product could index.

This seems to describe corporate buyers ticking boxes on a form, more than actual users.


Sometimes those box tickers are your customers, since they are the ones paying you. That's how you get Enterprise Software.


Who would fund those, and where's the potential for outsized growth?


I think the problem is there's no demand, not that incumbents are fighting it. But I also think it's weird and a mystery that nobody wants good search.

The trust issue you mention may be a clue.


> I think the problem is there's no demand

Computing is a supply-driven market. The demand is there, it's just partially latent, partially ignored. The vast majority of technology users have no choice but to pick from what's being offered, and the minority of tech-savvy users with opinions is increasingly too small a niche to support "power user" tools.


There isn't even an open source product with traction.


ripgrep solves the "precise full-text search" problem quite nicely.


There is that, and locate.

Most people have Word documents, PDF files, and other things that need a more complex indexing strategy. A lot of people also have lots of image and audio files, which pose their own challenges, namely indexing textual metadata and possibly some indexing of the content.



