[flagged] Why one search engine for the whole internet is not a good idea anymore (dynamicguy.com)
20 points by ferdous on May 18, 2017 | 27 comments


I work on Google Search. Having one search engine per country doesn't seem like the correct approach to the problem.

> Every country has a mammoth collection of valid results for your query.

Having seen the corpus of content available from each language, this is categorically false. Consider Wikipedia, which is a fairly ubiquitous information source on the web that provides answers to tons of searches. English documents: 5.4M, Romanian documents: 376k.

Perhaps the solution to the OP's woes is more tools for filtering. This post conflates language and country. Search typically returns results and search features in the query language. Today's solution, where filtering is done through query refinement and query operators, seems to cover a lot of use cases already.

Further, by having an integrated product, ML models can learn behaviors specific to certain locales where those behaviors differ from region to region, and balance with more universal behaviors that apply to more than one region.


A server made out of Lego probably would not have been seen as the correct approach to the problem by an engineer at AltaVista a few years ago. Search engines by country are just a way of expressing the abstractions involved in chunking up search in a way that is orthogonal to building a service chunked around advertising revenue, English, and Silicon Valley political philosophies.

Or to put it another way, 376k Wikipedia documents in Romanian is about 50% more than the number of articles in the last print edition of Encyclopedia Britannica. The dismissal of their significance may express a worldview bubble that is endemic to Google and its business model.

Just because 376k Romanian documents is not enough to train a data center in how to sell chia pets to Bucharestians, doesn't mean that it is not a significant repository of information for actual human beings.


Slightly off-topic and a shameless plug:

Are you guys working on removing the websites that somehow manage to be on top when you search for an entire class of queries? They are practically empty, they contain 10+ affiliate JavaScripts (with which they supposedly make money from clicks?) and are basically search query aggregators, yet Google hasn't removed them.

Sorry I can't give examples, but these sites are out there and IMO should be outright banned. They bring zero value to customers and I'd argue that a good part of them are bringing zero value to Google as well -- they utilize some "SEO secrets" to get to the first page of Google without paying a penny.


Agree, and as I work for Seznam, I especially agree on the ability to train models properly to satisfy local users. Not to mention that the user can choose the language of the results.


I guess I wasn't too far off geographically with Romanian Wikipedia :)

Keep up the great work at https://www.seznam.cz/ ! Lots of interesting search features, maps, etc.


Here's what I get for "iPhone price": http://i.imgur.com/VOE0Q1q.png which shows the actual prices.

The thesis of the post seems to be that it's a bad idea to have one search engine because people will reverse engineer it. That's true, but the billions of dollars flowing into Google means they throw the smartest people at staying ahead of SEO. It seems pretty effective.


For that query I get a range of prices (sponsored result) from £0.00 to £300. Obviously, the £0.00 result is going to get the click-through despite the £300 option likely being the better deal?

The organic results (below the fold) seem to be worse than would have been the case 5 years ago.

SEO has been replaced with "chuck money at AdWords and hope some of it works."


> Sooner or later the internet will have to be decentralized.

It sounds like a pipe dream. Centralisation exists for a reason: it's far easier to coordinate complex efforts within a single organisation than across a host of unreliable entities. Suppose we develop a protocol where we share the responsibility of crawling, indexing, and searching the whole web. Isn't it unrealistic to expect that millions of computers (an optimistic estimate, considering the number of Google servers dedicated to Search) would voluntarily participate in the effort? Who ensures redundancy, speed, and abuse handling?

If a single company commanding all the power is untrustworthy, a million entities with no stake in the game are many times more so.


I hear you. Nobody claims it's gonna be easy to go the decentralization route. Quite the contrary, it's gonna be extremely hard, probably even harder than what it took to arrive at today's status quo. But IMO it will be worth it.

Centralization and monopoly never end well. Historically, only benevolent dictatorships have worked even half-well. I don't think we should just blindly believe a person in a position of power is benevolent. We humans are easily corruptible; that's the sad truth.


I couldn't agree more - one search engine is a terrible thing. But for other reasons. It's amazing to see how in other areas people complain about monopolies but when it comes to our window to the digital world, most of us seem to be fine with it. One US monopoly that dictates what should be important for us, filters what we see based on secret algorithms, forces us to "optimize" our websites in this and that way along their "standards". Google has good technology but we should wake up and see they are not the altruists they like to tell everyone.


The author says a lot of correct things. When information belongs to one organization, there is no future. Do you really think that we will receive information from only one organization? A monopoly on information, like any other monopoly, will not be eternal.

Today information is growing rapidly, and they try to fit everything on the first page. It is impossible to display all the relevant information on one page; it is very restrictive.

And that's not even counting the many advertising links and artificially boosted sites on that first page.

The Yahoo model has a future. And we are developing it: we are building the Bubblehunt project, where each user can create his own search engine. This is decentralization, where each user can act as an independent information provider, like a mini-Google.

A bit of advertising, sorry: https://bubblehunt.com - tell me, what do we need to improve?

We strive to enable each user to influence the quality of Internet search and make it more social, transparent and dynamic.

People are increasingly talking about the need to change something in search.

Google is an excellent search engine, but it hides a huge amount of information from us. Lots of interesting resources are born every day, yet we only get information from sites on the first page of search results. Those sites only become more popular. And you need to pay money so people see your website!

Is this right, in your opinion? In my opinion, Google makes a huge profit here, and that is not quite right. People just do not have alternatives.


Feedback on your product: very bland landing page (I know Google does it too but they can get away with it).

Consider showing some random examples from popular searches. I think DuckDuckGo did this a while ago: it showed a search result page that was more relevant than what Google was showing.

Just one thought. Try to engage potential users. To me, your landing page says nothing.


One search engine per country seems like the wrong way to slice the problem up to me.

Locality is missing from many queries because the data is generally not relevant or is relevant but not available.

I would propose breaking search up by the sort of content you are looking for instead of by country.

Programming data or food recipes have nothing to do with locality, but finding the nearest place to get chocolate-covered coffee beans does.

I know this is going to seem like total self-promotion, but this article caught my eye, and I thought it would be relevant to say I have been thinking about this problem for a while, which is why I built https://www.closient.com/ - mind you, I am still building it, but it does work.

Maybe I am not seeing the same problem as the author, but I do see locality as a critical part of merging the logical/virtual world with the physical.


How does it work? Searching for book store (https://www.closient.com/?q=book_store the underscore was autocompleted) gave me results that were 1200+ km away from my current location. Other queries did the same.


The internet is quite independent of countries and regions. Sure, sites are localised and internationalised but the idea of an "[insert country here] internet" is meaningless. One of the major reasons that the internet is successful is that it is effectively borderless.


Personally, when I know that a certain topic is of high value to advertisers, I don't even bother googling anymore and try to find someone qualified instead. For example, I was recently trying to look up the theory behind working out, how different muscles work, how nutrition works and all that, but there are so many people trying to sell their courses, apps, books and stuff that I have no idea how to go about looking for an unbiased and at least semi-scientific source of information.

Edit: I would be thankful for pointers as to why this comment was downvoted, so I can increase quality of my content in the future.


Thankfully we still have journals and conference papers. Look for the ones with high impact factor and go from there. That's what I did when I was looking into creatine supplementation.

You can't even ask experts for their opinion since they are the ones selling the stuff!


Good advice, but in my experience, reading the latest research in any field requires a lot of basic knowledge first.


That's a problem only AI could solve now. All other attempts at non-AI algorithms are futile. We need to be able to search through a curated internet that looks somewhat like a mix of Wikipedia and the old Yahoo directory and is tailored to each individual user's search goals (unlike you, I may be interested in commercial workout products). AI is the only "curator" capable of ranking and selecting results with such abstract filters.


Just like Facebook's AI is solving the feed problem? Do you trust a proprietary machine learning black box, which even its creators don't completely understand, run by a for-profit corporation, to decide what information you consume?

It's not like I have a better suggestion, but AI is definitely not a silver bullet for this.


I wasn't thinking of corporate AI, actually. Something between not-for-profit Wikipedians and personal AI (browser-based or independently managed). AI is definitely not a silver bullet, but algorithmic and human-based solutions are out of the question, so AI is probably the best option moving forward.


Facebook's so-called "AI algorithms" only optimize for advertisement money, not for good feed experience. Hardly a good example.


If you're worried about ad money, then over the next 10 years one will be tied to the other.


Same here. I want to find a good and affordable high-protein, low-carb diet that is sustainable over months, years, or maybe a lifetime, and yet Google fails me every time. It's always some bright-smiling 22-year-old chick (as if they need a diet before 30), or some old guy who discovered how his diet fixed his knee problem, or an Indian guy who is happy to tell us how everybody else is bullshitting us but hey, HE HAS THE ANSWERS. Etc. to infinity.

Also, nobody wants to give you an affordable diet. Hello, I can't pay for salmon 3 times a week, okay? I know it contains an easier-to-digest fat, fine! Can't you give me some alternatives?

It's infuriating. But Google optimizes for advertisement money and there are known "cheats" on how to be on top in Google searches which aren't well addressed to this day.


Of course Google has its shortcomings. But I don't see how "a search engine per country" is going to change anything. First of all, why limit the boundary to countries? There could be vastly different areas in the same country, or two countries with similar cultures and languages. Second, not every country can afford, or has enough resources for, setting up its own Google. A search engine requires expertise that builds up through many years of experience. Instead of reinventing the wheel for every country, why not work on better localisation of existing search engines?


Google works surprisingly well with languages other than English. I've been looking for Russian-language originals of some of Chekhov's short stories and plays, and pasting, e.g., "Чехов чайка" ("Chekhov seagull") into the search bar finds the full text of The Seagull (on a .ru site) with no problem. I'm not sure how having a Russian-only search engine helps here.


Google is not the only search engine available. Bing still has a big market share in the US, as do Yandex in Russia and Seznam in the Czech Republic. Nevertheless, Google's dominance often means that SEO follows its ideas, but we often see SEO experts adjusting to these other search engines too, because they also want to reach their users.



