
I've replaced 90% of my Google searches with Phind in the last few weeks. My use cases are learning new APIs, debugging, and generating test cases.

It's amazing. Real time saver. Just yesterday it saved me from going down an hour+ rabbit hole due to a cryptic error message. The first solution it gave me didn't work, and neither did the second, but I kept pushing and had it sorted in just a couple of minutes.

Having said that, I'm not sure I see the gain with Expert mode yet. After using it for the last couple of days, it's definitely much slower but I couldn't perceive it to be any more accurate.

Judging by your example, it looks like the main difference is that the Expert mode search returned a more relevant top result, which the LLM then relied on heavily for its answer. If search results come from Bing, can you really credit that answer to Expert mode?

PS. You mention launching GPT-4 today, but the Expert Mode toggle has been there for at least a few days, I reckon? Was it not GPT-4 before?



Love to hear it. It's true that for some searches you might not notice a difference, but for complex code examples, reasoning, and debugging, Expert mode does seem to be much better. We quietly launched Expert mode a few days ago on our Discord but are now telling the broader HN community about it.

We're working on making all of our searches the same quality as Expert mode while being much faster.


I'm definitely giving this a try sometime soon. I had an idea back when GPT-3 was the only game in town: use LLM-generated embeddings as part of a search ranking function. I'm betting that's roughly how Expert mode works, right?

Edit: Just had another thought. You could use the output of a normal search algorithm to feed the LLM targeted context, which it could then use to come up with a better answer than it would without the extra background. Yeah, I like that.
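Roughly what I have in mind, as a toy sketch of the two steps together (rank candidates with embeddings, then hand the top hits to the model as background). To be clear, web_search, embed, and ask_llm below are made-up stand-ins, not anything Phind has said they actually use:

    import math

    def web_search(query: str) -> list[dict]:
        """Hypothetical search backend: returns [{'url': ..., 'text': ...}, ...]."""
        raise NotImplementedError("plug in a real search API")

    def embed(text: str) -> list[float]:
        """Hypothetical LLM embedding call: returns a fixed-length vector."""
        raise NotImplementedError("plug in a real embedding model")

    def ask_llm(prompt: str) -> str:
        """Hypothetical completion call: returns the model's answer text."""
        raise NotImplementedError("plug in a real model")

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    def answer(question: str, top_k: int = 3) -> str:
        # 1. An ordinary search produces candidate documents.
        candidates = web_search(question)

        # 2. Embeddings re-rank the candidates by semantic similarity to the question.
        q_vec = embed(question)
        ranked = sorted(candidates, key=lambda d: cosine(q_vec, embed(d["text"])), reverse=True)

        # 3. The top hits become targeted context for the LLM's answer.
        context = "\n\n".join(f"[{i + 1}] {d['url']}\n{d['text']}" for i, d in enumerate(ranked[:top_k]))
        prompt = (
            "Using only the sources below, answer the question and cite sources by number.\n\n"
            f"{context}\n\nQuestion: {question}\nAnswer:"
        )
        return ask_llm(prompt)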

Although, I will say I asked it about writing a lisp interpreter in Python, because I was just tooling around with such a thing a little while ago for funsies. It essentially pointed me to Peter Norvig's two articles on the subject, which, unfortunately, both feature code that either doesn't run at all or doesn't run correctly. I was disappointed.
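For anyone curious, the runnable core of such an interpreter is tiny. Here's a stripped-down sketch I put together myself (not Norvig's code): numbers, a few built-ins, define, and if only, but it does run as written:

    import math
    import operator as op

    def tokenize(src: str) -> list[str]:
        return src.replace("(", " ( ").replace(")", " ) ").split()

    def parse(tokens: list[str]):
        token = tokens.pop(0)
        if token == "(":
            expr = []
            while tokens[0] != ")":
                expr.append(parse(tokens))
            tokens.pop(0)  # discard ")"
            return expr
        try:
            return int(token)
        except ValueError:
            try:
                return float(token)
            except ValueError:
                return token  # symbol

    GLOBAL_ENV = {
        "+": op.add, "-": op.sub, "*": op.mul, "/": op.truediv,
        "<": op.lt, ">": op.gt, "=": op.eq,
        "pi": math.pi,
    }

    def evaluate(x, env=GLOBAL_ENV):
        if isinstance(x, str):          # symbol lookup
            return env[x]
        if not isinstance(x, list):     # number literal
            return x
        if x[0] == "if":                # (if test conseq alt)
            _, test, conseq, alt = x
            return evaluate(conseq if evaluate(test, env) else alt, env)
        if x[0] == "define":            # (define name expr)
            _, name, expr = x
            env[name] = evaluate(expr, env)
            return env[name]
        proc = evaluate(x[0], env)      # procedure call
        args = [evaluate(arg, env) for arg in x[1:]]
        return proc(*args)

    print(evaluate(parse(tokenize("(define r 10)"))))   # 10
    print(evaluate(parse(tokenize("(* pi (* r r))"))))  # 314.159...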


We do use the output of a "normal" search algorithm to feed our LLM context :)

Did you use Expert mode for your search? Only Expert mode is GPT-4 and its code quality is vastly superior to that of the default mode.


I'm a beginner, so I'm unable to tell if it's hallucinating or not. Do you find it hallucinates or is incorrect? I'm wary of noting stuff down and remembering wrong things, and I don't want to drill two levels deep for each question.


I've been using ChatGPT-4 for the past couple of weeks, and just last night I tried Phind with a new library version. While yes, I did find that Phind was wrong a lot (though I don't think it was fully hallucinating, just mixing library versions), I think there's a more important point to be made.

Unless we get a near-term breakthrough on models (or model+plugin combinations) that can validate their own accuracy, I suspect learning to use LLMs to explore ideas even when hallucination is a risk will be a useful skill.

I.e., searching with Google is a skill we have to acquire. Validating results from Google is yet another skill. Likewise, I feel it could be very useful to find a way to use LLMs where you get the benefits while mitigating the risks.

For me these days that usually translates to low-risk settings: things I can validate easily. ChatGPT has been a good starting point for researching ideas. It's also very useful to know how niche your subject matter is: the fewer results you find on Google for your specific edge case, the more likely ChatGPT will struggle to say anything real or complete about it.

I imagine the same is true for Phind. Yeah, it can search the web, but as my tests last night showed, it still happily strings together incorrect data, old library versions in particular. I'd say "Given Library 1.15, how do I do X?" It did eventually give me the right answer, but along the way it happily wrote up code examples that mixed library versions.

I imagine Phind will be similarly useful to me as ChatGPT (if not more so), but you really have to be aware of what it might do wrong... because it will, heh.


We definitely still have work to do in this area, and the feedback we've gotten here is incredibly helpful. Having the AI be explicitly aware of specific library versions so it doesn't mix and match is a high priority.


I just tried it on a problem I'd solved in Azure Data Explorer, and it "solved" it by making up APIs that don't exist. It got close to how I solved the problem, but it cheated even with Expert mode enabled.


Seems like accuracy is the next killer feature for LLM search and teaching; I'll try again in 6 months.


What a time to be alive, when we likely need to wait only a few months for the next big hurdle to be cleared.

Exhilarating and terrifying at the same time.


I dunno about that in this case. The "confidently incorrect" problem seems inherent to the underlying algorithm to me. If it were solved, I suspect that would be a paradigm shift of the sort that happens on a timescale of years, at best.


Yes, the "confidently incorrect" issue will be a tough nut to crack for the current spate of generative text models. LLMs have no ability to analyze a body of text and determine anything about it (e.g. how likely it is to be true); they are clever but at bottom can only extrapolate from patterns found in the training data. If no one has said anything like "X, and I'm 78% certain about it", then it's tough to imagine how an LLM could generate reasonably correct probability estimates.


What you're alluding to is calibration, and base GPT-4 had excellent calibration before RLHF.
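(For context, calibration means the model's stated confidence tracks its observed accuracy, e.g. answers given at 80% confidence are right about 80% of the time. A toy way to measure it is expected calibration error; this is just an illustrative sketch, not how anyone actually evaluated GPT-4:)

    def expected_calibration_error(confidences, correct, n_bins=10):
        """confidences: the model's stated probabilities; correct: 1 if the answer was right, else 0."""
        n = len(confidences)
        ece = 0.0
        for b in range(n_bins):
            lo, hi = b / n_bins, (b + 1) / n_bins
            idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0)]
            if not idx:
                continue
            avg_conf = sum(confidences[i] for i in idx) / len(idx)
            accuracy = sum(correct[i] for i in idx) / len(idx)
            ece += (len(idx) / n) * abs(avg_conf - accuracy)
        return ece

    # A perfectly calibrated model that says "80% sure" is right 80% of the time,
    # so its ECE is near zero; this toy example is off by 0.25 on average.
    print(expected_calibration_error([0.9, 0.8, 0.6, 0.3], [1, 1, 1, 0]))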


It seems to be slightly wrong more often than it outright hallucinates.

I've had it straight up invent a library that doesn't exist once, but that seems to be quite rare, and you need to be deep in the weeds of an obscure problem domain to trigger it.

More often I ask it how to do something, and it sort of provides an answer, but not quite. So I point out the flaw, and it fixes it, but not quite. Rinse and repeat. After anywhere from 4 to 10 iterations it's usually quite good. The experience is like code reviewing a really apologetic and endlessly patient junior developer.

Although I think what might be a beginner's saving grace is that it seems to be better at beginner questions than advanced questions, since there are more of them in the training data.


Google who?


It's a verb now; I search at phind.com.



