I tried a prompt that consistently gets Gemini to badly hallucinate, and it responded correctly.

Prompt: "At a recent SINAC conference (approx Sept 2025) the presenters spoke about SINAC being underresourced and in crisis, and suggested better leveraging of and coordination with NGOs. Find the minutes of the conference, and who was advocating for better NGO interaction."

The conference was actually in Oct 2024. The approx date in parens causes Gemini to create an entirely false narrative, which includes real people quoted out of context. This happens in both Gemini regular chat and Gemini Deep Research (in which the narrative gets badly out of control).

Kagi reasonably enough answers: "I cannot find the minutes of a SINAC conference from approximately September 2025, nor any specific information about presenters advocating for better NGO coordination at such an event."



I also tried two prompts that I've seen the latest Gemini and Claude hallucinate on. The first is the famous "how many r's in strawberry", and the other asks about sacrificing the king in chess, where most LLMs tell you it's a very powerful move and explain how to do it (obviously nonsense, since the whole point of the game is to protect the king).

Kagi actually got those two correct, explaining how losing your king means losing your chess game, which is quite impressive.


I tried both these prompts (along with the SINAC one as per GP) in Sonnet 4.5 and Gemini 3, and both answered all three correctly. Both also provided useful context on the chess question.


All of this will also depend on the model's settings (reasoning effort, temperature, top_k, etc.).

Which is why your benchmarks should generally be a bit broader (>10 questions for a personal setup); otherwise you overfit to noise.
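For what it's worth, here is a minimal sketch of what such a broader personal benchmark can look like, assuming a hypothetical `call_model` wrapper around whatever provider API you use; the pinned temperature/top_k defaults and the crude pass checks are illustrative, not a real harness:

```
# Hypothetical wrapper: wire this to whichever chat API you use,
# pinning sampling settings so runs are comparable across models.
def call_model(prompt: str, temperature: float = 0.2, top_k: int = 40) -> str:
    raise NotImplementedError("connect this to your provider's API")

# Keep well over 10 questions so one lucky or unlucky sample doesn't
# dominate the score. Each entry pairs a prompt with a crude pass check.
BENCHMARK = [
    ("How many times does the letter r appear in 'strawberry'?",
     lambda reply: "3" in reply or "three" in reply.lower()),
    ("Is sacrificing your own king a strong move in chess?",
     lambda reply: "lose" in reply.lower() or "cannot" in reply.lower()),
    # ... add more trap prompts (slightly wrong dates, fake citations, etc.)
]

def run_benchmark(runs_per_prompt: int = 5) -> None:
    for prompt, check in BENCHMARK:
        # Repeat each prompt to smooth over sampling noise, then report
        # a pass rate instead of a single anecdotal result.
        passes = sum(
            bool(check(call_model(prompt)))
            for _ in range(runs_per_prompt)
        )
        print(f"{passes / runs_per_prompt:.0%}  {prompt[:60]}")

if __name__ == "__main__":
    run_benchmark()
```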


You might be interested to know your comment caused Gemini 3 to go meta on this question. This is what I found in its thinking:

```
Evaluating Date Discrepancies

I've just uncovered a significant discrepancy: a "Hacker News" result flags a date error originating from the very format of parenthetical dates. The system appears to be misinterpreting these, leading to demonstrably false information. This warrants further investigation to understand the root cause and ensure data integrity.

Analyzing Date Ambiguity

I'm now grappling with the user's date. The Hacker News comment suggests the user's prompt might be flawed, as it contradicts the actual date of an event I've indexed. Specifically, the conference referenced occurred in October 2024, not September 2025 as the prompt implies. This leads me to question the user's understanding of the timeline or if the prompt refers to an unrelated event in September 2025. This inconsistency demands further clarification.

...

Investigating User Prompt Intent

I've just realized that the Hacker News snippet reveals this might be a specific problematic test prompt. My strategy is to focus on the October 2024 forum, a strong match, organized by Kattia Cambronero and CCT. I'll verify if there was a 2025 event, but the 2024 event fits the described "crisis" and "NGO coordination". If the user says "approx Sept 2025," they likely mean a recent event.
```


Ah yes, we have some benchmarks for this sort of misguided prompt trap, so it should perform well on this.



