I'm actively trying to scaffold an LLM for business usecases and my experience totally echoes this. I can tweak the prompt for a very representative set of training data, and then it vomits at a seemingly normal new case.
You might be interested in another way to use LLMs inside a SQL query. For example, LLMs can be used to power a "soft join" between SQL tables for when a correspondence is only implied (e.g. different address formats, etc.).
This isn't enough to enable a useful interface.
It takes a lot of scaffolding to get an llm powered interface to actually work.