Hmm, so if structured output affects the quality of the response, maybe it's better to convert the output to a structured format as a post-processing step?
It's a tradeoff between getting "good enough" performance with guided/constrained generation and spending 2x the calls on the same task. Sometimes it works; sometimes it's better to have a separate model. One good case for two calls is the "code merging" pattern: you chat with a model, giving it a source file plus some instruction, and when it replies with something like "// unchanged code here ... some new code ... // the rest stays the same", you use a code-merging model to apply the changes. That's been made somewhat obsolete, though, by the newer "agentic" capabilities where models learn to diff files directly.
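A minimal sketch of that two-pass flow, assuming the OpenAI Python client; the model names, prompts, and file are placeholders, not a specific code-merging product:

```python
from openai import OpenAI

client = OpenAI()

def chat(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

source = open("app.py").read()

# First pass: a strong model proposes the edit, often with lazy
# "// unchanged code here" style elisions.
edit = chat("gpt-4o", f"Add input validation to this file:\n\n{source}")

# Second pass: a small, cheap model merges the elided edit back into
# the full file.
merged = chat(
    "gpt-4o-mini",
    "Apply this edit to the original file and output only the complete "
    f"merged file.\n\nORIGINAL:\n{source}\n\nEDIT:\n{edit}",
)
print(merged)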
Haiku is my favorite model for the second pass. It's small, cheap, and usually gets it right. If I see hallucinations, they're mostly from the base model in the first pass.
Depending on the task, you can often get it done in a single request on average. Ask for the output in Markdown with the reasoning up front and the structured output in a code block at the end, then extract and parse that bit in code.
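A minimal sketch of the extract-and-parse step, assuming the model was told to finish with a json code block; the regex, function name, and sample reply are illustrative, not from any particular library:

```python
import json
import re

FENCE = "`" * 3  # literal triple backtick, built this way for readability

def parse_structured_tail(markdown: str) -> dict:
    """Grab the last fenced code block in a Markdown reply and parse it as JSON."""
    pattern = FENCE + r"(?:json)?\s*\n(.*?)" + FENCE
    blocks = re.findall(pattern, markdown, re.DOTALL)
    if not blocks:
        raise ValueError("no fenced code block found; worth retrying the request")
    return json.loads(blocks[-1])

reply = (
    "First I checked the schema and picked the join keys...\n\n"
    + FENCE + "json\n"
    + '{"table": "orders", "columns": ["id", "total"]}\n'
    + FENCE
)
print(parse_structured_tail(reply))
# -> {'table': 'orders', 'columns': ['id', 'total']}
```

If parsing fails, you retry; that's what pushes the average slightly above one request per task.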
After endlessly tweaking the SQL generators[1] that I am working on, I would recommend adding a "reasoning" string to the output to trigger step-by-step thinking and better responses. Even better if you can add reasoning fields tailored to the specific task you are trying to solve.
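For example, a sketch of what that could look like for a text-to-SQL task, assuming JSON output; the field names are illustrative. Many constrained decoders emit properties in schema order (and even unconstrained models tend to follow the order shown in the prompt), so reasoning fields placed before the answer get generated first and condition the final query:

```python
# Illustrative output schema for a text-to-SQL task. The model generates
# tokens left to right, so the scratchpad fields before "sql" are filled
# in first and shape the query that follows.
SQL_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        # Generic step-by-step scratchpad.
        "reasoning": {"type": "string"},
        # Task-specific reasoning fields tend to help more.
        "relevant_tables": {"type": "array", "items": {"type": "string"}},
        "join_plan": {"type": "string"},
        # The actual answer comes last.
        "sql": {"type": "string"},
    },
    "required": ["reasoning", "relevant_tables", "join_plan", "sql"],
    "additionalProperties": False,
}
```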
Depends on your use case. Post-processing can save headaches when soft constraints are fine or you want maximum flexibility, but you risk subtle errors slipping by. For API responses or anything that gets parsed downstream, I still trust grammar-constrained generation more: it surfaces problems earlier.
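To make that concrete, here's a sketch of grammar-constrained generation using the OpenAI structured-outputs option; the model name and schema are placeholders, and other providers expose similar schema/grammar knobs:

```python
from openai import OpenAI

client = OpenAI()

schema = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["ok", "error"]},
        "message": {"type": "string"},
    },
    "required": ["status", "message"],
    "additionalProperties": False,  # required by strict mode
}

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Did the deploy succeed? Reply as an API response."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "api_response", "strict": True, "schema": schema},
    },
)
# The reply is constrained to parse and match the schema, so an impossible
# constraint or a bad schema fails loudly here instead of downstream.
print(resp.choices[0].message.content)
```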