
Last I checked, Ollama inference is based on llama.cpp, so either Ollama has not caught up yet or the answer is no.

EDIT: Looks like Granite 4 hybrid architecture support was added to llama.cpp back in May: https://github.com/ggml-org/llama.cpp/pull/13550



> Last I checked Ollama inference is based on llama.cpp

Yes and no. They've written their own "engine" using GGML libraries directly, but fall back to llama.cpp for models the new engine doesn't yet support.
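For anyone unfamiliar with that split, the idea is basically dispatch-with-fallback: try the native GGML-backed engine first, and hand off to the llama.cpp path when the model's architecture isn't supported yet. A purely illustrative Go sketch (made-up names, not Ollama's actual code):

    // Illustrative only: a dispatch-with-fallback pattern, not Ollama's real API.
    package main

    import (
        "errors"
        "fmt"
    )

    var errUnsupportedArch = errors.New("architecture not supported by new engine")

    // newEngineRun stands in for the GGML-based engine.
    func newEngineRun(arch string) error {
        supported := map[string]bool{"llama": true, "gemma": true}
        if !supported[arch] {
            return errUnsupportedArch
        }
        fmt.Println("running on new GGML engine:", arch)
        return nil
    }

    // llamaCppRun stands in for the llama.cpp fallback path.
    func llamaCppRun(arch string) {
        fmt.Println("falling back to llama.cpp:", arch)
    }

    func main() {
        for _, arch := range []string{"llama", "granitehybrid"} {
            if err := newEngineRun(arch); errors.Is(err, errUnsupportedArch) {
                llamaCppRun(arch)
            }
        }
    }

So whether a given model benefits from the new engine depends on whether its architecture has been ported there yet, not just on what llama.cpp supports.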



