I’ve made an example app for a Flutter plugin I created that can do this.
Open-source, runs natively on all major platforms. I shared videos showing it on my iPad Mini, Pixel 7, iPhone 12, Surface Pro (Win 10 & Ubuntu Jellyfish) and Macs (Intel & M archs).
By all means, it’s not a finished app. I simply wanted to use on-device AI stuff in Flutter so I started with porting over llama.cpp, and later on I’ll tinker with porting over whatever is the state of the art (whisper.cpp, bark.cpp etc).
App is compatible with any GGUF files, but it must be in the ChatML prompt format otherwise the chat UI/bubbles probably gets funky. I haven’t made it customizable yet, after all - it’s just an example app of the plugin. But I am actively working on it to nail my vision.
You can use it commercially but there are some restrictions, including some of a competitive nature, like using the output to train new LLMs. This is the restriction that Bytedance (Tiktok) was recently banned for violating.