It's pretty easy, let me tell you how!
1. Install llama.cpp.
2. Download the model from Hugging Face (GGUF format).
3. Run the script to start a server with the model.
4. Execute the script with camera capture!
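If you want a feel for what steps 3 and 4 look like from the client side, here's a minimal sketch. It is not the code from my repo. It assumes the llama.cpp example server is already running locally with the BakLLaVA GGUF and its multimodal projector (mmproj) file loaded, and that it exposes the `/completion` endpoint with the `image_data` field, as the server did around the time LLaVA support landed. Newer llama.cpp builds handle images differently, so check your version. The URL, prompt template, sampling values and function names are all my own assumptions. OpenCV (`cv2`) is only needed for the actual camera loop.

```python
import base64
import json
import time
import urllib.request

# ASSUMPTION: local llama.cpp server with a LLaVA-style model + mmproj loaded.
SERVER_URL = "http://127.0.0.1:8080/completion"


def build_prompt(question: str, image_id: int = 10) -> str:
    """LLaVA-style chat prompt; [img-N] marks where image N goes (assumed template)."""
    return f"USER:[img-{image_id}]{question}\nASSISTANT:"


def build_payload(jpeg_bytes: bytes, question: str, image_id: int = 10,
                  n_predict: int = 64) -> dict:
    """JSON body for /completion, sending the frame base64-encoded in image_data."""
    return {
        "prompt": build_prompt(question, image_id),
        "image_data": [{
            "data": base64.b64encode(jpeg_bytes).decode("ascii"),
            "id": image_id,
        }],
        "n_predict": n_predict,   # cap on generated tokens (illustrative value)
        "temperature": 0.1,       # low temperature for steadier descriptions
    }


def ask_server(jpeg_bytes: bytes, question: str) -> str:
    """POST one JPEG frame to the server and return the generated text."""
    body = json.dumps(build_payload(jpeg_bytes, question)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["content"]


def capture_loop(question: str = "Describe what you see.",
                 interval_s: float = 1.0) -> None:
    """Grab webcam frames with OpenCV and print the model's answer for each."""
    import cv2  # imported lazily so the helpers above work without OpenCV

    cam = cv2.VideoCapture(0)  # device 0 = default camera
    try:
        while True:
            ok, frame = cam.read()
            if not ok:
                break  # camera closed or no frame available
            ok, jpeg = cv2.imencode(".jpg", frame)
            if ok:
                print(ask_server(jpeg.tobytes(), question), flush=True)
            time.sleep(interval_s)  # crude throttle between requests
    finally:
        cam.release()
```

Once the server is up, calling `capture_loop()` prints one description per frame. Each request waits for the previous answer, so how "realtime" it feels depends on how fast your machine runs the model.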
The tweet got 90k views in 10 hours and was liked by Georgi Gerganov (the llama.cpp author) and Andrej Karpathy. I got super happy, ahah, my legends liked my tweet :)
I have released a step-by-step guide for it on GitHub.
Let me know what you think! What potential uses does this open up?
https://github.com/Fuzzy-Search/realtime-bakllav
Related links:
Discussion on running llama: https://github.com/ggerganov/llama.cpp/pull/3436
Model: https://huggingface.co/SkunkworksAI/BakLLaVA-1
X: https://twitter.com/Karmedge/status/1720825128177578434