I implemented CLIP inference in plain C/C++ with no extra dependencies, thanks to the great work in GGML, the tensor library that powers llama.cpp. It works with models from both OpenAI and LAION. It also supports 4-bit quantization for extremely constrained devices: a 4-bit CLIP-base model is only ~85 MB!
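The ~85 MB figure checks out with a quick back-of-envelope calculation, assuming GGML's q4_0 layout (blocks of 32 weights stored as a 2-byte fp16 scale plus 16 bytes of packed 4-bit quants, i.e. 4.5 bits per weight) and the commonly cited ~151M parameter count for CLIP ViT-B/32:

```python
# Rough size estimate for a 4-bit quantized CLIP-base model.
# Assumption: GGML q4_0 blocks of 32 weights = 2-byte fp16 scale
# + 16 bytes of 4-bit quants = 18 bytes per block (4.5 bits/weight).
# Assumption: CLIP ViT-B/32 has roughly 151M parameters total.

params = 151_000_000
weights_per_block = 32
bytes_per_block = 2 + 16  # fp16 scale + 32 packed 4-bit values

size_bytes = params / weights_per_block * bytes_per_block
print(f"~{size_bytes / 1e6:.0f} MB")  # prints "~85 MB"
```

This ignores the small fp32 tensors (norms, biases) that typically stay unquantized, so the real file is slightly larger than the raw estimate.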
I'm also happy to hear that a company is considering adopting it for a project, only a few hours after I announced the repo.
Check it out, feel free to reach out with any suggestions, and give it a star if you find it useful!