A while ago I built llama3.java. This is the follow-up: Gemma 4 running entirely on the JVM.
No Python. No JNI. No native code. Just Java.
It’s (mostly) a single Java file implementing the full stack:
GGUF parsing, tokenization, Gemma 4 transformer inference, quantizations, CLI...
Built using the Java Vector API, with support for GraalVM Native Image.
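For a sense of what "GGUF parsing" involves in plain Java, here is a minimal sketch of reading a GGUF file header and one metadata string, assuming the GGUF v3 layout published in the ggml repository. Every value is little-endian: the 4-byte magic "GGUF", a uint32 version, a uint64 tensor count, a uint64 metadata key/value count, then key/value pairs whose strings are a uint64 length followed by UTF-8 bytes. This is not the project's actual parser. The class and method names (`GgufHeader`, `readHeader`, `readString`, `sampleFile`) are hypothetical, and the sample bytes are synthetic.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a GGUF header reader (not the project's code).
public class GgufHeader {
    // The ASCII bytes "GGUF" read as a little-endian uint32.
    static final int GGUF_MAGIC = 0x46554747;
    // Value-type tag for strings in the GGUF metadata type enum.
    static final int TYPE_STRING = 8;

    record Header(int version, long tensorCount, long metadataKvCount) {}

    // Reads the fixed-size header: magic, version, tensor count, KV count.
    static Header readHeader(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != GGUF_MAGIC) {
            throw new IllegalArgumentException("not a GGUF file");
        }
        int version = buf.getInt();
        long tensorCount = buf.getLong();
        long kvCount = buf.getLong();
        return new Header(version, tensorCount, kvCount);
    }

    // GGUF strings: uint64 byte length, then UTF-8 bytes with no terminator.
    static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[Math.toIntExact(buf.getLong())];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static void putString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putLong(bytes.length);
        buf.put(bytes);
    }

    // Builds a tiny synthetic file: header plus one string-valued KV pair.
    static ByteBuffer sampleFile() {
        ByteBuffer buf = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("GGUF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(3);      // illustrative version number
        buf.putLong(0L);    // tensor count
        buf.putLong(1L);    // metadata key/value count
        putString(buf, "general.name");
        buf.putInt(TYPE_STRING);
        putString(buf, "demo");
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = sampleFile();
        Header h = readHeader(buf);
        String key = readString(buf);
        int type = buf.getInt();
        String value = type == TYPE_STRING ? readString(buf) : "<non-string>";
        System.out.println(h + " " + key + "=" + value);
    }
}
```

A real loader would go on to read every typed metadata value (including arrays), then the tensor info records and the aligned tensor data. Memory-mapping the file and reading it through a `ByteBuffer` like this keeps everything in pure Java, with no native code.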