I also tried to use it with claude code with claude code router and it's pretty fast. Roo code uses bigger contexts, so it's quite slower than claude code in general, but I like the workflow better.
this is my snippet for llama-swap
``` models: "glm45-air": healthCheckTimeout: 300 cmd: | llama.cpp/build/bin/llama-server -hf unsloth/GLM-4.5-Air-GGUF:IQ1_M --split-mode layer --tensor-split 0.48,0.52 --flash-attn on -c 82000 --ubatch-size 512 --cache-type-k q4_1 --cache-type-v q4_1 -ngl 99 --threads -1 --port ${PORT} --host 0.0.0.0 --no-mmap -hfd mradermacher/GLM-4.5-DRAFT-0.6B-v3.0-i1-GGUF:Q6_K -ngld 99 --kv-unified ```