Assuming math works here although I think there's some caveats depending on the model architecture, 1T 4 bit is 465Gi just for the weights so you wouldn't be able to fit kv cache.
It's showing about 8-9 tk/sec which seems quite slow for something like a web search with result aggregate although maybe bareable for smaller context stuff
The thing I've been running into with z.ai hosted GLM-5.2 is the 2024 knowledge cutoff. Anything recent requires web augmentation which is more token intensive so low tk/sec hurts even more than a "smarter" model
It seems (somewhat unsurprisingly) open weight models have older knowledge cutoffs.