Hm? GPT-3 is relatively cheap to run inference on, at least compared to the cost of training. You can actually load all the params onto a single TPU. (A TPU can allocate up to 300GB in its host CPU memory without OOM'ing.)
AI Dungeon is also powered by GPT-3, and it's quite snappy. I'm not sure why GPT-3 is seen as computationally expensive; in practice it seems workable.
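For what it's worth, here is a rough sanity check on the memory side of the single-TPU claim. It is only a sketch: it assumes the published 175B parameter count for the largest GPT-3, takes the 300GB host figure above at face value, and counts raw weights only (no activations or caches). The names `weights_gb` and `footprint_table` are made up for illustration.

```python
# Back-of-the-envelope weight footprint. Counts raw weights only.
GPT3_PARAMS = 175e9   # assumption: published size of the largest GPT-3 model
HOST_RAM_GB = 300     # assumption: the host-memory figure quoted above, unverified

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weights_gb(n_params, bytes_per_param):
    """Size of the raw weights in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def footprint_table(n_params=GPT3_PARAMS, budget_gb=HOST_RAM_GB):
    """Map each dtype to (size in GB, whether it fits in the budget)."""
    table = {}
    for dtype, nbytes in BYTES_PER_PARAM.items():
        gb = weights_gb(n_params, nbytes)
        table[dtype] = (gb, gb <= budget_gb)
    return table


if __name__ == "__main__":
    for dtype, (gb, fits) in footprint_table().items():
        print(f"{dtype:>8}: {gb:5.0f} GB, fits in {HOST_RAM_GB} GB: {fits}")
```

Under those assumptions, whether everything fits on one host depends on precision: 16-bit weights come to about 350GB, so the 300GB budget would only cover the full model at 8 bits per parameter or with a somewhat larger allocation.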