On the other hand, GTR-XXL is an example of a research model that biases in favor of search relevance, at the expense of latency. It's not really practical to deploy in production environments as a result.
So what is the size of your model than, or did i miss something?