By my math, just the hosting cost for GPT-J’s TPU node puts it at about the same price per token as OpenAI’s Davinci. And that’s assuming 100% utilization for a whole month.
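That comparison can be sketched as a break-even calculation. The numbers below are rough assumptions, not quoted rates (roughly what a TPU v3-8 on-demand and Davinci cost at the time, as I recall); plug in current prices before trusting it:

```python
# Back-of-envelope break-even sketch. All prices are ASSUMPTIONS
# for illustration, not official rates.
TPU_HOURLY_USD = 8.00       # assumed on-demand TPU v3-8 rate
HOURS_PER_MONTH = 730       # 24 hours * ~30.4 days
DAVINCI_USD_PER_1K = 0.06   # assumed Davinci price per 1K tokens

monthly_hosting = TPU_HOURLY_USD * HOURS_PER_MONTH   # hosting cost per month
usd_per_token = DAVINCI_USD_PER_1K / 1000            # API cost per token

# Tokens you'd have to serve per month before the flat TPU bill
# beats just paying OpenAI per token.
break_even_tokens = monthly_hosting / usd_per_token

print(f"monthly hosting: ${monthly_hosting:,.0f}")
print(f"break-even volume: {break_even_tokens:,.0f} tokens/month")
```

With those assumed prices you'd need to push roughly 100M tokens a month through the node just to match the API's per-token price, before any of the extra costs below.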
Add in the cost of a server to manage a REST API and a proxy to forward completion requests to an available node, and you’re definitely going to pay more than you would to OpenAI. Add the personnel cost of maintaining your own GPT-J cluster and writing your own software to handle the completion requests, and you’re paying way, WAY more. All that subsidization from M$ is making its way to the customer.
This is still something we’ll probably look into at work, but it won’t be a cost-saving measure.