The pattern is to put the work in a queue, respond to the user immediately, then process in the background, outside the request cycle.
Regarding auto scaling, it is scaling your worker servers to work down the queues, but it is not as urgent/critical as auto scaling your app servers if they had to handle the load.