I can provide insight here...
A typical archive might touch 50+ services, and the export has to call each service's data-export API in turn. If any one of those services is down, the whole archive is delayed.
Internally, each service has to go retrieve all the data. All the data. That's typically a very expensive operation: a datastore for a document editor might be designed for an average user who stores 100k documents but accesses only 10 per day. There's a good chance the data is sharded per user, which means the work of retrieving all of it falls on just one machine/storage server/application server/rendering server/whatever. That server still has other users to serve too, so the export can't hammer it flat out with your request.
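To see why per-user sharding concentrates the load, here's a minimal sketch of a hypothetical sharding scheme (the function name and shard count are made up for illustration). Because the shard is chosen by hashing the user ID rather than the object ID, every object a user owns lands on the same shard, so a full export hits exactly one storage server:

```python
import hashlib

def shard_for(user_id: str, num_shards: int = 64) -> int:
    """Per-user sharding: hash the user ID, not the object ID,
    so all of one user's data maps to a single shard."""
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return h % num_shards

# No matter how many objects this user owns, they all resolve
# to the same shard, so exporting them all hammers one server.
shards = {shard_for("alice@example.com") for _ in range(1000)}
print(len(shards))  # 1
```

Normal usage (read a handful of recent documents) is cheap under this layout; it's the "read everything" access pattern of an export that breaks the assumption.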
Many types of data, once old, get archived to hard disk, since the chance of a user accessing an email attachment from 2009 is very, very low. When building a mail archive, however, all those old attachments need to be read, and remember, there's a good chance they're sharded by user and therefore all sit on a small set of disks.
Remember, most of these applications were designed before bulk data export was a thing, so there's typically no API to read all the data at once; instead the export has to be implemented as "list all objects, then retrieve them one (or a few) at a time."
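That list-then-fetch pattern, plus the throttling needed to keep the user's shard serving live traffic, looks roughly like this sketch (the `client` object with `list_objects`/`get_object` calls is hypothetical; real service APIs differ, but the shape is the same):

```python
import time

def export_user_data(client, user_id, batch_size=100, delay_s=0.1):
    """Naive export loop: list all object IDs up front, then
    fetch them a few at a time, pausing between batches so the
    user's shard can keep serving its normal traffic."""
    object_ids = client.list_objects(user_id)  # one cheap listing call
    exported = []
    for i in range(0, len(object_ids), batch_size):
        for obj_id in object_ids[i:i + batch_size]:
            exported.append(client.get_object(user_id, obj_id))
        time.sleep(delay_s)  # deliberate back-off, not a bug
    return exported
```

Multiply that deliberate back-off across 50+ services and millions of objects and the hours add up fast.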
If a disk seek on a 7200 rpm disk takes ~10 milliseconds, and you have 1 million mail attachments to retrieve in random order, that's nearly 3 hours of pure seek time, assuming no other load on that disk cluster.
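The arithmetic behind that figure, worked out explicitly:

```python
seek_time_s = 0.010      # ~10 ms average seek on a 7200 rpm disk
attachments = 1_000_000  # objects fetched in effectively random order

total_s = attachments * seek_time_s      # 10,000 seconds of seeking
hours = total_s / 3600
print(f"{hours:.2f} hours")              # ≈ 2.78 hours, seeks alone
```

And that's just positioning the disk head; it doesn't count actually transferring the data, or sharing the spindles with live users.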