Doing this does complicate decisions about releasing subsequent model updates, because the production model can no longer be compared directly against new iterations. Instead, a pre-production model that has not seen the test set would need to serve as the baseline. However, if data drift is likely, re-using the old test set wouldn't be useful anyway.
More complications arise when users expect things that previously worked one way to keep working that way. Users don't really care that their traffic was in the test set. In an even more extreme case, many industrial problems show a high correlation between today's traffic and next week's traffic. The optimal solution in such a situation would be to fully memorize today's traffic and reuse it next week. In many cases, an overfit model can effectively perform this memorization task with fewer parameters and less infrastructure than an actual dictionary lookup.
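For comparison, the explicit "dictionary lookup" alternative mentioned above could look roughly like this. It is a minimal sketch, assuming traffic arrives as (query, response) pairs; `build_lookup`, `serve`, and the `fallback` model are hypothetical names used only for illustration.

```python
from collections import Counter, defaultdict


def build_lookup(todays_traffic):
    """Memorize the most frequent response seen for each query today (hypothetical helper)."""
    counts = defaultdict(Counter)
    for query, response in todays_traffic:
        counts[query][response] += 1
    # Keep only the single most common response per query.
    return {query: c.most_common(1)[0][0] for query, c in counts.items()}


def serve(lookup, query, fallback):
    """Answer next week's query from memory; defer to a fallback model for unseen queries."""
    if query in lookup:
        return lookup[query]
    return fallback(query)
```

The table grows with every distinct query it has to remember, which is the infrastructure cost an overfit model can sometimes compress into a fixed set of weights.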
I'm simplifying here, but you can think of epochs as "how many times do we train over the entire dataset? 1 time? 10 times?"
Correspondingly, you can think of dataset size as "how many Wikipedia pages do we include in the dataset? 1 million? 10 million?"
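To make the two knobs concrete, here is a minimal sketch using plain stochastic gradient descent to fit y ≈ w·x (a toy model chosen only for illustration). The outer loop is the epoch count and the inner loop walks the dataset, so the number of update steps is epochs × dataset size.

```python
def train(dataset, epochs, lr=0.1):
    """Fit y ~ w * x with plain SGD; one epoch is one full pass over the dataset."""
    w = 0.0
    steps = 0
    for _ in range(epochs):           # "how many times do we go over the data?"
        for x, y in dataset:          # "how much data is there to go over?"
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
            w -= lr * grad
            steps += 1
    return w, steps
```

Doubling either the epochs or the dataset doubles the number of updates, but only a bigger dataset gives the model new examples; more epochs just show it the same examples again.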
Now let's think about overfitting.
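When you increase the number of epochs, the model becomes more likely to overfit your data, because it sees the same examples again and again and can start memorizing them.
When you increase the dataset size, the model becomes less likely to overfit your data, because there are more distinct examples to fit and memorizing any single one of them helps less.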
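Because more epochs raise the risk of overfitting, a common guard (not something prescribed above) is early stopping: after each epoch, measure the loss on held-out validation data and stop once it hasn't improved for a few epochs. A minimal sketch, assuming hypothetical callables `train_one_epoch` and `validation_loss` supplied by your own training code:

```python
def train_with_early_stopping(train_one_epoch, validation_loss, max_epochs, patience=2):
    """Run up to max_epochs, stopping after `patience` epochs without validation improvement.

    train_one_epoch and validation_loss are hypothetical callables from your own code.
    Returns (best_epoch, epochs_run). A real setup would also checkpoint the weights
    at best_epoch and restore them afterwards.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    epochs_run = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        epochs_run += 1
        loss = validation_loss()
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss stopped improving: likely starting to overfit
    return best_epoch, epochs_run
```

With made-up validation losses of 1.0, 0.8, 0.7, 0.75, 0.9 and patience 2, training stops after the fifth epoch and keeps epoch 2 (0-based). A larger patience tolerates noisier validation curves at the cost of extra epochs.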