A difficult, but not intractable problem: OLMoTrace claims to be able to trace from output to training data in seconds [1]. Notably, it can do this because OLMo itself was intentionally designed to be open and transparent [2]; it was trained on 4.6 trillion tokens of entirely open data (which you can download yourself) [3]. There's nothing stopping Meta or OpenAI from creating a similar tool, other than the obvious detail of that showing their exact training data.
[1] https://arxiv.org/abs/2504.07096
[2] https://allenai.org/blog/olmotrace
[3] https://huggingface.co/datasets/allenai/olmo-mix-1124