If what a government receptionist says is copyright-free, you still can't walk into their office thousands of times per day and ask various questions to learn what human answers are like in order to train your artificial neural network
The amount of scraping that happened in ~2020 as compared to 2024 is orders of magnitude different. Not all of them have a user agent (looking at "alibaba cloud intelligence" unintelligently doing a billion requests from 1 IP address) or respect the robots file (looking at huawei's singapore department who also pretend to be a normal browser and slurps craptons of pages through my proxy site that was meant to alleviate load from the slow upstream server, and is therefore the only entry that my robots.txt denies)