They could do that just as easily without federation. In fact, they already do: they've been found to create "shadow profiles" for people who aren't even Facebook or Instagram users, built from the tracking cookies they've gotten lots of sites to add, and they plug all that info in if the person later joins Facebook or Instagram.
I can't think of anything they can find out with federation turned on that they can't find out with it turned off. Even if there were some info that could only be obtained from inside the federation (I can't think of any, but I might be wrong), that's easy enough to get: just create some small instances that don't identify as Meta or Threads, and have the users of those instances follow people on all the large instances.