Check out Figure 3.13 in the GPT-3 paper, https://arxiv.org/pdf/2005.14165.pdf.
The authors generated news articles of roughly 200 words and tested whether about 80 human judges could tell the human-written articles from the GPT-3-generated ones.
It turns out they could not: the judges' mean accuracy at spotting GPT-3-generated articles was only about 52%, barely better than the 50% you would expect from random guessing. (And no, the machine-generated articles were not cherry-picked for the experiment.)
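
To get a feel for how close 52% is to chance, here is a minimal sketch that measures the gap from 50% in standard-error units, using the normal approximation to the binomial. The function name `chance_z_score` and the judgment counts in the loop are hypothetical, chosen only for illustration; they are not figures from the paper.

```python
import math


def chance_z_score(accuracy: float, n_judgments: int, chance: float = 0.5) -> float:
    """Distance of an observed accuracy from chance, in standard errors.

    Uses the normal approximation to the binomial distribution under the
    null hypothesis that every judgment is a coin flip with P(correct) = chance.
    """
    standard_error = math.sqrt(chance * (1.0 - chance) / n_judgments)
    return (accuracy - chance) / standard_error


# Hypothetical judgment counts, NOT taken from the paper.
for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: z = {chance_z_score(0.52, n):.2f}")
```

The gap itself stays at two percentage points no matter how many judgments are collected; only the certainty that it differs from zero changes. In practical terms, a judge who is right 52% of the time is doing little better than flipping a coin.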