Can you? I imagine Google, for example, trains its models on material that isn't available to the public (uncensored Google Books, etc.). Also, chatbots like Gemini aren't pure LLMs anymore; they also use other tools as part of their computation. I've asked Gemini computationally heavy questions, and it successfully invoked Python scripts to answer them. I imagine it can also use tools other than Python, some of which might not even be publicly known.
I'm not sure what the situation is currently, but I can easily see private data and private resources leading to much better AI tools that open-source solutions cannot match.