I am also glad that libgen exists. Liberating human knowledge from copyright will improve humanity overall.
But I don’t understand how you can be sure that the big players are using it as a training corpus. An effort of such questionable legality would be a significant investment of resources. Certainly, as computronium gets cheaper and techniques evolve, bringing this within reach of entities that don’t answer to shareholders and investors, it will happen. But what makes you sure that publicly traded companies or OpenAI are training on libgen today?