As far as I can tell, there are a lot of people out of their league going around with the title "data scientist".
This is not a sample. This is a census at this point in time. The fact that there will be another population tomorrow does not change the fact that you have the entire population of all words spoken by all characters up to today.
I am not a statistician. I am an economist who knows enough about statistics and econometrics to know when a significance test is applicable.
Also, note that R's CSV parsing is going to misattribute some characters' speech to others. GIGO speaks loudly.