Wikipedia handles “New York City” and “NYC” as intended. “NY” and “New York” are ambiguous to both machines and humans (are you referring to the city or the state?). If you have a resolution strategy for that, Wikipedia gives you the candidates to choose between. I’ve never seen “NYCity” used by anybody.
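A rough sketch of what I mean. The two tables are made up by hand for illustration and stand in for redirect and disambiguation data; they are not pulled from Wikipedia, and `resolve` and `pick` are hypothetical names:

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for redirects: unambiguous aliases map to one entity.
REDIRECTS = {
    "new york city": "New York City",
    "nyc": "New York City",
}

# Hypothetical stand-in for disambiguation pages: one mention, several candidates.
DISAMBIGUATION = {
    "ny": ["New York City", "New York (state)"],
    "new york": ["New York City", "New York (state)"],
}


def resolve(
    mention: str,
    pick: Optional[Callable[[List[str]], str]] = None,
) -> Optional[str]:
    """Resolve a mention to an entity name, or return None if it cannot be resolved."""
    # Normalize case and whitespace before lookup.
    key = " ".join(mention.lower().split())
    if key in REDIRECTS:
        return REDIRECTS[key]
    candidates = DISAMBIGUATION.get(key)
    if candidates is None:
        return None  # unknown mention, e.g. "NYCity"
    if pick is None:
        return None  # ambiguous, and no resolution strategy was supplied
    return pick(candidates)
```

With these tables, `resolve("NYC")` gives the city directly, `resolve("NY")` gives nothing until you pass a `pick` strategy (context, priors, whatever you have), and `resolve("NYCity")` gives nothing at all.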
If you start processing web articles at the scale of millions, you'll be surprised by how creative people can be. I'm not talking about tweets, just news and blog articles.
Not surprised, just not relevant. The criterion here is “you can get pretty good results”, not “you must be able to process millions of articles without failure”.