On that note, I recently launched a new domain search tool called Lean Domain Search [1] which makes finding available .com's infinitely easier than it's ever been. It pairs your search term with 2,500 other keywords commonly found in domain names and instantly shows you which are still available, returning on average 1,200 available domain names per search.
Given the abundance of great .com's still out there, there is no good reason not to use a .com for your site over any of the other TLDs, especially since, as the author points out, for most normal people websites === .com.
If you have any questions, please contact matt@leandomainsearch.com or say hi @mhmazur."
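For anyone curious what "pairing a search term with a keyword list" could look like, here is a minimal sketch. It is not Lean Domain Search's actual code: the function name, the prefix/suffix pairing rule, and the four sample keywords are all assumptions for illustration, and the availability check is left out entirely.

```python
def candidate_domains(term, keywords, tld=".com"):
    """Pair a search term with each keyword, once as a prefix and once as a suffix."""
    # Collapse whitespace so a two-word search like "power tube" becomes "powertube".
    term = "".join(term.lower().split())
    seen = set()
    candidates = []
    for keyword in keywords:
        keyword = keyword.lower()
        for name in (keyword + term, term + keyword):
            if name not in seen:  # skip duplicates while keeping order
                seen.add(name)
                candidates.append(name + tld)
    return candidates


# Hypothetical keyword sample; the real tool is described as using about 2,500.
SAMPLE_KEYWORDS = ["get", "my", "hub", "lab"]

if __name__ == "__main__":
    for domain in candidate_domains("power tube", SAMPLE_KEYWORDS):
        print(domain)
```

Each candidate would still have to be checked against a registry source (for example, a zone file) to see whether it is taken; that step is deliberately omitted here.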
I tried two words: "power tube"
We also have quite a number of algorithms (we used the whole of Wikipedia's n-grams, among others).
Thank you to the numerous people who took the time to email me and correct me about a definition. In this article, I refer to the entire root of a domain name, e.g. Amazon.com, as the TLD. I made a mistake: it is just the .com component of this name that is the TLD. I hope this error didn’t mask the enjoyment of the article for you. I appreciate all the feedback I receive.
Might give you some ideas: http://blog.hotnamelist.com/2009/02/are-all-good-com-domains...
Your graphs on the frequency of each domain name length are misleading, as it is too easy to interpret them to mean that the most popular lengths are 10 or 11 characters, when in fact shorter names are more popular but in limited supply, since shorter means fewer combinations. You discuss saturation later, but it would be more informative to combine the two pieces of information. For example, on the same graph you could plot a ceiling line representing the total available combinations for each length. It would be obvious that the frequency bars for lengths less than 10 are shorter only because they're bumping up against the ceiling.
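The ceiling mentioned here can be worked out directly. A rough sketch, assuming the usual letter-digit-hyphen rule for a domain label (a-z, 0-9, and hyphens that cannot be the first or last character) and ignoring reserved names and internationalized domains; the function names are made up for illustration, and the registered counts would have to come from the zone file data:

```python
ALLOWED_EDGE = 36    # a-z and 0-9 may start or end a label
ALLOWED_MIDDLE = 37  # interior positions may additionally be a hyphen


def max_names(length):
    """Upper bound on distinct labels of exactly `length` characters."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return ALLOWED_EDGE
    # First and last characters exclude the hyphen; the rest allow it.
    return ALLOWED_EDGE * ALLOWED_MIDDLE ** (length - 2) * ALLOWED_EDGE


def saturation(registered, length):
    """Fraction of the possible names of this length that are registered."""
    return registered / max_names(length)


if __name__ == "__main__":
    for length in range(1, 7):
        print(length, max_names(length))
```

The ceiling grows by a factor of 37 per extra character, so on a linear axis it leaves the plot almost immediately; a log scale, or plotting saturation as a percentage per length, would be needed to show both on one graph.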
Secondly, a lot of experts disagree with you on the importance of having a .com domain name, and many successful sites are on different domains.
Thirdly, what actual utility do the couplet/triplet and start/end character data and graphs provide?
If I were looking for an expert to select a domain name, I would choose someone who understands what matters, not someone buried in inconsequential minutia.
The graphs aren't misleading (IMHO); they show the distribution of lengths. However you slice it, if you have to type the domain name, you have to push that number of keys. It's far more important to know that than the ratio of the length normalized by the possible combinations of characters at that length. (I did try looking at that as well, but the graphs were almost meaningless: even at log scales, the increase in the number of combinations of characters dwarfs the number of names, and once you get past 10 characters, the percentages become so small that comparing them is meaningless.)
"a lot of experts disagree with you on the importance of having a .com domain name". Well, yes, they can! So go with the wisdom of the masses. It's a free market, and as a company you can select your own domain suffix. Yet most go for .COM because, right or wrong, that's what people expect. (I will agree that, in the end, it's not as important: if you read my article, I state that many people now find web sites by typing keywords into search engines, so the exact domain matters less.) Ask yourself this question, though: if you are company XYZZY, and there is a XYZZY.COM domain out there that isn't yours, would you be just as happy with XYZZY.NET and not worry about it (listening to those experts)? I think not. You want to preserve your brand, avoid confusion, and make it as easy as possible for the masses to find your site (the non-experts who make up the majority of your consumers). It's the tastes of the fish, not the tastes of the fisherman, after all :)
This article was written for fun. The couplet/triplet was generated out of interest and to see common combinations of letters. I find it fascinating, and I'll be happy to explain some business utility of it if you want to send me a personal email.
I'm not trying to sell my services as an expert domain name seller; it's not what I do. I make a living as someone who mines data and helps find 'inconsequential minutia' in data to leverage (when dealing with hundreds of millions of users, moving the needle just a fraction of a percent can make a difference to the bottom line).
But anyway, the article was created as a trivia/fun article. I'm sorry you don't find it interesting/relevant. (Though again, wisdom of the crowd: since someone posted it here this morning, my inbox/twitter has been alive with comments/retweets about how fun and interesting an article it is - to date, it's been one of the most promiscuous articles I've written.)
That may be part of it, but the author doesn’t recognize at all the likelihood that the letter I is used more frequently because of Apple’s product naming influence, imitation by other companies prepending the letter to their products and services, and the fact that ‘I’ is a strong, powerful pronoun.
Far more important, for instance, are substrings like "FREE" (which can apply to all things, not just computer-related ones, and which has a couple of "E"s) or anything that matches "%ING%" (a very common letter combination in the English language).
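Both kinds of count that come up in this thread, couplets/triplets and "%ING%"-style substring matches, are easy to reproduce on any list of names. A small sketch, not the article's actual analysis: the helper names are made up, and the sample list mixes names mentioned in this thread with one invented label ("freehosting") purely for illustration.

```python
from collections import Counter


def ngram_counts(names, n):
    """Count every run of n consecutive characters across all names."""
    counts = Counter()
    for name in names:
        name = name.lower()
        for i in range(len(name) - n + 1):
            counts[name[i:i + n]] += 1
    return counts


def names_containing(names, substring):
    """Case-insensitive equivalent of a SQL LIKE '%substring%' filter."""
    substring = substring.lower()
    return [name for name in names if substring in name.lower()]


# Illustrative sample only; a real run would read names from the zone file.
SAMPLE = ["iskin", "ilounge", "ipodresq", "amazon", "freehosting"]

if __name__ == "__main__":
    print(ngram_counts(SAMPLE, 2).most_common(5))   # most frequent couplets
    print(names_containing(SAMPLE, "ING"))          # names matching %ING%
    print(Counter(name[0] for name in SAMPLE))      # first-character counts
```

The same `Counter` approach covers the start/end character charts: count `name[0]` or `name[-1]` instead of sliding a window over the name.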
Also, I have to disagree with you; there are thousands of companies that have capitalized on the Apple product ecosystem (iSkin, iLounge, iPodResQ, etc.) and on the common abbreviation of “Internet” to ‘i’. I would say there are many more prefixes with ‘i’ than ‘e’ or any other letter.
I accidentally discovered that any chimp could sign up to receive the database, did that basic analysis, and have watched as it rinses and repeats every six months or so.
Such a glaring ignorance makes it hard to trust the author's domain expertise. Pun intended. Reading the rest of the article proved my instincts.
I simply processed the file provided by the good folks at Verisign, and used however they classified things.
Anyone know of such a database that is also public?