I happen to be working on a toy machine learning project that, based on the fictional characters known by someone, predicts their approximate age. Your list is the first organic validation set that happened onto my machine!
When Friends or Firefly were on, for example, I was watching a lot of TV (like you) because at that stage of my life I was settling into a long-term relationship (early twenties) and we found those to be mutually enjoyable things to watch. Movie watching tends to fall off around the time people start having kids. So far, on my friends-and-family (kids and grandparents alike) polls, it's pretty accurate.
Thanks again for answering :) I may just post my quick and dirty hack when I'm through playing with it.
It guessed early 70s.
It's written to guess early/mid/late subsets of ten-year ranges, and actually does have some data for kids/teens. Mostly it had fictional characters from movies, books and television, but some singers snuck in there when I had my family fill out Excel worksheets, heh!
Any advice on how I could collect this kind of data? A survey on ask HN? Mechanical Turk?
EDIT: the only features it uses are the fictional names that one can rattle off in one sitting, with the submitter's specific age as the label. No gender or other demographic information. So that's what I would collect: just a list of fictional names a submitter can think of in one sitting, and their age. It's basically a form of the supervised topic classification training seen in other ML tutorials, but using the age as the topic label. I'm experimenting with enriching the data afterwards with media type (book/movie/TV flags) to see if that feature improves its performance, but I'm teaching a class this week and don't have any spare time to work on it.
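For anyone curious what "names in, age bucket out" could look like, here is a minimal sketch. It is not the actual hack: it assumes a plain naive Bayes classifier with add-one smoothing, and the function names, the bucket labels and the four training rows are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def normalize(name):
    """Lower-case and trim a name so 'Ross Geller ' and 'ross geller' match."""
    return name.strip().lower()


def train(samples):
    """Build naive Bayes counts from (list_of_names, age_bucket) pairs."""
    bucket_counts = Counter()           # how many submissions per age bucket
    name_counts = defaultdict(Counter)  # per bucket: how often each name appears
    vocabulary = set()
    for names, bucket in samples:
        bucket_counts[bucket] += 1
        for name in {normalize(n) for n in names}:
            name_counts[bucket][name] += 1
            vocabulary.add(name)
    return bucket_counts, name_counts, vocabulary


def predict(model, names):
    """Return the age bucket with the highest log-probability for these names."""
    bucket_counts, name_counts, vocabulary = model
    total = sum(bucket_counts.values())
    # Names never seen in training carry no information, so ignore them.
    known = {normalize(n) for n in names} & vocabulary
    best_bucket, best_score = None, -math.inf
    for bucket, count in bucket_counts.items():
        score = math.log(count / total)  # prior for this bucket
        denom = sum(name_counts[bucket].values()) + len(vocabulary)
        for name in known:
            # Add-one (Laplace) smoothing so an unseen name is not a hard zero.
            score += math.log((name_counts[bucket][name] + 1) / denom)
        if score > best_score:
            best_bucket, best_score = bucket, score
    return best_bucket


# Invented training rows, purely to show the shape of the data.
samples = [
    (["Malcolm Reynolds", "Ross Geller", "Frodo Baggins"], "early 30s"),
    (["Ross Geller", "Rachel Green", "Malcolm Reynolds"], "early 30s"),
    (["Archie Bunker", "Hawkeye Pierce", "Perry Mason"], "early 70s"),
    (["Perry Mason", "Archie Bunker", "Lucy Ricardo"], "early 70s"),
]
model = train(samples)
guess = predict(model, ["Hawkeye Pierce", "Perry Mason"])
```

With only the names and the age as inputs, collecting data really is just a two-column form: a free-text list and a number.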
Here goes, and I'm not trying to make fun of anyone, and I don't believe it's true. Personally, I believe the millennial generation might be the most knowledgeable in history.
That said, I've heard some of you don't know what a 45 record is, or how a rotary telephone works?
Some people claim not to know certain dated things because it's fashionable. I once heard a young Republican claim to know nothing about the GOP's past. The guy sitting next to her said, "I didn't live through the French Revolution, but I know what happened." That ended the conversation.
So what's the truth?
I was barred from calling my girlfriend at night in my room, and my phone was replaced with one that was broken. My parents wanted me to be able to answer the phone, but the keypad didn't work, so I couldn't dial out.
What I could do, however, was rapidly toggle the receiver hook. Turns out the pulse frequency tolerance had a pretty wide range, and I became quite skilled at quickly tapping out her number. That, and the Nintendo gamer hotline. Priorities!
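For the curious, the tapping maps onto pulse dialing roughly like this. A small sketch, assuming the common convention where digit n is n hook taps and 0 is ten taps (some countries used a different mapping); the function names and the phone number are made up.

```python
def pulses_for_digit(digit):
    """Number of hook taps for one digit: 1-9 map to themselves, 0 maps to ten."""
    if digit not in "0123456789" or len(digit) != 1:
        raise ValueError(f"not a single decimal digit: {digit!r}")
    value = int(digit)
    return 10 if value == 0 else value


def hook_taps(number):
    """Tap counts for each digit of a phone number, skipping dashes and spaces."""
    return [pulses_for_digit(ch) for ch in number if ch in "0123456789"]


# Placeholder number, just to show the output shape.
taps = hook_taps("555-0199")
```

You tap each group in quick succession and leave a clear pause between digits so the exchange can tell them apart.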
This list made me realize a sort-of annoying (sometimes) tendency I seem to have developed. It appears that my first reaction to cool things is now not wonder but 'I need to engineer the shit out of it'. My first thought after looking at the list was not 'wow, cool', but more of 'so if I use Named Entity Recognition and a large corpus, I could have tens of thousands of such names in hours. Maybe I can catch up on the computational linguistics literature on the issue, and even identify the relative importance of characters in the text. Should be a day-long project'. Need to learn to enjoy things for what they are, sigh.
My approach is in part to learn to accept that it's what I'm like, it's something that fortunately helps put bread on the table, and fighting it too much seems pointless. And the other part consists of forcibly turning it off at times, through meditation or other activities, and seeing if it actually benefits me or if I'm just trying to be something I'm not.
So far I lean towards 'accepting who I am' with the occasional and very necessary break. It's only when I become too 'meta' about this process itself that I get truly unhappy (trying to engineer my periods of non-engineering, and then to force myself to not engineer this process, and so on).
In fact, it's all the 'meta' stuff in general that seems to be a bigger problem than any of my natural urges. But I digress...
See what I mean? ; )
Yes, whenever I read a book or watched a movie, I added some of the characters to the list.
> It appears that my first reaction to cool things is now not wonder but 'I need to engineer the shit out of it'.
Imagination and creativity. Nothing wrong with that. And nobody can blame you if you haven't the time to realize the idea.
If the goal is testing, an improvement would be to add some internationalization. There are no non-English characters in there. You want to be sure that your first foreign user doesn't break your program.
Actually, maybe it would be a nice project to accept pull requests from around the world and create a standard international data set.
The goal is rather to be using some fun names during development. For example, when I work on our appointment scheduling software, I often need to walk through the scheduling and registration process in order to check the user experience. Instead of using the tired old John Smith123 and john123@example.com I can use some names that I fondly remember from a book or movie.
I'm not "book" cultured, so a lot of names I don't recognize, but nice shout outs to 30 Rock and Anchorman :p
GitHub complains that it's not a properly formatted CSV file. Maybe consider a TSV? It'd probably still complain.
I've yet to use it, but it's been in my back pocket for when I need it. This PHP package looks nice if you need more than just names: https://github.com/fzaninotto/Faker
Yes, I need that for a simple script that generates email addresses and links.
When I'm writing database fixtures for use in tests, I like to manually choose names from movies/tv-shows for related entities.
For example, for an Account with multiple Users I will pick Phil Dunphy for the owner role, Claire Dunphy for the admin role and Luke/Haley/Alex Dunphy for regular user roles.
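As a rough illustration of that fixture style, here is a sketch in Python. The `Account` and `User` classes, the role strings and the `dunphy_account` helper are hypothetical stand-ins, not anyone's real schema.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    role: str  # hypothetical role names: "owner", "admin", "user"


@dataclass
class Account:
    name: str
    users: list = field(default_factory=list)


def dunphy_account():
    """One account whose users are a single family, one per role type."""
    return Account(
        name="Dunphy household",
        users=[
            User("Phil Dunphy", "owner"),
            User("Claire Dunphy", "admin"),
            User("Luke Dunphy", "user"),
            User("Haley Dunphy", "user"),
            User("Alex Dunphy", "user"),
        ],
    )


account = dunphy_account()
# Group names by role so a test can assert on who may do what.
by_role = {}
for user in account.users:
    by_role.setdefault(user.role, []).append(user.name)
```

The nice side effect is that a failing test reads like "Luke could delete the account", which makes it obvious at a glance which role was involved.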
I knew I couldn't be the only one. And using family members to illustrate different user roles is quite clever.
Since it has been starred and forked on GitHub, I think the answer is yes.
> has that ever been a problem worth mentioning to someone?
It has been to me. When I work I want to write code and not think about what name to use when testing. I have a simple script that creates a link to my applications so that the registration forms get pre-filled.
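Something along these lines, as a sketch only. The URL, the query parameter names (`first_name`, `last_name`, `email`) and the `prefilled_link` helper are assumptions for illustration; a real form would need whatever field names it actually reads.

```python
from urllib.parse import urlencode


def prefilled_link(base_url, first_name, last_name, domain="example.com"):
    """Build a registration URL whose query string pre-fills the form fields."""
    # Derive a throwaway address from the name, e.g. liz.lemon@example.com.
    email = f"{first_name}.{last_name}@{domain}".lower()
    query = urlencode(
        {"first_name": first_name, "last_name": last_name, "email": email}
    )
    return f"{base_url}?{query}"


# Hypothetical app URL; the name is one of the 30 Rock shout-outs.
link = prefilled_link("https://app.example.com/register", "Liz", "Lemon")
```

`urlencode` takes care of escaping, so names with spaces or accents still produce a valid link.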
Optionally: "If you'd like, I'd be happy to do some of the legwork for you there. Would you be open to receiving a pull request?"
We ended up using our customers' first names and it was a disaster. We had all kinds of joke entries put in, like "JERK"... My favorite customer name was "POOP LENGTH". lol. /facepalm.
Anyway, so in this multimillion dollar enterprise application we're showing "POOP" to the whole company.
At least it was an intra-enterprise-only app.