"Captcha" stands for "completely automated
Turing test to tell computers and humans apart."
No it doesn't - where's the word that starts with "P"? According to Wikipedia, "CAPTCHA" is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart" (and is apparently a trademark of Carnegie Mellon University, which I didn't know). Even that's a really bad backronym.
Google's reCAPTCHA page[0] says the same thing, and even attributes it:
The term CAPTCHA (for Completely Automated Public Turing Test
To Tell Computers and Humans Apart) was coined in 2000 by Luis
von Ahn, Manuel Blum, Nicholas Hopper and John Langford of
Carnegie Mellon University.
That should then be CAPTTTTCAHA. Doesn't quite have the same ring to it.

Given the term CAPTCHA, it seems that a "better" (by some definition) backronym would be:
Completely Automated Process to Tell Computers and Humans Apart.
At least with that one, only the particles "to" and "and" are left out of the abbreviation.

More on topic, though: I long for the day when off-the-shelf systems that even script kiddies can use are able to break CAPTCHAs with ease - that's when they'll finally die. TicketMaster gave up on using the standard CAPTCHAs earlier this year[1], and I'd love to see something better replace them.
So long as the "better" solution is something OTHER than "Please log in to your Facebook account."
The article linked to 28 other systems that have claimed or demonstrated the ability to beat captchas at some point: http://www.karlgroves.com/2013/02/09/list-of-resources-break....
Without a performance comparison to existing methodologies on a benchmark dataset and precise details on the models, this is a neat marketing demo and nothing more.
That's a really sad thought.
I have previously told some people that this is the holy grail of dieting apps. The difference in ease of use between entering all of the items in your meal, one by one, and just taking a photo, would be a game changer. Of course, this is just a throwaway example in the article so they probably haven't done any of the work that would be required to make this a reality (aside from the vision processing, of course). I think it could be done with thousands of human raters estimating for you instead of a machine learning system, but I was skeptical of whether it could be profitable enough to justify it as a startup on a risk/reward basis. One day, maybe we'll see it though.
https://itunes.apple.com/us/app/meal-snap-calorie-counting/i...
Apparently the quality of the estimates is very poor for the app. The problem is that DailyBurn LITERALLY uses Mechanical Turk[1], which I don't think is what you would need to do to get accurate estimates. You would have to actually have in-house talent with training and feedback on estimating the size and calories of food. I'm pretty good at estimating the calorie content of common foods just by eyeballing because I've done it for a long time and know generally what goes into them, but this is obviously an acquired skill.
My prediction is that Google will one day enter the "Bot Recognition Market". They've got so much data on everyone and their browsing habits, from Gmail to Adsense to Search to Google Analytics. Their cookies, browser, javascript, and ads follow you around all over the internet. They're the only company capable of putting all that data together and returning via API: "This user is a real person, we've analyzed 3 years of data from them, go ahead and let them sign up."
Or returning: "User is a bot, their IP has no purchases, no google account, no search requests, no adsense impressions, etc..."
However, simply preventing users with fewer than 3 posts from posting hyperlinks got rid of 99% of spam.
We initially allowed plain text links but spammers seemed happy to just post them.
Spammers want to post hyperlinks; without that ability they can't do anything.
Make links remain as plain text (so they are plain text as far as Google is concerned),
then use JavaScript to turn them into clickable links when the user's mouse is over or near the link, so the user can still click on them.
No idea why they seemed happy to post plaintext links, but there you go!
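A minimal sketch of the server-side half of that trick (the function name, CSS class, and regex are my own assumptions, not the poster's code): escape the comment and wrap bare URLs in a plain span rather than an anchor tag, so crawlers see only text; a few lines of client-side JavaScript can then upgrade the spans to clickable links on mouseover.

```python
import html
import re

# Match bare http(s) URLs in user-posted text (illustrative pattern).
URL_RE = re.compile(r"https?://\S+")

def render_comment(text: str) -> str:
    """Escape the comment, then wrap URLs in a span instead of an <a> tag,
    leaving them as plain text for crawlers; client-side JS can make them
    clickable for real users."""
    escaped = html.escape(text)
    return URL_RE.sub(
        lambda m: '<span class="plainlink">%s</span>' % m.group(0), escaped
    )
```

The key design point is that no `<a href>` ever appears in the served HTML, so the link carries no SEO value to a spammer.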
Apart from one blip (scepticism vs sccpticism), it outperforms me.
http://fontspace.com/malwin-b%C3%A9la-h%C3%BCrkey/merkur
http://fontspace.com/honey-and-death/dotline/8617.charmap
http://fontspace.com/bythebutterfly/bubble-bath
http://fontspace.com/jecko-development/jd-lcd-rounded
Eventually, the spammers will make a bot to analyze the distance between dots, group them into letters, and the race will be on to use other methods. I see this as a never-ending virus/immunity battle. We're pretty much at the end of Captchas. Other methods like mouse movement, surfing time, scrolling, etc. can all be mimicked as well. Computers can or will be able to simulate humans very well, even our imperfections.
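To illustrate how little a dotted font buys, here is a sketch of the grouping step described above (assuming the attacker already has dot centers from simple blob detection; the gap threshold is an illustrative parameter):

```python
# Split dot centers into per-letter groups wherever the horizontal gap
# between consecutive dots exceeds a threshold. Assumes dots within a
# letter are denser than the spacing between letters.
def group_dots_into_letters(dots, gap=10):
    """dots: list of (x, y) dot centers; returns a list of letter groups."""
    letters = []
    for x, y in sorted(dots):
        if letters and x - letters[-1][-1][0] <= gap:
            letters[-1].append((x, y))  # close to previous dot: same letter
        else:
            letters.append([(x, y)])    # big gap: start a new letter
    return letters
```

Each group can then be rasterized and fed to an ordinary character classifier, at which point the dots have added essentially nothing.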
Love your optimism. I'd point out that captchas are very limited: they have to be solvable by idiot humans. Captcha-solving algorithms have no such limit to abide by. Since the anti-spam side is blocked at a certain point in the arms race, the other side is bound to win.
Why not just require, say, a Google or Facebook login and transfer the "eliminate spammers" problem onto them?
"Hype is dangerous to AI. Hype killed AI four times in the last five decades. AI Hype must be stopped."
Yann LeCun, https://plus.google.com/104362980539466846301/posts/Qwj9EEkU...
While the modern Machine Learning Movement makes sense in a historical context and is a reasonable reaction to the disappointing returns from symbolic inference during the early days of AI research, it is terrifying that the research community is satisfied to rely on big data and statistical methods to carry us forward.
Few among us recognize the need to prioritize the study of the human brain. Even fewer are placing their bets on intelligent computer systems seeded with neurologically-inspired designs.
Vicarious gets it.
How long before others see the writing on the wall?
Now is the time to stop reacting. Now is the time to consider the field in a broad context and develop a balanced, holistic approach.
Consider this a wake-up call.
Also, could you expand on what a "balanced, holistic approach" to machine learning is?
One example is that most NNs neglect the time domain.
A balanced approach recognizes the importance of learning from data, but does not _rely_ on big data. A holistic approach entails a close examination of biological learning systems.
Numenta has pulled crap like this before. We know patents may be pending, but you don't have the epistemological right to go blowing the Great Shofar for the invention of True AI with a link to your company's website and a fancy buzzword about neural or cortical this-and-that on the front page. We need to see some published research, or you need to take over the world. Preferably the former.
Until then, stop making claims unless you want the rest of us to consider you a crackpot and a braggart.
Come on, guys, put up or shut up. If you've made the kind of advance in machine learning that entitles you to talk about human-level cognition, take out a patent and then publish some freaking papers. Or take over the world.
There are accepted ways of proving claims like this, and founding company after company without releasing a product or publishing research isn't one of them.
They state a captcha solving rate of around 90%.
For OCR to be cost-competitive, you typically need it to be correct on about 98% of characters or more; below that, it is typically cheaper to have a human retype the text than to have a human correct the OCR output.
Modern OCR engines typically do better than 99% on text that isn't really badly damaged (my MSc. dissertation was on error correction in OCR, and as part of that I tested some engines with pages that had been crumpled, intentionally damaged with sand and liquids, and even then many of the engines managed more than 99%).
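A back-of-envelope sketch of why the break-even sits near 98% (all the cost constants here are illustrative assumptions of mine, not figures from the dissertation): retyping costs one unit per character, while correcting OCR output costs a proofreading pass over every character plus a much larger fix cost per error.

```python
# Illustrative cost model; every constant is an assumption.
PAGE_CHARS = 2000

def retype_cost():
    """Cost of a human typing the whole page from scratch."""
    return PAGE_CHARS * 1.0

def correction_cost(accuracy, read_cost=0.5, fix_cost=25.0):
    """Cost of proofreading OCR output: read every character to find
    the errors, then pay a larger per-error cost to fix each one."""
    errors = PAGE_CHARS * (1 - accuracy)
    return PAGE_CHARS * read_cost + errors * fix_cost
```

With these made-up constants the two costs cross at exactly 98% accuracy: at 99% correction is clearly cheaper, while at 95% retyping wins.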
> One big difference in Vicarious’s approach, says cofounder Dileep George, is that its system can be trained with moving images rather than only static ones.
Does this imply they teach it how the shapes of numbers change, for easier detection?
This is one of the things people don't often realize you can do with algorithms. You don't need to look at the world the way it actually really exists, and there may be very good reasons not to. Training algorithms to actually recognize moving images is incredibly hard, because it requires things like memory, fade-outs, recurrent networks, all that very advanced stuff. Obviously time exists as a continuum in the "real" world. But that's bloody inconvenient. So just look at big "quanta" of time, collecting all data points during the quantum, analyse it, then shift the quanta/window ahead 0.1s and do the exercise again. This is so much easier you wouldn't believe it.
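The "quanta of time" idea above can be sketched in a few lines (a generic illustration, not Vicarious's method): stack a fixed window of frames into one flat feature vector, then slide the window forward and classify each window independently with an ordinary static model.

```python
# Turn a stream of per-frame feature vectors into overlapping flattened
# windows, so a plain feed-forward model can consume "time" without any
# recurrence or memory.
def windows(frames, size=3, step=1):
    """frames: list of per-frame feature vectors; yields flattened windows."""
    for start in range(0, len(frames) - size + 1, step):
        window = frames[start:start + size]
        yield [x for frame in window for x in frame]  # flatten the window
```

Each yielded vector is just `size` consecutive frames concatenated, which is exactly "more data, same simple algorithm".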
Teaching an algorithm to recognize, say, a car collision given 100 frames doesn't require any change to the algorithm (just a change in training data). And obviously your backend system needs to be aware that, over time, the "isColliding" output will look like ......1.....11.....1111...1111.1.1.111.11..11...11.11...11...1.....1...1...... when a collision occurs - and this of course doesn't mean you've had 20 collisions.
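That backend debouncing step can be as simple as this sketch (the gap tolerance is an assumed parameter): collapse the noisy per-frame bitstring into distinct events, ignoring short dropouts inside a single event.

```python
# Count distinct events in a per-frame detector output like "..11.1..111..",
# treating 1s separated by at most max_gap frames as the same event.
def count_events(bits, max_gap=5):
    events, gap = 0, max_gap + 1  # start "far away" from any previous 1
    for b in bits:
        if b == "1":
            if gap > max_gap:     # too long since the last 1: a new event
                events += 1
            gap = 0
        else:
            gap += 1
    return events
```

Run on the example bitstring above, this reports a single collision rather than twenty.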
It does mean a bigger network, slower training, and more resources needed. But not as much as you'd think. Keep in mind that a "temporal" network will need more hidden layers. Also please consider building "redundant" networks for temporal data. When people ask why, I have no better answer than that it's the same technique our brain uses, so frankly if it's good enough for God, it's good enough for me.
Doing the temporal thing means you're back to using trivially simple algorithms, running on more data.
Cracking captchas is not very impressive. I've done it as a weekend project, and exceeding "average" human captcha-solving ability is easy. I actually got it to the point where my algorithm was slightly better at captchas than me, even where I was allowed to take 2 minutes for difficult captchas. If I wasn't allowed to take more than 10 seconds, my algorithm easily beat me by over 10% (my captcha performance, when measured, is shockingly only ~83%). I didn't cheat: I used an external site's captchas (from dns.be).
The algorithm used was dead simple backpropagation.
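Not the commenter's actual code, but "dead simple backpropagation" looks roughly like this: one hidden layer, sigmoid activations, full-batch gradient descent. Real inputs would be flattened per-character pixel grids; XOR stands in here just to show the training loop.

```python
import numpy as np

# Minimal one-hidden-layer network trained with plain backpropagation.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 0.5

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)              # forward pass: hidden layer
    out = sigmoid(h @ W2 + b2)            # forward pass: output layer
    d_out = (out - y) * out * (1 - out)   # output delta (squared-error loss)
    d_h = (d_out @ W2.T) * h * (1 - h)    # delta backpropagated to hidden
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)
```

For a captcha, you would train one such network per segmented character (or one with a softmax output over the alphabet); the loop itself doesn't change.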
If it's so easy, could you share it on github?
Turning on recaptcha only briefly to test resulted in a mass of spammers.
Even CAPTCHA inventor Luis von Ahn has talked about CAPTCHAs being a practical way of generating test data for computer vision systems.
I think the assumption you're making is that humans only have sex because it feels good, and not for the explicit purpose of creating life. If people want children, they'll procreate regardless, even if it doesn't feel as good as robot sex.