The Unreasonable Effectiveness of Recurrent Neural Networks

(karpathy.github.io)

913 points · benfrederickson · 11y ago · 207 comments

136 comments · 37 top-level

swalsh11y ago· 17 in thread

As a father, the output feels really familiar. It's like a child learning to talk. At first, though the words they say are actual words (and mean something to you), they themselves have no idea what the meaning is. Eventually though they start understanding the meaning, which combined with the syntax creates a person who can communicate.

I wonder if all that's missing is just a few more layers, and another source of input. Maybe a list of requirements/output/input matched with the code so it understands why what was written was written. I wonder what would happen if you ran the program, took the output, and fed it back in as input.

Really cool stuff here.

wayfarer2s11y ago

I think you are right in that other inputs are needed to decipher meaning. Humans for example tend to have quite a lot of different sources of input -- as when we are children and learning new words we have the spelling (visual), how it sounds (auditory), and possibly another image that shows what the thing means ("cat"). Or maybe we have the auditory ("mommy") and the visual (the child's mother). If you were trained strictly on text, then the meaning of concepts is harder to decipher. It might be why abstract concepts like higher level math are hard for a lot of people to grasp -- their only exposure to the concepts is usually just in the form of text.

As an exercise, when I think of the word "circle", images of circles and spheres show up in my head. Also the equation of a circle. My quick definition of it would be "a perfectly round object" which leads to questions of what "round" and "perfect" mean. The more I think about it, all my knowledge seems quite circular in that there are no axiomatic concepts, everything is relative and it just builds on itself. I wonder if that's the key to decipher meaning, increase the connections of the web -- with strong enough references you can pinpoint which of the nodes in the web something refers to.

Houshalter11y ago

What about programming, for example? It's entirely abstract and doesn't necessarily have any visual representation. Programming is best learned through examples. E.g. "here's a line of code, here is what it outputs. Now try to figure out what the rules of the language are."

In the case of this article, the NN isn't being asked to do any abstract task like "decipher meaning", but the very concrete task of "predict the next character". As the article shows, NNs can do this fairly well.

There is also evidence that they can learn very high level knowledge about words and objects. See the success of word vectors: http://technology.stitchfix.com/blog/2015/03/11/word-is-wort...

tim333 11y ago

> when I think of the word "circle", images of circles and spheres show up in my head

There seems to be some evidence that this stuff is fairly central to human intelligence and that the ability to visualize in 3D is kind of hard wired. Deciphering meaning is approximately "seeing what it means", which can correspond to visualizing it in your head. For example, "the cat sat on the mat" is a bunch of symbols, but if someone or some machine can convert that to an image of a cat sitting on a mat, then I guess they've understood it.

hellameta11y ago

This part of your comment, "...there are no axiomatic concepts, everything is relative and it just builds on itself", reminded me of this Marvin Minsky paper. If you haven't read him before - enjoy the ride!

https://web.media.mit.edu/~minsky/papers/MusicMindMeaning.ht...

guybrushT11y ago

As a father, and as someone interested in this discussion (about "child learning to talk"), I think you will love Prof Deb Roy's insights into how his infant son learned language.

http://www.ted.com/talks/deb_roy_the_birth_of_a_word?languag...

As one of the other commenters pointed out - it is like a tree (words/concepts) branching out from one another. I would be fascinated by seeing if this research can be continued into adulthood, where the individual "concepts" aren't as important as the interplay between them.

digikata11y ago

Human children have the great benefit of interactively learning from their parents and other humans raising them. Could we expect a child to learn to speak if they only heard recordings of existing speech with otherwise no human interaction/feedback - correcting them or offering customized and contextual new bits of information? It would be interesting to add a feedback path for human corrective input, i.e. because it's direct interaction, feed it back but somehow weight it a little more than just another corpus input.

tzs11y ago

> Could we expect a child to learn to speak if they only heard recordings of existing speech with otherwise no human interaction/feedback - correcting them or offering customized and contextual new bits of information?

I once asked a similar question on some online forum [1] where many linguists hung out. My question was if an English-only speaking household left a general interest Spanish language TV station on most of the time when they weren't actively using the TV to watch something, so that their child received a very large exposure to Spanish language programming (news, sports, soap operas, sitcoms, movies, etc) from birth onward, would the child naturally learn Spanish?

I don't recall for sure what the linguists who responded said, but I think they all said the child would not learn Spanish from this.

[1] I have no recollection of where this was.

dwiel11y ago

There are some recent examples where people have trained a collection of large nets which are then used to teach a smaller net. The smaller net can learn more quickly and finally achieves better performance than the large collection.

The methods involve providing more detailed feedback at each example. With most training data used now, we give a 0 or 1: does this example belong to this class? In the teacher networks, they were able to teach with more subtlety: this is definitely not a car, it is very lizard like and a little snake like.
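
A rough sketch of what that "more detailed feedback" can look like, assuming the comment refers to training a student on a teacher's softened output probabilities (often called soft targets). The class names, logits and temperature below are made up for illustration and are not taken from any paper:

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Loss of predicted probabilities against a (hard or soft) target."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

classes = ["car", "lizard", "snake"]
teacher_logits = [-4.0, 3.0, 1.0]  # hypothetical teacher scores for one image

hard_label = [0.0, 1.0, 0.0]                           # the usual 0-or-1 target
soft_label = softmax(teacher_logits, temperature=2.0)  # teacher's graded opinion

student_probs = softmax([0.0, 1.0, 0.5])  # hypothetical student output
loss_hard = cross_entropy(hard_label, student_probs)
loss_soft = cross_entropy(soft_label, student_probs)
```

The hard label only says "lizard"; the soft label also tells the student that "snake" is a much closer miss than "car".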

jpatokal11y ago

The TL;DR appears to be "no", interaction is necessary:

https://en.wikipedia.org/wiki/Language_acquisition#General_a...

Although for obvious reasons this is very hard to study experimentally:

https://en.wikipedia.org/wiki/Language_deprivation_experimen...

solve11y ago

I'm convinced that voice pitch, syllable meter, hand gestures, facial gestures, and other forms of non-word emphasis are also crucial to figuring out basic phrase chunking and word types, with minimal interaction, early on.

Seems it would be far harder to infer the basic initial structure from just plain text.

frozenport11y ago

Umm. When you train you train against a cost function...

cristianpascu11y ago

A machine will never get the meaning of a word, unlike a very small child. I am simply amazed by the fact that a child can learn a language, catch what a question is, offer an answer, say no (and how they like to say no), and all. As much as I wish it was possible, that much I believe it's not. The best we can do is put our knowledge of our ability to infer meaning of words into machine code.

NathanKP11y ago

So far scientists haven't found anything special about the human brain that can't be mimicked by a machine. Given enough neural connections, and a large enough data set, and a long enough training period there is no reason to think that a machine can't do everything a human brain can do.

Put another way there is nothing magical about a child learning about the world. A child's brain is just a large neural network being fed patterned data over the course of many years by a variety of extremely high resolution analog sensors. Eventually the child begins to respond to the patterns.

seiji11y ago

> A machine will never get the meaning of a word, unlike a very small child.

Why not? Your brain isn't magic, just highly associative. We can do the same thing with computers real soon now.

btown11y ago

I often wonder if the gap is that we're so caught up on training our networks on vision and text that we're ignoring the fact that living beings have a sense of time and reward as part of their input.

A child knows that if it says "Mama food," it is likely to get attention, and if it gets attention, it is likely to minimize its hunger. Right now, a neural network can be trained to know that "Mama" occurs often in human dialogue, what words occur around it, even its dictionary definition and images of mothers. But it's not making the deeper connection to a strategy that minimizes hunger.

When I think about this, I wonder if insights from the world of gaming "AI" would be useful in developing the training datasets for real AI. Because you can't be a mother to a billion virtual babies, but you might be able to program a set of heuristics to be a mother to a billion virtual babies. Then you have some system that trains on their life experiences...? All speculation, but very interesting stuff.

hanspeter11y ago

What makes the human brain not a machine?

gnaritas11y ago

> A machine will never get the meaning of a word

That's an irrational and indefensible position.

narrator11y ago· 9 in thread

The thing about neural nets is that they are pretty opaque from an analyst point of view. It's hard to figure out why they do what they do, except that they have been trained to optimize a particular cost function. I think Strong AI will never happen because the people in charge will not give control over to a system that makes important decisions without explaining why. They will certainly not give control over the cost function to a strong AI because control of determination of the cost function is the axis upon which all power will rest.

dimatura11y ago

Our life is dominated by systems we don't understand. I have some understanding of how my cell phone works at the software level, but when it comes to details at the hardware level I just trust the electrical engineers knew what they're doing. I have virtually no understanding of how the engine in the bus operates beyond what I learned in thermodynamics 101. Sure, you might say - someone understands these things. But for some systems, it's hard to pinpoint these people. And for some other complex systems, like the stock market, nobody really understands them or (completely) controls them. But we still use them every day. I think once AI becomes useful enough, people will gladly hand control over.

imaginenore11y ago

But some engineer out there understands how your phone works.

With neural nets NOBODY really understands how they work.

relate11y ago

This is a common criticism. However, almost all ML methods have some built-in heuristic choices that are the result of finding something that both works and is mathematically nice. Each of these choices restricts us to some family of functions where it's hard to justify why it's really relevant to the problem at hand, e.g. convex loss functions (l1, l2, ...), convex regularizers (l1, l2, ...), Gaussian priors, linear classifiers, some mathematically nice kernel functions, etc. In the end, people usually statistically estimate the performance of the methods and use what works.

pjc50 11y ago

> the people in charge will not give control over to a system that makes important decisions without explaining why

They will if it gives the answers they want to hear. History is full of critical decisions based on ridiculous pretexts or unclear processes.

scarmig11y ago

It may be the case, though, that companies that relinquish control to neural nets will have better results than companies that don't. In fact, there's a winner-take-all effect in many markets, so in those even a slight improvement over humans would lead to massive benefits, rapidly pushing human analysts out of the market.

That's the (morally neutral) wonder of the market--it'll beat ideological or emotional objections into the ground, for better or for worse.

And sooner or later, someone might start a company where all decision making is performed by a neural net...

__Joker11y ago

I kind of drifted into the camp of transhumanism as the future, where humans are enhanced by all the smart sub-AI problem solvers but humans generally make the decisions at the end of the day. Also, I think another problem is that for strong AI to exist, we are not sure what "objective function" the AI should work towards.

btown11y ago

> the people in charge will not give control

Eliezer Yudkowsky would likely disagree with you: http://www.yudkowsky.net/singularity/aibox

EDIT: Also - http://www.explainxkcd.com/wiki/index.php/1450:_AI-Box_Exper...

m-i-l11y ago

I remember wanting to train a neural net for my MSc thesis more than 20 years ago, but my tutor recommended against doing so for precisely this reason, i.e. he said it is very difficult to prove your results. While not being able to prove your results might be a bad idea if you're trying to get your MSc, I don't see it holding back other advances.

chronolitus11y ago

What if an AI saves money, though? ( i.e. is cheaper than hiring a real person for a simple task )

" Never! Companies would never sacrifice principle and safety to save money! "

We'll see...

tshadwell11y ago· 8 in thread

I'm not sure how "unreasonable" the effectiveness of RNNs is if the corpus output at 2000 iterations isn't significantly better than a simple prefix based markov chain implementation [1] (and for the regular languages, with some extra bracket-checking), but I found the evolution visualizations really interesting.

[1] http://thinkzone.wlonk.com/Gibber/GibGen.htm

darkmighty11y ago

It's quite unreasonable. He could have optimized it more for fooling humans in gibberish generation, but that would not show the general effectiveness of the approach. The power shows (quantifiably) in compression: 1.57 bits per character on Wikipedia is quite hard to beat. Of course, Markov chains are essentially universal models, so the training algorithm is the crucial distinction.

I believe Markov chains as a model quickly become inefficient (especially memory-wise) as you increase the complexity (long range correlations) of your prediction. It's an unnecessarily restrictive model for high complexity behavior that state of the art RNNs skip entirely.
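
For anyone wondering what the 1.57 figure measures: bits per character is the average of -log2 of the probability the model gave to each character that actually came next. A minimal sketch, assuming you already have that list of probabilities (the inputs below are hypothetical, not taken from the article's model):

```python
import math

def bits_per_character(probabilities):
    """Average number of bits needed per character.

    probabilities[i] is the probability the model assigned to the
    character that actually came next at position i.
    """
    return -sum(math.log2(p) for p in probabilities) / len(probabilities)

# A model that guesses uniformly over 256 byte values needs 8 bits per character.
uniform_bpc = bits_per_character([1 / 256] * 10)

# Probability a model must assign, on (geometric) average, to reach 1.57 bits.
needed_probability = 2 ** -1.57
```

So 1.57 bits per character corresponds to giving the true next character a probability of roughly one in three, on (geometric) average.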

new299 11y ago

The state of the art in compressing Wikipedia is 1.278 bits (on a certain subset) [1]. So that does seem pretty good.

[1] http://prize.hutter1.net/

samizdatum11y ago

That Markov Chain model operates on 4-grams by default. The RNN featured in the article generates output character-by-character, which is significantly more impressive. Here's a sample from the Markov Chain model operating on 4-grams:

  Ther deat is more; for in thers that undiscorns the unwortune, 
  the pangs against a life, the law's we know no trave, the hear, 
  thers thus pause.

The only reason why it seems like the model can occasionally spell, and create anglo-sounding neologisms, is because it operates on 4-grams.

Here's some character-by-character output from the same Markov Chain model.

  T,omotsuo ait   pw,, l f,s teo efoat t hoy tha fm nwo   
     bs rs a h enwcbr lwntikh  wqmaohaaer ah es aer 
  mkazeoltl.etnhhifcmfeifnmeeoddssmusoat irca   
  do'ltyuntos sih i etsoatbrbdl
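
To make the comparison concrete, here is a minimal sketch of an order-k character Markov chain, assuming "operates on 4-grams" means the model conditions on a fixed window of preceding characters. This is not the code behind the linked generator; `build_model` and `generate` are names made up for this example, and the corpus is just a placeholder string:

```python
import random
from collections import defaultdict

def build_model(text, order):
    """Map each context of `order` characters to the characters seen after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, length, rng):
    """Extend `seed` one character at a time; the order is len(seed)."""
    order = len(seed)
    out = seed
    for _ in range(length):
        context = out[len(out) - order:]  # empty string when order is 0
        followers = model.get(context)
        if not followers:
            break  # context never seen in the corpus
        out += rng.choice(followers)
    return out

corpus = "to be, or not to be, that is the question"
model = build_model(corpus, order=4)
sample = generate(model, seed="to b", length=40, rng=random.Random(0))
```

With order 0 each character is drawn from overall character frequencies, which is roughly what the second sample above looks like; larger orders copy longer runs of the corpus, and the table of contexts grows quickly with the order.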

mellavora11y ago

"do'lty untos sih i"

maybe the computer was drunk?

murbard2 11y ago

It balances parentheses and keeps track of other long range dependencies, something markov chain implementations cannot do.

pohl11y ago

Welcome to the unbearable forced-ness of titles. Everyone's making a nod to Milan Kundera these days.

jsprogrammer11y ago

http://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness_...

coldtea11y ago

First, it's not a nod to Kundera, but to a classic math related work that predates Kundera's book.

Second, even if it was, really? As if we see plays on Kundera titles regularly on the web?

fpgaminer11y ago· 8 in thread

I'm in the middle of reading this article (very much appreciate Karpathy's writings), but I also wanted to brain dump some of my musings on modern machine learning; RNNs in particular. Sorry if this is redundant to anything the article talks about.

Deep learning has made great strides in recent years, but I don't think architectures which aren't recurrent will ever give rise to mammalian "thought". In my opinion, thought is equivalent to state, and feed forward networks do not have immediate state. Not in any relevant sense. So therefore they can never have thought.

RNNs, on the other hand, do have state, and therefore are a real step towards building machines that possess the capacity to think. That said, modern deep learning architectures based around feed forward networks are still very important. They aren't thinking machines, but they are helping us to build all those important pre-processing filters mammalian brains have (e.g. the visual cortex). This means we won't have to copy the mammalian versions, which would be rather tedious. We can just "learn" a V1, V2, etc from scratch. Wonderful. And they'll be helpful for building machines with senses different from any biology has yet evolved. But, again, these feed forward networks won't lead to thought.

My second musing is where I think the next leap in machine learning will occur. To date, efforts have been focused on how to build algorithms that optimize the NN architecture (i.e. optimize weights, biases, etc). But mammalian brains seem to possess the ability to problem solve on the fly, far faster than I imagine tweaks to architecture could account for. We solve problems in-thought, rather than in-architecture; we think through a problem. Machine learning doesn't possess this ability. It can only learn by torturing its architecture.

So, I believe there is this distinction to the learning that mammalian brains are able to do on the fly, using just their thoughts, and the learning they do long term by adjusting synaptic connections/response. It seems as if they solve a problem in the short term, and then store the way they solved it in the underlying architecture over the long term. Tweaking the architecture then makes solving similar problems in the future easier. The synaptic weights lead to what we call intuition, understanding, and wisdom. They make it so we don't have to think about a class of problems; we just know the solutions without thought. (Note how I say class of problems; this isn't just long term memory).

Along those lines, I come to my final musing: that mammalian brains are motivated by optimization of energy expenditure. Like anything in biologically evolved systems, energy efficiency is key, since food is often scarce. So why wouldn't brains also be motivated to be energy efficient? To that end, I believe tweaking synaptic weights, the kind of learning that machine learning does so well, is a result of the brain trying to reduce energy expenditure. Thoughts are expensive. Any time you have a thought running through your brain, it has some neuronal activity associated with it. That activity costs energy. So minimizing the amount we have to think on a day-to-day basis is important. And that, again, is where architecture changes come in. They are not the basis for learning; they are the basis for making future problem solving more efficient. Like I said, once a class of problems has been carved into your synaptic weights, you no longer have to think about that class of problems. The solutions come immediately. You don't think about walking; you just do it. But when you were a baby, I'll bet the bank that your young mind thought about walking a lot. Eventually all the mechanics of it were carved into your brain's architecture and now it requires many orders of magnitude less energy expenditure by your brain to walk.

So, the obvious question is ... how do mammalian brains problem solve using just thoughts? The answer to that, as I mentioned, is likely to lead to the next leap in machine learning. And it will, more likely than not, come from research on RNNs. What we need to do is find a way to train RNNs that are able to adapt to new problems immediately without tweaking their weights (which should be a slower, longer term process).

P.S. Yes, I know this was probably a bit off-topic and quite a bit wandering. I've had these musings percolating for a while and don't really have an outlet for them at the moment. I hope it's on topic enough, and at least stimulates some interesting discussion. Machine learning is fascinating.

kylebrown11y ago

> That mammalian brains are motivated by optimization of energy expenditure. Like anything in biologically evolved systems, energy efficiency is key, since food is often scarce.

That doesn't square with empirical reality. Evolved biological systems appear to be optimized for robustness to perturbations, not efficiency (John Doyle argues that there is in fact a fundamental tradeoff between robustness and efficiency, for all types of complex systems not just biological).

> how do mammalian brains problem solve using just thoughts.

They don't. Sensory input is required for brains to learn new classes of problems.

> find a way to train RNNs that are able to adapt to new problems

Is this something different than multi-task learning?

Lambdanaut11y ago

> They don't. Sensory input is required for brains to learn new classes of problems.

Sensory input is required to gain the knowledge, but then you can just as easily muse over your gained knowledge for further insights in a sensory deprivation chamber as you can in a classroom.

seiji11y ago

> feed forward networks do not have immediate state. Not in any relevant sense.

Feed-forward networks do have state, but the useful parts are all obtained through explicit training (ye olde backprop, ye older hebbian). The typical scenario is "train model (write mode), deploy model (read-only mode)," which, as you point out, has no "thought" since at runtime, no changes or introspections are happening.

> So therefore they can never have thought.

The key idea here would be: generative models. Most current AI fads are driven by discriminative models (image recognition, speech recognition, etc) which provide very narrow "faster than human" output, but, as you point out, have no thought or will or motives of their own.

But, once you have a sufficiently connected network, you can start to ask it open-ended questions ("draw a cat for me") in the form of sampling from the network (gibbs sampling, MCMC, ...) and it fills in the blanks.

The extra oomph of providing actual agency and intent and desire to the model is an exercise left to the reader.

> (which should be a slower, longer term process).

Sleep is a requirement of all things with neural network based brains as far as we know.

murbard2 11y ago

An RNN can, in particular, implement a GOFAI algorithm. I think that's what we basically learn for the first 5-7 years of our lives by analyzing other people's behavior, communicating, etc.

spin11y ago

The "DQN" (Deep Q Network) stuff from Google DeepMind has states. (And actions that transition from state to state.) This comes from Reinforcement Learning theory. (The Q-Learner from Reinforcement Learning is the "Q" in Deep Q Networks.) [doi:10.1038/nature14236]

Suri and Schultz argue that dopamine in the mammalian brain follows the "reward prediction error" from Reinforcement Learning [doi:10.1016/S0306-4522(98)00697-6]. (Indeed the DQN paper mentions dopamine in the very first paragraph.)

Because of this, I am very excited about DQN. (I do think that it's only a building block towards building a self-aware brain, though.)
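
For readers who haven't met the "Q" part: a minimal tabular sketch of the Q-learning update, assuming a tiny discrete problem. DQN replaces the table with a deep network, so this is not DeepMind's implementation; the states, actions, learning rate `alpha` and discount `gamma` are illustrative values only:

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the prediction error."""
    old_value = q.get((state, action), 0.0)
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    # Reward prediction error: what we got (plus discounted future value)
    # minus what we expected to get.
    td_error = reward + gamma * best_next - old_value
    q[(state, action)] = old_value + alpha * td_error
    return td_error

q_table = {}
actions = ["left", "right"]
first_error = q_update(q_table, "s0", "right", 1.0, "s1", actions)
second_error = q_update(q_table, "s0", "right", 1.0, "s1", actions)
```

The returned `td_error` plays the role of the "reward prediction error" in the dopamine comparison; it shrinks as the estimate gets closer to the reward actually received.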

shock-value11y ago

This comment is really well written and expresses a lot of my thoughts about recent advances in computer learning as well -- though in a more clear and expressive way than I could, I think. Thanks fpgaminer.

0xdeadbeefbabe11y ago

From what I can tell the RNN in question already has mammalian intelligence, but also a weakness for PG, a phase that it will emerge from in 100 years.

morenoh149 11y ago

nice dump 👍

pcmonk11y ago· 7 in thread

The code generator is awesome. There's hardly a syntax error. The file headers are the best.

Nitpick: although tty == tty is, as you say, vacuously true in this case, that's just because tty is a pointer. If tty were a float, this wouldn't be the case, since it could be NaN. I wouldn't be surprised if it learned to test a variable for equality against itself from some floating point code.
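
The NaN point is easy to check. A small sketch in Python rather than C (Python floats follow the same IEEE 754 comparison rule); `equals_itself` is just a helper name for this example:

```python
import math

def equals_itself(value):
    """Return the result of comparing a value against itself."""
    return value == value

nan = float("nan")
ordinary_self_equal = equals_itself(1.5)  # True for ordinary floats
nan_self_equal = equals_itself(nan)       # False: NaN compares unequal to everything
nan_detected = math.isnan(nan)            # the explicit way to test for NaN
```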

MrZongle2 11y ago

If nothing else, the RNN would be great for generating bogus source code for use in television programs and movies.

It would drive those who attempt to understand & reference it absolutely crazy. :D

zxyzzxxx11y ago

The code is nonsense. Their method is good for fuzzy logic like recognition, but this approach with code will never work for anything other than an art project.

teraflop11y ago

Currently it doesn't work, but saying it'll never work is pretty strong.

This kind of demo shows that deep neural networks can capture the structure of language, if not the semantics, in a very general way. And we have separate evidence that they can (in principle) capture semantic meaning and algorithmic reasoning as well, for example: http://arxiv.org/pdf/1410.5401v2.pdf (the "neural Turing machines" paper from DeepMind)

nomel11y ago

Feed it all of github, and I'm sure you could come up with some interesting auto complete code generation tools. Of course, coming from github, it'll be poorly documented and filled with buffer overflows :D

joelthelion11y ago

Could be interesting to plug this kind of generator into American Fuzzy Lop.

simias11y ago

There's not a lot of floating point in the kernel though.

ChuckMcM11y ago

Yes, and feed it into hackertyper.net and you can entertain an 8 year old for hours :-)

0xdeadbeefbabe11y ago· 7 in thread

I'm getting the funny impression that what distinguishes an algorithm from an AI algorithm isn't about the algorithm, but how people treat the algorithm. It's an AI algorithm if they describe it behaving intelligently i.e. painting numbers on a house, learning English first, being born, being tricked into painting a fence, etc. Otherwise it's just an algorithm.

Jtsummers11y ago

This is an old problem in AI. Chess was an AI problem, until a computer beat a grandmaster. Vision was an AI problem, now we have OpenCV. Many AI problems get shifted out of "AI" once they're solved.

TheLoneWolfling11y ago

It stems from our definition of an AI.

An AI is a computer doing those things a computer cannot do. As such, anything that a computer cannot do isn't AI, and anything a computer can do isn't AI either.

im3w1l11y ago

One explanation for this could be that we think that some problem is so hard that any solution to it is necessarily so complicated that it could be adapted to solve pretty much anything. When we realize that that isn't the case, we stop calling it AI.

dumitrue11y ago

I don't think OpenCV really solved computer vision to be fair. There's definitely no model out there that can do image-based question & answering as well as a human can, or interpret the contents of an image (parse it, if you will) in an accurate way, with the exception of very few special cases.

deepnet11y ago

Learning to do something is an AI problem.

Writing a program to play Chess is not AI but doing so has helped figure learning out.

Ma8ee11y ago

Can a submarine swim?

0xdeadbeefbabe11y ago

This one can: https://www.youtube.com/watch?v=GGrWHlAm7zM (cartoon submarine character TV show) because it has large human like eyes.

TheLoneWolfling11y ago· 6 in thread

My question, and something this doesn't get into, is this: how do you train an RNN?

deepnet11y ago

You need an error signal: a target value is compared with the network's prediction. That error is carefully assigned proportionally to the network weights that contributed to it, and the weights are adjusted a small amount in the direction that reduces it. This is repeated many times.

Backpropagation suffers from vanishing gradients on very deep neural nets.

Recurrent Neural Nets can be very deep in time.

Or the weights could be evolved using Genetic Programming.
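
The loop in the first paragraph of this comment, as a minimal sketch: plain gradient descent on a single linear unit, assuming a squared-error loss. A real RNN has many weights and backpropagates through time, but the predict / compare / nudge cycle is the same. The data and learning rate are made up:

```python
def train(samples, lr=0.1, steps=200):
    """Fit prediction = w * x + b by repeatedly nudging w and b."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, target in samples:
            prediction = w * x + b
            error = prediction - target  # the error signal
            # For loss = 0.5 * error**2, the gradient is error * x for w
            # and error for b; step a small amount against it.
            w -= lr * error * x
            b -= lr * error
    return w, b

samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # points on y = 2x + 1
w, b = train(samples)
```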

raverbashing11y ago

> Backpropagation suffers from vanishing gradients on very deep neural nets.

Especially when using saturating functions (tanh/sigmoid)

> Or the weights could be evolved using Genetic Programming

GA, not GP http://en.wikipedia.org/wiki/Genetic_algorithm
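
On the saturation point, a small numeric sketch, assuming a one-unit recurrent net `h = tanh(weight * h + x)`. The weight and input values are arbitrary; the point is only that each step multiplies the gradient by `weight * (1 - h**2)`, which becomes small once tanh saturates:

```python
import math

def gradient_through_time(steps, weight=0.9, x=1.0):
    """Track d h_t / d h_0 for a one-unit tanh RNN fed a constant input."""
    h = 0.0
    grad = 1.0
    history = []
    for _ in range(steps):
        h = math.tanh(weight * h + x)
        # Chain rule: d tanh(u)/du = 1 - tanh(u)**2, and du/dh_prev = weight.
        grad *= weight * (1.0 - h * h)
        history.append(grad)
    return history

history = gradient_through_time(50)
```

Here the gradient with respect to the initial state shrinks at every step, so after 50 steps almost nothing of the error signal reaches the early timesteps.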

warfangle11y ago

> Or the weights could be evolved using Genetic Programming.

Some algorithms, such as NEAT[0], use a genetic algorithm to describe not only the weights on edges in the network, but also the shape of the network itself - e.g., instead of every node of one layer connected to every node of the next, only certain connections are made.

0. http://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_to...
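
A minimal genetic-algorithm sketch for evolving weights, assuming a fixed two-weight model fitted to made-up data. This is not NEAT: it keeps the network shape fixed and only evolves the weights, and the population size, mutation scale and selection rule are arbitrary choices for illustration:

```python
import random

SAMPLES = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # points on y = 2x + 1

def error(weights):
    """Sum of squared errors of prediction = w * x + b over the samples."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in SAMPLES)

def evolve(generations=100, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5))
                  for _ in range(population_size)]
    initial_best = min(error(p) for p in population)
    for _ in range(generations):
        population.sort(key=error)
        parents = population[: population_size // 4]  # selection, parents survive
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            child = (mother[0], father[1])                         # crossover
            child = tuple(g + rng.gauss(0.0, 0.1) for g in child)  # mutation
            children.append(child)
        population = parents + children
    return min(population, key=error), initial_best

best, initial_best = evolve()
```

No gradients are used anywhere; the only feedback is the error of each candidate. Because the best parents are carried over unchanged, the best error can never get worse from one generation to the next.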

skorgu11y ago

It would be interesting to occasionally train the generated C against a compiler.

egocodedinsol11y ago

There are various ways, but one is to unroll the network over some number of timesteps and treat it as a regular feed-forward NN. You might find this helpful:

http://minds.jacobs-university.de/sites/default/files/upload...
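
A sketch of what unrolling buys you, assuming a one-unit RNN with scalar weights and a squared-error loss on the final state only (all values here are illustrative). Once the loop is unrolled, the backward pass is ordinary backprop, except that the shared recurrent weight collects a gradient contribution from every timestep:

```python
import math

def forward(inputs, w_hh, w_xh):
    """Unrolled forward pass: one 'layer' per timestep, same weights each time."""
    states = [0.0]
    for x in inputs:
        states.append(math.tanh(w_hh * states[-1] + w_xh * x))
    return states

def loss(inputs, target, w_hh, w_xh):
    return 0.5 * (forward(inputs, w_hh, w_xh)[-1] - target) ** 2

def bptt_grad_w_hh(inputs, target, w_hh, w_xh):
    """Gradient of the loss with respect to the recurrent weight."""
    states = forward(inputs, w_hh, w_xh)
    grad_h = states[-1] - target  # dL/dh at the final step
    grad_w = 0.0
    for t in range(len(inputs), 0, -1):
        grad_pre = grad_h * (1.0 - states[t] ** 2)  # back through tanh
        grad_w += grad_pre * states[t - 1]          # shared weight accumulates
        grad_h = grad_pre * w_hh                    # pass back to step t - 1
    return grad_w

inputs = [0.5, -0.3, 0.8]
analytic = bptt_grad_w_hh(inputs, target=0.2, w_hh=0.7, w_xh=1.1)
```

The backward loop visits each unrolled step once, so the cost grows linearly with the number of timesteps you unroll.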

TheLoneWolfling11y ago

That sounds rather absurdly computationally expensive.

Thanks for the link, I'll take a look.

Smerity11y ago· 5 in thread

Karpathy is one of my favourite authors - not only is he deeply involved in technical work (audit the CS231n course for more[1]!), he spends much of his time demystifying the field itself, which is a brilliant way to encourage others to explore it :)

If you enjoyed his blog posts, I highly recommend watching his talk on "Automated Image Captioning with ConvNets and Recurrent Nets"[2]. In it he raises many interesting points that he hasn't had a chance to get around to fully in his articles.

He humbly says that his captioning work is just stacking image recognition (CNN) on to sentence generation (RNN), with the gradients effectively influencing the two to work together. Given that we've powerful enough machines now, I think we'll be seeing a lot of stacking of previously separate models, either to improve performance or to perform multi-task learning[3]. A very simple concept but one that can still be applied to many other fields of interest.

[1]: http://cs231n.stanford.edu/

[2]: https://www.youtube.com/watch?v=xKt21ucdBY0

[3]: One of the earliest - "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" http://nlp.stanford.edu/pubs/SocherLinNgManning_ICML2011.pdf

wonderingwhere11y ago

> he spends much of his time demystifying the field itself, which is a brilliant way to encourage others to explore it :)

yup. this is the first time I understood someone from this field. Honestly, this dude just broke down the wall.

What's more important, passion flows through his writing. And it can be felt. I got so excited while reading it.

pigscantfly11y ago

Andrej is also a great lecturer; his CS231n class in the winter was both the most enjoyable and educational I've taken all year. All of the materials are available at cs231n.stanford.edu, although I can't seem to find the lecture videos online. It may not have been recorded.

As a bonus, there's an ongoing class on deep learning architectures for NLP which covers Recurrent (and Recursive) Neural nets in depth (as well as LSTM's and GRU's). Check out cs224d.stanford.edu for lecture notes and materials. The lectures are definitely being recorded, but I don't think they're publicly available yet.

3 more replies

cOgnaut11y ago

Read over [1] and am currently watching [2], and I really can't get over a not insignificant bit of dissonance:

(a) He seems to be very intelligent. Kudos. But…

(b) How good of an idea is it really to create software with these abilities? We're already making machines that can do most things that had once been exclusive to humans. Pretty soon we'll be completely obsolete. Is that REALLY a good idea? To create "face detectors" (his words!)?

firethief11y ago

Our generation is going to get old and feeble and eventually die. If we have children, they'll completely supplant us.

Our relevance is ephemeral, but our influence will be lasting. Do we want to have a legacy of clinging to our personal feelings of importance, or of embracing the transience of our existence and nurturing our (intellectual) progeny?

1 more reply

Lambdanaut11y ago

Good for the entrepreneurs that invent them.

myth_buster11y ago· 5 in thread

This is quite incredible. The stylistic similarities of the generated Shakespearean saga, Linux code etc. were quite startling. Perhaps we can train a Haiku/Fortune cookie generator which could occasionally be quite profound.

seiji11y ago

> Linux code etc

People are always worried about "computers taking factory jobs" resulting in mass unemployment, but the truth is, a rudimentary AI with acceptance tests on output will obsolete every programmer alive.

Hell, half the programming people do these days is just gluing APIs together then seeing if it actually works. It doesn't take 16 years of rich inner human life experience to accomplish that, just exhaustive combinational parameter searching on the subset of API interactions you're interested in evaluating.

myth_buster11y ago

Douglas Crockford touches on this aspect in this entertaining and insightful talk [0]. I'm guilty of what you state and I think a large part of "programming" is rudimentary boilerplate coding/configuration and staring into the Abyss. I think our role will be to design algorithms and come up with creative solutions/hacks (which would be difficult for a program), and to design a workflow/flow chart and feed it into a program which spits out binaries and flags edge cases. A whole swath of industries and economies (read outsourcing) will become redundant, and the only outsourcing done would be to the generator.

[0]: https://www.youtube.com/watch?v=taaEzHI9xyY

raverbashing11y ago

Who do you think will write the acceptance tests? (to be honest they're sometimes more complex than the code itself. E.g. write the acceptance tests for x=a/b for a and b as inputs )

I'm all for it, it's going to be a productivity gain. It's like going from a manual screwdriver to a motorized one.
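
To make the x = a/b point concrete, here is a minimal Python sketch of what such acceptance tests might look like. The `divide` function and the chosen cases are assumptions for illustration, not a complete specification; even so, the tests already outgrow the one-line implementation.

```python
import math


def divide(a, b):
    """Hypothetical function under test: x = a / b."""
    return a / b


def run_acceptance_tests():
    # Ordinary cases, including sign handling.
    assert divide(6, 3) == 2
    assert divide(-6, 3) == -2
    assert math.isclose(divide(1, 3) * 3, 1.0)

    # Division by zero must be rejected rather than return a value.
    try:
        divide(1, 0)
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError("expected ZeroDivisionError")

    # Special floating-point inputs propagate as IEEE 754 prescribes.
    assert math.isnan(divide(float("nan"), 1))
    assert math.isinf(divide(float("inf"), 2))
    return True
```

Someone still has to decide what "correct" means for zero, NaN and infinity, which is the part a generator cannot guess from the formula alone.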

speechduh11y ago

That particular stuff is actually pretty typical. I have a textbook that shows similar results on Shakespeare using N-grams from years ago.

gipp11y ago

Capturing writing style with ngram-based input and individual-character input are very, very different tasks. That's several ballparks higher in difficulty.

With ngrams, Markov models are perfectly sufficient. With individual characters, complex concepts need to be remembered across many, many characters of input.
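
For comparison, a word-level n-gram generator really is only a few lines. This is a rough Python sketch under my own assumptions (whitespace tokenization, hypothetical names `build_word_model` and `generate_words`), not the method from any textbook mentioned here.

```python
import random
from collections import defaultdict


def build_word_model(text, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        model[context].append(words[i + order])
    return model


def generate_words(model, seed, length, rng):
    """Extend `seed` (a tuple of words) by sampling followers of the last context."""
    out = list(seed)
    order = len(seed)
    for _ in range(length):
        followers = model.get(tuple(out[-order:]))
        if not followers:
            break  # context never seen in training text
        out.append(rng.choice(followers))
    return " ".join(out)
```

The model never has to learn spelling or where words end, because every unit it emits is already a whole word copied from the corpus.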

tormeh11y ago· 4 in thread

I've thought a bit about RNNs, and I can see an obvious problem: Fixed amount of memory.

Is there any chance someone's come up with an RNN that has dynamic amounts of memory?

exgrv11y ago

There is this paper by Joulin & Mikolov: Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets (http://arxiv.org/abs/1503.01007).

In this case, the memory of the RNN is an ensemble of differentiable stacks.

varelse11y ago

There's a huge degree of data re-use in the weights. This should be exploited.

Second, one could envision paging the hidden units back to system memory on a coprocessor-based implementation (GPUs/FPGAs/not Xeon Phi, gag me). 256 GB servers are effectively peanuts these days relative to developer salaries and university grants (datapoint: my grad school work system was ~$100K in 1990 dollars) so unless you're trying to create the first strong AI, I don't think this is a serious constraint.

Good luck with that no matter what Stephen Hawking, Elon Musk, and Nick Bostrom harp on about: we have no idea what the error function for strong AI ought to be and even if we did, it's over a MW using current technology to achieve the estimated FLOPS of a human cerebrum.

tormeh11y ago

I meant that the state vector has constant size and just setting it at the maximum available might give you problems with training.

1 more reply

p1esk11y ago

Look up Neural Turing Machines: connecting neural networks to external banks of memory.

rsp198411y ago· 3 in thread

I wonder what would happen if you train an RNN like described with, say, the scores of all of Mozart's Chamber Music and then let it generate new music from the learned pieces. How would it sound? Would it figure out beat? Chords? Harmonies? Might it even sound a bit like Mozart?

kastnerkyle11y ago

The work of Nicolas Boulanger-Lewandowski was extensively focused on this topic, see his work [1]. He wrote a Theano deep learning tutorial on this topic [2], and several people (Kratarth Goel) [3][4] have advanced the work to use LSTM and deep belief networks.

For a brief while RNN-NADE made an appearance as well, though I do not know of an open source implementation.

There are also a few of us who are working on more advanced versions of this model for speech synthesis, versus operating on the MIDI sequence. Stay tuned in the near future!

I can say from experience that some of the samples from the LSTM-DBN are shockingly cool, and drove me to spend about a week using K-means coded speech. It made robo-voices at least but our research moved past that pretty fast.

[1] http://www-etud.iro.umontreal.ca/~boulanni/ [2] http://deeplearning.net/tutorial/rnnrbm.html [3] http://arxiv.org/pdf/1412.6093.pdf [4] https://github.com/kratarth1203/NeuralNet/blob/master/rnndbn...

JonnieCache11y ago

Is the robot-voice code published anywhere?

You can make money out of that kind of thing btw!

https://soniccharge.com/bitspeek

(Obviously not the same thing but the point is that silly robo-voice code is marketable :)

Nadya11y ago

There's a few such projects in existence. Perhaps not RNN-Mozart inspired, but I'm sure that exists too.

Emily Howell

https://www.youtube.com/watch?v=QEjdiE0AoCU

Here's a Bach-inspired computer-generated song:

https://www.youtube.com/watch?v=PczDLl92vlc

dools11y ago· 3 in thread

Web spam 2.0:

1) Take the entire works of several popular content creators in a given field, complete with links out to articles etc.

2) Concatenate them into a single file

3) Train this thing to generate new articles

4) Create a map of popular articles that other people have written, to articles you have written on similar topics

5) Replace the originals with your articles

6) Publish millions of articles that can't be detected as spam automatically by Google

It's like bot wars: Spammers can train their robots to try and defeat Google's robots.

tluyben211y ago

They have been doing this for years and years with Markov chains, and it works if you have content farms (100,000 URLs across different IPv4 ranges). Usually Google weeds them out after a while, but as it is all automated it works. It's tricky business as Google gets better and better, but it still works and people are making a lot of money with it.

stefs11y ago

well, i don't see how they - the spammers - would fake google's valuation system of valuing incoming links from valuable sources. it's not like many valuable sites outside this relatively insular system would link to those generated nonsense pages. that'd practically create an insular babblenet that could be relatively easily identified.

i mean, it's not like that's exactly what's happening right now.

dools11y ago

Okay so in the system I'm hypothesising, I pick a topic -- say content marketing. I go to Neil Patel's and KissMetrics blog and get all their articles on content marketing, and train this thingy with them.

I then buy, say, 1,000 domains. Doesn't matter what they are -- Or I buy 100 domains and setup 300 tumblr blogs, and 300 blogger blogs and 300 wordpress.com blogs.

Now I drip feed content to each of those blogs, but instead of linking to the articles on content marketing that kissmetrics and neil patel originally reference, I link to articles I have created instead.

How can Google tell the difference between a tonne of nobody bloggers linking to Neil Patel's articles, and my bots linking to my articles? The fact is that if you blog on niche topics, with good article titles reflecting low competition long tail keywords, you'll get some traffic from Google pretty easily -- how can Google possibly tell that links are coming from shitty bot generated pages versus from a tonne of obscure bloggers with virtually no audiences (of which there are thousands)?

The way they can tell the difference is Panda (or Penguin? I think it's Panda ... ) so as long as your pet robot can learn from Neil Patel and Kissmetrics well enough to produce content that cannot be penalised by Panda, and so long as you don't do it stupidly by like, having the same anchor text for all the articles and doing 1,000 articles overnight and actually phase it in so that it looks as though you're getting some reasonable organic spread, you'll be able to game Google's rankings pretty reliably for your real articles that you're trying to promote, and get higher volumes of traffic to those articles than you would be able to by just focusing on niche, long tail articles (for example because you'd be able to get on page #1 or in the top 5 for much higher volume keywords).

You would then get shares etc. for your actual content -- just because those "spam farms" don't have social shares or backlinks from PR6 blogs doesn't mean Google completely disregards them, just means that you need a lot more of them to make the same impact as lots of shares/backlinks from PR6 blogs.

This strategy is old, and was killed by Panda, but if you could beat Panda using an RNN then this would work again.

noahmbarr11y ago· 3 in thread

Would the returned samples from PG/Shakespeare/Wikipedia examples be of higher quality if you used a word-level language model instead of character model with similar parameters?

I was curious whether the overhead of learning how to spell words (vs. a pure task of sentence construction with word objects) outweighs the reduction in sample set size.

(Awesome article for a RNN newbie)

fpgaminer11y ago

Karpathy states in the blog post that word-level models currently tend to beat character models, across the broad field of NLP related RNNs. But he argues that character models will eventually overtake (much in the same way that ConvNets have "replaced" manual feature extraction).

That said, I think the RNNs here are limited by the corpus. They need to be exposed to more writing. Even if all you want is a Shakespeare generator, you still need to expose it to other literature. That will give it greater context, and more freedom of expression and, dare I say, creativity. I mean, imagine if all you were exposed to your whole life was Shakespeare. Nothing else (no other senses). Even with your superior mind, I doubt you'd generate anything better than what this RNN spits out.

So yeah, it needs a large corpus to build a broader model. Then we need a way to instruct the broadly trained RNN to generate only Shakespeare-like text. Perhaps by adding an "author" or "style" input.

kylebgorman11y ago

I fail to see how word-based models are character-based models with manual feature extraction. Word boundaries are read directly from deterministically tokenized inputs.

And, as I mentioned upthread, it has been known for about ten years, long before the current neural net revival, that high-order character-based models are competitive with word-based models (at least in terms of perplexity).

kylebgorman11y ago

"Old-school" Markovian language models (the vast majority of what's being used in production today) are mostly word-based but for text applications with tons of data, high-order character models are competitive with word-based models. (http://www.aclweb.org/anthology/W05-1107)

waterlesscloud11y ago· 2 in thread

Side note: The title is in reference to this famous paper from 1960- http://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness_...

The form of the title has become a common trope.

coverup11y ago

"Unreasonable Effectiveness Considered Harmful"

pizza11y ago

'Considered Harmful Essays' Considered Harmful: http://meyerweb.com/eric/comment/chech.html

ux-app11y ago· 2 in thread

I'm an absolute layman with regard to AI, so I'd be keen to hear some explanations with regard to the possibility of creating strong AI in silicon.

Might there be properties of our biological brain that silicon can't capture? Is this related to the concept of computability? I'm not suggesting that there is a spiritual or metaphysical component to thinking. I'm not, I'm a materialist through and through. I just wonder if maybe there is some component of non-deterministic behavior occurring inside a brain that our current silicon-based computing does not capture.

Another way to ask this is will we need to incorporate some form of wetware to achieve strong AI?

aamar11y ago

These are not fully settled questions, though the answer is probably no.

Most researchers believe that brains are Turing machine equivalent, therefore can be simulated by any other equivalents. Even Gödel believed this, though he believed the mind had more capabilities than the brain.[1] As a materialist, you would share the commonly-accepted view and reject his latter claim.

There is a small minority of philosophers and physicists who believe that there are meaningful quantum reactions happening in the brain, distinguishing them from classical computers.[2] Some recent computer simulations have shown this to be plausible, but the general impression is that it seems unlikely, and we don't have specific evidence of effects of this sort.

Quantum effects of certain sorts are computationally infeasible to simulate with classical computers. And it's theoretically plausible that such effects cannot be reproduced at scale with in-development quantum computer technology, and are only practical with organic chemistry, but again, this is quite a minority view.

It's also possible that classical brain features, such as its massive concurrence or various clever algorithms, prove difficult to replicate or simulate. If these are easy problems to solve, then strong AI may arrive in decades; if very difficult, centuries. In the latter case, it seems plausible that incorporating wetware would be a useful shortcut. But there's good reason to believe that the practical disadvantages of wetware (e.g. keeping it alive, coordinating with its slow "clock speed") overwhelm the computational conveniences.

[1] http://www.hss.cmu.edu/philosophy/sieg/onmindTuringsMachines...

[2] http://en.wikipedia.org/wiki/Quantum_mind

ux-app11y ago

Thank you for the detailed response. I'm looking forward to digging into the links you posted.

> There is a small minority of philosophers and physicists who believe that there are meaningful quantum reactions happening

I wonder why this is a minority view. Bear in mind that I am an armchair scientist, but I recall reading that meaningful quantum effects are responsible for the efficiency of photosynthesis. It seems quite plausible (due to the electro-chemical nature of brain functioning) that there might be similar effects present in the brain.

Fascinating stuff.

clickok11y ago· 2 in thread

I love stuff like this, and I think "unreasonable" is almost an understatement.

It's "unreasonable" mainly because it occasionally captures subtle aspects of the data source for "free". If you've worked with procedurally generated content, Markov chains, and so on, you probably have had to perform a few tweaks in order to get plausible results[1]. From the article, an excerpt of the output from an RNN trained on Shakespeare:

  Second Lord:
  They would be ruled after this chamber, and
  my fair nues begun out of the fact, to be conveyed,
  Whose noble souls I'll have the heart of the wars.

  Clown:
  Come, sir, I will make did behold your worship.

  VIOLA:
  I'll drink it.

Sure, the individual blocks are similar to what you'd get from a Markov text generator-- but it gets that after a full stop, there comes a newline, a new character name, and a new text block. To my eyes, this is a qualitative leap in performance. It suggests that the model has figured out some things about the data stream that you'd normally have to add in by hand[2].

It's also unreasonable that the same framework works well for so many different data sources. My experience with other generative methods has been that they were fragile and prone to pathological behaviour, and that getting them to work for a specific use case required a bunch of unprincipled hacks[3]. It used to be that when a talk started to veer towards generative models, I'd start looking around the room, wondering whether I could survive the drop from any outside-facing windows. But with RNNs using LSTM (or neural Turing machines!) you can consider incorporating a generative model in the solution you're putting together without having to spend a huge chunk of time massaging it into usefulness and purchasing time on a supercomputer[4].

1. I once wrote a quick Reddit bot with the aim of learning to repost frequent highly upvoted comments and trained it using a simple k-Markov model... it was not good at first, and in order to get it to work I had to do a lot of non-fun stuff like sanitizing input and adding heuristics for when/where to post, and at the end it was mediocre.

2. Alex Graves (from DeepMind) has a demo about using RNNs to "hallucinate" the evolution of Atari games, using the pixels from the screen as inputs. It's interesting because it shows that same sort of tendency to capture the subtle stuff: https://youtu.be/-yX1SYeDHbg?t=2968

3. As in occult knowledge and rules-of-thumb, but you might also read this as a double entendre about myself and my colleagues.

4. Well, you still might need an AWS GPU instance if you don't have a fancy graphics card.

jameshart11y ago

The Shakespeare generator isn't just reproducing the syntactic structures, it occasionally seems to capture meter. The samples you've reproduced here aren't iambic, but they are around ten or eleven syllables per line, which is impressive enough in itself. In the longer passages, it manages some proper iambic pentameter:

   My power to give thee but so much as hell:
   Some service in the noble bondman here

It doesn't seem to have managed to pick up on rhyming couplets, though.

A quick search of Shakespeare's corpus also shows that Shakespeare never called a bondman 'noble'; there must be some conception of parts of speech being captured by the RNN, to enable it to decide that 'bondman' is a reasonable word to follow 'noble'.

So yes, "unreasonable" seems about right.

ryukafalz11y ago

I'd imagine the lack of rhyme is likely due to the fact that English pronunciation is ambiguous. Given only the text, it would have no way of picking up the fact that, say, "here" and "beer" rhyme, while "there" does not.

(Put another way, English text is a lossy representation of English speech.)

Perhaps if you were to feed the IPA representation of each word in alongside the text, the RNN would do a bit better, though admittedly I'm not sure how you would do so.

If this is the case, I'd imagine training it against Lojban text would see similar results.

1 more reply

cs70211y ago· 1 in thread

Nice. Andrej Karpathy deserves some kind of award for demystifying deep learning and making the subject so accessible to a wider audience. If you're a developer who knows little about the subject and want to learn more, a great starting point is the home page for his ConvNetJS project.[1]

[1] http://cs.stanford.edu/people/karpathy/convnetjs/

choppaface11y ago

And if you're more comfortable with Python, I strongly recommend the CS231n assignments / labs: http://cs231n.github.io/

Assignments 1 and 2 alone give a solid intro to implementing these algorithms, and the lab-oriented iPython-based format gives you a very high probability of writing a correct implementation even if you're clueless at the start.

Patryk11y ago· 1 in thread

This same thing (i.e., using recurrent neural networks to predict characters (and even words)) was done by Elman in 1990 in a paper called "Finding Structure in Time"[1]. In that paper, Elman goes several steps further and carries out some analysis to show what kind of information the recurrent neural network maintains about the on-going inputs.

It's an excellent read for anyone interested in learning about recurrent neural networks.

[1] http://crl.ucsd.edu/~elman/Papers/fsit.pdf

sushirain11y ago

It's amazing how much was already known decades ago. Elman and others did much more, and hopefully, now the field will take the next step (which was long delayed), with the help of today's computer power.

snikeris11y ago· 1 in thread

In the spirit of:

https://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.htm...

http://www.researchgate.net/profile/Derek_Abbott/publication...

iyn11y ago

Very short video about the topic: https://www.youtube.com/watch?v=ZBkzqLJPkmM

maaaats11y ago· 1 in thread

Isn't the author's definition of RNNs wrong?

I thought the difference is that a RNN allows connection back to previous layers, compared to a feed-forward net. Not this talk about "fixed sizes" and "accepting vectors". Or am I wrong?

fpgaminer11y ago

Karpathy usually talks about machine learning topics from multiple viewpoints, and usually (in my experience with his writings) prefers more loose, non-traditional interpretations (that ultimately lead to better understanding of the underlying mechanics of the approach).

In this case, his point was that one way RNNs differ from FFNNs is their ability to accept arbitrarily sized inputs and generate arbitrarily sized outputs. That's pretty important, which is likely why he emphasizes it.

But the rest of the article shows the salient point; RNNs are NNs that hold a state vector.

Saying that RNNs are NNs that allow connections back to previous layers is true, but that's only one way of looking at it. Holding state is another, since it implies backwards connections. Feedback is another term. And because they have backwards connections, state, feedback, etc., they also possess the capacity to handle non-fixed sized inputs and outputs.

In summary; it's different viewpoints of the same mathematical object. Karpathy focuses on the ability of RNNs to handle arbitrarily long inputs and outputs, because that's something FFNNs cannot do.

mikecmpbll11y ago· 1 in thread

This is my deep learning enlightenment moment. 22/05/15

phyalow11y ago

Me too, mesmerised.

efnx11y ago· 1 in thread

I've only read the first section but it seems RNNs are very close in concept to Mealy machines.

http://hackage.haskell.org/package/machines-0.4.1/docs/Data-...

> They accept an input vector x and give you an output vector y. However, crucially this output vector's contents are influenced not only by the input you just fed in, but also on the entire history of inputs you've fed in in the past.

teraflop11y ago

If it helps, you can think of a RNN as being analogous to a finite state machine. But instead of a single discrete state, it's a continuous, high-dimensional vector. That has the extremely important effect that the output is a continuous function of the input, which is necessary for training using gradient descent.
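
The "state machine with a continuous state" picture can be sketched in plain Python. This is a toy forward pass only, with random untrained weights; the names `make_rnn`, `rnn_step`, `W_xh` and `W_hh` are my own labels, and the tanh update is the usual vanilla-RNN form, assumed here rather than taken from any particular implementation.

```python
import math
import random


def make_rnn(input_size, hidden_size, seed=0):
    """Create small random weights for a toy vanilla RNN (no training here)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": mat(hidden_size, input_size),   # input -> hidden
        "W_hh": mat(hidden_size, hidden_size),  # hidden -> hidden (the recurrence)
        "b": [0.0] * hidden_size,
    }


def rnn_step(params, h, x):
    """State transition: new_h = tanh(W_xh x + W_hh h + b)."""
    new_h = []
    for i in range(len(h)):
        total = params["b"][i]
        total += sum(w * xj for w, xj in zip(params["W_xh"][i], x))
        total += sum(w * hj for w, hj in zip(params["W_hh"][i], h))
        new_h.append(math.tanh(total))
    return new_h


def run(params, inputs):
    """Fold a sequence of any length into one fixed-size state vector."""
    h = [0.0] * len(params["b"])
    for x in inputs:
        h = rnn_step(params, h, x)
    return h
```

Every operation in the transition is smooth, so nudging an input nudges the state; a lookup table of discrete states would not give gradient descent anything to follow.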

lqdc1311y ago· 1 in thread

Does anyone know if these are/can be good for named entity recognition? I am stuck implementing second order CRFs right now for the lack of a good implementation, and this seems a lot easier.

syllogism11y ago

I'm not aware of any strong RNN results for NER, no.

You'd probably find the paper here: http://aclweb.org/anthology/ (everything in CL is open access). You want the proceedings of CL, TACL, ACL, EMNLP, EACL, and NAACL. Don't bother with the workshops.

higherpurpose11y ago· 1 in thread

If neural networks are the way to build strong AI and neural nets are all about optimization, wouldn't a quantum computer be ideal to power an AI? (assuming we can get one to work)

Houshalter11y ago

I don't think so. NNs have millions of parameters, and making a quantum computer that large, and with that many complex interactions, would be very difficult.

Optimization of NNs isn't really that bad. Stochastic gradient descent is extremely powerful and roughly linear with the number of parameters, possibly better.
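
As a rough illustration of the "linear in the number of parameters" claim, here is a minimal stochastic gradient descent sketch in Python for a plain linear model with squared error. It is not a neural network, and `sgd_fit` with its defaults is a hypothetical helper; the point is only that each update touches every parameter once.

```python
import random


def sgd_fit(data, n_params, lr=0.05, epochs=300, seed=0):
    """Fit y ~ sum(w[i] * x[i]) by per-example gradient steps on squared error."""
    rng = random.Random(seed)
    w = [0.0] * n_params
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": visit examples in random order
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # One pass over the parameters per example: O(n_params) work.
            for i in range(n_params):
                w[i] -= lr * err * x[i]
    return w
```

For example, fitting points generated from y = 2*x0 - 3*x1 should recover weights close to 2 and -3.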

fdej11y ago

"This sample from a relatively decent model illustrates a few common mistakes. For example, the model opens a \begin{proof} environment but then ends it with a \end{lemma} ... By the time the model is done with the proof it has forgotten whether it was doing a proof or a lemma. Similarly, it opens an \begin{enumerate} but then forgets to close it."

Ah, so strong AI is finally here. A computer program that makes just the same mistakes as humans when writing in TeX.

danans11y ago

I'm curious to know if, since these networks can learn syntax, whether they can also be re-purposed as syntax checkers, not just syntax generators. That is, can the syntactical knowledge learned by these models be run in a static classification mode on some input text to recognize the anomalies within and suggest fixes.

j2kun11y ago

What's unreasonable about neural networks (in general, not just recurrent ones) is that we don't really have any theoretical understanding of why they work. In fact, we don't even really understand what sorts of functions neural networks compute.

mangeletti11y ago

Imagine a conversion-optimizing genetic algorithm for spam (web and/or email) generation, using a tool like this (e.g., when users perform the intended actions, DNA is passed on to the next iteration).

That would be one positive feedback loop to rule them all.

stcredzero11y ago

So, if Neural Networks can be thought of as just an optimized way of implementing unreasonably large dictionaries, Recurrent Neural Networks could be thought of as an optimized way of implementing unreasonably large Markov chains.

wonderingwhere11y ago

this is quite possibly the most interesting item I've read on HN

viraptor11y ago

I found the learning progress great. I was thinking some time ago how to generate English-sounding words which don't exist. Well, here they are: (from iteration 700)

Aftair, unsuch, hearly, arwage, misfort, overelical, ...

(although I admit, some of them may be just old words I haven't heard of before)

oggy11y ago

In all the examples on the page, the RNN is first trained and then used to generate the text. Is there a way to use RNNs for something interactive? For instance, can one train an RNN to mimic Paul Graham in a discussion, and not only in writing an essay?

hgibbs11y ago

I did have a bit of a chuckle when they got to Algebraic Geometry. That's incredible.

evc12311y ago

Someone should train an RNN on neural network source code to see if it's possible to get neural networks to generate neural networks.

divs121011y ago

This felt like watching Ex Machina. Thanks a lot, this was extremely informative and super fun.

thewarrior11y ago

I have a dumb question. How is a recurrent neural network different from a Markov Model ?

jgmmo11y ago

Very neat, and funny article. I love the PG generator.

swalsh11y ago· 17 in thread

Really cool stuff here.

wayfarer2s11y ago

Houshalter11y ago

There is also evidence that they can learn very high level knowledge about words and objects. See the success of word vectors: http://technology.stitchfix.com/blog/2015/03/11/word-is-wort...

tim33311y ago

> when I think of the word "circle", images of circles and spheres show up in my head

hellameta11y ago

https://web.media.mit.edu/~minsky/papers/MusicMindMeaning.ht...

guybrushT11y ago

As a father, and as someone interested in this discussion (about "child learning to talk"), I think you will love Prof Deb Roy's insights into how his infant son learned language.

http://www.ted.com/talks/deb_roy_the_birth_of_a_word?languag...

digikata11y ago

tzs11y ago

I don't recall for sure what the linguists who responded said, but I think they all said the child would not learn Spanish from this.

[1] I have no recollection of where this was.

3 more replies

dwiel11y ago

1 more reply

jpatokal11y ago

The TL;DR appears to be "no", interaction is necessary:

https://en.wikipedia.org/wiki/Language_acquisition#General_a...

Although for obvious reasons this is very hard to study experimentally:

https://en.wikipedia.org/wiki/Language_deprivation_experimen...

1 more reply

solve11y ago

Seems it would be far harder to infer the basic initial structure from just plain text.

frozenport11y ago

Umm. When you train you train against a cost function...

1 more reply

cristianpascu11y ago

NathanKP11y ago

4 more replies

seiji11y ago

> A machine will never get the meaning of a word, unlike a very small child.

Why not? Your brain isn't magic, just highly associative. We can do the same thing with computers real soon now.

2 more replies

btown11y ago

1 more reply

hanspeter11y ago

What makes the human brain not a machine?

1 more reply

gnaritas11y ago

> A machine will never get the meaning of a word

That's an irrational and indefensible position.

narrator11y ago· 9 in thread

dimatura11y ago

imaginenore11y ago

But some engineer out there understands how your phone works.

With neural nets NOBODY really understands how they work.

3 more replies

relate11y ago

pjc5011y ago

the people in charge will not give control over to a system that makes important decisions without explaining why

They will if it gives the answers they want to hear. History is full of critical decisions based on ridiculous pretexts or unclear processes.

scarmig11y ago

That's the (morally neutral) wonder of the market--it'll beat ideological or emotional objections into the ground, for better or for worse.

And sooner or later, someone might start a company where all decision making is performed by a neural net...

__Joker11y ago

btown11y ago

> the people in charge will not give control

Eliezer Yudkowsky would likely disagree with you: http://www.yudkowsky.net/singularity/aibox

EDIT: Also - http://www.explainxkcd.com/wiki/index.php/1450:_AI-Box_Exper...

m-i-l11y ago

chronolitus11y ago

What if an AI saves money, though? ( i.e. is cheaper than hiring a real person for a simple task )

" Never! Companies would never sacrifice principle and safety to save money! "

We'll see...

tshadwell11y ago· 8 in thread

[1] http://thinkzone.wlonk.com/Gibber/GibGen.htm

darkmighty11y ago

new29911y ago

The state of the art in compressing wikipedia is 1.278bits (on a certain subset) [1]. So that does seem pretty good.

[1] http://prize.hutter1.net/

1 more reply

samizdatum11y ago

  Ther deat is more; for in thers that undiscorns the unwortune, 
  the pangs against a life, the law's we know no trave, the hear, 
  thers thus pause.

The only reason why it seems like the model can occasionally spell, and create anglo-sounding neologisms, is because it operates on 4-grams.

Here's some character-by-character output from the same Markov Chain model.

  T,omotsuo ait   pw,, l f,s teo efoat t hoy tha fm nwo   
     bs rs a h enwcbr lwntikh  wqmaohaaer ah es aer 
  mkazeoltl.etnhhifcmfeifnmeeoddssmusoat irca   
  do'ltyuntos sih i etsoatbrbdl
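
The gap between the two samples above comes down to how much context the chain conditions on. A minimal Python sketch of a character-level Markov chain with a configurable order follows; the function names are hypothetical, and I am assuming "4-grams" means three characters of context plus the predicted one (order 3 here), with order 0 standing in for the context-free sampling.

```python
import random
from collections import defaultdict


def build_char_model(text, order):
    """Map each `order`-character context to the characters that followed it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def sample_chars(model, order, seed_text, length, rng):
    """Extend seed_text (at least `order` characters long) one character at a time."""
    out = seed_text
    for _ in range(length):
        context = out[len(out) - order:]  # empty string when order is 0
        followers = model.get(context)
        if not followers:
            break  # context never seen in training text
        out += rng.choice(followers)
    return out
```

With order 0 every character is drawn from overall letter frequencies, which gives soup like the second sample; raising the order buys local spelling but nothing beyond that window.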

mellavora11y ago

"do'lty untos sih i"

maybe the computer was drunk?

murbard211y ago

It balances parentheses and keeps track of other long-range dependencies, something Markov chain implementations cannot do.
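
A small sketch of why a fixed-order chain struggles here, assuming an order-4 character model: two prefixes with very different nesting depths look identical through a four-character window, so the chain has no way to know how many closing parentheses are still owed. The helper names are made up for the illustration.

```python
def markov_context(text, order):
    """All an order-`order` Markov chain can see: the last `order` characters."""
    return text[len(text) - order:]

def open_depth(text):
    """What a model with running state could track: the current nesting depth."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return depth

shallow = "f(" + "(" * 3 + "x"   # 4 parentheses still open
deep = "f(" + "(" * 9 + "x"      # 10 parentheses still open
order = 4

same_context = markov_context(shallow, order) == markov_context(deep, order)
print(same_context, open_depth(shallow), open_depth(deep))  # True 4 10
```

Raising the order only moves the problem: some nesting depth will always exceed the window.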

pohl11y ago

Welcome to the unbearable forced-ness of titles. Everyone's making a nod to Milan Kundera these days.

jsprogrammer11y ago

http://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness_...

coldtea11y ago

First, it's not a nod to Kundera, but to a classic math-related work that predates Kundera's book.

Second, even if it was, really? As if we see plays on Kundera titles regularly on the web?

fpgaminer11y ago· 8 in thread

kylebrown11y ago

> That mammalian brains are motivated by optimization of energy expenditure. Like anything in biologically evolved systems, energy efficiency is key, since food is often scarce.

> how do mammalian brains problem solve using just thoughts.

They don't. Sensory input is required for brains to learn new classes of problems.

> find a way to train RNNs that are able to adapt to new problems

Is this something different than multi-task learning?

Lambdanaut11y ago

> They don't. Sensory input is required for brains to learn new classes of problems.

Sensory input is required to gain the knowledge, but then you can just as easily muse over your gained knowledge for further insights in a sensory deprivation chamber as you can in a classroom.

seiji11y ago

> feed forward networks do not have immediate state. Not in any relevant sense.

> So therefore they can never have thought.

The extra oomph of providing actual agency and intent and desire to the model is an exercise left to the reader.

> (which should be a slower, longer term process).

Sleep is a requirement of all things with neural network based brains as far as we know.

murbard211y ago

A RNN can, in particular, implement a GOFAI algorithm. I think that's what we basically learn for the first 5-7 years of our lives by analyzing other people's behavior, communicating, etc.

spin11y ago

Because of this, I am very excited about DQN. (I do think that it's only a building block towards building a self-aware brain, though.)

shock-value11y ago

0xdeadbeefbabe11y ago

From what I can tell the RNN in question already has mammalian intelligence, but also a weakness for PG, a phase that it will emerge from in 100 years.

morenoh14911y ago

nice dump 👍

pcmonk11y ago· 7 in thread

The code generator is awesome. There's hardly a syntax error. The file headers are the best.

MrZongle211y ago

If nothing else, the RNN would be great for generating bogus source code for use in television programs and movies.

It would drive those who attempt to understand & reference it absolutely crazy. :D

zxyzzxxx11y ago

The code is nonsense. Their method is good for fuzzy logic like recognition, but this approach with code will never work for anything other than an art project.

teraflop11y ago

Currently it doesn't work, but saying it'll never work is pretty strong.

nomel11y ago

joelthelion11y ago

Could be interesting to plug this kind of generator into American Fuzzy Lop.

simias11y ago

There's not a lot of floating point in the kernel though.

ChuckMcM11y ago

Yes, and feed it into hackertyper.net and you can entertain an 8-year-old for hours :-)

0xdeadbeefbabe11y ago· 7 in thread

Jtsummers11y ago

This is an old problem in AI. Chess was an AI problem, until a computer beat a grandmaster. Vision was an AI problem, now we have OpenCV. Many AI problems get shifted out of "AI" once they're solved.

TheLoneWolfling11y ago

It stems from our definition of an AI.

An AI is a computer doing those things a computer cannot do. As such, anything that a computer cannot do isn't AI, and anything a computer can do isn't AI either.

im3w1l11y ago

dumitrue11y ago

deepnet11y ago

Learning to do something is an AI problem.

Writing a program to play Chess is not AI but doing so has helped figure learning out.

Ma8ee11y ago

Can a submarine swim?

0xdeadbeefbabe11y ago

This one can: https://www.youtube.com/watch?v=GGrWHlAm7zM (cartoon submarine character TV show) because it has large human like eyes.

TheLoneWolfling11y ago· 6 in thread

My question, and something this doesn't get into, is this: how do you train a RNN?

deepnet11y ago

Backpropagation suffers from vanishing gradients on very deep neural nets.

Recurrent Neural Nets can be very deep in time.

Or the weights could be evolved using Genetic Programming.

raverbashing11y ago

> Backpropagation suffers from vanishing gradients on very deep neural nets.

Especially when using saturating functions (tanh/sigmoid)
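
A quick numeric sketch of that effect, assuming the simplest possible recurrence, a scalar `h_t = tanh(w * h_{t-1})` with made-up values for `w` and `h0`. The gradient of the last state with respect to the first is a product of one factor per time step, and each factor is at most `|w|` and much smaller once tanh saturates.

```python
import math

def gradient_through_time(w, h0, steps):
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule: tanh'(z) = 1 - tanh(z)^2
    return grad

for steps in (1, 10, 50):
    mild = gradient_through_time(w=0.9, h0=0.5, steps=steps)
    saturated = gradient_through_time(w=2.0, h0=0.5, steps=steps)
    print(steps, mild, saturated)
```

With `w=2.0` the state is pushed toward the flat part of tanh, so the per-step factor drops well below 1 and the product collapses much faster than with `w=0.9`.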

> Or the weights could be evolved using Genetic Programming

GA, not GP http://en.wikipedia.org/wiki/Genetic_algorithm

warfangle11y ago

> Or the weights could be evolved using Genetic Programming.

0. http://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_to...

skorgu11y ago

It would be interesting to occasionally train the generated C against a compiler.

egocodedinsol11y ago

There are various ways, but one is to unroll the network over some number of timesteps and treat it as a regular NN. You might find this helpful:

http://minds.jacobs-university.de/sites/default/files/upload...
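
To make the unrolling idea concrete, here is a minimal sketch of backpropagation through time for a one-unit RNN, assuming the recurrence `h_t = tanh(w * h_{t-1} + u * x_t)` and a squared-error loss on every step. The weights, inputs and targets are invented for the example, and this is not taken from the linked tutorial.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Unroll the recurrence and keep every hidden state."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, ys):
    """Sum of squared errors between each hidden state and its target."""
    hs = forward(w, u, xs)
    return 0.5 * sum((h - y) ** 2 for h, y in zip(hs[1:], ys))

def bptt(w, u, xs, ys):
    """Walk the unrolled network backwards, accumulating gradients for w and u."""
    hs = forward(w, u, xs)
    dw = du = 0.0
    dh_next = 0.0  # gradient arriving at h_t from step t + 1
    for t in range(len(xs), 0, -1):
        dh = (hs[t] - ys[t - 1]) + dh_next
        dz = dh * (1.0 - hs[t] ** 2)  # back through tanh
        dw += dz * hs[t - 1]           # the same w is reused at every step
        du += dz * xs[t - 1]
        dh_next = dz * w
    return dw, du

xs = [1.0, -0.5, 0.25, 0.8]
ys = [0.5, 0.0, -0.2, 0.3]
w, u = 0.7, -0.4
dw, du = bptt(w, u, xs, ys)
print(dw, du)
```

The cost is one forward and one backward pass over the sequence, so it grows linearly with the number of unrolled steps rather than exploding.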

TheLoneWolfling11y ago

That sounds rather absurdly computationally expensive.

Thanks for the link, I'll take a look.

Smerity11y ago· 5 in thread

[1]: http://cs231n.stanford.edu/

[2]: https://www.youtube.com/watch?v=xKt21ucdBY0

[3]: One of the earliest - "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" http://nlp.stanford.edu/pubs/SocherLinNgManning_ICML2011.pdf

wonderingwhere11y ago

> he spends much of his time demystifying the field itself, which is a brilliant way to encourage others to explore it :)

Yup. This is the first time I understood someone from this field. Honestly, this dude just broke down the wall.

What's more important, passion flows through his writing. And it can be felt. I got so excited while reading it.

pigscantfly11y ago

cOgnaut11y ago

Read over [1] and am currently watching [2], and I really can't get over a not insignificant bit of dissonance:

(a) He seems to be very intelligent. Kudos. But…

firethief11y ago

Our generation is going to get old and feeble and eventually die. If we have children, they'll completely supplant us.

Lambdanaut11y ago

Good for the entrepreneurs that invent them.

myth_buster11y ago· 5 in thread

seiji11y ago

> Linux code etc

myth_buster11y ago

[0]: https://www.youtube.com/watch?v=taaEzHI9xyY

raverbashing11y ago

Who do you think will write the acceptance tests? (to be honest they're sometimes more complex than the code itself. E.g. write the acceptance tests for x=a/b for a and b as inputs )
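
As a rough illustration of how the checks outgrow the code, here is a sketch of acceptance checks for a one-line `divide` function. The function name and the particular cases are my own assumptions about what "acceptable" means; a real spec would have to decide things like whether dividing by zero should raise or return infinity.

```python
import math

def divide(a, b):
    """Hypothetical function under test: returns a / b for numeric inputs."""
    return a / b

def run_acceptance_checks():
    """Return a dict of check name -> bool; every value should be True."""
    checks = {
        "exact quotient": divide(6, 3) == 2,
        "negative numerator": divide(-6, 3) == -2,
        "zero numerator": divide(0, 5) == 0,
        "inexact quotient": math.isclose(divide(1, 3) * 3, 1.0),
        "nan propagates": math.isnan(divide(float("nan"), 2)),
        "inf over inf is nan": math.isnan(divide(float("inf"), float("inf"))),
    }
    try:
        divide(1, 0)
        checks["zero divisor rejected"] = False
    except ZeroDivisionError:
        checks["zero divisor rejected"] = True
    return checks

results = run_acceptance_checks()
print(results)
```

One line of implementation, seven checks, and overflow, underflow and signed zero are not even covered yet.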

I'm all for it, it's going to be a productivity gain. It's like going from a manual screwdriver to a motorized one.

speechduh11y ago

That particular stuff is actually pretty typical. I have a textbook that shows similar results on Shakespeare using N-grams from years ago.

gipp11y ago

Capturing writing style with ngram-based input and individual-character input are very, very different tasks. That's several ballparks higher in difficulty.

With ngrams, Markov models are perfectly sufficient. With individual characters, complex concepts need to be remembered across many, many characters of input.

tormeh11y ago· 4 in thread

I've thought a bit about RNNs, and I can see an obvious problem: Fixed amount of memory.

Is there any chance someone's come up with an RNN that has dynamic amounts of memory?

exgrv11y ago

There is this paper by Joulin & Mikolov: Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets (http://arxiv.org/abs/1503.01007).

In this case, the memory of the RNN is an ensemble of differentiable stacks.
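
A rough sketch of what one "soft" stack update can look like, written from my recollection of the general idea rather than copied from the paper, so treat the exact form as an assumption. Each cell becomes a blend of what a push would put there and what a pop would expose, weighted by the probabilities of the two actions; in the real model those probabilities and the pushed value would be computed from the hidden state.

```python
def stack_update(stack, push_p, pop_p, push_value):
    """One soft update of a fixed-depth stack; index 0 is the top."""
    depth = len(stack)
    new_stack = []
    for i in range(depth):
        # Value a push would move into cell i (the new value lands on top).
        from_push = push_value if i == 0 else stack[i - 1]
        # Value a pop would move into cell i (the bottom refills with 0).
        from_pop = stack[i + 1] if i + 1 < depth else 0.0
        new_stack.append(push_p * from_push + pop_p * from_pop)
    return new_stack

stack = [0.3, 0.7, 0.0]
pushed = stack_update(stack, push_p=1.0, pop_p=0.0, push_value=0.9)
popped = stack_update(pushed, push_p=0.0, pop_p=1.0, push_value=0.5)
blended = stack_update(stack, push_p=0.5, pop_p=0.5, push_value=0.9)
print(pushed, popped, blended)
```

With hard 0/1 probabilities this behaves like an ordinary stack; with fractional ones every cell is a differentiable mixture, which is what lets gradients flow through the memory.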

varelse11y ago

There's a huge degree of data re-use in the weights. This should be exploited.

tormeh11y ago

I meant that the state vector has constant size and just setting it at the maximum available might give you problems with training.

p1esk11y ago

Look up Neural Turing Machines: connecting neural networks to external banks of memory.

rsp198411y ago· 3 in thread

kastnerkyle11y ago

For a brief while RNN-NADE made an appearance as well, though I do not know of an open source implementation.

There are also a few of us who are working on more advanced versions of this model for speech synthesis, versus operating on the MIDI sequence. Stay tuned in the near future!

JonnieCache11y ago

Is the robot-voice code published anywhere?

You can make money out of that kind of thing btw!

https://soniccharge.com/bitspeek

(Obviously not the same thing but the point is that silly robo-voice code is marketable :)

Nadya11y ago

There are a few such projects in existence. Perhaps not RNN-Mozart inspired, but I'm sure that exists too.

Emily Howell

https://www.youtube.com/watch?v=QEjdiE0AoCU

Here's a Bach-inspired computer-generated song:

https://www.youtube.com/watch?v=PczDLl92vlc

dools11y ago· 3 in thread

Web spam 2.0:

1) Take the entire works of several popular content creators in a given field, complete with links out to articles etc.

2) Concatenate them into a single file

3) Train this thing to generate new articles

4) Create a map of popular articles that other people have written, to articles you have written on similar topics

5) Replace the originals with your articles

6) Publish millions of articles that can't be detected as spam automatically by Google

It's like bot wars: Spammers can train their robots to try and defeat Google's robots.

tluyben211y ago

stefs11y ago

i mean, it's not like that's exactly what's happening right now.

dools11y ago

I then buy, say, 1,000 domains. Doesn't matter what they are -- or I buy 100 domains and set up 300 tumblr blogs, 300 blogger blogs and 300 wordpress.com blogs.

This strategy is old, and was killed by Panda, but if you could beat Panda using a RNN then this would work again.

noahmbarr11y ago· 3 in thread

Would the returned samples from PG/Shakespeare/Wikipedia examples be of higher quality if you used a word-level language model instead of character model with similar parameters?

I was curious whether the overhead of learning how to spell words (vs. a pure task of sentence construction with word objects) outweighs the reduction in sample set size.

(Awesome article for a RNN newbie)
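
For a feel of the trade-off in the question, a tiny sketch that tokenizes the same sentence both ways. The sentence is an arbitrary example and whitespace splitting stands in for a real tokenizer.

```python
text = "to be or not to be that is the question"

char_tokens = list(text)    # one token per character, spaces included
word_tokens = text.split()  # naive whitespace tokenization

char_vocab = sorted(set(char_tokens))
word_vocab = sorted(set(word_tokens))

print(len(char_tokens), "character steps vs", len(word_tokens), "word steps")
print(len(char_vocab), "distinct characters vs", len(word_vocab), "distinct words")
```

The character model has to carry context across roughly four times as many steps here, but it never meets an out-of-vocabulary token. On a real corpus the word vocabulary is typically far larger than the character vocabulary; this toy sentence is too short to show that.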

fpgaminer11y ago

kylebgorman11y ago

I fail to see how word-based models are character-based models with manual feature extraction. Word boundaries are read directly from deterministically tokenized inputs.

kylebgorman11y ago

waterlesscloud11y ago· 2 in thread

Side note: The title is in reference to this famous paper from 1960- http://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness_...

The form of the title has become a common trope.

coverup11y ago

"Unreasonable Effectiveness Considered Harmful"

pizza11y ago

'Considered Harmful Essays' Considered Harmful: http://meyerweb.com/eric/comment/chech.html

ux-app11y ago· 2 in thread

I'm an absolute layman with regard to AI, so I'd be keen to hear some explanations with regard to the possibility of creating strong AI in silicon.

Another way to ask this is will we need to incorporate some form of wetware to achieve strong AI?

aamar11y ago

These are not fully settled questions, though the answer is probably no.

[1] http://www.hss.cmu.edu/philosophy/sieg/onmindTuringsMachines...

[2] http://en.wikipedia.org/wiki/Quantum_mind

ux-app11y ago

Thank you for the detailed response. I'm looking forward to digging into the links you posted.

> There is a small minority of philosophers and physicists who believe that there are meaningful quantum reactions happening

Fascinating stuff.

clickok11y ago· 2 in thread

I love stuff like this, and I think "unreasonable" is almost an understatement.

  Second Lord:
  They would be ruled after this chamber, and
  my fair nues begun out of the fact, to be conveyed,
  Whose noble souls I'll have the heart of the wars.

  Clown:
  Come, sir, I will make did behold your worship.

  VIOLA:
  I'll drink it.

3. As in occult knowledge and rules-of-thumb, but you might also read this as a double entendre about myself and my colleagues.

4. Well, you still might need an AWS GPU instance if you don't have a fancy graphics card.

jameshart11y ago

   My power to give thee but so much as hell:
   Some service in the noble bondman here

It doesn't seem to have managed to pick up on rhyming couplets, though.

So yes, "unreasonable" seems about right.

ryukafalz11y ago

(Put another way, English text is a lossy representation of English speech.)

Perhaps if you were to feed the IPA representation of each word in alongside the text, the RNN would do a bit better, though admittedly I'm not sure how you would do so.

If this is the case, I'd imagine training it against Lojban text would see similar results.

cs70211y ago· 1 in thread

[1] http://cs.stanford.edu/people/karpathy/convnetjs/

choppaface11y ago

And if you're more comfortable with Python, I strongly recommend the CS231n assignments / labs: http://cs231n.github.io/

Patryk11y ago· 1 in thread

It's an excellent read for anyone interested in learning about recurrent neural networks.

[1] http://crl.ucsd.edu/~elman/Papers/fsit.pdf

sushirain11y ago

snikeris11y ago· 1 in thread

In the spirit of:

https://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.htm...

http://www.researchgate.net/profile/Derek_Abbott/publication...

iyn11y ago

Very short video about the topic: https://www.youtube.com/watch?v=ZBkzqLJPkmM

maaaats11y ago· 1 in thread

Isn't the author's definition of RNNs wrong?

I thought the difference is that a RNN allows connection back to previous layers, compared to a feed-forward net. Not this talk about "fixed sizes" and "accepting vectors". Or am I wrong?

fpgaminer11y ago

But the rest of the article shows the salient point; RNNs are NNs that hold a state vector.

In summary, these are different viewpoints of the same mathematical object. Karpathy focuses on the ability of RNNs to handle arbitrarily long inputs and outputs, because that's something FFNNs cannot do.
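
A minimal sketch of the state-vector view, assuming the standard vanilla update `h = tanh(W_hh h + W_xh x)` and leaving out the output layer. The class name, sizes and random weights are placeholders, and it uses plain lists instead of a numerics library to stay self-contained.

```python
import math
import random

class TinyRNN:
    """Vanilla RNN cell that keeps its hidden state between calls to step()."""

    def __init__(self, input_size, hidden_size, rng):
        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.W_xh = matrix(hidden_size, input_size)   # input -> hidden
        self.W_hh = matrix(hidden_size, hidden_size)  # hidden -> hidden (the recurrence)
        self.h = [0.0] * hidden_size

    def step(self, x):
        """Fold one input vector into the fixed-size state and return the state."""
        new_h = []
        for i in range(len(self.h)):
            z = sum(w * xj for w, xj in zip(self.W_xh[i], x))
            z += sum(w * hj for w, hj in zip(self.W_hh[i], self.h))
            new_h.append(math.tanh(z))
        self.h = new_h
        return self.h

rng = random.Random(0)
for sequence_length in (3, 30):
    rnn = TinyRNN(input_size=3, hidden_size=4, rng=rng)
    for _ in range(sequence_length):
        rnn.step([rng.random() for _ in range(3)])
    print(sequence_length, len(rnn.h))
```

The same `step` is applied no matter how long the sequence is, and the state stays the same size; a feed-forward net would need its input layer resized for every length.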

mikecmpbll11y ago· 1 in thread

This is my deep learning enlightenment moment. 22/05/15

phyalow11y ago

Me too, mesmerised.

efnx11y ago· 1 in thread

I've only read the first section but it seems RNNs are very close in concept to Mealy machines.

http://hackage.haskell.org/package/machines-0.4.1/docs/Data-...
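
For anyone who hasn't met Mealy machines, a small sketch of the shape being compared: each step maps (state, input) to (next state, output). The running-parity machine is just a made-up example; an RNN step has the same signature but with a continuous state vector and learned transitions instead of a lookup table.

```python
class Mealy:
    """Finite-state transducer: the output depends on the current state and input."""

    def __init__(self, transitions, start):
        self.transitions = transitions  # (state, symbol) -> (next_state, output)
        self.state = start

    def step(self, symbol):
        self.state, output = self.transitions[(self.state, symbol)]
        return output

# Example machine: emit 1 while an odd number of 1s has been seen so far.
parity = Mealy(
    transitions={
        ("even", 0): ("even", 0),
        ("even", 1): ("odd", 1),
        ("odd", 0): ("odd", 1),
        ("odd", 1): ("even", 0),
    },
    start="even",
)

outputs = [parity.step(bit) for bit in [1, 0, 1, 1, 0]]
print(outputs)  # [1, 1, 0, 1, 1]
```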

teraflop11y ago

lqdc1311y ago· 1 in thread

Does anyone know if these are/can be good for named entity recognition? I am stuck implementing second order CRFs right now for the lack of a good implementation, and this seems a lot easier.

syllogism11y ago

I'm not aware of any strong RNN results for NER, no.

You'd probably find the paper here: http://aclweb.org/anthology/ (everything in CL is open access). You want the proceedings of CL, TACL, ACL, EMNLP, EACL, and NAACL. Don't bother with the workshops.

higherpurpose11y ago· 1 in thread

If neural networks are the way to build strong AI and neural nets are all about optimization, wouldn't a quantum computer be ideal to power an AI? (assuming we can get one to work)

Houshalter11y ago

I don't think so. NNs have millions of parameters, and making a quantum computer that large, and with that many complex interactions, would be very difficult.

Optimization of NNs isn't really that bad. Stochastic gradient descent is extremely powerful and roughly linear with the number of parameters, possibly better.

fdej11y ago

Ah, so strong AI is finally here. A computer program that makes just the same mistakes as humans when writing in TeX.

danans11y ago

j2kun11y ago

mangeletti11y ago

That would be one positive feedback loop to rule them all.

stcredzero11y ago

wonderingwhere11y ago

this is quite possibly the most interesting item I've read on HN

viraptor11y ago

I found the learning progress great. I was thinking some time ago about how to generate English-sounding words which don't exist. Well, here they are: (from iteration 700)

Aftair, unsuch, hearly, arwage, misfort, overelical, ...

(although I admit, some of them may be just old words I haven't heard of before)

oggy11y ago

hgibbs11y ago

I did have a bit of a chuckle when they got to Algebraic Geometry. That's incredible.

evc12311y ago

Someone should train an RNN on neural network source code to see if it's possible to get neural networks to generate neural networks.

divs121011y ago

This felt like watching Ex Machina. Thanks a lot, this was extremely informative and super fun.

thewarrior11y ago

I have a dumb question. How is a recurrent neural network different from a Markov model?

jgmmo11y ago

Very neat and funny article. I love the PG generator.
