I wonder if all that's missing is just a few more layers, and another source of input. Maybe a list of requirements/output/input matched with the code so it understands why what was written was written. I wonder what would happen if you ran the program, took the output, and fed it back in as input.
Really cool stuff here.
As an exercise: when I think of the word "circle", images of circles and spheres show up in my head, along with the equation of a circle. My quick definition would be "a perfectly round object", which leads to questions of what "round" and "perfect" mean. The more I think about it, the more all my knowledge seems circular: there are no axiomatic concepts, everything is relative, and it just builds on itself. I wonder if that's the key to deciphering meaning: increase the connections in the web, and with strong enough references you can pinpoint which node in the web something refers to.
In the case of this article, the NN isn't being asked to do any abstract task like "decipher meaning", but the very concrete task of "predict the next word". As the article shows, NNs can do this fairly well.
There is also evidence that they can learn very high-level knowledge about words and objects. See the success of word vectors: http://technology.stitchfix.com/blog/2015/03/11/word-is-wort...
There seems to be some evidence that this stuff is fairly central to human intelligence and that the ability to visualize in 3D is somewhat hard-wired. Deciphering meaning is approximately "seeing what it means", which can correspond to visualizing it in your head. For example, "the cat sat on the mat" is a bunch of symbols, but if someone or some machine can convert that into an image of a cat sitting on a mat, then I guess they've understood it.
https://web.media.mit.edu/~minsky/papers/MusicMindMeaning.ht...
http://www.ted.com/talks/deb_roy_the_birth_of_a_word?languag...
As one of the other commenters pointed out - it is like a tree (words/concepts) branching out from one another. I would be fascinated by seeing if this research can be continued into adulthood, where the individual "concepts" aren't as important as the interplay between them.
I once asked a similar question on an online forum [1] where many linguists hung out. My question was: if an English-only household left a general-interest Spanish-language TV station on most of the time when they weren't actively using the TV to watch something, so that their child received very large exposure to Spanish-language programming (news, sports, soap operas, sitcoms, movies, etc.) from birth onward, would the child naturally learn Spanish?
I don't recall for sure what the linguists who responded said, but I think they all said the child would not learn Spanish from this.
[1] I have no recollection of where this was.
The methods involve providing more detailed feedback on each example. With most training data used now, we give a 0 or 1: does this example belong to this class or not? With the teacher networks, they were able to teach more subtly: this is definitely not a car, it is very lizard-like and a little snake-like.
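A minimal sketch of the difference, assuming the "teacher network" setup works like what is usually called knowledge distillation: instead of a one-hot 0/1 label, the student is scored against the teacher's softened probability distribution. The teacher logits, student scores, and temperature below are hypothetical values I picked, and the helper names are my own.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """H(target, predicted): the training loss against a (hard or soft) target distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

classes = ["car", "lizard", "snake"]
hard_label = [0.0, 1.0, 0.0]              # classic 0/1 supervision
teacher_logits = [-4.0, 3.0, 1.0]         # hypothetical teacher scores
soft_label = softmax(teacher_logits, temperature=2.0)  # "not a car, very lizard, a bit snake"

student_probs = softmax([0.0, 1.0, 0.5])  # hypothetical student output
print(dict(zip(classes, [round(p, 3) for p in soft_label])))
print("loss vs hard label:", cross_entropy(hard_label, student_probs))
print("loss vs soft label:", cross_entropy(soft_label, student_probs))
```

The soft label tells the student how wrong each wrong answer is, which a 0/1 label cannot express.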
https://en.wikipedia.org/wiki/Language_acquisition#General_a...
Although for obvious reasons this is very hard to study experimentally:
https://en.wikipedia.org/wiki/Language_deprivation_experimen...
Seems it would be far harder to infer the basic initial structure from just plain text.
Put another way, there is nothing magical about a child learning about the world. A child's brain is just a large neural network being fed patterned data over the course of many years by a variety of extremely high-resolution analog sensors. Eventually the child begins to respond to the patterns.
Why not? Your brain isn't magic, just highly associative. We can do the same thing with computers real soon now.
A child knows that if it says "Mama food," it is likely to get attention, and if it gets attention, it is likely to minimize its hunger. Right now, a neural network can be trained to know that "Mama" occurs often in human dialogue, what words occur around it, even its dictionary definition and images of mothers. But it's not making the deeper connection to a strategy that minimizes hunger.
When I think about this, I wonder if insights from the world of gaming "AI" would be useful in developing the training datasets for real AI. Because you can't be a mother to a billion virtual babies, but you might be able to program a set of heuristics to be a mother to a billion virtual babies. Then you have some system that trains on their life experiences...? All speculation, but very interesting stuff.
That's an irrational and indefensible position.
With neural nets NOBODY really understands how they work.
They will if it gives the answers they want to hear. History is full of critical decisions based on ridiculous pretexts or unclear processes.
That's the (morally neutral) wonder of the market--it'll beat ideological or emotional objections into the ground, for better or for worse.
And sooner or later, someone might start a company where all decision making is performed by a neural net...
Eliezer Yudkowsky would likely disagree with you: http://www.yudkowsky.net/singularity/aibox
EDIT: Also - http://www.explainxkcd.com/wiki/index.php/1450:_AI-Box_Exper...
" Never! Companies would never sacrifice principle and safety to save money! "
We'll see...
I believe Markov chains as a model quickly become inefficient (especially memory-wise) as you increase the complexity (long-range correlations) of your prediction. It's an unnecessarily restrictive model for high-complexity behavior, a restriction that state-of-the-art RNNs avoid entirely.
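To make the memory point concrete, here's a minimal sketch of an order-k character Markov model (the order k, the toy corpus, and the helper names are my own assumptions, not anyone's actual code from this thread). The table keys on every distinct k-character context, so the number of possible contexts grows as |alphabet|^k, while an RNN carries a fixed-size hidden state instead.

```python
import random
from collections import Counter, defaultdict

def train_char_markov(text, k):
    """Count which character follows each k-character context."""
    table = defaultdict(Counter)
    for i in range(len(text) - k):
        table[text[i:i + k]][text[i + k]] += 1
    return table

def generate(table, seed, length, k, rng):
    """Sample one character at a time, conditioning only on the last k characters."""
    out = seed
    for _ in range(length):
        counts = table.get(out[-k:])
        if not counts:  # unseen context: nothing to sample from
            break
        chars, weights = zip(*counts.items())
        out += rng.choices(chars, weights=weights)[0]
    return out

corpus = "to be, or not to be, that is the question: " * 3
alphabet = len(set(corpus))
for k in (1, 4):
    table = train_char_markov(corpus, k)
    print(f"k={k}: {len(table)} observed contexts, up to {alphabet ** k} possible")
    print(generate(table, corpus[:k], 40, k, random.Random(0)))
```

Long-range structure (say, matching a quote opened 200 characters ago) would need k around 200, which no lookup table can afford.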
Ther deat is more; for in thers that undiscorns the unwortune,
the pangs against a life, the law's we know no trave, the hear,
thers thus pause.
The only reason the model seems to occasionally spell correctly and create Anglo-sounding neologisms is that it operates on 4-grams. Here's some character-by-character output from the same Markov chain model:
T,omotsuo ait pw,, l f,s teo efoat t hoy tha fm nwo
bs rs a h enwcbr lwntikh wqmaohaaer ah es aer
mkazeoltl.etnhhifcmfeifnmeeoddssmusoat irca
do'ltyuntos sih i etsoatbrbdl
Second, even if it was, really? As if we see plays on Kundera titles regularly on the web?
Deep learning has made great strides in recent years, but I don't think architectures that aren't recurrent will ever give rise to mammalian "thought". In my opinion, thought is equivalent to state, and feed-forward networks do not have immediate state. Not in any relevant sense. So therefore they can never have thought.
RNNs, on the other hand, do have state, and are therefore a real step towards building machines that possess the capacity to think. That said, modern deep learning architectures based around feed-forward networks are still very important. They aren't thinking machines, but they are helping us build all those important pre-processing filters mammalian brains have (e.g. the visual cortex). This means we won't have to copy the mammalian versions, which would be rather tedious. We can just "learn" a V1, V2, etc. from scratch. Wonderful. And they'll be helpful for building machines with senses different from any that biology has yet evolved. But, again, these feed-forward networks won't lead to thought.
My second musing is about where I think the next leap in machine learning will occur. To date, efforts have been focused on building algorithms that optimize the NN architecture (i.e. optimize weights, biases, etc.). But mammalian brains seem to possess the ability to solve problems on the fly, far faster than I imagine tweaks to architecture could account for. We solve problems in-thought rather than in-architecture; we think through a problem. Machine learning doesn't possess this ability. It can only learn by torturing its architecture.
So, I believe there is a distinction between the learning that mammalian brains do on the fly, using just their thoughts, and the learning they do over the long term by adjusting synaptic connections/responses. It seems as if they solve a problem in the short term, and then store the way they solved it in the underlying architecture over the long term. Tweaking the architecture then makes solving similar problems in the future easier. The synaptic weights lead to what we call intuition, understanding, and wisdom. They make it so we don't have to think about a class of problems; we just know the solutions without thought. (Note that I say class of problems; this isn't just long-term memory.)
Along those lines, I come to my final musing: that mammalian brains are motivated by optimization of energy expenditure. As with anything in biologically evolved systems, energy efficiency is key, since food is often scarce. So why wouldn't brains also be motivated to be energy efficient? To that end, I believe tweaking synaptic weights, the kind of learning that machine learning does so well, is a result of the brain trying to reduce energy expenditure. Thoughts are expensive. Any time a thought runs through your brain, it has some neuronal activity associated with it. That activity costs energy. So minimizing the amount we have to think on a day-to-day basis is important. And that, again, is where architecture changes come in. They are not the basis for learning; they are the basis for making future problem solving more efficient. Like I said, once a class of problems has been carved into your synaptic weights, you no longer have to think about that class of problems. The solutions come immediately. You don't think about walking; you just do it. But when you were a baby, I'll bet the bank that your young mind thought about walking a lot. Eventually all the mechanics of it were carved into your brain's architecture, and now walking requires many orders of magnitude less energy expenditure by your brain.
So, the obvious question is ... how do mammalian brains problem solve using just thoughts. The answer to that, as I mentioned, is likely to lead to the next leap in machine learning. And it will, more likely than not, come from research on RNNs. What we need to do is find a way to train RNNs that are able to adapt to new problems immediately without tweaking their weights (which should be a slower, longer term process).
P.S. Yes, I know this was probably a bit off-topic and quite a bit wandering. I've had these musings percolating for a while and don't really have an outlet for them at the moment. I hope it's on topic enough, and at least stimulates some interesting discussion. Machine learning is fascinating.
That doesn't square with empirical reality. Evolved biological systems appear to be optimized for robustness to perturbations, not efficiency (John Doyle argues that there is in fact a fundamental tradeoff between robustness and efficiency for all types of complex systems, not just biological ones).
> how do mammalian brains problem solve using just thoughts.
They don't. Sensory input is required for brains to learn new classes of problems.
> find a way to train RNNs that are able to adapt to new problems
Is this something different than multi-task learning?
Sensory input is required to gain the knowledge, but then you can just as easily muse over your gained knowledge for further insights in a sensory deprivation chamber as you can in a classroom.
Feed-forward networks do have state, but all the useful parts are obtained through explicit training (ye olde backprop, ye older Hebbian). The typical scenario is "train model (write mode), deploy model (read-only mode)," which, as you point out, has no "thought", since at runtime no changes or introspection are happening.
> So therefore they can never have thought.
The key idea here would be: generative models. Most current AI fads are driven by discriminative models (image recognition, speech recognition, etc) which provide very narrow "faster than human" output, but, as you point out, have no thought or will or motives of their own.
But once you have a sufficiently connected network, you can start to ask it open-ended questions ("draw a cat for me") in the form of sampling from the network (Gibbs sampling, MCMC, ...), and it fills in the blanks.
The extra oomph of providing actual agency and intent and desire to the model is an exercise left to the reader.
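For anyone who hasn't seen the sampling step: Gibbs sampling repeatedly resamples one variable from its conditional distribution given the others. Here's a minimal sketch on a toy standard bivariate Gaussian, which is only a stand-in for a trained network (the correlation rho = 0.8, sample counts, and function name are my own assumptions). Clamping y to an observed value and sampling only x is the "fill in the blanks" mode.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, rng, burn_in=500, clamp_y=None):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Conditionals: x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    If clamp_y is given, y stays fixed (observed) and only x is resampled.
    """
    sd = math.sqrt(1.0 - rho ** 2)
    x = 0.0
    y = 0.0 if clamp_y is None else clamp_y
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)        # resample x given the current y
        if clamp_y is None:
            y = rng.gauss(rho * x, sd)    # resample y given the new x
        if step >= burn_in:
            samples.append((x, y))
    return samples

rng = random.Random(42)
joint = gibbs_bivariate_normal(0.8, 20000, rng)
mean_xy = sum(x * y for x, y in joint) / len(joint)
print("estimated E[xy], true value 0.8:", round(mean_xy, 3))

filled = gibbs_bivariate_normal(0.8, 20000, rng, clamp_y=1.5)
mean_x = sum(x for x, _ in filled) / len(filled)
print("estimated E[x | y=1.5], true value 1.2:", round(mean_x, 3))
```

Samplers for RBMs and similar networks follow the same alternating pattern, just over layers of units instead of two scalars.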
> (which should be a slower, longer term process).
Sleep is a requirement of all things with neural network based brains as far as we know.
Suri and Schultz argue that dopamine in the mammalian brain follows the "reward prediction error" from reinforcement learning [doi:10.1016/S0306-4522(98)00697-6]. (Indeed, the DQN paper mentions dopamine in the very first paragraph.)
Because of this, I am very excited about DQN. (I do think that it's only a building block towards building a self-aware brain, though.)
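The "reward prediction error" in reinforcement learning is the temporal-difference error, delta = r + gamma * V(s') - V(s). Here's a minimal TD(0) sketch on a hypothetical two-step chain (the states, reward, gamma, and learning rate are illustrative assumptions). The error is large when the reward is unexpected and decays toward zero once it's predicted, which is the pattern the dopamine comparison is about.

```python
def td0_chain(episodes=500, alpha=0.1, gamma=0.9):
    """TD(0) value learning on the chain A -> B -> end, with reward 1 on leaving B."""
    V = {"A": 0.0, "B": 0.0, "end": 0.0}            # terminal value stays 0
    transitions = [("A", 0.0, "B"), ("B", 1.0, "end")]
    errors_at_B = []
    for _ in range(episodes):
        for s, r, s_next in transitions:
            delta = r + gamma * V[s_next] - V[s]    # reward prediction error
            V[s] += alpha * delta
            if s == "B":
                errors_at_B.append(delta)
    return V, errors_at_B

V, errors = td0_chain()
print({s: round(v, 3) for s, v in V.items()})
print("prediction error at B, first vs last episode:", errors[0], errors[-1])
```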
Nitpick: although tty == tty is, as you say, vacuously true in this case, that's just because tty is a pointer. If tty were a float, this wouldn't be the case, since it could be NaN. I wouldn't be surprised if it learned to test a variable for equality against itself from some floating point code.
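A quick illustration of the nitpick. It's in Python, whose floats follow IEEE 754 just like a C double: comparing a value with itself is false only for NaN, which is why `x != x` is a classic NaN test in floating-point code.

```python
import math

def is_nan(x):
    """IEEE 754 rule: NaN is the only value that compares unequal to itself."""
    return x != x

values = [0.0, -1.5, float("inf"), float("nan")]
for v in values:
    print(v, "v == v:", v == v, "is_nan:", is_nan(v))
```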
It would drive those who attempt to understand & reference it absolutely crazy. :D
This kind of demo shows that deep neural networks can capture the structure of language, if not the semantics, in a very general way. And we have separate evidence that they can (in principle) capture semantic meaning and algorithmic reasoning as well, for example: http://arxiv.org/pdf/1410.5401v2.pdf (the "neural Turing machines" paper from DeepMind)
An AI is a computer doing those things a computer cannot do. As such, anything that a computer cannot do isn't AI, and anything a computer can do isn't AI either.
Writing a program to play chess is not AI, but doing so has helped figure learning out.
Backpropagation suffers from vanishing gradients on very deep neural nets.
Recurrent Neural Nets can be very deep in time.
Or the weights could be evolved using Genetic Programming.
Especially when using saturating functions (tanh/sigmoid)
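A back-of-the-envelope sketch of why: backpropagating through T tanh steps multiplies T local derivatives. Each derivative is at most 1 and falls well below 1 once the unit saturates, so the product shrinks roughly geometrically. This uses a scalar recurrence with weight 1.0 and constant input 1.0, both my own assumptions; real RNNs multiply Jacobian matrices, but the effect is the same.

```python
import math

def gradient_through_time(steps, w=1.0, x=1.0):
    """d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1} + x), h_0 = 0."""
    h = 0.0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: tanh'(a) = 1 - tanh(a)^2
    return grad

for T in (1, 5, 10, 20, 50):
    print(T, gradient_through_time(T))
```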
> Or the weights could be evolved using Genetic Programming
Some algorithms, such as NEAT[0], use a genetic algorithm to describe not only the weights on edges in the network, but also the shape of the network itself - e.g., instead of every node of one layer connected to every node of the next, only certain connections are made.
0. http://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_to...
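As a minimal sketch of evolving weights instead of backpropagating gradients, here's a mutation-only hill climber: a (1+1) evolution strategy, which is simpler than genetic programming or NEAT (NEAT also evolves the topology). It fits the three weights of a single sigmoid neuron to logical OR. The task, mutation size, generation count, and all names are illustrative assumptions.

```python
import math
import random

DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # logical OR

def sigmoid(z):
    """Numerically stable logistic function (math.exp never overflows here)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, inputs):
    w1, w2, b = weights
    return sigmoid(w1 * inputs[0] + w2 * inputs[1] + b)

def loss(weights):
    """Mean squared error over DATA; it is only evaluated, never differentiated."""
    return sum((predict(weights, x) - y) ** 2 for x, y in DATA) / len(DATA)

def evolve(generations=3000, sigma=0.5, seed=0):
    """(1+1)-ES: mutate the parent, keep the child if it is no worse."""
    rng = random.Random(seed)
    parent = [rng.gauss(0.0, 1.0) for _ in range(3)]
    parent_loss = loss(parent)
    for _ in range(generations):
        child = [w + rng.gauss(0.0, sigma) for w in parent]  # mutation
        child_loss = loss(child)
        if child_loss <= parent_loss:                        # selection
            parent, parent_loss = child, child_loss
    return parent, parent_loss

weights, final_loss = evolve()
print("evolved weights:", [round(w, 2) for w in weights], "loss:", final_loss)
print("predictions:", [round(predict(weights, x)) for x, _ in DATA])
```

No gradient ever flows, so vanishing gradients aren't an issue. The price is many more loss evaluations per useful step, which gets expensive as the number of weights grows.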
http://minds.jacobs-university.de/sites/default/files/upload...
Thanks for the link, I'll take a look.
If you enjoyed his blog posts, I highly recommend watching his talk on "Automated Image Captioning with ConvNets and Recurrent Nets"[2]. In it he raises many interesting points that he hasn't had a chance to get around to fully in his articles.
He humbly says that his captioning work is just stacking image recognition (CNN) onto sentence generation (RNN), with the gradients effectively influencing the two to work together. Given that we have powerful enough machines now, I think we'll be seeing a lot of stacking of previously separate models, either to improve performance or to perform multi-task learning[3]. It's a very simple concept, but one that can still be applied to many other fields of interest.
[1]: http://cs231n.stanford.edu/
[2]: https://www.youtube.com/watch?v=xKt21ucdBY0
[3]: One of the earliest - "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" http://nlp.stanford.edu/pubs/SocherLinNgManning_ICML2011.pdf
Yup. This is the first time I've understood someone from this field. Honestly, this dude just broke down the wall.
What's more important, passion flows through his writing. And it can be felt. I got so excited while reading it.
As a bonus, there's an ongoing class on deep learning architectures for NLP which covers recurrent (and recursive) neural nets in depth (as well as LSTMs and GRUs). Check out cs224d.stanford.edu for lecture notes and materials. The lectures are definitely being recorded, but I don't think they're publicly available yet.
(a) He seems to be very intelligent. Kudos. But…
(b) How good of an idea is it really to create software with these abilities? We're already making machines that can do most things that had once been exclusive to humans. Pretty soon we'll be completely obsolete. Is that REALLY a good idea? To create "face detectors" (his words!)?
Our relevance is ephemeral, but our influence will be lasting. Do we want to have a legacy of clinging to our personal feelings of importance, or of embracing the transience of our existence and nurturing our (intellectual) progeny?
People are always worried about "computers taking factory jobs" resulting in mass unemployment, but the truth is, a rudimentary AI with acceptance tests on output will obsolete every programmer alive.
Hell, half the programming people do these days is just gluing APIs together and then seeing if it actually works. It doesn't take 16 years of rich inner human life experience to accomplish that, just exhaustive combinatorial parameter search over the subset of API interactions you're interested in evaluating.
I'm all for it, it's going to be a productivity gain. It's like going from a manual screwdriver to a motorized one.
With ngrams, Markov models are perfectly sufficient. With individual characters, complex concepts need to be remembered across many, many characters of input.
Is there any chance someone's come up with an RNN that has dynamic amounts of memory?
In this case, the memory of the RNN is an ensemble of differentiable stacks.
Second, one could envision paging the hidden units back to system memory on a coprocessor-based implementation (GPUs/FPGAs/not Xeon Phi, gag me). 256 GB servers are effectively peanuts these days relative to developer salaries and university grants (datapoint: my grad school work system was ~$100K in 1990 dollars) so unless you're trying to create the first strong AI, I don't think this is a serious constraint.
Good luck with that no matter what Stephen Hawking, Elon Musk, and Nick Bostrom harp on about: we have no idea what the error function for strong AI ought to be, and even if we did, it would take over a MW using current technology to achieve the estimated FLOPS of a human cerebrum.
For a brief while RNN-NADE made an appearance as well, though I do not know of an open-source implementation.
There are also a few of us who are working on more advanced versions of this model for speech synthesis, versus operating on the MIDI sequence. Stay tuned in the near future!
I can say from experience that some of the samples from the LSTM-DBN are shockingly cool, and drove me to spend about a week using K-means coded speech. It made robo-voices at least but our research moved past that pretty fast.
[1] http://www-etud.iro.umontreal.ca/~boulanni/
[2] http://deeplearning.net/tutorial/rnnrbm.html
[3] http://arxiv.org/pdf/1412.6093.pdf
[4] https://github.com/kratarth1203/NeuralNet/blob/master/rnndbn...
You can make money out of that kind of thing btw!
https://soniccharge.com/bitspeek
(Obviously not the same thing but the point is that silly robo-voice code is marketable :)
Emily Howell
https://www.youtube.com/watch?v=QEjdiE0AoCU
Here's a Bach-inspired computer-generated song:
1) Take the entire works of several popular content creators in a given field, complete with links out to articles etc.
2) Concatenate them into a single file
3) Train this thing to generate new articles
4) Create a map of popular articles that other people have written, to articles you have written on similar topics
5) Replace the originals with your articles
6) Publish millions of articles that can't be detected as spam automatically by Google
It's like bot wars: Spammers can train their robots to try and defeat Google's robots.
I mean, it's not like that's exactly what's happening right now.
I then buy, say, 1,000 domains. It doesn't matter what they are. Or I buy 100 domains and set up 300 Tumblr blogs, 300 Blogger blogs, and 300 wordpress.com blogs.
Now I drip feed content to each of those blogs, but instead of linking to the articles on content marketing that kissmetrics and neil patel originally reference, I link to articles I have created instead.
How can Google tell the difference between a tonne of nobody bloggers linking to Neil Patel's articles and my bots linking to my articles? The fact is that if you blog on niche topics, with good article titles reflecting low-competition long-tail keywords, you'll get some traffic from Google pretty easily. How can Google possibly tell that links are coming from shitty bot-generated pages rather than from a tonne of obscure bloggers with virtually no audiences (of which there are thousands)?
The way they can tell the difference is Panda (or Penguin? I think it's Panda...). So as long as your pet robot can learn from Neil Patel and Kissmetrics well enough to produce content that can't be penalised by Panda, and as long as you don't do it stupidly (say, by using the same anchor text for all the articles or posting 1,000 articles overnight) but instead phase it in so that it looks like you're getting a reasonable organic spread, you'll be able to game Google's rankings pretty reliably for the real articles you're trying to promote. You'd get higher volumes of traffic to those articles than you could by just focusing on niche, long-tail articles (for example, because you'd be able to get onto page #1 or into the top 5 for much higher-volume keywords).
You would then get shares etc. for your actual content -- just because those "spam farms" don't have social shares or backlinks from PR6 blogs doesn't mean Google completely disregards them, just means that you need a lot more of them to make the same impact as lots of shares/backlinks from PR6 blogs.
This strategy is old, and was killed by Panda, but if you could beat Panda using a RNN then this would work again.
I was curious whether the overhead of learning how to spell words (vs. a pure task of sentence construction with word objects) outweighs the reduction in sample set size.
(Awesome article for a RNN newbie)
That said, I think the RNNs here are limited by the corpus. They need to be exposed to more writing. Even if all you want is a Shakespeare generator, you still need to expose it to other literature. That will give it greater context, and more freedom of expression and, dare I say, creativity. I mean, imagine if all you were exposed to your whole life was Shakespeare. Nothing else (no other senses). Even with your superior mind, I doubt you'd generate anything better than what this RNN spits out.
So yeah, it needs a large corpus to build a broader model. Then we need a way to instruct the broadly trained RNN to generate only Shakespeare-like text. Perhaps by adding an "author" or "style" input.
And, as I mentioned upthread, it has been known for about ten years, long before the current neural net revival, that high-order character-based models are competitive with word-based models (at least in terms of perplexity).
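One detail matters when making that comparison: a character model's per-character perplexity and a word model's per-word perplexity aren't directly comparable. Here's a minimal sketch, assuming you have per-token log-probabilities from each model: normalize the same total log-likelihood by the number of words in both cases. The probabilities below are toy values, not results from any model.

```python
import math

def perplexity(log_probs, n_units):
    """exp of the average negative log-likelihood per unit (characters, words, ...)."""
    return math.exp(-sum(log_probs) / n_units)

# Sanity check: a uniform guess over 4 symbols has perplexity 4.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform, len(uniform)))

text = "the cat sat on the mat"
n_chars, n_words = len(text), len(text.split())
# Hypothetical per-character log-probabilities from a character model.
char_log_probs = [math.log(0.5)] * n_chars
print("per-char perplexity:", perplexity(char_log_probs, n_chars))
print("per-word perplexity:", perplexity(char_log_probs, n_words))
```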
The form of the title has become a common trope.
Might there be properties of our biological brain that silicon can't capture? Is this related to the concept of computability? I'm not suggesting that there is a spiritual or metaphysical component to thinking. I'm not, I'm a materialist through and through. I just wonder if maybe there is some component of non-deterministic behavior occurring inside a brain that our current silicon-based computing does not capture.
Another way to ask this is will we need to incorporate some form of wetware to achieve strong AI?
Most researchers believe that brains are Turing machine equivalent, therefore can be simulated by any other equivalents. Even Gödel believed this, though he believed the mind had more capabilities than the brain.[1] As a materialist, you would share the commonly-accepted view and reject his latter claim.
There is a small minority of philosophers and physicists who believe that there are meaningful quantum reactions happening in the brain, distinguishing them from classical computers.[2] Some recent computer simulations have shown this to be plausible, but the general impression is that it seems unlikely, and we don't have specific evidence of effects of this sort.
Quantum effects of certain sorts are computationally infeasible to simulate with classical computers. And it's theoretically plausible that such effects cannot be reproduced at scale with in-development quantum computer technology and are only practical with organic chemistry, but again, this is quite a minority view.
It's also possible that classical brain features, such as its massive concurrency or various clever algorithms, prove difficult to replicate or simulate. If these are easy problems to solve, then strong AI may arrive in decades; if very difficult, centuries. In the latter case, it seems plausible that incorporating wetware would be a useful shortcut. But there's good reason to believe that the practical disadvantages of wetware (e.g. keeping it alive, coordinating with its slow "clock speed") overwhelm the computational conveniences.
--
[1] http://www.hss.cmu.edu/philosophy/sieg/onmindTuringsMachines...
> There is a small minority of philosophers and physicists who believe that there are meaningful quantum reactions happening
I wonder why this is a minority view. Bear in mind that I am an armchair scientist, but I recall reading that meaningful quantum effects are responsible for the efficiency of photosynthesis. It seems quite plausible (due to the electro-chemical nature of brain functioning) that there might be similar effects present in the brain.
Fascinating stuff.
It's "unreasonable" mainly because it occasionally captures subtle aspects of the data source for "free". If you've worked with procedurally generated content, Markov chains, and so on, you probably have had to perform a few tweaks in order to get plausible results[1]. From the article, an excerpt of the output from an RNN trained on Shakespeare:
Second Lord:
They would be ruled after this chamber, and
my fair nues begun out of the fact, to be conveyed,
Whose noble souls I'll have the heart of the wars.
Clown:
Come, sir, I will make did behold your worship.
VIOLA:
I'll drink it.
Sure, the individual blocks are similar to what you'd get from a Markov text generator, but it gets that after a full stop there comes a newline, a new character name, and a new text block.
To my eyes, this is a qualitative leap in performance.
It suggests that the model has figured out some things about the data stream that you'd normally have to add in by hand[2].
It's also unreasonable that the same framework works well for so many different data sources. My experience with other generative methods has been that they were fragile and prone to pathological behaviour, and that getting them to work for a specific use case required a bunch of unprincipled hacks[3]. It used to be that when a talk started to veer towards generative models, I'd start looking around the room, wondering whether I could survive the drop from any outside-facing windows. But with RNNs using LSTM (or neural Turing machines!) you can consider incorporating a generative model into the solution you're putting together without having to spend a huge chunk of time massaging it into usefulness and purchasing time on a supercomputer[4].
1. I once wrote a quick Reddit bot with the aim of learning to repost frequent, highly upvoted comments, and trained it using a simple k-Markov model... It was not good at first, and to get it to work I had to do a lot of non-fun stuff like sanitizing input and adding heuristics for when/where to post, and in the end it was mediocre.
2. Alex Graves (from DeepMind) has a demo about using RNNs to "hallucinate" the evolution of Atari games, using the pixels from the screen as inputs. It's interesting because it shows that same sort of tendency to capture the subtle stuff: https://youtu.be/-yX1SYeDHbg?t=2968
3. As in occult knowledge and rules-of-thumb, but you might also read this as a double entendre about myself and my colleagues.
4. Well, you still might need an AWS GPU instance if you don't have a fancy graphics card.
My power to give thee but so much as hell:
Some service in the noble bondman here
It doesn't seem to have managed to pick up on rhyming couplets, though.
A quick search of Shakespeare's corpus also shows that Shakespeare never called a bondman 'noble'; there must be some conception of parts of speech being captured by the RNN, to enable it to decide that 'bondman' is a reasonable word to follow 'noble'.
So yes, "unreasonable" seems about right.
(Put another way, English text is a lossy representation of English speech.)
Perhaps if you were to feed the IPA representation of each word in alongside the text, the RNN would do a bit better, though admittedly I'm not sure how you would do so.
If this is the case, I'd imagine training it against Lojban text would see similar results.
--
Assignments 1 and 2 alone give a solid intro to implementing these algorithms, and the lab-oriented IPython-based format gives you a very high probability of writing a correct implementation even if you're clueless at the start.
It's an excellent read for anyone interested in learning about recurrent neural networks.
I thought the difference is that an RNN allows connections back to previous layers, compared to a feed-forward net, not this talk about "fixed sizes" and "accepting vectors". Or am I wrong?
In this case, his point was that one way RNNs differ from FFNNs is their ability to accept arbitrarily sized inputs and generate arbitrarily sized outputs. That's pretty important, which is likely why he emphasizes it.
But the rest of the article shows the salient point: RNNs are NNs that hold a state vector.
Saying that RNNs are NNs that allow connections back to previous layers is true, but that's only one way of looking at it. Holding state is another, since it implies backwards connections. Feedback is another term. And because they have backwards connections, state, feedback, etc., they also possess the capacity to handle inputs and outputs that aren't fixed in size.
In summary: these are different viewpoints of the same mathematical object. Karpathy focuses on the ability of RNNs to handle arbitrarily long inputs and outputs, because that's something FFNNs cannot do.
http://hackage.haskell.org/package/machines-0.4.1/docs/Data-...
> They accept an input vector x and give you an output vector y. However, crucially this output vector's contents are influenced not only by the input you just fed in, but also on the entire history of inputs you've fed in in the past.
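A minimal sketch of that description, assuming a vanilla (Elman-style) RNN with small random weights rather than the article's trained LSTM. Sizes, weights, and names are arbitrary choices of mine, and biases are omitted. The same final input x produces a different output y depending on what was fed in before, because the hidden state h carries the history, and the loop accepts a sequence of any length.

```python
import math
import random

class TinyRNN:
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1}), y_t = W_hy h_t (no biases)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.W_xh = rand_matrix(n_hidden, n_in)
        self.W_hh = rand_matrix(n_hidden, n_hidden)
        self.W_hy = rand_matrix(n_out, n_hidden)
        self.h = [0.0] * n_hidden  # hidden state: the network's memory of past inputs

    @staticmethod
    def _matvec(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]

    def step(self, x):
        """Consume one input vector, update the hidden state, return the output vector."""
        pre = [a + b for a, b in zip(self._matvec(self.W_xh, x),
                                     self._matvec(self.W_hh, self.h))]
        self.h = [math.tanh(p) for p in pre]
        return self._matvec(self.W_hy, self.h)

def last_output(sequence, seed=0):
    """Feed a sequence of any length through a fresh RNN; return the final output."""
    rnn = TinyRNN(n_in=2, n_hidden=4, n_out=1, seed=seed)
    y = None
    for x in sequence:
        y = rnn.step(x)
    return y

x_final = [1.0, 0.0]
y_a = last_output([[0.0, 1.0], x_final])
y_b = last_output([[0.0, -1.0], [1.0, 1.0], x_final])
print("same final input, different histories:", y_a, y_b)
```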
You'd probably find the paper here: http://aclweb.org/anthology/ (everything in CL is open access). You want the proceedings of CL, TACL, ACL, EMNLP, EACL, and NAACL. Don't bother with the workshops.
Optimization of NNs isn't really that bad. Stochastic gradient descent is extremely powerful and roughly linear with the number of parameters, possibly better.
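To see the "roughly linear in the number of parameters" point: one SGD step on a single example touches each parameter once, so its cost is O(#parameters). Here's a minimal sketch on a linear model, assuming a toy noise-free target y = 3*x0 - 2*x1 + 1. The learning rate, epoch count, and data are arbitrary choices.

```python
import random

def sgd_linear_regression(data, n_features, lr=0.05, epochs=200, seed=0):
    """Plain SGD on squared error, one example per update."""
    rng = random.Random(seed)
    w = [0.0] * n_features
    b = 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in random order each epoch
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            # Gradient of 0.5 * err^2: one multiply-add per parameter, so O(n_features).
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

rng = random.Random(1)
points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
data = [(x, 3 * x[0] - 2 * x[1] + 1) for x in points]
w, b = sgd_linear_regression(data, n_features=2)
print("learned weights:", [round(wi, 3) for wi in w], "bias:", round(b, 3))
```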
Ah, so strong AI is finally here. A computer program that makes just the same mistakes as humans when writing in TeX.
That would be one positive feedback loop to rule them all.
Aftair, unsuch, hearly, arwage, misfort, overelical, ...
(although I admit, some of them may be just old words I haven't heard of before)