I still haven’t forgiven CGP Grey for changing the title of his 2017 ML video to “How AIs, like ChatGPT, learn”. The video is about genetic algorithms and has nothing to do with ChatGPT (or with anything else in modern AI).
I think reading this description after watching the video might make the process more memorable. My guess is that if I had my daughter read this first, it wouldn't do much to make the video easier to parse, but reading this real-world example after watching could help solidify the concept.
Disclaimer: I don't know a lot about AI/ML, so it's possible that I am 100% wrong here!
This one? https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_...
The footnote video was an alright explanation of backprop. If that had been part of the main video, it would have been reasonable.
I really like his history/geography videos, but anything technical leaves a lot to be desired. And don't get me started on Humans Need Not Apply.
Well now you have to tell us. :) Many of the concrete examples in that video are exaggerated and/or misunderstood but the general question it asks - what to do when automation makes many people unemployable through no fault of their own - seems valid.
This is, I think, the case. But I don't begrudge them too much, YouTube is cutthroat.
Knowledge is hard to attain...
I think this book is a shining example of that philosophy: https://www.buildyourownlisp.com/. In the book, you implement an extremely bare-bones version of lisp, but it has been invaluable in my career. I found I was able to understand nuanced language features much more quickly because I have a clear model of how programming languages are decomposed into their components.
I agree, but that's a good guide to building a technical understanding of a complex subject, not a sufficient-in-itself tool set for considering questions in that complex subject.
In particular, people combining some "non-technical summary" of quantum mechanics / Newtonian gravity / genetic engineering / etc. with their personal common sense are a constant annoyance to me whenever such topics come up here.
Side question: is there an entry-level build-your-own language model or GAN type learning tool out there as well?
One of the most interesting things about this topic is that the fundamental concepts and implementations are all really simple. It's the fact that it actually works that's mind-boggling. In any case, the above is not a months-long affair - more like one week of dedicated work.
I do think it feels inaccessible partly because we transition from discrete concepts easily digestible by CS grads into some complicated math with very terse mathematical notation, yet the math might not be as hard if presented in a way that doesn't scare away programmers.
I think it's actually worse than that - somebody who doesn't know actually realizes that he doesn't know, but somebody who _thinks_ he understands through analogies and metaphors will confidently come to the incorrect conclusion and then argue with somebody who actually does understand the topic - often managing to convince innocent bystanders because his reasoning is easier to grasp and the so-called expert seems to be getting more and more flustered (as he tries to explain why the analogy is actually correct, but oversimplified).
On the one hand they're jargon used as short hand to technical concepts understood well by domain experts. And the concision they afford can lead to deeper understanding as they transcend their composite or adapted meanings and become base terminology in and of themselves (I think of e.g. Latin in English legal terminology. "Habeas corpus" has a literal meaning when translated, but the understood jargon has a deeper, and more specific meaning). At that point, they are powerful because of the precision of meaning and concision of expression they afford.
On the other hand, they lift intuitive terminology from a base language that is understood in vaguer terms by a broader audience. And this creates invisible disconnects because the abstraction created by these terms leaks like a sieve unless you know the precise semantics and have the model to use them.
By translating a discourse into a higher metaphoric level, we increase precision and efficiency amongst mutual understanders, but at the same time we increase the level of ambiguity, the number of possible interpretations, and the availability of terms familiar to (and thus, handles to grab on to) non-understanders. That latter situation allows non-understanders to string together what sound superficially like well-formed thoughts using jargon terms, but based on the base-language semantics, without the deeper knowledge required to tell whether a given utterance scans or not.
That's how I've been trying to wrap my head around it at least. I hope it doesn't sound like moralizing or condescension, I don't mean it to. I know I'm "guilty" of trying to manipulate metaphoric models that I don't actually understand, based on the lay-semantics of their jargon.
This description more closely describes reinforcement learning, rather than gradient based optimization.
In fact, the entire metaphor of a confused individual being slapped or rewarded without understanding what's going on doesn't really make sense when considering gradient optimization, because the gradient wrt the loss function tells the network exactly how to change its behavior to improve its performance.
This last point is incredibly important to understand correctly, since it contains one of the biggest assumptions about network behavior: that the optimal solution, or at least a good-enough-for-our-concerns solution, can be found by slowly taking small steps in the right direction.
Neural networks are great at refining their beliefs but have a difficult time radically changing them. A better analogy might be trying to very slowly convince your uncle that climate change is real, and not a liberal conspiracy.
edit: it also does a poor job of explaining layers; its description reads much more like how ensemble methods work (lots of little classifiers voting) than how deep networks work.
But, for now it's the best tool we have, so we just have to hope that we get close enough, or just empirically run lots of times to find the best local minimum we can. Incidentally, that actually is more like a brute-force approach, but at the ensemble level, which is quite different from what the article means.
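The "small steps" idea is easy to make concrete (a toy 1D loss of my own, nothing from the article):

```python
# Toy gradient descent on the loss f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
# Each step moves a little in the direction that reduces the loss.
def gradient_descent(x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        grad = 2 * (x - 3)  # derivative of the loss at the current point
        x -= lr * grad      # a small step "in the right direction"
    return x

x_final = gradient_descent(x0=-10.0)  # converges toward the minimum at x = 3
```

For this convex toy loss the small steps always reach the optimum; the point of the comment above is that real networks give no such guarantee, only a local minimum.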
It does an excellent job of taking you from 0 to a decent understanding without dumbing down the content or abusing analogies.
1. "gradient descent" is like tuning a guitar by ear and listening to the beat frequencies ("loss") and then decreasing these by tuning a string up or down.
2. the best I can come up with for "backpropagation" is to imagine a clever device that can tirelessly optimize a Rube Goldberg machine for you but as a science, not an art.
Also reminds me of Ted Chiang warning us that we will torture innumerable AI entities long before we start having real conversations about treating them with compassion.
> what the reward / punishment system really equates to
Nothing, at least as far as neural network training goes. This is an extremely poor analogy for how neural networks learn.
If you've ever done any kind of physical training and have had a trainer slightly adjust the position of your limbs until whatever activity you're doing feels better, that's a much closer analogy. You're gently searching the space of possible correct positions, guided by an algorithm (your trainer) that knows how to move you towards a more correct solution.
There's nothing analogous to a "reward" or "punishment" when neural networks are learning.
Well, deep reinforcement learning does have rewards.
Well, in the article, it says the punishment was a slap. On the other hand, he just says "she gives you a wonderful reward"... so you're left to use your imagination there.
It seems like these sort of structures just regularly arise to help regulate the flow of information through a system.
The similarity of corporations and neural nets is pretty much only that both are information processing systems. An operating system or missile guidance system is far more like a corporation than a neural network.
Neural networks have no memory and generally don't seek particular goals, they simply recognize, predict and generate similar instances.
Different ways you can interact with that data structure can however provide meaning and store information in the weights etc.
> Neural networks have no memory and generally don't seek particular goals, they simply recognize, predict and generate similar instances.
Sounds exactly like every corporation I've ever worked in.

The memory isn't digital bits like we think of now, though, but abstractions in higher dimensions.
It depends what you mean by advanced math. There is a lot of math that only really comes into play because of the high dimensionality! For example math related to tensor wrangling, low rank approximations, spectral theory, harmonic theory, matrix calculus derivatives, universality principles, and other concepts that could be interesting or bewildering or horrifying depending how you react to it. Of course some of it is only linear algebra of the 'just high school math' kind but that's not how I would normally describe it. If you look at the math in the proofs in the appendices of the more technical AI papers on arxiv there is often some weird stuff in there, not just matrix multiply and softmax.
There are many aspects to learning systems that we still don't have any kind of grasp on, and will take more than a little advanced math (statistics/probability theory, transport theory, topology, etc.) to understand as a community.
Dunning-Kruger is probably more common in spaces like this one, where people carry social capital for being able to "spin up quickly". But the true meta-skill of upskilling is turning unknown unknowns (UU) into known unknowns (KU), and then into known knowns (KK). It's not enough to just jump from UU to KK through osmosis by reading blog posts on a news aggregator, because there will still be a huge space of unknowns not covered by that approach.
Sure, I can explain deep learning using the analogy of mixing paint colors.
When you mix different colors of paint, you can create new colors that are a combination of the original colors. Similarly, in deep learning, you can combine different types of data, such as images, text, and audio, to create a more comprehensive understanding of the information.
In a neural network, which is the basic building block of deep learning, each layer is like a different color of paint. The input layer represents the initial data, which is like a blank canvas. As the data passes through each layer, the network extracts increasingly complex features, similar to how mixing different colors creates more complex hues.
Each layer in a neural network can be thought of as a different color that is mixed with the previous layer's color. As the colors mix, the resulting color becomes more complex and nuanced, just as the data becomes more sophisticated and detailed as it passes through the layers of the neural network.
At the end of the process, the output layer represents the final result, which is like a finished painting. The deep learning algorithm can then use this final output to make predictions or classifications based on the input data.
So, in essence, deep learning is like mixing paint colors to create new and more complex hues, but instead of colors, the mixtures are made up of data and the neural network layers are the different colors being mixed.
Initialization: The weights and biases of the neural network are like the starting colors of the paint mixture. Just like how you need to start with some base colors to mix new ones, the neural network needs some initial weights and biases to begin the training process.
Forward Pass: During the forward pass, the input data is fed into the neural network, and the network produces an output. This output is like the mixed color of paint that results from combining different colors.
Backward Pass: During the backward pass, the error between the predicted output and the actual output is calculated using a loss function. This error is like a measure of how close the mixed color is to the desired color. Just like how you would adjust the mixture of colors to get the desired hue, the neural network adjusts the weights and biases of the neurons to minimize the error between the predicted output and the actual output.
Repeat: Steps 2 and 3 are repeated many times with different input data points, and the weights and biases of the neural network are updated after each iteration. This process is like adding more colors to the mixture and adjusting the amounts until the desired color is achieved.
Validation: After several epochs, the neural network is evaluated using a validation set of data to determine if it's overfitting or underfitting the training data. This step is like checking the mixture of colors to see if it matches the desired color under different lighting conditions.
Testing: Once the neural network has been trained and evaluated, it can be used to make predictions on new data. This step is like using the paint mixture to create a painting.
In summary, just like mixing paint colors to achieve a desired hue, the neural network adjusts its weights and biases to minimize the error between the predicted output and the actual output until it accurately predicts the output for new data.
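Stripping away the paint metaphor, steps 1-4 can be sketched as a minimal training loop (a single linear "neuron" fit by batch gradient descent on made-up data; illustrative only):

```python
import random

# 1. Initialization: random starting weight and bias (the "base colors").
random.seed(0)
w, b = random.random(), random.random()

# Noiseless training data for the target function y = 2x + 1.
data = [(x, 2 * x + 1) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

lr = 0.1
for epoch in range(200):              # 4. Repeat
    grad_w = grad_b = 0.0
    for x, y_true in data:
        y_pred = w * x + b            # 2. Forward pass
        err = y_pred - y_true         # 3. Backward pass: how far off,
        grad_w += err * x / len(data) #    and in which direction,
        grad_b += err / len(data)     #    for each parameter
    w -= lr * grad_w                  # nudge the "mixture" toward
    b -= lr * grad_b                  # the desired output
```

After training, `w` and `b` land near 2 and 1; the validation/testing steps would then check the fit on data the loop never saw.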
It's not that it leaves out details; it's that the article's metaphors are not actually correct with regard to the way deep learning works.
This post mostly confuses both reinforcement learning and ensemble models with deep learning. If you know only "enough to be dangerous", then this post will steer your intuition in the wrong direction.
This needs expanding: it's the sort of thing that's easy for a programmer to say, but few non-programmers (or non-mathematically-trained people) are going to see that an RGB value has 3 parts, and so a collection of RGB values could be sliced into 3 sheets.
The RGB color model and representation of images in it is already technical. Anyone who knows what it means also wouldn't need to be told the following quip:
>Also note that computers see things as multi-dimensional tables of data. They don't look at a "picture" - they see 3 spreadsheets of numbers representing the RGB values of the picture.
...which is the only time RGB is mentioned in the article.
That's before we get to the part that "multidimensional" here is extraneous, and doesn't even match the typical usage (where RGBA is stored as a single 32-bit value). Everything is a tape of 1's and 0's, "multidimensionality" comes from interpretation of data.
The dimension of image data is still 2: each pixel is a sample of a 2D projection of a 3D world, and is related to other pixels in a way that's different from, say, letters in a line of text, or voxels (letters don't have a well-defined "up" neighbor; voxels have more well-defined neighbors than pixels do).
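Both representations mentioned in this thread are easy to show side by side (a made-up 2x2 image of my own; the "dimensionality" really is a matter of interpretation):

```python
# The "three spreadsheets" view: one 2x2 sheet of numbers per channel.
red   = [[255, 0], [0, 0]]
green = [[0, 255], [0, 0]]
blue  = [[0, 0], [255, 0]]

# The other common layout: a single 2D table whose entries are (R, G, B) triples.
pixels = [[(red[r][c], green[r][c], blue[r][c]) for c in range(2)]
          for r in range(2)]
```

Same bytes either way; only the slicing differs.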
(also, popular consciousness forgets that technically the Chinese Room argument is only arguing against the much narrower, and now philosophically unfashionable, "Hard AI" stance as it was held in the 70s)
Searle stood behind his argument in the 70s, and in every decade since then too.
The main failure is that most people fundamentally don't believe they are mechanistic. If one believes in dualism, then it is easy to attribute various mental states to that dualism, and of course a computer neural network cannot experience qualia like humans do.
I don't believe in a soul, and thus believe that a computer neural network, probably not today's models but a future one that is large enough and has the right recurrent topology, will be able to have qualia similar to what humans and animals experience.
Cute.
The algorithm/workers are not really “looking” at a picture of a cat, they are analysing and looking for patterns in the data that defines the picture of the cat.
End of knowledge.
Maybe I should ask ChatGPT?
https://www.youtube.com/watch?v=kCc8FmEb1nY
(I haven't gotten around to watching it yet)
You actually should, it spits out a pretty good explanation (sometimes).
1 - We want to model data, representative of some system, through functions.
2 - Virtually any function can be expressed by an n-th order polynomial.
3 - We wish to learn the parameters, the coefficients, of such polynomials.
4 - Neural networks allow us to brute-force test candidate values of such parameters (finding optimal candidate parameters such that the error between expected and actual values of our dataset is minimized)
Whereas prior methods (e.g. PCA) could only model linear relationships, neural networks allowed us to begin modeling non-linear ones.
y = a + bx + cx^2 + dx^3 + ...
The resulting model is linear, even though there are powers of x in your formula, because x and y are known from the data. They're not what you're solving for; you're solving for the unknown coefficients (a, b, c, d...). This gives you a linear system of equations in those unknown coefficients, which can be solved using standard linear least squares methods.

So fitting polynomials is easy. The problem is that it's not that useful. Deep learning has to solve much harder problems to get to a useful model.
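A sketch of that point, assuming numpy and made-up noiseless data: the fit is linear in the coefficients even though the features are powers of x.

```python
import numpy as np

# Data generated from y = 1 + 2x + 3x^2 (no noise, for illustration).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1 + 2 * x + 3 * x ** 2

# The design matrix has columns 1, x, x^2: the model is linear in the
# unknown coefficients (a, b, c), even though x appears squared.
A = np.stack([np.ones_like(x), x, x ** 2], axis=1)
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)  # recovers (1, 2, 3)
```

One matrix solve, no iterative training needed - which is exactly why this problem is easy compared to deep learning.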
As far as my understanding goes, you can represent practically any function as layers of linear transformations followed by non-linear functions (e.g. `ReLU(x) = max(0, x)`). It's this sprinkling of non-linearity that allows the networks to be able to model complex functions.
However, from my perspective, the secret sauce is (1) composability and (2) differentiability. These enable the backpropagation process (which is just "the chain rule" from calculus) and this is what allows these massive mathematical expressions to learn parameters (weights and biases) that perform well.
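One way to see what the non-linearity buys (a hand-picked toy network of my own, not a trained one): two ReLU units compute absolute value, which no purely linear map can.

```python
def relu(v):
    # ReLU(x) = max(0, x), the non-linearity mentioned above
    return max(0.0, v)

def tiny_net(x):
    # Hidden layer: two units with weights +1 and -1, no bias.
    h1 = relu(1.0 * x)   # fires only for positive inputs
    h2 = relu(-1.0 * x)  # fires only for negative inputs
    # Output layer: sum the two units -> |x|, a non-linear function.
    return h1 + h2
```

Composing many such layers, with learned rather than hand-picked weights, is what lets networks carve out much more complex functions.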
- It requires some mathematical understanding so will exclude some part of the non-technical audience
- It is the incorrect analogy. Non-linearities in neural networks have nothing to do with polynomials. In fact, polynomial regression is a type of linear regression, and for the most part, it sucks.
Also, as someone mentioned, all the “serious” alternative ML methods prior to the deep learning revolution allow modeling non linearities (even if just through modification of linear regressions, like polynomial regression).
But there are many things that are not functions. Like circles. And they tend to crop up a lot in the real world, no pun intended.
Probably what you are trying to say is that "a circle is not the image of a function", but that is also not true. You're assuming that since in Cartesian coordinates you can solve for y = +/- sqrt(R^2 - x^2), the fact that y is multi-valued means it's not a function. That's what they teach in high school pre-calculus, anyway.
But for example, we can associate the points on a circle with the image of the function e^{i theta}. Or equivalently, with the R^2-valued function f(theta) = (cos(theta), sin(theta)).
Prior methods also allowed modelling of non-linear relationships, e.g. Random Forests.