Understanding Convolutional Neural Networks

(poloclub.github.io)

191 points · advanderveer · 5y ago · 21 comments

natch · 5y ago

Very nice interactive tutorial tool. I wish some of the terms were defined. Hyperparameter? Convolution? Kernel?

mjburgess · 5y ago

Model: the learnt relationship (e.g., f(x; a, b) = ax + b)

Parameter: an aspect of the model, a dial which is fixed by data (e.g., a)

Kernel (as used here): a subset of such parameters

Algorithm: procedure which accepts data and produces a model

Hyperparameter: an aspect of the algorithm, a dial which changes model production

Convolution: A convolution of image A and filter B describes to what degree A is "like" B. Here "filter B" is a kernel, i.e., a parameter set learnt by the network.

The goal of a CNN is to produce a model whose parameters are image filters that describe the degree to which an image expresses various shapes. By learning the filters from an image set, the network is specialized to distinguish images in that set.
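As a rough illustration of the "degree to which A is like B" idea, here is a minimal sketch in plain Python. The image values and the hand-written `edge_filter` are made up for illustration (a CNN would learn its filters), and the helper name `correlate2d` is hypothetical. It slides the filter without flipping it, which is what CNN libraries typically compute under the name "convolution".

```python
def correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 4x5 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Hand-picked filter that "looks like" a dark-to-bright vertical edge.
edge_filter = [
    [-1, 1],
    [-1, 1],
]

response = correlate2d(image, edge_filter)
for row in response:
    print(row)
```

The response is largest in the column where the image patch matches the filter's dark-to-bright pattern and zero over the flat regions, which is the sense in which the output measures similarity to the filter.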

f6v · 5y ago

> Hyperparameter: an aspect of the algorithm, a dial which changes model production

This seems a bit cryptic. The way I understand hyperparameters, they define how a model learns, e.g. you can set an alpha in gradient descent. Compared to "ordinary" parameters, hyperparameters do not define the relationship between data and output.
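To make the distinction concrete, here is a minimal sketch, assuming a one-feature linear model y = a*x + b fitted by plain gradient descent on mean squared error. The function name `fit_line`, the toy data, and the specific `alpha` and `steps` values are all made up for illustration.

```python
def fit_line(xs, ys, alpha, steps):
    """Fit y = a*x + b by gradient descent on mean squared error."""
    a, b = 0.0, 0.0            # parameters: adjusted from the data
    n = len(xs)
    for _ in range(steps):     # alpha and steps are hyperparameters: chosen up front
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# A small learning rate recovers a close to 2 and b close to 1.
a, b = fit_line(xs, ys, alpha=0.05, steps=2000)

# Same data, same model family, but a learning rate that is too large:
# the updates overshoot and the learned parameters run away.
a_bad, b_bad = fit_line(xs, ys, alpha=0.5, steps=20)

print(a, b)
print(a_bad, b_bad)
```

Here `a` and `b` appear in the final prediction, while `alpha` and `steps` only shape how the fit is produced, yet changing them changes which model comes out.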

natch · 5y ago

These definitions are too vague by half. In a word, useless. “An aspect of the algorithm, a dial” so the same as a parameter then, according to your definition... the only distinction is it changes model production, but in my experience data does that too, so... no clarity here. You make them sound the same to the naive reader.

And you misunderstood my suggestion for the article as a request for your help. But thanks. I don’t doubt what you wrote is accurate and helpful in the same way that saying “a transom is a part of a building” is accurate and helpful.

ww520 · 5y ago

The main parameters in ML are the θ (theta) values, as in Y = θ0 + θ1 X1 + θ2 X2 + ..., which are learned from the training data. (X1, X2, ...) are the features. The main goal of ML is to determine these theta parameters for a model so that you can use them to predict results on new data.

Hyperparameters in ML are the tuning parameters for the shape and structure of the model, such as the number of features in the linear regression above, the number of layers in a NN, or the number of neurons in each layer. I think basically any tuning parameter besides theta can be considered a hyperparameter. The difference is that the theta parameters are learned while the hyperparameters are decided by a human. But you can also run experiments with different tuning parameters and compare the outcomes, so in a sense hyperparameters can be learned.

Convolution, well, the article is trying to explain it. It's like rolling up a portion of an image using a filter, e.g. blurring an image by pixelating it. The main purpose is to find high-level features of the image, e.g. applying a filter to find the edges of an object in the image.

A kernel is a small NxN matrix (3x3, 4x4, 16x16, etc.) used as a filter to convert the pixels in an image into a high-level feature. E.g. the mean-color kernel takes 4x4 pixels and computes the average of their colors. Now apply the mean-color kernel over all the 4x4 blocks of an image and you get one convolution.
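A minimal sketch of that mean-kernel example in plain Python, assuming a single-channel image (one number per pixel instead of a color) and non-overlapping blocks, i.e. the kernel moves by its own width each step. The function name `mean_blocks` and the pixel values are made up for illustration.

```python
def mean_blocks(image, k):
    """Apply a k x k averaging kernel to each non-overlapping k x k block."""
    weight = 1.0 / (k * k)     # every entry of the mean kernel has this value
    out = []
    for r in range(0, len(image) - k + 1, k):
        row = []
        for c in range(0, len(image[0]) - k + 1, k):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * weight
            row.append(total)
        out.append(row)
    return out


# Hypothetical 4x8 image: a checkerboard block next to a flat bright block.
image = [
    [0, 4, 0, 4, 10, 10, 10, 10],
    [4, 0, 4, 0, 10, 10, 10, 10],
    [0, 4, 0, 4, 10, 10, 10, 10],
    [4, 0, 4, 0, 10, 10, 10, 10],
]

blurred = mean_blocks(image, 4)
print(blurred)
```

Each 4x4 block collapses to one averaged value, so the checkerboard detail disappears. A convolutional layer in a CNN usually differs in two ways: the kernel weights are learned rather than fixed at 1/16, and the kernel typically slides one pixel at a time so neighboring windows overlap.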

natch · 5y ago

Thanks, very helpful.

bigfoot675 · 5y ago

I agree, and I'm not sure why people are replying to you trying to help you. Your comment is a constructive critique on the link. By adding a bit more introduction, this tool could be made even more useful.

noname123 · 5y ago

Just a shout-out to Professor Polo (the professor who leads the Polo Club of Data Science, which wrote the tutorial on CNNs). Three years ago, as a CRUD code monkey, I found out on Hacker News that Georgia Tech was launching its Online Masters in Data Science, entered the program, and took Dr. Polo's class on Data Visualization and Analysis (that semester I literally spent more time on the class than on my day job). Now, coming full circle 3 years later, I see Dr. Polo's tutorial on CNNs on Hacker News again while working on CNNs/RNNs for my last class/capstone project lol. The circle of life, or the circle of HN I suppose!

Anyone here in the OMSA/OMSCS program?

loulou24 · 5y ago

Did you study this course [1] on your own?

[1] https://omscs.gatech.edu/cse-6242-data-visual-analytics

[ ] https://poloclub.github.io/cse6242-2019fall-online/

person_of_color · 5y ago

3 years is a long time for a degree. Should have done Udacity.

scrozart · 5y ago

3 years isn't a long time to obtain a master's degree while working. This program, and many others, actually give you 5 years so you can take roughly one class per semester, 2 per year, and not destroy yourself. GT is also a top-tier school. Not a knock on Udacity.

archerx · 5y ago

Why doesn't it explain what the "bias" is?

scarlac · 5y ago

Neurons have weight and bias. Weight is multiplied, bias is added. Really that simple.

Here are examples of equivalent notations that may make it more familiar:

  y = 2⋅x + b or
  y = a⋅x + b

- x would be the input to the neuron, often named 'x'.

- a (or 2 in the first form) would be the weight, often named 'w'.

- b would be the bias, often named 'b'.

- y is the output.

In neural network documentation this is often written as (often case sensitive, which may be confusing): output = w⋅x + b (output = weight * input + bias).
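As a small sketch of that formula in plain Python: the function name `neuron` and the numbers are made up for illustration, and the multi-input form assumes the usual convention of one weight per input and a single bias per neuron.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias: output = w . x + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Single input, matching y = 2*x + b with x = 3 and b = 1.
y = neuron([3.0], [2.0], 1.0)

# Three inputs: each input gets its own weight, the neuron has one bias.
z = neuron([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.25)

print(y, z)
```

With the bias set to zero the output is forced to be zero whenever all inputs are zero; the bias lets the neuron shift its output independently of the inputs.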

I hope that explains it.

confuseshrink · 5y ago

Bias is just a scalar term that is added. You can learn it via backpropagation like all the other weights.

max_likelihood · 5y ago

I know very little about CNNs. But I noticed the ReLU activation step is max(0, x), where x is the sum of the pixel intensities from each channel. In this example, it appears x > 0 (for all x), so the activation step isn't really doing much?

EDIT: I'm wrong. x < 0 for some of the pixels. Specifically for the more red-ish channels.

blackbear_ · 5y ago

No, relu is applied after convolution, so x is the result of applying the kernel at a particular location of the input, so it depends on the color as well as on the kernel.
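A minimal sketch of why that matters, in plain Python: even when every pixel is non-negative, a kernel with negative weights can produce a negative value, which ReLU then clips to zero. The patches, the kernel, and the helper name `kernel_response` are made up for illustration.

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def kernel_response(patch, kernel, bias=0.0):
    """Sum of elementwise products of one image patch and the kernel, plus bias."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    ) + bias


# Hypothetical kernel with mixed signs (responds to dark-to-bright edges).
kernel = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]

# Both patches contain only non-negative pixel intensities.
bright_left = [[0.9, 0.1], [0.9, 0.1]]
bright_right = [[0.1, 0.9], [0.1, 0.9]]

pre_left = kernel_response(bright_left, kernel)     # negative before activation
pre_right = kernel_response(bright_right, kernel)   # positive before activation

print(pre_left, relu(pre_left))
print(pre_right, relu(pre_right))
```

The first patch is suppressed to zero and the second passes through unchanged, so the activation is doing real work whenever the kernel has weights of both signs.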

atum47 · 5y ago

I wrote a NN a while back and made some interesting projects with it. two things I wanted to do: move the matrix multiplications to the GPU and implement convolution layers. if I ever get free time again, maybe I'll do it.

thanks for sharing this, great content. made me think about old projects I have lying around.

lbj · 5y ago

The interactive inspection of each layer is beautifully implemented. I hope that one day we'll be able to make even more sense of the consequence of each individual weight, i.e. know more than -0.52 for a given pixel.
