Stanford Lecture Notes on Probabilistic Graphical Models (opens in new tab)

(ermongroup.github.io)

336 pointsvolodia9y ago32 comments

32 comments

23 comments · 8 top-level

beambot9y ago· 4 in thread

For a practical application of using graphical models to "solve" a bayesian problem, I recommend Frank Dellaert's whitepaper, which cover Simultaneous Localization and Mapping (SLAM, a robotics algorithm) using similar techniques: https://research.cc.gatech.edu/borg/sites/edu.borg/files/dow...

rsp19849y ago

I'm not an expert in Probabilistic Graphical Models but I do know factor graphs well. I've also read quite a bit of the recent SLAM and VO stuff that came out of Dellaert's group.

Here's the thing: Dellaert loves to write about SLAM from a viewpoint of probabilities and densities. I think he likes to see himself as a mathematician. However at the end of the day he's working with Gaussian noise assumptions, just like everyone else, and the solution is obtained by some form of Least Squares, again, like everyone else.

I would really like it if he just cut all the probabilistic thicket and went straight to the core of how his methods improve the SOTA just in terms of how the friggin Least Squares problem is set up and solved (e.g. like here [1]). But I guess that would probably take away most of the "magic".

[1] http://grail.cs.washington.edu/projects/mcba/pba.pdf

beambot9y ago

The factor graph representation and analysis lends itself to simplified understanding and reasoning about the relationship between poses, measurements, loop closures, etc. in a way that is simple and intuitive. Dellaert then uses that intuitive structure to show how to setup the problem in a way that can be efficiently computed using linear algebra.

To use another example: Let's say you understand and prefer to use QR factorization for some linear algebra use case. That doesn't mean that Cholesky or SVD are somehow "worthless" tools -- they just provide other useful insights or applications.

I personally find Dellaert's decomposition of the problem useful. It doesn't hurt that he was one of the early SLAM "inventors" (along with the likes of Fox, Burguard, et al). And since the topic of the post is (literally) probabilistic graphical models, it seems extremely relevant to this thread.

(I read through the article you linked. I do not agree that it is an "easier" or "simpler" formulation.)

1 more reply

aub3bhat9y ago

You cannot be more correct in your criticism, it feels as if there are a group of researchers who are just too enamored by beauty of "Bayesian stats and graphical models" that finding better solution of problem has become a secondary task for them.

Plus a segment of academia loves this type of research.

Xcelerate9y ago

> However at the end of the day he's working with Gaussian noise assumptions, just like everyone else, and the solution is obtained by some form of Least Squares, again, like everyone else.

Well you need some kind of model to optimize, and finding the perfect one is a vastly more difficult problem that probably begins to incorporate some stuff from information theory (which starts to become rather a different field).

I don't actually see the issue with researchers working on algorithms to solve the Gaussian noise versions of these problems — they can usually be extended to some other model relatively easily (for instance, rotation synchronization via Riemannian manifolds can incorporate a Pseudo-Huber loss function without much difficulty).

Also, what is SOTA?

2 more replies

dirtyaura9y ago· 4 in thread

For a novice, it's hard to assess how important PGMs are currently.

Investing my time to learn Deep Learning (CNNs, RNNs) vs. Random Forest vs. PGMs vs. Reinforcement Learning well enough to be able to apply the chosen approach, it seems that PGMs are not high in the list, is that correct?

Are there Kaggle competitions, in which PGMs have been the best approach?

What are the real-world problem areas that PGM's currently excel at compared to other methods?

mannigfaltig9y ago

https://www.quora.com/What-are-some-applications-of-Probabil...

I'd say for people with interest in ML and DL, the main insight from PGMs is intuitions about probability theory, especially conditional probabilities and the concept of (un)conditional independence. State of the art in which PGMs would be relevant is currently mostly brute-force learning via stochastic gradient descent (with momentum) in deep directed models/function approximators, but function approximation likely does not solve all problems as adversarial examples show. There are certainly very important applications of PGMs in all kinds of domains as listed in the link above (perhaps I would have added FastSLAM [1]), and there are also variational autoencoders/Helmholz machines and (Deep) Boltzmann Machines, which are especially interesting for research perspective.

[1] http://robots.stanford.edu/papers/montemerlo.fastslam-tr.pdf

eachro9y ago

Random Forests is not really at all in the same class of depth(no pun intended) as the other models you've mentioned; if you understand decision trees, random forests are just a way of combining independently trained decision trees that is less prone to overfitting.

I'd probably rank these areas in order of importance as follows: deep learning, pgms, reinforcement learning. Deep learning as a framework is pretty general. PGMs, as I have seen them, don't really have any one killer domain area - maybe robotics and areas where you want to explicitly model causality? Applications for reinforcement learning seem the more niche, but maybe that's because they haven't been adequately explored to the extent that DL/CNNs/RNNs/PGMs have been.

digitalzombie9y ago

Tree based algorithm have it's place. I dunno what your definition of depth is but it all depends on your data.

I'm doing a thesis on tree based algorithm and it works great for medical data.

Granted I have little exposer to NN, you can't do that with NN when clinical trials data is small as hell.

It's all base on the type data and your resource and criteria.

NN is really hype and people tend to over look other algorithms but NN is not a silver bullet. I don't get how you rank those importance either, it could be bias.

1 more reply

jhbadger9y ago

In analysis of scientific data, causality is important. Typically in science you aren't really making a model to predict things (although that can be a good way to test the model) but rather to understand what is going on, and PGMs are great for that.

georgeek9y ago· 3 in thread

An amazing text on this topic is Martin Wainwright/Michael Jordan's Graphical Models, Exponential Families, and Variational Inference: https://people.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_F...

rsp19849y ago

Thanks for sharing. Somehow I find a lot of Computer Science texts published under the "Foundations and Trends" model of very high quality.

signa119y ago

wow ! thank you :) this should be required reading/prereq, ideally, before taking the PGM course offered via coursera.

georgeek9y ago

This is more of a graduate-level text though. It's as close as you can currently get to Michael Jordan's forthcoming book on graphical models and ML.

1 more reply

graycat9y ago· 2 in thread

In their "Probability review" at

http://ermongroup.github.io/cs228-notes/preliminaries/probab...

I see two problems:

(1) First Problem -- Sample Space

Their definition of a sample space is

"The set of all the outcomes of a random experiment. Here, each outcome ω can be thought of as a complete description of the state of the real world at the end of the experiment."

The "complete description" part is not needed and even if included has meaning that is not clear.

Instead, each possible experiment is one trial and one element in the set of all trials Ω. That's it: Ω is just a set of trials, and each trial is just an element of that set. There is nothing there about the outcomes of the trials.

Next the text has

"The sample space is Ω = {1, 2, 3, 4, 5, 6}."

That won't work: Too soon will find that need an uncountably infinite sample space. Indeed an early exercise is that the set of all events cannot be countably infinite.

Indeed, a big question was, can there be a sample space big enough to discuss random variables as desired? The answer is yes and is given in the famous Kolomogorov extension theorem.

(2) Second Problem -- Notation

An event A is an element of the set of all events F and a subset of the sample space Ω.

Then a probability measure P or just a probability is a function P: F --> [0,1] that is, the closed interval [0,1].

So, we can write the probability of event A by P(A). Fine.

Or, given events A and B, we can consider the event C = A U B and, thus, write P(C) = P(A U B). Fine.

But the notes have P(1,2,3,4), and that is undefined in the notes and, really, in the rest of probability. Why? Because

1,2,3,4,

is not an event.

For the set of real numbers R, a real random variable X: Ω --> R (that is measurable with respect to the sigma algebra F and a specified sigma algebra in R, usually the Borel sets, the smallest sigma algebra containing the open sets, or the Lebesgue measurable sets).

Then an event would be X in {1,2,3,4} subset of R or the set of all ω in Ω so that X(ω) in {1,2,3,4} or

{ω| X(ω) in {1,2,3,4} }

or the inverse image of {1,2,3,4} under X -- could write this all more clearly if had all of D. Knuth's TeX.

in which case we could write

P(X in {1,2,3,4})

When the elementary notation is bad, a bit tough to take the more advanced parts seriously.

A polished, elegant treatment of these basics is early in

Jacques Neveu, Mathematical Foundations of the Calculus of Probability, Holden-Day, San Francisco, 1965.

Neveu was a student of M. Loeve at Berkeley, and can also see Loeve, Probability Theory, I and II, Springer-Verlag. A fellow student of Neveu at Berkeley under Loeve was L. Breiman, so can also see Breiman, Probability, SIAM.

These notes are from Stanford. But there have long been people at Stanford, e.g., K. Chung, who have these basics in very clear, solid, and polished terms, e.g.,

Kai Lai Chung, A Course in Probability Theory, Second Edition, ISBN 0-12-174650-X, Academic Press, New York, 1974.

K. L. Chung and R. J. Williams, Introduction to Stochastic Integration, Second Edition, ISBN 0-8176-3386-3, Birkhaüser, Boston, 1990.

Kai Lai Chung, Lectures from Markov Processes to Brownian Motion, ISBN 0-387-90618-5, Springer-Verlag, New York, 1982.

ilzmastr9y ago

If the notes only discuss discrete valued RVs there is no problem. This looks like the case since from greping for "continuous" there are few results: https://github.com/ermongroup/cs228-notes/search?l=Markdown&...

graycat9y ago

Hmm, discrete only? So, can't take averages, state the law of large numbers, the central limit theorem, convergence, completeness, do linear regression, etc. Can't multiply a random variable by a real number. Hmm ....

Will be in trouble when considering a sequence of, say, independent random variables, say, as in coin flipping.

Will have trouble with P(A|B) and E[Y|X].

And the notation is still wrong.

1 more reply

mrcactu59y ago· 2 in thread

course like these make me wonder ... how much one can dress up basic probability. i think the answer is A LOT

adamnemecek9y ago

If you can derive all this just from probability my hat is off to you.

1 more reply

Kinnard9y ago

What makes you say that?

refrigerator9y ago

For anyone interested, here are the materials for the Graphical Models course at Oxford: http://www.stats.ox.ac.uk/~evans/gms/index.htm

philipov9y ago

OpenCourseOnline, different Stanford professor: https://www.youtube.com/watch?v=WPSQfOkb1M8&list=PL50E6E80E8...

Carnegie-Mellon: https://www.youtube.com/watch?v=lcVJ_zsynMc&list=PLI3nIOD-p5...

likelynew9y ago

We have a professor in our college, who uses this thing with great passion. I took two related courses under him. The problem with this thing is he uses his own mental image and notation in the lectures and examination. Even the internet seems to be highly affected with this problem. I think there are many concepts that are not hard but it takes time to get a feeling. Like see "monads are not burritos" essay. It describes the problem of using analogies in explaining monads. This seems like a great page in the sense it uses minimal confusing analogies as is common in most resources in bayesian rule.

j / k navigate · click thread line to collapse

32 comments

23 comments · 8 top-level

beambot9y ago· 4 in thread

rsp19849y ago

I'm not an expert in Probabilistic Graphical Models but I do know factor graphs well. I've also read quite a bit of the recent SLAM and VO stuff that came out of Dellaert's group.

[1] http://grail.cs.washington.edu/projects/mcba/pba.pdf

beambot9y ago

(I read through the article you linked. I do not agree that it is an "easier" or "simpler" formulation.)

1 more reply

aub3bhat9y ago

Plus a segment of academia loves this type of research.

Xcelerate9y ago

> However at the end of the day he's working with Gaussian noise assumptions, just like everyone else, and the solution is obtained by some form of Least Squares, again, like everyone else.

Also, what is SOTA?

2 more replies

dirtyaura9y ago· 4 in thread

For a novice, it's hard to assess how important PGMs are currently.

Are there Kaggle competitions, in which PGMs have been the best approach?

What are the real-world problem areas that PGM's currently excel at compared to other methods?

mannigfaltig9y ago

https://www.quora.com/What-are-some-applications-of-Probabil...

[1] http://robots.stanford.edu/papers/montemerlo.fastslam-tr.pdf

eachro9y ago

digitalzombie9y ago

Tree based algorithm have it's place. I dunno what your definition of depth is but it all depends on your data.

I'm doing a thesis on tree based algorithm and it works great for medical data.

Granted I have little exposer to NN, you can't do that with NN when clinical trials data is small as hell.

It's all base on the type data and your resource and criteria.

NN is really hype and people tend to over look other algorithms but NN is not a silver bullet. I don't get how you rank those importance either, it could be bias.

1 more reply

jhbadger9y ago

georgeek9y ago· 3 in thread

An amazing text on this topic is Martin Wainwright/Michael Jordan's Graphical Models, Exponential Families, and Variational Inference: https://people.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_F...

rsp19849y ago

Thanks for sharing. Somehow I find a lot of Computer Science texts published under the "Foundations and Trends" model of very high quality.

signa119y ago

wow ! thank you :) this should be required reading/prereq, ideally, before taking the PGM course offered via coursera.

georgeek9y ago

This is more of a graduate-level text though. It's as close as you can currently get to Michael Jordan's forthcoming book on graphical models and ML.

1 more reply

graycat9y ago· 2 in thread

In their "Probability review" at

http://ermongroup.github.io/cs228-notes/preliminaries/probab...

I see two problems:

(1) First Problem -- Sample Space

Their definition of a sample space is

"The set of all the outcomes of a random experiment. Here, each outcome ω can be thought of as a complete description of the state of the real world at the end of the experiment."

The "complete description" part is not needed and even if included has meaning that is not clear.

Next the text has

"The sample space is Ω = {1, 2, 3, 4, 5, 6}."

That won't work: Too soon will find that need an uncountably infinite sample space. Indeed an early exercise is that the set of all events cannot be countably infinite.

Indeed, a big question was, can there be a sample space big enough to discuss random variables as desired? The answer is yes and is given in the famous Kolomogorov extension theorem.

(2) Second Problem -- Notation

An event A is an element of the set of all events F and a subset of the sample space Ω.

Then a probability measure P or just a probability is a function P: F --> [0,1] that is, the closed interval [0,1].

So, we can write the probability of event A by P(A). Fine.

Or, given events A and B, we can consider the event C = A U B and, thus, write P(C) = P(A U B). Fine.

But the notes have P(1,2,3,4), and that is undefined in the notes and, really, in the rest of probability. Why? Because

1,2,3,4,

is not an event.

Then an event would be X in {1,2,3,4} subset of R or the set of all ω in Ω so that X(ω) in {1,2,3,4} or

{ω| X(ω) in {1,2,3,4} }

or the inverse image of {1,2,3,4} under X -- could write this all more clearly if had all of D. Knuth's TeX.

in which case we could write

P(X in {1,2,3,4})

When the elementary notation is bad, a bit tough to take the more advanced parts seriously.

A polished, elegant treatment of these basics is early in

Jacques Neveu, Mathematical Foundations of the Calculus of Probability, Holden-Day, San Francisco, 1965.

These notes are from Stanford. But there have long been people at Stanford, e.g., K. Chung, who have these basics in very clear, solid, and polished terms, e.g.,

Kai Lai Chung, A Course in Probability Theory, Second Edition, ISBN 0-12-174650-X, Academic Press, New York, 1974.

K. L. Chung and R. J. Williams, Introduction to Stochastic Integration, Second Edition, ISBN 0-8176-3386-3, Birkhaüser, Boston, 1990.

Kai Lai Chung, Lectures from Markov Processes to Brownian Motion, ISBN 0-387-90618-5, Springer-Verlag, New York, 1982.

ilzmastr9y ago

graycat9y ago

Will be in trouble when considering a sequence of, say, independent random variables, say, as in coin flipping.

Will have trouble with P(A|B) and E[Y|X].

And the notation is still wrong.

1 more reply

mrcactu59y ago· 2 in thread

course like these make me wonder ... how much one can dress up basic probability. i think the answer is A LOT

adamnemecek9y ago

If you can derive all this just from probability my hat is off to you.

1 more reply

Kinnard9y ago

What makes you say that?

refrigerator9y ago

For anyone interested, here are the materials for the Graphical Models course at Oxford: http://www.stats.ox.ac.uk/~evans/gms/index.htm

philipov9y ago

OpenCourseOnline, different Stanford professor: https://www.youtube.com/watch?v=WPSQfOkb1M8&list=PL50E6E80E8...

Carnegie-Mellon: https://www.youtube.com/watch?v=lcVJ_zsynMc&list=PLI3nIOD-p5...

likelynew9y ago

j / k navigate · click thread line to collapse