Here's the thing: Dellaert loves to write about SLAM from a viewpoint of probabilities and densities. I think he likes to see himself as a mathematician. However at the end of the day he's working with Gaussian noise assumptions, just like everyone else, and the solution is obtained by some form of Least Squares, again, like everyone else.
I would really like it if he just cut all the probabilistic thicket and went straight to the core of how his methods improve the SOTA just in terms of how the friggin Least Squares problem is set up and solved (e.g. like here [1]). But I guess that would probably take away most of the "magic".
To use another example: Let's say you understand and prefer to use QR factorization for some linear algebra use case. That doesn't mean that Cholesky or SVD are somehow "worthless" tools -- they just provide other useful insights or applications.
I personally find Dellaert's decomposition of the problem useful. It doesn't hurt that he was one of the early SLAM "inventors" (along with the likes of Fox, Burguard, et al). And since the topic of the post is (literally) probabilistic graphical models, it seems extremely relevant to this thread.
(I read through the article you linked. I do not agree that it is an "easier" or "simpler" formulation.)
Plus a segment of academia loves this type of research.
Well you need some kind of model to optimize, and finding the perfect one is a vastly more difficult problem that probably begins to incorporate some stuff from information theory (which starts to become rather a different field).
I don't actually see the issue with researchers working on algorithms to solve the Gaussian noise versions of these problems — they can usually be extended to some other model relatively easily (for instance, rotation synchronization via Riemannian manifolds can incorporate a Pseudo-Huber loss function without much difficulty).
Also, what is SOTA?
Investing my time to learn Deep Learning (CNNs, RNNs) vs. Random Forest vs. PGMs vs. Reinforcement Learning well enough to be able to apply the chosen approach, it seems that PGMs are not high in the list, is that correct?
Are there Kaggle competitions, in which PGMs have been the best approach?
What are the real-world problem areas that PGM's currently excel at compared to other methods?
I'd say for people with interest in ML and DL, the main insight from PGMs is intuitions about probability theory, especially conditional probabilities and the concept of (un)conditional independence. State of the art in which PGMs would be relevant is currently mostly brute-force learning via stochastic gradient descent (with momentum) in deep directed models/function approximators, but function approximation likely does not solve all problems as adversarial examples show. There are certainly very important applications of PGMs in all kinds of domains as listed in the link above (perhaps I would have added FastSLAM [1]), and there are also variational autoencoders/Helmholz machines and (Deep) Boltzmann Machines, which are especially interesting for research perspective.
[1] http://robots.stanford.edu/papers/montemerlo.fastslam-tr.pdf
I'd probably rank these areas in order of importance as follows: deep learning, pgms, reinforcement learning. Deep learning as a framework is pretty general. PGMs, as I have seen them, don't really have any one killer domain area - maybe robotics and areas where you want to explicitly model causality? Applications for reinforcement learning seem the more niche, but maybe that's because they haven't been adequately explored to the extent that DL/CNNs/RNNs/PGMs have been.
I'm doing a thesis on tree based algorithm and it works great for medical data.
Granted I have little exposer to NN, you can't do that with NN when clinical trials data is small as hell.
It's all base on the type data and your resource and criteria.
NN is really hype and people tend to over look other algorithms but NN is not a silver bullet. I don't get how you rank those importance either, it could be bias.
http://ermongroup.github.io/cs228-notes/preliminaries/probab...
I see two problems:
(1) First Problem -- Sample Space
Their definition of a sample space is
"The set of all the outcomes of a random experiment. Here, each outcome ω can be thought of as a complete description of the state of the real world at the end of the experiment."
The "complete description" part is not needed and even if included has meaning that is not clear.
Instead, each possible experiment is one trial and one element in the set of all trials Ω. That's it: Ω is just a set of trials, and each trial is just an element of that set. There is nothing there about the outcomes of the trials.
Next the text has
"The sample space is Ω = {1, 2, 3, 4, 5, 6}."
That won't work: Too soon will find that need an uncountably infinite sample space. Indeed an early exercise is that the set of all events cannot be countably infinite.
Indeed, a big question was, can there be a sample space big enough to discuss random variables as desired? The answer is yes and is given in the famous Kolomogorov extension theorem.
(2) Second Problem -- Notation
An event A is an element of the set of all events F and a subset of the sample space Ω.
Then a probability measure P or just a probability is a function P: F --> [0,1] that is, the closed interval [0,1].
So, we can write the probability of event A by P(A). Fine.
Or, given events A and B, we can consider the event C = A U B and, thus, write P(C) = P(A U B). Fine.
But the notes have P(1,2,3,4), and that is undefined in the notes and, really, in the rest of probability. Why? Because
1,2,3,4,
is not an event.
For the set of real numbers R, a real random variable X: Ω --> R (that is measurable with respect to the sigma algebra F and a specified sigma algebra in R, usually the Borel sets, the smallest sigma algebra containing the open sets, or the Lebesgue measurable sets).
Then an event would be X in {1,2,3,4} subset of R or the set of all ω in Ω so that X(ω) in {1,2,3,4} or
{ω| X(ω) in {1,2,3,4} }
or the inverse image of {1,2,3,4} under X -- could write this all more clearly if had all of D. Knuth's TeX.
in which case we could write
P(X in {1,2,3,4})
When the elementary notation is bad, a bit tough to take the more advanced parts seriously.
A polished, elegant treatment of these basics is early in
Jacques Neveu, Mathematical Foundations of the Calculus of Probability, Holden-Day, San Francisco, 1965.
Neveu was a student of M. Loeve at Berkeley, and can also see Loeve, Probability Theory, I and II, Springer-Verlag. A fellow student of Neveu at Berkeley under Loeve was L. Breiman, so can also see Breiman, Probability, SIAM.
These notes are from Stanford. But there have long been people at Stanford, e.g., K. Chung, who have these basics in very clear, solid, and polished terms, e.g.,
Kai Lai Chung, A Course in Probability Theory, Second Edition, ISBN 0-12-174650-X, Academic Press, New York, 1974.
K. L. Chung and R. J. Williams, Introduction to Stochastic Integration, Second Edition, ISBN 0-8176-3386-3, Birkhaüser, Boston, 1990.
Kai Lai Chung, Lectures from Markov Processes to Brownian Motion, ISBN 0-387-90618-5, Springer-Verlag, New York, 1982.
Will be in trouble when considering a sequence of, say, independent random variables, say, as in coin flipping.
Will have trouble with P(A|B) and E[Y|X].
And the notation is still wrong.
Carnegie-Mellon: https://www.youtube.com/watch?v=lcVJ_zsynMc&list=PLI3nIOD-p5...