The book starts from the derivation of Bayes' theorem from the first principles of logic and shows its applications to a wide range of topics. There is thorough discussion of various "paradoxes", and the author sharply criticizes frequentist statistics. In addition there are many historical references.
http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...
It's a book I'm happy to have in dead tree form on my shelf.
P(H') = (H/(H+T))^H'
You also write that the frequentist solution fails to give an error estimate, yet you don't show that the Bayesian solution does give one. If the goal of the article is to show that Bayesian is more correct than frequentist, then it leaves the reader unconvinced. If the goal is to show 3 ways of finding a probability, you should either say each is fine under its own paradigm, or argue why only one paradigm is correct.
That's not the probability of getting H' heads in a row. It's an estimate of the probability of getting H' heads in a row, based on a maximum-likelihood estimate.
It doesn't make much sense if you take it to be the probability of getting H' heads in a row. For example, if {H=1, T=0}, then P(H'=100) = 1. You looked at one flip, and then decided that every subsequent flip was guaranteed to be heads?
It becomes even more clear that the question isn't really being answered if you take {H=0, T=0}.
The question was asking for P(H' | H, T), not P(H').
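The pathology described above is easy to see in a few lines. This is a minimal sketch of the maximum-likelihood calculation being criticized; the function name and variables follow the thread's notation, not any book.

```python
# ML approach: estimate p = H / (H + T) from the observed flips,
# then raise it to the H' power. The point estimate carries no
# uncertainty, which is the source of the pathology below.

def p_ml_run(H, T, H_prime):
    """ML probability of H' heads in a row after seeing H heads, T tails."""
    p = H / (H + T)          # point estimate of the coin's bias
    return p ** H_prime

# The case from the comment: one observed head, zero tails.
print(p_ml_run(1, 0, 100))   # 1.0 -- every future flip "guaranteed" heads
```

Note that {H=0, T=0} is not even computable here: the estimate divides by zero, which is the other failure mode mentioned above.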
> You also write that the frequentist solution fails to give an error estimate, yet you don't show that the Bayesian solution does give one.
Because there is no error? In the proof I assume P(p) is known, and after that every step follows from a law of probability. There is no error to be accounted for in the procedure. The only caveat is that we need to know P(p) to be able to perform the procedure, a caveat that I point out at least three times on the page.
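As a concrete sketch of that procedure, one can ASSUME a uniform P(p) on [0, 1] (the page only requires that P(p) be known; uniform is an illustrative choice, not the author's). The posterior is then Beta(H+1, T+1), and the predictive probability of a run of heads telescopes into a product (the rule of succession applied repeatedly):

```python
from math import prod

# Bayesian predictive probability of H' heads in a row, ASSUMING a
# uniform prior P(p) on [0, 1]. Posterior after H heads, T tails is
# Beta(H+1, T+1); the run probability is a telescoping product.

def p_bayes_run(H, T, H_prime):
    """P(next H' flips all heads | H heads, T tails, uniform prior)."""
    return prod((H + 1 + k) / (H + T + 2 + k) for k in range(H_prime))

print(p_bayes_run(1, 0, 1))    # 2/3, Laplace's rule of succession
print(p_bayes_run(1, 0, 100))  # ~0.0196, not 1.0 as the ML estimate gives
```

Unlike the ML version, this also handles {H=0, T=0} gracefully: it returns 1/2 for a single flip, reflecting pure prior ignorance.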
This is most of the reason I come here, because people show the good will to share bits of knowledge and experience.
Then a whole other benefit is that when people are willing to do this, their contribution might be critiqued or corrected, which can sharpen or polish your knowledge and thinking even in areas where you might be very qualified.
For some people this would be a nightmare, if they easily feel angry or hurt when their intellect is challenged, especially when they are an “expert” on the subject.
But I suspect most people here feel the opposite. You found a flaw in my results or reasoning? Fucking awesome, you have just made me stronger.
edit: I don’t know many other online forums where this dynamic exists, so if anyone does please don’t keep it a secret.
I'd recommend https://www.readthesequences.com/ as something to test the waters; if this is your style, then you'll enjoy lesswrong.com .
This is a rather unusual book in that it gives a primer on probabilistic methods that is actually applicable outside computer vision. It is Bayesian-heavy and rarely touches neural networks; the book was released in 2012, the year the deep learning boom started.
I love the way it’s explained there.
For example, suppose it is thinking about the hair colour and eye colour of Joe. It starts with these hypotheses about Joe's (eye colour, hair colour):
(eye colour, hair colour)
=========================
(blue, blond)
(blue, black)
(brown, blond)
(brown, black)
Suppose that it learns that all blue eyed people have blond hair. It deletes the incompatible hypothesis (blue, black), and keeps only the hypotheses compatible with it: (blue, blond)
(brown, blond)
(brown, black)
Suppose it now learns that Joe has blue eyes. It keeps only the hypothesis compatible with it: (blue, blond)
So it has now learned the hair colour.

In reality it is not true that all blue eyed people have blond hair. We change the robot's brain and give a weight to each hypothesis indicating how likely it is. Equivalently, we could insert multiple copies of each hypothesis, so that the likelihood of a hypothesis is proportional to the number of copies of the hypothesis.
(blue, blond): 10
(blue, black): 2
(brown, blond): 9
(brown, black): 8
Blue eyed people are more likely to be blond. Those are our hypotheses about the attributes of Joe. Suppose we now learn that Joe has blue eyes. The robot keeps only the hypotheses compatible with it: (blue, blond): 10
(blue, black): 2
So P(blond hair) = 10/12 and P(black hair) = 2/12. This is all Bayes' theorem is: you have a set of weighted hypotheses, and you delete the hypotheses incompatible with the observed evidence. The extra factor in Bayes' theorem is only there to re-normalise the weights so that they sum to 1.

Conditional probability (with some caveats that someone in the comments can fill in):
P(a,b) = P(b,a)
P(a|b) * P(b) = P(b|a) * P(a)
P(a|b) = P(b|a) * P(a) / P(b)
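A minimal sketch tying the weighted-hypothesis picture from the eye/hair example above to this formula. The weights are the counts from that example; nothing here comes from any library or book.

```python
# Weighted hypotheses about Joe's (eye colour, hair colour).
weights = {
    ("blue", "blond"): 10,
    ("blue", "black"): 2,
    ("brown", "blond"): 9,
    ("brown", "black"): 8,
}
total = sum(weights.values())

# Route 1: delete hypotheses incompatible with "Joe has blue eyes",
# then re-normalise the survivors.
survivors = {h: w for h, w in weights.items() if h[0] == "blue"}
p_blond_delete = survivors[("blue", "blond")] / sum(survivors.values())

# Route 2: Bayes' theorem, P(blond | blue) = P(blue | blond) P(blond) / P(blue).
p_blond = (weights[("blue", "blond")] + weights[("brown", "blond")]) / total
p_blue = (weights[("blue", "blond")] + weights[("blue", "black")]) / total
p_blue_given_blond = (weights[("blue", "blond")] / total) / p_blond
p_blond_bayes = p_blue_given_blond * p_blond / p_blue

print(p_blond_delete, p_blond_bayes)  # both routes give 10/12
```

The two routes agree, which is the point of the comment above: the formula is just bookkeeping for deleting incompatible hypotheses and re-normalising.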
a can be the model and b can be the data, so it becomes P(model | data) = P(data | model) * P(model) / P(data)
We have, or can estimate, the things on the right side. We ultimately want to get the thing on the left side.

p(a and b | context c) = p(a|b,c) * p(b|c)
= p(b|a,c) * p(a|c)
or = p(a|c)*p(b|c) = p(b|c)*p(a|c) if a and b are independent of each other
so Bayes only matters when there is dependence:
p(a|b,c) = p(a|c) * p(b|a,c) / p(b|c)
otherwise it's just p(a|c) = p(a|c)
I like to put things in that order because p(a|c) is the "prior belief", and with some handwaving you can say things like "updated belief = prior belief and new evidence about the belief".

Edit: typo
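A quick numeric check of the conditional form above, on a made-up joint distribution over three binary variables. The weights are arbitrary; only the identity p(a|b,c) = p(a|c) * p(b|a,c) / p(b|c) matters.

```python
from itertools import product

# Build an arbitrary joint distribution over binary (a, b, c).
raw = [3, 1, 4, 1, 5, 9, 2, 6]            # arbitrary positive weights
outcomes = list(product([0, 1], repeat=3))
z = sum(raw)
joint = {abc: w / z for abc, w in zip(outcomes, raw)}

def p(cond):
    """Total probability of outcomes satisfying cond(a, b, c)."""
    return sum(w for (a, b, c), w in joint.items() if cond(a, b, c))

# Left side: p(a|b,c) computed directly from the joint.
lhs = p(lambda a, b, c: a and b and c) / p(lambda a, b, c: b and c)

# Right side: p(a|c) * p(b|a,c) / p(b|c).
rhs = (p(lambda a, b, c: a and c) / p(lambda a, b, c: c)
       * p(lambda a, b, c: a and b and c) / p(lambda a, b, c: a and c)
       / (p(lambda a, b, c: b and c) / p(lambda a, b, c: c)))

print(lhs, rhs)  # equal: 6/7 on this particular table
```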
For anyone interested in Allen's style: http://greenteapress.com/wp/physical-modeling-in-matlab/
Do you know if he was using spaced repetition to do that? I know some teachers have tried that to speed up their students' learning.
From the description:
> This book is intended for a different audience, and it has different goals. I developed it for a class at Olin College called Software Systems.
> Most students taking this class learned to program in Python, so one of the goals is to help them learn C. For that part of the class, I use Griffiths and Griffiths, Head First C, from O'Reilly Media. This book is meant to complement that one.
> Few of my students will ever write an operating system, but many of them will write low-level applications in C, and some of them will work on embedded systems. My class includes material from operating systems, networks, databases, and embedded systems, but it emphasizes the topics programmers need to know.
It's impressive not so much that he did that, but that he bothered to try.
Most lecturers (myself included) will try very hard not to learn anything about their students because they consider actually dealing with undergrads (particularly first-years!) on an individual level to be beneath them.
The book I've used so far to study is "Probability and Statistics: The Science of Uncertainty" by Michael J. Evans and Jeffrey S. Rosenthal. The book is no longer in print and is available free in PDF form.
https://cran.r-project.org/web/packages/IPSUR/vignettes/IPSU...
R has built-in functions for most of your needs. You can get a lot done with very little code.
Why would you write a book that targets the Python community and ignore PEP8 styling, inconveniencing an entire community, simply because it would be too much trouble for you to change?
“Also on the topic of style, I write “Bayes’s theorem” with an s after the apostrophe, which is preferred in some style guides and deprecated in others.”
It is deprecated in all modern style guides and should not be used. You’ll get dinged in college English and writing classes for using this outdated and redundant style.
I’m sure this book is great, but, as a point of constructive criticism, I would suggest the author do a better job at adhering to the styles of code and English expected by his target audience, rather than what is comfortable for him.
"Many projects have their own coding style guidelines. In the event of any conflicts, such project-specific guides take precedence for that project."
and
"A Foolish Consistency is the Hobgoblin of Little Minds".
And, throughout, PEP8 makes it clear that it is a set of recommendations, and that if a project or community already has an established style, it need not be changed.
An Introduction to Likelihoodist, Bayesian, and Frequentist Methods
http://gandenberger.org/2014/07/28/intro-to-statistical-methods-2/
A Bayesian pollster began with a certain set of prior probabilities. For example, the fact that the college educated were more likely to vote in previous elections informed the sample population, because it wouldn't make much sense to ask the opinions of those who would stay home.
Thus, based on priors that were updated with new empirical data, a new set of probabilities emerged, that gave a certain candidate a high probability of victory.
Members of the voting public, aware of this high probability, decided that this meant with certainty that this candidate would win and therefore decided to stay home on election day.
In reality the Bayesian models were incorrect as amongst other factors, a much higher number of non-college educated individuals decided to vote and to vote for the other candidate.
As it is with Bayesian intelligence, shared as much by pollsters as machine learning algorithms:
Real-time heads up display
Keeps the danger away
But only for the things that already ruined your day.

http://andrewgelman.com/2016/12/08/19-things-learned-2016-el...
I'll have to write another poem about pithy rebuttals that cherry-pick a counter-narrative!
Now, what rhymes with anecdotal...