The book starts from the derivation of Bayes' theorem from the first principles of logic and shows its applications to a wide range of topics. There is thorough discussion of various "paradoxes", and the author sharply criticizes frequentist statistics. In addition there are many historical references.
http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...
It's a book I'm happy to have in dead tree form on my shelf.
P(H') = (H/(H+T))^H'
You also write that the frequentist solution fails to give an error estimate, yet you don't show that the Bayesian solution does give one. If the goal of the article is to show that Bayesian is more correct than frequentist, then it leaves the reader unconvinced. If the goal is to show 3 ways of finding a probability, you should either say each is fine under its own paradigm, or argue why only one paradigm is correct.
That's not the probability of getting H' heads in a row. It's an estimate of the probability of getting H' heads in a row, based on a maximum-likelihood estimate.
It doesn't make much sense if you take it to be the probability of getting H' heads in a row. For example, if {H=1, T=0}, then P(H'=100) = 1. You looked at one flip, and then decided that every subsequent flip was guaranteed to be heads?
It becomes even more clear that the question isn't really being answered if you take {H=0, T=0}.
The question was asking for P(H' | H, T), not P(H').
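The pathology described above is easy to see in a few lines. This is a minimal sketch of the maximum-likelihood calculation being criticized; the function name and variables follow the thread's notation, not any book.

```python
# ML approach: estimate p = H / (H + T) from the observed flips,
# then raise it to the H' power. The point estimate carries no
# uncertainty, which is the source of the pathology below.

def p_ml_run(H, T, H_prime):
    """ML probability of H' heads in a row after seeing H heads, T tails."""
    p = H / (H + T)          # point estimate of the coin's bias
    return p ** H_prime

# The case from the comment: one observed head, zero tails.
print(p_ml_run(1, 0, 100))   # 1.0 -- every future flip "guaranteed" heads
```

Note that {H=0, T=0} is not even computable here: the estimate divides by zero, which is the other failure mode mentioned above.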
> You also write that the frequentist solution fails to give an error estimate, yet you don't show that the Bayesian solution does give one.
Because there is no error? In the proof I assume P(p) is known, and after that every step follows from a law of probability. There is no error to be accounted for in the procedure. The only caveat is that we need to know P(p) to be able to perform the procedure, a caveat that I point out at least three times on the page.
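As a concrete sketch of that procedure, one can ASSUME a uniform P(p) on [0, 1] (the page only requires that P(p) be known; uniform is an illustrative choice, not the author's). The posterior is then Beta(H+1, T+1), and the predictive probability of a run of heads telescopes into a product (the rule of succession applied repeatedly):

```python
from math import prod

# Bayesian predictive probability of H' heads in a row, ASSUMING a
# uniform prior P(p) on [0, 1]. Posterior after H heads, T tails is
# Beta(H+1, T+1); the run probability is a telescoping product.

def p_bayes_run(H, T, H_prime):
    """P(next H' flips all heads | H heads, T tails, uniform prior)."""
    return prod((H + 1 + k) / (H + T + 2 + k) for k in range(H_prime))

print(p_bayes_run(1, 0, 1))    # 2/3, Laplace's rule of succession
print(p_bayes_run(1, 0, 100))  # ~0.0196, not 1.0 as the ML estimate gives
```

Unlike the ML version, this also handles {H=0, T=0} gracefully: it returns 1/2 for a single flip, reflecting pure prior ignorance.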
This is most of the reason I come here, because people show the good will to share bits of knowledge and experience.
Then a whole other benefit is that when people are willing to do this, their contribution might be critiqued or corrected, which can sharpen or polish your knowledge and thinking even in areas where you might be very qualified.
For some people this would be a nightmare, if they easily feel angry or hurt when their intellect is challenged, especially when they are an “expert” on the subject.
But I suspect most people here feel the opposite. You found a flaw in my results or reasoning? Fucking awesome, you have just made me stronger.
edit: I don’t know many other online forums where this dynamic exists, so if anyone does please don’t keep it a secret.
I'd recommend https://www.readthesequences.com/ as something to test the waters; if this is your style, then you'll enjoy lesswrong.com .
This is a rather unusual book in that it gives a primer on probabilistic methods that is actually applicable outside computer vision. It is Bayesian-heavy and rarely touches neural networks; the book was released in 2012, the year the deep learning boom started.
I love the way it’s explained there.
For example, suppose it is thinking about the hair colour and eye colour of Joe. It starts with these hypotheses about Joe's (eye colour, hair colour):
(eye colour, hair colour)
=========================
(blue, blond)
(blue, black)
(brown, blond)
(brown, black)
Suppose that it learns that all blue eyed people have blond hair. It deletes the incompatible hypothesis (blue, black), and keeps only the hypotheses compatible with it: (blue, blond)
(brown, blond)
(brown, black)
Suppose it now learns that Joe has blue eyes. It keeps only the hypothesis compatible with it: (blue, blond)
So it has now learned the hair colour.

In reality it is not true that all blue eyed people have blond hair. We change the robot's brain and give a weight to each hypothesis indicating how likely it is. Equivalently, we could insert multiple copies of each hypothesis, so that the likelihood of a hypothesis is proportional to the number of copies of the hypothesis.
(blue, blond): 10
(blue, black): 2
(brown, blond): 9
(brown, black): 8
Blue eyed people are more likely to be blond. Those are our hypotheses about the attributes of Joe. Suppose we now learn that Joe has blue eyes. The robot keeps only the hypotheses compatible with it: (blue, blond): 10
(blue, black): 2
So P(blond hair) = 10/12 and P(black hair) = 2/12. This is all Bayes' theorem is: you have a set of weighted hypotheses, and you delete the hypotheses incompatible with the observed evidence. The extra factor in Bayes' theorem is only there to re-normalise the weights so that they sum to 1.

Conditional probability (with some caveats that someone in the comments can fill in):
P(a,b) = P(b,a)
P(a|b) * P(b) = P(b|a) * P(a)
P(a|b) = P(b|a) * P(a) / P(b)
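A minimal sketch tying the weighted-hypothesis picture from the eye/hair example above to this formula. The weights are the counts from that example; nothing here comes from any library or book.

```python
# Weighted hypotheses about Joe's (eye colour, hair colour).
weights = {
    ("blue", "blond"): 10,
    ("blue", "black"): 2,
    ("brown", "blond"): 9,
    ("brown", "black"): 8,
}
total = sum(weights.values())

# Route 1: delete hypotheses incompatible with "Joe has blue eyes",
# then re-normalise the survivors.
survivors = {h: w for h, w in weights.items() if h[0] == "blue"}
p_blond_delete = survivors[("blue", "blond")] / sum(survivors.values())

# Route 2: Bayes' theorem, P(blond | blue) = P(blue | blond) P(blond) / P(blue).
p_blond = (weights[("blue", "blond")] + weights[("brown", "blond")]) / total
p_blue = (weights[("blue", "blond")] + weights[("blue", "black")]) / total
p_blue_given_blond = (weights[("blue", "blond")] / total) / p_blond
p_blond_bayes = p_blue_given_blond * p_blond / p_blue

print(p_blond_delete, p_blond_bayes)  # both routes give 10/12
```

The two routes agree, which is the point of the comment above: the formula is just bookkeeping for deleting incompatible hypotheses and re-normalising.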
a can be the model and b can be the data, so it becomes P(model | data) = P(data | model) * P(model) / P(data)
We have, or can estimate, the things on the right side. We ultimately want to get the thing on the left side.

p(a and b | context c) = p(a|b,c) * p(b|c)
= p(b|a,c) * p(a|c)
or = p(a|c)*p(b|c) = p(b|c)*p(a|c) if a and b are independent of each other
so Bayes only matters when there is dependence:
p(a|b,c) = p(a|c) * p(b|a,c) / p(b|c)
otherwise it's just p(a|c) = p(a|c)
I like to put things in that order because p(a|c) is the "prior belief", and with some handwaving you can say things like "updated belief = prior belief and new evidence about the belief".

Edit: typo
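A quick numeric check of the conditional form above, on a made-up joint distribution over three binary variables. The weights are arbitrary; only the identity p(a|b,c) = p(a|c) * p(b|a,c) / p(b|c) matters.

```python
from itertools import product

# Build an arbitrary joint distribution over binary (a, b, c).
raw = [3, 1, 4, 1, 5, 9, 2, 6]            # arbitrary positive weights
outcomes = list(product([0, 1], repeat=3))
z = sum(raw)
joint = {abc: w / z for abc, w in zip(outcomes, raw)}

def p(cond):
    """Total probability of outcomes satisfying cond(a, b, c)."""
    return sum(w for (a, b, c), w in joint.items() if cond(a, b, c))

# Left side: p(a|b,c) computed directly from the joint.
lhs = p(lambda a, b, c: a and b and c) / p(lambda a, b, c: b and c)

# Right side: p(a|c) * p(b|a,c) / p(b|c).
rhs = (p(lambda a, b, c: a and c) / p(lambda a, b, c: c)
       * p(lambda a, b, c: a and b and c) / p(lambda a, b, c: a and c)
       / (p(lambda a, b, c: b and c) / p(lambda a, b, c: c)))

print(lhs, rhs)  # equal: 6/7 on this particular table
```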
For anyone interested in Allen's style: http://greenteapress.com/wp/physical-modeling-in-matlab/
Do you know if he was using spaced repetition to do that? I know some teachers have tried that to speed up their students' learning.
From the description:
> This book is intended for a different audience, and it has different goals. I developed it for a class at Olin College called Software Systems.
> Most students taking this class learned to program in Python, so one of the goals is to help them learn C. For that part of the class, I use Griffiths and Griffiths, Head First C, from O'Reilly Media. This book is meant to complement that one.
> Few of my students will ever write an operating system, but many of them will write low-level applications in C, and some of them will work on embedded systems. My class includes material from operating systems, networks, databases, and embedded systems, but it emphasizes the topics programmers need to know.
It's impressive not so much that he did that, but that he bothered to try.
Most lecturers (myself included) will try very hard not to learn anything about their students because they consider actually dealing with undergrads (particularly first-years!) on an individual level to be beneath them.
The book I've used so far to study is "Probability and Statistics: The Science of Uncertainty" by Michael J. Evans and Jeffrey S. Rosenthal. The book is no longer in print and is available free in PDF form.
https://cran.r-project.org/web/packages/IPSUR/vignettes/IPSU...
R has built-in functions for most of your needs. You can get a lot done with very little code.
Why would you write a book that targets the Python community and ignore PEP8 styling, inconveniencing an entire community, simply because it would be too much trouble for you to change?
“Also on the topic of style, I write “Bayes’s theorem” with an s after the apostrophe, which is preferred in some style guides and deprecated in others.”
It is deprecated in all modern style guides and should not be used. You’ll get dinged in college English and writing classes for using this outdated and redundant style.
I’m sure this book is great, but, as a point of constructive criticism, I would suggest the author do a better job at adhering to the styles of code and English expected by his target audience, rather than what is comfortable for him.
"Many projects have their own coding style guidelines. In the event of any conflicts, such project-specific guides take precedence for that project."
and
"A Foolish Consistency is the Hobgoblin of Little Minds".
And, throughout, PEP8 makes it clear that it is a set of recommendations, and that if a project or community already has an established style, it need not be changed.
An Introduction to Likelihoodist, Bayesian, and Frequentist Methods
http://gandenberger.org/2014/07/28/intro-to-statistical-methods-2/
A Bayesian pollster began with a certain set of prior probabilities. For example, the fact that the college educated were more likely to vote in previous elections informed the sample population, because it wouldn't make much sense to ask the opinions of those who would stay home.
Thus, based on priors that were updated with new empirical data, a new set of probabilities emerged, that gave a certain candidate a high probability of victory.
Members of the voting public, aware of this high probability, decided that this meant with certainty that this candidate would win and therefore decided to stay home on election day.
In reality the Bayesian models were incorrect as amongst other factors, a much higher number of non-college educated individuals decided to vote and to vote for the other candidate.
As it is with Bayesian intelligence, shared as much by pollsters as machine learning algorithms:
Real-time heads up display
Keeps the danger away
But only for the things that already ruined your day.

http://andrewgelman.com/2016/12/08/19-things-learned-2016-el...
I'll have to write another poem about pithy rebuttals that cherry-pick a counter-narrative!
Now, what rhymes with anecdotal...