It seems like probability just happens to work without explanation? Intuitively this seems a bit strange since it feels as though probability should be derived from something else. Not sure if I'm correct here.
What confuses me even more is that I do know logic can be defined in terms of probability. Causal connections can be probabilistic. If A then 25% chance of B and so on.
E.g.,
((A or B) and C) = (A and C) or (B and C)
=> P[(A or B) and C] = P[(A and C) or (B and C)]
= P[(A and C)] + P[(B and C)] - P[(A and C) and (B and C)]
= P[(A and C)] + P[(B and C)] - P[A and B and C]
= P[A|C] P[C] + P[B|C] P[C] - P[A and B and C]
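A quick way to sanity-check the identity derived above is to enumerate a small sample space. This is only a sketch: the weights in `WEIGHTS` and the helper names `prob` and `check_identity` are made up for illustration, and any non-negative weights summing to 1 should behave the same way.

```python
from fractions import Fraction
from itertools import product

# All 8 truth assignments of (A, B, C).
WORLDS = list(product([False, True], repeat=3))
# Hypothetical weights, chosen only so that they sum to 1 (1+2+...+8 = 36).
WEIGHTS = [Fraction(w, 36) for w in (1, 2, 3, 4, 5, 6, 7, 8)]


def prob(event, weights=WEIGHTS):
    """Probability of an event given as a predicate on (a, b, c)."""
    return sum(w for world, w in zip(WORLDS, weights) if event(*world))


def check_identity(weights=WEIGHTS):
    """Return both sides of P[(A or B) and C] = P[A and C] + P[B and C] - P[A and B and C]."""
    lhs = prob(lambda a, b, c: (a or b) and c, weights)
    rhs = (prob(lambda a, b, c: a and c, weights)
           + prob(lambda a, b, c: b and c, weights)
           - prob(lambda a, b, c: a and b and c, weights))
    return lhs, rhs


# Limiting case: all the weight on one world, so every probability is 0 or 1
# and the identity collapses to the Boolean statement.
POINT_MASS = [Fraction(int(world == (True, False, True))) for world in WORLDS]
```

Calling `check_identity()` gives the two sides for the spread-out weights, and `check_identity(POINT_MASS)` gives them for the all-or-nothing case where only ordinary logic is left.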
Notice the last couple of lines of the derivation -- this is the way in which probability extends logic. If you take the limit where P[A], P[B], P[C] are each 0 or 1, then the probability statement reduces to the logic statement at the top.
Here's the strange part. Let's say we make those probabilities 100%, so that we have ordinary logic without probability.
Then let's create a physical closed Newtonian system: The interior of a cube with no gravity and a bunch of identical bouncing balls which obey Newtonian physics and therefore logic.
Let's say those balls all have a random initial velocity at a random direction but all those balls are initially positioned near one corner in the cube. Thus the balls from a position stand point start with low entropy. I use the term random, but it's not really random as you have perfect knowledge of these numbers, you know what they all are.
As time increases, entropy increases. The balls as a system begin to increase in entropy. Entropy continues to rise towards an equilibrium.
That is the question. Entropy is a probabilistic phenomenon. It occurs because ball configurations that are spread out have higher probabilities of occurring than configurations where the balls are concentrated in a corner. Thus given enough time the balls occupy the higher entropy state.
My question is, WHY does this occur? Probability (aka Entropy) is ARISING out of a perfect Newtonian system following perfect logic without any probabilistic extensions.
Can entropy be derived from Newtonian physics? The better question is, can probability be derived from Newtonian physics because entropy is in actuality a phenomenon of probability?
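For anyone who wants to poke at the thought experiment, here is a rough sketch. It makes several simplifying assumptions that are not in the setup above: a 2D unit square instead of a cube, balls that pass through each other and only bounce off the walls, and entropy measured as the Shannon entropy of how the balls are spread over a 10x10 grid of cells. The function names are made up. The seed is fixed, so every number is fully determined, just as in the "not really random" initial conditions described above.

```python
import math
import random


def fold(x):
    """Map a free-flight coordinate back into [0, 1], as if bouncing off walls at 0 and 1."""
    x = x % 2.0
    return 2.0 - x if x > 1.0 else x


def coarse_entropy(points, bins=10):
    """Shannon entropy (in nats) of the occupancy of a bins x bins grid."""
    counts = {}
    for x, y in points:
        cell = (min(int(x * bins), bins - 1), min(int(y * bins), bins - 1))
        counts[cell] = counts.get(cell, 0) + 1
    n = len(points)
    return -sum(c / n * math.log(c / n) for c in counts.values())


def simulate(n_balls=2000, times=(0.0, 0.5, 20.0), seed=0):
    """Return {time: coarse-grained entropy} for balls that start in one corner."""
    rng = random.Random(seed)
    start = [(rng.uniform(0, 0.1), rng.uniform(0, 0.1)) for _ in range(n_balls)]
    vel = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_balls)]
    result = {}
    for t in times:
        points = [(fold(x + vx * t), fold(y + vy * t))
                  for (x, y), (vx, vy) in zip(start, vel)]
        result[t] = coarse_entropy(points)
    return result
```

Under these assumptions the entropy starts near zero, because every ball sits in the same grid cell, and climbs toward the maximum of ln(100) for a 10x10 grid. The dynamics never use probability. It only enters through the choice to count cells instead of tracking exact positions.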
Any system that uses pure logic (and completely avoids probability in any part of its definition) exhibits this sort of rule as long as you introduce a sort of randomness into initial parameters above. Probability permeates everything.
From George Boole's The Laws of Thought, p.244: "Probability is expectation founded upon partial knowledge. A perfect acquaintance with _all_ the circumstances affecting the occurrence of an event would change expectation into certainty, and leave neither room nor demand for a theory of probabilities."
Can we deduce from this that nature is not probabilistic?
In your proposed flipping model, there are likely to be very small physical imprecisions (vibrations in the flipper, say, drifting tension of some kind of spring or actuator, or small amounts of circulating air, or perhaps tiny imprecisions in the way the coin is loaded into a slot). The machine might always flip heads, but it's still possible to say that whatever arbitrary degree of certainty you need to model the coin's behaviour in the air to achieve 100% accuracy, there could still be arbitrarily smaller error below that threshold, and we'd view this as "randomness" even if it isn't by the laws of physics.
It's important to emphasise that in terms of propensity, it doesn't matter whether or not the event has occurred, what matters is your knowledge about it. A flipped fair coin has a definite side up (as can be verified by a silent third observer) but for you, who has not yet observed which side it is, your best guess is still either side with 50 % probability.
Similarly, if you only know there's a soccer game going on, you might guess that the stronger team will win with 60 % probability (based on historic frequencies of exchangeable situations), but someone who has seen the score and knows the weaker team has a lead and knows there's only a few minutes left of the game will judge there to be a 2 % probability the stronger team wins. Same situation, different information, different judgements.
That's the first meaning of "probability". What we also mean with that word is "the rules of probabilistic calculation". These are based on mathematical ideas like coherency (if one of two things can happen, their probabilities should add up to 100 %) and can definitely be taken as axiomatic.
All of this is not an answer to your question, but it might make the discussion richer.
The Monty Hall Problem is a great example of this.
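To make that concrete, the standard three-door version can be enumerated exactly, with no simulation needed. This is a sketch under the usual textbook assumptions, which the comment does not spell out: the car is placed uniformly at random, the first pick is uniform, and the host always opens a goat door other than the pick, choosing evenly when he has two options. The function name is made up.

```python
from fractions import Fraction
from itertools import product


def monty_hall():
    """Return exact win probabilities (stay, switch) for the three-door game."""
    stay = Fraction(0)
    switch = Fraction(0)
    for car, pick in product(range(3), repeat=2):
        p = Fraction(1, 9)  # car position and first pick are independent and uniform
        # Doors the host is allowed to open: not the pick, not the car.
        openable = [d for d in range(3) if d != pick and d != car]
        for opened in openable:
            q = p / len(openable)
            # The one remaining closed door the player could switch to.
            other = next(d for d in range(3) if d not in (pick, opened))
            stay += q * (pick == car)
            switch += q * (other == car)
    return stay, switch
```

`monty_hall()` returns `(Fraction(1, 3), Fraction(2, 3))`: the car never moves, but the host's action changes what the player knows, and with it the right probability to assign.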
So I would say the more fundamental thing might be information theory.
This is my layman's view. Not an expert.
Probability is often being misused to say things about reality though. You see that especially in computer simulation whether used in economics, weather etc.
Different initial conditions are put into the models and simulated. And the probability is calculated based on what the majority of those models say.
But those initial conditions are guesses, not actual objective explanations. If they were, you would only need to run one simulation rather than a range.
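As a toy illustration of that kind of ensemble run (not a real economic or weather model), here is a sketch using the logistic map as a stand-in for a deterministic simulator. The map, the starting value 0.2, the spread of 1e-6 and the "ends above 0.5" event are all assumptions picked for the example.

```python
import random


def logistic_step(x):
    """One step of the fully deterministic logistic map with r = 4."""
    return 4.0 * x * (1.0 - x)


def run(x0, steps=50):
    """Run the model forward from a single initial condition."""
    x = x0
    for _ in range(steps):
        x = logistic_step(x)
    return x


def ensemble_fraction(x0=0.2, spread=1e-6, members=1000, steps=50, seed=0):
    """Fraction of ensemble members that end above 0.5.

    Each member starts from x0 plus a small guessed error in [-spread, spread].
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(members):
        if run(x0 + rng.uniform(-spread, spread), steps) > 0.5:
            hits += 1
    return hits / members
```

A single `run(0.2)` gives one definite answer. `ensemble_fraction()` instead reports how often the outcome occurs across the guessed initial conditions, which is the number that ends up being quoted as a probability.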
A lot of statistics is pure placebo. Purely retrospective.
In reality it either is or it isn't. If you have good explanations like we do in physics you don't need probability.
David Deutsch IMO has the most sane rebuttal of probability.
http://www.daviddeutsch.org.uk/2014/08/simple-refutation-of-...
So... are you saying Statistical Mechanics (to give an example) is not part of Physics?
In real life, the amount of information you have (and can have) about a physical process is limited. You can either throw your hands up and say "we can't know for sure", or you can use probabilities to try to get somewhere.
How do you define the position of an electron without using probabilities?
Who is we? A lot of physics textbooks have a good amount of probability at their core.
You mean good explanations and observations? You're absolutely right in that if you are able to observe all the relevant information with no noise, you don't need probability. But there are a lot of systems where you can't noiselessly observe what you want – this is where probability is important.
I tried to be a physics major but could not swallow the really stupid mistakes, such as this one by Feynman, that I got in every physics lecture. I didn't have time both to learn the physics AND to clean up the sloppy math. So, I majored in math.
As I learned the math, from some of the best sources, I came to understand just how plain awful the math of the physics community is.
Then in one of the Adams lectures on quantum mechanics at MIT I saw some of the reason: The physics community takes pride in doing junk math. They get by with it because they won't take the math seriously anyway, that is, they insist on experimental evidence. So, to them, the math can be just a heuristic, a hint for some guessing.
Students need to be told this, in clear terms, early on.
It went on this way: In one of the lectures from MIT a statement was that the wave functions were differentiable and also continuous. Of COURSE they are continuous -- every differentiable function is continuous.
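For reference, the standard one-line argument, assuming f is differentiable at x with derivative f'(x):

```latex
\lim_{h \to 0} \bigl( f(x+h) - f(x) \bigr)
  = \lim_{h \to 0} \, h \cdot \frac{f(x+h) - f(x)}{h}
  = 0 \cdot f'(x)
  = 0 ,
```

so f(x+h) tends to f(x) as h tends to 0, which is continuity at x.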
The lectures made a total mess out of correlation and independence. It looks like Adams does not understand the two or their difference clearly.
There was more really sloppy stuff around Fourier theory. I got my Fourier theory from three of W. Rudin's books. It looks like at MIT they get Fourier theory from a comic book.
I got sick, really sick, of the math in physics. Feynman on probability is just one example.
I used to work in an area of applied probability where some statistical-mechanics principles were applicable. I'd read papers where authors were making analogies of a large neural network to a stat-mech system, using an applicable stat-mech approximation, and then differentiating that approximation to get a probability bound.
It gave interesting results, and did show you something about the original problem that was hard to get by sticking to the original formalism. But at the end of the day, you really would not bet the farm on the truth of those approximations...
On the other hand, Fourier analysis was originally doubted and scorned by mathematics, but (if I'm remembering the story correctly) ended up being used so much that theory was developed to explain in what sense the Fourier transform approximates the original function.
Another example of the interplay between physics and mathematics is the percolation problem, where there was a kind of archipelago of physics-motivated results that probabilists have been trying to tidy up for decades now. E.g., sec. 1.2 of: https://www.unige.ch/~duminil/publi/2018ICM.pdf
Principles does Fourier series and does not use measure theory. Real and Complex Analysis does the Fourier transform and uses the Lebesgue integral, that is, measure theory. Functional Analysis does Fourier theory with distributions.
I had serious courses from Principles and R & CA. Also Royden and Neveu. All from a star student of Cinlar, long at Princeton. I read FA, quickly. I don't much care to bother with distributions.
I got into applications via the fast Fourier transform and power spectral estimation as in Blackman and Tukey.
There is an intuitive view that sort of works: The given function is a point in a vector space, with an inner product. The sine waves are coordinate axes. They are orthogonal. The Fourier things are projections of the point onto the coordinate axes. The inverse Fourier thing reconstructs the original function. If use only some of the coordinate axes, then get a least squares approximation of the original function. This all works exactly in ordinary linear algebra, e.g., as in Halmos, Finite Dimensional Vector Spaces, sometimes given to physics students studying quantum mechanics.
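In the finite dimensional case that picture can be checked directly. The sketch below uses the discrete Fourier basis on n sample points. The helper names are made up, and this is only the linear algebra version, not the infinite dimensional theory.

```python
import cmath
import math


def basis(k, n):
    """k-th complex sinusoid sampled at n points, scaled to unit length."""
    return [cmath.exp(2j * math.pi * k * t / n) / math.sqrt(n) for t in range(n)]


def inner(u, v):
    """Inner product <u, v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def coefficient(signal, k):
    """Coordinate of the signal along the k-th axis (a projection)."""
    return inner(basis(k, len(signal)), signal)


def project(signal, ks):
    """Rebuild the signal using only the coordinate axes listed in ks."""
    n = len(signal)
    out = [0j] * n
    for k in ks:
        e = basis(k, n)
        c = inner(e, signal)
        out = [o + c * ei for o, ei in zip(out, e)]
    return out


def sq_error(u, v):
    """Squared distance between two vectors."""
    return sum(abs(a - b) ** 2 for a, b in zip(u, v))
```

Using all n axes, `project(signal, range(n))` reconstructs the signal up to rounding. Using a subset leaves a squared error equal to the sum of the squared magnitudes of the dropped coefficients, which is why the truncation is the least squares approximation within that subspace.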
But these Fourier things have infinitely many coordinate axes, countably infinite for Fourier series and uncountably infinite for the transform, and there the finite dimensional things don't always work. So, Rudin has to be very careful in presenting what does work and proving it -- so with the details it's not easy reading. What fails is a fairly general situation in a fully general Hilbert space. E.g., infinite dimensional Hilbert spaces are not locally compact, but can get some help from a clever use of the parallelogram law (somewhat relevant in my startup).
I'm no teacher. I do not now nor have I ever had any desire to be a prof.
For more, get copies of Rudin's books and dig in.
Wonder of wonders, not all physics profs have done that.
I would rather say that, because it seems you "need" a uniform distribution for a particle of unknown location, it might make sense for such applications in physics to weaken the requirement that a probability measure be σ-additive to the requirement that it be merely finitely additive. Then it should be possible to define such a "uniform probability 'measure' over all space", perhaps similarly to the example given at
> https://en.wikipedia.org/w/index.php?title=Sigma-additive_se...
In that case, in which locations is the density higher and in which is it lower?
No such particle can exist then.
You’re assuming infinite space though. Did he?
It's not a mistake, it's a "lie-to-children" fundamentally no different from an intro analysis class talking about "the" real numbers. Freshmen aren't ready for model theory, and they're not ready for rigged Hilbert spaces.
Hilbert space? A complete inner product space, where complete means every Cauchy sequence converges. The real numbers serve as one example. With the usual inner product, R^3 (where R is the set of real numbers) is another example.
If space is bounded then such a density can exist.
n >= 2 / p
cubic inches. Then the probability that the particle is in those n cubic inches is
np >= 2 > 1
greater than 1, a contradiction. Done.
One well considered and informed explanation is that the physics community abuses its students.
https://quantumfrontiers.com/2018/12/23/chasing-ed-jayness-g...
Jaynes about Probability in Science:
https://www.cambridge.org/gb/academic/subjects/physics/theor...
Information Processing: The Maximum Entropy Principle https://a.co/d/71tL5bw
He really takes apart the maximum entropy principle in a comprehensible way, to the point where one can see how to apply it to new problems.
(the volumes I and III are also good but not strictly necessary)
https://www.fooledbyrandomness.com/blog/2021/09/07/estimatin...
“Yet, when all is said and done, we find ourselves, to our own surprise, in agreement with Kolmogorov and in disagreement with his critics, on nearly all technical issues. As noted in Appendix A, each of his axioms turns out to be, for all practical purposes, derivable from the Polya–Cox desiderata of rationality and consistency. In short, we regard our system of probability as not contradicting Kolmogorov's; but rather seeking a deeper logical foundation that permits its extension in the directions that are needed for modern applications.”
https://physics.stackexchange.com/questions/233203/has-jayne...
Unfortunately, those who like statistical mechanics seem few and far between. :(
“Ludwig Boltzmann, who spent much of his life studying statistical mechanics, died in 1906, by his own hand. Paul Ehrenfest, carrying on the work, died similarly in 1933. Now it is our turn to study statistical mechanics.”
David L. Goodstein, States of Matter
Less facetiously, I think Jaynes is becoming better known as Bayesian techniques have become more mainstream.
One day, if I really get into quantum mechanics, I will try to understand how they rebuilt Maxwell's equations from QED.
This derivation is in the context of classical field theory, but QED is only a short hop away through path integrals.
It’s quite remarkable how the complexity of Maxwell’s equations can be reduced to a single term in the Lagrangian - (F_uv)(F^uv), assuming no charges. That’s really it!
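Spelled out, with the conventional -1/4 normalization (an assumption here, since it depends on the unit system) and with the field strength built from the four-potential A:

```latex
F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu ,
\qquad
\mathcal{L} = -\tfrac{1}{4} F_{\mu\nu} F^{\mu\nu} .

% Euler-Lagrange equations for A_\nu (no charges, so no A-dependent term):
\frac{\partial \mathcal{L}}{\partial(\partial_\mu A_\nu)} = -F^{\mu\nu}
\quad\Longrightarrow\quad
\partial_\mu F^{\mu\nu} = 0 .

% The other two Maxwell equations follow from the definition of F alone:
\partial_\lambda F_{\mu\nu} + \partial_\mu F_{\nu\lambda} + \partial_\nu F_{\lambda\mu} = 0 .
```

The first equation packs Gauss's law and the Ampère-Maxwell law without sources. The second, an identity, packs Faraday's law and the absence of magnetic monopoles.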
This can be explained through phase decoherence. As temperature rises, random phase shifts are introduced, which effectively removes the quantum effect. You can show mathematically how this works.
Consider the Young experiment:
For a plane wave ψ ~ e^(ipx/ħ-iωt), the wave function at X is the sum of two components
<X|ψ> = <X|P> + <X|Q>
Where for some path-independent normalization function ψ(X,t), and using the small angle assumption (QX-PX = 2Xa/L), the components are:
<X|P> = (1/2) ψ(X,t) e^(ipXa/(ħL))
<X|Q> = (1/2) ψ(X,t) e^(-ipXa/(ħL))
And the probability of finding the particle at X is
|<X|ψ>|^2 = |ψ(X,t)|^2 cos^2(pXa/(ħL))
That is what you'd expect from the Young experiment. If we introduce a constant phase shift ϕ between P and Q, you get this instead:
|<X|ψ>|^2 = |ψ(X,t)|^2 cos^2(pXa/(ħL) + ϕ)
If this phase shift is instead random, the formula becomes
|<X|ψ>|^2 = (1/2) (1 + ∫ dϕ P(ϕ) cos(2pXa/(ħL) + 2ϕ)) |ψ(X,t)|^2
Where P(ϕ) is a probability density for the phase shift. If the density is flat, the integral is zero since you're integrating the cosine over whole periods. What you get is the classical result, with no interference fringes!
|<X|ψ>|^2 = (1/2) |ψ(X,t)|^2
You can even re-phrase random phase shifts into a diffusion equation, and find that given α as the diffusion coefficient
|<X|ψ>|^2 = (1/2) |ψ(X,t)|^2 (1 + e^(-αt) cos(2pXa/(ħL) + 2ϕ))
i.e. the transition behavior from quantum to classical depends on a direct measure of the decoherence! α small => quantum result, α large => classical result.
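The averaging step can be checked numerically. The sketch below works with the relative intensity |<X|ψ>|^2 / |ψ(X,t)|^2, writes theta for pXa/(ħL), and integrates cos^2(theta + ϕ) against a phase density. The Gaussian density is an assumption of this sketch, one plausible outcome of a phase diffusion, and the function names are made up. For a Gaussian of standard deviation sigma the fringe term is damped by e^(-2 sigma^2), which matches an e^(-αt) factor if the phase variance is taken to grow as sigma^2 = αt/2.

```python
import math


def midpoint_average(f, density, lo, hi, steps=20000):
    """Integrate f(phi) * density(phi) over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        phi = lo + (i + 0.5) * h
        total += f(phi) * density(phi) * h
    return total


def fringe_uniform(theta):
    """Relative intensity when the phase is uniform on [0, pi)."""
    return midpoint_average(lambda p: math.cos(theta + p) ** 2,
                            lambda p: 1.0 / math.pi, 0.0, math.pi)


def fringe_gaussian(theta, sigma):
    """Relative intensity when the phase is Gaussian with std sigma (assumed model)."""
    def density(p):
        return math.exp(-p * p / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))
    return midpoint_average(lambda p: math.cos(theta + p) ** 2,
                            density, -10 * sigma, 10 * sigma)


def fringe_closed_form(theta, sigma):
    """Closed form for the Gaussian case: fringes damped by exp(-2 sigma^2)."""
    return 0.5 * (1.0 + math.exp(-2.0 * sigma * sigma) * math.cos(2.0 * theta))
```

With a tiny sigma the result stays close to cos^2(theta), the quantum fringe pattern. With a large sigma, or with the flat density, it settles at 1/2 for every theta, which is the fringe-free classical result.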
Either way, this is a great chapter on probability. Thanks to whoever wrote it!
I should probably give some evidence to back up my claim that Feynman didn’t write all of the Lectures, but alas, it’s late. I think the credits for the rest of the authors were in the preface, or at the end. I just wish they’d gotten a little more credit.
EDIT: Aha: https://en.m.wikipedia.org/wiki/Shakespeare_authorship_quest...
Well, this is a fascinating rabbit hole. Apparently there’s some question whether Shakespeare himself was literate, since his parents and daughters seemingly weren’t.
I recommend you start with these basic theoretical books to get a sense of what it's all built on. But then if you want more practical advice about how to handle things, books on sampling theory tend to hit a sweet spot between theory and practice, in my experience. I like Sampling Techniques (Cochran, 1953) and Sampling of Populations (Levy & Lemeshow, 2013).