Additionally, the German army command didn't think that way. Where the US relied on overwhelming materiel superiority, and the Soviets fought and won through unimaginable human sacrifice, the considerable initial success of the German army was based on better, smarter tactics, individual leadership, bravery, ruthlessness, etc. The leadership assumed they'd be able to win the war that way, even after it had turned into a much more industrial operation.
You can see that in operations such as the Battle of the Bulge, the war in Normandy, and most importantly in the Russian campaign.
This is of course over-generalizing, but I believe the general mode of thinking was there, and that'd explain the lack of attention to such details.
Second, all calculations were done by hand in those days (and documents that weren't printed in bulk had to be retyped by hand), so sequential numbers were not only easier to issue but also easier to track (e.g. if you have a production problem, you can say "let's check all tanks with S/Ns between A and B" rather than having to maintain a list mapping production dates to serial numbers that might be in a file cabinet somewhere distant from where you are).
Because there weren't well-known examples of the risk of not doing that, and not doing it is the easy and obvious thing if you have no clear reason to do it, and makes lots of things you might use those numbers for yourself easier (and if it wasn't for your own use, you wouldn't issue the numbers at all.)
The Germans did (eventually) make some effort to obscure details of their supply chain--they forced manufacturers to use three-letter codes instead of their normal trademarks--but that still suffered from poor operational security which allowed the codes to be quickly matched up to manufacturers. It didn't help that the British analysts meticulously kept track of everything, allowing them to identify the manufacturer of one unlabelled part by the inspector's number.
The German army was not particularly mechanised or well equipped as a whole, relying on a lot of horse-drawn vehicles for the entire war.
When you look at the war from a manufacturing perspective, the question is more about how Germany survived for so long against such huge manufacturing nations. For a seemingly dry subject, David Edgerton’s book on this is very readable. https://www.theguardian.com/books/2011/mar/27/britains-war-m...
Still, the idea that this could leak valuable information is probably more obvious in hindsight, and sequential serial numbers do have some upsides. If there's a design flaw in one version of the gearboxes, you can just pull everything with a serial number between XXXX and YYYY. With randomized numbers, you'd have to maintain some master database, which is a lot harder when most logging is done with pen-and-ink ledgers, carbon copies, and maybe punchcards.
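To make the recall argument concrete, here is a minimal Python sketch. The batch sizes, serial ranges, the `batch_of` ledger, and both function names are hypothetical, invented for illustration: with sequential serials a recall is a single range comparison, while randomized serials only work if a complete serial-to-batch ledger is on hand wherever the check happens.

```python
import random

def recall_sequential(serial, first_bad, last_bad):
    # Sequential serials: the defective run is just a contiguous range.
    return first_bad <= serial <= last_bad

def recall_randomized(serial, batch_of, bad_batches):
    # Randomized serials carry no information by themselves, so we need a
    # master ledger mapping each serial to its production batch.
    return batch_of[serial] in bad_batches

# Hypothetical production run: 10 batches of 100 units each.
sequential = list(range(1, 1001))
rng = random.Random(0)
randomized = rng.sample(range(10**6), 1000)                  # opaque, distinct serials
batch_of = {s: i // 100 for i, s in enumerate(randomized)}   # the ledger

# Suppose batches 3 and 4 got the faulty gearbox.
bad_seq = [s for s in sequential if recall_sequential(s, 301, 500)]
bad_rand = [s for s in randomized if recall_randomized(s, batch_of, {3, 4})]
print(len(bad_seq), len(bad_rand))  # 200 200
```

Both approaches find the same 200 units; the difference is that the randomized version is useless without `batch_of`, which in 1943 would have been a ledger in someone's filing cabinet.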
I think it's more likely that many were aware of the security issues, but it wasn't worth the coordination of coming up with a scheme, distributing it securely to all spare-parts suppliers, etc., potentially slowing down the war effort. I bet the Allies used a lot of serial numbers too, despite this work.
Because it happened 80 years ago, when the German army (or any other army) did not understand statistics as well as armies do today. It was a groundbreaking achievement by the Allies.
I don't think you need deep statistical knowledge to know that if the enemy captured serial numbers 0020, 0120, 0439, 1293, and 1356, they would at least have some hint that the lower bound is 1356 tanks.
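A short Python sketch of that reasoning, using the five serial numbers above and assuming (the comment doesn't say so) that serials start at 1 and were captured uniformly at random without replacement: the largest serial is a hard lower bound, and spreading the unseen serials below it evenly across the gaps gives the usual "maximum plus average gap" estimate.

```python
def tank_estimate(serials):
    """Return (lower_bound, estimate) for serials assumed to run 1..N."""
    k = len(serials)
    m = max(serials)            # at least m tanks must exist
    unseen_below = m - k        # serials below m that were never captured
    avg_gap = unseen_below / k  # average gap between captured serials
    return m, m + avg_gap       # equivalently m + m/k - 1

captured = [20, 120, 439, 1293, 1356]
low, est = tank_estimate(captured)
print(low, round(est, 1))  # 1356 1626.2
```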
It's nearly impossible for a bomber to navigate long distances in the dark over a blacked-out country, so the Germans came up with a radio navigation system involving beams transmitted from the mainland to intersect over the target, which the British figured out how to jam; the Germans came up with another nav system, and the Brits eventually jammed that one too.
The British knew the Germans would be trying to find yet another way. They'd learned from Enigma decrypts about a new device called Wotan. One researcher looked up the word, learned that it was the name of a one-eyed god, and concluded that the new system would use a single transmitter with a rangefinding transponder aboard the bomber, instead of multiple beams like the previous ones. Starting from there, they had a countermeasure online and ready to go before the Germans even deployed Wotan. When the Nazis realized they'd been outmaneuvered from the start, they gave up on radio-guided bombing completely, at least against Britain.
British intelligence was pretty impressive during WWII.
https://www.amazon.com/Most-Secret-Penguin-World-Collection-...
And yes, the military intelligence of the Germans sucked in WW2. It didn't help either that the culture, both military and political, was highly ideological. When truth cannot be spoken and power won't listen, facts are ignored. What is not allowed to be cannot be. And ultimately, reality bites you in the ass.
Congratulations on your nerd snipe!
But AFAIK those targets were selected based on pre-war "traditional" intelligence about what the likely bottleneck resources would be, not on statistical analysis of captured equipment.
If M is the maximum serial number and N is the total number of observations, using the formula in the post:
M + (avg. spacing) = M + (M/N - 1) = ((N + 1)/N) * M - 1
To me that gives a clearer picture of what the unbiased estimator is doing: inflate the maximum value by a factor that tends toward one as the sample size grows. Or does it make a difference?
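A quick Python check of the algebra using exact fractions (the function names are made up for illustration): the "maximum plus average spacing" form and the "inflated maximum" form agree exactly as long as the trailing -1 is kept, and the inflation factor (N + 1)/N shrinks toward one as N grows.

```python
from fractions import Fraction

def spacing_form(m, n):
    # M + average spacing, where the average spacing is M/N - 1
    return m + (Fraction(m, n) - 1)

def scaled_form(m, n):
    # Inflate the maximum by (N + 1)/N, then subtract one
    return Fraction(n + 1, n) * m - 1

# The two forms are identical for any M, N
for m, n in [(1356, 5), (993, 15), (60, 4)]:
    assert spacing_form(m, n) == scaled_form(m, n)

# The inflation factor approaches 1 as the sample grows
for n in [1, 5, 15, 100, 1000]:
    print(n, float(Fraction(n + 1, n)))
```

So rewriting it as a scaled maximum doesn't change the estimate; it's the same number.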
A full listing was available through the site's robots.txt sitemaps file, or rather, a pointer to the listing of 50,000 user profile sitemap files, with about 44k profiles per file. This worked out to 25 GB of profile listings alone.
Rather than download the full set (though I eventually did), I picked an arbitrary file from near the middle of the listing and ran some spot checks on the profiles, which seemed to be reasonably randomly distributed by age, location, and other characteristics. With as few as 100 profile page downloads, it was clearly evident that active posting to G+ was limited to about 8-11% of all profiles. The full 50k profile sample, and a third party's independent (and more robustly randomised) 500k profile sample, eventually showed this to be 9.7%.
(And yes, if I was being more rigorous I could have done much more testing or work, but I was mostly addressing personal curiosity and an online disagreement with someone.)
An interesting proof of the power of random sampling.
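To put rough numbers on that, here is a Python sketch of the textbook normal-approximation (Wald) 95% margin of error for a sampled proportion. Plugging in the 9.7% figure from above is my own illustration, not part of the original analysis; it just shows how the margin narrows as the sample grows from a 100-profile spot check to the 500k-profile sample.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the Wald 95% interval for a proportion p estimated from n samples."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.097  # active-poster fraction quoted above
for n in [100, 1_000, 50_000, 500_000]:
    moe = margin_of_error(p, n)
    print(f"n={n:>7}: {p:.1%} ± {moe:.2%}")
```

The margin shrinks with the square root of the sample size, so each 100x increase in sample size buys only a 10x tighter estimate; that's why a modest random sample gets you most of the way.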
Larger samples do allow for clearer views of rare phenomena, such as dialing in on the fraction of 1% of G+ users who were highly active on the site, or, when I later looked at Communities characteristics, the properties of the very largest (about 50 with more than 1 million members) of the 8 million total. In that case, I eventually got access (also via a third party) to a comprehensive summary dataset.
The userID hashing also made approaches such as exhaustively searching the ID space for user pages nonviable. The search space was trillions of times larger than the target space.
Confounding question: 1000 years ago, would this argument look any different? Answer: mathematically speaking, it would not. In fact, far more humans have been born since then than anyone applying this method back then could have predicted. Conclusion: the argument is flawed.
However, the argument will still give the correct prediction for most humans who try to use it, just not for the few who were in the special position of being born early in the sequence of all humans. The argument essentially tells you that you have no reason to believe that you are also in that special position.
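The "correct for most humans" point can be checked by counting. This Python sketch assumes one common 95% form of the argument, in which an observer with birth rank r concludes that the total number of humans is at most 20 × r; the comment doesn't specify a form, so the factor of 20 is an assumption. Counting over every possible birth rank shows the conclusion holds for about 95% of observers and fails only for the earliest 5%.

```python
def fraction_correct(total, factor=20):
    """Fraction of observers (birth ranks 1..total) whose claim 'total <= factor * rank' is true."""
    correct = sum(1 for rank in range(1, total + 1) if total <= factor * rank)
    return correct / total

total = 100_000
print(fraction_correct(total))  # 0.95001

# An early observer is badly wrong: rank 1,000 predicts at most 20,000.
early_rank = 1_000
print(20 * early_rank >= total)  # False
```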
[689, 341, 386, 741, 982, 414, 845, 241, 180, 447, 880, 21, 583, 993, 812]
it’s tough to see an argument for anything other than: a. about 1,000, or b. 1,000ish but there may be a confounding fact pattern we are unaware of....
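For what it's worth, here is a Python sketch applying two standard unbiased estimators to that list: the max-based one from the article (m + m/k - 1) and a mean-based one (2 × mean - 1). Both assume serials are drawn uniformly without replacement from 1..N.

```python
samples = [689, 341, 386, 741, 982, 414, 845, 241, 180, 447, 880, 21, 583, 993, 812]
k = len(samples)

m = max(samples)
max_based = m + m / k - 1              # inflate the maximum by the average gap
mean_based = 2 * sum(samples) / k - 1  # E[sample mean] = (N + 1) / 2

print(k, m, round(max_based, 1), round(mean_based, 1))  # 15 993 1058.2 1139.7
```

Both land in the "about 1,000" range.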
2 x mean - 1
should be an unbiased estimator of the total number of tanks. But because we are probably undersampling the extremes, we could use the Bessel correction:
1/(n-1) x summation_{i=1}^n(sample_i)
I would guess this comes out to a better estimation than what the article says.
Bessel's correction might be a bit of overkill, since it's intended to work with normal distributions. But I still suspect it comes out to a better estimation than what the blog post says.
You could adjust for such problems but it seems much easier to use the maximum.
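A small Monte Carlo sketch in Python supports that preference. Under the article's assumptions (serials 1..N, sampled without replacement), the mean-based estimator 2 × mean - 1 and the max-based estimator m + m/k - 1 are both unbiased, but the max-based one scatters much less around the true value. N = 1000, k = 5, the seed, and the trial count are arbitrary choices.

```python
import random
import statistics

def simulate(n_tanks=1000, k=5, trials=20_000, seed=1):
    rng = random.Random(seed)
    mean_based, max_based = [], []
    for _ in range(trials):
        s = rng.sample(range(1, n_tanks + 1), k)  # k distinct captured serials
        m = max(s)
        mean_based.append(2 * statistics.fmean(s) - 1)
        max_based.append(m + m / k - 1)
    return mean_based, max_based

mean_based, max_based = simulate()
for name, est in [("2*mean - 1", mean_based), ("m + m/k - 1", max_based)]:
    print(f"{name:12} mean={statistics.fmean(est):7.1f}  sd={statistics.stdev(est):6.1f}")
```

Both averages come out near 1000, but the spread of the mean-based estimate is noticeably larger, so using the maximum is both simpler and more precise.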
https://en.wikipedia.org/wiki/German_tank_problem
Matter of fact...
According to conventional Allied intelligence estimates, the Germans
were producing around 1,400 tanks a month between June 1940 and September 1942.
Applying the formula below to the serial numbers of captured tanks, the number
was calculated to be 246 a month. After the war, captured German production
figures from the ministry of Albert Speer showed the actual number to be 245.[0]
[0] https://www.eadan.net/blog/german-tank-problem/#probabilisti...
Not much clutter & straight to the point. Loads fast and it’s under 630KB.
Could certainly be improved but it’s nice not having to load >25MB just to read an article.
Are you looking for the answer that is the 'most likely', the one with the 'lowest mean squared error', or maybe one that is 'unbiased' (zero mean error)?
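Those criteria really do pick different answers. For serials 1..N sampled without replacement, the maximum-likelihood estimate is simply the sample maximum m, which is biased low, while m + m/k - 1 is unbiased. This Python sketch computes the exact bias and mean squared error of both by summing over the distribution of the maximum, P(max = m) = C(m-1, k-1) / C(N, k); N = 1000 and k = 5 are arbitrary illustration values.

```python
from math import comb

def exact_stats(estimator, n_tanks, k):
    """Exact (bias, MSE) of estimator(max_serial, k) when k serials are drawn
    without replacement from 1..n_tanks."""
    total = comb(n_tanks, k)
    bias = mse = 0.0
    for m in range(k, n_tanks + 1):
        p = comb(m - 1, k - 1) / total  # P(sample maximum == m)
        err = estimator(m, k) - n_tanks
        bias += p * err
        mse += p * err * err
    return bias, mse

def mle(m, k):
    return m  # likelihood 1/C(N, k) is largest at the smallest feasible N, i.e. m

def unbiased(m, k):
    return m + m / k - 1  # zero mean error

for name, f in [("MLE (max)", mle), ("unbiased", unbiased)]:
    b, e = exact_stats(f, 1000, 5)
    print(f"{name:10} bias={b:8.2f}  MSE={e:10.1f}")
```

With these settings the unbiased estimator also has the lower MSE, but a minimum-MSE estimator is a different target again, so which answer is "best" depends on the loss you pick.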
Probably in today's world this is racist or nationalist or something. But (as someone of German descent) I have to admit it's funny.
http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2...
A little bit funny, though: note how num_tanks ~ Unif(max(captured), 2000) was defined, so you already have p[ parameter | data ]. Isn't this already a posterior?
I do see, however, how if you had the r.v.s num_tanks ~ Unif(M, 2000) and observed | num_tanks ~ Unif(1, num_tanks), with M some constant, you could find the posterior distribution of num_tanks | vector<observed> by first finding the joint via E[ 1[num_tanks < t] P[observed | num_tanks] ].
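Here is a minimal Python sketch of that calculation on a discrete grid, following the model in the comment: a uniform prior on num_tanks over M..2000, and each observation independently Unif(1, num_tanks), so the likelihood is num_tanks^-k when num_tanks >= max(observed) and zero otherwise. Taking M = 1 and reusing the five serials from upthread are illustrative assumptions.

```python
def posterior(observed, lo=1, hi=2000):
    """P(num_tanks = n | observed) for n in lo..hi: uniform prior, iid Unif(1, n) likelihood."""
    k, m = len(observed), max(observed)
    # Unnormalized posterior = constant prior * likelihood (n ** -k, or 0 if n < max observed)
    weights = {n: (n ** -k if n >= m else 0.0) for n in range(lo, hi + 1)}
    z = sum(weights.values())
    return {n: w / z for n, w in weights.items()}

observed = [20, 120, 439, 1293, 1356]
post = posterior(observed)
post_mean = sum(n * p for n, p in post.items())
map_n = max(post, key=post.get)  # posterior mode
print(map_n, round(post_mean, 1))
```

The posterior mode sits at the largest observed serial (the maximum-likelihood answer), while the posterior mean lies above it.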
https://github.com/CamDavidsonPilon/Probabilistic-Programmin...