The German Tank Problem

(eadan.net)

422 points · eadan · 6y ago · 104 comments

79 comments · 25 top-level

spectramax6y ago· 12 in thread

Why didn't they use randomized and scrambled serial numbers? Sort of like what Amazon does to their order numbers. I know it can still be cracked but serially numbering military equipment is not very smart. I was setting up a Shopify store the other day and it doesn't allow for a lookup table to be used for order numbers. I don't want competitors to know that I've sold so many X items. Same thing with Squarespace and Square e-commerce stores. It blows my mind that a multi-billion dollar ecom giant has not implemented this despite forum posts and requests from users.

Nitramp6y ago

World War II was (at least one of) the first industrialized wars. So the whole situation was genuinely novel to most participants.

Additionally, the German army command didn't think that way. Where the US relied on overpowering by materiel dominance, and the Soviets fought and won through unimaginable human sacrifice, the considerable initial success of the German army was based on better, smarter tactics, individual leadership, bravery, ruthlessness, etc. The leadership assumed they'd be able to win the war that way, even when the war had turned into a much more industrial operation.

You can see that in operations such as the Battle of the Bulge, the war in Normandy, and most importantly in the Russian campaign.

This is of course over-generalizing, but I believe the general mode of thinking was there, and that'd explain the lack of attention on such details.

greedo6y ago

Don't underestimate Soviet industrial capabilities during the war. The Soviets produced over 58K T-34 type tanks compared to Germany producing 37K (PzIII through Pz6).

2 more replies

gumby6y ago

First, they probably didn't consider that serial numbers might be an information leak.

Second, all calculations were done by hand in those days (and documents that weren't printed in bulk had to be retyped by hand), so sequential numbers were easier not only to issue but also to track: if you have a production problem you can say "let's check all tanks with S/Ns between A and B" rather than having to maintain a list mapping production dates to serial numbers that might be in a file cabinet somewhere distant from where you are.

dragonwriter6y ago

> Why didn't they use randomized and scrambled serial numbers?

Because there weren't well-known examples of the risk of not doing that, and not doing it is the easy and obvious thing if you have no clear reason to do it, and makes lots of things you might use those numbers for yourself easier (and if it wasn't for your own use, you wouldn't issue the numbers at all.)

jcranmer6y ago

Because supply chain logistics. The Germans in WWII were world leaders in manufacturing (perhaps bested only by the US), and one of the elements of that manufacturing quality is the ability to trace individual parts back to the exact manufacturing batch to figure out why particular batches go wrong.

The Germans did (eventually) make some effort to obscure details of their supply chain--they forced manufacturers to use three-letter codes instead of their normal trademarks--but that still suffered from poor operational security which allowed the codes to be quickly matched up to manufacturers. It didn't help that the British analysts meticulously kept track of everything, allowing them to identify the manufacturer of one unlabelled part by the inspector's number.

lostlogin6y ago

They might have had good equipment sometimes, but they had nowhere near enough of it. They were outproduced by Britain “alone” (counting the colonies) in most areas, most of the time, and often by considerable margins.

The German army was not particularly mechanised or well equipped as a whole, relying on a lot of horse-drawn vehicles for the entire war.

When you look at the war from a manufacturing perspective, the question is more about how Germany survived for so long against such huge manufacturing nations. For a seemingly dry subject, David Edgerton’s book on this is very readable. https://www.theguardian.com/books/2011/mar/27/britains-war-m...

2 more replies

mattkrause6y ago

They did, sort of, starting in the 1940s. The tank make/model were replaced by arbitrary codes, but it was done sloppily and many of the tanks could be re-identified. (See page 80 of the paper @dooglius posted above).

Still, the idea that this could leak valuable information is probably more obvious in hindsight, and sequential serial numbers do have some upsides. If there's a design flaw in one version of the gearboxes, you can just pull everything with a serial number between XXXX and YYYY. With randomized numbers, you'd have to maintain some master database, which is a lot harder when most of the logging is done with pen-and-ink ledgers, carbon copies, and maybe punchcards.

tyingq6y ago

They could just randomly skip some numbers. That would maintain the XXX to YYY advantage.
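A rough sketch of the skipping idea, assuming each new serial jumps ahead by a random amount. The function name, the `max_skip` parameter, and the seed are made up for illustration; nothing here reflects an actual wartime scheme.

```python
import random

def issue_serials(count, max_skip, seed=0):
    """Issue `count` increasing serials, skipping 0..max_skip numbers each time."""
    rng = random.Random(seed)
    serials, current = [], 0
    for _ in range(count):
        current += 1 + rng.randint(0, max_skip)
        serials.append(current)
    return serials

serials = issue_serials(count=1000, max_skip=4)

# Production order is preserved, so "recall everything between A and B" still works.
assert serials == sorted(serials)

# The highest serial now overstates production by roughly the mean step
# (1 + max_skip / 2, so about 3x in this example).
inflation = serials[-1] / len(serials)
print(round(inflation, 2))
```

An analyst applying a max-based estimate to these serials would overshoot by about that factor, unless they could calibrate the skip rate some other way.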

1 more reply

dooglius6y ago

It's interesting to see other commenters saying that manipulating IDs wouldn't occur to the Germans, it reminded me of an interesting anecdote from Hitler's rise to power: "It was in January 1920 when a numeration was issued for the first time and listed in alphabetical order Hitler received the number 555. In reality, he had been the 55th member, but the counting started at the number 501 in order to make the party appear larger." [0]

I think it's more likely that many were aware of the security issues, but it wasn't worth the coordination of coming up with a scheme, giving it to all spare parts suppliers in a secure way, etc. potentially slowing down the war effort. I bet the Allies used a lot of serial numbers too, despite this work.

[0] https://en.wikipedia.org/wiki/German_Workers%27_Party#Adolf_...

terramex6y ago

>Why didn't they use randomized and scrambled serial numbers?

Because it happened 80 years ago, when the German army (or any other) did not understand statistics as well as they do today. It was a groundbreaking achievement by the Allies.

spectramax6y ago

They did know about encryption and developed the Enigma machine.

I don't think you need deep statistics knowledge to know that if the enemy captured Serial # 0020, 0120, 0439, 1293 and 1356; they would at least have some hint that the lower bound is 1356 tanks.
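To make the point concrete, here is a minimal sketch using the five serials from the comment above. The largest serial is a hard lower bound; the standard frequentist estimate (largest serial plus the average gap, assuming numbering starts at 1) pushes it up a little. The function name is hypothetical.

```python
def estimate_total(serials):
    """Max-plus-average-gap estimate m + m/k - 1, assuming serials start at 1."""
    k = len(serials)
    m = max(serials)
    return m + m / k - 1

captured = [20, 120, 439, 1293, 1356]
lower_bound = max(captured)          # at least this many tanks exist
estimate = estimate_total(captured)  # 1356 + 1356/5 - 1
print(lower_bound, round(estimate, 1))
```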

1 more reply

notinversed6y ago

Because they were Germans.

jackfoxy6y ago· 10 in thread

How ironic that the nation that led the world in the frontiers of maths in the 19th century completely missed the boat in the applied math of signals intelligence in WWII. I'm referring to the tank serial numbers and the lack of care in Enigma codes, except by the Kriegsmarine, but even they eventually lost a code book to the allies, which they apparently considered an impossibility.

PhasmaFelis6y ago

They had a lot of opsec problems. There's a great story about how using "cool" codenames instead of random ones bit them in the ass.

It's nearly impossible for a bomber to navigate long distances in the dark over a blacked-out country, so the Germans came up with a radio navigation system involving beams transmitted from the mainland to intersect over the target, which the British figured out how to jam; the Germans came up with another nav system, and the Brits eventually jammed that one too.

The British knew the Germans would be trying to find yet another way. They'd learned from Enigma decrypts about a new device called Wotan. One researcher looked up the word, learned that it was the name of a one-eyed god, and concluded that the new system would use a single transmitter with a rangefinding transponder aboard the bomber, instead of multiple beams like the previous ones. Starting from there, they had a countermeasure online and ready to go before the Germans even deployed Wotan. When the Nazis realized they'd been outmaneuvered from the start, they gave up on radio-guided bombing completely, at least against Britain.

noir_lord6y ago

We also caught every single German spy and turned them all (iirc it was all) but didn’t know we’d got them all till after the war’s conclusion.

British intelligence was pretty impressive during WWII.

1 more reply

chiph6y ago

Dr. R.V. Jones had significant involvement in the War of the Beams, and after the war wrote a book about British Scientific Intelligence efforts during the war.

https://www.amazon.com/Most-Secret-Penguin-World-Collection-...

hef198986y ago

Slightly off topic, but that mindset, ignoring expertise in one field that could help in another, is still quite common in Germany if you ask me.

And yes, the military intelligence of the Germans sucked in WW2. It didn't help either that the culture, military and political, was highly ideological. When truth cannot be spoken and power won't listen, facts are ignored. What is not allowed to be cannot be. And then reality ultimately bites you in the ass.

anoncake6y ago

It wasn't all incompetence. The head of the Abwehr was part of the resistance.

https://en.wikipedia.org/wiki/Wilhelm_Canaris

hef198986y ago

True. Still, from what I know, he saw himself as a patriot. Which was the reason why he opposed the Nazis but also the reason why he didn't defect or betray the Germans.

1 more reply

bnegreve6y ago

I suspect there is a strong winner bias here. Success stories of the Allies tend to be reported more often.

michaelt6y ago

If you were a Nazi codebreaker whose successes in the war were classified, would you publish detailed memoirs? Or would you destroy the evidence, which was probably what your orders said to do anyway?

tsss6y ago

It certainly didn't help that they killed all the academics and free-thinkers.

dmos626y ago

Quote? I'm vague on what went on in Germany in the first half of the century.

3 more replies

srean6y ago· 7 in thread

The job interview version: If you are being interviewed for a position by engineers who have their employee ids (serially allocated) on their badge find the number of employees from those ids assuming all engineers are equally likely to be on the panel of 8.

chiph6y ago

I have looked at my payroll check numbers from contracting firms to see how they're doing as a business. If the interval between check numbers drops in a month, I have a good idea that there aren't as many people working there anymore.

feintruled6y ago

I worked for a big company and we used to put bug numbers in our change releases. We were told to stop doing this, as some customers would see that their bug would appear to have been given a lower priority when they saw lower numbers coming in first.

carlmr6y ago

I would guess that the likelihood of older employees should be higher. Although I've never seen a panel of 8 at a job interview.

rootw0rm6y ago

when i first started a grey-market research chemical company some years ago, i added like 31 or something to each invoice number to make it seem like i did more business.

breakingcups6y ago

Invoice numbers have to be sequential where I live.

1 more reply

HeWhoLurksLate6y ago

I read your comment before I read the article, and my head started spinning really hard.

Congratulations on your nerd snipe!

pieterr6y ago

dooglius6y ago· 5 in thread

This is only the toy version of the actual problems solved by the Allies, which were more nuanced, and involved reasoning about the tank manufacturing pipeline. The write-up [0] doesn't go into the math but makes an interesting read.

[0] https://sci-hub.tw/10.2307/2280189

cortesoft6y ago

Yeah, I can't imagine the assumption that tanks captured were "randomly uniformly distributed" is a good one. I can imagine all sorts of reasons that wouldn't be the case.

Causality16y ago

How accurate did the allies' model turn out to be when compared to the real number?

dooglius6y ago

Quite well, see pg. 86 for a plot of all predictions

walrus016y ago

I recall something about targeted bombing of ball bearing factories.

jabl6y ago

There were the infamous raids on the ball bearing factories in Schweinfurt. (At the time the allies didn't have escort fighters with sufficient range, and the bombers suffered heavily.)

But AFAIK those targets were selected based on pre-war "traditional" intelligence about what the likely bottleneck resources would be, not statistical analysis of captured equipment.

1 more reply

laGrenouille6y ago· 4 in thread

Interesting article, though I think it incorrectly leaves the reader thinking that there is some interesting information hidden in the average spacing of the numbers. In fact, all you need to know is the maximum observation and the number of observations. Once you simplify, the average spacing goes away.

If M is the maximum serial number and N is the total number of observations, using the formula in the post:

    M + (avg. spacing) = M + M / N - 1 = (N + 1) / N * M - 1

To me that gives a clearer picture of what the unbiased estimator is doing: inflate the maximum value by a factor that tends towards one as the sample size grows (and then subtract one).
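The cancellation can be checked numerically. This sketch computes the estimate once from the individual gaps and once from only the maximum and the sample size, with the -1 term kept; the gaps telescope to M - N, which is why the two agree. Function names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def estimate_via_gaps(serials):
    """Largest serial plus the average number of unseen serials per gap."""
    s = sorted(serials)
    # Count unseen serials before the first observation and between neighbours.
    gaps = [b - a - 1 for a, b in zip([0] + s[:-1], s)]
    return s[-1] + Fraction(sum(gaps), len(gaps))

def estimate_via_max(serials):
    """Same estimate from only the maximum M and the sample size N."""
    m, n = max(serials), len(serials)
    return Fraction(n + 1, n) * m - 1

sample = [20, 120, 439, 1293, 1356]  # serials quoted earlier in the thread
print(estimate_via_gaps(sample), estimate_via_max(sample))
```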

comicjk6y ago

If you just assume that the sample mean = the population mean, then you get the right answer, at least for this example. I don't see why the article fools around with the maximum at all - isn't the maximum a much more noisy statistic than the mean?

skosch6y ago

The range matters – had they found 10 serial numbers between 100000 and 101000, would the mean still be a meaningful estimate of the production rate? In this case, the author just tacitly assumes the minimum to be zero.
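When the starting serial is unknown, a standard variant uses the spread between the largest and smallest observation instead of the maximum alone. The sketch below assumes consecutive numbering and sampling without replacement; the function name and the block of serials are hypothetical.

```python
from itertools import combinations

def estimate_count_unknown_start(serials):
    """Estimate how many items exist when the first serial is unknown."""
    k = len(serials)
    if k < 2:
        raise ValueError("need at least two serials to measure a spread")
    spread = max(serials) - min(serials)
    return spread * (k + 1) / (k - 1) - 1

# Exhaustive check on a hypothetical block of 10 serials starting at 100000:
# average the estimate over every possible sample of 3.
block = range(100_000, 100_010)
estimates = [estimate_count_unknown_start(c) for c in combinations(block, 3)]
average = sum(estimates) / len(estimates)
print(average)
```

Averaged over all 120 possible samples the estimate comes out at the true block size of 10, regardless of where the numbering starts.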

ptero6y ago

To be the devil's advocate: what you say is true if you know the distribution. If the spacing looks weird (e.g. clustered) it might indicate that the number is, for example, a pairing of model and serial numbers, etc.

popotamonga6y ago

Distribution of manufacturing date or distribution of rate of tank capture?

Or does it make a difference?

nevir6y ago· 4 in thread

FWIW, this is part of why Amazon's product identifiers (ASINs) are obfuscated the way they are

dredmorbius6y ago

Similarly, Google+ userIDs were assigned as 21-character numeric strings, beginning with '10' or '11', but otherwise appearing to be randomly assigned.

A full listing was available through the site's robots.txt sitemaps file, or rather, a listing to the listing of 50,000 user profile sitemap files, with about 44k profiles per file. This worked out to 25 GB of profile listings alone.

Rather than download the full set (though I eventually did), I picked an arbitrary file from near the middle of the listing, and ran some spot checks on the profiles, which seemed to be reasonably randomly distributed by age, location, and other characteristics. With as few as 100 profile page downloads, it was clearly evident that active posting to G+ was limited to about 8-11% of all profiles. The full 50k profile sample, and a third party's independent (and more robustly randomised) 500k profile sample eventually showed this to be 9.7%.

(And yes, if I was being more rigorous I could have done much more testing or work, but I was mostly addressing personal curiosity and an online disagreement with someone.)

An interesting proof of the power of random sampling.

Larger samples do allow for clearer views of rare phenomena -- such as dialing in on the fraction of 1% of G+ users highly active on the site. Or when I later looked at Communities characteristics, the properties of the very largest (about 50 > 1 million members) of the 8 million total. In that case, I eventually got access (also via a third-party) to a comprehensive summary dataset.

The userID hashing also made approaches such as exhaustively searching the ID space for user pages nonviable. The search space was trillions of times larger than the target space.
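A back-of-the-envelope sketch of why a 100-profile spot check already lands in the right neighbourhood, using the usual normal approximation for a sampled proportion. The hit counts below are hypothetical stand-ins chosen to match the percentages mentioned above, not the commenter's actual tallies.

```python
import math

def proportion_interval(hits, n, z=1.96):
    """Normal-approximation 95% interval for a sampled proportion."""
    p = hits / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

small = proportion_interval(hits=10, n=100)       # hypothetical spot check
large = proportion_interval(hits=4850, n=50_000)  # hypothetical count at 9.7%
print(small, large)
```

The small sample gives a wide interval that still brackets 9.7%, while the 50k sample narrows it to a fraction of a percentage point. The approximation is rough for small counts, so treat the small-sample bounds as indicative only.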

eadanOP6y ago

People have used a similar strategy to estimate iPhone production [0].

[0] https://www.theguardian.com/technology/blog/2008/oct/08/ipho...

shereadsthenews6y ago

There's a zillion things you can estimate this way. A lot of sites use sequential cookies, user IDs, etc. Until about a decade ago UPS tracking numbers were sequential for each shipper which made it trivial to estimate output for online shops. Apple invoice numbers used to be dense and sequential and you only needed the number to retrieve the invoice. The IMEI is actually just about the worst way to have estimated iPhone sales in 2008; at that time you could literally have crawled Apple's website for every invoice whether sold online or in stores.

dividuum6y ago

See also Doomsday Argument.

https://en.wikipedia.org/wiki/Doomsday_argument

currymj6y ago

The excellent book "Paradoxes in Probability Theory" by William Eckhardt has some good arguments against this, the Simulation Hypothesis, and similar things. One simple way to summarize the counterargument is that in a lot of cases, the choice of seemingly-reasonable priors actually hides an unreasonable assumption that it is possible for the future to change the past.

sulam6y ago

Take a silly premise, get a silly argument and a really silly conclusion.

Confounding question: 1000 years ago, would this argument look any different? Answer: mathematically speaking, it would not. In fact, far more humans have been born than you could have predicted using this method. Conclusion: the argument is flawed.

wcoenen6y ago

The people living 1000 years ago indeed could have used the same argument to show that they should be 95% certain to be in the last 95% of all humans ever to be born, and history indeed showed that this was not the case; the dice fell on the other 5% possibility for them and the population increased more than 20x.

However, the argument will still give the correct prediction for most humans that try to use it. Just not for the few that were in the special position to be born early in the sequence of all humans. The argument essentially tells you that you have no reason to believe that you are also in that special position.
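The "correct for most people who use it" claim can be illustrated with a toy count. Assume a hypothetical total of 1000 births and let each person, knowing only their own birth rank r, assert that the total is at most 20 times r. The function name and the total are made up for illustration.

```python
def share_for_whom_bound_holds(total_births, factor=20):
    """Fraction of birth ranks r (1..total) for which total <= factor * r."""
    holds = sum(1 for r in range(1, total_births + 1) if total_births <= factor * r)
    return holds / total_births

share = share_for_whom_bound_holds(1000)
print(share)
```

Only the earliest ranks (those below 50 here) get it wrong, which is the "special position" the comment refers to.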

2 more replies

coldcode6y ago· 3 in thread

More impressive than using modern tools is that people in WW2 figured this out and modeled it on paper using slide rules.

eadanOP6y ago

It's actually fairly straightforward to derive the estimators by hand [0]. It's just a bit tedious to do in a blog post.

[0] https://en.wikipedia.org/wiki/Discrete_uniform_distribution#...

mcenedella6y ago

I’m not sure how much higher math you’d really need to answer the question “how many tanks has the other side produced?” if these were the serial numbers of the captured tanks:

[689, 341, 386, 741, 982, 414, 845, 241, 180, 447, 880, 21, 583, 993, 812]

it’s tough to see an argument for anything other than: a. about 1,000, or b. 1,000ish but there may be a confounding fact pattern we are unaware of....

dogma11386y ago

Mechanical computers that are designed to solve a single problem aren’t necessarily at a disadvantage, especially when the math isn’t that complex; figuring out the variables and all the factors was the problem.

mruts6y ago· 3 in thread

Seeing that it's a uniform distribution, let's start out with assuming our sample mean (the average serial number we find) has the same distribution as the true mean (the actual number of tanks in existence). If this is true, then:

2 x mean

should be an unbiased estimator of the true mean. But because we are probably under sampling the extremes, we could use the Bessel correction:

1/(n-1) x summation_{i=1}^n(sample_i)

I would guess this comes out to a better estimation than what the article says.

Bessel's correction might be a bit of overkill, since it's intended to work with normal distributions. But I still suspect it comes out to a better estimation than what the blog post says.

comicjk6y ago

I thought the same, and tried it. The mean sample mean after 10 million runs is 500.47, which is very close to the true mean, 500.5. Bessel's correction is not correct here - it's effectively multiplying by n/(n-1), so your estimate would be 536. Bessel's correction is made for estimating true variance using a sample, not for estimating true mean.

mruts6y ago

Yes of course, the mean is the first moment and a Bessel correction would be inappropriate. Now I feel stupid. The mean calculated by sum(p_t x_i) or 1/n sum(x_i) is already the best linear unbiased estimate. Maybe we can't get better than twice the mean?
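One way to compare the two candidates is a small simulation. This sketch assumes 1000 tanks and samples of 15 drawn without replacement; the trial count and seed are arbitrary choices. It reports the average and the spread of each estimator: `2 * mean - 1` (since the expected sample mean is (N + 1) / 2) and the max-based `m + m/k - 1`.

```python
import random
import statistics

def simulate(true_total=1000, sample_size=15, trials=20_000, seed=0):
    """Return lists of mean-based and max-based estimates over many samples."""
    rng = random.Random(seed)
    from_mean, from_max = [], []
    for _ in range(trials):
        sample = rng.sample(range(1, true_total + 1), sample_size)
        from_mean.append(2 * statistics.fmean(sample) - 1)
        m = max(sample)
        from_max.append(m + m / sample_size - 1)
    return from_mean, from_max

from_mean, from_max = simulate()
for name, est in (("2*mean - 1", from_mean), ("max + gap", from_max)):
    print(name, round(statistics.fmean(est), 1), round(statistics.pstdev(est), 1))
```

Both estimators centre on the true total, but the max-based one should show a noticeably smaller standard deviation, which is the usual argument for preferring it.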

nestorD6y ago

What if you get into the war at a later point? Most tanks with a small serial number will have been destroyed (tanks tend to have a short life) and, using the mean instead of the maximum, you will get a seriously biased result.

You could adjust for such problems but it seems much easier to use the maximum.

tzury6y ago· 1 in thread

More about Frequentist and Bayesian analysis can be found here:

https://en.wikipedia.org/wiki/German_tank_problem

Matter of fact...

    According to conventional Allied intelligence estimates, the Germans 
    were producing around 1,400 tanks a month between June 1940 and September 1942. 

    Applying the formula below to the serial numbers of captured tanks, the number 
    was calculated to be 246 a month. After the war, captured German production 
    figures from the ministry of Albert Speer showed the actual number to be 245.

debbiedowner6y ago

I was actually surprised that MVUEs and the fellow point estimators are called frequentist (though it makes sense). In school we always referred to them as non-Bayesian, at the same time frequentist always seemed like a dirty word to us students so maybe that's why

joker36y ago· 1 in thread

Given the praise for Bayesian methods here, I'm surprised the author didn't discuss the Bayesian solution. See http://isaacslavitt.com/2015/12/19/german-tank-problem-with-... for a similar exposition.

eadanOP6y ago

I implement the Bayesian solution in pymc3 at the end [0].

[0] https://www.eadan.net/blog/german-tank-problem/#probabilisti...

jaimex26y ago· 1 in thread

Was there only one tank factory?

jandrese6y ago

No but each tank also had a manufacturer code, so this technique still worked.

mhh__6y ago

For anyone else interested in WW2 reverse engineering and design etc., https://www.youtube.com/watch?v=GJCF-Ufapu8 "The secret war" is a huge documentary covering British efforts to counter German electronic warfare and V-weapons.

Nomentatus6y ago

This all seems to assume the tank serial numbers would be captured at one moment in time ("captured 15 of these tanks uniformly at random.") But in fact the tank shells dribble in over time which biases the gap, the gaps at the highest numbers are going to be greater. Earlier tanks have had many more chances to be destroyed or captured. So using average gap is clearly not going to give the best estimate. If you restrict yourself to tanks from the latest large battle, that will cancel out the dribble effect though.

kevingrahl6y ago

Since no one else commented on that yet; I just wanted to say that I like the simple layout OP is using.

Not much clutter & straight to the point. Loads fast and it’s under 630KB.

Could certainly be improved but it’s nice not having to load >25MB just to read an article.

d--6y ago

This is also a good (applied, with simple code) example of the use of probabilistic programming. I can't get myself to read full books, but somehow this simple example gave me some intuition and additional pointers to follow.

dang6y ago

2015: https://news.ycombinator.com/item?id=10517882

2009: https://news.ycombinator.com/item?id=670065

squeakynick6y ago

It depends on how you intend to 'score' the estimate.

Are you looking for the answer that is the 'most likely', or one that has the 'lowest least squared error', or maybe one that is 'unbiased' (mean error)?

http://datagenetics.com/blog/march22014/index.html

slyu6y ago

I recommend Think Bayes by Allen Downey if you want to study more. It's a free book available online. http://www.greenteapress.com/thinkbayes/thinkbayes.pdf

RickJWagner6y ago

"the Germans, being Germans, had numbered their parts in the order they rolled off the production line"

Probably in today's world this is racist or nationalist or something. But (as someone of German descent) I have to admit it's funny.

ngneer6y ago

I remember studying this problem in the context of anonymity a few years back, defining immeasurability as the property whereby an adversary cannot distinguish between different node counts, for example. The tank problem is related to mark recapture techniques for animal population size estimation. Shameless plug,

http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2...

debbiedowner6y ago

Nice write up.

Little bit of a funny though: Note how num_tanks ~ Unif(max(captured),2000) was defined, so you already have p[ parameter | data ]. Isn't this already a posterior?

I get however how if you had the r.v.s num_tanks ~ Unif(M,2000), observed | num_tanks ~ Unif(1,num_tanks), M some constant, that you could find a posterior distribution num_tanks | vector<observed> by first finding the joint via E[ 1[num_tanks < t]P[observed | num_tanks] ]
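The model described in this comment can be worked out on a grid without MCMC. This is a sketch of that reading, not the article's pymc3 code: a flat prior on the total from the largest captured serial up to 2000 (the bound quoted above), and captured serials treated as independent uniform draws from 1 to the total, so the likelihood is proportional to n**-k. The data are the serials another commenter listed in this thread; the function name is hypothetical.

```python
def grid_posterior(captured, upper=2000):
    """Posterior over the total count n, flat prior on max(captured)..upper."""
    k, m = len(captured), max(captured)
    support = range(m, upper + 1)
    # Likelihood of k independent draws from Uniform(1, n) is n**-k;
    # scale by m**k so the weights stay near 1 instead of underflowing.
    weights = [(m / n) ** k for n in support]
    total = sum(weights)
    return {n: w / total for n, w in zip(support, weights)}

captured = [689, 341, 386, 741, 982, 414, 845, 241, 180, 447, 880, 21, 583, 993, 812]
posterior = grid_posterior(captured)
mean = sum(n * p for n, p in posterior.items())
mode = max(posterior, key=posterior.get)
print(mode, round(mean))
```

The posterior peaks at the largest captured serial and decays quickly above it, so the posterior mean sits somewhat above the maximum.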

ngcc_hk6y ago

Very interesting. Especially the three links and in particular

https://github.com/CamDavidsonPilon/Probabilistic-Programmin...

salty_biscuits6y ago

Why call it probabilistic programming? It is Bayesian inference with MCMC (or am I missing something).

matchagaucho6y ago

Moral of the story: Don't auto-increment serial numbers :-)

1 more reply
