dragontamer 10y ago

Base52 means capital letters and lower case letters. Base62 includes numbers (0 through 9).

We programmers like being specific. Sometimes these sorts of details matter.
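A minimal sketch of the two alphabets described above. The symbol order (uppercase, then lowercase, then digits) is an assumption; the thread does not say which ordering was meant.

```python
import string

# Assumed ordering: A-Z, then a-z, then 0-9. Other orderings are equally valid.
BASE52 = string.ascii_uppercase + string.ascii_lowercase
BASE62 = BASE52 + string.digits

print(len(BASE52), len(BASE62))
```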

11 comments · 4 top-level

dpark 10y ago

This detail doesn't matter and is needlessly confusing. "Random upper and lowercase letters" is exactly as specific and accurate as "base52-encoded random numbers", but the former is more understandable while the latter is trying way too hard to sound smart.

kbar13 10y ago

look, this is hacker news. the author knows his audience. most people who read this will know what base52 is, or will at least recognize what it might be and be able to look it up. i don't think it's meant to sound "smart"; it's an accurate and concise description of the randomness.

bitJericho 10y ago

It wasn't smart since I literally thought he was saying there was encoded data.

RandomBK 10y ago

These details can often shed light on what happened (or was supposed to happen), what tools were being used, etc. While they might be useless for the average reader, there are many on HN, myself included, who will be dissecting the update to learn more about how Windows Update works.

dpark 10y ago

That's fine, but it's not useful to use obtuse terminology. Not once have I heard anyone use the term base-52 before today. I understood the term, but it's not common (because base-52 encoding is not common). It's so not common that it doesn't merit a page on Wikipedia, nor a reference from the page for the number 52, nor even a reference from the page for base-64. It's obtuse.

It's also so specific as to be inaccurate. These strings could be interpreted as "base-52", but also as base-64 or any other base greater than 52. Calling them "encoded" also implies a belief about how these were derived that isn't justified. "Encoded" means that there is some original source that can be recreated by decoding. It's possible that these were actually created by generating random numbers and then encoding that data in base-52. I think that's pretty unlikely, though.

So no, I don't think this sheds any light or adds any detail. It's inaccurate and misleading, and if we're being so specific that we're making up terminology, then we should also be specific enough to say things like "assumed pseudorandom" rather than "random" when we don't know. Otherwise we're just being obtuse.
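The point that a letters-only string is a valid numeral in base 52 and in any larger base can be sketched as follows. The digit ordering and the sample string `"Qx"` are hypothetical, chosen only for illustration.

```python
import string

# Hypothetical digit ordering (A-Z, a-z, 0-9, +, /); the thread does not specify one.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def parse(text: str, base: int) -> int:
    """Read text as a positional numeral in the given base."""
    value = 0
    for ch in text:
        digit = ALPHABET.index(ch)
        if digit >= base:
            raise ValueError(f"{ch!r} is not a base-{base} digit")
        value = value * base + digit
    return value

sample = "Qx"  # made-up string of letters only
values = {base: parse(sample, base) for base in (52, 62, 64)}
print(values)
```

The same two characters parse without error in all three bases and give three different integers, so the string alone does not reveal which base, if any, produced it.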

Dylan16807 10y ago

And saying "base52" is misleading. It implies that there is source data that's been encoded, which is not likely here. You can be specific without implying that.

A random string out of a specific character set is also subtly different from taking random bits and encoding them with that same character set, in that the first couple digits will have different distributions.
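The difference in leading-digit distribution can be checked exhaustively for a small case. This sketch assumes 16-bit values zero-padded to three base-52 digits; the width is an assumption picked so that every value can be enumerated.

```python
from collections import Counter

BITS = 16   # assumed width, small enough to enumerate every value
BASE = 52
WIDTH = 3   # 52**2 < 2**16 <= 52**3, so three digits are needed

# Leading base-52 digit of every 16-bit value, zero-padded to WIDTH digits.
leading = Counter(value // BASE ** (WIDTH - 1) for value in range(2 ** BITS))

print(len(leading))                        # distinct leading digits that occur
print(leading[0], leading[max(leading)])   # counts for the lowest and highest leading digit
```

Only 25 of the 52 symbols can appear in the first position, and the last of those appears 640 times against 2704 for each of the others. Picking each character uniformly from the 52-symbol set would give every symbol the same chance in every position.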

kaoD 10y ago

> saying "base52" is misleading. It implies that there is a source data that's been encoded

No, it doesn't. 0xFF is a number I just made up, no source data at all, I promise. Also, it's base 16 :)

Anyhow, the source data was most definitely base 2 (as is your computer's memory, I assume) and later encoded into base52 to be represented as a string (unless someone at Microsoft wrote it in base52, which seems unlikely).

dpark 10y ago

> 0xFF is a number I just made up, no source data at all, I promise. Also, it's base 16.

It's not base 16 encoded, which was his point. Encoding demands a source. This is just a base-16 number unless you encoded something to arrive at this. You could interpret "Romeo and Juliet" as a very large base65 number (65 unique chars in the random copy I grabbed) if you want, but it's not meaningful or accurate to call it a base65 encoding.

> Even if that were true, the source data was most definitely encoded from base 2 (which is what our computers work with).

This is the kind of pedantry that people hate because it adds nothing to the conversation. It's a way to inject "I'm right" moments into the conversation so you can feel smart, while no one else really cares. It makes for unpleasant conversations.

You're also not right. Your brain doesn't work in base-2, and you likely didn't enter this number into your computer in base2. You typed in the string "0xFF", and that string was encoded in base 2. The base2 that represents the string "0xFF" is very different from the base2 number that represents the logical (base16) number 0xFF.

</pedantry>
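The distinction between the bits of the string "0xFF" and the bits of the number 0xFF can be seen directly. A small sketch, assuming the string is stored as ASCII:

```python
text = "0xFF"

# The number that the text denotes when parsed as hexadecimal.
number = int(text, 16)

# Bits of the four ASCII characters versus bits of the number itself.
text_bits = " ".join(f"{byte:08b}" for byte in text.encode("ascii"))
number_bits = f"{number:08b}"

print(text_bits)
print(number_bits)
```

The text occupies four bytes (one per character), while the value it denotes fits in a single byte.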

Dylan16807 10y ago

> No, it doesn't. 0xFF is a number I just made up, no source data at all, I promise. Also, it's base 16 :)

But the full quote was "base52-encoded". (Though I would argue that base52 implies encoding, because nothing naturally works in base52. The only thing that's naturally 52 is "random letters with random case". Or something with cards.)

> Anyhow, the source data was most definitely base 2 (as is your computer's memory, I assume) and later encoded into base52 to be represented as a string (unless someone at Microsoft wrote it in base52, which seems unlikely).

That is an enormous assumption. It's easier to pick random letters than it is to take a specific binary number and convert it to letters. And they don't give you the same result. Bits stored in base 52 will never start with zzzzz.
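The claim that encoded bits never start with a run of z's can be checked for one concrete case. This sketch assumes a 64-bit source value, fixed-width output, and an alphabet in which "z" is the highest digit; none of these are stated in the thread.

```python
import string

# Assumed ordering: A-Z then a-z, so "z" is the highest digit (index 51).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)
MAX_VALUE = 2 ** 64 - 1   # assumed source width of 64 bits

# Smallest number of base-52 digits that can hold every 64-bit value.
WIDTH = 0
while BASE ** WIDTH <= MAX_VALUE:
    WIDTH += 1

def encode_fixed(value: int, width: int = WIDTH) -> str:
    """Encode value as exactly `width` base-52 digits, most significant first."""
    digits = []
    for _ in range(width):
        value, digit = divmod(value, BASE)
        digits.append(ALPHABET[digit])
    return "".join(reversed(digits))

largest = encode_fixed(MAX_VALUE)
smallest_z_led = ALPHABET.index("z") * BASE ** (WIDTH - 1)

print(WIDTH, largest)
print(smallest_z_led > MAX_VALUE)
```

Under these assumptions the smallest numeral with a leading "z" is already larger than any 64-bit value, so no encoded value can begin with "z". A string built by picking each letter independently has no such restriction.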

jlebar 10y ago

> We programmers like being specific.

But above all, we like being pedantic. (Not you.) :)

SixSigma 10y ago

[a-zA-Z] and [a-zA-Z0-9] would be a better representation, no?
