However, in common usage we now have the pretty awful scenario where, if a person says "a megabyte", the actual number of bytes they're talking about can change depending on where the data is stored! A megabyte of RAM is not a megabyte of hard disk space. The solution would be to standardise on binary prefixes, since we pretty much have to talk about RAM that way, and a consistent measurement across different media is eminently sensible.
The "hard drive maker conspiracy" story is driven by the fact that manufacturers have no real incentive to switch to binary prefixes, because that would make their drives look "smaller". Do you want the 300 gigabyte drive or the 279.4 gibibyte drive? Aside from the fact that hardly anyone knows what a gibibyte is, in the absence of any more information the larger number is probably better. Even worse, if you created a 300 gibibyte drive to compete with the 300 gigabyte drive, consumers would probably not realise that the 300 gibibyte drive is bigger. It's not exactly a conspiracy, but it is a suboptimal arrangement that results from the manufacturers' incentives.
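For anyone who wants to check the 300 vs. 279.4 figure, here is a minimal sketch in Python. The constant and function names are my own, not from any library.

```python
GB = 10**9    # decimal gigabyte (SI prefix)
GiB = 2**30   # binary gibibyte (IEC prefix)

def gb_to_gib(gb):
    """Convert decimal gigabytes to binary gibibytes."""
    return gb * GB / GiB

def gib_to_gb(gib):
    """Convert binary gibibytes to decimal gigabytes."""
    return gib * GiB / GB

print(round(gb_to_gib(300), 1))  # 279.4
print(round(gib_to_gb(300), 1))  # 322.1
```

So a hypothetical 300 gibibyte drive would hold about 322 decimal gigabytes.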
Except that the HDD manufacturers are going to keep doing what they've been, and it'll just create more confusion among consumers. "What's the conversion factor between GB and GiB?"
> Even worse, if you created a 300 gibibyte drive to compete with the 300 gigabyte drive, consumers would probably not realise that the 300 gibibyte drive is bigger
One manufacturer could just start using phrases like "300 REAL gigabytes!" and market it aggressively. "7% more storage than the competition! Finally a drive that stores the amount you paid for!"
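The "7%" is the gap at the giga level specifically; it grows with each prefix. A rough sketch of the arithmetic in Python (the function name is just for illustration):

```python
PREFIXES = ["kilo", "mega", "giga", "tera"]

def binary_excess_percent(power):
    """How much larger 1024**power is than 1000**power, in percent."""
    return (1024**power / 1000**power - 1) * 100

for power, name in enumerate(PREFIXES, start=1):
    print(f"{name}: {binary_excess_percent(power):.1f}%")
# kilo: 2.4%
# mega: 4.9%
# giga: 7.4%
# tera: 10.0%
```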
As for why the -bi- binary prefixes haven't caught on, I think one of the biggest obstacles is that they just sound hilariously stupid...
Ironically, my spell checker shows a little red squiggly under gibibyte. Even the developers don't believe it's a real word.
They're really not. SI still doesn't define any units relating to information quantity, so BIPM isn't relevant. This is one issue where an appeal to authority really doesn't fly. The relevant authorities (IEC, ISO, IEEE, JEDEC, etc.) didn't try to address the ambiguity until the late 90's, after it had spread into the general population.
It's not a mistake. In a living language, weight of usage trumps prescriptivism.
> However, in common usage we now have the pretty awful scenario where, if a person says "a megabyte", the actual number of bytes they're talking about can change depending on where the data is stored! A megabyte of RAM is not a megabyte of hard disk space.
That's not correct. A megabyte of RAM and a megabyte of HD are both 1,048,576 bytes, one megabyte. Again, usage trumps prescriptivism.
I agree that the hard-drive-manufacturer thing is not a "conspiracy;" it's simple false advertising that slips through the cracks of the law because of, again, a foolish belief in prescriptivism on the part of the enforcers.
If RAM manufacturers do not want to advertise in GiB but in GB, use the correct number: 0.987 or whatever GB instead of 1 GB.
I can't sell liquor in 750ml bottles and advertise it as 1L "just because I like rounding L instead of mL".
Mistakenly? How? The new terms with the "bibis" in them were invented because of the misuse of the original terms by storage vendors.
A 300GB hard drive becomes 300 * 10^9 bytes, or 3 * 10^11. 32GB of memory becomes 32 * 2^30, or 2^35.
No ambiguity at all, and if you're spending hundreds of dollars for hardware, you should at least be able to grasp exponentiation.
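Spelled out as a quick check of the exponent arithmetic, in Python (a sketch; the variable names are mine):

```python
drive_bytes = 300 * 10**9   # "300GB" drive, decimal prefix
ram_bytes = 32 * 2**30      # "32GB" of memory, binary prefix

assert drive_bytes == 3 * 10**11
assert ram_bytes == 2**35

print(drive_bytes)  # 300000000000
print(ram_bytes)    # 34359738368
```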
>Because in 1960, the Bureau International des Poids et Mesures decided that the SI prefix G- meant 10^9.
This is wrong. The switch from GB to GiB happened in 1998 and came from the International Electrotechnical Commission (IEC)[1]; before that, a GB was indeed 2^30. Then the old GB became the GiB and the new GB was standardized to use base 10.
$ dd if=/dev/zero of=/dev/null bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.157131 s, 6.8 GB/s
'dd' interprets command-line arguments "M" and "K" as 2^20 and 2^10 respectively, but when it reports the total amount copied, it uses the decimal GB (10^9). It also uses decimal SI units in the average speed calculation.
My favorite file manager, Thunar, reports file sizes in decimal units. Not sure if it's distro-specific, though.
Many other programs, like 'df', have a switch (--si) that turns decimal units on or off.
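The dd numbers above can be reproduced by hand. A small Python sketch, using only the values shown in that transcript (the variable names are mine):

```python
block_size = 1 * 2**20   # bs=1M, parsed by dd as 2^20 bytes
count = 1 * 2**10        # count=1K, parsed as 2^10 blocks
elapsed = 0.157131       # seconds, taken from the dd output

total = block_size * count
print(total)                               # 1073741824
print(round(total / 10**9, 1))             # 1.1 (decimal GB)
print(round(total / elapsed / 10**9, 1))   # 6.8 (decimal GB/s)
```

So the input side is binary and the reporting side is decimal, within a single command.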
Of course, aficionados of RFCs will know that "octet" is (or at least was) the preferred term for "byte" in IETF documents as well. I don't think this is because the RFC editor was secretly French, although I'm sure he was thoroughly classy; presumably it's because early RFCs were written in the era when the byte had not quite settled down at eight bits, and needed to be unambiguous.
I suppose I should've said that up until five years ago, nearly everyone was using powers of two for file sizes. And Windows/Mac never used the correct prefixes.
$ df -k /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda3 151798672 47602008 102615608 32% /
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 145G 46G 98G 32% /
$ df -H /
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 156G 49G 106G 32% /
(-h "human readable" with powers of 2^10, -H "human readable" with powers of 10^3)
$ df --version
df (GNU coreutils) 8.22
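The three df listings are the same numbers scaled differently. Here is a rough Python sketch that reproduces the -h and -H columns from the -k block counts. The `human` helper is hypothetical, and I am assuming df rounds up, which is at least consistent with the output shown. It also skips the one-decimal form df prints for small values.

```python
import math

def human(n_bytes, base):
    """Scale n_bytes by powers of base and round up (assumed df behaviour)."""
    units = ["", "K", "M", "G", "T", "P"]
    value = n_bytes
    i = 0
    while value >= base and i < len(units) - 1:
        value /= base
        i += 1
    return f"{math.ceil(value)}{units[i]}"

# 1K-block counts from the `df -k /` line: size, used, available
for blocks in (151798672, 47602008, 102615608):
    n_bytes = blocks * 1024
    print(human(n_bytes, 1024), human(n_bytes, 1000))
# 145G 156G
# 46G 49G
# 98G 106G
```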
ls has -h/--si/-k ...
$ ls -h -l /proc/kcore
-r-------- 1 root root 128T Apr 6 19:02 /proc/kcore
$ ls --si -l /proc/kcore
-r-------- 1 root root 141T Apr 6 19:03 /proc/kcore
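Both listings describe the same byte count. A quick Python check, assuming the file size is exactly 2^47 bytes:

```python
size = 2**47

print(size)                        # 140737488355328
print(size / 2**40)                # 128.0, the "128T" from ls -h
print(round(size / 10**12, 2))     # 140.74, shown as "141T" by ls --si
```

The 141T is consistent with ls rounding up rather than to nearest.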
(/proc/kcore is 2^47 = 140737488355328 = 128 * 2^40 = 128 TiB)
In fact, I've got a 20MB SCSI drive here which I should hook up to verify...
The real questions are: Why did hard drive manufacturers move from a (misnamed) base 2 to base 10? They were confused before but then saw the light (decades later)? Why did all OSes and utilities use base 2? Why do most still use base 2? Why can't we use base 2 now?
Making the consumer think he gets more for his money doesn't give you an advantage over your competitor (if he's doing the same thing), but it does put more money into the industry as a whole. Imagine a home ice cream machine fad. All manufacturers might rise more or less equally, but they all make more money now that people believe their lives are better enhanced by putting their money into ice cream.
I think it makes sense that this didn't become significant until disk sizes started reaching GB levels, and that manufacturers therefore chose to switch to base 10 around then.
Except when they do. Amazon Web Services uses 2^30-byte GBs for bandwidth and EBS disk sizes. They measure EC2 ephemeral disk sizes in 10^9-byte GBs, though...
A similar question is - why do all gas stations sell gas for x.yy9? It's impossible to sell something for 9/10 of a penny but in the U.S. they all do it.
If you ask the people doing it, you'll get the answer that it serves the consumer better, it's just a coincidence that it happens to make their product look artificially better/cheaper/whatever than it is.
Ditto with flash devices; due to their addressing architecture, they are inherently binarily-capacitised(?) I have here a 16MB USB drive from when they first came out, and it stores exactly 16,777,216 bytes, or 32,768 512-byte sectors. Back then, flash memory was all SLC and it was reliable enough that only the few spare bytes on each page were needed for remapping/ECC and the OS's filesystem bad-block management could be used.
1.44 MB = 1.44 B * 1000 * 1024
Now I realize that they mixed 2^10 and 10^3 in the same sentence. Thanks for finally answering a question that's been in the back of my brain for 20+ years!
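The mixed-unit floppy figure works out like this. A short Python sketch, using the standard 3.5" HD geometry (80 tracks, 2 sides, 18 sectors per track, 512 bytes per sector):

```python
total = 80 * 2 * 18 * 512      # bytes on a "1.44 MB" floppy

print(total)                   # 1474560
print(total / (1000 * 1024))   # 1.44    (the mixed "MB" on the label)
print(total / 1000**2)         # 1.47456 (decimal MB)
print(total / 1024**2)         # 1.40625 (binary MiB)
```

So the label matches neither the decimal nor the binary megabyte.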
http://en.wikipedia.org/wiki/Gibibyte
1 gigabyte = 1000^3 (10^9) bytes.
1 gibibyte = 1024^3 (1,073,741,824) bytes.
Second, it's not specific enough -- the fact that two parallel measurement schemes are in effect is the answer to the question he asks in his title. That deserves more than an offhand, incomplete reference.
> Unlike everything else in the world of computing, RAM is addressed in hardware. When you're designing a piece of silicon, you want to have N address lines and have every combination of zeroes and ones map to a memory location — to do otherwise would make the logic far more complicated. Nothing else is addressed this way.
So it's OK to use 2-based kMG for RAM, but not for hard drives? But hard drives get mapped to memory, and memory gets mapped to hard drives. I have pages of memory written out to disk, and I have inodes of files cached into memory. So my hard drive will be subdivided into pieces that are 2-based, and my partitions will normally have a whole number of such pieces, right? Addressing, of whatever sort, is often more convenient if different levels of subdivisions are 2-based (because the arithmetic can be bitwise or mostly bitwise, rather than addition and subtraction everywhere).
...so the idea that hard drives might want to report their entire size in a 2-based unit isn't even remotely as far-fetched as the author claims. It's 2s all the way down.
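To illustrate the "bitwise arithmetic" point: with a power-of-two block size, splitting a byte offset into a block number and an offset within the block needs only a shift and a mask. A minimal Python sketch, where the 4096-byte block size and the names are assumptions for illustration:

```python
BLOCK_SHIFT = 12               # hypothetical 4096-byte block (2^12)
BLOCK_SIZE = 1 << BLOCK_SHIFT

def split_offset(offset):
    """Return (block number, offset within block) using a shift and a mask."""
    return offset >> BLOCK_SHIFT, offset & (BLOCK_SIZE - 1)

offset = 10_000_000
# Same answer as general division, but no divide is needed.
assert split_offset(offset) == divmod(offset, BLOCK_SIZE)
print(split_offset(offset))    # (2441, 1664)
```

With a block size that is a power of ten instead, the same split would need an actual division and remainder.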
Blaming Tarsnap or people creating hard drives or other media is like when people use some old Frontpage and blame browsers that it's not supported, while IE (used by the majority!) does.
The idea of using different words to distinguish between real and fake gigabytes is a good one, but it makes the fundamental mistake of getting the "real" one backwards. It should be gigabytes and metric gigabytes, not gibibytes and gigabytes.