And this is great for environments that can support it, but as the levels get lower and lower, such safety nets become prohibitively expensive.
Take data formats, for example. Say we have a small device that records IEEE 754 binary float32 readings. A simple format might be something like this:
record = reading* terminator;
reading = float(32, ~) | invalid;
invalid = float(32, snan);
terminator = uint(32, 0xffffffff);
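To make the in-band signaling concrete, decoding amounts to classifying every 32-bit word by its bit pattern. A minimal sketch in Python (the function names are mine, and little-endian encoding is assumed):

```python
import struct

TERMINATOR = 0xFFFFFFFF  # end-of-record marker (itself a quiet NaN)

def is_snan(bits: int) -> bool:
    """True if the 32-bit pattern is an IEEE 754 signaling NaN:
    exponent all ones, mantissa nonzero, quiet bit (mantissa MSB) clear."""
    return ((bits & 0x7F800000) == 0x7F800000
            and (bits & 0x007FFFFF) != 0
            and (bits & 0x00400000) == 0)

def decode(data: bytes):
    """Yield (value, valid) pairs from one record."""
    for off in range(0, len(data), 4):
        (bits,) = struct.unpack_from("<I", data, off)
        if bits == TERMINATOR:
            return
        if is_snan(bits):
            yield (None, False)  # sensor error: value is unusable
        else:
            yield (struct.unpack_from("<f", data, off)[0], True)
    raise ValueError("record not terminated")
```

Note that we classify raw bit patterns rather than decoded floats: on some platforms, merely passing a signaling NaN through floating-point operations quiets it, erasing exactly the distinction we rely on.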
We use a signaling NaN to record an error in the sensor reading, and we use the encoding 0xffffffff (which is a quiet NaN) to mark the end of the record. If we wanted the validity signaling to be out-of-band, we'd need to encode it as such; perhaps as a "validity" bit preceding each reading:
record = reading* terminator;
reading = valid_bit & float(32, ~);
valid_bit = uint(1, ~);
terminator = uint(1, 1) & uint(32, 0xffffffff);
Now the format is more complicated, and we also have alignment problems, since each record entry is now 33 bits. We could use a whole byte instead and lose a little to bloat:
record = reading* terminator;
reading = valid_bit & float(32, ~);
valid_bit = uint(8, ~);
terminator = uint(8, 1) & uint(32, 0xffffffff);
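One property all of these terminator-based layouts share is cheap ad-hoc appends: to add a reading, a device simply overwrites the terminator and writes a fresh one after the new entry. A sketch against the byte-per-reading layout above (the helper name and little-endian encoding are my assumptions):

```python
import struct

# uint(8, 1) & uint(32, 0xffffffff)
TERMINATOR = b"\x01" + b"\xff\xff\xff\xff"

def append_reading(buf: bytearray, value: float, valid: bool = True) -> None:
    """Append one reading to a terminated record, in place:
    strip the terminator, add the 5-byte entry, re-terminate."""
    assert buf[-5:] == TERMINATOR, "record not terminated"
    del buf[-5:]
    buf += struct.pack("<Bf", 1 if valid else 0, value)
    buf += TERMINATOR
```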
But we're still unaligned (40 bits per record entry), which will slow down ingestion. We could fix that by using a 32-bit validity "bit":
record = reading* terminator;
reading = valid_bit & float(32, ~);
valid_bit = uint(32, ~);
terminator = uint(32, 1) & uint(32, 0xffffffff);
But now we've doubled the size of the data format. Or perhaps we keep the validity bits in a separate bit array, padded to a 32-bit boundary to deal with alignment issues:
record = bind(count,count_field) & pad(32, validity{count.value}, padding*) & reading{count.value};
count_field = uint(32,bind(value,~));
reading = float(32, ~);
validity = uint(1, ~);
padding = uint(1, 0);
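Decoding this variant means reading the count, skipping past the padded validity bitmap, and only then pulling out the readings. A sketch in Python (little-endian encoding and MSB-first bit order within the bitmap are my assumptions; the grammar alone doesn't pin down the bit order):

```python
import struct

def decode_counted(data: bytes):
    """Decode one counted record: u32 count, count validity bits padded
    with zero bits to a 32-bit boundary, then count float32 readings."""
    (count,) = struct.unpack_from("<I", data, 0)
    bitmap_len = ((count + 31) // 32) * 4          # validity bits + padding, in bytes
    bitmap = data[4:4 + bitmap_len]
    readings = struct.unpack_from("<%df" % count, data, 4 + bitmap_len)
    out = []
    for i, value in enumerate(readings):
        valid = (bitmap[i // 8] >> (7 - i % 8)) & 1  # MSB-first bit order assumed
        out.append((value, bool(valid)))
    return out
```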
But now we've lost the ability to append records ad-hoc (each record must be preceded by a count), and the format is becoming a lot more complicated.