| Header | Total Bytes | Payload Bits |
| ---------- | ----------- | ------------ |
| `.......1` | 1 | 7 |
| `......10` | 2 | 14 |
| `.....100` | 3 | 21 |
| `....1000` | 4 | 28 |
| `...10000` | 5 | 35 |
| `..100000` | 6 | 42 |
| `.1000000` | 7 | 49 |
| `10000000` | 8 | 56 |
| `00000000` | 9 | 64 |
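As a rough sketch of how a writer might produce these fields, here is a hypothetical `encodeLengthField` in C. It assumes the table is read as follows. For n ≤ 8 total bytes, the value is shifted left by n bits, bit n-1 is set as the terminator, and the resulting word is written low byte first. Values needing more than 56 bits get the `00000000` header followed by the raw 64-bit value. This is an illustration, not the ksbonjson encoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for the layout in the table above. Writes 1..9 bytes
 * to buf (which must have room for 9) and returns the count written. */
static size_t encodeLengthField(uint64_t value, uint8_t *buf) {
    /* Smallest n (1..8) whose 7*n payload bits can hold value; 9 otherwise. */
    size_t n = 1;
    while (n <= 8 && (value >> (7 * n)) != 0) {
        n++;
    }

    if (n == 9) {
        /* `00000000` header, then the full 64-bit value, low byte first. */
        buf[0] = 0x00;
        for (size_t i = 0; i < 8; i++) {
            buf[1 + i] = (uint8_t)(value >> (8 * i));
        }
        return 9;
    }

    /* Payload sits above the header, which is (n - 1) zeros then a one,
     * counting up from the LSB. */
    uint64_t word = (value << n) | (1ULL << (n - 1));
    for (size_t i = 0; i < n; i++) {
        buf[i] = (uint8_t)(word >> (8 * i)); /* little-endian byte order */
    }
    return n;
}
```

Writing byte by byte keeps the sketch independent of host endianness. When the output buffer has slack, a single 8-byte little-endian store is a common alternative.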
The full value is stored little-endian, so you only need to read the first (low) byte in the stream to know the field's total length, and it has exactly the same compactness as VLQ/LEB128 (7 payload bits per byte). Even better: modern chips have instructions that decode this header in one shot (callable via a compiler builtin):
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
```c
#include <stddef.h>
#include <stdint.h>
static inline size_t decodeLengthFieldTotalByteCount(uint8_t header) {
    return (size_t)__builtin_ctz(header) + 1; // header must be nonzero: ctz(0) is undefined
}
```
After calling the builtin, re-read the field from the same memory location as a little-endian integer of that many bytes, then shift it right by that same count (the header uses exactly one bit per byte of the field) to get the final payload. The one special case is `00000000`, where the full 64-bit payload follows in the next 8 bytes, although numbers that big are rare. In fact, if you limit yourself to 56-bit numbers, the algorithm becomes entirely branchless (even if your chip doesn't have the builtin).
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
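The decode steps described above can be sketched as a hypothetical `decodeLengthField`. This is not the ksbonjson code behind the link. It assumes GCC or Clang (for `__builtin_ctz` and `__BYTE_ORDER__`), a little-endian host, and at least 9 readable bytes at `src`, so it can always load a full 8-byte word and discard the excess with shifts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if !defined(__BYTE_ORDER__) || __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian GCC/Clang target"
#endif

/* Hypothetical helper: decodes one length field at src, stores the payload in
 * *out and returns the number of bytes consumed (1..9). Assumes at least 9
 * readable bytes at src, because it always loads a full 8-byte word. */
static size_t decodeLengthField(const uint8_t *src, uint64_t *out) {
    uint8_t header = src[0];

    /* Special case `00000000`: the 64-bit payload is in the next 8 bytes.
     * If values are capped at 56 bits, this branch can be dropped. */
    if (header == 0) {
        memcpy(out, src + 1, 8);
        return 9;
    }

    /* Header is (n - 1) zeros then a one, counting up from the LSB. */
    unsigned n = (unsigned)__builtin_ctz(header) + 1; /* 1..8 */

    uint64_t word;
    memcpy(&word, src, 8); /* little-endian load of 8 bytes */

    /* Shift left to drop bytes beyond the field, then shift right to drop
     * those same bits plus the n header bits. Both amounts stay in 0..63. */
    *out = (word << (64 - 8 * n)) >> (64 - 7 * n);
    return n;
}
```

Using two shifts instead of a mask sidesteps the undefined `1ULL << 64` that a mask would need for the 8-byte case.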
It's one of the techniques I used to make BONJSON 35x faster than JSON to decode and encode.
https://github.com/kstenerud/bonjson
If you wanted to maintain ASCII compatibility, you could use a 0-based unary code going left-to-right, but you'd lose some of the speed benefits of a little-endian-friendly encoding (as well as UTF-8's self-synchronization, which admittedly isn't so important in a modern world where everything is enveloped out-of-band and error-corrected). But it would still be a LOT faster than VLQ/LEB128.
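Here is one possible reading of that ASCII-compatible variant, sketched as a hypothetical `decodeAsciiCompatField`. It treats the count of leading one bits in the first byte as the number of extra bytes. So `0xxxxxxx` is a plain 7-bit ASCII byte, `10xxxxxx` starts a 2-byte field, and `11111111` starts a 9-byte one. The payload is assumed to be big-endian so it reads left to right. Both the exact layout and the byte order are assumptions, not a spec.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ASCII-compatible variant: leading ones in the first byte give
 * the number of extra bytes; payload assumed big-endian. Stores the value in
 * *out and returns the bytes consumed (1..9). */
static size_t decodeAsciiCompatField(const uint8_t *src, uint64_t *out) {
    uint8_t header = src[0];

    /* Count leading ones as leading zeros of the inverted byte. The
     * 0x00800000 sentinel caps the count at 8 and avoids clz(0). */
    uint32_t inverted = (uint32_t)(uint8_t)~header << 24;
    unsigned ones = (unsigned)__builtin_clz(inverted | 0x00800000u);
    size_t n = ones + 1;

    /* 7 - ones payload bits remain in the header byte (zero for 7 or 8 ones). */
    uint64_t value = (ones < 8) ? (header & ((1u << (7 - ones)) - 1)) : 0;

    /* Append the remaining bytes, most significant first. */
    for (size_t i = 1; i < n; i++) {
        value = (value << 8) | src[i];
    }
    *out = value;
    return n;
}
```

Under this reading the payload capacities match the little-endian table (7, 14, ..., 56, 64 bits). The sketch accumulates bytes in a loop for clarity. Because the header sits in the high bits of the first byte, the simple little-endian load-and-shift trick no longer applies directly.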