To hold the length of a string, I'd do something similar to UTF-8:
7 bits for size + 1 bit for continuation, then 15 bits for size + 1 bit for continuation, then 23 bits for size + 1 bit for continuation, etc.
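As a rough sketch of the continuation-bit idea, here is one way it could look in C. Note that this is the common simplified variant (the same layout as LEB128): every byte carries 7 size bits plus 1 continuation bit, rather than the growing 15-bit and 23-bit groups described above. The names varlen_encode and varlen_decode are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write len as 7-bit groups, least significant first.
   The high bit of each byte is the continuation flag ("more bytes follow").
   out must have room for 10 bytes; returns the number of bytes written. */
size_t varlen_encode(uint64_t len, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t b = (uint8_t)(len & 0x7F);
        len >>= 7;
        if (len != 0)
            b |= 0x80;              /* set continuation bit */
        out[n++] = b;
    } while (len != 0);
    return n;
}

/* Hypothetical helper: decode a length from at most avail bytes.
   Returns the number of bytes consumed, or 0 if the input is truncated
   or runs past 10 bytes without a terminating byte. */
size_t varlen_decode(const uint8_t *in, size_t avail, uint64_t *len)
{
    uint64_t value = 0;
    for (size_t i = 0; i < avail && i < 10; i++) {
        value |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) {  /* continuation bit clear: done */
            *len = value;
            return i + 1;
        }
    }
    return 0;
}
```

With this layout, any length below 128 fits in a single byte, and the decoder finishes after one test of the high bit.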
Or maybe even do it exactly the same as UTF-8:
0XXX XXXX -> length of string is in those 7 bits
10XX XXXX XXXX XXXX -> length of string is in those 6+8 bits
110X XXXX XXXX XXXX XXXX XXXX -> length of string is in those 5+8+8 bits
...
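Here is a minimal C sketch of the prefix-style layout, covering only the first three forms. One assumption to flag: for the decoder to tell the forms apart from the first byte alone, the leading bits have to be distinct, so this sketch uses 0, 10 and 110 as prefixes (the way UTF-8 lead bytes do), which leaves 7, 6+8 and 5+8+8 payload bits. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode len with a UTF-8-like prefix in the first byte.
     0XXX XXXX                      ->  7 bits
     10XX XXXX XXXX XXXX            -> 14 bits
     110X XXXX XXXX XXXX XXXX XXXX  -> 21 bits
   Returns bytes written, or 0 if len needs a longer form than this sketch has. */
size_t prefix_len_encode(uint32_t len, uint8_t *out)
{
    if (len < 0x80) {
        out[0] = (uint8_t)len;
        return 1;
    }
    if (len < 0x4000) {
        out[0] = (uint8_t)(0x80 | (len >> 8));
        out[1] = (uint8_t)(len & 0xFF);
        return 2;
    }
    if (len < 0x200000) {
        out[0] = (uint8_t)(0xC0 | (len >> 16));
        out[1] = (uint8_t)((len >> 8) & 0xFF);
        out[2] = (uint8_t)(len & 0xFF);
        return 3;
    }
    return 0;
}

/* Hypothetical helper: decode; returns bytes consumed, or 0 on truncated
   input or an unsupported prefix. */
size_t prefix_len_decode(const uint8_t *in, size_t avail, uint32_t *len)
{
    if (avail < 1)
        return 0;
    uint8_t b0 = in[0];
    if ((b0 & 0x80) == 0) {             /* short strings: one bit test */
        *len = b0;
        return 1;
    }
    if ((b0 & 0xC0) == 0x80) {          /* prefix 10 */
        if (avail < 2)
            return 0;
        *len = ((uint32_t)(b0 & 0x3F) << 8) | in[1];
        return 2;
    }
    if ((b0 & 0xE0) == 0xC0) {          /* prefix 110 */
        if (avail < 3)
            return 0;
        *len = ((uint32_t)(b0 & 0x1F) << 16) | ((uint32_t)in[1] << 8) | in[2];
        return 3;
    }
    return 0;
}
```

Unlike real UTF-8, the trailing bytes here carry a full 8 bits each, since a length field does not need to be self-synchronizing.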
> On the critical short string path, it costs just a single bit test.
That's a few more clock cycles compared to NUL-termination, although my alternatives above require even more clock cycles.
If the hardware had instructions for sentinel values (like how DOS calls used '$' termination for strings), things would be easier and safer.
Load a sentinel byte into a register and have dedicated copy and compare instructions that each take two addresses (src and dst) and copy (or compare) src/dst until the terminator is reached (with copy copying the sentinel as well).
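To make the intended semantics concrete, here is a plain C model of what such instructions might do. This is only a software sketch of the idea, not any real instruction: sentinel_copy and sentinel_compare are hypothetical names, and the sentinel argument stands in for the register holding the sentinel byte.

```c
#include <stddef.h>

/* Model of the proposed copy: copy bytes from src to dst up to and
   including the sentinel. Returns the number of bytes copied.
   Like the proposed instruction, it has no length limit of its own. */
size_t sentinel_copy(unsigned char *dst, const unsigned char *src,
                     unsigned char sentinel)
{
    size_t i = 0;
    for (;;) {
        unsigned char c = src[i];
        dst[i++] = c;
        if (c == sentinel)
            return i;
    }
}

/* Model of the proposed compare: walk both strings until the bytes differ
   or both reach the sentinel. Returns 0 if equal up to the sentinel,
   otherwise the difference of the first mismatching bytes. */
int sentinel_compare(const unsigned char *a, const unsigned char *b,
                     unsigned char sentinel)
{
    size_t i = 0;
    while (a[i] == b[i]) {
        if (a[i] == sentinel)
            return 0;
        i++;
    }
    return (int)a[i] - (int)b[i];
}
```

For example, calling sentinel_copy with '$' as the sentinel on "Hello$junk" copies the six bytes "Hello$" and stops, and passing 0 as the sentinel gives the usual C string behavior.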
Considering that sentinel values are needed so often, and are so useful, it's surprising that this is not in any mainstream ISA. What we have now are kludgy workarounds in the HLL for this. It's hard to blame the HLL, because some workaround has to be implemented.