
lmm 10mo ago

> Sure, and if I store my data in a Rust array and store indexes into that array around the place as sort of weak references (something I've seen Rust programmers use and talk about all the time), I can easily fetch the wrong data too.

Maybe, but you'll do so in an understandable way that you can catch in testing. You won't suddenly start fetching different wrong data when someone builds your program with a newer version of the compiler, which is a very real risk with C. To say nothing of the risk of arbitrary code execution if your program is operating on attacker-supplied data.

> The irony, of course, is that the characters that can break your terminal session are perfectly valid UTF-8.

Terminals that can't handle the full UTF-8 range are a problem with those terminals IMO. And terminals implemented in Rust probably don't have that problem :).


tsimionescu 10mo ago

> Terminals that can't handle the full UTF-8 range are a problem with those terminals IMO. And terminals implemented in Rust probably don't have that problem :).

No, it isn't, and yes, they would. The problem is that the terminal accepts certain valid UTF-8 characters (typically from the ASCII subset) as output control characters. This is how you get things like programs that can output colored text.

This is a part of the fundamental design of how a terminal device is supposed to work: its input is defined to be a single stream of characters, and certain characters (or sequences of characters) represent control sequences that change the way other characters are output. The problem here is with the design of POSIX in general and Linux in particular - the fact that, despite knowing most interaction will be done through a terminal device with no separate control and data channels, they chose to allow control characters as part of file names.

As a result of this, it is, by design, impossible to write a program that can print out any legal file name to a terminal without risking putting the terminal into a display state that the user doesn't expect. The best you could do is recognize terminal control sequences in file names, recognize whether your output device is a terminal, and in those cases print out escaped versions of those character sequences.
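A minimal sketch of that "best you could do" approach, in Rust. The helper names `escape_control_chars` and `display_name` are made up for illustration, and it assumes the file name has already been decoded as UTF-8. It escapes every Unicode control character (category Cc) and leaves everything else alone, but only when stdout is a terminal.

```rust
use std::io::{self, IsTerminal, Write};

/// Hypothetical helper: replace every Unicode control character (category Cc)
/// with a visible escape so the terminal never interprets it.
/// Backslashes are left as-is, so the result is for display, not round-tripping.
fn escape_control_chars(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for c in name.chars() {
        if c.is_control() {
            // escape_default renders ESC as \u{1b}, newline as \n, and so on.
            out.extend(c.escape_default());
        } else {
            out.push(c);
        }
    }
    out
}

/// Hypothetical helper: escape only when the output device is a terminal,
/// so pipes and files still receive the raw name.
fn display_name(name: &str, to_terminal: bool) -> String {
    if to_terminal {
        escape_control_chars(name)
    } else {
        name.to_string()
    }
}

fn main() -> io::Result<()> {
    // A file name containing the sequence that starts red text.
    let name = "report\x1b[31m.txt";
    let stdout = io::stdout();
    let shown = display_name(name, stdout.is_terminal());
    writeln!(stdout.lock(), "{shown}")
}
```

This doesn't make the problem go away: every program that prints file names has to remember to do it.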

lmm (OP) 10mo ago

> the terminal accepts certain valid UTF-8 characters (typically from the ASCII subset) as output control characters. This is how you get things like programs that can output colored text.

The terminal should not allow such a sequence to break it. Yes, being able to output colour is desirable, but it shouldn't come at the cost of breaking, and doesn't need to. (Whereas it is much less unreasonable for a terminal to break when it's sent invalid UTF-8).

> This is a part of the fundamental design of how a terminal device is supposed to work: its input is defined to be a single stream of characters, and certain characters (or sequences of characters) represent control sequences that change the way other characters are output.

"Design" and "supposed to" are overstating things. It's a behaviour that some terminals and some programs have accreted.

> it is, by design, impossible to write a program that can print out any legal file name to a terminal without risking putting the terminal into a display state that the user doesn't expect

I would not say by design, and I maintain that the terminal should handle it.

tsimionescu 10mo ago

I believe you're misunderstanding the problem. The terminal doesn't "break" in the sense that it crashes or does something undefined for those cases. The terminal is doing something that is completely meaningful and well defined and probably has some realistic use cases, such as switching to a different character encoding.

The only problem is that it's not what the user wanted to happen. For a simple example, if a file name contains the control sequence for starting a block of red text, and you print that file name as is in a terminal, (1) you'll see a truncated file name (that is, copying the text from the terminal will not give you the actual file name, since the control characters will be entirely missing), and (2) all future text will be red.

The terminal has done nothing wrong in this case: it used its normal logic for turning text red. The file name is not in any way wrong - it's a completely valid Linux and ext4 file name. The program is not necessarily doing anything wrong - perhaps it was never designed to print to a terminal. But the overall interaction produces the wrong results.

lmm (OP) 10mo ago

> I believe you're misunderstanding the problem. The terminal doesn't "break" in the sense that it crashes or does something undefined for those cases. The terminal is doing something that is completely meaningful and well defined and probably has some realistic use cases, such as switching to a different character encoding.

I'm aware of the details, but I think sometimes that knowledge leads people to miss the forest for the trees. If the user perceives the terminal as having "broken", that's a case of poor UX design at a minimum. Given that users can readily distinguish between legitimate coloured output etc. and terminals getting into a poor state, it really shouldn't be too hard for the terminal itself to do so. (E.g. it's pretty normal for today's terminals to display some kind of visible warning (complete with resume button) when you press Ctrl-S, rather than simply silently stopping). And while this is a much fuzzier and more contentious claim, I think the Rust community's mentality (as seen in e.g. their approach to compiler errors) nudges people towards such approaches.


fc417fc802 10mo ago

I don't think it's Linux so much as it is any given filesystem implementation. As I understand it validation is entirely up to the filesystem itself. I could be mistaken but I don't believe there's anything stopping you from implementing a filesystem that uses raw binary data for filenames.

There's also the question of what happens if the data structures on disk become corrupted. The filesystem driver might or might not validate the "string" it reads back before returning it to you.

tsimionescu 10mo ago

Linux itself exposes various syscalls that operate on filenames; userland programs can't interact directly with the FS driver. But Linux chose to implement only 2 restrictions at the syscall level (slash used to separate elements of the path and NUL used to mark the end of the input). The kernel will resolve the path to a particular file system and send a subset of the path to the corresponding FS driver exactly as it received it, and the FS can choose whether to accept or reject it. Most Linux FSs don't apply any extra restrictions either. The main exceptions are FSs written to interface with other systems, such as CIFS or SMB, which additionally apply DOS/Windows filename restrictions by necessity.

If Linux had chosen to standardize file names at the syscall level to a safe subset of UTF-8 (or even ASCII), FS writers would never have even seen file names that contain invalid sequences, and we would have been spared a whole set of terminal issues. Of course, UTF-8 was nowhere close to as universally adopted as it is today at the time Linux was developed, so it's just as likely they might have standardized to some subset of UTF-16 or US-ASCII and we might have had other kinds of problems, so it's arguable they made the right decision for that time.
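To make "safe subset" concrete, here is a rough sketch in Rust of what such a check could look like. This is purely a hypothetical policy, not anything the kernel or any filesystem actually implements; `validate_name` and `NameError` are invented names. It keeps the two restrictions that exist today (no slash, no NUL) and adds two more: the name must be valid UTF-8 and must contain no control characters.

```rust
/// Hypothetical reasons for rejecting a single path component.
#[derive(Debug, PartialEq)]
enum NameError {
    Empty,
    Slash,
    Nul,
    InvalidUtf8,
    ControlChar(char),
}

/// Hypothetical policy check for one path component given as raw bytes.
fn validate_name(raw: &[u8]) -> Result<&str, NameError> {
    if raw.is_empty() {
        return Err(NameError::Empty);
    }
    // The two restrictions already enforced at the syscall level.
    if raw.contains(&b'/') {
        return Err(NameError::Slash);
    }
    if raw.contains(&0) {
        return Err(NameError::Nul);
    }
    // Extra restriction 1 (assumption): the bytes must be valid UTF-8.
    let name = std::str::from_utf8(raw).map_err(|_| NameError::InvalidUtf8)?;
    // Extra restriction 2 (assumption): no Unicode control characters (Cc).
    if let Some(c) = name.chars().find(|c| c.is_control()) {
        return Err(NameError::ControlChar(c));
    }
    Ok(name)
}

fn main() {
    println!("{:?}", validate_name(b"notes.txt"));
    println!("{:?}", validate_name(b"red\x1b[31m.txt"));
    println!("{:?}", validate_name(&[0xff, 0xfe]));
}
```

Note that the escape character passes the UTF-8 check on its own, which is why the control character test has to be a separate step.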

fc417fc802 10mo ago

It's even worse than that. If a newer version of the compiler is able to leverage knowledge of the array bounds it could "optimize" away an entire chunk of your program. It probably won't do that because compiler authors supposedly aren't openly hostile towards compiler users but it isn't so easy to write an algorithm that will flag such "obviously" wrong things.

The control characters are themselves valid (but unprintable) UTF-8. They are also, against all common sense and reason, permitted within filenames by many filesystems. Rust won't save you here. https://www.compart.com/en/unicode/category/Cc

serbuvlad 10mo ago

> Maybe, but you'll do so in an understandable way that you can catch in testing. You won't suddenly start fetching different wrong data when someone builds your program with a newer version of the compiler, which is a very real risk with C. To say nothing of the risk of arbitrary code execution if your program is operating on attacker-supplied data.

What guarantees it? Literally nothing. You can catch errors in testing in C as well. Yeah, in C you get "different" data on a different version of the compiler, but you get garbage data in all versions of the compiler and Valgrind flags that in testing.

And, of course, you can get arbitrary code execution if your program is operating on data coming from multiple users of different privilege levels in Rust if you use vectors like that.

Sure, Rust fixes a lot of easy to commit bugs in C and C++, absolutely. But there are no absolutes.

lmm (OP) 10mo ago

> What guarantees it? Literally nothing.

The behaviour of looking up index x in array y in Rust is well-defined, and consistent between compiler versions. Maybe you use the wrong index and get the wrong customer's data or something, but you'll still get the data at that index (or a panic if the index is invalid).
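A small illustration of what I mean, as a sketch; the `lookup` helper and the customer names are made up. A stale index used as a "weak reference" gives you the wrong element, but the same wrong element on every compiler version, and an out-of-range index is either `None` (with `get`) or a panic (with `[]`), never a read of unrelated memory.

```rust
/// Hypothetical helper: bounds-checked lookup that returns None
/// instead of panicking when the index is out of range.
fn lookup<'a>(items: &[&'a str], idx: usize) -> Option<&'a str> {
    items.get(idx).copied()
}

fn main() {
    let mut customers = vec!["alice", "bob", "carol"];

    // An index stored elsewhere as a kind of weak reference to "bob".
    let bob = 1;

    // Removing an earlier element shifts everything down by one,
    // so the stored index is now stale.
    customers.remove(0);

    // Wrong data, but deterministically the element at index 1.
    println!("{:?}", lookup(&customers, bob));

    // Out of range: None here; `customers[5]` would panic instead.
    println!("{:?}", lookup(&customers, 5));
}
```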

> You can catch errors in testing in C as well. Yeah, in C you get "different" data on a different version of the compiler, but you get garbage data in all versions of the compiler and Valgrind flags that in testing.

Not always. It's very common for code to be broken according to the standard but do the right thing in some compilers, and then in a different compiler it does something completely bizarre. The code might not do the same thing in testing as it does in release. And while Valgrind does a lot of good for the minority of C programmers who use it, it's far from 100% reliable.

> you can get arbitrary code execution if your program is operating on data coming from multiple users of different privilege levels in Rust if you use vectors like that.

You might in some cases, but it's a lot trickier. You can get the program to operate on different parts of its data than it was intended to, but to go from there to running arbitrary code will still be a significant leap and require an exploitation technique specific to that particular program. Whereas the techniques for going from common classes of C vulnerabilities (e.g. buffer overflow or use after free) to arbitrary code execution are practically textbook at this point.
