It seems like the only reliable way to fix this is to change these functions so that they exclusively acquire a mutex.
And remember that the exec* family of calls has a version with an envp argument, which is what should be used if a child process is to be started with a different environment — build a completely new structure, don't touch the existing one. Same for posix_spawn.
And, lastly, compatibility with ancient systems strikes again: the environment is also accessible through this:
extern char **environ;
Which is, of course, best described as bullshit.So setenv's existence makes getenv inherently unsafe unless you can ensure the entire application is at a safe point to use them.
stdenvlock(); // imaginary function added to ISO C or POSIX
char *home = getenv("HOME");
char *home_copy = strdup(home);
stdenvunlock(); // only here can we unlock
// home pointer is now indeterminate
Other solutions:1. Put the above sequence into a function, and don't expose the mutex. Thread-safe code must use:
char *home = dupenv("HOME"); // imaginary function; caller responsible for freeing.
2. Provide environment lookup into a buffer: getenvbuf("HOME", mybuf, sizeof mybuf); // returns some value that helps to resize the buffer
All functions that retain pointers out of the classic getenv remain unsafe.A mutex can be provided to those applications that want to manipulate the environ array directly, or use getenv and setenv, or any combinations of these.
The main problem is all the code out there using getenv.
If your program wants to use the environment as an out-of-band global var for cross thread communication, you can make your own mutex.
A mutex can ensure thread safety but risks deadlocks if not used carefully and will hurt performance...
Nice to see that the author of the library has a sensible take. Unfortunately the ecosystem does not: https://github.com/seanmonstar/reqwest/blob/master/Cargo.tom...
This also shows up in web frameworks where Vue has the v-html directive and react has dangerouslySetInnerHTML. Vue definitely has it better.
https://doc.rust-lang.org/stable/std/env/fn.set_var.html
There is a patch for glibc which makes `getenv` safe in more cases where the environment is modified but C still allows direct access to the environ so it can't be completely safe in the face of modification https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...
keep[s] older versions around and adopt[s] an exponential resizing policy. This results in an amortized constant space leak per active environment variable, but there already is such a leak for the variable itself (and that is even length-dependent, and includes no-longer used values).
There have got to be pathalogical uses out there where this will cause unbounded memory growth in well-formed (according to the API) programs, no?
Interesting to see this _introduce_ a ‘bug’ (unbounded memory growth) for these programs that follow the API in order to ‘fix’ programs that don’t (by using the API in multiple threads). Pragmatism over dogma I guess. Leaves me feeling a bit sketched out though.
The only way to coordinate locking would be to do so in libc itself.
synchronization methods impose various complexity and performance penalties, and single threaded applications which don't need that would pay those penalties and get no benefit.
Unix was designed around a lightweight ethos that allowed simple combining of functions by the user on the command line. See "worse is better", but tl;dr that way of doing things proved better, and that's why you find yourself confronting what it doesn't do.
I'm also not convinced by Musl's maintainer that it can't be fixed within Musl considering glibc is making changes to make this a non-issue.
extern char **environ;
As long as environ is publicly accessible, there's no guarantee that setenv and getenv will be used at all, since they're not necessary.If you're willing to get rid of environ, it's pretty trivial to make setenv and getenv thread safe. If not, then it's impossible, although one could still argue that making setenv and getenv thread safe is at least an improvement, even if it's not a complete solution (aka don't let the perfect be the enemy of the good).
Exactly my point. Over time *environ would disappear, at least from the major software projects that everyone uses (assuming it's even in use in them in the first place).
[1] https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...
How do you figure? The problem isn't the implementation, it's the API. setenv(), unsetenv(), putenv(), and especially environ, are inherently unsafe in a multithreaded program. Even getenv_r() can't really save you, since another thread may be calling setenv() while the (old) value of an env var is being copied into the provided buffer. Sure, a getenv_r() fixes the case where you get something back from getenv(), and then another thread calls setenv() and makes that memory invalid, but there's no way to protect the other calls breaking the API.
There are ways to mitigate some of the issues, like having libc hold a mutex when inside getenv()/setenv()/putenv()/unsetenv(), but there's still no way for libc to guarantee that something returned by getenv() remains valid long enough for the calling code to use it (which, right, can be fixed by getenv_r(), which could also be protected by that mutex). But there's no good way to make direct access to environ safe. I suppose you could make environ a thread-local, but then different threads' views of the environment could become out of sync, permanently (and you could get different results between calling getenv_r() and examining environ directly).
Back-compat here is just really hard to do. Even adding a mutex to protect those functions could change the semantics enough to break existing programs. (Arguably they're already broken in that case, but still...)
> How do you figure?
From https://illumos.org/man/3C/putenv:
> The putenv() function can be safely called from multithreaded programs
Won't that depends on the libc implementation. For example, maybe setenv writes to another buffer, then swaps pointers atomically; wouldn't that work?
Most of the rest of the problem here seems to be the development environment. They're testing on a remote machine in an Amazon data center and using Docker. This rig fails to report that a process has crashed. Then they don't have enough debug symbol info inside their container to get a backtrace. If they'd gotten a clean backtrace reported on the first failure, this would have been obvious.
Why is anyone using "setenv" anyway?
There's no reason setenv should have been called here. The `openssl-probe` library could simply return the paths to the system cert files and callers could plug those directly into the OpenSSL config.
Oversights all around and hopefully this continues to improve.
Because it’s there and it looks like a good idea until it takes one of your fingers.
I always thought this was kinda foolish: your configuration method is a flat-namespace basked of stringly-typed values. The perils of getenv()/setenv()/environ are also, I think, a great argument against using env vars for configuration.
Sure, there aren't always great, well-supported options out there. I prefer using a configuration file (you can have templated config and a system that fills in different values for e.g. dev/stage/prod), and I'll usually use YAML, despite its faults and gotchas. There are probably better configuration file formats, but IMO YAML is still significantly better than using env vars.
If there were a language feature that let me mark apps such that during any process env vars are not writable and are readable only once (together, in a batch, not once per var), I'd use it everywhere.
But yes, a flat namespace, with string values, shared as a free-for-all with who knows what libraries and modules you're loading… that's not a good idea even if it didn't have safety issues in setenv().
Being on the JVM which actually treats the environment as immutable that and which probably inspired a lot of the 12 factor app movement (with companies like Soundcloud being big Scala and Java users and pushing this), I've never experienced any issues with the environment changing on me or causing any threading issues. The environment is effectively immutable and there's nothing in my processes that sneakily circumvents that (via some native calls into libc). So, complete non issue on the JVM.
Even if somebody manages to modify the environment, the immutable copy stays the same. That copy gets created on JVM startup and is immutable. Anything using normal Java apis to interact with the environment will never see the modification. I'm sure people might have tried to work around that but it's not a wide spread practice. Because, again, why would you even want to do that?
The problem with configuration files is that their parsing is process specific. That's why Linux/Unix is such a mess. Every single tool seems to have its own conventions and mechanisms for configuration. There are no standards for this.
Other of course than the Docker ecosystem. You can do whatever you want inside the container but effectively your only interface to the outside world is either messily mounting some volume and doing whatever convoluted way of configuration your app requires; or just using environment variables. Most modern software is docker ready/friendly in the sense that you can fully control their behavior via the environment. It's perfectly adequate for most things that people run via docker these days. Which of course is pretty much anything.
And of course with Docker compose or kubernetes (which I'm not necessarily a fan of) you get yaml files defining lists of environment variables that define how your process starts. So you more or less get what you are asking for. I'm not a big YAML fan but it works well enough. Too much potential for syntax issues really ruining your day IMHO. But it's not like the alternatives are free of issues.
I personally use 12 factor app style, but once it's entered the app I validate the env variables and data and then store them. It's totally fine after that.
These kinds of detailed troubleshooting reports are the closest thing you can get to having to do it yourself. Thanks to the authors. It's easy to say "don't use X duh" until a dependency relies on it, and how were you supposed to know?
> We don’t have the necessary files outside of the container, and our containers are quite minimal and don’t allow us to easily install gdb.
Have people lost the ability to build and debug their code locally, without clouds and containers?
The reason is that cloud is where all the money is because cloud is DRM. Put software there and you can charge a subscription and nobody can evade it and you have perfect lock in forever. People usually can’t even get their data out. You can also do all kinds of realtime analytics conveniently to optimize your product.
Computing architecture is downstream of the business model. Mainframe died originally because there was no Internet and PCs were cheaper, but vendors also lost a lot of their lock in power. Now they have a way to bring a model that is much more profitable back. No more pesky freedom for users, who to be fair if given such freedom will often just refuse to pay, making quality software a non-viable business.
Tangent I know.
there are faults to the cloud but it solves real problems users have.
they should have handled crashes better - a problem they seem to recognize but not the issue here so not covered.
No, of course not, but it didn't crash on our machines!
I hate envvars. It’s “the Linux way”. I avoid them like the plague. A++ strong recommend.
libc is terrible. The world needs to move on.
Throw away your CPU and RAM then.
And even if those abstractions can't be 100% effective, we'd go a long way to achieving the desirable results of getting rid of it, if we just develop the mindset of avoiding it if at all possible, excepting for very rare instances where it's needed as a last resort.
Go ahead and write lots of mutable global statics. But when your program crashes randomly and you need my help to debug and it is, once again, a global mutable then you have to perform a walk of shame.
the problem is not linux, not mutable global state or resources and not libc.
the problem is not getting time at work to do things properly. like spotting this in GDB before the issue hit, because your boss gave you time to tirelessly debug and reverse your code and anything it touches....
there is too much money in halfbaked code. sad but true.
If both Rust and C have independent standard libraries loaded into the same process, each would have an independent set of environment variables. So setting a variable from Rust wouldn't make it visible to the C code, which would break the article's usecase of configuring OpenSSL.
The only real solution is to have the operating system provide a thread-safe way of managing environment variables. Windows does so; but in Linux that's the job of libc, which refuses to provide thread-safety.
*BSD allows it too, or used as of 2022.
What is unusual about Linux is that it guarantees a syscall ABI, meaning that if you follow it, you can make a system call "portably" across "any" version of Linux.
Is that true? It's just a process global string -> string map, that can be pre-loaded with values before the process starts, with a copy of the current state being passed to any sub-process. This could be trivially implemented with batch processing/supervisory programs.
Yes to read them, if Rust wish to modify - modify your own, already copied structure. I'd do that in pretty much any language.
https://github.com/sunfishcode/eyra
Oh look:
> Why use Eyra? It fixes Rust's set_var unsoundness issue. The environment-variable implementation leaks memory internally (it is optional, but enabled by default), so setenv etc. are thread-safe.
if (__environ == NULL || name[0] == '\0')
return NULL; import numpy
setproctitle() worked before numpy import but not after because it couldn't find the memory address of **environ.I'm hazy on the details but it led me to a somethingenv call (possibly getenv or setenv) in numpy initialization and it turned out that function changed the address of **environ and that was the reason for why setproctitle couldn't find it.
And:
> This function is safe to call in a single-threaded program.
> This function is also always safe to call on Windows, in single-threaded and multi-threaded programs.
> In multi-threaded programs on other operating systems, the only safe option is to not use set_var or remove_var at all.
It doesn't seem it would take much to do it efficiently, even retaining the poor getenv() pointer-returning API (which could point to a thread local buffer). The coordination between getenv and setenv could be very lightweight - spinlock vs mutex.
There's also no real backwards compatible way of fixing setenv(). getenv() returns a pointer that can be read at any time, and then there's the *environment parameter that can also be used to read env variables.
IMO the entire API should be deprecated for a thread safe one, but until someone comes with a standard setenv() alternative that's implemented by the libc runtimes, we'll be stuck with the shitty POSIX API, and every year we will read blog posts about get/setenv() crashing processes.
The setenv( ) function need not be thread-safe. A function that is not required to be thread-safe is not required to be reentrant.
https://www.open-std.org/jtc1/sc22/open/n4217.pdf.Page.. 1860 :')
Is "the standard says it doesn't NEED to be thread safe" the argument that the Linux libc maintainers are using for not enhancing it to be thread safe, or is it based on some technical or backwards compatibility issues in doing so ?
A similar bug related to setlocale was found in 2007 and fixed in 2014. That bug did not take getenv/setenv into account. https://sourceware.org/bugzilla/show_bug.cgi?id=5443
> POSIX.1 does not require setenv() or unsetenv() to be reentrant.
A non-reentrant function cannot be thread safe.
In general (for POSIX, libc and many other libraries: if the docs do not explicitly say "this function is thread safe" they are not).
"It is ridiculous that this has been a known problem for so long. It has wasted thousands of hours of people's time, either debugging the problems, or debating what to do about it. We know how to fix the problem." https://www.evanjones.ca/setenv-is-not-thread-safe.html
Actually, a non-reentrant function can be thread-safe. A common example of such a function in libc being malloc().
You run into a problem if you keep using a string returned by getenv after calling another environment function: including possibly getenv itself!
However, it's easy to just strdup the result of getenv; that defends against the issue in a single-threaded program.
1. If a process crashes and dumps, be sure to look at the system log of the cause (e.g. SIGSEGV, OOM, invalid instruction, etc.)
2. Be certain you’re looking at the right core dumps — I believe UID 1000 just means posix UserID (which is unrelated to a PID), though I don’t use containers.
3. Stay focused on the right level of abstraction — memory model details are great to know, but irrelevant here.
4. Variables do not correlate 1:1 with registers, except in C calling conventions. The assumption about x20 and a local variable is incorrect, unfortunately.
5. getenv() and setenv() do not work as implied in the post. When a process starts via execve(), the OS/libc constructs a new snapshot of the environment, and cannot be modified by an ancestral process. It’s a snapshot in time, unless updated by the process itself. When a process fork()s, the child gets a new copy of the parent’s environment — updates do not propagate.
getenv() is thread safe and reentrant. You don’t use an environment to pass shared data — setenv() is generally used when constructing the environment for a child process before a fork(). See man environment.
6. FWIW, ‘char** env’ is a null-terminated array of pointers, so dumping memory from *env (or env[0]) is only valid until you hit the first NULL. The size of the array is not stored in the array.
I hope this helps! And apologies if this is redundant — I read so many comments; mostly variations of “the problem with getenv is x”, but gave up before reading all of the (currently) 168 comments.
Re: fork(), I just meant to be thorough in explaining the environment is copied, not shared by processes. Setenv() only affects the process from which it’s called.
The array size bit in the article: The value 0x220 looks suspiciously close to the size of the old environment in 64-bit words (0x220 / 8 = 68), and this value was written over the terminating NULL of the environment block…
HTH!
It does not help, because you do not appear to have understood the article (or even read it all that closely).
Some of these bullet points feel a lot like the kind of junk output one sees from the various (popular, but flawed) AI summary tools...
I could be wrong though —- could you be specific? I don’t want to misinform anyone…
I should see if the env_logger crate has a better solution.
But really, I don't understand why a sensitive security-related library would implicitly use an unsafe function like setenv().
This is a oversimplification. Windows has essentially the exact same API and it works just fine in multithreaded contexts.
The issue here is unix allows the underlying pointer to be accessed, bypassing any possible thread-safe APIs.
Ref: https://man7.org/linux/man-pages/man3/setenv.3.html
... it clearly says: "MT-Unsafe"
Also, there is a whole section about get/set env thread safety here (under "Other safety remarks -> env"):
Because I use structured concurrency, I can make it so every thread has its own environment stack. To add to a new environment, I duplicate it, add the new variable, and push the new enviroment on the stack.
Then I can use code blocks to delimit where that stack should be popped. [1]
This is all perfectly safe, no `unsafe` required, and can even extend to other things like the current working directory. [2]
IMO, Rust got this wrong 10 years ago when Leakpocalypse broke. [3]
[1]: https://git.yzena.com/Yzena/Yc/src/branch/master/tests/yao/e...
[2]: https://gavinhoward.com/2024/09/rewriting-rust-a-response/#g...
[3]: https://gavinhoward.com/2024/05/what-rust-got-wrong-on-forma...
If you have 1) C FFI interop in Yao, there's still a chance you might have two C libraries cause a crash without your code even being involved.