It might be nice if future surveys explicitly asked a followup question "Regardless of the standard or behavior of existing compilers, is there one of these answers that is the 'obviously correct' manner in which compilers should behave? Which one?"
If practically all users believe that the same answer is 'obviously correct', compiler writers might want to take this into account when deciding which behavior to implement.
For MSVC, one respondent said:
"I am aware of a significant divergence between the LLVM
community and MSVC here; in general LLVM uses "undefined
behaviour" to mean "we can miscompile the program and get
better benchmarks", whereas MSVC regards "undefined
behaviour" as "we might have a security vulnerability so
this is a compile error / build break". First, there is
reading an uninitialized variable (i.e. something which
does not necessarily have a memory location); that should
always be a compile error. Period. Second, there is reading
a partially initialised struct (i.e. reading some memory
whose contents are only partly defined). That should give a
compile error/warning or static analysis warning if
detectable. If not detectable it should give the actual
contents of the memory (be stable). I am strongly with the
MSVC folks on this one - if the compiler can tell at
compile time that anything is undefined then it should
error out. Security problems are a real problem for the
whole industry and should not be included deliberately by
compilers."
I'm much less familiar with MSVC than the alternatives, but this is a refreshing approach. Yes, give me a mode that refuses to silently rewrite undefined behavior. Is MSVC possibly able to take this approach because it isn't trying to be compliant with modern C standards? Does it actually reduce the ability to apply useful optimizations? Or is it just a difference in philosophy?
Initialization of a variable has no relation to whether it has a memory location. You can legitimately take the address of an uninitialized variable (that is often one step in initializing it, such as when you pass the address to memset), and even an initialized variable may not have a memory location (if it lives only in a register, or is compiled away entirely).
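To make the memset point concrete, here is a minimal sketch. The struct and the function name are made up for illustration; nothing here comes from the survey itself.

```c
#include <string.h>

/* Illustrative type, not taken from the thread. */
struct point { int x; int y; };

/* Hypothetical helper: initializes a struct through its address, then reads it. */
int sum_after_memset(void)
{
    struct point p;           /* uninitialized; no member has been read yet */
    memset(&p, 0, sizeof p);  /* taking &p is fine; this call is what initializes p */
    return p.x + p.y;         /* every byte has been written, so this read is defined */
}
```

Taking the address happens before any value exists, so a rule like "an uninitialized variable has no memory location" would reject this perfectly ordinary pattern.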
In those cases some compiler override could provide the solution, while still allowing the compiler to flag all the other cases as errors.
The problem with this democratic approach is that most users are not qualified enough for their opinion to be particularly valuable.
The small minority who doesn't find it "obviously correct" may actually in some cases be the minority with a clue.
It could work if only those users are given a vote who pass a language lawyer exam.
They aren't advocating a democratic approach; they are saying that the general expectations of users are valuable information.
You don't rely exclusively on what the users want, but users' expectations of how compilers behave are very valuable information, whether that means finding out how to inform users of the real behavior or changing the behavior.
You cannot implement that efficiently without rejecting valid programs. Consider code like this:
int x;
if( f()) x = 1;
if( g()) h(x);
For sufficiently complex functions f and g, there's no reasonable way to decide at compile time whether x will be set whenever g() returns true. For example, f() might always return false because Fermat's last theorem is true.
And that 'reasonable' likely isn't a necessary part of that statement.
For example, your example could legally be rewritten to:
int x;
f();
x = 1;
if (g()) { h(x); }
The only difference is if f() were false, and someone would then access x and see 1; but that's undefined behavior, so you can ignore it. In fact, assuming x is not accessible outside this block:
f();
if (g()) { h(1); }
These optimizations happen all over the place. The compiler is not invoking or causing undefined behavior; it is assuming that undefined behavior won't ever happen.
EDIT: Note the further optimization that looms: if f() can be proven pure (no side effects), then it can be removed. This makes little sense for a function with no arguments (in which case it would just be a constant). If, however, f(y, ...) is some expensive but pure function, it can just be removed completely.
int x;
h(x);
The point is not to solve the Halting Problem, but to catch code that is so obviously wrong that no programmer would deliberately write it (unless they were testing the compiler's reaction.)
This is begging for one question in my opinion. Should we keep using old libraries that nobody is maintaining anymore? Isn't that a big security issue?
int foo() {
    int bar;
    return bar + 5; /* C4700: local variable 'bar' used without having been initialized */
}
https://msdn.microsoft.com/en-us/library/axhfhh6x.aspx
In the same situation, GCC says "may be used uninitialized in this function" if you enable the warning (-Wmaybe-uninitialized), despite this being a trivial and certain case.
Maybe you are using an old version.
If you enable -Wall, which you always should, then this flag is automatically enabled.
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#inde...
You're out of luck. You have to solve the halting problem to statically analyze whether or not a variable will be used when it's undefined. The reason this is not solved well is that it's impossible to solve perfectly! Java took another approach that I hate. For example, if you have
Object a;
for(int i = 0; i < 1;++i) a = new Object();
a.toString();
then Java will give you "error: variable a might not have been initialized"
even though it's plain and obvious that a is initialized. Things like that make me mad, it's a non-solution.
The parallel to your example would be if the for loop was "for(int i = 0; i < j;++i)". If the compiler was able to determine that there is a code path whereby j might be undefined, should it be allowed to remove the body of the loop, even in those cases where the programmer knows by other means that "j >= 1"?
My request is that it either keep the loop body, or complain about the undefined behavior, but not silently make 'optimizations' based on the fact that it has identified the potential for undefined behavior to occur.
Note that I'm just using an 'uninitialized variable' as a hypothetical example. Given a chance, I always compile with -Wall -Wextra, and in practice, GCC, Clang, and ICC (the compilers I use) do a good job of issuing warnings for the use of uninitialized variables. I like this current behavior, but would prefer a philosophical approach that makes warnings like this more rather than less common.
let x;
println!("x:{}", x);
Object a = null;
and go on your way. You do run the risk of a null pointer exception if you don't assign it a valid object reference, though.
Most programmers today will never encounter any of those.
Yes, if your system is essentially a PDP-11.
I don't mean that sarcastically; pcwalton's sibling message only begins to mention the ways in which C is not a match to modern systems. Vector processing, NUMA, umpteen caching layers, CPU features galore... it's not really a match to the "architecture" any more.
(To the extent that you may think C supports those things, I don't really think "lets you drop arbitrary assembler in the middle of a function" constitutes "support". YMMV. To be fair to C there's a lot of features that seem to be unsupported by any high-level language today. Hardware moves way faster than programming languages. If you want to figure out what's coming after the current generation of languages, "a language that actually lets you use all the capabilities of modern hardware without dropping to assembler and giving up all the safety of the higher-level language" is at least one possible Next Big Thing.)
The standard doesn't say anything about SIMD, GPU, instruction reordering, IO registers, interrupts...
All of those are language extensions or library functions written in Assembly. Any programming language can offer similar extensions.
Is reading an uninitialised variable or struct member (with a current mainstream compiler):
(This might either be due to a bug or be intentional, e.g. when copying a partially initialised struct, or to output, hash, or set some bits of a value that may have been partially initialised.)
a) undefined behaviour (meaning that the compiler is free to arbitrarily miscompile the program, with or without a warning) : 128 (43%)
b) ( * ) going to make the result of any expression involving that value unpredictable : 41 (13%)
c) ( * ) going to give an arbitrary and unstable value (maybe with a different value if you read again) : 20 ( 6%)
d) ( * ) going to give an arbitrary but stable value (with the same value if you read again) : 102 (34%)
e) don't know : 3 ( 1%)
f) I don't know what the question is asking : 2 ( 0%)
--------------------
I know of one data structure (a sparse set of integers from 1-n) which relies on this behavior: http://research.swtch.com/sparse . I always thought it was a neat trick. However, from the article it seems that compilers may NOT give stable values to uninitialized members, which may make that data structure behave strangely or cause the program to be miscompiled.
For the sparse set of integers, generally speaking the memory is going to be allocated somewhere else all at once, so there is not much the compiler can do to 'realize' it's uninitialized and decide to just ignore the read from memory completely. A fancy compiler could presumably flag every uninitialized location, then do checks and use some random value every time you attempt to use one, but practically speaking no compiler is going to do that. So this data structure isn't technically standards compliant, but it should probably still work anyway.
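For anyone who hasn't read the linked article, here is a rough sketch of the sparse-set layout being discussed. All names are my own, values are assumed to lie in 0..capacity-1, and I deliberately use calloc rather than malloc so that every read is of initialized memory. The trick in the article skips that initialization, which is exactly the part that depends on uninitialized reads being stable.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative sparse set over the values 0..capacity-1. */
struct sparse_set {
    unsigned *dense;    /* dense[0..n-1] lists the members in insertion order */
    unsigned *sparse;   /* sparse[v] is the claimed position of v in dense */
    unsigned n;         /* current number of members */
    unsigned capacity;  /* values must be < capacity */
};

/* Returns 0 on success, -1 if allocation fails. */
int sparse_set_init(struct sparse_set *s, unsigned capacity)
{
    /* calloc, not malloc: this sketch never reads indeterminate memory. */
    s->dense = calloc(capacity, sizeof *s->dense);
    s->sparse = calloc(capacity, sizeof *s->sparse);
    s->n = 0;
    s->capacity = capacity;
    if (!s->dense || !s->sparse) {
        free(s->dense);
        free(s->sparse);
        return -1;
    }
    return 0;
}

int sparse_set_contains(const struct sparse_set *s, unsigned v)
{
    /* A stale sparse[v] is rejected by the cross-check against dense. */
    return v < s->capacity
        && s->sparse[v] < s->n
        && s->dense[s->sparse[v]] == v;
}

void sparse_set_add(struct sparse_set *s, unsigned v)
{
    assert(v < s->capacity);
    if (sparse_set_contains(s, v))
        return;
    s->dense[s->n] = v;
    s->sparse[v] = s->n;
    s->n++;
}

/* O(1) clear: old entries stay in memory but no longer pass the cross-check. */
void sparse_set_clear(struct sparse_set *s)
{
    s->n = 0;
}
```

The membership test is the interesting part: it stays correct whatever value sparse[v] holds, provided reading that value twice gives the same answer. With malloc'd, never-written memory the standard does not promise that, which is the concern raised above.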
int b; int c = b * 0;
Also, there seems to be some confusion about storing and loading pointers, when the standard speaks to this as well; roughly: a pointer which is converted to a "large enough" integer type will point to the same object when converted back. It is permissible for an implementation to not provide a large enough integer type, but excepting that, the behavior is well defined.
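As a small sketch of that round trip, assuming the implementation provides uintptr_t from <stdint.h> (the type is optional in the standard), with a helper name that is purely illustrative:

```c
#include <stdint.h>

/* Hypothetical helper: converts a pointer to an integer and back again. */
int *round_trip(int *p)
{
    /* Go through void *, since that is the conversion uintptr_t is specified for. */
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}
```

The returned pointer compares equal to the original and can be used to access the same object. What the integer value itself looks like is implementation-defined.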
(Just to be clear, I am referring to the people that downvoted you.)
Is it possible to build a language, which would reduce the number of false assumptions?