Don't feel too silly. Russ Cox, one of the technical leads on the Go language, made the same mistake in the regexp package of the standard library.
After profiling for a bit, I discovered that suddenly a lot of time was spent in isinf on Clang and no time in GCC… Clang was emitting a function call where GCC wasn’t. I happened to randomly change isinf to std::isinf (it’s a random habit of mine to put std:: in front of these C functions). Suddenly the regression disappeared! I guess on Clang only std::isinf was a compiler intrinsic while GCC recognized both? Anyway, that’s my small-change optimization story.
And did you learn your lesson about making random changes that "shouldn't matter" without proving they don't matter? :)
I find that once I account for the time it takes to make these changes correctly, they are rarely worth making.
- The Go programming language spec https://go.dev/ref/spec
- Effective Go https://go.dev/doc/effective_go
- Advanced Go concurrency patterns https://go.dev/talks/2013/advconc.slide#1
- Plus many more talks/slides https://go.dev/talks/
Changing Ruleset from being []Rule to []*Rule, which would mean we no longer need to explicitly take a reference to the rule.
Returning a Rule rather than a *Rule. This would still copy the Rule, but it should stay on the stack instead of moving to the heap.
> However, both of these would have resulted in a breaking change as this method is part of the public API.

The problem with heap-allocated objects may stem from the public API being designed incorrectly in the first place.
The change that improves performance also hands out pointers to the actual elements of Ruleset itself, permitting the caller to change the contents of Ruleset, which wasn't possible before the speed-up. Perhaps you're already aware of this, since a change to []*Rule was being considered.
It's debatable what API guarantees existed around this though; most of the time this would be unspecified.
As long as the global slice is never mutated, the current approach is probably fine, but it is definitely a semantic change to the code.
I’m also fairly sure Go uses RVO here too, which cuts down on the number of times the object is copied around, but again, it’s irrelevant to the discussion of heap allocations. Copying the object isn’t the performance problem here, needlessly allocating a very short-lived object on the heap over and over is.
I agree with you about the garbage collector. By design, a GC lets you copy willy-nilly without thinking about the consequences.
It's best to treat each struct as either a "value" type or a "pointer" type, and use one or the other consistently for each type. This mostly avoids the need to use & in the first place.
It's not that value semantics can't be better (they most assuredly can be), or that reference semantics don't cause their own complexity problems, but rather that so often we thoughtlessly imply/impose value semantics through interfaces in ways that negatively impact performance; getting interfaces wrong is a much tougher bell to unring.
The vast majority of my mental energy when I define an interface in C++ is carefully thinking through a combination of ownership contracts and value vs. reference semantics that I can mostly ignore in languages with implicit reference semantics. While occasionally ignoring those contracts while developing in Java/Python/whatever comes back to bite me, the problem isn't nearly as common or problematic as when I unintentionally impose value semantics in a language that allows me to.
I spend most of my time in a JVM language of one flavor or another, and when I was learning Go, the first thing that stuck out at me was, "why would I ever want the compiler to invisibly copy a data structure for me?"
I suppose the primary reason is to prevent the callee from modifying the caller's data out from under them; unless you pass a reference value, you know the callee cannot modify your data.
But, as someone who leans heavily into "everything should be as immutable as possible," the second thing that stuck out at me was "wait, a struct can't have const fields?"
When I write code, it's common to have references to immutable classes thrown around with wild abandon, heedless of ownership, threads, or good taste, because the data just can't change. But that's a paradigm that Go simply doesn't support.
If there's anything I wish languages with implicit reference semantics would adopt, it's implicit immutability. Java would be so much nicer with a keyword halfway between "final" and "volatile" that means "yes, you can actually mutate this", with final semantics as the default for fields and variables.
You might get a kick out of Virgil. It's easy (and terse!) to define immutable classes and you can have immutable ADTs too. (plus tuples, generics with separate typechecking, etc).
I think that’s true. Expensive copies should never have been implicit. There was a story some time ago about a single keypress in the address bar of Chrome causing thousands of memory allocations. The culprit: lots of std::string arguments up and down the call stack.
Rust gets this right, with the hindsight of C++’s example: “a = b” is a move operation by default and clone() is always explicit, except for plain data types where copying is literally memcpy — and those are clearly marked as such by the type system.
Oh, I miss it every time. ;-)
I will say though that some newer languages seem to have a confused idea about how to offer mixed semantics. A bunch of them tie semantics to types. The ideal interface can vary by usage context. It's hard enough getting the semantics right as the callee (as opposed to caller), let alone when you're defining a type that will be used by who knows how many interfaces.
> I guess I always figured the solution was "value semantics with better education / tooling".
I've always thought much the same, but I have slowly come to appreciate that it's more than just education & tooling. Even with good education & tooling, there's a cognitive load that comes with getting interfaces right that for the general case is just not worth it.
To me, it sounds like that's exactly it. "Explicit is better than implicit" is a very useful truism.
Rider does this already for C#.
That is to say, I think I mostly am agreeing with you. In Java, objects are always passed by reference, never by value, and never implicitly copied. But Java doesn't have any value types other than primitives. When I added ADTs to Virgil, I wanted to get the best of both worlds; objects are pass by reference, like Java, and ADTs are immutable, identity-less, so they are deep compared but never deep copied. (ADTs in Virgil can also sometimes be flattened, so they don't necessarily result in heap allocation).
I'd have to take a look at Virgil to appreciate your approach, but I'm always leery of implicit value vs. reference semantics tied to types (aside from the whole array fiasco, easily the ugliest part of Java's type system). So often the particular semantics you want are driven by the context of what you're doing, rather than the what you're doing it with.
There's no performance cost to value semantics, so of course not.
> The copy is cheap.

The lifetime tracking is not, because it forces a heap allocation and creates additional GC pressure. In fact, assuming Rule is small, if Match returned by value, the code would be similarly as fast.
I'm referring more to how this stuff seeps in without the programmer realizing it. It's the implicit nature of all this behaviour that is the problem.
Edit: Also, pointers into slices will probably leave you sad. You get a pointer into the slice's backing storage, not a pointer to the thing itself. And if the slice resizes, your pointer now references a dead backing array. Basically, pointers and slices are not friends. Unless you have a slice of pointers, which idiomatic Go avoids unless there is a decent reason for it :)
PHP, for example, has explicit references. If you have an `$arr1=array(1,2,3)` and an `$arr2 = $arr1`, that second array is a full copy of the first array, and updating $arr1 does nothing to $arr2. Similarly, `function update_array($arr) { $arr[0] = 'cake'; }` called with `$arr1` will create a copy of $arr1 for use inside that function. Any changes to `$arr` inside the function only apply to that copy, not to $arr1. Unless you explicitly tell PHP you want to work with references, by using `function update_array(&$arr) { ... }` instead. PHP uses value semantics.
JS, on the other hand, uses implicit reference semantics. If you have a `const arr1 = [1,2,3];` then `arr1` is an alias, simply pointing to a memory location, and declaring a `const arr2 = arr1`, makes arr2 also be an alias for that same memory location. They both reference the same thing, but neither "is" the thing. Similarly, if you have a `function updateArray(arr) { arr[0] = 'cake'; }` then that `arr` is also just an alias pointing to the memory location of whatever you passed in, and changes to `arr` change the underlying data, so whether you try to access that through `arr1`, `arr2` and `arr`, it's the same thing. JS uses implicit reference semantics.
(But note that many languages with implicit reference semantics make exceptions for immutable primitives like numbers or strings)
If your language has implicit reference semantics, “x = y” will cause x and y to refer to the same object. If it has value semantics, x will be a copy of y.
"Value semantics": you pass the value itself around, which means you're shallow-copying it all the time. That's what you get when you pass or return a bare non-interface type in Go.
PHP is a bit of a mess, objects are passed by reference (= implicit reference semantics) but arrays are passed by value, and you can opt them into pass-by-reference. You can also pass non-arrays by reference, though I don't think that's very common.
a = [1, 2]
b = a
b[0] = 3
print(a)
What does the above print? If the language implements reference semantics, it prints [3, 2]. If it implements value semantics, it prints [1, 2].
In languages like C, C++, Go, Rust… references are more explicit. If you want a pointer to an object, you have to &, or something similar.
It gets a bit fuzzy.
Yeesh
In a normal value-semantics system if you have a pointer you just copy the pointer. Obviously if the langage doesn’t have your back it also means you’ve now fucked up your ownership, but you’re in C so that was to be expected.
Had the author been looking at the type information within the syntax of the code, the profile output may not have been a surprise. Perhaps the problem would never have existed in the first place.
If you were forced to stop and think what type to declare, I bet you'd write "var rule *Rule". Even if you don't think deeply and just look at the return type.
And then if you assigned "r[i]" to "rule", you'd get a type error.
if match || err != nil {
return rule, err
}
Translating this code to actual logic takes too much thought and is too fragile. Is that an error path or a success path? It’s both! The logic is “if we found a rule or if there was an error then return a tuple that hopefully indicates the outcome”. If any further code were to be added in this block, it would have to be validated for both the success and the error case.

But this only makes any sense at all if one is okay with reading Go result returns in their full generality. A canonical Go function returns either Success(value) or Error(err not nil, meaningless auxiliary value). And this code has “meaningless auxiliary value” != nil! In fact, it’s a pointer that likely escapes further into unrelated error-handling code and thus complicates any kind of lifetime or escape analysis.
I don’t use Go, but if I did, I think this part of the language would be my biggest peeve. Go has very little explicit error handling; fine, that’s a reasonable design decision. But Go’s error handing is incorrectly typed, and that is IMO not a reasonable design.
Nevertheless, the convention is that if a function returns (value, err), and err != nil, the value is discarded (I think of it as "undefined"). So the code is conventional.
But Go is a garbage-collected language, and there is no such thing as “discarding” a pointer. Either it’s there or it isn’t, and this kind of leak has side effects. I find it baffling that the language designers and the community consider this acceptable.
(One thing I really like about Rust is that you can’t play fast and loose with lifetimes like this. If you have a function taking &'a Vec<T> and you return &'a T, you can’t arbitrarily “discard” and leak that reference up the call chain. You need to genuinely get rid of it by the time the vector is gone.)
if err != nil {
return nil, err
}
if match {
return rule, nil
}
Now of course it's important to read the documentation, but a language with sum types (or exceptions) would have used a separate type to indicate an error condition plus useful information on that error.
If something that previously took 4s now takes 2s, then it's 100% faster.
Think of driving 10 miles. If you drive at 20 mph, it takes 30 minutes. If you drive twice as fast, 40 mph, it takes 15 minutes.
40mph is 100% faster than 20mph.
Half the time is twice as fast!
I tried all of the obvious things, but the offender ended up being a call to allocate and fill a `struct tm` object from a string representation of a date. This doesn't have any obvious reasons (to me) that it would cause cache invalidation, etc, so I'm a little in the dark.
Still, replacing this four-line block improved single-threaded performance by 5x and fixed the threaded behavior, so on the whole it is now ~70x faster and parses about 400 MB of CSV per second.
The optimization would only work if you had a way to tell the compiler that some values are constant/immutable.
In Rust you can write a function that returns the pointer of one element of a slice. You can also write a function that returns the pointer to a heap-allocated copy of an element of the slice. The two functions would have different signatures.
The compiler would also prevent mutation of the slice as long as there are any references to individual elements of the slice being passed around.
When it's a pointer to a copy, no such implicit dependencies occur.
In this specific case, that technique would be whole-program dataflow analysis. Given a Golang function that passes out references-to-copies-of owned data, you could actually determine for certain — at least in the default static-binary linkage mode — whether these two properties hold universally within the resulting binary:
1. whether no caller of the function will ever try to do anything that would cause data within their copy of the struct to be modified;
2. whether the owner of the data will never modify the data of the original struct in such a way that, if the copy were elided, the changes would be "seen" by any reads done in any of the callers. (The owner could still modify internal metadata within the struct for its own use, as long as such internal metadata is 1. in private fields, 2. where all callers live outside the package defining the struct, making those fields inaccessible; and 3. the fields are never accessed by any struct methods called by borrowers of the struct — keeping in mind that such methods can be defined outside the package by caller code.)
If you could prove both of these properties (using dataflow analysis), then you could safely elide the copy within the function, turning the return of a reference-to-a-copy-of-X into a return of a reference-to-X.
(And, in fact, if you can only prove the second property universally, and the first property in specific instances, then you can still elide the copy from the function itself; but you'd also generate a wrapper function that calls said function [receiving a reference-to-X], copies, and so returns a reference-to-a-copy-of-X; and then, for any call-site where the first property doesn't hold — i.e. callers whose transitive call-graph will ever modify the data — you'd replace the call to the original function with a call to the wrapper. So "safe" connected caller sub-graphs would receive references, while "unsafe" connected caller sub-graphs would receive copies.)
Yep, with values that take a lot of memory, it's faster to pass pointers/references around than to pass the values themselves, because there are fewer bytes to copy.
Of course there is more to such a decision than just performance, because if the code makes changes to the value which are not meant to be persisted, then one wants to be working with a copy of the value, not a pointer to the value. So one should take care if simply switching some code from values to pointers-to-values.
All of these things are things that coders with more experience of languages that use such semantics kinda know already, almost as second nature, since the first day they got caught out by them. But everyone is learning, to various degrees, and we all have to start somewhere (i.e. knowing little to nothing).
That's a very nice feature! I wonder if compilers for other languages have something similar.
The frontend may or may not have its own optimizations and logs tho e.g. rustc now has MIR optimizations (https://rustc-dev-guide.rust-lang.org/mir/optimizations.html) but while you can dump MIR data (https://rustc-dev-guide.rust-lang.org/mir/debugging.html) I don't remember seeing an optimisation log.
At the end of the day, I think it's more likely that you take a look at the assembly and infer problems from there if the profiler doesn't tell you straight. Another difference is the kind of decisions the compiler makes e.g. while a compiler can optimize away allocations in "manual allocation" languages (https://godbolt.org/z/5nEo7xjEr) the allocations are plainly visible, so if they're trivially avoidable... you'll just avoid them.
Using Rust as an example, you'd have something like this:
pub fn match_(&self, path: &str) -> Result<&Rule, Error> {
for rule in self.0.iter() {
if rule.match_(path)? {
return Ok(rule);
}
}
Err(Error)
}
You couldn't miss an allocation, because the return type would have to change, and you'd need to perform the copy out:
pub fn match_(&self, path: &str) -> Result<Box<Rule>, Error> {
for rule in self.0.iter() {
if rule.match_(path)? {
return Ok(Box::new(rule.clone()));
}
}
Err(Error)
}

-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
and
-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation
The generated logs can be seen with JITWatch [1].

Of course sometimes pointers are faster, or much more convenient. But as a rule of thumb: don't use pointers unless you've got a specific reason for them. This applies even more so if you're creating a lot of pointers (like in a loop, or a function that gets called very frequently).
Notably, if you're defaulting to values, you may still have a bunch of allocations when you try to implement interfaces, which usually (always?) requires implicitly boxing values; however, if you pass a pointer into a function that takes an interface, I don't think it gets moved to the heap (but I'm not sure, which is why Go programmers need to be comfortable with profiling the escape analyzer and also why it would be great if Go actually had explicit semantics for allocations).
Isn't it still possible for the value to escape here? For example, the callee could stash it in a global data structure.
In fact it seems like a pointer passed to a function would need to be on the heap "by default" unless the compiler can prove that it doesn't escape.
Do you have evidence for this claim? AFAIK the Go compiler does escape analysis, and allocates pointers that don't escape on the stack.
Looks like that's been gone for awhile in favor of C++ 11 stuff, which I don't really like:
https://google.github.io/styleguide/cppguide.html#Copyable_M...
A lot of good software was written in that style, but the guide has grown bureaucratic over time and has drifted as the C++ language evolved.
May I ask, is that theme custom or available somewhere? I really enjoyed it
Moving a single character from one place to another. :-)
A good explanation of why "fire the developers with the lowest 50% of lines added" is an idiotic thing to do: this sort of deep analysis takes a lot of time and expertise, and frequently results in tiny changes.
Looks a bit like https://newcss.net/ or Water CSS
func (r Ruleset) Match(path string) (*Rule, error)
to:
func (r *Ruleset) Match(path string) (*Rule, error)

type Ruleset []Rule
The original code creates a local copy of a rule and explicitly returns a pointer to that copy. Taking the ruleset by address wouldn't change that issue.

There are valid use cases for wanting to take a copy and then pass along a pointer to the copy. Perhaps to go through a series of modification methods that don't touch the original. I'd sure hate it if the compiler tried to outsmart me on that and changed the behavior away from what I'd written.
The trap here is that everything is passed by reference (pointer), but the intermediate local value is, well, a value (a copy).
Rule is not a gigantic monster struct (it's 72 bytes), chances are returning it by value would not have been an issue.
Anyway I would say there is an issue with Go here: it's way too easy to copy out of a slice.
-1
It's IO-, CPU-, and memory-hungry, and it's distributed.
C is fast because it's close to how CPU and memory actually work. Go gives you 95+% of that plus easy to learn, easy to use language. A new person could start contributing useful features and bug fixes immediately. A senior person could get C-level performance.
More and more of our code is moved from C to Go, with very little performance penalty, but with a lot more safety and ease of use.
Our customers benefit, and our company makes more money.
In the end, that's what software is about.