Don't feel too silly. Russ Cox, one of the technical leads on the Go language, made the same mistake in the regexp package of the standard library.
After profiling for a bit, I discovered that suddenly a lot of time was spent in isinf on Clang and no time in GCC… Clang was emitting a function call where GCC wasn’t. I happened to randomly change isinf to std::isinf (it’s a random habit of mine to put std:: in front of these C functions). Suddenly the regression disappeared! I guess on Clang only std::isinf was a compiler intrinsic while GCC recognized both? Anyway, that’s my small-change optimization story.
And did you learn your lesson about making random changes that "shouldn't matter" without proving they don't matter? :)
I find that once I account for the time it takes to make these changes correctly, they are rarely worth making.
- The Go programming language spec https://go.dev/ref/spec
- Effective Go https://go.dev/doc/effective_go
- Advanced Go concurrency patterns https://go.dev/talks/2013/advconc.slide#1
- Plus many more talks/slides https://go.dev/talks/
Changing Ruleset from being []Rule to []*Rule, which would mean we no longer need to explicitly take a reference to the rule.
Returning a Rule rather than a *Rule. This would still copy the Rule, but it should stay on the stack instead of moving to the heap.
> However, both of these would have resulted in a breaking change as this method is part of the public API.

The problem with heap-allocated objects may stem from the public API being designed incorrectly in the first place.
The change that improves performance also hands out pointers to the actual elements of Ruleset itself, permitting the caller to change the contents of Ruleset, which wasn't possible before the speed-up. Perhaps you're already aware of this, since a change to []*Rule was being considered.
It's debatable what API guarantees existed around this though; most of the time this would be unspecified.
As long as the global slice is never mutated, the current approach is probably fine, but it is definitely a semantic change to the code.
I’m also fairly sure Go uses RVO here too, which cuts down on the number of times the object is copied around, but again, it’s irrelevant to the discussion of heap allocations. Copying the object isn’t the performance problem here, needlessly allocating a very short-lived object on the heap over and over is.
I agree with you about the garbage collector. By design, a GC lets you copy willy-nilly without thinking about the consequences.
It's best to treat each struct as either a "value" type or a "pointer" type, and use one or the other consistently for each type. This mostly avoids the need to use & in the first place.
It's not that value semantics can't be better (they most assuredly can be), or that reference semantics don't cause their own complexity problems, but rather that so often we thoughtlessly imply/impose value semantics through interfaces in ways that negatively impact performance; getting interfaces wrong is a much tougher bell to unring.
The vast majority of my mental energy when I define an interface in C++ is carefully thinking through a combination of ownership contracts and value vs. reference semantics that I can mostly ignore in languages with implicit reference semantics. While occasionally ignoring those contracts while developing in Java/Python/whatever comes back to bite me, the problem isn't nearly as common or problematic as when I unintentionally impose value semantics in a language that allows me to.
I spend most of my time in a JVM language of one flavor or another, and when I was learning Go, the first thing that stuck out at me was, "why would I ever want the compiler to invisibly copy a data structure for me?"
I suppose the primary reason is to prevent the callee from modifying the caller's data out from under them; unless you pass a reference value, you know the callee cannot modify your data.
But, as someone who leans heavily into "everything should be as immutable as possible," the second thing that stuck out at me was "wait, a struct can't have const fields?"
When I write code, it's common to have references to immutable classes thrown around with wild abandon, heedless of ownership, threads, or good taste, because the data just can't change. But that's a paradigm that Go simply doesn't support.
If there's anything I wish languages with implicit reference semantics would adopt, it's implicit immutability. Java would be so much nicer with a keyword halfway between "final" and "volatile" that means "yes, you can actually mutate this", with final semantics as the default for fields and variables.
You might get a kick out of Virgil. It's easy (and terse!) to define immutable classes and you can have immutable ADTs too. (plus tuples, generics with separate typechecking, etc).
I think that’s true. Expensive copies should never have been implicit. There was a story some time ago about a single keypress in the address bar of Chrome causing thousands of memory allocations. The culprit: lots of std::string arguments up and down the call stack.
Rust gets this right, with the hindsight of C++’s example: “a = b” is a move operation by default and clone() is always explicit, except for plain data types where copying is literally memcpy — and those are clearly marked as such by the type system.
Oh, I miss it every time. ;-)
I will say though that some newer languages seem to have a confused idea about how to offer mixed semantics. A bunch of them tie semantics to types. The ideal interface can vary by usage context. It's hard enough getting the semantics right as the callee (as opposed to caller), let alone when you're defining a type that will be used by who knows how many interfaces.
> I guess I always figured the solution was "value semantics with better education / tooling".
I've always thought much the same, but I have slowly come to appreciate that it's more than just education & tooling. Even with good education & tooling, there's a cognitive load that comes with getting interfaces right that for the general case is just not worth it.
To me, it sounds like that's exactly it. "Explicit is better than implicit" is a very useful truism.
Rider does this already for C#.
That is to say, I think I mostly am agreeing with you. In Java, objects are always passed by reference, never by value, and never implicitly copied. But Java doesn't have any value types other than primitives. When I added ADTs to Virgil, I wanted to get the best of both worlds; objects are pass by reference, like Java, and ADTs are immutable, identity-less, so they are deep compared but never deep copied. (ADTs in Virgil can also sometimes be flattened, so they don't necessarily result in heap allocation).
I'd have to take a look at Virgil to appreciate your approach, but I'm always leery of implicit value vs. reference semantics tied to types (aside from the whole array fiasco, easily the ugliest part of Java's type system). So often the particular semantics you want are driven by the context of what you're doing, rather than the what you're doing it with.
There's no performance cost to value semantics, so of course not.
> The copy is cheap.

The lifetime tracking is not, because it forces a heap allocation and creates additional GC pressure. In fact, assuming Rule is small, if Match returned by value, the code would be similarly as fast.
I'm referring more to how this stuff seeps in without the programmer realizing it. It's the implicit nature of all this behaviour that is the problem.
Edit: Also, pointers into slices will probably leave you sad. You get a pointer into the slice's backing storage, not a pointer to the thing itself. And if the slice resizes, your pointer now references a dead backing array. Basically, pointers and slices are not friends. Unless you have a slice of pointers, which idiomatic Go avoids unless there is a decent reason for it :)
PHP, for example, has explicit references. If you have an `$arr1=array(1,2,3)` and an `$arr2 = $arr1`, that second array is a full copy of the first array, and updating $arr1 does nothing to $arr2. Similarly, `function update_array($arr) { $arr[0] = 'cake'; }` called with `$arr1` will create a copy of $arr1 for use inside that function. Any changes to `$arr` inside the function only apply to that copy, not to $arr1. Unless you explicitly tell PHP you want to work with references, by using `function update_array(&$arr) { ... }` instead. PHP uses value semantics.
JS, on the other hand, uses implicit reference semantics. If you have a `const arr1 = [1,2,3];` then `arr1` is an alias, simply pointing to a memory location, and declaring a `const arr2 = arr1`, makes arr2 also be an alias for that same memory location. They both reference the same thing, but neither "is" the thing. Similarly, if you have a `function updateArray(arr) { arr[0] = 'cake'; }` then that `arr` is also just an alias pointing to the memory location of whatever you passed in, and changes to `arr` change the underlying data, so whether you try to access that through `arr1`, `arr2` and `arr`, it's the same thing. JS uses implicit reference semantics.
(But note that many languages with implicit reference semantics make exceptions for immutable primitives like numbers or strings)
If your language has implicit reference semantics, “x = y” will cause x and y to refer to the same object. If it has value semantics, x will be a copy of y.
"Value semantics": you pass the value itself around, which means you're shallow-copying it all the time. That's what you get when you pass or return a bare non-interface type in Go.
PHP is a bit of a mess, objects are passed by reference (= implicit reference semantics) but arrays are passed by value, and you can opt them into pass-by-reference. You can also pass non-arrays by reference, though I don't think that's very common.
a = [1, 2]
b = a
b[0] = 3
print(a)
What does the above print? If the language implements reference semantics, it prints [3, 2]. If it implements value semantics, it prints [1, 2].
In languages like C, C++, Go, Rust… references are more explicit. If you want a pointer to an object, you have to &, or something similar.
It gets a bit fuzzy.
Yeesh
In a normal value-semantics system if you have a pointer you just copy the pointer. Obviously if the langage doesn’t have your back it also means you’ve now fucked up your ownership, but you’re in C so that was to be expected.
Had the author been looking at the type information within the syntax of the code, the profile output may not have been a surprise. Perhaps the problem would never have existed in the first place.
If you were forced to stop and think what type to declare, I bet you'd write "var rule *Rule". Even if you don't think deeply and just look at the return type.
And then if you assigned "r[i]" to "rule", you'd get a type error.
if match || err != nil {
return rule, err
}
Translating this code to actual logic takes too much thought and is too fragile. Is that an error path or a success path? It’s both! The logic is “if we found a rule or if there was an error then return a tuple that hopefully indicates the outcome”. If any further code were to be added in this block, it would have to be validated for both the success and the error case.

But this only makes any sense at all if one is okay with reading Go result returns in their full generality. A canonical Go function returns either Success(value) or Error(err not nil, meaningless auxiliary value). And this code has “meaningless auxiliary value” != nil! In fact, it’s a pointer that likely escapes further into unrelated error-handling code and thus complicates any kind of lifetime or escape analysis.
I don’t use Go, but if I did, I think this part of the language would be my biggest peeve. Go has very little explicit error handling; fine, that’s a reasonable design decision. But Go’s error handing is incorrectly typed, and that is IMO not a reasonable design.
Nevertheless, the convention is that if a function returns (value, err), and err != nil, the value is discarded (I think of it as "undefined"). So the code is conventional.
But Go is a garbage-collected language, and there is no such thing as “discarding” a pointer. Either it’s there or it isn’t, and this kind of leak has side effects. I find it baffling that the language designers and the community consider this acceptable.
(One thing I really like about Rust is that you can’t play fast and loose with lifetimes like this. If you have a function taking &'a Vec<T> and you return &'a T, you can’t arbitrarily “discard” and leak that reference up the call chain. You need to genuinely get rid of it by the time the vector is gone.)
if err != nil {
return nil, err
}
if match {
return rule, nil
}
Now of course it's important to read the documentation, but a language with sum types (or exceptions) would have used a separate type to indicate an error condition plus useful information on that error.
If something that previously took 4s now takes 2s, then it's 100% faster.
Think of driving 10 miles. If you drive at 20 mph, it takes 30 minutes. If you drive twice as fast, 40 mph, it takes 15 minutes.
40mph is 100% faster than 20mph.
Half the time is twice as fast!
I tried all of the obvious things, but the offender ended up being a call to allocate and fill a `struct tm` object from a string representation of a date. This doesn't have any obvious reasons (to me) that it would cause cache invalidation, etc, so I'm a little in the dark.
Still, replacing this four-line block improved single-threaded performance by 5x and fixed the threaded behavior, so on the whole it is now ~70x faster and parses about 400 MB of CSV per second.
The optimization would only work if you had a way to tell the compiler that some values are constant/immutable.
In Rust you can write a function that returns the pointer of one element of a slice. You can also write a function that returns the pointer to a heap-allocated copy of an element of the slice. The two functions would have different signatures.
The compiler would also prevent mutation of the slice as long as there are any references to individual elements of the slice being passed around.
When it's a pointer to a copy, no such implicit dependencies occur.
In this specific case, that technique would be whole-program dataflow analysis. Given a Golang function that passes out references-to-copies-of owned data, you could actually determine for certain — at least in the default static-binary linkage mode — whether these two properties hold universally within the resulting binary:
1. whether no caller of the function will ever try to do anything that would cause data within their copy of the struct to be modified;
2. whether the owner of the data will never modify the data of the original struct in such a way that, if the copy were elided, the changes would be "seen" by any reads done in any of the callers. (The owner could still modify internal metadata within the struct for its own use, as long as such internal metadata is 1. in private fields, 2. where all callers live outside the package defining the struct, making those fields inaccessible; and 3. the fields are never accessed by any struct methods called by borrowers of the struct — keeping in mind that such methods can be defined outside the package by caller code.)
If you could prove both of these properties (using dataflow analysis), then you could safely elide the copy within the function, turning the return of a reference-to-a-copy-of-X into a return of a reference-to-X.
(And, in fact, if you can only prove the second property universally, and the first property in specific instances, then you can still elide the copy from the function itself; but you'd also generate a wrapper function that calls said function [receiving a reference-to-X], copies, and so returns a reference-to-a-copy-of-X; and then, for any call-site where the first property doesn't hold — i.e. callers whose transitive call-graph will ever modify the data — you'd replace the call to the original function with a call to the wrapper. So "safe" connected caller sub-graphs would receive references, while "unsafe" connected caller sub-graphs would receive copies.)
Yep, with values that take a lot of memory, it's faster to pass pointers/references around than to pass the values themselves, because there are fewer bytes to copy.
Of course there is more to such a decision than just performance, because if the code makes changes to the value which are not meant to be persisted, then one wants to be working with a copy of the value, not a pointer to the value. So one should take care if simply switching some code from values to pointers-to-values.
All of these things are things that coders with more experience of languages that use such semantics kinda know already, almost as second nature, since the first day they got caught out by them. But everyone is learning, to various degrees, and we all have to start somewhere (i.e. knowing little to nothing).
That's a very nice feature! I wonder if compilers for other languages have something similar.
The frontend may or may not have its own optimizations and logs tho e.g. rustc now has MIR optimizations (https://rustc-dev-guide.rust-lang.org/mir/optimizations.html) but while you can dump MIR data (https://rustc-dev-guide.rust-lang.org/mir/debugging.html) I don't remember seeing an optimisation log.
At the end of the day, I think it's more likely that you take a look at the assembly and infer problems from there if the profiler doesn't tell you straight. Another difference is the kind of decisions the compiler makes e.g. while a compiler can optimize away allocations in "manual allocation" languages (https://godbolt.org/z/5nEo7xjEr) the allocations are plainly visible, so if they're trivially avoidable... you'll just avoid them.
Using Rust as an example, you'd have something like this:
pub fn match_(&self, path: &str) -> Result<&Rule, Error> {
for rule in self.0.iter() {
if rule.match_(path)? {
return Ok(rule);
}
}
Err(Error)
}
You couldn't miss an allocation, because the return type would have to change, and you'd need to perform the copy out:
pub fn match_(&self, path: &str) -> Result<Box<Rule>, Error> {
for rule in self.0.iter() {
if rule.match_(path)? {
return Ok(Box::new(rule.clone()));
}
}
Err(Error)
}

-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
and
-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation
The generated logs can be seen with JITWatch [1].

Of course sometimes pointers are faster, or much more convenient. But as a rule of thumb: don't use pointers unless you've got a specific reason for them. This applies even more so if you're creating a lot of pointers (like in a loop, or a function that gets called very frequently).
Notably, if you're defaulting to values, you may still have a bunch of allocations when you try to implement interfaces, which usually (always?) requires implicitly boxing values; however, if you pass a pointer into a function that takes an interface, I don't think it gets moved to the heap (but I'm not sure, which is why Go programmers need to be comfortable with profiling the escape analyzer and also why it would be great if Go actually had explicit semantics for allocations).
Isn't it still possible for the value to escape here? For example, the callee could stash it in a global data structure.
In fact it seems like a pointer passed to a function would need to be on the heap "by default" unless the compiler can prove that it doesn't escape.
Do you have evidence for this claim? AFAIK the Go compiler does escape analysis, and allocates pointers that don't escape on the stack.
Looks like that's been gone for awhile in favor of C++ 11 stuff, which I don't really like:
https://google.github.io/styleguide/cppguide.html#Copyable_M...
A lot of good software was written in that style, but the guide has grown bureaucratic over time and has drifted as the C++ language evolved.
May I ask, is that theme custom or available somewhere? I really enjoyed it
Moving a single character from one place to another. :-)
A good explanation of why "fire the developers with the lowest 50% of lines added" is an idiotic thing to do: this sort of deep analysis takes a lot of time and expertise, and frequently results in tiny changes.
Looks a bit like https://newcss.net/ or Water CSS
func (r Ruleset) Match(path string) (*Rule, error)
to:
func (r *Ruleset) Match(path string) (*Rule, error)

type Ruleset []Rule
The original code creates a local copy of a rule and explicitly returns a pointer to that copy. Taking the ruleset by address wouldn't change that issue.

There are valid use cases for wanting to take a copy and then pass along a pointer to the copy. Perhaps to go through a series of modification methods that don't touch the original. I'd sure hate it if the compiler tried to outsmart me on that and changed the behavior away from what I'd written.
The trap here is that everything is passed by reference (pointer), but the intermediate local value is, well, a value (a copy).
Rule is not a gigantic monster struct (it's 72 bytes), chances are returning it by value would not have been an issue.
Anyway I would say there is an issue with Go here: it's way too easy to copy out of a slice.
-1
It's IO-, CPU-, and memory-hungry, and it's distributed.
C is fast because it's close to how CPU and memory actually work. Go gives you 95+% of that plus easy to learn, easy to use language. A new person could start contributing useful features and bug fixes immediately. A senior person could get C-level performance.
More and more of our code is moved from C to Go, with very little performance penalty, but with a lot more safety and ease of use.
Our customers benefit, and our company makes more money.
In the end, that's what software is about.