I was working at a company full of PhDs and well-seasoned veterans who saw me as the new kid, somewhat underqualified to be working in their tools group. I had been at the firm for a while, and they were nice enough, but they didn't really have me down as someone who would contribute as anything other than a very junior engineer.
We had a severe problem with a program's performance, and no one really had any idea why. And as it was clearly not a sophisticated project, I got assigned to figure something out.
I used the then-very-new callgrind and the accompanying flame graph, and discovered that we were passing very large bit arrays for register allocation by value. Very, very large. They had started small enough to fit in registers, but over time had grown so large that a function call to manipulate them effectively flushed the cache, and the rest of the code assumed these operations were cheap.
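For anyone who hasn't hit this class of bug: here's a minimal sketch (type name and sizes are made up for illustration, not the original code) of how a bit array that grows over time silently turns every by-value call into a huge copy:

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical register-allocation live set. When NUM_REGS was 64,
// this fit in a single machine word and pass-by-value was free.
// At 1 << 16 it is 8 KiB, and every by-value call below copies the
// whole thing and churns the cache.
constexpr std::size_t NUM_REGS = 1 << 16;
using LiveSet = std::bitset<NUM_REGS>;

// By value: copies both bitsets on every call.
bool interferes_by_value(LiveSet a, LiveSet b) {
    return (a & b).any();
}

// By const reference: no copies; presumably the shape of the fix.
bool interferes(const LiveSet& a, const LiveSet& b) {
    return (a & b).any();
}
```

Under a profiler like callgrind, the by-value version shows up as time sunk into copying rather than in the caller's "real" work, which is exactly why it's invisible until someone actually looks at a profile.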
Profiling tools at the time were quite primitive, and the application was a morass of shared libraries, weird dynamic allocations and JIT, and a bunch of other crap.
Valgrind was able to get the profiles after everything else I could try had failed.
The presentation I made on that discovery, and my proposed fixes (which eventually sped everything up greatly), finally earned me the respect of my colleagues, and not having a PhD wasn't a big deal after that. Later on, colleagues who had left the company invited me to my next gig. And the one after that.
So thanks!
But as a fresh member of the team, I learned Google perftools, introduced it to the team, and gave a presentation breaking down the running time of the big binary. I have to say that presentation was a life-changing moment in my career.
So together with you, I really want to thank those who put so much into building these tools. When I was giving that presentation, I really felt I was standing on the shoulders of giants, and those giants were helping me.
And over the years, I used more and more tools like Valgrind, pahole, ASan, and TSan.
Much appreciated!
There is so much code in the world that nobody has so much as glanced at a profile of, and any non-trivial, unprofiled code base is virtually guaranteed to have some kind of massive performance problem that, like this one, is almost trivial to fix.
Put this one in your toolbelt, folks. It's also so fast that you can easily try it without having to "schedule" it, and if I'm wrong and there aren't any easy profiling wins, hey, nobody has to know you even looked. Although in that case you've still learned something about the quality of the code base: if there aren't any profiling quick wins, that means someone else already claimed them. As a codebase grows, the probability that a quick win is available goes to 1.
Where C++ tooling really falls short is library management and build tooling. The problem is less that any individual tool doesn't work and more that there are many of them and they don't interoperate nicely.
There has to be a deeper understanding, I think.
The solution is to tune the number of buckets based on info derived from the pprof callgraph.
There were others too, like redundant serialization, etc. But this one is the most interesting.
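As a sketch of what that kind of fix can look like (the container choice, names, and sizes here are hypothetical, not from the original code): when a pprof callgraph shows time disappearing into rehashing or long collision chains, pre-sizing the table removes it:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// If profiling shows rehash/collision cost in a hash table, reserve
// buckets up front. expected_keys should come from measurement
// (e.g. the pprof data), not guesswork.
std::unordered_map<std::string, int> make_table(std::size_t expected_keys) {
    std::unordered_map<std::string, int> table;
    // Picks a bucket count of at least expected_keys / max_load_factor,
    // so inserting that many keys never triggers a rehash.
    table.reserve(expected_keys);
    return table;
}
```

The same idea applies to any bucketed structure: the profile tells you the working size, and a one-line reservation turns repeated growth into a single allocation.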
(Kidding. Thanks for Valgrind! I still use it for assessing memory corruption vulnerabilities along with ASan.)
https://github.com/google/sanitizers/wiki/AddressSanitizerCo... https://github.com/google/sanitizers/wiki/MemorySanitizer https://valgrind.org/docs/manual/manual.html
I'd use asan over valgrind only for memory leaks. It's faster.
Tridge said words to the effect of "if you program in C and you aren't using valgrind, you flipping should be!" and went on to talk about how some projects like to have a "valgrind clean" build, the same way they compile without warnings, and that it's a really useful thing. As ever, well expressed, with examples from Samba development.
He was obviously right and I started using valgrind right there in the lecture theatre. apt-get install is a beautiful thing.
He pronounced it val grind like the first part of "value" and "grind" as in grinding coffee beans. I haven't been able to change my pronunciation since then regardless of it being "wrong".
[1] https://lwn.net/Articles/132938/
Corbet's account of this in the LWN link above is actually wrong, as noted by akumria in the comments below it. Every single command and suggestion came from the audience, starting with telnetting to the BitKeeper IP and port Ted Ts'o had made available for the demo. Typing "help" came from the audience, as did using netcat and the entire nc command. The audience wrote the BitKeeper client in two minutes, with Tridge doing no more than encouraging, typing, and pointing out that the "Tridge is a wizard reverse engineer who has used his powers for evil" narrative was clearly just some "wrong thinking." Linus claimed thereafter that Git was named after himself and not Tridge.
Tridge has a way of explaining complicated ideas that pares them down to their essence and helps you understand them, which really struck me. (A smart person can talk about a complicated thing in a way that makes you feel dumb; a really smart person can talk about it in a way that makes you feel like a genius.) He also has the ability and intellectual curiosity to jump seemingly effortlessly across disciplines.
And he's a fantastic and very entertaining public speaker. Highly recommend any talk he gives.
That said I sometimes get the "V" tools mixed up (Vagrant, Valgrind, Varnish)
edit: val as in value + grinned
Since it’s an Old Norse word, try using Google Translate to hear what happens in Danish, Dutch, German, Icelandic, Norwegian, and Swedish. I don’t know if it’s a modern word in those languages, but Translate shows the translation “election gate” for several of them, and “fall gravel” for Swedish.
According to the audio pronunciations on Translate…
Danish: “vale grint”, long a, hard tapped r, hard d sounds like t
Dutch: sounds like “fall hint”, but there’s a slight throaty r in there that’s hard for English speakers to hear, so maybe “hrint”
German: “val grinned”, val like value, grinned with the normal German r
Icelandic: “vall grint”, vall like fall, hard tapped r
Norwegian: “vall grin”, hard tapped r, almost “vall g’din”, silent or nearly silent d/t at the end.
Swedish: “voll grint / g’dint”, hard tapped r, hard d
German is the only one that has “val” like “value”; all the rest sound more like “fall”. The word valgrind names the door to Valhalla, which literally means “fall hall”, as in hall of the fallen. For that reason, I suspect it makes the most sense to pronounce valgrind like “fall grinned”, but Old Norse might have used val like value, I’m not sure.
BTW Valhalla has an equally amusing number of ways to pronounce it across Germanic languages, “val” sometimes turns into what sound like “fell” instead of “fall”, and in Icelandic the double ell makes it fall-hat-la.
Languages are cool!
The response from Julian Seward was so nice it set a permanently high bar for me when random people I don't know report bugs on my projects!
We still run our entire testsuite under valgrind in CI. Amazing tool!
Running Helgrind or DRD (for threading issues) with GLib has been a bit frustrating, too. If anyone has some advice to share about this, I'm all ears!
(EDIT: I had mistakenly left out the phrase about suppression files)
The magic of Valgrind really lies in its ability to detect errors without recompiling the code. Sure, there’s a performance hit, but sometimes all you have is a binary. It’s damn solid on Linux, and it even works with the custom threading library we use for the course; shame the macOS port is barely maintained (last I checked, it only worked on OSes from a few years back, and anything more recent executes syscalls during process startup that Valgrind doesn’t handle).
Sometimes even -fno-omit-frame-pointer won't help, like if the stack is being unwound through a system library that was built without frame pointers. In that case you can switch to the slow unwinder. Set the environment variable `ASAN_OPTIONS=fast_unwind_on_malloc=0` when running your program. But note that this will make most programs run significantly slower so you probably want to use it only when you really need it and not as the default setting for all runs.
Being a UK PhD holder, the sentence that stood out to me was this commentary/comparison between UK and US PhDs: "This was a three year UK PhD, rather than a brutal six-or-more year US PhD."
My cousin has a US PhD and, judging from what he tells me, it is a lot more rigorous than UK PhDs.
The US PhD is usually 4-5 years after a 4 year bachelors (8-9 years). It is a little bit longer with more graduate-level coursework.
That said, the US bachelors starts at age 17 while a UK bachelors starts after 2 years of A-levels. So in terms of length it’s a wash.
AIUI you can do Highers (equivalent to GCSE, at 16) and enter Uni then with sufficiently high grades (aged 16/17). Or, stay on for one more year to do Advanced Higher (most common). Uni courses can then be 4 or occasionally 3 years. Don't quote me!
I was able to build and debug them on Linux with Valgrind finding many stupid mistakes and the library worked flawlessly on Symbian.
It's just one of the many times that Valgrind has saved my bacon. It's awesome.
Lovely piece of software toward which I owe a lot of gratitude.
Fun fact: the creator of Purify went on to found Netflix and is still their CEO.
Just like Ada has been doing since 1983.
Ada can manage dynamic stacks, strings and arrays on its own.
For example, Ada has what one could call type-safe VLAs: instead of corrupting the stack like C, you get an exception and can redo the call with a smaller size.
As for explicit heap types and Ada.Unchecked_Deallocation: yes, if we are speaking about Ada 83.
Ada 95 introduced controlled types, which via Initialize, Adjust, and Finalize, provide the basis of RAII like features in Ada.
Here is an example of how to implement smart pointers with controlled types:
https://www.adacore.com/gems/gem-97-reference-counting-in-ad...
There is also the possibility of wrapping heap allocation primitives with safe interfaces exposed via storage pools, as in this tutorial: https://blog.adacore.com/header-storage-pools
Finally, thanks to SPARK, nowadays integrated into Ada 2012[0], you can also have formal proofs that it is safe to release heap memory.
On top of all this, Ada is in the process of integrating affine types as well.
[0] - Supported in PTC and GNAT, remaining Ada compilers have a mix of Ada 95 - 2012 features, see https://news.ycombinator.com/item?id=27603292
And running in a container is not really a solution for most of these.
What Mac is definitely lacking, though, is reverse debugging. Linux has rr, Windows has Time Travel Debugging. macOS still doesn't have an equivalent.
Now imagine that you're developing a new application and you want to use some library, and it hasn't been tested with valgrind and generates tons of false messages. Should you then use it? Or look for an alternative library?
$ LD_PRELOAD=libumem.so.1 <executable filename>
I found a lot of memory corruption bugs using libumem in particular including some in MIT Kerberos that were severe enough to be considered security vulnerabilities. Sadly, Solaris is now in support mode thanks to Ellison and friends at Oracle.
Memcheck decreases the memory safety problem of C++ by about 80% in my experience; it really is a big deal. The compiler-based tools that require recompiling every library used are a bit impractical for large stacks such as the ones under Qt-based GUI applications: several libraries, several build systems. But I hear they are popular for CI in large projects such as web browsers, which probably have dedicated CI developers. There are also some (IME rare) problems that those tools can find but Memcheck can't, due to information unavailable in compiled code. Still, Memcheck has the largest coverage by far.
Callgrind and Cachegrind give very precise, repeatable results, complementary to but not replacing perf and AMD / Intel tooling which use hardware performance counters. I tend to use all of them. They all work without recompiling.
That was one of the more annoying tickets to file. We could of course send them the binary, but it would not run without the Purify license file, and we weren't comfortable sending off the license file as well. But in the end they accepted the bug. Not sure if there was ever any fix, though.
Also used by SQLite in their performance measurement workflow(https://sqlite.org/cpu.html#performance_measurement)
What needs to be done to get Valgrind binaries available for macOS (M1)? From a company perspective, we are happy to support this work. If you know who's interested and can accomplish this, please drop me an email at eduardo at calyptia dot com.
Also, it has a nice integration with Eclipse which reflects the Valgrind memcheck output to the source files directly, enabling you to see where problems are rooted.
All in all, Valgrind is a great toolset.
P.S.: I was pronouncing Valgrind correctly! :)
I'm pretty sure I've seen people using the icache/dcache miss counts from valgrind for profiling. I wonder how unreliable these numbers are.
Cachegrind is used to measure performance because it gives answers that are repeatable to 7 or more significant digits. In comparison, actual (wall-clock) run times are scarcely repeatable beyond one significant digit [...] The high repeatability of cachegrind allows the SQLite developers to implement and measure "microoptimizations".
There's a bunch of ways for caches to behave differently but have they changed much over the past 20 years? i.e. is the difference between [2022 AMD cache, 2002 AMD cache] significantly greater than the difference between [2002 PowerPC G4 cache, 2002 AMD cache, 2002 Intel cache] ?
Just managing the cache access between L2 and L3 would, I think, be an additional consideration, but then you have to consider the actual architectural differences, and on server chips locality will matter quite a bit.
I'm speculating, but if that's not modeled, cachegrind may pessimize some less simple but predictable patterns and report a lot of expected misses where the CPU would have been able to prefetch them.
https://github.com/tialaramex/leakdice (or https://github.com/tialaramex/leakdice-rust)
Leakdice implements some of Raymond Chen's "The poor man’s way of identifying memory leaks" for you. On Linux at least.
https://bytepointer.com/resources/old_new_thing/20050815_224...
All leakdice does is this: you pick a running process that you own, and leakdice picks a random heap page belonging to that process and shows you that page as hex + ASCII.
The Raymond Chen article explains why you might ever want to do this.
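The display half of that is simple enough to sketch. Something like this (a hypothetical dump routine for illustration, not leakdice's actual code) renders a sampled page as the classic hex + ASCII view:

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Render a buffer as a 16-bytes-per-row hex + ASCII dump, the same
// kind of view leakdice gives of a randomly sampled heap page.
std::string hexdump(const unsigned char* data, std::size_t len) {
    std::string out;
    char tmp[8];
    for (std::size_t row = 0; row < len; row += 16) {
        std::snprintf(tmp, sizeof tmp, "%06zx ", row);  // offset column
        out += tmp;
        for (std::size_t i = row; i < row + 16; ++i) {
            if (i < len) {
                std::snprintf(tmp, sizeof tmp, "%02x ", data[i]);
                out += tmp;
            } else {
                out += "   ";  // pad a short final row
            }
        }
        out += " ";
        for (std::size_t i = row; i < row + 16 && i < len; ++i)
            out += std::isprint(data[i]) ? static_cast<char>(data[i]) : '.';
        out += '\n';
    }
    return out;
}
```

The other half, which this sketch omits, is the sampling: parsing /proc/&lt;pid&gt;/maps to find heap mappings and reading a random page of them via /proc/&lt;pid&gt;/mem.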
I'm also a fan of systemtap, for when your probing problems push into peeking at the kernel
Also, the sanitizers for GCC and Clang (https://github.com/google/sanitizers), and the Clang static analyzer (and tidy too) through CodeChecker (https://codechecker.readthedocs.io/).
For the Clang static analyzer, make sure your LLVM toolchain has the Z3 support enabled (OK in Debian stable for example), and enable cross translation units (CTU) analysis too for better results.
It seems some packages (even basic ones) are not compatible with Valgrind, thereby spoiling the entire debugging experience.
valgrind target/debug/rustbinary
==10173== HEAP SUMMARY:
==10173== in use at exit: 854,740 bytes in 175 blocks
==10173== total heap usage: 2,046 allocs, 1,871 frees, 3,072,309 bytes allocated
==10173==
==10173== LEAK SUMMARY:
==10173== definitely lost: 0 bytes in 0 blocks
==10173== indirectly lost: 0 bytes in 0 blocks
==10173== possibly lost: 1,175 bytes in 21 blocks
==10173== still reachable: 853,565 bytes in 154 blocks
==10173== suppressed: 0 bytes in 0 blocks
==10173== Rerun with --leak-check=full to see details of leaked memory
valgrind --tool=cachegrind target/debug/rustbinary
==146711==
==146711== I refs: 1,054,791,445
==146711== I1 misses: 11,038,023
==146711== LLi misses: 62,896
==146711== I1 miss rate: 1.05%
==146711== LLi miss rate: 0.01%
==146711==
==146711== D refs: 793,113,817 (368,907,959 rd + 424,205,858 wr)
==146711== D1 misses: 757,883 ( 535,230 rd + 222,653 wr)
==146711== LLd misses: 119,285 ( 49,251 rd + 70,034 wr)
==146711== D1 miss rate: 0.1% ( 0.1% + 0.1% )
==146711== LLd miss rate: 0.0% ( 0.0% + 0.0% )
==146711==
==146711== LL refs: 11,795,906 ( 11,573,253 rd + 222,653 wr)
==146711== LL misses: 182,181 ( 112,147 rd + 70,034 wr)
==146711== LL miss rate: 0.0% ( 0.0% + 0.0% )