
padenot 11y ago

You can use {call,cache}grind for that, but sometimes it's a bit impractical if the software is big.

An approach I sometimes use is to throw a generic profiler at the program, make the program do something that is not fast enough (and that needs to be optimized), look at the profile to identify the function(s) that are too slow, extract them from the big code base, get a good set of input data, and run that under {call,cache}grind.

Then you can use the awesome kcachegrind to look at the data (where you can look at the different kinds of cache misses, branch mispredictions, etc.).

Of course, most of the time, simply running in the profiler shows a non-optimal algorithm or terrible allocation patterns, so you don't have to do all that, but I found this approach useful when writing inner loops for numeric computations (and of course, extracting the code is rather easy for this kind of stuff).

Also, this is OS X/Linux only, sadly.


AYBABTME 11y ago

That's cool. Right now I'm at the stage "I know how to profile, I can identify algorithmic problems and swap them for better ones, I can find allocation issues and use buffers/pools".

But lower-level things like "I know which branch mispredict kills me" or "I know which access patterns result in page faults" are out of my reach. I mean, they were before asking the question and getting answers like yours.

So thanks for that!
