Tiptop: Hardware Performance Counters

(tiptop.gforge.inria.fr)

55 points · rrrrtttt · 11y ago · 15 comments

mrmondo · 11y ago · 6 in thread

Could someone please explain the differences and advantages (if any) of tiptop over a program like htop for analysing performance in everyday operations?

deathanatos · 11y ago

I think they're not really in the same class. The article states,

> software developers identify critical regions of their applications and evaluate design choices to select the best performing implementation.

Which is what I'd use metrics like cache misses and branch prediction misses for: really fine-tuning some section of the code that needs to execute lightning fast.

htop, on the other hand, gives a more high-level overview. Like, "who's eating all the RAM?". (Or, perhaps more often, "who do I need to kill?")

AYBABTME · 11y ago

Once you've identified that a process has a lot of branch mispredictions (or a high count on some other counter), how do you identify where in the code those mispredictions are caused?

stephencanon · 11y ago

Use a sampling profiler with support for hardware counters (VTune, Instruments, Zoom, etc.). Configure it to take a stack trace every Nth* event on the counter of interest. Run your program under the profiler. Look at the trace with the tree “inverted” or “bottom up” to see exactly which functions are incurring the counter events. That alone isn’t terribly useful, so look at the functions themselves to see a line-by-line or instruction-by-instruction breakdown of where the events occurred.

(*) What should N be? Depends on how frequently the counter is getting hit. N between 1000 and 1000000 is pretty typical. Choosing prime N is a good idea.
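One reason a prime N helps is aliasing: if the program repeats a loop whose length shares a factor with N, the samples keep landing on the same few spots. The following is a minimal sketch of that effect under a toy model, not a real profiler; the function name `sample_sites`, the loop length, and the event counts are all assumptions made up for illustration.

```python
from collections import Counter

def sample_sites(period, loop_len=1000, total_events=1_000_000):
    """Count how often each position in a repeating loop gets sampled.

    Toy model (an assumption, not how any real profiler is implemented):
    the counter fires once per step, and the program repeats the same
    loop of `loop_len` event sites over and over.
    """
    hits = Counter()
    # Take a sample on every `period`-th counter event.
    for event in range(period, total_events + 1, period):
        hits[event % loop_len] += 1
    return hits

aliased = sample_sites(period=1000)  # period equals the loop length
spread = sample_sites(period=997)    # a nearby prime period

print(len(aliased), "distinct sites sampled with N=1000")
print(len(spread), "distinct sites sampled with N=997")
```

In this toy model, N=1000 lands on the same site every time, while N=997 eventually visits every site in the loop, so the profile reflects the whole loop rather than one instruction.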

nnx · 11y ago

You could use "perf record -e cache-misses <process>" then "perf report" to see which functions are responsible.
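To make the two-step workflow concrete, here is a small sketch that builds the command lines for launching a new program under `perf record` or attaching to a running one with `-p`. The helper names `perf_record_argv` and `perf_report_argv`, the PID, and the program name are hypothetical; only the perf options they emit (`-e`, `-o`, `-p`, `-i`, `--`) are real.

```python
import shlex

def perf_record_argv(event, pid=None, command=None, output="perf.data"):
    """Build an argv list for `perf record` on one hardware event.

    Exactly one of `pid` (attach to a running process) or `command`
    (launch a new process) must be given. Hypothetical helper.
    """
    if (pid is None) == (command is None):
        raise ValueError("give exactly one of pid or command")
    argv = ["perf", "record", "-e", event, "-o", output]
    if pid is not None:
        argv += ["-p", str(pid)]          # attach; stop recording with Ctrl-C
    else:
        argv += ["--"] + list(command)    # everything after -- is the program
    return argv

def perf_report_argv(input_file="perf.data"):
    """Build an argv list for `perf report` reading a recorded file."""
    return ["perf", "report", "-i", input_file]

# 1234 and ./my_program are placeholders, not values from the thread.
print(shlex.join(perf_record_argv("cache-misses", pid=1234)))
print(shlex.join(perf_record_argv("cache-misses", command=["./my_program"])))
print(shlex.join(perf_report_argv()))
```

The printed lines can be pasted into a shell, or the lists passed to `subprocess.run` on a machine where perf is installed and permitted to read the counters.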

padenot · 11y ago

You can use {call,cache}grind for that, but sometimes it's a bit impractical if the software is big.

An approach I sometimes use is to throw a generic profiler at the program, make the program do something that is not fast enough (and that needs to be optimized), look at the profile to identify the function(s) that are too slow, extract them from the big code base, get a good set of input data and run that with {call,cache}grind.

Then you can use the awesome kcachegrind to look at the data (where you can look at different cache misses, branch mispredictions, etc.).

Of course, most of the time, simply running in the profiler shows a non-optimal algorithm or terrible allocation patterns, so you don't have to do all that, but I found this approach useful when writing inner loops for numeric computations (and of course, extracting the code is rather easy for this kind of stuff).

And also, this is osx/linux only, sadly.

edwintorok · 11y ago

It monitors hardware counters, so it is more similar to 'perf top' than to 'htop'/'top', but it can also monitor multiple counters at once. As a disadvantage, it can't show the function-level counters or the assembly annotation that 'perf top' can.

So perhaps you could use 'tiptop' to get a general view of what might be slow, and then drill down using 'perf top'.
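For that first general view, raw counter values are usually easier to read as ratios. The sketch below computes a few common ones; the function name `derived_metrics` and the sample numbers are made up for illustration, and the formulas are generic definitions, not necessarily the ones tiptop uses for its own columns.

```python
def derived_metrics(cycles, instructions, cache_refs, cache_misses,
                    branches, branch_misses):
    """Turn raw per-process counter values into ratios for triage.

    Hypothetical helper using generic definitions.
    """
    def pct(num, den):
        # Guard against a counter that never fired.
        return 100.0 * num / den if den else 0.0

    return {
        "ipc": instructions / cycles if cycles else 0.0,
        "cache_miss_pct": pct(cache_misses, cache_refs),
        "branch_miss_pct": pct(branch_misses, branches),
    }

# Illustrative values only, not measurements from any real process.
metrics = derived_metrics(
    cycles=2_000_000, instructions=1_000_000,
    cache_refs=50_000, cache_misses=5_000,
    branches=200_000, branch_misses=10_000,
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A process with a low instructions-per-cycle figure or a high miss percentage would be the one to drill into with a function-level tool.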

stinos · 11y ago · 2 in thread

Slightly off-topic (maybe), but can this or another readily available tool be used to track counters during startup/shutdown and produce a log that can then be imported into some viewer? (Think something like Windows Performance Monitor/Event Tracer and the like.)

pgeorgi · 11y ago

Like http://www.bootchart.org/?

stinos · 11y ago

exactly!

edwintorok · 11y ago · 1 in thread

Interesting that this software has reached version 2.2 and is not available in Debian yet. I filed an RFP: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754932

thinred · 11y ago

Preliminary work on the package: http://anonscm.debian.org/gitweb/?p=collab-maint/tiptop.git I will change to ITP soon. :)

oakwhiz · 11y ago

Is it possible to use these same hardware performance counters to perform a side-channel attack on another process?

th3iedkid · 11y ago

It mentions Intel instrumentation instructions. Does it work on other hardware, though? That goes unmentioned in the requirements. Say SPARC/...
