Last month we used VisualVM to get rid of the OutOfMemory errors and to cut the run time of an application from >110 hours to 40 hours. We found that a big third-party library was keeping some kind of undocumented "undo history" of every action, and that history could not be flushed.
If those features aren't being used then there's a good chance that the problem is in the chair, not in the profiler.
Who cares if the profiler is making the app run slower? You aren't going to run it all the time in production!
If you can't find the memory leak or performance bottleneck, either you are doing something wrong or the profiler is not giving enough information. Getting a stack trace of where those byte[] instances are being created would be a good start.
In my own experience, I had quite a bit of success with the Valgrind family of tools when debugging memory leaks in a GSM message codec I wrote a few years ago (in C, not a managed language). I guess it depends on the nature of the code and of the memory leaks.
I've also had some success tracing memory use with VisualVM to find out where memory is allocated so it could be used more efficiently (both for performance and to prevent leaks).
From experience in other languages, profilers are perfectly capable of pointing to the source of leaks. It's what they're for. They can do so by displaying a weighted object graph (drilling from byte[], in this example, using back references), and by taking stack snapshots of some percentage of allocations.
A quick look at their websites confirms that all three of the profilers mentioned in the article do.
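To make the "stack snapshots of some percentage of allocations" idea concrete, here is a toy sketch in plain Java. It is not how any of the profilers mentioned actually work (they hook allocations inside the JVM). The AllocationSampler class and its allocate method are hypothetical names invented for this illustration, and it assumes the code under test can be changed to allocate through the helper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: records the calling site for every Nth byte[] allocation.
class AllocationSampler {
    private final int sampleEvery;
    private final AtomicLong counter = new AtomicLong();
    private final Map<String, Long> bytesBySite = new ConcurrentHashMap<>();

    AllocationSampler(int sampleEvery) {
        if (sampleEvery < 1) {
            throw new IllegalArgumentException("sampleEvery must be >= 1");
        }
        this.sampleEvery = sampleEvery;
    }

    // Allocates a byte[] and, for one call in every sampleEvery, snapshots the stack.
    byte[] allocate(int size) {
        if (counter.incrementAndGet() % sampleEvery == 0) {
            StackTraceElement[] stack = new Throwable().getStackTrace();
            // stack[0] is this method; stack[1] is the code that asked for the array.
            String site = stack.length > 1 ? stack[1].toString() : "<unknown>";
            bytesBySite.merge(site, (long) size, Long::sum);
        }
        return new byte[size];
    }

    // Returns a copy of the sampled byte totals, keyed by allocation site.
    Map<String, Long> sampledBytesBySite() {
        return new HashMap<>(bytesBySite);
    }
}
```

Sorting the resulting map by value gives a rough ranking of which call sites are responsible for the most sampled bytes, which is the same kind of answer a profiler's allocation view provides.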
We ended up using JMX monitoring for most of the needed metrics.
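For anyone curious what that looks like in code, here is a minimal sketch that reads two basic metrics through the standard platform MXBeans in java.lang.management. The class name JmxHeapProbe and the choice of metrics are my own assumptions for illustration, not a description of what we actually monitored.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical probe: reads heap and GC metrics from the JVM's own MXBeans.
class JmxHeapProbe {
    // Heap bytes currently in use, as reported by the MemoryMXBean.
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Total number of collections across all garbage collectors.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %d bytes, GC runs: %d%n",
                usedHeapBytes(), totalGcCount());
    }
}
```

Logging these values at intervals shows whether used heap keeps climbing after collections, which is usually the first hint of a leak. The same beans can also be read remotely with a JMX client such as VisualVM or JConsole.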