No, I don't even like the guy. Why must I consider someone a hero to feel they deserve a trial for the actions they're being punished for?
And even if you know beyond any doubt that "they did it" in point of fact, and you have all the evidence to back you up, such that you're sure a trial would be an open-and-shut case... you still can't assume the verdict. Because it's not about whether you see guilt or innocence when you look at the evidence; it's about whether the jury sees guilt or innocence when they look at the evidence.
Jury nullification means a jury can just decide, arbitrarily, that somebody's not guilty of something. And then double jeopardy means that you can't ask that question again. It's been resolved, permanently: the accused has been declared not guilty in the eyes of the law, however strong your proof was! And any further judgements by the courts have to take, as input, the not-guilty verdict that came out of the trial; not the proof of guilt that went in but didn't survive.
What that implies, to me, is that in the Al Capone example, he should never have been treated as anything other than a regular tax evader, and should only have received a regular tax-evader's sentence.
I'm reminded heavily of the recent p-hacking controversy in science. Imagine a world where we "pre-register" a trial when an investigation is begun, and are then forced to go through with it (and pretend the government has the infinite money required to enable this). In this world, that trial would likely declare the suspect not guilty a lot of the time... and all the evidence gathered would be "used up" by that trial. You couldn't turn around and use the same evidence to argue that "because he's guilty of X, it's more likely that..." anything else. Because, in the eyes of the law, he isn't guilty of X, and anything that purports to prove he is, doesn't. The evidence itself was nullified when the verdict came in.
In the real world, just to save time and money, we don't bother to prosecute trials we know we'll lose. But shouldn't the effects still be the same as if we did? An optimization shouldn't change the semantics of the system.