Java is very fast and 3X slower is a pretty wild claim.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
In the real world, you won't get code as optimized in higher-level languages, because optimized code looks completely unidiomatic. A 3x speedup over Java is a pretty normal claim.
Most code will be considerably slower due to a lot of factors.
Java in particular is a very pointer-heavy language, with pointers to pointers to pointers everywhere, which is really bad for modern systems that are often constrained far more by memory latency than by CPU.
Being 2-4x slower than languages like C++ or Rust for most code seems plausible (and even low) unless the limiting factor is external, like network or file system IO.
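To make the pointer-heavy point concrete, here's a rough sketch of the two layouts being compared. All class and method names are made up for illustration, and it is not a benchmark: an array of small objects is really an array of references, while parallel primitive arrays hold the data contiguously.

```java
import java.util.Random;

// Hypothetical illustration; names are invented for this sketch.
final class LayoutSketch {
    // Idiomatic Java: each Point is its own heap object,
    // so a Point[] is an array of references to those objects.
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // Follows one reference per element before it can read x and y.
    static double sumObjects(Point[] points) {
        double sum = 0;
        for (Point p : points) {
            sum += p.x + p.y;
        }
        return sum;
    }

    // Reads the same values from two contiguous primitive arrays.
    static double sumFlat(double[] xs, double[] ys) {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i] + ys[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Random rnd = new Random(42);
        Point[] points = new Point[n];
        double[] xs = new double[n];
        double[] ys = new double[n];
        for (int i = 0; i < n; i++) {
            xs[i] = rnd.nextDouble();
            ys[i] = rnd.nextDouble();
            points[i] = new Point(xs[i], ys[i]);
        }
        // Both calls compute the same sum; only the memory layout differs.
        System.out.println(sumObjects(points));
        System.out.println(sumFlat(xs, ys));
    }
}
```

Objects allocated back to back in a loop like this often end up next to each other on the heap, so a naive timing may show little difference. The cost tends to appear when the objects are scattered, which is closer to what a long-running program looks like.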
It's true that pointer chasing really hurts in some sorts of program and benchmark. For sure. No argument. That's why Project Valhalla exists.
But it's also my view that modern C++ programming gets away with a lot of slow behaviours that people don't really investigate or talk about because they're smeared over the program and thus don't show up in profilers, whereas actually the JVM fixes them everywhere.
C++ programs tend to rely much more heavily on copying large structures around than pointer-heavy programs do. This isn't always or even mostly because "value types are fast". It's usually because C++ doesn't have good memory management, so resource management and memory layout get conflated, e.g. std::vector<BigObject>. You can't measure this because the overheads are spread out over the entire program and inlined everywhere, so they don't really show up in profiling. For the same reasons C++ programs rely heavily on over-specialised generics where the specialisation isn't actually a perf win but rather a side effect of the desire for automatic resource management, which leads to notorious problems with code bloat and (especially) compile time bloat.
Another source of normally obscured C++ performance issues is the heap. We know malloc is very slow because people so frequently roll their own allocators that the STL supports this behaviour out of the box. But malloc/new is also completely endemic all over C++ codebases. Custom allocators are rare and restricted to very hot paths in very well optimised programs. On the JVM allocation is always so fast it's nearly free, and if you're not actually saturating every core on the machine 100% of the time, allocation effectively is free because all the work is pushed to the spare cores doing GC.
Yet another source of problems is cases where the C++ programmer doesn't or can't actually ensure all data is laid out in memory together because the needed layouts are dynamically changing. In this case a moving GC like in the JVM can yield big cache hit rate wins because the GC will move objects that refer to each other together, even if they were allocated far apart in time. This effect is measurable in modern JVMs where the GC can be disabled:
https://shipilev.net/jvm/anatomy-quarks/11-moving-gc-localit...
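A minimal sketch of the kind of situation I mean, with invented names, and not a reproduction of the linked post's setup: objects are allocated in one order and then traversed in a different one, so neighbours in the traversal start out far apart on the heap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; names are invented for this sketch.
final class LocalitySketch {
    static final class Node {
        final int value;

        Node(int value) {
            this.value = value;
        }
    }

    // Allocates n nodes in order, then shuffles the references so that
    // array order no longer matches allocation order.
    static Node[] shuffledNodes(int n, long seed) {
        List<Node> nodes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            nodes.add(new Node(i));
        }
        Collections.shuffle(nodes, new Random(seed));
        return nodes.toArray(new Node[0]);
    }

    // Walks the array in array order, dereferencing each node.
    static long traverse(Node[] nodes) {
        long sum = 0;
        for (Node node : nodes) {
            sum += node.value;
        }
        return sum;
    }

    public static void main(String[] args) {
        Node[] nodes = shuffledNodes(1_000_000, 42L);
        System.out.println(traverse(nodes));
    }
}
```

A moving collector that copies objects in the order it reaches them from the array can end up placing them back in traversal order, whereas a no-op collector (for example Epsilon, enabled with -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC) leaves them where they were allocated. Measuring the difference properly needs a harness like JMH, which this sketch leaves out.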
And finally some styles of C++ program involve a lot of virtual methods that aren't always needed, because e.g. there is a base class that has multiple implementations but in any given run of the program only one implementation is used (unit tests vs prod, selected by command line flag etc). The JVM can devirtualise these calls and make them free, but C++ compilers usually don't.
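Roughly the shape I have in mind, as a sketch with made-up names: an interface with a prod and a test implementation, where a flag picks one at startup and the hot loop only ever sees that one.

```java
// Hypothetical illustration; all names are invented for this sketch.
interface Storage {
    int read(int key);
}

final class ProdStorage implements Storage {
    @Override
    public int read(int key) {
        return key * 2;
    }
}

final class TestStorage implements Storage {
    @Override
    public int read(int key) {
        return -1;
    }
}

final class DevirtSketch {
    // The call to read() is an interface call in the source. If only one
    // implementation ever reaches this call site during a run, the JIT can
    // use that profile to inline it behind a cheap type check.
    static long sumReads(Storage storage, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += storage.read(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Implementation chosen once, by command line flag.
        boolean testMode = args.length > 0 && args[0].equals("--test");
        Storage storage = testMode ? new TestStorage() : new ProdStorage();
        System.out.println(sumReads(storage, 1_000_000));
    }
}
```

Whether the call actually gets inlined depends on the JVM and on what it has observed at run time, so treat this as an illustration of the pattern and not a guarantee.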
On the other hand all these things can be obscured by the fact that C++ these days tends only to be used in codebases where performance is considered important, so C++ devs write performance tuned code by default (or what they think is tuned at least). Whereas higher level languages get used for every kind of program, including the common kind where performance isn't that big of a deal.
Here are some benchmarks; I'll leave it to the experts out there to confirm or dismiss them.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
If anything the gap is increasing, not shrinking. The JVM is terrible at memory access patterns due to the design of the language, and designing for memory is increasingly critical for maximum performance on modern systems. All the clever JIT'ing in the world can't save you from the constant pointer chasing, poor cache locality, and poor prefetching.
The gap won't shrink until Java has value types. Which is on the roadmap, yes, but still doesn't exist just yet.
The problem with those benchmarks is that if you look at the Java code you'll see it's highly non-idiomatic. Almost no classes or allocations. They almost exclusively use primitives and raw arrays. Even then it still doesn't match the performance, on average, of the C (or similar) versions, and once you add the real-world structure you'd find in any substantial project, performance drops off further.
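For a sense of the difference in style, here's a small sketch (names are mine, not taken from the benchmark sources): the same sum written the way most application code would write it, and the way the benchmark entries tend to.

```java
import java.util.List;

// Hypothetical illustration; names are invented for this sketch.
final class StyleSketch {
    // Idiomatic collection code: the list holds references to boxed
    // Integer objects, and the stream unboxes each one.
    static long sumIdiomatic(List<Integer> values) {
        return values.stream().mapToLong(Integer::longValue).sum();
    }

    // Benchmark-style code: a raw primitive array and a plain loop,
    // with no objects or allocation involved.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumIdiomatic(List.of(1, 2, 3)));
        System.out.println(sumPrimitive(new int[] {1, 2, 3}));
    }
}
```

Both give the same answer. The point is only that the second form is what the fast benchmark entries look like, and it's not what most Java codebases look like.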
Tight, low-level code in Java and Go is roughly as fast as average C code. The Go compiler is known to be less good at optimizing code than e.g. GCC, but in many cases this makes little practical difference, while the Java JIT compilers have become excellent, to the point where they often beat GCC, especially as they can use run-time profiling to optimize the code for the actual task at hand.
Where the languages differ in "speed" is their runtime environment. Java and Go are languages with garbage collection, which of course means that some amount of CPU is required to perform GC. But as modern garbage collectors run in parallel with the program, this CPU effort is often not a bottleneck. On the other hand, manual memory management has different performance trade-offs, which in many cases can make it quite slow on its own.
But now I sound like a Geico (insurance) commercial. Sorry about that.