One way to see how different virtual threads are from our old mechanisms is to ask yourself how many I/O operations you can have in flight. With the old mechanisms there are two options: either the operations are blocking, in which case the number is capped by the (very limited) number of threads in all of your thread pools combined, or the operations are non-blocking, in which case the thread context that is so necessary for troubleshooting and JFR profiles is lost (e.g. JFR can't know on whose behalf an I/O operation is performed, because in the design of the Java platform the "owner" of an operation can only be a thread). Virtual threads allow you to have hundreds of thousands (or even millions) of I/O operations in flight (which, by Little's law, you need for high throughput) while still preserving observable context.
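To make that concrete, here is a minimal sketch, assuming JDK 21 or later. The class and method names (`InFlightDemo`, `requiredConcurrency`, `runBlockingTasks`) are made up for illustration, and `Thread.sleep` merely stands in for a blocking I/O call. It first applies Little's law (concurrency = throughput × latency) and then runs that many blocking operations, each on its own virtual thread:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class InFlightDemo {

    // Little's law: operations in flight = arrival rate * time each one takes.
    static long requiredConcurrency(long requestsPerSecond, long latencyMillis) {
        return requestsPerSecond * latencyMillis / 1000;
    }

    // Starts one virtual thread per task; every task blocks for latencyMillis.
    // Returns how many tasks completed while running on a virtual thread.
    static int runBlockingTasks(int taskCount, long latencyMillis) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    try {
                        // Stand-in for a blocking I/O call (an assumption of this sketch).
                        Thread.sleep(latencyMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    // The operation still has a thread as its "owner".
                    if (Thread.currentThread().isVirtual()) {
                        completed.incrementAndGet();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        // 10,000 requests/s at 200 ms each means 2,000 operations in flight.
        System.out.println("Needed in flight: " + requiredConcurrency(10_000, 200));
        System.out.println("Completed: " + runBlockingTasks(100_000, 200));
    }
}
```

With a pool of platform threads, the same code would have at most as many of these operations in flight as the pool has threads.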
BTW, as for fork-join's `join`: not only is it designed for pure computation and cannot help with I/O, but every `join` also increases the depth of the stack, so it is fundamentally limited in how much it can help. Since FJ targets pure-computation workloads, that's not a huge limitation in practice, but virtual threads are designed for I/O workloads.
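For comparison, this is roughly the kind of workload FJ's `join` is meant for: a pure computation split recursively. It's only a sketch; `RangeSum`, its threshold and the `sum` helper are hypothetical names chosen for the example:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // arbitrary cut-off for this sketch
    private final long from; // inclusive
    private final long to;   // exclusive

    RangeSum(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (long i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        long mid = from + (to - from) / 2;
        RangeSum left = new RangeSum(from, mid);
        RangeSum right = new RangeSum(mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // compute the right half on this stack
        // join() keeps this frame on the worker's stack; if the worker runs
        // other tasks while it waits, they execute on top of this frame.
        return left.join() + rightSum;
    }

    static long sum(long from, long to) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(from, to));
    }
}
```

Here the recursion is shallow, so the nesting under `join` is harmless. A task that blocked on I/O would have no such bound, which is the limitation described above.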
I apologise for not going into more depth here. As you can imagine, with a user base numbering in the many millions, we can only afford to put in-depth explanations into texts and videos that reach a large audience. Once you've familiarised yourself with the material, though, I'll gladly answer specific questions (and perhaps your questions will help us improve our material, too).