Project Loom will not change that, but it will improve performance for certain scenarios.
Scala’s varied async/concurrency libraries are implemented in userland and still use threads underneath. Mechanically, you must opt in to them, and you have to do extra work to interop with code that might use other primitives. Scala can handle a lot of this complexity at compile time w/ types, but it’s not perfect, and certain runtime behaviors will always be out of scope.
Loom improves this by allowing any language that runs on the JVM (Java, Scala, Clojure) to transparently use virtual threads, running their existing synchronous, scheduler-unaware code on the new concurrency primitives Loom implements in the JVM. That’s powerful!
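To illustrate the point, here's a minimal sketch (assuming Java 21+): the `fetch` method below is ordinary blocking code that knows nothing about any scheduler, yet a thousand concurrent calls run cheaply because each task gets its own virtual thread. The method name and task count are made up for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class VirtualThreadDemo {
    // Plain synchronous, scheduler-unaware code: it just blocks.
    static int fetch(int id) throws InterruptedException {
        Thread.sleep(10); // blocking call; the virtual thread unmounts here
        return id * 2;
    }

    public static void main(String[] args) throws Exception {
        // Each submitted task gets its own cheap virtual thread.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            var futures = exec.invokeAll(
                IntStream.range(0, 1_000)
                    .mapToObj(i -> (Callable<Integer>) () -> fetch(i))
                    .toList());
            System.out.println(futures.size());
        }
    }
}
```

Note that `fetch` would look identical in a plain platform-thread program; opting in is a one-line change at the executor, not a rewrite of the call sites.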
Concurrency is not mainly about threads or performance; it is about the semantics of program behavior. Loom does not do anything about that, it "merely" improves performance. Well, you could say it actually makes the semantics worse (the quotes around "merely" are there for good reasons).
In that sense, I believe that Scala, or any language with good concurrency semantics, will benefit from Loom more than Java, unfortunately. But Java can of course still catch up at the language level. Even after such a long time, there are still new and interesting libraries (e.g. JOOQ).
I'm not sure how one would get there with the JVM's memory model. You'd need something like actors and a preemptive scheduler per core at the VM level, with share-nothing state between actors/virtual threads. Erlang uses message passing and immutability to achieve this.
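You can approximate the share-nothing, message-passing style on the JVM today, though nothing enforces it. A toy sketch (names invented for the example): two threads whose only point of contact is a mailbox queue, with each thread's state kept private.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MailboxDemo {
    public static void main(String[] args) throws Exception {
        // The mailbox is the only shared object; all other state is private.
        BlockingQueue<Integer> mailbox = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 3; i++) {
                try { mailbox.put(i); } catch (InterruptedException e) { return; }
            }
        });

        Thread consumer = new Thread(() -> {
            int sum = 0; // private accumulator, never shared
            try {
                for (int i = 0; i < 3; i++) sum += mailbox.take();
            } catch (InterruptedException e) { return; }
            System.out.println(sum);
        });

        producer.start(); consumer.start();
        producer.join(); consumer.join();
    }
}
```

The difference from Erlang is exactly the commenter's point: here the discipline is a convention, not something the VM's memory model guarantees.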
Project Loom is bringing green threads back, now officially as virtual threads.
Additionally, there is java.util.concurrent, and for those who care to go deep enough, custom schedulers.
Part of what Project Loom is doing is bringing lightweight usermode threads, called "virtual threads", along with its own scheduler.
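That scheduler is visible if you peek at a running virtual thread (assuming Java 21+): in the current JDK implementation, a virtual thread's `toString` names the carrier it is mounted on, which by default comes from a ForkJoinPool. This is an implementation detail, not a spec guarantee.

```java
public class SchedulerPeek {
    public static void main(String[] args) throws Exception {
        Thread vt = Thread.ofVirtual().start(() ->
            // Typically prints something like
            // "VirtualThread[#23]/runnable@ForkJoinPool-1-worker-1"
            System.out.println(Thread.currentThread()));
        vt.join();
    }
}
```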
Importantly, you don't need share-nothing state or immutability to add preemption: the JVM already has points during code execution where it knows the full state of the program. These are called "safepoints", and they're important for the GC to work properly. With the current implementation of Loom, virtual threads yield when they do any blocking I/O or synchronization, but there's no reason why, in the future, they couldn't be preempted at any safepoint.
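The yield-on-blocking behavior is easy to observe (assuming Java 21+): 10,000 virtual threads each sleeping 100 ms finish in roughly the time of one sleep, because each `sleep` unmounts the virtual thread from its small pool of carrier threads rather than pinning one. The thread count and timing threshold here are arbitrary choices for the demo.

```java
import java.util.ArrayList;
import java.util.List;

public class YieldOnBlockDemo {
    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        List<Thread> threads = new ArrayList<>();
        // 10,000 sleeping virtual threads; each sleep yields its carrier,
        // so they all block "at once" on a handful of OS threads.
        for (int i = 0; i < 10_000; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try { Thread.sleep(100); }
                catch (InterruptedException e) { /* ignore for the demo */ }
            }));
        }
        for (Thread t : threads) t.join();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Far less than the 10,000 * 100 ms a serial run would take.
        System.out.println(elapsedMs < 10_000);
    }
}
```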
I don't know what you'd count as a few thousand active threads, but the following C++ code let me spawn 70,000 threads before I got a resource error: