> You can use software rendering for Wayland cases too. There are even OpenGL / Vulkan software implementations.
That's actually beside the point I intended, which was to give an example of how little code an X11 client actually needs. OpenGL/Vulkan software implementations are the opposite of little.
> I don't really see much value in such use case. Thin client (the reverse) makes more sense (i.e. where your side is a weak computer and remote server is something more powerful).
Yes, I can see that, e.g. remotely using a supercomputer. However, GPU-capability-wise, the devices people use to interact with graphical systems are quite sufficient for almost any interaction task. If X11 were able to video-stream just the important bits (I imagine those would be the large updated bitmap areas within the user interface, so video-encoded server-side bitmap transport would do it), it would be just as suitable for that kind of asymmetric case, while still remaining usable for IoT scenarios, where tiny computers provide sensor data; there are probably hundreds of millions of those by now, if not billions.
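To make the "video-encoded server-side bitmap transport" idea concrete, here is a sketch of a hypothetical request in the spirit of X11's PutImage, where the client ships an encoded rectangle and the server decodes and blits it. The request name, field layout, and codec ids are all invented for illustration; no such X11 extension exists.

```python
# Hypothetical "PutEncodedImage" wire message: fixed header followed by
# a length-prefixed encoded bitstream. Layout is made up for this sketch.
import struct

def put_encoded_image(drawable: int, x: int, y: int,
                      width: int, height: int,
                      codec_id: int, payload: bytes) -> bytes:
    """Pack the made-up request: 16-byte header + 4-byte length + payload."""
    header = struct.pack(
        "<IhhHHI",       # little-endian here purely for illustration
        drawable,        # target window/pixmap id
        x, y,            # destination position on the drawable
        width, height,   # rectangle size in pixels
        codec_id,        # e.g. 1 = H.264, 2 = AV1 (invented numbering)
    )
    return header + struct.pack("<I", len(payload)) + payload

msg = put_encoded_image(0x2A, 0, 0, 1280, 720, 1, b"\x00" * 64)
print(len(msg))  # 16-byte header + 4-byte length + 64-byte payload = 84
```

The point of the sketch is that the control-plane overhead stays tiny: everything except the codec payload fits in a few fixed bytes, just like ordinary X11 requests.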
In principle it's also trivial to convert X11-style display interaction into video transport (just run the X11 server on the remote end), while the inverse is impossible. So with the X11 style you could choose either one, depending on your devices and needs.
> But either way, running a compositor even with software rendering should be doable even on low end hardware.
And how about video-encoding that data on low-end hardware without hardware assistance? Even with hardware help there are limits: NVIDIA, for example, caps the number of concurrent video encoding sessions (and thus the number of distinct video streams) at five, and not all hardware can do even that. So with multiple such sessions it falls back to CPU time, and high-quality video encoders are not a walk in the park for a CPU.
Because the streams would run between the same two endpoints, multiple streams could be packed into a single stream to save encoder sessions, but I don't think anyone's doing that.
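A minimal sketch of what that packing could look like: several per-application surfaces placed into one encoder frame with a simple "shelf" layout, so a single hardware encode session carries multiple logical streams. The layout algorithm and frame width are assumptions for illustration, not anything an existing remoting system does.

```python
# Shelf-pack (w, h) surfaces left-to-right into a fixed-width encoder
# frame, wrapping to a new row (shelf) when a surface doesn't fit.
from typing import List, Tuple

def shelf_pack(sizes: List[Tuple[int, int]],
               frame_w: int) -> List[Tuple[int, int]]:
    """Return the (x, y) position of each surface inside the frame."""
    positions = []
    x = y = shelf_h = 0
    for w, h in sizes:
        if x + w > frame_w:          # no room left on this shelf
            x, y = 0, y + shelf_h    # start a new shelf below
            shelf_h = 0
        positions.append((x, y))
        x += w
        shelf_h = max(shelf_h, h)
    return positions

# Three app surfaces packed into a 1920-wide encoder frame:
surfaces = [(1280, 720), (640, 480), (800, 600)]
print(shelf_pack(surfaces, 1920))  # [(0, 0), (1280, 0), (0, 720)]
```

The receiving side would then crop each application's rectangle back out of the decoded frame; the encoder only ever sees one stream.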
Alternatively, the encoder-count limitation matters less if you run a single stream (streaming the whole desktop), but personally I consider per-application remote use a much more flexible system, and it's the default X11 provides.
> Video, by the mere nature of modern codecs, is already very optimized to focus only on changes to the encoded image, so it's the best option. You render things where they run, then send the video.
Surely scrolling a document by instructing the server to render a different part of a server-side bitmap onto its own screen is going to be far more efficient than encoding and decoding video, i.e. when considering latency, quality, energy consumption, memory usage and bandwidth?
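A back-of-envelope comparison makes the bandwidth gap concrete. The 28-byte figure is the fixed size of an X11 CopyArea request; the bits-per-pixel rate for a modern codec is a rough, optimistic assumption, and a real blit-based scroll would additionally need to transport the newly exposed strip.

```python
# Bytes on the wire: one server-side blit versus re-encoding the
# scrolled region as a video frame.
copy_area_bytes = 28                  # fixed size of an X11 CopyArea request

w, h = 1920, 1000                     # scrolled region in pixels
bits_per_pixel = 0.1                  # optimistic assumed codec rate
video_bytes = int(w * h * bits_per_pixel / 8)

print(copy_area_bytes, video_bytes)   # 28 vs 24000
print(video_bytes // copy_area_bytes) # the video frame costs ~857x more
```

Even under this generous codec assumption the single protocol request wins by nearly three orders of magnitude, and it also skips the encode/decode latency entirely.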