Pointermove event latency in web browsers

(rsms.me)

98 points · secondo · 6y ago · 56 comments

snvzz · 6y ago · 9 in thread

The neglect for latency in current popular systems such as Linux sickens me.

I suggest experimenting with cyclictest from rt-tests. On all hardware I've tried, I get 30ms+ peaks after running it in the background for not even very long. I can't comprehend how anybody could find this acceptable.

I do run linux-rt for this reason. Then again, while linux-rt provides the tools to make latency reasonable, the rest of the system hardly uses them.

As we move from the likes of Linux to better architected systems, potentially based on seL4, I do hope the responsiveness will return to sanity. Until then, I'll have to keep going back to my Amiga hardware as a coping mechanism.

notriddle · 6y ago

> I can't comprehend how anybody could find this acceptable.

Because Linux is primarily funded by server companies, and servers are optimized almost exclusively for throughput?

benibela · 6y ago

Apple 2e from 1983 was the quickest, it is said: https://danluu.com/input-lag/

snvzz · 6y ago

Of the systems tested. They didn't try AmigaOS. They didn't try freedos. Or haiku. Or netbsd.

But yeah, the point is clear: Current, popular desktop systems are pretty bad at responsiveness.

easytiger · 6y ago

The jump in RHEL from 6 to 7 made it incredibly hard to tune Linux for very low latency requirements. It was fairly simple on 6, but 7 made it very difficult. There are lots of tools available (nohz etc.), but they don't help much. The primary core on each NUMA node is also loaded with kernel threads, causing huge amounts of jitter.

Basically everything is tuned for running web apps with loads of procs for people who don't really care about latency of 100s of millis.

joosters · 6y ago

Why would a real-time OS help at all with latency? All RT means is that the latency can be reliably upper-bounded (but note that that upper bound might be very high/slow), it doesn't mean that the latency will be reduced. Real-time OSs aren't faster.

codys · 6y ago

linux-rt is a patchset that changes the behavior of linux to increase the number of places where preemption can occur (among other things).

Doing this decreases certain types of latency in certain situations. As an example, it tries to have interrupts disabled less frequently and for shorter intervals, and uses mutexes instead of spinlocks.

As a result, using linux-rt can provide a lower latency experience compared to plain Linux.

snvzz · 6y ago

>reliably upper-bounded.

Is extremely desirable. Those multi-ms peaks of latency Linux has are the ones that cause audio cuts and perceived hiccups.

Of course it doesn't matter perceptually if the average is 1µs or 5µs. It's all about the peaks, and keeping them bounded enough that latency never crosses the perceptual threshold.

Dylan16807 · 6y ago

For something that normally takes milliseconds, it's metrics like 99% latency that matter, not average latency. It doesn't have to be "faster".
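
A minimal sketch of why the tail matters more than the average. The sample values are made up for illustration (they are not measurements), and `percentile` is a hypothetical nearest-rank helper:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 98 quick samples and 2 spikes.
latencies_ms = [1.0] * 98 + [30.0, 35.0]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.2f} ms  p50={p50_ms} ms  p99={p99_ms} ms")
```

The mean and median both look harmless here; only the 99th percentile exposes the spikes that would be perceived as hiccups.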

silon42 · 6y ago

Personally I'm staying on X11 + no compositor for this reason.

modeless · 6y ago · 7 in thread

I've done a lot of work testing this type of latency in web browsers: https://google.github.io/latency-benchmark/

On Windows, DWM's display compositing adds one frame of latency to every window on screen. It's not possible to render a dragged object in any window that sticks to the mouse cursor without at least one frame of latency.

But when you drag whole windows around they do stick to the mouse cursor with apparently zero frames of latency; how does DWM do it? Easy, they cheat by disabling the hardware mouse overlay during window dragging so that the mouse cursor gets that extra frame of latency too. You can prove this by enabling "Night light" in settings; watch the mouse cursor change colors as it transitions from hardware overlay to software rendering when you start dragging a window.

DaiPlusPlus · 6y ago

Could compositors be optimised to eliminate the extra frame of lag in the case where every window on-screen is being displayed “directly” by invisibly switching to a mode that maps each scanline and pixel column to a window’s framebuffer - and non-client areas to the window-manager’s UI buffer which is directly read by the monitor signal generator? While this would mean transparency effects wouldn’t work, it could be supported with some special-casing. Basically a framebuffer-less hardware compositor. I think rendering windows to 3D deformable meshes [makes for cool demos](https://youtu.be/USedxVrU2Ko) but in practice we just don’t use it for anything besides window open/close animations.

I had to use a monitor running at 30Hz for a while (4K over HDMI 1.4) and while that was bad enough, the compositor’s lag meant all window contents had an extra (unnecessary IMO) delay of 33ms. Add on to that normal monitor input lag.

We’ll probably all shift to 120Hz w/ variable-rate refreshing as a new baseline standard over the next 10 years as Apple seems to be heading in that direction - at 120Hz the lag of the compositor would be acceptable - but I’m worried that lazy graphics devs are going to use that as an excuse to add another frame of latency...
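
For reference, the cost of one frame of compositor delay is simply one refresh period. A small sketch of that arithmetic, assuming the compositor adds exactly one frame; `frame_period_ms` is a hypothetical helper name:

```python
def frame_period_ms(refresh_hz):
    """Duration of one display refresh in milliseconds."""
    return 1000.0 / refresh_hz

# One frame of added latency at a few common refresh rates.
for hz in (30, 60, 120, 144):
    print(f"{hz:>3} Hz -> {frame_period_ms(hz):5.1f} ms per frame")
```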

modeless · 6y ago

> Could compositors be optimised to eliminate the extra frame of lag in the case where every window on-screen is being displayed “directly” by invisibly switching to a mode that maps each scanline and pixel column to a window’s framebuffer

Yes. This concept is called hardware overlays and there are varying levels of support for it in different GPUs and compositors.

There are tradeoffs. Using multiple hardware overlays may cost extra power and/or memory bandwidth, the number of supported overlays may be very limited, alpha blending may not be supported, and the transforms that can be applied to overlays may be very limited. The extremely hardware specific nature of the restrictions and the lack of good APIs exposing overlays means they get much less use than they should.

gruez · 6y ago

>On Windows, DWM's display compositing adds one frame of latency to every window on screen. It's not possible to render a dragged object in any window that sticks to the mouse cursor without at least one frame of latency.

AFAIK you can bypass this by using the DXGI flip model so no additional latency is incurred. There's still going to be 1 frame of latency from the vsync though.

>You can prove this by enabling "Night light" in settings; watch the mouse cursor change colors as it transitions from hardware overlay to software rendering when you start dragging a window.

Can't reproduce on my end. Maybe they updated the Night light implementation so the hardware cursor is tinted as well.

modeless · 6y ago

> AFAIK you can bypass this by using dxgi flip model so no additional latency is incurred.

Using the flip model only eliminates the latency if DWM promotes your window to a hardware overlay. On Nvidia systems this is simply not supported, so the latency is always there and it's impossible to get rid of it. Maybe DWM supports overlays on Intel or AMD, I'm not sure. It would be interesting for someone to test this.

> There's still going to be 1 frame of latency from the vsync though.

Vsync does not inherently require any extra latency. You can render as close to vsync as you like to reduce the latency an arbitrary amount. That's what VR compositors do. All you need to do is ensure you can't flip during scanout and you can't get tearing.

ReactiveJelly · 6y ago

> It's not possible to render a dragged object in any window that sticks to the mouse cursor without at least one frame of latency.

I thought this was a fact of all window managers?

I'd noticed it when making games in SDL / SDL2 on Linux and just assumed it was because the X server couldn't possibly wait on me to paint a frame before updating its own cursor

mehrdadn · 6y ago

Do you know how DWM disables the hardware mouse overlay? Is it an IOCTL?

gruez · 6y ago

https://docs.microsoft.com/en-us/windows/win32/api/winuser/n...

Any application can do it, which is why you get a laggy cursor in some games that opt to draw their own cursor.

vxxzy · 6y ago · 7 in thread

I got a bit excited thinking this may go into latency of dereferencing pointers in C.

forrestthewoods · 6y ago

Same! I wrote a blog post that _kind of_ talks about that.

https://www.forrestthewoods.com/blog/memory-bandwidth-napkin...

deagle50 · 6y ago

Interesting results. Any ideas why the L1 got slower?

dang · 6y ago

Ok, we've disambiguated the bejesus out of the title above.

kick · 6y ago

Was the original title the article's title? I'm really curious now.

rq1 · 6y ago

It would be great to have these numbers indeed!

sesuximo · 6y ago

Same! I would love to read that

gumby · 6y ago

Yeah me too!

skybrian · 6y ago · 5 in thread

If you try to "predict the present" based on the past (and when you use previous points to calculate velocity and acceleration, that's what you're doing) it will overshoot when there's a change in direction, and how much depends on how aggressively you try to extrapolate. For the one-dimensional case in signal processing, doing this with a quickly-changing signal like a square wave will result in ringing.

It can smooth things a bit but it's not that good a substitute for actually improving latency.

(There are probably consequences for coronavirus charts as well, since they're based on lagging data.)
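
The overshoot described above can be shown with a small sketch. It assumes the simplest possible predictor (constant velocity from the last two samples), which is not necessarily what the article uses; the positions are made-up 1-D samples and `predict_linear` is a hypothetical helper:

```python
def predict_linear(prev, curr, lookahead_frames=1.0):
    """Extrapolate from the last two samples, assuming constant velocity."""
    velocity = curr - prev  # pixels per frame
    return curr + velocity * lookahead_frames

# Hypothetical pointer x-positions, one per frame:
# moving right at 10 px/frame, then reversing direction at x=50.
positions = [0, 10, 20, 30, 40, 50, 40, 30]

errors = []
for i in range(1, len(positions) - 1):
    predicted = predict_linear(positions[i - 1], positions[i])
    actual = positions[i + 1]
    errors.append(predicted - actual)
    print(f"frame {i + 1}: predicted={predicted:.0f} actual={actual} error={predicted - actual:+.0f}")
```

The prediction is exact while the motion is steady and misses only on the frame after the reversal. Raising `lookahead_frames` extrapolates further ahead and, on a reversal, misses by more.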

modeless · 6y ago

Although I agree that there's no substitute for actually improving latency, I think it's possible to do significantly better at prediction. Mouse movements are not easily predictable but they are also not completely random; this is a good type of problem to apply machine learning to.

Ultimately you want the lowest possible latency and prediction, because you can never get the latency to zero. Once the latency is small enough, prediction becomes a net win. For example, all VR devices do prediction for head and hand positions after lowering latency as much as possible elsewhere.

Rauchg · 6y ago

I totally agree, you want both. Negative latency!

https://rauchg.com/2014/7-principles-of-rich-web-application...

MauranKilom · 6y ago

I would reckon that overshoot (the mouse cursor moving "backwards" after over-prediction) is significantly worse than undershoot in terms of user experience. We can easily compensate for mouse acceleration not being constant (ask anyone with enhanced pointer speed). But the pointer doing qualitatively different movements than what you input is annoying.

In the limit I guess this boils down to "do no prediction" (which I also suppose is what the linked site's conclusion is).

xellisx · 6y ago

Comment almost sounds like it's about PID.

benibela · 6y ago

Reminds me of a Kalman filter

negativegate · 6y ago · 2 in thread

I'm seeing <2ms in Edge Chromium and ~10ms in Firefox on a 144 Hz display. I'm curious how that compares to what other people are seeing.

I've been doing some WebGL work recently and I've noticed that while it reaches ~144 fps using requestAnimationFrame() in Firefox, there's a lot of stuttering. It's very smooth at 144 fps in Edge Chromium, while Edge Legacy is below 80 fps. As far as I can tell it's not CPU bound, and it's definitely not GPU bound. It would be nice if I could get it running smoothly in Firefox but I don't know what to investigate.

epidemian · 6y ago

> I'm curious how that compares to what other people are seeing.

~10ms on Firefox, Linux, 60hz display.

ishanjain28 · 6y ago

1-2ms (avg) in FF on Linux on a 60Hz display. 2.5-3.5ms in Brave in a similar setup.

jcelerier · 6y ago · 1 in thread

> If you move your pointer left and right (or up and down) in sweeping motions and follow it with your eyes, you'll notice that the rectangle is trailing behind the pointer by quite a long distance

That's definitely not what I am observing (https://streamable.com/9u4cpx). Enabling the predictive tracking, however, is quite nauseating, especially in circular motions. Please don't play with your users' cursors!

codys · 6y ago

The article does mention that the predictive tracking feels worse:

> predictive tracking will feel much worse than direct (technically lagging) tracking when there is no system cursors to match.

Additionally, we can see the lag between the red box and your cursor in the video of your screen that you've uploaded.

https://i.imgur.com/ZEBcGch.png

tobr · 6y ago · 1 in thread

I recently experimented with implementing certain pointer-controlled effects on a <canvas>, and was discouraged by the jerky feeling caused by latency.

But I noticed that if I rendered the effect with motion blur, it suddenly started to feel much smoother, and the perception of jerkiness was mostly gone. I felt that it completely restored my sense of control of the motion.

It’s surprising considering that motion blur actually adds one half frame of extra latency.

Since trying this, I’m bothered by how jerky fast mouse movement always feels in MacOS. 60 fps leaves these enormous, ugly gaps between the pointer at each frame, and makes it hard to perceive the motion correctly. I can’t unsee these gaps now! I’m convinced that system-wide motion blur just for the pointer would be a simple way to make the whole OS feel much smoother and more responsive.

leddt · 6y ago

I have had a similar experience when first using a 144hz display. I was amazed how responsive the mouse was. How "in control" I felt.

Then, going back to a 60hz display, I couldn't NOT see the gaps left by the cursor's movement. I had never before seen this as a problem, but seeing something better ruined 60hz for me.

ufo · 6y ago · 1 in thread

The dead-reckoning algorithm seems to do well when moving in a straight line, but my impression is that it does worse on curves because it veers outside the path that the mouse actually traced. For example, when moving the mouse in a circle, the predicted square's trace appears to move in a circle with a larger radius.

What kind of algorithm could be used to improve the accuracy for curves?

aidenn0 · 6y ago

For just ellipses (which includes circles), a 2nd derivative prediction will work perfectly. Obviously there are paths that are not predictable, though.
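
A sketch of the effect under discussion, under one reading of "2nd derivative prediction": a Taylor extrapolation that adds a constant-acceleration term. It assumes ideal circular motion at unit angular speed with exact derivatives (real pointer data would need finite differences and would be noisy), and `predict_on_unit_circle` is a hypothetical helper. Under this reading the predicted radius gets much closer to the true radius, though not exactly equal to it:

```python
import math

def predict_on_unit_circle(theta, dt, order):
    """Taylor-extrapolate a point moving at unit angular speed on the unit circle."""
    px, py = math.cos(theta), math.sin(theta)    # position
    vx, vy = -math.sin(theta), math.cos(theta)   # velocity (first derivative)
    ax, ay = -px, -py                            # acceleration (second derivative)
    x, y = px + vx * dt, py + vy * dt            # first-order (dead reckoning)
    if order >= 2:
        x += 0.5 * ax * dt * dt                  # add the second-order term
        y += 0.5 * ay * dt * dt
    return x, y

dt = 0.3     # lookahead, in radians of travel; an arbitrary illustrative value
theta = 1.0  # arbitrary starting angle

r1 = math.hypot(*predict_on_unit_circle(theta, dt, order=1))
r2 = math.hypot(*predict_on_unit_circle(theta, dt, order=2))
print(f"true radius:         1.0000")
print(f"first-order radius:  {r1:.4f}")
print(f"second-order radius: {r2:.4f}")
```

Analytically, the first-order radius is sqrt(1 + dt^2), which matches the "larger circle" observation, while the second-order radius is sqrt(1 + dt^4 / 4).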

emersion · 6y ago

>This happens in a buffer and is normally one display update behind in time.

This assumes compositors perform their work right after each display refresh. Compositors can decide to perform their work later, some amount of time before the next display refresh (e.g. a few milliseconds). This reduces latency because the new buffers submitted by clients (such as web browsers) can be displayed with less than one refresh period of latency. For instance, the browser can update its buffer at last display refresh + 8ms, then the compositor can composite at last display refresh + 13ms, and the new frame can be displayed at last display refresh + 16ms.
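
The timing in that example can be sketched as follows. This is a simplified model, not how any particular compositor is implemented: it assumes a 60 Hz display, one composite pass per refresh period at a fixed offset after the refresh, and compositing that always finishes before the next refresh. `display_latency_ms` is a hypothetical helper:

```python
REFRESH_MS = 1000.0 / 60.0  # assumed 60 Hz display

def display_latency_ms(client_submit_ms, composite_offset_ms):
    """Time from a client's buffer submission until the refresh that displays it.

    Both arguments are offsets from the last display refresh, and
    composite_offset_ms must be smaller than one refresh period.
    """
    # If the client submits after this cycle's composite pass, it waits for the next one.
    cycle = 0 if client_submit_ms <= composite_offset_ms else 1
    # A frame composited during cycle N is displayed at the refresh that ends cycle N.
    shown_at_ms = (cycle + 1) * REFRESH_MS
    return shown_at_ms - client_submit_ms

# Client submits 8 ms after the last refresh in both cases.
early = display_latency_ms(client_submit_ms=8.0, composite_offset_ms=0.0)   # composite right after refresh
late = display_latency_ms(client_submit_ms=8.0, composite_offset_ms=13.0)   # composite late in the cycle

print(f"composite at +0 ms:  {early:.1f} ms until displayed")
print(f"composite at +13 ms: {late:.1f} ms until displayed")
```

In this model, compositing late lets the buffer submitted at +8ms make the very next refresh, whereas compositing right after the refresh delays it by a full extra period.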

Here's how Weston does it, for instance: [1]. Sway has a similar feature.

>However since pointing with a cursor is such a core experience in these OS'es, the "screen compositor" usually have special code to draw the cursor on screen as late as possible—as close in time to an actual display refresh as possible—to be able to use the most recent position data from the input device driver.

That's not entirely true. Nowadays all GPUs have a feature called "cursor plane". This allows the compositor to configure the cursor directly in the hardware and to avoid drawing it. So when the user just moves the mouse around the compositor doesn't need to redraw anything, all it needs to do is update the cursor position in the hardware registers.

Compositors don't have code to draw the cursor as late as possible. Instead, they program the cursor position when drawing a new frame. (On some hardware this allows the compositor to "fixup" the cursor position in case some input events happen after drawing and before the display refresh.)

But in the end, all of this doesn't really matter. What matters is that the app draws before the compositor draws, thus the compositor will have a more up-to-date cursor position.

[1]: https://ppaalanen.blogspot.com/2015/02/weston-repaint-schedu...

eyelidlessness · 6y ago

I didn't read the article, but I did try the checkboxes. What I saw surprised me and I will go read the article to see if it addresses my experience, but in case it doesn't:

1. The predictive checkbox improved tracking my cursor.

2. Disabling `requestAnimationFrame` improved it more.

This is not what I'd have expected, so I'll include details about my environment:

- macOS 10.15.4

- Safari 13.1

- 2019 16" MBP with maxed RAM and ~25GB swap

I have no idea whether the browser or the memory pressure made same-thread tracking more accurate, but something did.

Shtirlic · 6y ago

Also a great benchmark and perf tester for browsers: https://www.vsynctester.com/
