“TurboFan” – Experimental new optimizing compiler for Google's V8 JS engine (opens in new tab)

(groups.google.com)

101 pointsjonrimmer11y ago17 comments

17 comments

13 comments · 3 top-level

dictum11y ago· 5 in thread

Here's something that's been bothering me since Chrome dropped the experimental support for `position: sticky` (and CSS Regions before) and I didn't find the right place to ask (nor is a submission about the JS engine an appropriate venue for this question, I know), so I'm going to hijack this thread:

We know some properties are expensive and when you use them a few times (or with certain values) you get sub-60fps scrolling — but why are they expensive? Are they inherently hard to optimize (e.g. different GPUs across mobile devices), or is it that nobody got to optimize them yet?

pcwalton11y ago

Because, in general, they can trigger layout/reflow, so they can't run on the compositor. For example, changing things like "top" on an absolutely positioned box can trigger reflow of the contents inside, because of the way CSS works (for example, if "bottom" is set, then the height changes, which can affect the size of things with percentage heights, which can cause floats to be repositioned, etc. etc.)

In traditional browser engines it's even worse because layout runs on the main thread, which is also shared with your JavaScript, so painting ends up transitively blocked waiting for your scripts to finish. That is not the case in Servo (disclaimer: I work on Servo), but making layout run off the main thread is hard for many reasons—some inherent to the problem, some historical in the design of current engines—so all current browsers run JS and layout together.

bgirard11y ago

Exactly.

To elaborate on how this works in Gecko. The rendering pipeline has several optional stages:

requestAnimationFrame (Scripts) -> Style flush -> Reflow flush -> display list construction -> Layer construction (recycling) -> invalidation -> Paint/Rasterization -> Compositing (on it's own thread).

Gecko tries to only run each stage of the pipeline only if they are needed. Fast operations like a CSS transition on an opacity or transform will only activate the Compositing phase. WebGL only canvas drawing will only activate rAF + Compositing. Meanwhile a JS animation on "top" will run all of these phases.

erichocean11y ago

> Because, in general, they can trigger layout/reflow, so they can't run on the compositor.

If you're doing absolute layout, it's easier/better performing to just set top/left to zero, set width and height to their correct values, and then use CSS transforms to position the element where you want it on the page. :)

(Not meant for you, pcwalton, but for others who might not know about a reasonable workaround.)

Igglyboo11y ago

I hadn't heard of Servo until just now, googled it and it looks really neat.

Could you elaborate on what things you think Servo does better than current rendering engines?

mkoryak11y ago

I am going to hijack your hijacked thread:

I wrote a [jquery plugin](http://mkoryak.github.io/floatThead/) that 'simulates' `position:sticky` on table headers.

cpeterso11y ago· 3 in thread

> We've reached the point where it's better for our development velocity to work in the open on the bleeding edge branch of V8. That's also better for our collaborators who are working on ports to more platforms.

I wonder who are the collaborators with access to Google's private V8 repo and what platforms they're porting to. If merging TurboFan to the open repo didn't reveal their partners' proprietary plans now, then why not develop in the open sooner?

azakai11y ago

I interpreted that differently - that Google's private, secret v8 repo is hidden from their partners as well. Therefore they decided to unveil the work publicly so that their partners can access it, start to work to port turbofan to the other platforms, and so forth.

As to who the partners are, you can see commits from Intel adding x87 support and Imagination doing MIPS, for example.

As to why not develop in the open sooner, good question - this is the second time v8 does this, after CrankShaft (third time if you count the initial unveiling of chrome and v8). Maybe it's just how they work.

hvidgaard11y ago

I can imagine that they want to have some initial implementation before they let the public see it. They have a fairly good idea of what they want with it and want to make sure it goes in that direction before they let the public comment/commit.

That's just my theory anyway.

admax88q11y ago

x87 isn't a platform, its just the floating point parts of x86

2 more replies

bhouston11y ago· 2 in thread

It would be nice to understand the goals of this project in terms of both approach as well as the overall performance improvements and how it would compare with the new LLVM-based approach taken by Safari/Webkit.

azakai11y ago

Yes, very curious about this too.

As a first guess, I'm not sure what the v8 strategy is here. The new compiler seems to use the "sea of nodes" approach as opposed to SSA form. A comparison of the two is here

http://static.squarespace.com/static/50030e0ac4aaab8fd03f41b...

The "sea of nodes" approach can give some speedups, but they don't appear huge - 10-20% in that link. Not sure how representative that data is. But it is interesting that modern compilers, like gcc and LLVM, typically use SSA form and not the approach v8 is taking, as further evidence that the "sea of nodes" is not clearly superior.

Perhaps the v8 designers believe there is some special advantage for JS that the new model provides? Otherwise this seems surprising. But hard to guess as to such a thing. If anything, JS has lots of possible surprises everywhere, which makes control flow complex (this can throw, that can cause a deopt or bailout, etc.), and not the best setting for the new approach.

Furthermore, the "sea of nodes" approach tends to take longer to compile, even as it emits somewhat better code. Compilation times are already a big concern in JS engines, more perhaps than any other type of compiler.

Perhaps v8 intends to keep crankshaft, and have turbofan as a third tier (baseline-crankshaft-turbofan)? That would let it only run the slower turbofan when justified. But that seems like a path that is hard to maintain - 2 register allocators, etc., - and turbofan seems like in part a cleanup of the crankshaft codebase (no large code duplications anymore, etc.), not a parallel addition.

Overall the Safari and Firefox strategies make sense to me: Safari pushes the limits by using LLVM as the final compiler backend, and Firefox aside from general improvements has also focused efforts on particular aspects of code or code styles, like float32 and asm.js. Both of those strategies have been proven to be very successful. I don't see, at first glance, what Chrome is planning here. However, the codebase has some intriguing TODOs, so maybe the cool stuff is yet to appear.

rayiner11y ago

The "sea of nodes" approach is just a data structure for representing a program in SSA form. It's orthogonal to anything that has an impact on speed. E.g. GCC uses a tree representation, LLVM uses a CFG, and Hotspot (C2 and Graal) uses a "sea of nodes" representation, but they all represent code in SSA form and that representation is orthogonal to the quality of particular optimizations implemented within the framework.

The speedup reported in that paper is from running constant propagation and dead code elimination at the same time instead of doing them separately, which finds more constants and dead code because the two problems are coupled. The same process can be implemented in a more traditional CFG representation (and generally is--sparse conditional constant propagation).

2 more replies

j / k navigate · click thread line to collapse

17 comments

13 comments · 3 top-level

dictum11y ago· 5 in thread

pcwalton11y ago

bgirard11y ago

Exactly.

To elaborate on how this works in Gecko. The rendering pipeline has several optional stages:

erichocean11y ago

> Because, in general, they can trigger layout/reflow, so they can't run on the compositor.

(Not meant for you, pcwalton, but for others who might not know about a reasonable workaround.)

Igglyboo11y ago

I hadn't heard of Servo until just now, googled it and it looks really neat.

Could you elaborate on what things you think Servo does better than current rendering engines?

mkoryak11y ago

I am going to hijack your hijacked thread:

I wrote a [jquery plugin](http://mkoryak.github.io/floatThead/) that 'simulates' `position:sticky` on table headers.

cpeterso11y ago· 3 in thread

azakai11y ago

As to who the partners are, you can see commits from Intel adding x87 support and Imagination doing MIPS, for example.

hvidgaard11y ago

That's just my theory anyway.

admax88q11y ago

x87 isn't a platform, its just the floating point parts of x86

2 more replies

bhouston11y ago· 2 in thread

azakai11y ago

Yes, very curious about this too.

As a first guess, I'm not sure what the v8 strategy is here. The new compiler seems to use the "sea of nodes" approach as opposed to SSA form. A comparison of the two is here

http://static.squarespace.com/static/50030e0ac4aaab8fd03f41b...

rayiner11y ago

2 more replies

j / k navigate · click thread line to collapse