Dav1d: performance and completion of the first release

(jbkempf.com)

125 points · rbultje · 7y ago · 37 comments

jbk 7y ago · 15 in thread

I'm the author, so if you need anything, just ask.

wpietri 7y ago

Thank you for putting a "what the heck is this" bit near the top! So many announcements like this assume you know exactly what is being talked about.

gardaani 7y ago

Does dav1d support scalability, such as spatial scalability? Is it possible to decode only 1920x1080 frames from a 3840x2160 video (if the video has been encoded with spatial scalability)?

It would be nice to be able to decode smaller frame dimensions with faster decoding time. That would be useful for viewing 4K material on computers which can't decode the full resolution.

The same for 10- and 12-bit videos - it would be nice to be able to decode an 8-bit version for 8-bit displays with faster decoding time.

Sir_Cmpwn 7y ago

Hi! This is really cool. I've been browsing the code and I wanted to ask, how difficult do you think it would be to port this to a system without pthreads? Can it be used on one thread?

Update: a more thorough look at the code quickly disillusioned me of this idea. Same as libaom...

rbultje (OP) 7y ago

Hi! You have 2 options:

1) Write pthread emulation for your target system. We wrote one for Windows native threads, but others should be straightforward.

2) If you want thread-less, that's possible (single-threaded performance shows 1080p is easy, and on high-end systems even 4K single-threaded might be doable). It basically just involves putting the two functions in thread_task.c under #if HAVE_THREADS, along with any code calling pthread_*() functions or using pthread_ types from <pthread.h>, and then enforcing that Dav1dSettings.n_{tile,frame}_threads is always 1 (which means it won't ever enter these codepaths). Then you always get single-threaded and (p)thread-less decoding.

Feel free to come on IRC, happy to help you dive into this, it's not very difficult.
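For readers following along, the thread-less option could look roughly like the C sketch below. It is not dav1d's actual code: SketchSettings, sketch_validate, sketch_run and sketch_worker are hypothetical names, and only the HAVE_THREADS guard and the two thread-count fields come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Build-time switch; a thread-less port would leave this at 0. */
#ifndef HAVE_THREADS
#define HAVE_THREADS 0
#endif

/* Hypothetical stand-in for the two thread-count fields of Dav1dSettings. */
typedef struct SketchSettings {
    int n_tile_threads;
    int n_frame_threads;
} SketchSettings;

#if HAVE_THREADS
#include <pthread.h>
/* Everything that touches pthread types or functions lives under the guard. */
static void *sketch_worker(void *arg) { return arg; }
#endif

/* Returns 0 if the settings are usable in this build, -1 otherwise. */
static int sketch_validate(const SketchSettings *s) {
#if HAVE_THREADS
    if (s->n_tile_threads < 1 || s->n_frame_threads < 1)
        return -1;
#else
    /* Without threads, anything other than exactly 1 is rejected. */
    if (s->n_tile_threads != 1 || s->n_frame_threads != 1)
        return -1;
#endif
    return 0;
}

/* Returns 0 for the single-threaded path, 1 for the threaded path, -1 on error. */
static int sketch_run(const SketchSettings *s) {
    if (sketch_validate(s) < 0)
        return -1;
#if HAVE_THREADS
    if (s->n_tile_threads > 1 || s->n_frame_threads > 1) {
        pthread_t t;
        if (pthread_create(&t, NULL, sketch_worker, NULL) != 0)
            return -1;
        pthread_join(t, NULL);
        return 1;
    }
#endif
    return 0;
}
```

With HAVE_THREADS left at 0, the threaded branch is never compiled, so the build needs neither <pthread.h> nor a pthread library, and a request for more than one thread fails early instead of reaching code that does not exist.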

Sir_Cmpwn 7y ago

Oh, great! I will hop onto IRC. Thanks!

clouddrover 7y ago

How much difference in performance is there between decoding 8-bit video versus 10-bit video?

rbultje (OP) 7y ago

Right now, 10-bit decoding is horribly slow because the assembly optimizations only cover 8-bit, so it's probably 10-20x slower. We'll work on 10-bit next, and in the end, I'd expect it to be 30-50% slower than 8-bit.
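One plausible reason optimized 10-bit stays somewhat slower than 8-bit is that each SIMD register holds fewer samples. The sketch below only does that arithmetic. It assumes samples above 8 bits are stored in 16-bit integers (a common choice, not something stated in this thread), and pixel8, pixel16 and pixels_per_vector are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  pixel8;   /* storage assumed for 8-bit content */
typedef uint16_t pixel16;  /* storage assumed for 10- and 12-bit content */

/* How many samples fit in one SIMD register of the given width in bytes. */
static size_t pixels_per_vector(size_t vector_bytes, size_t pixel_bytes) {
    return vector_bytes / pixel_bytes;
}
```

For a 32-byte (256-bit) register this gives 32 samples at 8 bits and 16 samples at 10 or 12 bits, so the same instruction covers half as many pixels. It says nothing about the current 10-20x gap, which the comment attributes to missing assembly.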

zamadatix 7y ago

Are 10 & 12 bit decoding in the same optimization bucket or do they need to be treated separately?

1 more reply

ComputerGuru 7y ago

Realistically speaking, comparing an HEVC (x265) run and a dav1d run producing a video of similar quality but ~20% smaller, what is the difference in encoding time?

ktta 7y ago

You'll want to check out rav1e (https://github.com/xiph/rav1e)

Here's a comment that gives a clue - https://news.ycombinator.com/item?id=17539791

rbultje (OP) 7y ago

dav1d is a decoder, not an encoder.

phkahler 7y ago

Numbers for Ryzen 2400G would be nice. That's my main computer and they make a great HTPC.

Great news though!

jbk 7y ago

> Numbers for Ryzen 2400G would be nice. That's my main computer and they make a great HTPC.

No access to those machines, so I cannot guess...

Ono-Sendai 7y ago

Can you build it on Windows?

rbultje (OP) 7y ago

Yes, it supports Windows natively. The tests were run on Windows.

polskibus 7y ago · 6 in thread

Is this written in Rust? If so, did any particular Rust features help a lot in this achievement, in comparison to writing the code in C or C++?

nindalf 7y ago

You're thinking of rav1e, which bills itself as "The fastest and safest AV1 encoder"

https://github.com/xiph/rav1e

dralley 7y ago

Being a decoder, they probably place a high priority on having the widest possible platform support. C is still top dog in that respect.

w-m 7y ago

I'm curious, what kind of platforms do you have in mind, that

a) can be targeted by C, but not by Rust

b) provide enough performance to make porting a next-gen video decoder a worthwhile exercise?

1 more reply

rbultje (OP) 7y ago

No, it is written in C and assembly. See the gitlab graphs:

https://code.videolan.org/videolan/dav1d/graphs/master/chart...

ncmncm 7y ago

It has to be C because so many embedded-system vendors are pathologically hostile to anything else. Most tolerate C only to try to win ports from other, typically end-of-lifed, targets, and resent it.

A few have begun to embrace LLVM, and so don't care about the front-end language -- they still only say they support C, but turn out to not notice if you feed in IR from something else. Then it becomes a question of how badly your code needs the language runtime support code, or how good you are at porting it, because they will not pick up maintaining any of that under any circumstance. GC? Ha.

gameswithgo 7y ago

They would have had a simpler time of cpu feature detection in Rust, as it is built in. But that isn't a huge thing.
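For comparison, runtime CPU feature detection in C does not have to be much code if a compiler builtin is acceptable. The sketch below uses the GCC/Clang x86 builtin __builtin_cpu_supports to choose between two implementations through a function pointer. It is not how dav1d does it, and sum_scalar, sum_avx2_placeholder, have_avx2 and pick_sum are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sum_fn)(const uint8_t *, size_t);

/* Portable reference implementation. */
static uint32_t sum_scalar(const uint8_t *p, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += p[i];
    return acc;
}

/* Placeholder for a hand-written AVX2 routine; it reuses the scalar
 * code so that the sketch compiles on any target. */
static uint32_t sum_avx2_placeholder(const uint8_t *p, size_t n) {
    return sum_scalar(p, n);
}

/* Returns 1 if the running CPU reports AVX2, 0 otherwise (including
 * on compilers or architectures where the builtin is not available). */
static int have_avx2(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Pick the implementation once; callers then go through the pointer. */
static sum_fn pick_sum(void) {
    return have_avx2() ? sum_avx2_placeholder : sum_scalar;
}
```

The catch is that the builtin only exists on some compilers and only covers x86, so a project targeting many toolchains and architectures still ends up writing per-platform detection code.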

twotwotwo 7y ago · 5 in thread

It's super neat to see that desktop-class machines should be able to play 1080p AV1 fine with zero hardware support.

I think the lack of mention of GPUs in the post means the answer will be "no", but is this an area where open-source folks could realistically someday lean on the GPU for any help with decoding at all?

I see mentions of CPU/GPU "hybrid decoding" from GPU vendors, but can imagine that might only be something realistically possible with the lower-level access to the GPU the vendor's own driver team has, not via the documented shader languages and APIs.

jbk 7y ago

> I think the lack of mention of GPUs in the post means the answer will be "no", but is this an area where open-source folks could realistically someday lean on the GPU for any help with decoding at all?

Very, very hard to do with standard GPU APIs. You need GPU assembly to do great stuff, and this is rarely available or portable across GPUs.

Also, the issue is that, after SIMD, the run time of the things that are easy to parallelize (and therefore GPU-izable) is around 25% or 30%. That could offer some improvement, but not a 2x improvement.

Also, CPU <-> GPU memory transfers need to be avoided on desktops or mobiles where the memory access is not uniform, because they add a lot of I/O latency.

So, some things are doable, but a full "GPGPU decoder" is unlikely...
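The "not a x2 improvement" point follows from Amdahl's law. A minimal C sketch of the arithmetic, where amdahl_speedup and amdahl_limit are hypothetical helper names and the 25% to 30% figure is the one quoted above:

```c
#include <assert.h>

/* Amdahl's law: p is the fraction of run time that gets accelerated,
 * s is the speedup applied to that fraction. */
static double amdahl_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound when the accelerated fraction becomes free (s -> infinity). */
static double amdahl_limit(double p) {
    return 1.0 / (1.0 - p);
}
```

With p = 0.25 the bound is about 1.33x and with p = 0.30 about 1.43x, even if the GPU did its share in zero time and transfers cost nothing.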

twotwotwo 7y ago

Thanks, seemed like something like this might be the case, but good to hear it confirmed and the details. And thanks again for the work on dav1d!

CyberDildonics 7y ago

Why would it take 'low level' GPU access to accelerate video decoding? OpenGL has had compute buffers for years now.

twotwotwo 7y ago

The motivating observation here is that I know of a few GPU vendors offering hybrid decoding for HEVC and VP9, but no hybrid decoders put together by the open-source community. (Counterexamples are interesting!)

Reasons a GPU vendor might be better able to do this sort of thing than an outsider who can sling OpenGL include: 1) some hybrid decoders are described as leaning partly on special-purpose video decoding hardware, which tends to be a black box to us, and 2) more-detailed understanding of and access to the details of the hardware might let you efficiently express something that's inefficient or awkward in just GLSL--in other words, same kind of reason people care about Metal/Vulkan vs. OpenGL or asm vs. C.

(The further down in the weeds I get the less sure I am of precise technical correctness, but a couple concrete things that seem to make shaderizing decoding tricky are: 1) AV1 has a ton of control-flow-y elements--blocks can be split many different ways and be different sizes, and there are lots of prediction modes--and branchy code can be bad for shader efficiency, and 2) some things seem to block parallelism, e.g. for intra prediction you need the blocks you're predicting from before you can do predictions for the next block. And given the CPU-GPU transfer latency you can't ping-pong back and forth at will; you need large chunks that run well strictly on the GPU. Could be that pieces like the transforms and post-filtering can be cleanly separated into GPU steps, though.)

An efficient open-source AV1 decoder based just on OpenGL/GLSL would be great! But since it wasn't mentioned as an ambition in the post, community-written hybrid decoders seem rare, and we had an expert about AV1 decoders in the thread, it did not seem unreasonable to me to ask how realistic it was.

Though if you manage to write an open-source OpenGL-accelerated AV1 decoder, that would definitely answer my question and leave everyone happy. :)
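To make the intra-prediction dependency point above concrete, here is a toy C model. It assumes each block needs only its left and top neighbours finished and that every block takes one time step, which is a big simplification of real AV1; earliest_step, wavefront_steps and blocks_ready_at are hypothetical names.

```c
#include <assert.h>

/* Earliest step at which block (row, col) can start if it must wait
 * for its left and top neighbours, with one step per block. */
static int earliest_step(int row, int col) {
    return row + col;
}

/* Steps needed to finish a rows x cols grid with unlimited workers. */
static int wavefront_steps(int rows, int cols) {
    return rows + cols - 1;
}

/* Number of blocks that can be processed concurrently at a given step. */
static int blocks_ready_at(int rows, int cols, int step) {
    int count = 0;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            if (earliest_step(r, c) == step)
                count++;
    return count;
}
```

In a 4x4 grid the 16 blocks still need 7 sequential steps, and at most 4 blocks are ever ready at once, which is far from the thousands of independent work items a GPU wants.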

twotwotwo 7y ago

(jbk's recent reply answers this better than I could.)

BlackLotus89 7y ago · 3 in thread

> Therefore, the VideoLAN, VLC and FFmpeg communities have started to work on a new decoder

Is there a need to separate VideoLAN and VLC?

Anyway, nice progress, didn't expect such good results so soon. My main question right now is what the slowest system is on which AV1 is still playable. I know that older CPU and ARM optimizations are on the horizon (On the other platforms, SSE and ARM assembly will follow very quickly, and we're already as fast on ARMv8.), but I'm curious if my Raspberry Pi/ODROID will ever be able to play 1080p AV1 videos.

jbk 7y ago

> Is there a need to separate VideoLAN and VLC?

Yes, the communities are not the same. VideoLAN has numerous people not working on VLC.

> Raspberry pi/odroid will ever be able to play 1080p AV1 Videos.

RPi? No. Recent ODROID? Yes.

naikrovek 7y ago

> VideoLAN has numerous people not working on VLC.

Whoa, what? What else is going on? Oh, x264 & x265, I bet.

buovjaga 7y ago

https://www.videolan.org/projects/

cornstalks 7y ago · 1 in thread

Congrats to everyone on the progress, and a huge thanks from me to all the devs who are working on this! Are there any performance comparisons with dav1d (AV1) vs ffvp9 (VP9)? I’m curious how expensive decoding AV1 is compared to VP9 (in software) (and I’m hoping someone else has already done the benchmarking so I won’t have to).

jbk 7y ago

It is a bit more expensive, but not much, for the same quality (aka less bitrate). For the same bitrate, it's 25-30% more expensive.

No actual measurements, just a feeling from what we've seen.
