in case someone was wondering why it was a bad idea
http://www.chaneru.com/Roku/HLS/X264_Settings.htm#intra-refr...
The benefit of intra refresh is that you avoid having any particularly large frames. If you're using a sub-second buffer, then intra refresh makes your maximum frame size much smaller without sacrificing quality. It's a godsend for getting latency down to tiny amounts. But if you have 1 or 2 seconds of buffer then it's no big deal if a keyframe is an order of magnitude bigger than other frames, and intra refresh is pointless.
Also it's not really a codec thing, it's a clever encoder trick that you can do on basically anything.
Edit: I've just tried using intra refresh, and it works pretty well, but the key frame interval is still required.
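Enabling intra refresh with ffmpeg's libx264 encoder can be sketched like this: `intra-refresh=1` and `keyint` are real libx264 options passed through `-x264-params`, while the file names and helper are made up for illustration. As noted above, the keyframe/GOP interval hint is still required to bound the refresh cycle.

```python
# Sketch: building an ffmpeg command that enables x264's intra refresh.
# File names and the helper are hypothetical; the flags are real libx264 options.
def intra_refresh_args(infile, outfile, fps=30, gop_seconds=4):
    keyint = fps * gop_seconds  # a GOP-length hint is still required
    return [
        "ffmpeg", "-i", infile,
        "-c:v", "libx264",
        # intra-refresh=1 spreads intra-coded blocks across many frames
        # instead of emitting one big IDR keyframe; keyint bounds the cycle
        "-x264-params", f"intra-refresh=1:keyint={keyint}",
        outfile,
    ]

args = intra_refresh_args("in.mp4", "out.mp4")
```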
One thing to consider: for some IRL performances, it's not uncommon that if you arrive late, you might be seated at the timing discretion of an usher. I understand digital experiences may carry different expectations, but I could see building an experience around this, perhaps starting with audio-only, and maybe even a countdown to the next keyframe event (every minute?) while a "please wait to be seated" message is shown.

> Therefore, the parameter -force_key_frames expr:gte(t,n_forced*4) is needed, which produces a key frame every 4 seconds.
How often would you like it to be producing key frames? My video experience is mostly with security cameras, and the ones I've used produce an I-frame every 2 seconds by default. Their encoders don't seem to be real high-quality; sometimes there's visible pulsing where the image will get worse until the next I-frame, so I wouldn't want to increase the interval much beyond that.
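The quoted `expr:gte(t,n_forced*4)` behaves like this little simulation: a keyframe is forced whenever the frame's timestamp `t` reaches `n_forced * 4`, where `n_forced` counts the keyframes forced so far (the function and variable names here are mine, not ffmpeg's).

```python
# Sketch: how -force_key_frames expr:gte(t,n_forced*4) behaves.
# A frame becomes a forced keyframe when t >= n_forced * interval,
# where n_forced is the number of keyframes forced so far.
def forced_keyframes(frame_times, interval=4.0):
    n_forced = 0
    keyed = []
    for t in frame_times:
        if t >= n_forced * interval:
            keyed.append(t)
            n_forced += 1
    return keyed

# 30 fps for 10 seconds -> keyframes at t = 0, 4, 8
times = [i / 30 for i in range(300)]
```

Dropping the interval to 2 (as the security cameras above default to) is just `forced_keyframes(times, interval=2.0)`.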
> WebRTC solves this problem using the RTP Control Protocol (RTCP). When a new user consumes a stream, they send a Full Intra Request (FIR) to the producer. When a producer receives this request, they insert a keyframe into the stream. This keeps the bitrate low while ensuring all the users can view the stream.
I'm writing an NVR that does live streaming with HTML5 Media Source Extensions, sending one-frame .mp4 fragments over a WebSocket connection. My approach to this problem is different: when a new client connects, I send them everything since the last key frame. IIRC there's more data in the I-frame than in the (up to 2 seconds of) P-frames since then, so this seems to work pretty well. If there were only an I-frame every minute (say), I'd probably look at that keyframe-insertion approach... there is an ONVIF command to insert a key frame, IIRC.
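The "replay everything since the last keyframe" idea can be sketched as a small buffer (class and method names are made up, fragments are simplified to tuples):

```python
# Sketch of the "replay from the last keyframe" approach described above.
# Fragments are (is_keyframe, payload) tuples; all names are hypothetical.
class LiveBuffer:
    def __init__(self):
        self.since_key = []  # fragments from the last keyframe onward

    def on_fragment(self, is_keyframe, payload):
        if is_keyframe:
            self.since_key = []  # restart the backlog at every keyframe
        self.since_key.append((is_keyframe, payload))

    def on_new_client(self):
        # a new viewer gets everything since the last keyframe, so their
        # decoder can start right away instead of waiting for the next I-frame
        return list(self.since_key)
```

With a 2-second GOP the backlog stays small; with a one-minute GOP it would grow large, which is where FIR-style keyframe insertion starts to look better.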
The worst I've seen in production was a poorly configured encoder that inserted a keyframe exactly every 30 seconds while cutting segments every 10 seconds, which, surprisingly, caused some players to crash while trying to find a keyframe.
I would expect where latency isn’t a huge concern, the best user experience would be to start the new receiver back at the last keyframe and fill the buffer up to “present” so they can start watching instantly, and keep a few seconds in the buffer for stability.
In more latency-critical streams where you still want the perception of instant video startup, I suppose you would have to start at the last keyframe, and then as soon as the next key frame came through you could just jump ahead.
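The two-step startup described above (start at the last keyframe, jump to live at the next one) can be sketched as a tiny plan over a frame sequence; the function and its inputs are hypothetical:

```python
# Sketch: start playback at the most recent keyframe for an instant picture,
# then jump ahead at the next keyframe to shed the startup latency.
def playback_plan(frames, join_index):
    """frames: list of 'I' or 'P'; join_index: frame at which the viewer joins."""
    # start at the most recent keyframe at or before the join point
    start = max(i for i in range(join_index + 1) if frames[i] == "I")
    # the first keyframe after the join point is a safe place to jump to live
    jump = next(
        (i for i in range(join_index + 1, len(frames)) if frames[i] == "I"),
        None,
    )
    return start, jump
```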
Why not just (progressive) download, watch/save, delete? IOW, playback from saved file.
Better for a variety of conditions, e.g., the connection might be slow.
(And I suppose also when seeking inside a stream)
Another big piece missing here is congestion control. It isn’t just about keeping the bitrate low, but figuring out how much bandwidth you can actually use. Measuring RTT/loss to work out what’s available is a really interesting topic. You don’t get that in ffmpeg or GStreamer yet. The best intro to this is the BBR IETF doc, IMO [1]
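The simplest form of this idea (far cruder than BBR) is a loss-driven AIMD controller: probe the bitrate upward while the link is clean, back off multiplicatively on loss. All thresholds and numbers below are illustrative, not from any spec:

```python
# Toy loss-based rate controller (additive-increase, multiplicative-decrease).
# Much simpler than BBR, but it shows the shape of the problem: use the
# feedback (loss here; BBR also models RTT and delivery rate) to pick a rate.
def adapt_bitrate(bitrate_kbps, loss_fraction,
                  increase_kbps=100, backoff=0.85,
                  floor=300, ceiling=8000):
    if loss_fraction > 0.02:          # meaningful loss: back off
        bitrate_kbps = int(bitrate_kbps * backoff)
    else:                             # clean interval: probe upward
        bitrate_kbps += increase_kbps
    return max(floor, min(ceiling, bitrate_kbps))
```

In WebRTC the loss/RTT feedback would come from RTCP Receiver Reports; the new target would be fed back into the encoder's bitrate setting.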
[0] https://github.com/pion/webrtc/tree/master/examples/rtp-to-w...
[1] https://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congest...
Still, thanks for the article; it is always interesting to see specific applications of the FFmpeg command line, because in my opinion, having read them top to bottom, the FFmpeg docs are very lacking when it comes to explaining the whys.
Random example: you read the docs of genpts and it is something along the lines of "Enables generation of PTS timestamps". Well, thank you (/s). But really, when should I use it? What does it actually change between using it or not? What scenarios would benefit from using it? Etc., etc.
So yeah, that won't work to stream to a WebRTC endpoint as you said!
but at least it's awesome software for free, so who am I to complain?
The post ends at RTP out from FFMPEG. Maybe I’m supposed to know how to consume that with WebRTC but in my investigation it’s not at all straightforward... the WebRTC consumer needs to become aware of the stream through a whole complicated signaling and negotiation process. How is that handled after the FFMPEG RTP stream is produced?
First, you would need to encrypt the RTP packets with SRTP, using keys derived from a DTLS handshake.
Then, you would need an SDP message generator, where you would include all sorts of info:
* Codec and tunings of video and audio streams.
* RTCP ports where you'll be listening for RTCP Receiver feedback, if any.
* The DTLS certificate fingerprint used to set up the encryption keys.
* Some fake ICE candidates that the other party can use to reach you.
Then provide this as an SDP Offer to the WebRTC API of the other side (i.e. the RTCPeerConnection if we're talking about a web browser), and receive in response an SDP Answer. You should then be able to parse this Answer because the other participant might have rejected some of the parameters you gave it in the Offer (e.g. it could be ready only for audio and reject your video). Or just ignore the Answer and hope that you know the other party so well that they won't reject any of the parameters you provided in the Offer.
Finally, you would need to receive ICE candidates from the other party, and parse them in order to know where (what IP and port) to send your RTP packets (and RTCP Sender Reports, if any).
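The kind of SDP offer described above looks roughly like this sketch. Every value here is a placeholder, and a real WebRTC offer needs many more attributes (BUNDLE, rtcp-mux, `a=setup`, etc.); this just shows where the pieces from the list land:

```python
# Sketch of a minimal SDP offer carrying the pieces listed above:
# codec info, DTLS fingerprint, ICE credentials, and one candidate.
# All values are placeholders; a real offer needs many more attributes.
def make_offer(fingerprint, ufrag, pwd, candidate_ip, candidate_port):
    lines = [
        "v=0",
        "o=- 0 0 IN IP4 127.0.0.1",
        "s=-",
        "t=0 0",
        f"m=video {candidate_port} UDP/TLS/RTP/SAVPF 96",
        "a=rtpmap:96 H264/90000",                # codec and RTP clock rate
        f"a=fingerprint:sha-256 {fingerprint}",  # DTLS certificate hash
        f"a=ice-ufrag:{ufrag}",
        f"a=ice-pwd:{pwd}",
        # one host candidate the remote side can try to reach us at
        f"a=candidate:1 1 udp 2130706431 {candidate_ip} {candidate_port} typ host",
    ]
    return "\r\n".join(lines) + "\r\n"
```

You would hand the resulting string to `RTCPeerConnection.setRemoteDescription()` on the browser side and read the Answer back the same way.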
Twitch uses its own transcoding system. Here is an interesting read from their engineering blog [0]
[0] https://blog.twitch.tv/en/2017/10/10/live-video-transmuxing-...
https://developer.apple.com/documentation/http_live_streamin... https://tools.ietf.org/html/draft-pantos-hls-rfc8216bis-08
This should be `-bsf:v`, and it's not required here since this command re-encodes and the encoder has already been informed via `-level`.
G.711 is also mandated by the spec, but it's a low-quality codec intended for speech, with a fixed 8 kHz sampling rate. There are a few other codecs supported by Chrome and Safari but not Firefox.
https://developer.mozilla.org/en-US/docs/Web/Media/Formats/W...