in case someone was wondering why it was a bad idea
http://www.chaneru.com/Roku/HLS/X264_Settings.htm#intra-refr...
The benefit of intra refresh is that you avoid having any particularly large frames. If you're using a sub-second buffer, then intra refresh makes your maximum frame size much smaller without sacrificing quality. It's a godsend for getting latency down to tiny amounts. But if you have 1 or 2 seconds of buffer then it's no big deal if a keyframe is an order of magnitude bigger than other frames, and intra refresh is pointless.
Also it's not really a codec thing, it's a clever encoder trick that you can do on basically anything.
Edit: I've just tried using intra refresh, and it works pretty well, but the key frame interval is still required.
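Enabling intra refresh with ffmpeg's libx264 encoder can be sketched like this: `intra-refresh=1` and `keyint` are real libx264 options passed through `-x264-params`, while the file names and helper are made up for illustration. As noted above, the keyframe/GOP interval hint is still required to bound the refresh cycle.

```python
# Sketch: building an ffmpeg command that enables x264's intra refresh.
# File names and the helper are hypothetical; the flags are real libx264 options.
def intra_refresh_args(infile, outfile, fps=30, gop_seconds=4):
    keyint = fps * gop_seconds  # a GOP-length hint is still required
    return [
        "ffmpeg", "-i", infile,
        "-c:v", "libx264",
        # intra-refresh=1 spreads intra-coded blocks across many frames
        # instead of emitting one big IDR keyframe; keyint bounds the cycle
        "-x264-params", f"intra-refresh=1:keyint={keyint}",
        outfile,
    ]

args = intra_refresh_args("in.mp4", "out.mp4")
```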
One thing to consider: for some IRL performances, it's not uncommon that if you arrive late, you might be seated at the timing discretion of an usher. I understand digital experiences may carry different expectations, but I could see building an experience around this, perhaps starting with audio-only, and maybe even a countdown to the next keyframe event (every minute?) while a "please wait to be seated" message is shown.

> Therefore, the parameter -force_key_frames expr:gte(t,n_forced*4) is needed, which produces a key frame every 4 seconds.
How often would you like it to be producing key frames? My video experience is mostly with security cameras, and the ones I've used produce an I-frame every 2 seconds by default. Their encoders don't seem to be real high-quality; sometimes there's visible pulsing where the image will get worse until the next I-frame, so I wouldn't want to increase the interval much beyond that.
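The quoted `expr:gte(t,n_forced*4)` behaves like this little simulation: a keyframe is forced whenever the frame's timestamp `t` reaches `n_forced * 4`, where `n_forced` counts the keyframes forced so far (the function and variable names here are mine, not ffmpeg's).

```python
# Sketch: how -force_key_frames expr:gte(t,n_forced*4) behaves.
# A frame becomes a forced keyframe when t >= n_forced * interval,
# where n_forced is the number of keyframes forced so far.
def forced_keyframes(frame_times, interval=4.0):
    n_forced = 0
    keyed = []
    for t in frame_times:
        if t >= n_forced * interval:
            keyed.append(t)
            n_forced += 1
    return keyed

# 30 fps for 10 seconds -> keyframes at t = 0, 4, 8
times = [i / 30 for i in range(300)]
```

Dropping the interval to 2 (as the security cameras above default to) is just `forced_keyframes(times, interval=2.0)`.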
> WebRTC solves this problem using the RTP Control Protocol (RTCP). When a new user consumes a stream, they send a Full Intra Request (FIR) to the producer. When a producer receives this request, they insert a keyframe into the stream. This keeps the bitrate low while ensuring all the users can view the stream.
I'm writing an NVR that does live streaming with HTML5 Media Source Extensions, sending one-frame .mp4 fragments over a WebSocket connection. My approach to this problem is different: when a new client connects, I send them everything since the last key frame. IIRC there's more data in the I-frame than in the (up to 2 seconds of) P-frames since then, so this seems to work pretty well. If there were only an I-frame every minute (say), I'd probably look at that keyframe-insertion approach... there is an ONVIF command to insert a key frame, IIRC.
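The "replay everything since the last keyframe" idea can be sketched as a small buffer (class and method names are made up, fragments are simplified to tuples):

```python
# Sketch of the "replay from the last keyframe" approach described above.
# Fragments are (is_keyframe, payload) tuples; all names are hypothetical.
class LiveBuffer:
    def __init__(self):
        self.since_key = []  # fragments from the last keyframe onward

    def on_fragment(self, is_keyframe, payload):
        if is_keyframe:
            self.since_key = []  # restart the backlog at every keyframe
        self.since_key.append((is_keyframe, payload))

    def on_new_client(self):
        # a new viewer gets everything since the last keyframe, so their
        # decoder can start right away instead of waiting for the next I-frame
        return list(self.since_key)
```

With a 2-second GOP the backlog stays small; with a one-minute GOP it would grow large, which is where FIR-style keyframe insertion starts to look better.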
The worst I've seen in production was a poorly configured encoder that inserted a keyframe exactly every 30 seconds while cutting segments every 10 seconds, which, surprisingly, caused some players to crash while trying to find a keyframe.
I would expect where latency isn’t a huge concern, the best user experience would be to start the new receiver back at the last keyframe and fill the buffer up to “present” so they can start watching instantly, and keep a few seconds in the buffer for stability.
In more latency-critical streams where you still want the perception of instant video startup, I suppose you would have to start at the last keyframe, and then as soon as the next key frame came through you could just jump ahead.
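The two-step startup described above (start at the last keyframe, jump to live at the next one) can be sketched as a tiny plan over a frame sequence; the function and its inputs are hypothetical:

```python
# Sketch: start playback at the most recent keyframe for an instant picture,
# then jump ahead at the next keyframe to shed the startup latency.
def playback_plan(frames, join_index):
    """frames: list of 'I' or 'P'; join_index: frame at which the viewer joins."""
    # start at the most recent keyframe at or before the join point
    start = max(i for i in range(join_index + 1) if frames[i] == "I")
    # the first keyframe after the join point is a safe place to jump to live
    jump = next(
        (i for i in range(join_index + 1, len(frames)) if frames[i] == "I"),
        None,
    )
    return start, jump
```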
Why not just (progressive) download, watch/save, delete? IOW, playback from saved file.
Better for a variety of conditions, e.g., the connection might be slow.
(And I suppose also when seeking inside a stream)
Another big piece missing here is congestion control. It isn’t just about keeping the bitrate low, but figuring out how much bandwidth you can actually use. Measuring RTT/loss to work out what’s available is a really interesting topic. You don’t get that in ffmpeg or GStreamer yet. The best intro to this is the BBR IETF doc, IMO [1]
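The simplest form of this idea (far cruder than BBR) is a loss-driven AIMD controller: probe the bitrate upward while the link is clean, back off multiplicatively on loss. All thresholds and numbers below are illustrative, not from any spec:

```python
# Toy loss-based rate controller (additive-increase, multiplicative-decrease).
# Much simpler than BBR, but it shows the shape of the problem: use the
# feedback (loss here; BBR also models RTT and delivery rate) to pick a rate.
def adapt_bitrate(bitrate_kbps, loss_fraction,
                  increase_kbps=100, backoff=0.85,
                  floor=300, ceiling=8000):
    if loss_fraction > 0.02:          # meaningful loss: back off
        bitrate_kbps = int(bitrate_kbps * backoff)
    else:                             # clean interval: probe upward
        bitrate_kbps += increase_kbps
    return max(floor, min(ceiling, bitrate_kbps))
```

In WebRTC the loss/RTT feedback would come from RTCP Receiver Reports; the new target would be fed back into the encoder's bitrate setting.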
[0] https://github.com/pion/webrtc/tree/master/examples/rtp-to-w...
[1] https://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congest...
Still, thanks for the article; it is always interesting to see specific applications of the FFmpeg command line, because in my opinion, having read them top to bottom, the FFmpeg docs are very lacking when it comes to explaining the whys.
Random example: you read the docs of genpts and it is something along the lines of "Enables generation of PTS timestamps". Well, thank you (/s). But really, when should I use it? What does it actually change between using it or not? What scenarios would benefit from using it? Etc., etc.
So yeah, that won't work to stream to a WebRTC endpoint as you said!
but at least it's awesome software for free, so who am I to complain?
The post ends at RTP out from FFMPEG. Maybe I’m supposed to know how to consume that with WebRTC but in my investigation it’s not at all straightforward... the WebRTC consumer needs to become aware of the stream through a whole complicated signaling and negotiation process. How is that handled after the FFMPEG RTP stream is produced?
First, you would need to encrypt the RTP packets with SRTP, using keys derived from a DTLS handshake.
Then, you would need an SDP message generator, where you would include all sorts of info:
* Codec and tunings of video and audio streams.
* RTCP ports where you'll be listening for RTCP Receiver feedback, if any.
* The DTLS certificate fingerprint used to set up the encryption keys.
* Some fake ICE candidates that the other party can use to reach you.
Then provide this as an SDP Offer to the WebRTC API of the other side (i.e. the RTCPeerConnection if we're talking about a web browser), and receive in response an SDP Answer. You should then be able to parse this Answer because the other participant might have rejected some of the parameters you gave it in the Offer (e.g. it could be ready only for audio and reject your video). Or just ignore the Answer and hope that you know the other party so well that they won't reject any of the parameters you provided in the Offer.
Finally, you would need to receive ICE candidates from the other party, and parse them in order to know where (what IP and port) to send your RTP packets (and RTCP Sender Reports, if any).
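The kind of SDP offer described above looks roughly like this sketch. Every value here is a placeholder, and a real WebRTC offer needs many more attributes (BUNDLE, rtcp-mux, `a=setup`, etc.); this just shows where the pieces from the list land:

```python
# Sketch of a minimal SDP offer carrying the pieces listed above:
# codec info, DTLS fingerprint, ICE credentials, and one candidate.
# All values are placeholders; a real offer needs many more attributes.
def make_offer(fingerprint, ufrag, pwd, candidate_ip, candidate_port):
    lines = [
        "v=0",
        "o=- 0 0 IN IP4 127.0.0.1",
        "s=-",
        "t=0 0",
        f"m=video {candidate_port} UDP/TLS/RTP/SAVPF 96",
        "a=rtpmap:96 H264/90000",                # codec and RTP clock rate
        f"a=fingerprint:sha-256 {fingerprint}",  # DTLS certificate hash
        f"a=ice-ufrag:{ufrag}",
        f"a=ice-pwd:{pwd}",
        # one host candidate the remote side can try to reach us at
        f"a=candidate:1 1 udp 2130706431 {candidate_ip} {candidate_port} typ host",
    ]
    return "\r\n".join(lines) + "\r\n"
```

You would hand the resulting string to `RTCPeerConnection.setRemoteDescription()` on the browser side and read the Answer back the same way.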
Twitch uses its own transcoding system. Here is an interesting read from their engineering blog [0]
[0] https://blog.twitch.tv/en/2017/10/10/live-video-transmuxing-...
https://developer.apple.com/documentation/http_live_streamin... https://tools.ietf.org/html/draft-pantos-hls-rfc8216bis-08
This should be `-bsf:v`, and it's not required here since this command re-encodes and the encoder has already been informed via `-level`.
G.711 is also mandated by the spec, but it's a low-quality codec intended for speech, with a fixed 8 kHz sampling rate. There are a few other codecs supported by Chrome and Safari but not Firefox.
https://developer.mozilla.org/en-US/docs/Web/Media/Formats/W...