This is the version of the JS code that I got going (I couldn't reason about straight inline scripting, so I made some unnecessary classes; you don't need them): https://gist.github.com/emehrkay/1ea9a87a91e00b27843d9b71a3c...
You also need to tell nginx to proxy the wss connection with HTTP/1.1, or the handshakes fail:
```
location /websocket/path {
    proxy_pass http://whateverSiteDotCom;
    proxy_http_version 1.1;
    proxy_set_header Connection "upgrade";
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Origin '';
}
```
A bit dense, and could use some error handling...but it actually seems to work fine!
- this blogpost: 100 loc
- pion (open source): 100k loc?
- dolby.io / agora: I'm guessing >1m loc
- zoom... even more?
Not to mention all the features that people actually want, like muting, toggling video, and noise detection/cancellation.
So yeah, setting up a P2P video chat in 2021 is somewhat easy. Until it's not.
So of course it's hard. Nothing is built with video chat in mind, especially nothing that's existed for 30+ years like the Web. Our solutions are janky and feel bolted-on because they are.
Also, I think video (especially live-streamed video) is hands-down the hardest format to work with in computing. It's simultaneously network, disk, memory, and processor intensive, and doubly so with 2+ streams at the same time. We try to fix some of this with compression, but that just makes the codecs more complex, which makes it harder to work with...
Truth is though, you could "just add video chat," if you accept using a video chat vendor, of which there are probably hundreds (WebEx, Google Meet, Microsoft Teams, Discord, off the top of my head). But that means offloading the complexity to someone else. In many cases that's the right call. In OP's case this was clearly meant to be a learning experience so rolling something DIY is of course acceptable. Hard to estimate, maybe, but of course it would be hard to estimate something you don't know anything about and have never done before. Would a 4th grader be good at estimating how long it would take them to learn enough abstract algebra to start publishing papers on it?
EDIT: I guess one part might be that people are less likely to recognize specializations than in other disciplines?
"Oh yeah, and maybe some power, IDK maybe three phase. Also PoE and smart lighting and some insulation and enough room for a CNC machine and..."
It would be better if this sort of thing was heavily caveated ("this is the Hello World of WebRTC") because otherwise a lot of people (non-technical types, junior engineers) see it and think -- well we can do that, should take us a few weeks max.
sean@SeanLaptop:~/go/src/github.com/pion$ find . -type f -name '*.go' | xargs wc
47871 162998 1394063 total
pion/webrtc is the largest package with 58k lines. Every other package (ICE, DTLS, SCTP...) is around 20k lines. It feels wrong that WebRTC is so large (and not pushed into sub-packages); I'll for sure be digging into that for fun in the next few weeks :)

A couple of reasons for the size:

- A lot of examples: https://github.com/pion/webrtc/tree/master/examples
- A lot of tests: if you exclude `_test.go` and `examples/` you are down to ~58k lines, which is only ~3x bigger than the (much simpler!) ICE and SCTP packages.
With a naive exclude via grep -v '_test.go' and grep -v 'examples/*' we are down to:
16180 58136 498113 total

To stably build a negotiation system you'll probably need an infrastructure of WebSockets and some kind of NoSQL DB to handle identity and other quirks around negotiation...
For example: how do you handle a refresh, a new tab, or reconnecting after the connection has dropped? Some kind of device signature is probably needed too!
(We've just spent a year building this for ecommerce @ https://yown.it)
BIG thumbs up for the interest in WebRTC though, enormous potential...
Have a look at WebTransport to see a future alternative with potential.
For those who are interested: the technical term is signalling (not negotiation), and there are many providers that will help with that (ably.com, pubnub.com, pusher.com); you don't need to build your own infrastructure. WebSockets is also just one option.
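To make "signalling" concrete, here's a minimal sketch of what a signalling relay actually does: it just forwards opaque offer/answer/candidate messages between named peers, without understanding any of the SDP inside. The names (`SignallingRelay`, `route`, the message shape) are made up for illustration; in practice the registered connections would be WebSockets.

```javascript
// A signalling relay only routes opaque messages between peers; it never
// inspects the SDP or ICE candidates it carries.
class SignallingRelay {
  constructor() {
    this.peers = new Map(); // peerId -> connection-like object with send()
  }

  register(peerId, conn) {
    this.peers.set(peerId, conn);
  }

  // msg: { type: 'offer' | 'answer' | 'candidate', from, to, payload }
  route(msg) {
    const target = this.peers.get(msg.to);
    if (!target) return false; // target offline; caller decides how to retry
    target.send(msg);
    return true;
  }
}
```

The important point is how little is in there: the hard parts (presence, reconnects, message ordering) are exactly what the hosted providers above sell you.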
Using an SFU/MCU is almost a requirement for multi-person calls, and becomes more important for bigger groups.
I had a look at yown.it; I don't know what it does, your description of it is a bit vague. The problems you mention are not hard to solve: "device signature"? You just set a cookie. Connection dropped? Cookie got you covered. New tab? Cookie got you covered. Refresh? Cookie got you covered.
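For what the cookie approach looks like in practice, a minimal sketch (the helper names are made up; in a browser you'd read `document.cookie` and write the id back with an expiry):

```javascript
// Parse a "k=v; k2=v2" cookie string into an object.
function parseCookies(cookieHeader) {
  const out = {};
  for (const part of cookieHeader.split(';')) {
    const i = part.indexOf('=');
    if (i > 0) out[part.slice(0, i).trim()] = decodeURIComponent(part.slice(i + 1).trim());
  }
  return out;
}

// Reuse the stored device id if present, otherwise mint a new one.
// makeId is injectable so callers can plug in a real UUID generator.
function getOrCreateDeviceId(cookieHeader, makeId = () => Math.random().toString(36).slice(2)) {
  const cookies = parseCookies(cookieHeader);
  return cookies.deviceId || makeId();
}
```

The same id then survives refreshes, new tabs, and dropped connections, which is the whole argument.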
Other interesting technologies are:
Twilio's network traversal service: https://www.twilio.com/stun-turn
Agora's higher level products (e.g. video call, voice call) https://www.agora.io/en
Given we allow anonymous connections, we need to associate each WebRTC connection with user-defined data (read: a user profile). It's not quite as simple as "a cookie" because one user can have multiple devices, updated user information has to sync across the other connections, and for a smooth experience you need synced connection statuses.
We did look at syncing all this with RTC data channels. Problem: you can't get message history, and you also can't depend on the channel until after a successful negotiation, which again for us is only part of the larger infrastructure...
This forces the use of a parallel comms system such as websockets, allowing for event based synchronisation as well as the organisation of the WebRTC metadata both pre and post connection...
Most people don't want "naked javascript" with two faces on it. WebRTC is a fantastic tool for video and audio streaming, but it is limited in its wider use (which is perfectly fine, it does enough!)...
I think the problem is that people associate "video chat" with just the media streaming, whereas the reality is that integrating it into a feature-rich front-end framework is significantly more complicated, and not simply a case of "adding a cookie".
The difference between the solutions you posted and websockets is as far as I can tell, "your own websockets" or "pay someone else to run your websockets".
It was really hard to make p2p work and debugging the ice connections was even harder.
I am working on an open-source book that includes a WebRTC networking chapter[0]. I would love your opinions/feedback on whether this would actually have been helpful when learning this stuff!
I too experimented with a P2P Golang webchat setup. All the jargon was confusing and very hard to look up. This post has already given me much more clarity!
1. Backed by an OS manufacturer that doesn't care about the web
2. Spends more time working on features that suit itself than meeting standards agreed upon by a body of which they're a part
3. The only sanctioned/allowed browser on their platform (MS didn't even achieve this holy grail)
4. Lagging behind most other popular browsers by years in some cases
But due to it being the ONLY browser that'll run on iOS, I have no choice but to dumb down user experience for it. This year's lovely issue has been MediaRecorder - but supposedly that's made it into the most recent release.
Which, as it turns out, is a lot of users. I've seen estimates in the range of 10 to 20% of users. Which means, for a random selection of 7 users, you pretty much have a 50/50 chance of not being able to peer everyone using just STUN.
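That "50/50" figure checks out as a back-of-envelope calculation: if each user independently sits behind a symmetric NAT with probability p, the chance that all n users are STUN-reachable is (1 - p)^n. A quick sketch:

```javascript
// Probability that every one of n users can be reached with STUN alone,
// given each is behind a symmetric NAT with independent probability p.
function pAllReachable(n, pSymmetricNat) {
  return Math.pow(1 - pSymmetricNat, n);
}
// At the low end of the estimate (10%), 7 users gives 0.9^7 ~= 0.48,
// i.e. roughly a coin flip; at 20% it drops to 0.8^7 ~= 0.21.
```

So even the optimistic end of the estimate lands almost exactly on 50/50 for a 7-person call.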
Unless you're capping the video bitrate, the browser will try to use whatever its default target is for each connection. On Chrome that's 3 Mb/s, which is a lot of network bandwidth, and it turns out to be a lot of CPU as well, just shuffling those packets through the encoding->sending->bandwidth-estimation and receiving->decoding->rendering pipelines.
Capping the video bitrate is more complicated and confusing than it should be. It's better now that the browser implementations are all more or less closing in on "WebRTC 1.0" compliance. But you still need to reach into either the raw SDP you are exchanging during signaling, or the RTCPeerConnection objects, and set the encoding bitrate target.
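For the "raw SDP" route, one common trick is to insert a `b=AS:` (bandwidth, in kbps) line into the video media section before handing the SDP to setLocalDescription/setRemoteDescription. A sketch (the function name is made up; where supported, `RTCRtpSender.setParameters({ encodings: [{ maxBitrate }] })` is the cleaner "RTCPeerConnection objects" route):

```javascript
// Insert a b=AS:<kbps> bandwidth cap into the m=video section of an SDP.
function capVideoBitrate(sdp, kbps) {
  const lines = sdp.split('\r\n');
  const out = [];
  let inVideo = false;
  for (const line of lines) {
    // Track which media section we're in; only touch the video one.
    if (line.startsWith('m=')) inVideo = line.startsWith('m=video');
    if (inVideo && line.startsWith('b=AS:')) continue; // drop any existing cap
    out.push(line);
    // The b= line belongs right after the section's c= (connection) line.
    if (inVideo && line.startsWith('c=')) out.push('b=AS:' + kbps);
  }
  return out.join('\r\n');
}
```

This is exactly the kind of fiddly, format-level munging the comment is complaining about, which is why "WebRTC 1.0" APIs moving this into object properties is an improvement.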
The SaaS platforms that offer WebRTC APIs and infrastructure all do a lot of work under the covers to set bitrate caps, track constraints (resolution, for example), and other bits and pieces of WebRTC config that work well on a wide variety of networks, devices, and browsers.
There is a little more nuance than just paternalistic networks, though. In some cases, like NAT mapping exhaustion, you just can't give an individual user multiple long-lived mappings. Address-dependent filtering/mapping also makes sense in some cases. It makes P2P harder, but it does give you the ability to provide your users more sessions at least!
https://medium.com/the-making-of-whereby/what-kind-of-turn-s...
If I were to guess, the problem GP is facing is bandwidth: a mesh network uses quadratically more bandwidth in total. For each user the cost is linear: N more people requires N more streams. That's fine for downloads, but uploading N more streams can be much more challenging on certain networks.
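The mesh arithmetic is worth spelling out: in a full mesh each of the n participants uploads its stream to the other n-1 peers, so per-user upload grows linearly while total traffic grows quadratically. A tiny sketch:

```javascript
// In a full-mesh call, each participant sends one stream to every other peer.
function uploadsPerUser(n) {
  return n - 1;
}

// Total unidirectional streams across the whole mesh.
function totalStreams(n) {
  return n * (n - 1);
}
// Example: at a ~3 Mb/s per-stream target (Chrome's default, per the
// comment above), a 6-person mesh asks each client for ~15 Mb/s of upload.
```

Upload is the binding constraint because most residential links are asymmetric, which is exactly why SFUs (one upload, server fans out) take over beyond small groups.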
So the title is a little click-baity. If the backend weren't distributed, the title would be a little more apt.
If you are looking for a native option, use [0] or [1]; you can send anything from ffmpeg to WebRTC. ffmpeg itself doesn't support WebRTC, so you need to use something else for that last part.
[0] https://github.com/rviscarra/webrtc-remote-screen
[1] https://github.com/pion/webrtc/tree/master/examples/rtp-to-w...