This is the version of the JS code that I got going (I couldn't reason about straight inline scripting, so I made some unnecessary classes; you don't need them): https://gist.github.com/emehrkay/1ea9a87a91e00b27843d9b71a3c...
You also need to tell nginx to proxy the wss connection with HTTP/1.1, or the handshakes fail:
```
location /websocket/path {
    proxy_pass http://whateverSiteDotCom;
    proxy_http_version 1.1;
    proxy_set_header Connection "upgrade";
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Origin '';
}
```
A bit dense, and could use some error handling...but it actually seems to work fine!
- this blogpost: 100 loc
- pion (open source): 100k loc?
- dolby.io / agora: I'm guessing >1m loc
- zoom... even more?
Not to mention all the features that people actually want, like muting, toggling video, and noise detection/cancellation.
So yeah, setting up a P2P video chat in 2021 is somewhat easy. Until it's not.
So of course it's hard. Nothing is built with video chat in mind, especially nothing that's existed for 30+ years like the Web. Our solutions are janky and feel bolted-on because they are.
Also, I think video (especially live-streamed video) is hands-down the hardest format to work with in computing. It's simultaneously network, disk, memory, and processor intensive, and doubly so with 2+ streams at the same time. We try to fix some of this with compression, but that just makes the codecs more complex, which makes it harder to work with...
Truth is though, you could "just add video chat," if you accept using a video chat vendor, of which there are probably hundreds (WebEx, Google Meet, Microsoft Teams, Discord, off the top of my head). But that means offloading the complexity to someone else. In many cases that's the right call. In OP's case this was clearly meant to be a learning experience so rolling something DIY is of course acceptable. Hard to estimate, maybe, but of course it would be hard to estimate something you don't know anything about and have never done before. Would a 4th grader be good at estimating how long it would take them to learn enough abstract algebra to start publishing papers on it?
EDIT: I guess one part might be that people are less likely to recognize specializations than in other disciplines?
"Oh yeah, and maybe some power, IDK maybe three phase. Also PoE and smart lighting and some insulation and enough room for a CNC machine and..."
It would be better if this sort of thing was heavily caveated ("this is the Hello World of WebRTC") because otherwise a lot of people (non-technical types, junior engineers) see it and think -- well we can do that, should take us a few weeks max.
sean@SeanLaptop:~/go/src/github.com/pion$ find . -type f -name '*.go' | xargs wc
47871 162998 1394063 total
pion/webrtc is the largest package with 58k lines. Every other package (ICE, DTLS, SCTP...) is around 20k lines. It feels wrong that WebRTC is so large (and not pushed into sub-packages); I'll for sure be digging into that for fun in the next few weeks :)

A couple of reasons for the size:

- A lot of examples: https://github.com/pion/webrtc/tree/master/examples
- A lot of tests: if you exclude `_test.go` and `examples/` you are down to ~58k lines, which is only ~3x bigger than the (much simpler!) ICE and SCTP packages.
With a naive exclude via grep -v '_test.go' and grep -v 'examples/*' we are down to:
16180 58136 498113 total

To stably build a negotiation system you'll probably need an infrastructure of WebSockets and some kind of NoSQL DB to handle identity and other quirks around negotiation...
For example: how do you handle a refresh, a new tab, or reconnecting after the connection has dropped? Some kind of device signature is probably needed too!
(We've just spent a year building this for ecommerce @ https://yown.it)
BIG thumbs up for the interest in WebRTC though, enormous potential...
Have a look at WebTransport to see a future alternative with potential.
For those who are interested: the technical term is signalling (not negotiation), and there are many providers that will help with that (ably.com, pubnub.com, pusher.com); you don't need to build your own infrastructure. WebSockets is also just one option.
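To make "signalling" concrete, here's a minimal sketch of what a signalling relay actually does: it just forwards opaque offer/answer/candidate messages between named peers, without understanding any of the SDP inside. The names (`SignallingRelay`, `route`, the message shape) are made up for illustration; in practice the registered connections would be WebSockets.

```javascript
// A signalling relay only routes opaque messages between peers; it never
// inspects the SDP or ICE candidates it carries.
class SignallingRelay {
  constructor() {
    this.peers = new Map(); // peerId -> connection-like object with send()
  }

  register(peerId, conn) {
    this.peers.set(peerId, conn);
  }

  // msg: { type: 'offer' | 'answer' | 'candidate', from, to, payload }
  route(msg) {
    const target = this.peers.get(msg.to);
    if (!target) return false; // target offline; caller decides how to retry
    target.send(msg);
    return true;
  }
}
```

The important point is how little is in there: the hard parts (presence, reconnects, message ordering) are exactly what the hosted providers above sell you.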
Using an SFU/MCU is almost a requirement for multi-person calls, and becomes more important for bigger groups.
I had a look at yown.it; I don't know what it does, your description of it is a bit vague. The problems you mention are not hard to solve: "device signature"? You just set a cookie. Connection dropped? Cookie got you covered. New tab? Cookie got you covered. Refresh? Cookie got you covered.
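For what the cookie approach looks like in practice, a minimal sketch (the helper names are made up; in a browser you'd read `document.cookie` and write the id back with an expiry):

```javascript
// Parse a "k=v; k2=v2" cookie string into an object.
function parseCookies(cookieHeader) {
  const out = {};
  for (const part of cookieHeader.split(';')) {
    const i = part.indexOf('=');
    if (i > 0) out[part.slice(0, i).trim()] = decodeURIComponent(part.slice(i + 1).trim());
  }
  return out;
}

// Reuse the stored device id if present, otherwise mint a new one.
// makeId is injectable so callers can plug in a real UUID generator.
function getOrCreateDeviceId(cookieHeader, makeId = () => Math.random().toString(36).slice(2)) {
  const cookies = parseCookies(cookieHeader);
  return cookies.deviceId || makeId();
}
```

The same id then survives refreshes, new tabs, and dropped connections, which is the whole argument.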
Other interesting technologies are:
Twilio's network traversal service: https://www.twilio.com/stun-turn
Agora's higher level products (e.g. video call, voice call) https://www.agora.io/en
Given we allow anonymous connections, we need to associate each WebRTC connection with user-defined data (read: a user profile). It's not quite as simple as "a cookie" because one user can have multiple devices, updated user information has to sync across the other connections, and for a smooth experience you need synced connection statuses.
We did look at syncing all this with RTC data channels. Problem: you can't get message history, and you also can't depend on the channel until after a successful negotiation, which again for us is only part of the larger infrastructure...
This forces the use of a parallel comms system such as websockets, allowing for event based synchronisation as well as the organisation of the WebRTC metadata both pre and post connection...
Most people don't want "naked javascript" with two faces on it. WebRTC is a fantastic tool for video and audio streaming, but it is limited in its wider use (which is perfectly fine, it does enough!)...
I think the problem is that people associate "video chat" with just the media streaming, whereas the reality is that integrating it into a feature-rich front-end framework is significantly more complicated, and not simply a case of "adding a cookie".
The difference between the solutions you posted and websockets is as far as I can tell, "your own websockets" or "pay someone else to run your websockets".
It was really hard to make p2p work and debugging the ice connections was even harder.
I am working on an open-source book that includes a WebRTC networking chapter[0]. I would love your opinions/feedback on whether this would actually have been helpful when learning this stuff!
I too experimented with a P2P Golang webchat setup. All the jargon was confusing and very hard to look up. This post has already given me much more clarity!
1. Backed by an OS manufacturer that doesn't care about the web
2. Spends more time working on features that suit itself than meeting standards agreed upon by a body of which they're a part
3. The only sanctioned/allowed browser on their platform (MS didn't even achieve this holy grail)
4. Lagging behind most other popular browsers by years in some cases
But due to it being the ONLY browser that'll run on iOS, I have no choice but to dumb down user experience for it. This year's lovely issue has been MediaRecorder - but supposedly that's made it into the most recent release.
Which, as it turns out, is a lot of users. I've seen estimates in the range of 10 to 20% of users. Which means, for a random selection of 7 users, you pretty much have a 50/50 chance of not being able to peer everyone using just STUN.
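That "50/50" figure checks out as a back-of-envelope calculation: if each user independently sits behind a symmetric NAT with probability p, the chance that all n users are STUN-reachable is (1 - p)^n. A quick sketch:

```javascript
// Probability that every one of n users can be reached with STUN alone,
// given each is behind a symmetric NAT with independent probability p.
function pAllReachable(n, pSymmetricNat) {
  return Math.pow(1 - pSymmetricNat, n);
}
// At the low end of the estimate (10%), 7 users gives 0.9^7 ~= 0.48,
// i.e. roughly a coin flip; at 20% it drops to 0.8^7 ~= 0.21.
```

So even the optimistic end of the estimate lands almost exactly on 50/50 for a 7-person call.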
Unless you're capping the video bitrate, the browser will try to use whatever its default target is for each connection. On Chrome that's 3 Mb/s, which is a lot of network bandwidth, and it turns out to be a lot of CPU as well, just shuffling those packets through the encoding->sending->bandwidth-estimation and receiving->decoding->rendering pipelines.
Capping the video bitrate is more complicated and confusing than it should be. It's better now that the browser implementations are all more or less closing in on "WebRTC 1.0" compliance. But you still need to reach into either the raw SDP you are exchanging during signaling, or the RTCPeerConnection objects, and set the encoding bitrate target.
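For the "raw SDP" route, one common trick is to insert a `b=AS:` (bandwidth, in kbps) line into the video media section before handing the SDP to setLocalDescription/setRemoteDescription. A sketch (the function name is made up; where supported, `RTCRtpSender.setParameters({ encodings: [{ maxBitrate }] })` is the cleaner "RTCPeerConnection objects" route):

```javascript
// Insert a b=AS:<kbps> bandwidth cap into the m=video section of an SDP.
function capVideoBitrate(sdp, kbps) {
  const lines = sdp.split('\r\n');
  const out = [];
  let inVideo = false;
  for (const line of lines) {
    // Track which media section we're in; only touch the video one.
    if (line.startsWith('m=')) inVideo = line.startsWith('m=video');
    if (inVideo && line.startsWith('b=AS:')) continue; // drop any existing cap
    out.push(line);
    // The b= line belongs right after the section's c= (connection) line.
    if (inVideo && line.startsWith('c=')) out.push('b=AS:' + kbps);
  }
  return out.join('\r\n');
}
```

This is exactly the kind of fiddly, format-level munging the comment is complaining about, which is why "WebRTC 1.0" APIs moving this into object properties is an improvement.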
The SaaS platforms that offer WebRTC APIs and infrastructure all do a lot of work under the covers to set bitrate caps, track constraints (resolution, for example), and other bits and pieces of WebRTC config that work well on a wide variety of networks, devices, and browsers.
There is a little more nuance than just paternalistic networks, though. In some cases, like NAT mapping exhaustion, you just can't give an individual user multiple long-lived mappings. Address-dependent filtering/mapping also makes sense in some cases. It makes P2P harder, but it does give you the ability to provide your users more sessions at least!
https://medium.com/the-making-of-whereby/what-kind-of-turn-s...
If I were to guess, the problem GP is facing is bandwidth: a mesh network uses quadratically more bandwidth in total. For each user the cost is linear: N more people requires N more streams. That's fine for downloads, but uploading N more streams can be much more challenging on certain networks.
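The mesh arithmetic is worth spelling out: in a full mesh each of the n participants uploads its stream to the other n-1 peers, so per-user upload grows linearly while total traffic grows quadratically. A tiny sketch:

```javascript
// In a full-mesh call, each participant sends one stream to every other peer.
function uploadsPerUser(n) {
  return n - 1;
}

// Total unidirectional streams across the whole mesh.
function totalStreams(n) {
  return n * (n - 1);
}
// Example: at a ~3 Mb/s per-stream target (Chrome's default, per the
// comment above), a 6-person mesh asks each client for ~15 Mb/s of upload.
```

Upload is the binding constraint because most residential links are asymmetric, which is exactly why SFUs (one upload, server fans out) take over beyond small groups.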
So the title is a little click-baity. If the backend weren't distributed, the title would be a little more apt.
If you are looking for a native option, use [0] or [1]; you can send anything from ffmpeg to WebRTC. ffmpeg itself doesn't support WebRTC, so you need to use something else for that last part.
[0] https://github.com/rviscarra/webrtc-remote-screen
[1] https://github.com/pion/webrtc/tree/master/examples/rtp-to-w...