Relies heavily on GStreamer under the hood for the video processing bits, and on mediapipe for the selfie segmentation bit. You obviously need v4l2loopback to emulate a webcam. It's functional, but feedback and contributions are welcome :)
See https://support.zoom.us/hc/en-us/articles/205759689-Release-... and https://support.zoom.us/hc/en-us/articles/360043484511
> The blurred background option is only available for the Windows and macOS desktop clients, as well as the Android and iOS mobile apps. Desktop clients must meet the "Image only without a physical green screen" requirements.
And OBS has lots of neat features - one of which is making my fans loud. Like a crowd cheering it on!
Especially if you use the Zoom desktop client. Zoom on the web (Firefox) seems to be easier on the CPU for some reason, though.
Going forward, I wonder if OBS will offer pipewire or gstreamer streams (if those are a thing) for consumption by other apps?
Dedicated recording apps for cameras and screens will give you better performance and occasionally a marginally better image since there are fewer steps in the pipeline.
For v4l2loopback, a short GStreamer pipeline (for those willing to use the CLI) will perform better and give you noticeably lower latency - even if some basic cropping is required, like if you're using a capture card with a camera that you can't hide the UI on.
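A rough sketch of what such a pipeline could look like, wrapped in Python only so the pieces have names. The device paths and crop values are assumptions for illustration, not anything from the comment above. v4l2src, videoconvert, videocrop and v4l2sink are standard GStreamer elements, but the caps your capture card needs may well differ.

```python
import shlex


def loopback_pipeline(src="/dev/video0", sink="/dev/video10",
                      top=0, bottom=0, left=0, right=0):
    """Build a gst-launch-1.0 argument list: capture -> crop -> loopback device.

    The default device paths are placeholders; check which /dev/video*
    nodes are your real camera and your v4l2loopback device.
    """
    return [
        "gst-launch-1.0",
        "v4l2src", f"device={src}", "!",
        "videoconvert", "!",
        # Crop away e.g. a camera UI overlay; values are in pixels.
        "videocrop", f"top={top}", f"bottom={bottom}",
        f"left={left}", f"right={right}", "!",
        "videoconvert", "!",
        "v4l2sink", f"device={sink}",
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(loopback_pipeline(top=40, bottom=40)))
```

The list can be passed to subprocess.run() as-is once the printed command looks right for your setup.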
You're likely using software encoding. My 2015 MBP has a GT750M, but "thanks" to Apple, OBS can't use hardware acceleration (https://obsproject.com/forum/threads/question-about-hardware...). The situation on Linux is likely similar.
Any ideas how I can do this on Linux?
There are many others similar to the above. They all basically work by using the v4l2loopback driver to create a loopback video device. Then you have the program consume video from the real webcam video device, apply a deep learning model that subtracts the background (leaving your face and torso) and replaces it with an image or video of your choice, and outputs the video to the loopback device. Then you configure your video conferencing software (zoom, teams, webex, whatever) to use the loopback device instead of the real one.
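To make the "subtract the background and replace it" step concrete, here is a minimal sketch of just the compositing part. It assumes the model has already produced a per-pixel mask between 0.0 (background) and 1.0 (person). The function name and the list-of-tuples frame layout are made up for illustration; real tools would do this on NumPy arrays or on the GPU.

```python
def composite(frame, background, mask):
    """Blend frame over background using a per-pixel mask in [0.0, 1.0].

    frame, background: rows of (r, g, b) tuples of the same size.
    mask: rows of floats, 1.0 meaning "person", 0.0 meaning "background".
    """
    out = []
    for frame_row, bg_row, mask_row in zip(frame, background, mask):
        out_row = []
        for fg, bg, m in zip(frame_row, bg_row, mask_row):
            # Weighted average per channel: mask picks the camera pixel,
            # (1 - mask) picks the replacement background pixel.
            out_row.append(tuple(round(m * f + (1.0 - m) * b)
                                 for f, b in zip(fg, bg)))
        out.append(out_row)
    return out
```

The result of each call is what would then be written to the loopback device, one frame at a time.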
I also just realized that if I place my laptop camera in the exact same spot, while recording the background video and during class, all I need to do is sit on the left of the camera pane during class, and only stay on the right for the background video, then I just need to stitch the two videos side-by-side in OBS and it should work well enough.
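For what it's worth, the stitching itself is trivial per frame. A toy sketch, assuming both videos come from the same camera position and have the same resolution. The function name and the row-of-pixels layout are hypothetical, and in practice OBS would do this with two cropped sources rather than any code.

```python
def stitch_halves(left_source, right_source):
    """Take the left half of each row from one frame and the right half from another.

    Both frames are lists of rows of equal width. Swap the arguments
    to get the opposite arrangement.
    """
    out = []
    for left_row, right_row in zip(left_source, right_source):
        split = len(left_row) // 2
        out.append(left_row[:split] + right_row[split:])
    return out
```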
I think I achieved my purpose here. Ask a bunch of smart people a question. Either they give a great answer, or their response forces you to rethink and figure out a simple solution. Thank you HN for being my AI-enabled rubber duck.
Use chroma key to remove the background and add a video there, possibly on a loop.
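A bare-bones illustration of what chroma keying does, assuming a physical green screen behind you. The "green dominates red and blue by some margin" test and the default threshold are simplifications chosen for this sketch; OBS's chroma key filter is more sophisticated.

```python
def is_green(pixel, dominance=40):
    """Treat a pixel as green screen if green exceeds both red and blue by a margin."""
    r, g, b = pixel
    return g - max(r, b) > dominance


def chroma_key(frame, background, dominance=40):
    """Replace green-screen pixels in frame with the matching background pixels.

    frame and background are rows of (r, g, b) tuples of the same size.
    """
    return [
        [bg if is_green(px, dominance) else px
         for px, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]
```

Feeding successive frames of a looping video in as background gives the "video there, possibly on a loop" effect.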
Would it be possible to avoid the conversion to RGB? (This forum thread says it's CPU-only: https://forums.developer.nvidia.com/t/videoconverts-performa...)
Right now SelfieSegmentation is just a thin wrapper around the selfie segmentation solution provided in https://google.github.io/mediapipe. It operates on RGB frames, so that's why I need the conversion. The model inference is also done on the CPU. Interestingly, there is a GPU mediapipe graph available, but I haven't looked into what's needed to use that yet.
And yes, Boxfilter is just a wrapper around OpenCV's boxFilter. This is probably the lowest-hanging fruit that could be moved to the GPU.
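For anyone unfamiliar with it, a box filter is just a mean over a square window. Below is a plain-Python sketch of the operation, not the project's code. It ignores neighbours outside the image, which is an assumption of this sketch and differs from OpenCV's default border handling. Using a hard-edged mask as input is also only a guess at a typical use.

```python
def box_filter(image, radius=1):
    """Replace each value with the mean of its (2*radius+1) x (2*radius+1) window.

    image is a list of rows of numbers. Neighbours that fall outside the
    image are skipped, so edge pixels average over a smaller window.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out
```

Every output pixel is computed independently from a small neighbourhood, which is the kind of work that maps well onto a GPU.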
Can't get it to work at the moment - but hopefully with a bit more experimenting, this will replace my current options.
Hadn't realised until reading this submission that Zoom has added blur support on Linux recently - on trying it out it does seem quite poor though - a very sharp edge and quite slow to update.