Over the last few months, my collaborators and I have put together a new, real-time method that makes NeRF models accessible from smartphones, laptops, and low-power desktops, and we think we’ve done a pretty stellar job! SMERF, as we like to call it, distills a large, high-quality NeRF into a real-time, streaming-ready representation that’s easily deployed to devices as small as a smartphone via the web browser.
On top of that, our models look great! Compared to other real-time methods, SMERF has higher accuracy than ever before. On large multi-room scenes, SMERF renders are nearly indistinguishable from state-of-the-art offline models like Zip-NeRF and a solid leap ahead of other approaches.
The best part: you can try it out yourself! Check out our project website for demos and more.
If you have any questions or feedback, don’t hesitate to reach out by email (smerf@google.com) or Twitter (@duck).
The effect is a bit spooky. I felt like a ghost going through walls.
A girl is eating her morning cereal. Suddenly she looks apprehensively at the fridge. Camera dollies towards the appliance and seamlessly penetrates the reflective surface, revealing a deep hidden space that exactly matches the reflection. At the dark end of the tunnel, something stirs... A wildly grinning man takes a step forward and screams.
The barely-there reflection on the Berlin TV is also a trip to enter, and observe the room from.
Either Matterport takes this and runs with it, or this is a startup waiting to disrupt real estate.
I can’t believe how smooth this ran on my smartphone.
Feedback: if there were a mode that used the phone's compass and gyro for navigation, it’d feel natural. It felt weird to navigate with my fingers and figure out how to move along the x, y, and z axes.
As others have said, VR mode would be epic.
I think if you take this 1-2 steps further and combine it with hallucinating already-owned furniture, or furniture that matches the prospective buyer's taste, into the property, this will make it a lot easier to fall for a home.
I agree, we could do better with the movement UX. A challenge for another day.
Are you accepting pull requests?
Take for instance the fulllivingroom demo. (I prefer fps mode.)
1) How many images are input?
2) How long does it take to compute these models?
3) How long does it take to prepare these models for this browser, with all levels, etc?
4) Have you tried this in VR yet?
1) Around 100-150 if memory serves. This scene is part of the mip-NeRF 360 benchmark, which you can download from the corresponding project website: https://jonbarron.info/mipnerf360/
2) Between 12 and 48 hours, depending on the scene. We train on 8x V100s or 16x A100s.
3) The time for preparing assets is included in 2). I don't have a breakdown for you, but it's something like 50/50.
4) Nope! A keen hacker might be able to do this themselves by editing the JavaScript code. Open your browser's DevTools and have a look -- the code is all there!
https://github.com/smerf-3d/smerf-3d.github.io/blob/main/vie...
For VR, there’s going to be some very weird depth data from those reflections, but maybe they would not be so bad when you are in headset.
What I haven't seen yet is any feature and object detection, blocking, and extraction.
Hopefully a more efficient and streamable codec necessitates the sort of structure that lends itself more easily to analysis.
It is funny how quickly goalposts move! I love to see progress though, and wow, is progress happening fast!
Found by putting "nerf sam segment 3d" into DuckDuckGo.
Sounds like an opportunity!
Does this relieve the computation constraint enough to run on Quest 2/3?
Is there something else that would prevent binocular use?
It's a sort of replayable cutscene that happens a couple of times in the game, and you can wander through it. The noteworthy bit is that it's rendered out of voxels that look very similar to the demos but at a much lower resolution, and if you push the frustum into any objects, you get the same kind of effect where the surface breaks into blocks.
This is similar to Block-NeRF [0]; on their project page they show some videos of what you’re asking about.
As for an easy way of doing this, nothing out-of-the-box. You can keep an eye on nerfstudio [1], and if you feel brave you could implement this paper and make a PR!
For example, a 3rd-person game where the character you control and the NPCs/enemies are rasterized but the environment is all radiance fields.
In the meantime, give it a shot in a WebKit- or Chromium-based browser. I've had good results with Safari on iPhone and Chrome on Android/MacBook/Windows.
WebRender initialization failed Blocklisted; failure code RcANGLE(no compositor device for EGLDisplay)(Create)_FIRST 3D11_COMPOSITING runtime failed Failed to acquire a D3D11 device Blocklisted; failure code FEATURE_FAILURE_D3D11_DEVICE2
I'm running a 3060
It makes it harder for me to trust the paper if I feel like the paper is trying to persuade me of something rather than describe the complete findings.
By no means do I claim that SMERF is the be-all and end-all in real-time rendering, but I do believe it's a solid step in the right direction. There are all kinds of ways to improve this work and others in the field: smaller representation sizes, faster training, higher quality, and fewer input images would all make this technology more accessible.
My one note is that it took a really long time to load all the images - the scene wouldn't render until all ~40 initial images loaded. Would it be possible to start partially rendering as the images arrive, or do you need to wait for all of them before you can do the first big render?
You do however point out one weakness of SMERF: large payload sizes. If we can figure out how to compress them by 10x, it'll be a very different experience!
The benchmark details are a bit complicated. Check out the experiment section of the technical paper for the nitty-gritty details.
B. That's hardly anything in 2023.
My grandpa died 2 years ago and in hindsight I took pictures for using them as in your demo.
Awesome thanks:)
When I look at the NYC scene in the highest quality on desktop, I'm surprised by how low-quality some of it is, e.g. the stuff on the counter and shelves. So then I load the Lego model, and see that's _very_ detailed, so it doesn't seem inherent to the method.
Is it a consequence of input photo quality, or something else?
Thank you :)
> Is it a consequence of input photo quality, or something else?
It's more a consequence of spatial resolution: the bigger the space, the more voxels you need to maintain a fixed resolution (e.g. 1 mm^3). At some point, we have to give up spatial resolution to represent larger scenes.
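To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It assumes a dense cubic voxel grid, which is a deliberate simplification and not a description of SMERF's actual representation; the function names and the example scene sizes are made up for illustration.

```python
def voxels_needed(extent_mm: int, voxel_mm: int) -> int:
    """Voxel count for a dense cubic grid (an assumed, simplified model).

    extent_mm: side length of the cubic scene, in millimetres.
    voxel_mm:  side length of one voxel, in millimetres.
    """
    per_axis = -(-extent_mm // voxel_mm)  # ceiling division, integers only
    return per_axis ** 3


def finest_voxel_mm(extent_mm: int, voxel_budget: int) -> float:
    """Smallest voxel side (mm) that fits a cubic scene within a voxel budget."""
    # Integer cube root of the budget: start from a float guess, then correct it.
    per_axis = max(1, round(voxel_budget ** (1.0 / 3.0)))
    while per_axis ** 3 > voxel_budget:
        per_axis -= 1
    while (per_axis + 1) ** 3 <= voxel_budget:
        per_axis += 1
    return extent_mm / per_axis


if __name__ == "__main__":
    small = voxels_needed(extent_mm=1_000, voxel_mm=1)   # 1 m cube at 1 mm
    large = voxels_needed(extent_mm=10_000, voxel_mm=1)  # 10 m cube at 1 mm
    print(small, large, large // small)
    # Keep the small scene's budget but cover the 10 m cube instead:
    print(finest_voxel_mm(extent_mm=10_000, voxel_budget=small))
```

Under this simplified model, making the scene 10x larger per side at a fixed voxel size costs 1000x more voxels; holding the voxel budget fixed instead makes each voxel 10x coarser per side.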
A second limitation is the teacher model we're distilling. Zip-NeRF (https://jonbarron.info/zipnerf/) is good, but it's not _perfect_. SMERF reconstruction quality is upper-bounded by its Zip-NeRF teacher.
Question for the authors: are there any opportunities to reconstruct a model of a scene without using optimization or tuning methods?
We are refining efficient ways of rendering a view of a scene from these models, but the scenes remain static. The scenes also take a while to reconstruct.
Can we still achieve the great look and details of RF and GS without paying for an expensive reconstruction per instance of the scene?
Are there ways of greedily reconstructing a scene with traditional CG methods into these new representations now that they are fast to render?
Please forgive any misconceptions that I may have in advance! We really appreciate the work y'all are advancing!
If you know a way, let me know! Every system I'm aware of involves optimization in one way or another, from COLMAP to 3D Gaussian Splatting to Instant NGP and more. Optimization is a powerful workhorse that gives us a far wider range of models than a direct solver ever could.
> Can we still achieve the great look and details of RF and GS without paying for an expensive reconstruction per instance of the scene?
In the future I hope so. We don't have a convincing way to generate 3D scenes yet, but given the progress in 2D, I think it's only a matter of time.
> Are there ways of greedily reconstructing a scene with traditional CG methods into these new representations now that they are fast to render?
Not that I'm aware of! If there were, I think these works should be on the front page instead of SMERF.
I'd recommend starting with a tutorial on neural radiance fields, aka NeRF, (https://sites.google.com/berkeley.edu/nerf-tutorial/home) and an applied overview of Deep Learning with tools like PyTorch or JAX. This line of work is still "cutting edge" research, so a lot of knowledge hasn't been rolled up into textbook or article form yet.
UPDATE: The code for our web viewer is here: https://github.com/smerf-3d/smerf-3d.github.io/blob/main/vie...
but perhaps that was biased?
What a variety of groups! How did this come about?
NeRF Studio > Included Methods, Third-party Methods: https://docs.nerf.studio/#supported-methods
Neural Radiance Field: https://en.wikipedia.org/wiki/Neural_radiance_field
I wonder how long it takes before Google street view gets replaced by some NeRF variant.