Show HN: Generate Stable Diffusion scenes around 3D models

(github.com)

123 points · neilxm · 2y ago · 53 comments

3D-to-photo is an open-source tool for generative AI product photography that uses 3D models to allow fine camera-angle control in generated images.

If you have 3D models created using the iOS 3D scanner, you can upload them directly to 3D-to-photo and describe the scene you want to create. For example:

"on a city side walk"
"near a lake, overlooking the water"

Then click "generate" to get the final images.

The tech stack behind 3D-to-photo:

Handling 3D models on the web: @threejs
Hosting the diffusion model: @replicate
3D scanning apps: Shopify, Polycam3D or LumaLabsAI

43 comments · 11 top-level

mlboss · 2y ago · 8 in thread

Why does it need a 3D model? It looks like it is just doing inpainting, which can be done with a single image.

neilxm (OP) · 2y ago

Yeah, you don't absolutely need a 3D model. The benefit of 3D, though, is that you can define any camera angle and perspective of the main product. Of course you could just take a picture of a product from any angle if it were in front of you, but that's not always feasible. The use case here is scalable photo generation for e-commerce stores with thousands of inventory items.

Additionally, 3D means more than just camera angle control. You can define a scene in 3D and send it into ControlNet to produce a very specific image.

mlboss · 2y ago

Yeah, that makes sense. I think the challenge is how much you can automate vs. the quality.

quitit · 2y ago

I can imagine a few good uses for this:

1. Industrial designers and retailers can quickly flesh out a surround for their project/product in various angles and settings without needing a physical sample or a shoot, and without building scenes or matching perspectives. It would be trivial to automate this into an app so these designers don't need to know anything about AI.

2. People developing content with AI currently have to deal with the subject often varying from inference to inference. With this system the designer would use a model for the subject, and then let stable diffusion make the surrounds. This would reduce the fiddly work of trying to keep the subject consistent from image to image.

3. On-the-fly imagery: Imagine a retailer has a build-to-order (BTO) ordering system that can have many different options. For example, we'll say it's Mr Potato Head with hundreds of different noses, shoes, arms, and clothes. A retailer's website can generate realistic imagery of the customer's specific order on the fly, based on the BTO options selected. Instead of displaying the options in a generic template view, the preview image can be various scenes and settings, and these can also match the themes of the selected accessories. (E.g. your Mr Potato Head has a chef's hat, so he's in a kitchen cooking. Your Mr Potato Head has sunglasses, now he's on a beach. He's got a chef's hat and sunglasses: now he's BBQing, etc.)

4. Customised content: There are currently services where parents can customise a generic character to look like their child and order a series of books featuring their child. These are usually limited to skin colour, gender and the colour of the clothes. Using this tech the customisation and output imagery could take a significant leap.

neilxm (OP) · 2y ago

Awesome ideas! The configurability and flexibility of 3D models is a huge advantage over a pure 2D approach in these scenarios.

astrange · 2y ago

It doesn't appear that this project uses it, but a 3D model would give you all the other information like depth maps/normal maps that you would need to light the object itself properly. i.e. change the pixels of the object and not just draw a background around it.

neilxm (OP) · 2y ago

That’s the idea for next steps.

Basically, if you generate a backdrop and then estimate the light direction, you can inverse-render that onto the 3D model, given all the depth information you get for free from the model.

jcims · 2y ago

Automatic lifelike shadows appear to be one benefit.

neilxm (OP) · 2y ago

Yeah! Shadows on the item, or shadows propagated back onto the scene.

chankstein38 · 2y ago · 7 in thread

How is this different than just photoshopping a 2d image of a 3d object onto an SD generated background? Is it just meant to let people skip the step of generating a background and compositing? (Sorry, inpainting. But the distinction seems minimal here as people have been photoshopping 3d objects believably into scenes for decades before SD came around)

neilxm (OP) · 2y ago

Reading your question again, I should clarify: it's not quite compositing a random background under the 2D image. It's using Stable Diffusion to "in-paint" a background that makes sense spatially, with correct shadows, lighting and perspective to match your prompt. That being said, there's still a lot to be done to deeply integrate the 3D and 2D spaces, and that's just going to be part of the ongoing exploration in this project.
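For readers wondering what "in-painting around the render" involves mechanically, here is a rough sketch of the masking step. It is not taken from the project's code. It assumes the 3D view is exported as RGBA pixels with a transparent background, and that the inpainting model repaints white mask pixels and keeps black ones (a common convention, but check the model you use). The function name, the plain-tuple pixel format and the 128 threshold are all made up for illustration.

```python
# Sketch only: pixels are plain (R, G, B, A) tuples so the example runs
# without any image library. The 128 threshold is an illustrative assumption.

def inpaint_mask(rgba_rows, alpha_threshold=128):
    """Return 255 where a background should be generated (transparent pixels)
    and 0 where the rendered product must be kept (opaque pixels)."""
    return [
        [0 if alpha >= alpha_threshold else 255 for (_r, _g, _b, alpha) in row]
        for row in rgba_rows
    ]


# A tiny 2x3 "render": opaque product pixels in the middle column.
CLEAR = (0, 0, 0, 0)
SHOE = (200, 30, 30, 255)
render = [
    [CLEAR, SHOE, CLEAR],
    [CLEAR, SHOE, CLEAR],
]
mask = inpaint_mask(render)
```

The render and the mask would then be sent together with the text prompt to whichever inpainting endpoint is in use, so the model only ever paints the transparent region.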

chankstein38 · 2y ago

That's fair. I guess in my head they're still one and the same. This is compositing without the step of having to match shadows and find a background with a decent angle, etc. I had asked in my original post if this was to skip the step of compositing (because you would do things like fix shadows etc. if you were compositing) and that's what it sounds like it is, which is cool.

Do you communicate anything about the angle or anything to SD? (outside of just giving it the image with a transparent background)

I guess, given the additional context (and vs discussing the semantics of compositing), the better question is, how does this extend the capabilities of stable diffusion inpainting?

Is it any different than just putting your 3d model into photoshop or in a 3d viewer, exporting with a transparent background, and inpainting around it?

brucethemoose2 · 2y ago

This streamlines the process of pasting a cropped render in, and Stable Diffusion frequently doesn't get the context right, so I assume that's useful for rapidly testing the shoe in different angles.

neilxm (OP) · 2y ago

Yeah, so as of now it is a fairly loose integration between threejs "rendering" a 2D image of a scanned product at some orientation, and then Stable Diffusion inpainting a background around it.

However, there are a number of extensions here that make the integration of 2D and 3D more interesting going forward:

1. 3D models mean you can relight the model before running it through in-painting. Adding lights in 3D around the product, plus using a more raycast-based rendering system (which is now possible in-browser), means you can control the input to SD really well.

2. This one is the most interesting piece. You can create a 3D editor to allow very simple low-poly style scene creation, let's say a pedestal and a vase or something from a product photography standpoint. Then pass the depth map or Canny edges as conditioning through something like ControlNet, and you have a super controllable scene design tool that you can finely control, both in camera angle and perspective.
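As a rough illustration of the depth-map idea in point 2 (again, not the project's code), the sketch below turns per-pixel camera distances from a 3D scene into an 8-bit conditioning image. It assumes the depth-conditioned model expects near surfaces to be bright and empty background to be black, which is worth verifying against the specific ControlNet checkpoint. The function name and the use of None for "no geometry" are illustrative choices.

```python
def depth_to_conditioning(depth_rows):
    """Map camera distances to 0..255 with near = bright.

    `None` marks pixels where the ray hit no geometry; those become 0.
    The farthest surface also maps to 0, and a scene where every surface
    is at the same distance maps to all zeros.
    """
    valid = [d for row in depth_rows for d in row if d is not None]
    if not valid:
        return [[0 for _ in row] for row in depth_rows]
    near, far = min(valid), max(valid)
    span = (far - near) or 1.0  # avoid dividing by zero on a flat scene
    return [
        [0 if d is None else round(255 * (far - d) / span) for d in row]
        for row in depth_rows
    ]


# Distances in scene units for a 2x2 view; the top-left pixel is empty.
depth = [
    [None, 2.0],
    [1.0, 5.0],
]
conditioning = depth_to_conditioning(depth)
```

Because the depth comes straight from the 3D scene rather than from a monocular depth estimator, the layout the diffusion model sees matches the chosen camera angle exactly.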

scotty79 · 2y ago

You can adjust the pose of a thing you are pasting.

pj_mukh · 2y ago

Also this is inpainting, not compositing.

chankstein38 · 2y ago

Is it really very different? Mind you, just slapping the subject and a random background together would, accurately, not produce a very good result. But for decades before SD existed, people were able to accurately composite objects into images, matching shadows and manipulating lighting to fit.

The distinction doesn't feel worth commenting about overall.

causi · 2y ago · 3 in thread

God, we're so close to being able to feed a photo and some measurements into a program and get an accurate model out of it. I can't wait until my smartphone and a set of calipers can replace a $700 3D scanner.

chankstein38 · 2y ago

For what it's worth, this doesn't look like it's anything towards that. This is just letting you manipulate a 3d model into an angle, save a 2d image of it, and put it into a generated background. This doesn't turn 2d->3d or seem to do anything 3d except allow a 3d model to be loaded for a photoshoot.

neilxm (OP) · 2y ago

Pretty much. Additionally, I think NeRF is really promising for this use case. You don't really need a 3D model per se to view a thing from different angles. The only issue is lighting: both photogrammetry and NeRF have baked lighting, which really limits what you can do with the model. But there's a bunch of work towards solving that problem, so I can see it going away soon!

yieldcrv · 2y ago

3D Gaussian Splatting seems to be superseding NeRF.

NeRF might be a dead-end technique, in case you're trying to keep up.

tinytera · 2y ago · 3 in thread

Looks pretty cool. Can anyone comment on how to hack together the opposite? That is, going from a 2D object image to a 3D rendering with an in-painted background? Or is that not possible right now?

quitit · 2y ago

Do you mean transforming a sketch into a 3D-looking image (i.e. not a 3D mesh model)? If so, Stable Diffusion with ControlNet can do that using a good prompt and the SDXL model.

Or do you mean you have an existing photo of something and would like to add a realistic setting? There are a lot of ways to do this, but probably the easiest right now is the Generative Fill feature in Adobe Photoshop (beta).

yanma · 2y ago

Not sure if they are planning on releasing this, but you can mix an image2NeRF model (threestudio is a good repo for this) for the object's 3D model with an image2depth model like ZoeDepth to generate a 3D background.

pj_mukh · 2y ago

2D to 3D networks are pretty amazing right now.

High poly meshes and low poly textures are possible.

halyconWays · 2y ago · 3 in thread

This sounds pretty cool, do you have a demo or maybe a webm to put in the README.md?

neilxm (OP) · 2y ago

Good point. I haven't put up a web demo you can try without running the repo, but there's a video linked from the image at the very top of the README. Here's the video again; I should probably make it more obvious that the image is clickable :)

https://www.youtube.com/watch?v=P3yPn92v3u8

pj_mukh · 2y ago

Haha, a costly demo to run with the HN hug + inference costs...

neilxm (OP) · 2y ago

Yeaaa. Well, I could technically add a settings page where you add your Replicate API token, and run it on Vercel.

chrisnight · 2y ago · 2 in thread

Given that Stable Diffusion is designed to be able to run on consumer hardware, without the need for a third-party cloud platform, it saddens me to see that this, alongside many other similar projects, requires the use of a third-party platform for hosting the model, even for local usage. The tool itself does seem interesting though.

neilxm (OP) · 2y ago

So the API used for inpainting is very easily swappable for a local instance. But when you say consumer hardware, I think you might be overestimating how capable most people’s computing setups are.

Folcon · 2y ago

Isn't that to a degree a function of time?

There may be a point when that's not the case, but at present this stuff will run on hardware accessible to a consumer, assuming they're ok waiting a bit.

yieldcrv · 2y ago · 2 in thread

I took a quick look at the Python Flask code and I’m still not sure if there’s a reason for not just using Next’s server-side aspects. JS can do every operation I skimmed by.

Thoughts?

neilxm (OP) · 2y ago

Yes and no. I did it because I honestly get annoyed with pixel operations in Node.js. The same thing in Python with the Pillow library takes a quarter of the lines of code you would need with something like sharp in Node.

The other reason I used a Python backend was that I want to extend it to more involved image processing, like producing ControlNet inputs or post-processing the end result.

Does that make sense? Open to better ideas though.

yieldcrv · 2y ago

yeah it makes sense, but if you’re already offloading everything to replicate.ai it makes less sense

just extra stacks to keep track of and keep up with

spiderxxxx · 2y ago · 2 in thread

How is the lighting on the model? I assume you can't do anything other than overcast days, because the lighting isn't specified.

neilxm (OP) · 2y ago

So because this is a threejs canvas, we could build an interface to add lights anywhere, to adjust lighting on the model. This is an early version so it's not in there, but it's really easy to add in code. Perhaps that will be the next feature.

bsenftner · 2y ago

You need to use your Stable Diffusion generated background environment to also generate spherical environment illumination maps so you can do image based lighting. That enables the 3D model(s) to correctly match any environment imaginable by projection mapping the environment onto the model; after which the model's materials define if that mapping is reflected, mixed with the underlying surface color, ignored, or some combination percentage. It's how integrated CGI is done in film VFX.
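To make the image-based lighting idea a bit more concrete, here is a small sketch of the diffuse part only: summing an equirectangular environment map over the hemisphere around a surface normal, weighted by the Lambertian cosine term. It is a toy under stated assumptions, not a description of any VFX pipeline or of this project. The map is a plain grid of scalar radiance values, the y axis points up, and the function name is invented for the example.

```python
import math


def diffuse_irradiance(env_rows, normal):
    """Sum an equirectangular radiance map over the hemisphere around
    `normal` (a unit vector, y-up), weighted by the Lambertian cosine term."""
    height, width = len(env_rows), len(env_rows[0])
    d_theta = math.pi / height
    d_phi = 2.0 * math.pi / width
    nx, ny, nz = normal
    total = 0.0
    for row, radiances in enumerate(env_rows):
        theta = (row + 0.5) * d_theta  # polar angle measured from +y
        # Texels near the poles cover less of the sphere than equatorial ones.
        solid_angle = math.sin(theta) * d_theta * d_phi
        for col, radiance in enumerate(radiances):
            phi = (col + 0.5) * d_phi
            dx = math.sin(theta) * math.cos(phi)
            dy = math.cos(theta)
            dz = math.sin(theta) * math.sin(phi)
            cosine = max(0.0, nx * dx + ny * dy + nz * dz)
            total += radiance * cosine * solid_angle
    return total


# Toy environment: uniformly bright sky in the upper half, dark ground below.
H, W = 32, 64
sky_env = [[1.0 if r < H // 2 else 0.0] * W for r in range(H)]
facing_up = diffuse_irradiance(sky_env, (0.0, 1.0, 0.0))
facing_down = diffuse_irradiance(sky_env, (0.0, -1.0, 0.0))
```

A surface facing the sky receives roughly pi times the sky radiance, while one facing the dark ground receives none. A real setup would do this per colour channel and add the reflective terms the comment mentions.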

cvhashim04 · 2y ago · 1 in thread

Wow this is insanely cool

neilxm (OP) · 2y ago

thank you!

yanma · 2y ago · 1 in thread

Need gaussian splatting integrated asap https://huggingface.co/blog/gaussian-splatting

neilxm (OP) · 2y ago

Hah. Yep!

scotty79 · 2y ago

Neat. I was recently having fun doing something similar manually in Blender, generating a depth map and using it in Stable Diffusion with ControlNet. Results were great. My models didn't have textures though, so SD generated them. But I imagine I could go with img2img to preserve the texture if I had it.
