If you have 3D models created with an iOS 3D scanner, you can upload them directly to 3D-to-photo and describe the scene you want to create. For example:
"on a city sidewalk"
"near a lake, overlooking the water"
Then click "generate" to get the final images.
The tech stack behind 3D-to-photo:
Handling 3D models on the web: @threejs
Hosting the diffusion model: @replicate
3D scanning apps: Shopify, Polycam3D or LumaLabsAI
Additionally, 3D means more than just camera angle control. You can define a scene in 3D and send it into ControlNet to produce a very specific image.
1. Industrial designers and retailers can quickly flesh out a surround for their project/product from various angles and in various settings, without needing a physical sample or a photo shoot, and without building scenes or matching perspectives by hand. It would be trivial to automate this into an app so these designers don't need to know anything about AI.
2. People developing content with AI currently have to deal with the subject often varying from inference to inference. With this system the designer would use a 3D model for the subject, and then let Stable Diffusion make the surrounds. This would reduce the fiddly work of trying to keep the subject consistent from image to image.
3. On-the-fly imagery: Imagine a retailer has a build-to-order (BTO) ordering system with many different options. For example, say it's Mr Potato Head with hundreds of different noses, shoes, arms, and clothes. The retailer's website can generate realistic imagery of the customer's specific order on the fly, based on the BTO options selected. Instead of displaying the options in a generic template view, the preview image can show various scenes and settings, and these can match the themes of the selected accessories. (e.g. your Mr Potato Head has a chef's hat, so he's in a kitchen cooking. Your Mr Potato Head has sunglasses, so now he's on a beach. He's got a chef's hat and sunglasses: now he's BBQing, etc.)
4. Customised content: There are currently services where parents can customise a generic character to look like their child and order a series of books featuring their child. These are usually limited to skin colour, gender and the colour of the clothes. Using this tech the customisation and output imagery could take a significant leap.
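To make the on-the-fly imagery idea in item 3 concrete, here is a minimal sketch of how selected build-to-order options could be turned into a scene description for the generator. The `SCENES` table, `DEFAULT_SCENE` and `scene_prompt` are hypothetical names invented for illustration; a real retailer would have its own catalogue and theming rules.

```python
# Hypothetical mapping from accessory combinations to scene descriptions.
SCENES = {
    frozenset({"chef's hat"}): "in a kitchen, cooking",
    frozenset({"sunglasses"}): "on a beach",
    frozenset({"chef's hat", "sunglasses"}): "at a barbecue in a sunny backyard",
}
DEFAULT_SCENE = "on a plain studio backdrop"


def scene_prompt(selected_options):
    """Return the scene for the largest themed accessory set in the order."""
    selected = set(selected_options)
    best = None
    for accessories in SCENES:
        # Keep the most specific combination that the order fully contains.
        if accessories <= selected and (best is None or len(accessories) > len(best)):
            best = accessories
    return SCENES[best] if best is not None else DEFAULT_SCENE
```

The returned string would then be used as the scene description, in the same way as the hand-written prompts earlier in the thread.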
Basically, if you generate a backdrop and then estimate the light direction, you can inverse render that onto the 3D model, given all the depth information you get for free from the model.
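A rough sketch of the shading half of that idea, assuming the light direction has already been estimated from the backdrop (that estimation step is not shown here). It uses plain Lambertian diffuse shading, which is my assumption and not necessarily the model the commenter had in mind; `lambert_shade` and its `ambient` parameter are illustrative names.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in v)


def lambert_shade(normals, light_dir, ambient=0.1):
    """Per-pixel diffuse intensity in [0, 1] for a grid of surface normals.

    normals is a list of rows of (x, y, z) tuples taken from the 3D model;
    light_dir points from the surface towards the light.
    """
    light = normalize(light_dir)
    shaded = []
    for row in normals:
        out_row = []
        for n in row:
            n = normalize(n)
            # Surfaces facing away from the light get no diffuse term.
            diffuse = max(0.0, sum(a * b for a, b in zip(n, light)))
            out_row.append(min(1.0, ambient + (1.0 - ambient) * diffuse))
        shaded.append(out_row)
    return shaded
```

Multiplying the model's unlit colour by these intensities gives a render whose lighting roughly agrees with the generated backdrop.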
Do you communicate anything about the angle or anything to SD? (outside of just giving it the image with a transparent background)
I guess, given the additional context (and rather than discussing the semantics of compositing), the better question is: how does this extend the capabilities of Stable Diffusion inpainting?
Is it any different than just putting your 3D model into Photoshop or a 3D viewer, exporting with a transparent background, and inpainting around it?
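For reference, the baseline workflow in that question (export with a transparent background, then inpaint around the subject) comes down to deriving a mask from the alpha channel. A minimal sketch, assuming the common convention that white marks the area to generate and black marks the area to keep; the function name and the plain list-of-tuples image format are illustrative only.

```python
def inpaint_mask_from_rgba(pixels, alpha_threshold=128):
    """Build an inpainting mask from an RGBA render.

    pixels is a list of rows of (r, g, b, a) tuples with 0-255 channels.
    Returns rows of 0/255 values: 255 marks transparent background to be
    generated, 0 marks the rendered subject to be kept.
    (Assumed convention; check what your inpainting pipeline expects.)
    """
    return [
        [0 if a >= alpha_threshold else 255 for (_r, _g, _b, a) in row]
        for row in pixels
    ]
```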
However, there are a number of extensions here that make the integration of 2D and 3D more interesting going forward:
1. 3D models mean you can relight the model before running it through inpainting. Adding lights in 3D around the product, plus using a more raycast-based rendering system (which is now possible in-browser), means you can control the input to SD really well.
2. This one is the most interesting piece. You can create a 3D editor that allows very simple, low-poly-style scene creation (let's say a pedestal and a vase, from a product photography standpoint), then pass the depth map or Canny edges as conditioning through something like ControlNet. That gives you a super controllable scene design tool that you can finely control, both in camera angle and perspective.
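As a sketch of the depth-conditioning step in item 2: a renderer's depth buffer usually has to be rescaled to an 8-bit image before it can be used as a conditioning input. The helper below is hypothetical, and the "nearer is brighter" default is an assumption about what the depth-conditioned model expects, so check the convention of the model you actually use.

```python
def depth_to_control_image(depth, near_is_bright=True):
    """Scale a depth buffer (rows of floats, any units) to 0-255 ints."""
    flat = [d for row in depth for d in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    out = []
    for row in depth:
        out_row = []
        for d in row:
            # t runs from 0 at the nearest point to 1 at the farthest.
            t = 0.0 if span == 0 else (d - lo) / span
            if near_is_bright:
                t = 1.0 - t
            out_row.append(round(t * 255))
        out.append(out_row)
    return out
```

Because the depth comes from the 3D scene itself and not from an estimator, the camera angle and perspective in the conditioning image match the render exactly.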
The distinction doesn't feel worth commenting about overall.
NeRF might be a dead-end technique, in case you're trying to keep up.
Do you mean you have an existing photo of something and would like to add a realistic setting? There are a lot of ways to do this, but probably the easiest right now is the Generative Fill feature in Adobe Photoshop (beta).
High poly meshes and low poly textures are possible.
There may be a point when that's not the case, but at present this stuff will run on hardware accessible to a consumer, assuming they're ok waiting a bit.
Thoughts?
The other reason I used a Python backend was that I want to extend it to more involved image processing, like producing ControlNet inputs or post-processing the end result.
Does that make sense? Open to better ideas though.
just extra stacks to keep track of and keep up with