All these problems are trivially solvable (solved) using traditional 3D meshes and techniques.
[1] Not meant as an insult. Working professionals don't have time for this stuff.
Star Trek's Holodeck is actually a good case study here (especially with the recent series, Lower Decks, going as far as making two episodes that are interactive movies on a holodeck, going quite deep into how that could work in practice both in terms of producing and experiencing them).
One observation derived here is that infinite procedural content at your fingertips doesn't necessarily kill all meaning, if you bring the meaning with you. The two major use cases[0] for the holodeck are:
- Multiplayer scenarios in which you and your friends enjoy some experience in a program. The meaning is sourced from your friendship and roleplay; the program may be arbitrary output of an RNG in the global sense, but it's the same for you and your friends, so shared experience (and its importance as a social object) in your group is retained.
- Single-player simulations that are highly specific. The meaning here comes from whatever is the reason you're simulating that particular experience, and its connection to the real world. Like idk., a flight simulator of a random space fighter flying over a random world shooting at random shit would quickly get boring, but if I can get the simulator to give me a highly accurate cockpit of an F/A-18 Hornet, flying over real terrain and shooting at realistic enemies in a realistic (even if fictional) storyline - now that would be deeply meaningful to me, because 1) the F/A-18 Hornet is a real plane that I would otherwise never experience flying, and 2) I have a crush on this particular fighter because F/A-18 Hornet 3.0 is one of the first videogames I ever played in my life as a kid.
Now, to make the Metaverse less like bullshit and more like Star Trek, we'd need to make sure the world generation is actually available to the users. No asset stores, no app marketplace bullshit. We live in a multimodal LLM era - we already have all the components to do it like Star Trek did it: "Computer, create a medieval fantasy village, in the style of England around the year 1400, set next to a forest, with tall mountains visible in the distance", then walk around that world and tweak the defaults from there.
--
[0] - Ignoring the third use case that's occasionally implied on the show, and that's really obvious given it's the same one the Internet is for - and I'm not talking about cat pictures.
It was rough at first, and needed plenty of tuning, but the terrain and environments it's capable of certainly have a wide audience.
But as far as pure, unbridled generation goes, yeah; I'm sure there will be plenty of slop made in the coming decade.
To me, this takes the place of / augments procedural generation stuff. NPC crowds in which none of the participants are needed for the plot, but in which each one can have unique clothing / appearance / lines, are not "needed" for a game, but can flesh it out when done thoughtfully.
Recall the lambasting Cyberpunk 2077 got for its NPCs that cycled through a seemingly very limited number of appearances, to the point that you'd see clones right next to each other. This would solve that sort of problem, for example.
Take a look at the ImgnAI gallery (https://app.imgnai.com/) and tell me: can you paint better and more imaginatively than that? Do you know anyone in your immediate vicinity who can?
Read this satirical speech by Claude, in French (https://x.com/pmarca/status/1881869448275177764) and in English (https://x.com/pmarca/status/1881869651329913047), and tell me: can you write fiction more entertaining or imaginative than that? Is there someone in your vicinity who can?
Perhaps that's mundane, so is there someone in your vicinity who can reason about a topic in mathematics/physics as well as this: https://x.com/hsu_steve/status/1881696226669916408 ?
Probably your answer is "yes, obviously!" to all the above.
My point: deep learning works and the era of slop ended ages ago except that some people are still living in the past or with some cartoon image of the state of the art.
> "Cost to zero" implies drinking directly from the AI firehose with no human in the loop
No. It means the marginal cost of production tends towards 0. If you can think it, then you can make it instantly and iterate a billion times to refine your idea with as much effort as it took to generate a single concept.
Your fixation on "content without a human directing them" is bizarre and counterproductive. Why is "no human in the loop" a prerequisite for productivity? Your fixation on that is confounding your reasoning.
It has a 'why would I strap on a headset for stuff I can do without' problem.
I will not start meeting friends just because of the metaverse. I have everything I need already.
And even video calls with WhatsApp are weird as f.
Case in point, I have a series of photos (48) that capture a small statue. The photos are high quality, the object was on a rotating platform. Lighting is consistent. The background is solid black.
These normally are ideal variables for photogrammetry but none of the various common applications and websites do a very good job creating a mesh out of it that isn't super low poly and/or full of holes.
I've been casually scanning huggingface for relevant models to try out but haven't really found anything.
There are now more advanced options than Gaussian splatting, and these can achieve normal playback speeds rather than hours of filtering. I'll drop a citation if I recall the recent paper and example code. However, note this style of 3D scene recovery tends to be heavily 3D location dependent.
Best of luck, =3
[0] https://github.com/NVlabs/instant-ngp
[1] https://github.com/NVlabs/instant-ngp/blob/master/docs/nerf_...
On the geometry side, from a theoretical point of view, you can repair meshes [1] by inferring a signed or unsigned distance field from your existing mesh and then contouring this distance field.
If you like the distance field approach, there is also research work [2] on estimating neural unsigned distance fields directly (in a way that's kind of similar to Gaussian splats).
[1] https://github.com/nzfeng/signed-heat-3d [It works, but it's research code, so buggy, not user friendly, and mostly tried on toy problems, because complexity explodes very quickly: when using a grid, the number of cells grows as n^3, and then they solve a sparse linear system on top (so total complexity is bounded by n^6). But tolerating approximations and writing things properly, practical complexity should be on par with methods like the finite element method in Computational Fluid Dynamics.]
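To make the "infer a distance field, then contour it" idea concrete, here is a minimal 2D sketch using only the Python standard library. It is not the method from [1]: the signed distance is simply the analytic formula for a circle, and all names and grid sizes (`circle_sdf`, `sample_grid`, `zero_crossings`, n = 31) are made up for illustration. The contouring step looks for grid edges where the field changes sign and linearly interpolates the zero crossing, which is the basic building block of marching squares / marching cubes.

```python
import math

def circle_sdf(x, y, r=1.0):
    """Signed distance to a circle of radius r at the origin (negative inside)."""
    return math.hypot(x, y) - r

def sample_grid(sdf, n, lo=-1.5, hi=1.5):
    """Sample sdf on an n x n grid of points; returns (coords, values[row][col])."""
    step = (hi - lo) / (n - 1)
    coords = [lo + i * step for i in range(n)]
    values = [[sdf(x, y) for x in coords] for y in coords]
    return coords, values

def zero_crossings(coords, values):
    """Return points where the field changes sign along a grid edge."""
    points = []
    n = len(coords)
    for j in range(n):
        for i in range(n):
            a = values[j][i]
            # Edge to the right-hand neighbour.
            if i + 1 < n:
                b = values[j][i + 1]
                if (a < 0) != (b < 0):
                    t = a / (a - b)  # linear interpolation of the zero
                    x = coords[i] + t * (coords[i + 1] - coords[i])
                    points.append((x, coords[j]))
            # Edge to the neighbour in the next row.
            if j + 1 < n:
                b = values[j + 1][i]
                if (a < 0) != (b < 0):
                    t = a / (a - b)
                    y = coords[j] + t * (coords[j + 1] - coords[j])
                    points.append((coords[i], y))
    return points

coords, values = sample_grid(circle_sdf, 31)
contour = zero_crossings(coords, values)
```

Every point in `contour` lies close to the unit circle. The same scheme in 3D samples n^3 grid points instead of n^2, which is where the cubic growth mentioned above comes from.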
Isn't a static-object-rotating-camera basically a requirement for photogrammetry?
>These normally are ideal variables for photogrammetry
Actually no. My friend learned this the hard way during a photogrammetry project: he rented a photo studio, made sure the background was perfectly black, and took the photos, but the photogrammetry program (Meshroom, I think) struggled to reconstruct the mesh. I did some research and learned that it uses features in the background to help position itself when making the meshes. So he redid his tests outside with "messy" backgrounds and it worked much, much better.
This was a few years ago so I don't know if things are different now.
They link a Huggingface page (great sign!): https://huggingface.co/spaces/tencent/Hunyuan3D-2
I tried to replicate the objects they show on their project page (https://3d-models.hunyuan.tencent.com/). The full prompts exist but are truncated so you can just inspect the element and grab the text.
Here's what I got
Leaf
PNG: https://0x0.st/8HDL.png
GLB: https://0x0.st/8HD9.glb
Guitar
PNG: https://0x0.st/8HDf.png other view: https://0x0.st/8HDO.png
GLB: https://0x0.st/8HDV.glb
Google Translate of Guitar:
Prompt: A brown guitar is centered against a white background, creating a realistic photography style. This photo captures the culture of the instrument and conveys a tranquil atmosphere.
PNG: https://0x0.st/8HDt.png and https://0x0.st/8HDv.png
Note: Weird thing on top of guitar. But at least this time the strings aren't fusing into sound hole.
I haven't tested my own prompts or the Google translation of the Chinese prompts because I'm getting an over-usage error (I'll edit the comment if I get them). That said, these look pretty good. The paper and page images definitely look better, but these aren't like Stable Diffusion 1 paper vs Stable Diffusion 1 reality.
But these are long and detailed prompts. Lots of prompt engineering. That should raise some suspicion. The real world has higher variance, and let's get an idea of how hard it is to use. So let's try some simpler things :)
Prompt: A guitar
PNG: https://0x0.st/8HDg.png
Note: Not bad! Definitely overfit but does that matter here? A bit too thick for an electric guitar but too thin for an acoustic.
Prompt: A Monstera leaf
PNG: https://0x0.st/8HD6.png
https://0x0.st/8HDl.png
https://0x0.st/8HDU.png
Note: A bit wonkier. I picked this because it looked like the leaf in the example but this one is doing some odd things.
It's definitely a leaf and monstera like but a bit of a mutant.
Prompt: Mario from Super Mario Bros
PNG: https://0x0.st/8Hkq.png
Note: Now I'm VERY suspicious....
Prompt: Luigi from Super Mario Bros
PNG: https://0x0.st/8Hkc.png
https://0x0.st/8HkT.png
https://0x0.st/8HkA.png
Note: Highly overfit[0]. This is what I suspected. Luigi isn't just tall Mario.
Where is the tie coming from? The suspender buttons are all messed up.
Really went uncanny valley here. So this suggests we're really brittle.
Prompt: Peach from Super Mario Bros
PNG: https://0x0.st/8Hku.png
https://0x0.st/8HkM.png
Note: I'm fucking dying over here this is so funny. It's just a peach with a cute face hahahahaha
Prompt: Toad from Super Mario Bros
PNG: https://0x0.st/8Hke.png
https://0x0.st/8Hk_.png
https://0x0.st/8HkL.png
Note: Lord have mercy on this toad, I think it is a mutated Squirtle.
Paper can be found here (the arXiv badge on the page leads to a PDF in the repo, which GitHub is slow to render): https://arxiv.org/abs/2411.02293
(If you want to share images like I did, all I'm doing is `curl -F'file=@foobar.png' https://0x0.st`)
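For anyone who would rather script the upload than shell out to curl, here is a rough standard-library equivalent of the one-liner above. Treat it as a sketch: `build_multipart` and `upload` are names invented here, I'm assuming the service replies with the URL as plain text (which is what the curl one-liner suggests), and I haven't checked how the live service treats non-curl clients.

```python
import os
import urllib.request
import uuid

def build_multipart(field, filename, data, content_type="application/octet-stream"):
    """Encode one file as multipart/form-data.

    Returns (body_bytes, value_for_the_Content-Type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"

def upload(path, url="https://0x0.st"):
    """POST the file at `path` as form field 'file'; return the response text."""
    with open(path, "rb") as fh:
        data = fh.read()
    body, ctype = build_multipart("file", os.path.basename(path), data)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    # Assumption: the response body is the link to the uploaded file.
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8").strip()
```

Usage would be something like `print(upload("foobar.png"))`.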
[0] Overfit is a weird thing now. Maybe it doesn't generalize well, but sometimes that's not a problem. I think this is one of the bigger lessons we've learned with recent ML models. My viewpoint is "Sometimes you want a database with a human language interface. Sometimes you want to generalize". So we have to be more context driven here. But certainly there are a lot of things we should be careful about when we're talking about generation. These things are trained on A LOT of data. If you're more "database-like" then certainly there's potential legal ramifications...
Edit: For context, by "look pretty good" I mean in comparison to other works I've seen. I think it is likely a ways from being useful in production. I'm not sure how much human labor would be required to fix the issues.
Prompt: A hawk flying in the sky
PNG: https://0x0.st/8Hkw.png
https://0x0.st/8Hkx.png
https://0x0.st/8Hk3.png
Note: This looks like it would need more work. I tried a few birds and generic too. They all seem to have similar form.
Prompt: A hawk with the head of a dragon flying in the sky and holding a snake
PNG: https://0x0.st/8HkE.png
https://0x0.st/8Hk6.png
https://0x0.st/8HkI.png
https://0x0.st/8Hkl.png
Note: This one really isn't great. Just a normal hawk head. Not how a bird holds a snake either...
This last one is really key for judging where the tech is at btw. Most of the generations are assets you could download freely from the internet, and you could probably get better ones from some artist on Fiverr or something. But the last example is closer to a realistic use case: something that is relatively reasonable, probably not in the set of easy-to-download assets, and might be something someone wants. It isn't too crazy of an ask given the Chimera and how similar a dragon is to a bird in the first place, so this should be on the "easier" end. I'm sure you could prompt engineer your way into it, but then we have to have the discussion of what costs more, a prompt engineer or an artist? And do you need a prompt engineer who can repair models? Because these look like they need repairs.
This can make it hard to really tell if there's progress or not. It is really easy to make compelling images in a paper and beat benchmarks while not actually creating something that is __or will become__ a usable product. All the little details matter. Little errors quickly compound... That said, I do much more on generative imagery than generative 3D objects, so grain of salt here.
Keep in mind: generative models (of any kind) are incredibly difficult to evaluate. You really only have a good idea after you've generated hundreds or thousands of samples yourself and have looked at a lot of them with high scrutiny.
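One way to see why a handful of samples tells you so little: if you score each generation as usable / not usable (a criterion I'm making up here, pick your own), the uncertainty on the "usable" rate shrinks only slowly with sample count. Below is a small sketch using the Wilson score interval for a binomial proportion; the 80% rate and the sample sizes are illustrative numbers, not measurements of any model.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Hypothetical: 80% of samples judged usable, at different sample sizes.
for n in (10, 100, 1000):
    low, high = wilson_interval(round(0.8 * n), n)
    print(f"n={n}: {low:.2f} to {high:.2f}")
```

With 8 usable out of 10 the interval runs from roughly 0.49 to 0.94, so it can't distinguish "coin flip" from "nearly always works"; at 800 out of 1000 it narrows to roughly 0.77 to 0.82.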
People just see fancy demos and start crapping on about the future, but just look at Stable Diffusion. It's been around for how long, and what serious professional game developers are using it as a core part of their workflow? Maybe some concept artists? But consistent style is such an important thing for any half decent game, and these generative tools shit the bed on consistency in a way that's difficult to paper over.
I've spent a lot of time thinking about game design and experimenting with SD/Flux, and the only thing I think I could even get close to production that I couldn't before is maybe an MTG style card game where gameplay is far more important than graphics, and flashy nice looking static artwork is far more important than consistency. That's a fucking small niche, and I don't see a lot of paths to generalisation.
The second has similar problems: it has tuning knobs with missing winding posts, then five strings becoming four at the bridge. It also has a pickup under the fretboard.
Are these considered good capability examples?
It is pretty good with some easier assets that I suspect there are lots of samples of (and we're comparing to other generative models, not to what humans make. Humans probably still win by a good margin). But when moving beyond obvious assets that we could easily find, I'm not seeing good performance at all. Probably a lot can be done with heavy prompt engineering, but that just makes things more complicated to evaluate.
TENCENT HUNYUAN 3D 2.0 COMMUNITY LICENSE AGREEMENT
Tencent Hunyuan 3D 2.0 Release Date: January 21, 2025
THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
https://github.com/Tencent/Hunyuan3D-2?tab=License-1-ov-file
(I previously tried the Stability 3D models: https://stability.ai/stable-3d and this seems similar in quality and speed)
https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan...
[1] https://github.com/Tencent/Hunyuan3D-2/blob/main/assets/imag...