If suddenly robot manipulators could grasp any object, operate any knob/switch, tie knots, manipulate cloth, with the same manipulator, on first sight, that would be quite a feat.
But then there's still task planning which is a very different topic. And ... and .... So much still to develop for generally useful robots.
Just getting it to navigate itself using vision would mean building a complex system with a lot of pieces (beyond the most basic demo anyway). You need separate neural nets doing all kinds of different tasks and you need a massive training system for it all. You can see how much work Tesla has had to do to get a robot to safely drive on public roads. [2]
From where I am sitting now, I think we are making good inroads on something like an "ImageNet moment" for robots. (Well, I should note that I am a robotics engineer but I mostly work on driver-level software and hardware, not AI. Though I follow the research from the outside.)
It seems like a combination of transformers plus scale plus cross-domain reasoning like CLIP [3] could begin to build a system that could mimic humans. I guess, as good as transformers are, we still haven't solved how to get them to learn for themselves, and that's probably a hard requirement for really being useful in the real world. Good work in RL happening there though.
Gosh, yeah, this is gonna take decades lol. Maybe we will have a spark that unites all this in one efficient system. Improving transformer efficiency and achieving big jumps in scale are a combo that will probably get interesting stuff solved. All the groundwork is a real slog.
[1] https://reboot.love/t/new-cameras-on-rover/277
RL, which I think this particular story is about, is an odd duck. I have papers on this and I personally have mixed feelings. I am a very applications/solutions-oriented researcher and I am a bit skeptical about how pragmatic the state of the field is (e.g. reward function specification). The argument made by the OpenAI founder that RL is not amenable to taking advantage of large datasets is a pretty valid point.
Finally, you raise interesting points on running multiple complex DNNs. Have you tried hooking things to ROS and using that as scaffolding? (I'm not a robotics guy, I just dabble in it as a hobby, so I'm curious what the solutions are.) Google has something called MediaPipe, which is intriguing but maybe not what you need. I've seen some NVIDIA frameworks but they basically do pub-sub in a sub-optimal way. Curious what your thoughts are on what makes existing solutions insufficient (I feel they are too!)
Yes, unless the industry sees value in a step change in the scale on offer to regular devs, progress on massive nets will be slow.
Hooking things together is pretty much my job. I have used ROS extensively in the past but now I just hook things together using Python.
But I consider what Tesla is doing to be pretty promising, and they are layering neural nets together, where the outputs of three special-purpose networks feed into one big one, etc. They call that a HydraNet. No framework like ROS is required because each net was trained in situ with the other nets on the output of those nets, so I believe all compute logic is handled within the neural network processor (at some point they integrate standard logic too, but a lot happens before that). Definitely watch some Karpathy talks on that.
And currently I am simply not skilled enough to compose multiple networks like that. So I could use multiple standalone networks, process them separately, and link them together using IPC of some kind, but it would be very slow compared to what's possible. That's why I say we're "not there yet". Something like Tesla's system available as an open source project would be a boon, but the method is still very labor intensive compared to a self-learning system. It does have the advantage of being modular and testable though.
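To make the "standalone networks linked by IPC" idea concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the two stages are trivial stand-ins for real networks, the names (`fake_segmentation_stage`, `fake_steering_stage`, `run_pipeline`) are hypothetical, and the stages run as threads over a `socket.socketpair()` just to keep the example short, where a real setup would put each net in its own process. The serialize-and-copy step on every frame is likely one of the costs that makes this slower than composing the nets in a single graph.

```python
import json
import socket
import struct
import threading


def send_msg(sock, obj):
    """Send one JSON message with a 4-byte big-endian length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


def fake_segmentation_stage(sock, frames):
    """Stand-in for a segmentation net: threshold each 'frame' at 0.5."""
    for frame in frames:
        send_msg(sock, {"mask": [1 if px > 0.5 else 0 for px in frame]})
    send_msg(sock, None)  # sentinel: no more frames


def fake_steering_stage(sock):
    """Stand-in for a downstream net: report the trail fraction per mask."""
    results = []
    while True:
        msg = recv_msg(sock)
        if msg is None:
            break
        mask = msg["mask"]
        results.append(sum(mask) / len(mask))
    return results


def run_pipeline(frames):
    """Wire the two stages together over a connected socket pair."""
    upstream, downstream = socket.socketpair()
    with upstream, downstream:
        producer = threading.Thread(
            target=fake_segmentation_stage, args=(upstream, frames)
        )
        producer.start()
        results = fake_steering_stage(downstream)
        producer.join()
    return results
```

Calling `run_pipeline([[0.9, 0.1, 0.8, 0.2]])` pushes one tiny "frame" through both stages. Swapping JSON for shared memory would cut the copying, at the price of more fragile plumbing.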
I probably will hand-compose a few networks (using IPC) eventually. I mean right now I am working on two networks: an RL-trained trail-following network trained in simulation on segmentation-like data (perhaps using DreamerV2), and a semantic segmentation net that is trained on my hand-labeled dataset with "trail/not-trail" segmentation. So far my segmentation net works okay. And a first step will actually be to hand-write an algorithm to go from segmentation data to steering. My simulation stuff is almost working. I built up a training environment using the Godot game engine and hacked the shared memory neural net training add-on to accept image data, but when I run the sim in training on DreamerV2, something in the shared memory interface crashes and I have not resolved it. [1]
But all of this is a hobby and I have a huge work project [2] I am managing myself that is important to me, so the self-driving off-road stuff has been on pause. But I don't stress about it too much because the longer I wait, the better my options get on the neural network side. Currently my off-road rover is getting some mechanical repairs, but I do want to bring it back up soon.
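As a rough idea of what a hand-written segmentation-to-steering step could look like, here is a minimal sketch in plain Python. It is not the algorithm actually used on the rover: the function name `steering_from_mask`, the `band` and `gain` parameters, and the centroid heuristic are all assumptions. It takes a binary trail/not-trail mask, finds the mean column of trail pixels in the lower part of the image, and maps that offset from the image center to a steering value in [-1, 1].

```python
from typing import Optional, Sequence, Tuple


def steering_from_mask(
    mask: Sequence[Sequence[int]],
    band: Tuple[float, float] = (0.5, 1.0),
    gain: float = 1.0,
) -> Optional[float]:
    """Map a binary trail mask to a steering command in [-1, 1].

    mask: rows of 0/1 values, row 0 at the top of the image.
    band: fraction of image height to look at (default: bottom half).
    gain: hypothetical proportional gain on the normalized offset.
    Returns None when no trail pixels are visible in the band.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    if width < 2:
        raise ValueError("mask must be at least 2 pixels wide")

    top = int(height * band[0])
    bottom = int(height * band[1])

    col_sum = 0
    count = 0
    for row in mask[top:bottom]:
        for x, value in enumerate(row):
            if value:
                col_sum += x
                count += 1

    if count == 0:
        return None  # caller decides: stop, search, or hold last command

    centroid = col_sum / count
    center = (width - 1) / 2
    offset = (centroid - center) / center  # -1 = far left, +1 = far right
    return max(-1.0, min(1.0, gain * offset))
```

A negative value means the trail sits left of center and a positive value means right. Returning `None` when no trail is visible leaves the "what now?" decision to the caller instead of silently steering straight.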
[1] https://github.com/lupoglaz/GodotAIGym/issues/15
[2] https://community.twistedfields.com/t/a-closer-look-at-acorn...
Thx for the pointers on Tesla. Had not seen the Hydranet stuff. There was a Karpathy talk about 2 weeks back at a CVPR workshop .. he revealed the scale of Tesla's current generation deep learning cluster [1]. It is insane! Despite being in industrial research, I don't foresee ever being able to touch a cluster like that.
A lot of our current research involves end-to-end training (some complex stuff with transformers and other networks stitched together). There was a CVPR tutorial on autonomous driving [2], where they pretty much said autonomy 2.0 is all about end-to-end. I've spoken to a few people who actually do commercial autonomy, and they seemed more skeptical about whether end-to-end is the answer in the near term.
One idea we toy with is to use existing frozen architectures (OpenAI releases some and so do other big players) and do a small bit of fine-tuning.
[1] https://www.youtube.com/watch?v=NSDTZQdo6H8
[2] https://www.self-driving-cars.org/