A multimodal or segmentation algorithm is not the solution for bee-level path planning, obstacle avoidance, or autonomous navigation. Getting LLMs to power a robot for household tasks with low latency from perception to action, and in an energy-efficient manner, is challenging enough, let alone high-speed, highly maneuverable drones.
Not really practical at the moment, of course, since you can't put 8 A100s on a drone.