If I may, let me ask a question that just popped into my head, prompted by your recent presentation at CVPR: are you at Tesla fusing video with audio data for self-driving?
Just curious. I ask because (a) sound waves at frequencies detectable by the human ear seem quite important for both routine and edge-case situations (e.g., the sound of other vehicles braking, screeching, accelerating, or passing; sirens of ambulances, police cars, and fire trucks; bursts of honking; people suddenly shouting or screaming nearby), and (b) audio and video signals are already synchronized, so fusing them should be comparatively straightforward (e.g., there is already some research on applying deep learning to video clips with audio).
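To illustrate point (b), here is a minimal sketch of what synchronized audio-video fusion could look like: because the two streams share a clock, each video frame can be paired with its nearest-in-time audio feature and the two concatenated (a simple late-fusion scheme). Everything here is hypothetical, including the function name and the chosen frame rates; this is not a claim about Tesla's actual pipeline.

```python
import numpy as np

def fuse_av(video_feats, video_ts, audio_feats, audio_ts):
    """Concatenate each video-frame feature with the nearest-in-time
    audio feature.

    video_feats: (Tv, Dv) array, video_ts: (Tv,) timestamps in seconds.
    audio_feats: (Ta, Da) array, audio_ts: (Ta,) timestamps in seconds.
    Returns a (Tv, Dv + Da) fused feature array.
    """
    # For each video timestamp, find the index of the closest audio window.
    idx = np.abs(audio_ts[None, :] - video_ts[:, None]).argmin(axis=1)
    return np.concatenate([video_feats, audio_feats[idx]], axis=1)

# Toy example: 3 video frames at 30 fps (Dv=4), 10 audio windows at 100 Hz (Da=2).
video_ts = np.array([0.0, 1 / 30, 2 / 30])
audio_ts = np.linspace(0.0, 0.09, 10)
fused = fuse_av(np.zeros((3, 4)), video_ts, np.zeros((10, 2)), audio_ts)
print(fused.shape)  # (3, 6)
```

The appeal of synchronization is visible here: alignment reduces to a nearest-timestamp lookup, with no cross-modal calibration step needed before fusion.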