Facebook is working on this.[1] There's even a $100,000 "Hateful Memes" competition.[2]
In order for AI to become a more effective tool for detecting hate speech, it must be able to understand content the way people do: holistically. When viewing a meme, for example, we don’t think about the words and photo independently of each other; we understand the combined meaning. This is extremely challenging for machines, however, because it means they can’t analyze the text and the image separately. They must combine these different modalities and understand how the meaning changes when they are presented together.
Facebook is going after the really hard case, where an image and text that are each harmless on their own combine into something hateful. They already have a text-only system.
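To make the "can't analyze them separately" point concrete, here is a toy sketch in Python. It is not Facebook's method. The four "memes", the ID encoding, and the score grid are all made up for illustration. Each text and each image appears equally often in a hateful and a harmless meme, so neither modality carries any signal alone. The code compares a classifier that adds a per-text score to a per-image score against one that scores each (text, image) pair as a whole.

```python
from itertools import product

# Hypothetical toy data: (text_id, image_id, is_hateful).
# Every text and every image is harmless alone; only some pairings are hateful.
MEMES = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

# Small grid of candidate scores to brute-force over (an arbitrary choice).
GRID = [-2, -1, 0, 1, 2]


def accuracy(predict):
    """Fraction of toy memes that predict(text_id, image_id) labels correctly."""
    return sum(predict(t, i) == y for t, i, y in MEMES) / len(MEMES)


def best_separate():
    """Best accuracy when each modality is scored alone and the scores are added."""
    best = 0.0
    for t0, t1, i0, i1 in product(GRID, repeat=4):
        text_score, image_score = (t0, t1), (i0, i1)
        acc = accuracy(lambda t, i: int(text_score[t] + image_score[i] > 0))
        best = max(best, acc)
    return best


def best_joint():
    """Best accuracy when every (text, image) pair gets its own score."""
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    best = 0.0
    for scores in product(GRID, repeat=4):
        pair_score = dict(zip(pairs, scores))
        acc = accuracy(lambda t, i: int(pair_score[(t, i)] > 0))
        best = max(best, acc)
    return best


if __name__ == "__main__":
    print("separate scores, best accuracy:", best_separate())
    print("joint scores, best accuracy:   ", best_joint())
```

On this toy set the additive version tops out at 0.75 no matter which scores you pick, while the joint version reaches 1.0. A real system would learn a joint representation from pixels and tokens instead of using a lookup table, but the limitation of adding independent per-modality scores is the same.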
[1] https://venturebeat.com/2020/05/12/facebook-is-using-more-ai...
[2] https://ai.facebook.com/tools/hatefulmemes/