It’s especially easy to imagine this happening implicitly: feed everything into an ML system, and if it tags something with a label like “(sex noises)” or “(moaning)” (which Hollywood subtitles and other sources in someone’s training data probably contain), that content becomes searchable without anyone explicitly setting out to build such a system.