Yup, I actually use the dlib face detector in this app for a ventriloquist-type feature (not shown in the video). The facial landmark detector assumes you've already framed an ROI for a face. Locating faces in an image is trivial... locating hands in an image and their orientation, not so much : ) That's not to say a CNN couldn't be trained to do it, it's just going to take some work!
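
For anyone curious, the two-stage face pipeline (find a face box first, then run the landmark predictor inside it) looks roughly like this with dlib's Python bindings. This is a minimal sketch, not the code from the app: `face_landmarks`, `shape_to_points`, and `model_path` are names I made up for illustration, and it assumes you have a trained shape-predictor file on disk and a uint8 NumPy image.

```python
def shape_to_points(shape):
    """Convert a dlib landmark result into a plain list of (x, y) tuples."""
    return [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]


def face_landmarks(image, model_path):
    """Return one list of landmark points per detected face.

    image: uint8 NumPy array (grayscale or RGB).
    model_path: path to a trained shape-predictor file (hypothetical, caller-supplied).
    """
    import dlib  # third-party; imported here so shape_to_points works without it

    detector = dlib.get_frontal_face_detector()   # stage 1: find face ROIs
    predictor = dlib.shape_predictor(model_path)  # stage 2: landmarks inside an ROI

    faces = detector(image, 1)  # 1 = upsample once to catch smaller faces
    return [shape_to_points(predictor(image, box)) for box in faces]
```

The predictor never searches the whole frame; it only refines points inside the box the detector hands it. That's why hands are harder: there's no equally easy stage 1 to produce that box (plus orientation) in the first place.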