story
In your example, take a film of the factory floor when it is empty, then once work begins use a approximately human sized/shaped rectangular sliding window and look for areas that exceed a threshold of difference to the image of the empty floor.
You can then use that window as input to a classifier which will be easier due to the considerable dimension reduction or perhaps you can get sufficient performance using further deterministic techniques.