Just moments ago, I managed to turn a photo of a person into a short clip of them dancing, in half-decent quality,
fully locally, on a mid-range gaming GPU (RTX 4070 Ti, 12GB VRAM). I almost ran out of RAM (32GB), but it worked, worked well, and took only a couple of minutes.
Half a year ago, that was sort of possible for some genius really bent on making it happen. A year ago, that was unthinkable. Today, it's a matter of drag&dropping a workflow into a fresh ComfyUI install and downloading a couple dozen GB of img2vid models.
The returns on R&D are not diminishing; the progress is just not happening everywhere evenly and at the same time.