This is a surprisingly hard problem that I've explored before. Feel free to email me about it!
If it were for the store itself, you'd use one or more stationary mounts of something better and more purpose-built than a camera phone. Or something even simpler, like counting bodies in and out at the door with the door dinger thing.
So I assume it's for surveilling the store surreptitiously, right?
I suppose it's an interesting problem for the machine vision people. But how do you deal with people down the aisles that you can't see? How do you avoid looking like you're collecting upskirts or kiddie porn? How do you avoid just plain getting your face punched in?
Is there a way to count cell phone signals, from your own phone or from special equipment? There were stories a year or so ago about some mall that was going to track individual phones through the mall, to discern shopping patterns and such. I imagine it's not impossible to create something that you could use surreptitiously (again, I'm assuming).
But if you could mount a phone...
Maybe some technique where you take a baseline photo with the store empty and then compare it to ones with people in it. You'd have to account for natural changes in light so the algorithm ignores them. Shadows and bags would be difficult. Then count the number of pixels that changed from the baseline, divide by the average number of pixels a person takes up, and use some sort of edge detection as an added check.
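Here's a rough sketch of that baseline idea in plain Python, just to make the arithmetic concrete. Everything in it is an assumption on my part: images are grayscale grids (lists of rows of 0-255 values), lighting drift is handled with a crude global gain based on median brightness, and `threshold` and `pixels_per_person` are made-up parameters you'd have to calibrate per camera. It skips the edge-detection check and does nothing about shadows or bags.

```python
from statistics import median

def normalize_brightness(frame, baseline):
    """Scale the frame so its median brightness matches the baseline's.

    Crude lighting compensation (an assumption of this sketch): the median
    is used because it moves little while people cover under half the view.
    """
    frame_med = median(p for row in frame for p in row)
    base_med = median(p for row in baseline for p in row)
    if frame_med == 0:
        return frame  # nothing sensible to scale by
    gain = base_med / frame_med
    return [[min(255.0, p * gain) for p in row] for row in frame]

def count_changed_pixels(frame, baseline, threshold=30):
    """Count pixels that differ from the empty-store baseline by more than threshold."""
    adjusted = normalize_brightness(frame, baseline)
    return sum(
        1
        for row_f, row_b in zip(adjusted, baseline)
        for pf, pb in zip(row_f, row_b)
        if abs(pf - pb) > threshold
    )

def estimate_people(frame, baseline, pixels_per_person, threshold=30):
    """Estimate head count as changed pixels / average pixels per person."""
    changed = count_changed_pixels(frame, baseline, threshold)
    return round(changed / pixels_per_person)
```

You'd call `estimate_people(frame, baseline, pixels_per_person=...)` once per captured frame. In practice people overlap and stand at different distances from the camera, so a single pixels-per-person figure will be off, which is part of why this is harder than it sounds.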
Or a Mechanical Turk worker would solve this in a few seconds for probably a penny.
Q: How do I count people with computer vision?
A: 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 FizzBuzz 16 17 Fizz ...