I think the best you could do is use the video to determine where someone was standing, and try to reconstruct some of the stereo information based on multiple recorders.
I think this is technically not quite true. If two cell phones right next to each other are both sampling at 15 kHz, in the best case you could combine their samples to get an equivalent sampling rate of 30 kHz. (Best case meaning phone 1 samples exactly halfway between phone 2's samples.)
In practice, however, you would have to account for positioning and the fact that the phones' samples aren't perfectly offset from one another. It would require an amazing engineering feat to overcome this challenge, but I think it's within the realm of the physically possible.
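A toy numpy sketch of that best case (a pure tone, a perfect half-sample offset, and, crucially, no anti-aliasing filter in front of either ADC — which is a big assumption):

```python
import numpy as np

fs = 15000.0        # each phone's sample rate (Hz)
f_tone = 10000.0    # a tone above each phone's 7.5 kHz Nyquist limit
n = 1024

t1 = np.arange(n) / fs        # phone 1's sample instants
t2 = t1 + 0.5 / fs            # phone 2, exactly half a sample later
x1 = np.sin(2 * np.pi * f_tone * t1)
x2 = np.sin(2 * np.pi * f_tone * t2)

# Interleaving the two streams is equivalent to one 30 kHz recording
x = np.empty(2 * n)
x[0::2], x[1::2] = x1, x2

freqs = np.fft.rfftfreq(2 * n, d=1.0 / (2 * fs))
peak = freqs[np.argmax(np.abs(np.fft.rfft(x)))]
print(peak)  # close to 10000 Hz; either phone alone would alias it to 5 kHz
```

Either 15 kHz stream on its own folds the 10 kHz tone down to 5 kHz; only the interleaved pair sees it where it really is.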
If the microphones, ADCs, etc. on both phones are incapable of capturing frequencies above, say, 15 kHz in the first place, combining those signals definitely won't bring you any closer to the original signal. You may be able to cancel out a fair bit of noise given enough processing, but you won't get back what wasn't originally captured by either device.
That's before you get into phase problems from trying to combine two signals. A likely outcome is that the amplitudes of some frequencies are increased whilst others are decreased due to phasing issues.
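A quick numpy illustration of that comb-filtering effect (toy tones and an assumed 1 ms inter-mic delay, roughly 34 cm of air path):

```python
import numpy as np

fs = 44100.0
delay = 0.001          # assumed 1 ms between the two "mics"
t = np.arange(int(fs)) / fs

def combined_amplitude(f):
    """Peak amplitude when a tone is summed with a delayed copy of itself."""
    x = np.sin(2 * np.pi * f * t) + np.sin(2 * np.pi * f * (t - delay))
    return np.max(np.abs(x))

print(combined_amplitude(1000.0))  # delay is one full cycle: ~2.0, boosted
print(combined_amplitude(500.0))   # delay is half a cycle: ~0, cancelled
```

Same mix, same delay — some frequencies double, others vanish entirely.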
/fuzzily remembered music tech degree. May be too fuzzy though!
Chances are that there is a low-pass filter in front of the phone's ADC, blocking signals above the Nyquist limit from reaching the sampler. Assuming brick wall filters (i.e. perfect cutoff), combining the signals will reduce variance (noise) but not give any information on frequencies above the cutoff frequency of the filter.
Brick wall filters don't exist, though. What you might see is a minuscule amount of signal in the filter's stop band. Combining the signal from many, many phones might reduce the variance enough to give useful information for frequencies a tiny bit above the cutoff frequency.
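The variance-reduction part is easy to show with a toy numpy experiment (identical signal, independent noise on each of 100 hypothetical phones):

```python
import numpy as np

rng = np.random.default_rng(0)
n_phones, n = 100, 10000
signal = np.sin(2 * np.pi * 440 * np.arange(n) / 44100.0)

# Every phone hears the same signal plus its own independent noise
recordings = signal + rng.normal(0.0, 0.5, size=(n_phones, n))
averaged = recordings.mean(axis=0)

print(np.std(recordings[0] - signal))  # ~0.5, one phone's noise floor
print(np.std(averaged - signal))       # ~0.05: noise std falls as 1/sqrt(N)
```

That sqrt(N) improvement is exactly why you'd need "many many" phones to dig anything useful out of a stop band that's 60+ dB down.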
A cool project would be to gather the audio from every networked microphone in an area (mobile phones, laptops, ...) and use beam-forming techniques to reconstruct the sound pressure field as a function of position. My guess is that the system would be sensitive enough that it could do amazing things like capture conversations through walls or from long distances.
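A minimal delay-and-sum sketch of what one beam-forming step might look like (hypothetical helper; real phones would also need their clock offsets and positions estimated, which is most of the hard work):

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def delay_and_sum(recordings, mic_positions, focus_point, fs):
    """Steer an array of recordings toward one point in space.

    recordings: (n_mics, n_samples); positions/focus_point in metres.
    Shifts each channel so sound from focus_point lines up, then averages,
    so sound from that point adds coherently and everything else smears out.
    """
    dists = np.linalg.norm(mic_positions - focus_point, axis=1)
    shifts = np.round((dists - dists.min()) / C * fs).astype(int)
    n = recordings.shape[1] - shifts.max()
    aligned = np.stack([r[s:s + n] for r, s in zip(recordings, shifts)])
    return aligned.mean(axis=0)
```

Scan `focus_point` over a grid and the output power gives you the sound field as a function of position, at least in principle.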
The problem eventually comes down to the fact that "better" is subjective. We're in the murky realm of art here. Should your algorithm keep that fret noise or the squeaking of a vocalist's intake of breath? Are they "noise," or are they part of the performance?
>I know nothing about audio processing
Not wishing to be rude, but this much is very evident. Recording engineers position their microphones with millimetre precision in order to combat phase issues, and that is in an ideal studio scenario. Doing what you suggest is basically impossible.
Maybe I'm overstating it; you could probably do something and it'd be a nice bit of research, but you wouldn't get useful results in the way that you're imagining.
Actually, that much I know, because I've done some amateur home recording. I know that, for example, when you mic a snare drum with two microphones that are pointed at each other, you have to put a phase inverter on one microphone. I also know my way around the basic processors for audio production (compressor, limiter, EQ, etc.).
What I don't know much about is the undoubtedly more advanced techniques, which may or may not exist, that could realize the idea I'm talking about. The best idea I can come up with is this: if you had one audio source that captured the dynamics of a concert (perhaps from a phone far away from the house speakers), and another source that captured a clearer yet "smashed" sound (perhaps from a phone closer to the house speakers), perhaps you could apply a compressor to the second source keyed on the dynamics of the first. Again, I might be full of crap here.
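A very rough numpy sketch of the keyed-gain idea (hypothetical `transfer_dynamics` helper; assumes the two recordings are already time-aligned, which is itself the hard part):

```python
import numpy as np

def transfer_dynamics(dynamic_src, smashed_src, fs, win_ms=50.0):
    """Impose the far (dynamic) phone's envelope on the close (flat) one.

    Measures short-term RMS envelopes of both signals and rescales the
    'smashed' recording so its envelope follows the dynamic one -- crudely,
    a compressor/expander keyed off the other source.
    """
    win = max(1, int(fs * win_ms / 1000.0))
    kernel = np.ones(win) / win

    def rms_env(x):
        return np.sqrt(np.convolve(x * x, kernel, mode="same")) + 1e-9

    gain = rms_env(dynamic_src) / rms_env(smashed_src)
    return smashed_src * gain
```

A real implementation would want attack/release smoothing on the gain and a limit on how much boost it applies, but this is the basic shape of it.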
Typically what it seems you're talking about for audio here is similar to a matrix mix in the amateur/live audio world. People have been (manually) mixing soundboard audio with audience-recorded audio to improve the audio quality of recorded shows for some years now.
The thing is, sound doesn't travel all that fast when you consider the wavelengths of vocal-range soundwaves. Those spikes are not going to arrive at the same time on the different phones.
As ever with DSP, phase problems will be the ruin of you.
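For what it's worth, estimating that arrival-time difference between two recordings is the classic cross-correlation trick (numpy sketch, assuming matched sample clocks and decent SNR):

```python
import numpy as np

def estimate_delay(a, b, fs):
    """Estimate, in seconds, how far recording b lags recording a.

    The cross-correlation peaks at the lag where the two recordings
    line up best; a and b are assumed to be equal-length arrays
    sampled at the same rate fs.
    """
    corr = np.correlate(b, a, mode="full")
    lag = np.argmax(corr) - (len(a) - 1)
    return lag / fs
```

At 343 m/s, a phone 3 m further from the stage hears everything almost 9 ms later — hundreds of samples — so you'd have to do this kind of alignment before any combining.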
Or do you mean that different frequencies will travel at different speeds, enough to make (e.g.) high and low frequencies arrive at different times? Whoa, apparently it does (http://en.wikipedia.org/wiki/Speed_of_sound#Effect_of_freque...) but seems to be a small effect.