15 kHz is actually a very high frequency; most adults can barely hear it. The audio from phones is probably much more band limited than that anyway. But that's the least of the challenges. People listen to and enjoy highly band limited music all the time: laptop speakers might have a frequency response of 500 Hz - 4 kHz.
The more challenging problems are distortion from overloaded phone mics, crowd noise, built-in limiters, mismatched sample rates, and compression. It is true that phase relationships from a single sound captured by multiple sources can be very problematic.
However, the farther the mics are from the source, the less of a problem this is, at least as far as "phaseyness" goes. This annoying artifact is a type of comb filtering, and it comes from the fact that two mics close to a sound source can be thought of as capturing the "same" sound at slightly different times. If the mics are far apart, the sound is no longer the same: each mic is picking up reflections from a myriad of surfaces, and the phase relationships within the frequency spectrum have been smeared and shifted by travel through the air. This negates a lot of phase problems. The more likely problem is cancellation in the low frequencies, which can be ameliorated with time alignment.
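Both effects are easy to demonstrate in a few lines. Here's a rough numpy sketch (the 1 ms delay and 48 kHz rate are just illustrative numbers, not anything from a real recording): summing a signal with a delayed copy of itself carves comb-filter nulls at odd multiples of 1/(2·delay), and cross-correlation finds the lag you'd use for time alignment.

```python
import numpy as np

rate = 48_000
rng = np.random.default_rng(0)
sig = rng.standard_normal(rate)  # 1 s of white noise as a stand-in source

# Mic B hears the "same" sound ~1 ms later (roughly 34 cm more distance).
delay = 48  # samples = 1 ms at 48 kHz
mic_a = sig
mic_b = np.concatenate([np.zeros(delay), sig[:-delay]])

# Naive sum: a comb filter with nulls at 500 Hz, 1.5 kHz, 2.5 kHz, ...
# (odd multiples of 1 / (2 * 1 ms)).
naive = mic_a + mic_b
spectrum = np.abs(np.fft.rfft(naive))
freqs = np.fft.rfftfreq(naive.size, 1 / rate)  # 1 Hz per bin here

null_mags = spectrum[[500, 1500, 2500]]   # should be near zero
peak_mags = spectrum[[1000, 2000, 3000]]  # should be ~2x the source level

# Time alignment: estimate the lag via cross-correlation, then shift mic B.
corr = np.correlate(mic_b, mic_a, mode="full")
lag = corr.argmax() - (mic_a.size - 1)  # recovers the 48-sample delay
aligned = mic_a + np.roll(mic_b, -lag)
```

With broadband program material the nulls are what you hear as "phaseyness"; once the tracks are time-aligned the sum reinforces instead of cancelling, which is exactly why alignment helps most in the lows, where a few milliseconds is a large fraction of a wavelength.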