I think the best you could do is use the video to determine where someone was standing, and try to reconstruct some of the stereo information based on multiple recorders.
I think this is technically not quite true. If two cell phones right next to each other are both sampling at 15 kHz, in the best case you could combine their samples to get an equivalent sampling rate of 30 kHz. (Best case meaning phone 1 samples exactly halfway between phone 2's samples.)
In practice, however, you would have to account for positioning and the fact that the phones' samples aren't perfectly offset from one another. It would require an amazing engineering feat to overcome this challenge, but I think it's within the realm of the physically possible.
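A toy numpy sketch of that best case (a pure tone, a perfect half-sample offset, and, crucially, no anti-aliasing filter in front of either ADC — which is a big assumption):

```python
import numpy as np

fs = 15000.0        # each phone's sample rate (Hz)
f_tone = 10000.0    # a tone above each phone's 7.5 kHz Nyquist limit
n = 1024

t1 = np.arange(n) / fs        # phone 1's sample instants
t2 = t1 + 0.5 / fs            # phone 2, exactly half a sample later
x1 = np.sin(2 * np.pi * f_tone * t1)
x2 = np.sin(2 * np.pi * f_tone * t2)

# Interleaving the two streams is equivalent to one 30 kHz recording
x = np.empty(2 * n)
x[0::2], x[1::2] = x1, x2

freqs = np.fft.rfftfreq(2 * n, d=1.0 / (2 * fs))
peak = freqs[np.argmax(np.abs(np.fft.rfft(x)))]
print(peak)  # close to 10000 Hz; either phone alone would alias it to 5 kHz
```

Either 15 kHz stream on its own folds the 10 kHz tone down to 5 kHz; only the interleaved pair sees it where it really is.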
If the microphones, ADCs, etc. on both phones are incapable of capturing frequencies above, say, 15 kHz in the first place, combining those signals definitely won't bring you any closer to the original signal. You may be able to cancel out a fair bit of noise given enough processing, but you won't get back what wasn't originally captured by either device.
That's before you get into phase problems from trying to combine two signals. A likely outcome is that the amplitudes of some frequencies are increased whilst others are decreased due to phasing issues.
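A quick numpy illustration of that comb-filtering effect (toy tones and an assumed 1 ms inter-mic delay, roughly 34 cm of air path):

```python
import numpy as np

fs = 44100.0
delay = 0.001          # assumed 1 ms between the two "mics"
t = np.arange(int(fs)) / fs

def combined_amplitude(f):
    """Peak amplitude when a tone is summed with a delayed copy of itself."""
    x = np.sin(2 * np.pi * f * t) + np.sin(2 * np.pi * f * (t - delay))
    return np.max(np.abs(x))

print(combined_amplitude(1000.0))  # delay is one full cycle: ~2.0, boosted
print(combined_amplitude(500.0))   # delay is half a cycle: ~0, cancelled
```

Same mix, same delay — some frequencies double, others vanish entirely.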
/fuzzily remembered music tech degree. May be too fuzzy though!
Chances are that there is a low-pass filter in front of the phone's ADC, blocking signals above the Nyquist limit from reaching the sampler. Assuming brick wall filters (i.e. perfect cutoff), combining the signals will reduce variance (noise) but not give any information on frequencies above the cutoff frequency of the filter.
Brick wall filters don't exist, though. What you might see is a minuscule amount of signal in the filter's stop band. Combining the signal from many, many phones might reduce the variance enough to give useful information for frequencies a tiny bit above the cutoff frequency.
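The variance-reduction part is easy to show with a toy numpy experiment (identical signal, independent noise on each of 100 hypothetical phones):

```python
import numpy as np

rng = np.random.default_rng(0)
n_phones, n = 100, 10000
signal = np.sin(2 * np.pi * 440 * np.arange(n) / 44100.0)

# Every phone hears the same signal plus its own independent noise
recordings = signal + rng.normal(0.0, 0.5, size=(n_phones, n))
averaged = recordings.mean(axis=0)

print(np.std(recordings[0] - signal))  # ~0.5, one phone's noise floor
print(np.std(averaged - signal))       # ~0.05: noise std falls as 1/sqrt(N)
```

That sqrt(N) improvement is exactly why you'd need "many many" phones to dig anything useful out of a stop band that's 60+ dB down.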
A cool project would be to gather the audio from every networked microphone in an area (mobile phones, laptops, ...) and use beam-forming techniques to reconstruct the sound pressure field as a function of position. My guess is that the system would be sensitive enough that it could do amazing things like capture conversations through walls or from long distances.
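A minimal delay-and-sum sketch of what one beam-forming step might look like (hypothetical helper; real phones would also need their clock offsets and positions estimated, which is most of the hard work):

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def delay_and_sum(recordings, mic_positions, focus_point, fs):
    """Steer an array of recordings toward one point in space.

    recordings: (n_mics, n_samples); positions/focus_point in metres.
    Shifts each channel so sound from focus_point lines up, then averages,
    so sound from that point adds coherently and everything else smears out.
    """
    dists = np.linalg.norm(mic_positions - focus_point, axis=1)
    shifts = np.round((dists - dists.min()) / C * fs).astype(int)
    n = recordings.shape[1] - shifts.max()
    aligned = np.stack([r[s:s + n] for r, s in zip(recordings, shifts)])
    return aligned.mean(axis=0)
```

Scan `focus_point` over a grid and the output power gives you the sound field as a function of position, at least in principle.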
The problem eventually comes down to the fact that "better" is subjective. We're in the murky realm of art here. Should your algorithm keep that fret noise or the squeaking of a vocalist's intake of breath? Are they "noise," or are they part of the performance?
>I know nothing about audio processing
Not wishing to be rude, but this much is very evident. Recording engineers position their microphones with millimetre precision in order to combat phase issues, and that is in an ideal studio scenario. Doing what you suggest is basically impossible.
Maybe I'm overstating it; you could probably do something and it'd be a nice bit of research, but you wouldn't get useful results in the way that you're imagining.
Actually, that much I know, because I've done some amateur home recording. I know that, for example, when you mic a snare drum with two microphones that are pointed at each other, you have to put a phase inverter on one microphone. I also know my way around the basic processors for audio production (compressor, limiter, EQ, etc.).
What I don't know much about is the undoubtedly more advanced techniques, which may or may not exist, that could realize the idea I'm talking about. The best idea I can come up with is this: if you had one audio source that captured the dynamics of a concert (perhaps from a phone far away from the house speakers), and another source that captured a clearer yet "smashed" sound (perhaps from a phone closer to the house speakers), perhaps you could apply a compressor to the second source keyed on the dynamics of the first. Again, I might be full of crap here.
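A very rough numpy sketch of the keyed-gain idea (hypothetical `transfer_dynamics` helper; assumes the two recordings are already time-aligned, which is itself the hard part):

```python
import numpy as np

def transfer_dynamics(dynamic_src, smashed_src, fs, win_ms=50.0):
    """Impose the far (dynamic) phone's envelope on the close (flat) one.

    Measures short-term RMS envelopes of both signals and rescales the
    'smashed' recording so its envelope follows the dynamic one -- crudely,
    a compressor/expander keyed off the other source.
    """
    win = max(1, int(fs * win_ms / 1000.0))
    kernel = np.ones(win) / win

    def rms_env(x):
        return np.sqrt(np.convolve(x * x, kernel, mode="same")) + 1e-9

    gain = rms_env(dynamic_src) / rms_env(smashed_src)
    return smashed_src * gain
```

A real implementation would want attack/release smoothing on the gain and a limit on how much boost it applies, but this is the basic shape of it.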
Typically what it seems you're talking about for audio here is similar to a matrix mix in the amateur/live audio world. People have been (manually) mixing soundboard audio with audience-recorded audio to improve the audio quality of recorded shows for some years now.
The thing is, sound doesn't travel all that fast when you consider the wavelengths of vocal-range soundwaves. Those spikes are not going to arrive at the same time on the different phones.
As ever with DSP, phase problems will be the ruin of you.
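For what it's worth, estimating that arrival-time difference between two recordings is the classic cross-correlation trick (numpy sketch, assuming matched sample clocks and decent SNR):

```python
import numpy as np

def estimate_delay(a, b, fs):
    """Estimate, in seconds, how far recording b lags recording a.

    The cross-correlation peaks at the lag where the two recordings
    line up best; a and b are assumed to be equal-length arrays
    sampled at the same rate fs.
    """
    corr = np.correlate(b, a, mode="full")
    lag = np.argmax(corr) - (len(a) - 1)
    return lag / fs
```

At 343 m/s, a phone 3 m further from the stage hears everything almost 9 ms later — hundreds of samples — so you'd have to do this kind of alignment before any combining.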
Or do you mean that different frequencies will travel at different speeds, enough to make (e.g.) high and low frequencies arrive at different times? Whoa, apparently it does (http://en.wikipedia.org/wiki/Speed_of_sound#Effect_of_freque...) but seems to be a small effect.