It's even true for reconstruction. A digital waveform can represent peak levels far above "digital peak", in between samples.
This is why, if you're mastering songs, you'd better keep your peak levels at -0.5 dB or -1 dB (so the filtering from lossy compression won't make it clip), and why you'd better use an oversampling limiter. Especially if you're doing loudness-war-style brutal limiting, because that's the stuff that really creates inter-sample peaks. But you shouldn't be doing that, because Spotify and YouTube will just turn your song down to -14 LUFS anyway and all you'll have accomplished is making it sound shitty :-)
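A quick numpy sketch of the effect (all parameters illustrative, my own choices): hard-clip a sine so the sample peak reads exactly 0 dBFS, then approximate the reconstructed waveform with 8x FFT interpolation and look at the true peak between samples.

```python
import numpy as np

# Illustrative sketch: samples of a hard-clipped sine read 0 dBFS,
# but the bandlimited reconstruction between samples goes higher.
fs, n = 48000, 480
t = np.arange(n) / fs
# 11 kHz tone (exactly 110 cycles in the block, so the FFT is leakage-free),
# pushed into clipping so the flat tops sit at digital full scale
x = np.clip(1.4 * np.sin(2 * np.pi * 11000 * t + 0.3), -1.0, 1.0)

sample_peak = np.max(np.abs(x))  # exactly 1.0 (0 dBFS)

# 8x oversampling by zero-padding the spectrum (ideal interpolation)
X = np.fft.rfft(x)
Xup = np.zeros(4 * n + 1, dtype=complex)
Xup[:X.size] = X
xup = np.fft.irfft(Xup, 8 * n) * 8  # rescale for the longer inverse FFT

true_peak = np.max(np.abs(xup))  # exceeds the sample peak: inter-sample peaks
print(f"sample peak: {20 * np.log10(sample_peak):+.2f} dBFS")
print(f"true peak:   {20 * np.log10(true_peak):+.2f} dBFS")
```

This is essentially what a true-peak meter does: oversample before taking the max, because the sample-domain max undersells the reconstructed waveform.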
In other words it will cause ringing (oscillations, the Gibbs phenomenon) in the time domain (for signals; spatial domain for images). If instead you want a smoother result, you need a gradual taper in the FFT domain rather than abruptly zeroing out bins. As that is hard to do perfectly, it's easier to specify a window function (or a classically derived FIR or IIR filter kernel) and convolve it with your input signal / image. It's also more efficient to do online when the data is streaming.
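To make the ringing concrete, here's a small numpy-only sketch (parameters are arbitrary choices of mine): lowpass a unit step by brickwall bin-zeroing versus a Hamming-windowed sinc FIR at the same nominal cutoff, and compare the overshoot.

```python
import numpy as np

# Lowpass a unit step two ways and compare overshoot (ringing).
n = 256
x = np.zeros(n)
x[n // 2:] = 1.0  # step input

# 1) Brickwall: zero every FFT bin above a quarter of Nyquist
X = np.fft.rfft(x)
X[X.size // 4:] = 0.0
y_brick = np.fft.irfft(X, n)

# 2) Windowed-sinc FIR at the same nominal cutoff (0.25 * Nyquist)
m = 31
k = np.arange(m) - m // 2
h = 0.25 * np.sinc(0.25 * k) * np.hamming(m)
h /= h.sum()  # unity gain at DC
y_fir = np.convolve(x, h, mode="same")

print(f"brickwall overshoot: {y_brick.max() - 1.0:.3f}")  # Gibbs ringing
print(f"windowed overshoot:  {y_fir.max() - 1.0:.4f}")    # far smaller
```

The brickwall version overshoots by roughly 9% of the step (the classic Gibbs figure); the windowed kernel trades a wider transition band for a step response that barely rings.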
Brickwall filters have the greatest possible frequency accuracy. I think you might be getting filters mixed up with windows - a boxcar window does not have the best frequency accuracy.
A tone played for a fixed time, a glissando, vibrato, a pure tone that lies between two frequency bins?
I wonder if this actually has subtle artifacts, or if it doesn't matter because the input is noise?
What is unreal data? If you have periodic data that's not aligned you're going to have a continuous signal going all the way to the high frequencies.
The easiest example would be when an earlier step supersampled the data, and you know that nothing above the original Nyquist could possibly reflect reality. That's one example of when you'd want to zero bins.
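A hypothetical sketch of that case (rates and frequencies are made up for illustration): data originally captured at 12 kHz was upsampled 4x to 48 kHz by an earlier step, so anything above the original 6 kHz Nyquist is necessarily junk and can be zeroed without touching the real signal.

```python
import numpy as np

# Data was 4x-supersampled from 12 kHz to 48 kHz earlier in the chain,
# so nothing above the original 6 kHz Nyquist can reflect reality.
fs, n = 48000, 4800
t = np.arange(n) / fs
clean = np.sin(2 * np.pi * 1000 * t)        # 1 kHz: genuinely in-band
junk = 0.2 * np.sin(2 * np.pi * 15000 * t)  # 15 kHz: cannot be real
x = clean + junk

X = np.fft.rfft(x)
k_orig_nyquist = n // 8       # bin index of 6 kHz at this FFT size
X[k_orig_nyquist + 1:] = 0.0  # zero everything above the original Nyquist
y = np.fft.irfft(X, n)

print(np.max(np.abs(y - clean)))  # tiny: junk removed, tone untouched
```

Note this only works cleanly because the junk is guaranteed to live entirely above the cutoff; zeroing bins that overlap real content is exactly the brickwall/ringing situation discussed above.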
It's used in OFDM, where the subchannels are generated by an FFT and have a sinc shape in the frequency domain (the transform of the implicit rectangular symbol window).
but yeah, the implicit boxcar is a sinc.
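You can check that numerically: the DFT magnitude of a boxcar is the Dirichlet kernel, the periodic cousin of the sinc (N and M below are arbitrary).

```python
import numpy as np

# The DFT of an M-sample boxcar in an N-point transform is the Dirichlet
# kernel |sin(pi f M) / sin(pi f)| -- the periodic version of the sinc.
N, M = 1024, 64
w = np.zeros(N)
w[:M] = 1.0
W = np.abs(np.fft.fft(w))

f = np.arange(N) / N
den = np.sin(np.pi * f)
dirichlet = np.abs(np.divide(np.sin(np.pi * f * M), den,
                             out=np.full(N, float(M)), where=den != 0))
print(np.max(np.abs(W - dirichlet)))  # agrees to float precision
```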
sometimes it's used for image data as 2d convolutions can be expensive though...
in the time domain, you're looking at what... for each output sample, a pointwise multiply for each tap and then a sum, right? i'm guessing for most audio applications at commonly used sample rates, you're rarely going to have more than 24 taps at the very most? (with most only really needing 8 to 16?)
with the fft, unless you're using super exotic multiwindow schemes, you're looking at a pointwise multiply just to do the windowing before the fft. then you're looking at n log n to compute the fft itself, then zeroing or applying an envelope (pointwise multiply), then another n log n back to the time domain.
i think with simd you're way faster to just stay in the time domain.
would be interesting to bench for sure though... small ffts and simd may be super fast and not that many instructions.
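A rough bench sketch along those lines (numbers will be machine-dependent, and this uses numpy's FFT rather than a hand-rolled SIMD kernel, so treat it as illustrative only):

```python
import numpy as np
import time

rng = np.random.default_rng(0)
x = rng.standard_normal(1 << 16)

for taps in (16, 64, 256, 1024):
    h = rng.standard_normal(taps)
    n = len(x) + taps - 1  # full linear convolution length

    t0 = time.perf_counter()
    y_direct = np.convolve(x, h)  # direct time-domain FIR
    t_direct = time.perf_counter() - t0

    t0 = time.perf_counter()
    # FFT convolution: transform, pointwise multiply, transform back
    y_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)
    t_fft = time.perf_counter() - t0

    print(f"{taps:5d} taps: direct {1e3 * t_direct:7.2f} ms, "
          f"fft {1e3 * t_fft:7.2f} ms")

# sanity check: both methods compute the same thing
assert np.allclose(y_direct, y_fft)
```

The assert at the end is the important part for the argument: the two paths are interchangeable in output, so the choice really is purely a performance question.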
It's not something you can expect someone to reason through from the basics, and it's not something I'd expect someone to know unless they've worked on problems involving the technique.
The crossover point will be system-dependent and heavily influenced by overhead, but a crude WAG might be in the vicinity of 64- to 128-wide kernels. There is no question of one implementation being "better" than the other; they are capable of identical results if implemented accordingly.
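If you have scipy handy, it will even pick the method per size for you (via a heuristic, so the crossover it chooses is exactly the kind of system-dependent guess described above), and you can confirm both paths agree to float precision:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
x = rng.standard_normal(8192)

for taps in (16, 64, 128, 512):
    h = rng.standard_normal(taps)
    # scipy's estimate of which method should be faster for these sizes
    method = signal.choose_conv_method(x, h, mode="full")
    y_direct = signal.convolve(x, h, method="direct")
    y_fft = signal.convolve(x, h, method="fft")
    same = np.allclose(y_direct, y_fft)
    print(f"{taps:4d} taps: scipy picks {method!r}, results match: {same}")
```

(`choose_conv_method` also takes `measure=True` to actually time both on your machine instead of using the heuristic.)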