Am I the only one who has trouble understanding what is meant by this? What is the exact operation that's referred to here?
I know box filters in the context of 2D image filtering and they're straightforward but the concept of applying them to shapes just doesn't make any sense to me.
Can someone clarify?
- You take a square one pixel spacing wide and attach it by its center to a sampling point (“center of a pixel”). The value of that pixel is then your mathematically perfect image (of a polygon) integrated over that square (and normalized). This is perhaps the more intuitive definition.
- You take a box kernel (the indicator function of that square, centered, normalized), take the convolution[1] of it with the original perfect image, then sample the result at the final points (“pixel centers”). This is the standard definition, which yields exactly the same result as long as your kernel is symmetric (which the box kernel is).
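For the polygon case, the first definition can be sketched in a few lines of Python. This is only an illustration, not the article's implementation: the names `clip_to_box`, `polygon_area` and `box_coverage` are made up here, and it assumes a single simple polygon with value 1 on a background of 0. It clips the polygon to the pixel's square (Sutherland-Hodgman) and divides the clipped area by the square's area.

```python
def clip_to_box(poly, x0, y0, x1, y1):
    """Sutherland-Hodgman clip of a polygon against an axis-aligned box."""
    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # wraps around to the last vertex for i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def at_x(x):  # intersection of segment a-b with the vertical line at x
        return lambda a, b: (x, a[1] + (b[1] - a[1]) * (x - a[0]) / (b[0] - a[0]))

    def at_y(y):  # intersection of segment a-b with the horizontal line at y
        return lambda a, b: (a[0] + (b[0] - a[0]) * (y - a[1]) / (b[1] - a[1]), y)

    pts = list(poly)
    for inside, intersect in [
        (lambda p: p[0] >= x0, at_x(x0)),
        (lambda p: p[0] <= x1, at_x(x1)),
        (lambda p: p[1] >= y0, at_y(y0)),
        (lambda p: p[1] <= y1, at_y(y1)),
    ]:
        if not pts:
            break
        pts = clip(pts, inside, intersect)
    return pts


def polygon_area(pts):
    """Shoelace formula; returns 0.0 for an empty polygon."""
    s = 0.0
    for i, (x, y) in enumerate(pts):
        px, py = pts[i - 1]
        s += px * y - x * py
    return abs(s) / 2


def box_coverage(poly, cx, cy, spacing=1.0):
    """Box-filtered value of the polygon's indicator at pixel center (cx, cy)."""
    h = spacing / 2
    clipped = clip_to_box(poly, cx - h, cy - h, cx + h, cy + h)
    return polygon_area(clipped) / spacing ** 2
```

With the triangle (0,0), (2,0), (0,2) and unit pixel spacing, the pixel centered at (1.5, 0.5) comes out as 0.5, since the hypotenuse cuts that square exactly in half.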
The connection with the pixel-image filtering case is that you take the perfect image to be composed of delta functions at the original pixel centers and multiplied by the original pixel values. That is, in the first definition above, “integrate” means to sum the original pixel values multiplied by the filter’s value at the original pixel centers (for a box filter, zero if outside the box—i.e. throw away the addend—and a normalization constant if inside it). Alternatively, in the second definition above, “take the convolution” means to attach a copy of the filter (still sized according to the new pixel spacing) multiplied by the original pixel value to the original pixel center and sum up any overlaps. Try proving both of these give the answer you’re already accustomed to.
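If you would rather check that numerically before proving it, here is a small 1D sketch of both readings. The helper names (`box`, `gather`, `scatter`) are invented for this example, and it assumes the original pixel centers sit at i + 0.5 and that the new pixel spacing is an integer multiple of the old one.

```python
def box(t, width):
    """Normalized 1D box kernel: 1/width inside (-width/2, width/2), else 0."""
    return 1.0 / width if abs(t) < width / 2 else 0.0


def gather(values, spacing):
    """First definition: at each new center, sum old values weighted by the box."""
    out = []
    for j in range(len(values) // spacing):
        c = spacing * (j + 0.5)  # new pixel center
        out.append(sum(v * box((i + 0.5) - c, spacing)
                       for i, v in enumerate(values)))
    return out


def scatter(values, spacing):
    """Second definition: stamp a scaled box at each old center, sample at new centers."""
    n_out = len(values) // spacing
    out = [0.0] * n_out
    for i, v in enumerate(values):
        for j in range(n_out):
            c = spacing * (j + 0.5)
            out[j] += v * box(c - (i + 0.5), spacing)
    return out
```

For `[1, 3, 5, 7]` and a spacing of 2, both give `[2.0, 6.0]`, which is the plain block average you already know as a box-filter downscale.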
This is the most honest signal-processing answer, and it might be a bit challenging to work through but my hope is that it’ll be ultimately doable. I’m sure there’ll be neighboring answers in more elementary terms, but this is ultimately a (two-dimensional) signal processing task and there’s value in knowing exactly what those signal processing people are talking about.
[1] (f∗g)(x) = (g∗f)(x) = ∫f(y)g(x-y)dy is the definition you’re most likely to encounter. Equivalently, (f∗g)(x) is f(y)g(z) integrated over the line (plane, etc.) x=y+z, which sounds a bit more vague but exposes the underlying symmetry more directly. Convolving an image with a box filter gives you, at each point, the average of the original over the box centered around that point.
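To spell out that last sentence in 1D (the symbol b_w for a normalized box of width w is my own notation, not something from the article):

```latex
b_w(t) =
\begin{cases}
  1/w & \text{if } |t| \le w/2 \\
  0   & \text{otherwise}
\end{cases}
\qquad\Longrightarrow\qquad
(f * b_w)(x) = \int f(y)\, b_w(x - y)\, dy
             = \frac{1}{w} \int_{x - w/2}^{x + w/2} f(y)\, dy
```

The kernel is nonzero only where |x - y| <= w/2, which is what turns the integral over the whole line into an average over the box centered at x. The 2D case is the same with a double integral and 1/w^2.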
Essentially if you have a polygon function p(x,y) => { 1 if inside the polygon, otherwise 0 }, and a filter function f(x,y) centered at the origin, then you can evaluate the filter at any point x_0,y_0 with the double-integral / total sum of f(x-x_0,y-y_0)*p(x,y).
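As a brute-force sanity check of that double integral (definitely not how you would do it in production), here is a midpoint-rule sketch. The names `inside`, `filtered`, `box` and `tent` are made up for the example, the polygon test is an ordinary even-odd ray cast, and the filter is assumed to vanish outside a square of the given radius.

```python
def inside(poly, x, y):
    """p(x, y): even-odd ray-casting point-in-polygon test."""
    hit = False
    for i, (x1, y1) in enumerate(poly):
        x0, y0 = poly[i - 1]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            hit = not hit
    return hit


def filtered(poly, f, radius, cx, cy, n=64):
    """Midpoint-rule estimate of the integral of f(x - cx, y - cy) * p(x, y)."""
    h = 2 * radius / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = cx - radius + (i + 0.5) * h
            y = cy - radius + (j + 0.5) * h
            if inside(poly, x, y):
                total += f(x - cx, y - cy) * h * h
    return total


def box(u, v):
    """Unit box filter (support radius 0.5)."""
    return 1.0 if max(abs(u), abs(v)) <= 0.5 else 0.0


def tent(u, v):
    """Separable tent filter (support radius 1)."""
    return max(0.0, 1 - abs(u)) * max(0.0, 1 - abs(v))
```

For the unit square and a sample point on the middle of its right edge, `filtered(square, box, 0.5, 1.0, 0.5)` gives one half, while the wider tent, `filtered(square, tent, 1.0, 1.0, 0.5)`, gives about 0.375 because part of its support hangs over the top and bottom of the square.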
One significant limitation here is that the polygon needs to have constant colour, unfortunately.
There's also this work on analytic antialiasing by Michael McCool: https://www.researchgate.net/publication/2524514_Analytic_An...
It can be surprising at first, but when you analytically compute the area of non-overlapping parts of a pixel (i.e., use box filtering) you can introduce high frequencies that cause visible aliasing artifacts that will never go away. This is also true if you supersample a pixel, taking point samples and averaging them, no matter how many samples you take.
You can see the aliasing I’m talking about in the example at the top of the article, the 3rd one is the Box Filter - equivalent to computing the area of the polygons within each pixel. Look closely near the center of the circle where all the lines converge, and you can see little artifacts above and below, and to the left and right of the center, artifacts that are not there in the “Bilinear Filter” example on the right.
The Mitchell-Netravali paper[1] correctly describes sampling as a tradeoff space. If you optimize for frequency response (brick wall rejection of aliasing) the impulse response is sinc and you get a lot of ringing. If you optimize for total rejection of aliasing while maintaining positive support, you get something that looks like a Gaussian impulse response, which is very smooth but blurry. And if you optimize for small spatial support and lack of ringing, you get a box filter, which lets some aliasing through.
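To put rough numbers on "lets some aliasing through", here is a small sketch that evaluates the frequency response of a unit-width box and of a tent with half-width 1 (the tent being the box convolved with itself). The function names are made up here; frequencies are in cycles per pixel, so Nyquist is at 0.5. It uses only the standard Fourier pairs box -> sinc and tent -> sinc squared.

```python
import math


def sinc(f):
    """Normalized sinc: sin(pi f) / (pi f), with sinc(0) = 1."""
    return 1.0 if f == 0 else math.sin(math.pi * f) / (math.pi * f)


def box_response(f):
    """Frequency response of a unit-width box filter."""
    return sinc(f)


def tent_response(f):
    """Frequency response of a tent of half-width 1 (box convolved with box)."""
    return sinc(f) ** 2


for f in (0.25, 0.5, 0.75, 1.5):
    print(f"{f:4.2f} cycles/pixel: box {box_response(f):+.3f}, tent {tent_response(f):+.3f}")
```

At 1.5 cycles per pixel, well above Nyquist, the box still passes about 21% of the amplitude (with flipped sign) while the tent passes under 5%. The price is that the tent also attenuates more below Nyquist, which is the blur side of the tradeoff.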
Which is best, I think, depends on what you're filtering. For natural scenes, you can make an argument that the oblique projection approach of Rocha et al[2] is the optimal point in the tradeoff space. I tried it on text, though, and there were noticeable ringing artifacts; box filtering is definitely better quality to my eyes.
I like to think about antialiasing specific test images. The Siemens star is very sensitive in showing aliasing, but it also makes sense to look at a half-plane and a thin line, as they're more accurate models of real 2D scenes that people care about. It's hard to imagine doing better than a box filter for a half-plane; either you get ringing (which has the additional negative impact of clipping when the half-planes are at the gamut boundary of the display; not something you have to worry about with natural images) or blurriness. In particular, a tent filter is going to be softer but your eye won't pick up the reduction in aliasing, though it is certainly present in the frequency domain.
A thin line is a different story. With a box filter, you get basically a non-antialiased line of single-pixel thickness, just with less alpha, and it's clearly possible to do better; a tent filter is going to look better.
But a thin line is just a linear combination of two half-planes. So if you accept that a box filter is better visual quality than a tent filter for a half-plane, and the other way around for a thin line, then the conclusion is that linear filtering is not the correct path to truly highest quality.
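Here is a 1D cross-section sketch of that argument, with made-up helper names. `box_step` and `tent_step` give the filtered value of the half-plane x <= t as seen from a pixel centered at 0 (the integral of the kernel up to t), and a line is then literally the difference of two half-planes. Unit pixel spacing and a tent of half-width 1 are assumptions of the example.

```python
def box_step(t):
    """Unit box filter applied to the half-plane x <= t, sampled at 0."""
    return min(1.0, max(0.0, t + 0.5))


def tent_step(t):
    """Tent filter (half-width 1) applied to the half-plane x <= t, sampled at 0."""
    if t <= -1:
        return 0.0
    if t >= 1:
        return 1.0
    return (1 + t) ** 2 / 2 if t < 0 else 1 - (1 - t) ** 2 / 2


def line_pixels(step, left, right, centers):
    """A line [left, right] is half-plane(right) minus half-plane(left)."""
    return [step(right - c) - step(left - c) for c in centers]


centers = [-1.0, 0.0, 1.0]
for left in (0.1, 0.2):  # a 0.2-wide line, nudged by a tenth of a pixel
    print(left, line_pixels(box_step, left, left + 0.2, centers),
          line_pixels(tent_step, left, left + 0.2, centers))
```

With the box, the 0.2-wide line lights only the middle pixel at 0.2 alpha and the output does not change at all when the line is nudged, so the subpixel position is lost. With the tent, the same line spills about 0.04 into the neighbor and the split shifts as the line moves.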
With the exception of thin lines, for most 2D scenes a box filter with antialiasing done in the correct color space is very close to the best quality - maybe the midwit meme applies, and it does make sense to model a pixel as a little square in that case. But I am interested in the question of how to truly achieve the best quality, and I don't think we really know the answer yet.
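On the color space point: I am assuming "correct" here means blending coverage in linear light rather than on sRGB-encoded values. A minimal sketch of the difference, using the standard sRGB transfer function (the function names are my own):

```python
def srgb_to_linear(s):
    """Decode an sRGB-encoded value in [0, 1] to linear light."""
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def linear_to_srgb(l):
    """Encode a linear-light value in [0, 1] as sRGB."""
    return 12.92 * l if l <= 0.0031308 else 1.055 * l ** (1 / 2.4) - 0.055


def blend_naive(fg, bg, coverage):
    """Blend directly on the encoded values (what you get if you ignore gamma)."""
    return coverage * fg + (1 - coverage) * bg


def blend_linear(fg, bg, coverage):
    """Decode, blend by coverage in linear light, re-encode."""
    lin = coverage * srgb_to_linear(fg) + (1 - coverage) * srgb_to_linear(bg)
    return linear_to_srgb(lin)
```

For a white edge over black at 50% coverage, the naive blend stores 0.5 while the linear-light blend stores roughly 0.735, which is a visibly lighter gray and changes how thick edges and text appear.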
[1] https://www.cs.utexas.edu/~fussell/courses/cs384g-fall2013/l...
[2] https://www.inf.ufrgs.br/~eslgastal/SBS3/Rocha_Oliveira_Gast...
Anti-aliasing is exactly what it sounds like: a low-pass filter to prevent artefacts. Convolution with a square pulse is serviceable, but it is not actually that good a low-pass filter; you get all kinds of moiré effects. This is why a bicubic kernel, which roughly mimics a perfect low-pass filter (a sinc kernel), can perform better.
It is tempting to use a square kernel though, because it's pretty much the sharpest possible method of acceptable quality.
In the given example (periodic checkerboard), that would be impossible because the pixels that touch the horizon intersect an infinite number of polygons.
Not that TFA solves that problem either. As far as I know the exact rendering of a periodic pattern in perspective is an open problem.
Analytic integration is always superior to multisampling, assuming the same choice of filter and as long as the analytic integration is correct. Your comment assumes that the analytic integration is incorrect in the presence of multiple polygons. That isn’t true, though: the article is using multiple polygons. The demo is limited in several ways for simplicity, and it doesn’t appear to handle arbitrary situations.
The limitations of the demo (whether it handles overlapping polygons, stitched meshes, textures, etc.) do not have any bearing on the conceptual point that computing the pixel analytically is better than taking multiple point samples. GPUs use multisampling because it’s easy and finite to compute, not because it’s higher quality. Multisampling is lower quality than analytic, but it’s far, far easier to productize, and it’s good enough for most things (especially games).
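A toy illustration of that point, under assumptions of my own choosing: one unit pixel, a box filter, and a single edge y <= m*x + b that stays inside the pixel vertically. The function names are invented for the example. The analytic coverage is the area under the edge, and the supersampled estimate is the fraction of an n x n grid of point samples that lands below it.

```python
def analytic_coverage(m, b):
    """Area under y = m*x + b over the unit pixel, assuming 0 <= b and m + b <= 1."""
    return m / 2 + b


def supersampled_coverage(m, b, n):
    """Fraction of an n x n regular grid of point samples below the edge."""
    hits = 0
    for i in range(n):
        for j in range(n):
            x = (i + 0.5) / n
            y = (j + 0.5) / n
            if y <= m * x + b:
                hits += 1
    return hits / (n * n)


exact = analytic_coverage(0.3, 0.2)
for n in (2, 4, 8, 64):
    est = supersampled_coverage(0.3, 0.2, n)
    print(f"{n:3d}x{n:<3d} samples: {est:.4f} (error {abs(est - exact):.4f})")
```

The point-sampled estimate moves toward the analytic value as n grows, but at any finite sample count it is quantized to multiples of 1/n^2, while the analytic answer is exact for the same filter.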
(Android 14, Android WebView/Chrome 127)
(However, it seems wrong that Android doesn't set things up via e.g. cgroups or whatever to make sure that the browser can't hog all the resources. You'd want to reserve, say, 5% of CPU and RAM for use by system tasks, perhaps? (Reserve in the sense that these system tasks can pre-empt anyone else using these, not that no one else can use these.))
We do have a plan for conflation free compositing[1] which should closely approximate the quality of the samples here. That in turn depends on sparse strips[2], though a degraded performance experiment could be done to validate the quality outcomes. Sparse strips in turn depend on high performance segmented sort[3].
The analytic approach to path overlaps is intriguing, but I think it will be very challenging to implement efficiently on GPU. I'm looking forward to seeing what results.
[1]: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/Con...
[2]: https://docs.google.com/document/d/16dlcHvvLMumRa5MAyk2Du_Ms...
[3]: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/A.2...
If I were going to take it on, I'd start with BVH construction - the H-PLOC paper at the latest HPG [1] looks promising - then traverse down the hierarchy until you get a very small number of path segments so you can compare them pairwise. Obviously any time there is an intersection you need at least the two segments.
This seems hard to me, humbling even. I mean, overlap removal is hard enough on the CPU, especially because it's so sensitive to numerical robustness, and doubly so for curves. But I think you'll learn something for trying!
[﹡] https://news.ycombinator.com/item?id=41105102 but it didn't make the front page; I'm holding back promoting it further pending writing a companion blog post.
A more interesting approach to antialiasing, in my opinion, is the use of neural nets to generate aesthetically pleasing outputs from limited sample data, as seen for example in NVIDIA's DLAA [0]. These methods go beyond trying to optimize overly simplistic signal-processing reconstruction metrics.
[0] https://en.wikipedia.org/wiki/Deep_learning_anti-aliasing
EDIT: yes it is.