Audio samples are point samples (usually). This is nice, because there's a whole theory on how to upsample point samples without loss of information. But more importantly, this theory works because it matches how your playback hardware functions (for both analog and digital reasons that I won't go into).
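The lossless-upsampling theory referred to here is Whittaker–Shannon (sinc) interpolation. A minimal sketch in Python, assuming a finite sample window (so the truncated sum is only approximate near the edges):

```python
import math

def sinc(t):
    # Normalized sinc: sin(pi*t) / (pi*t), with sinc(0) = 1.
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

# Point samples of a bandlimited signal: a 1 Hz sine sampled at 8 Hz.
fs, f = 8.0, 1.0
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(64)]

def reconstruct(t):
    # Whittaker-Shannon interpolation: a sum of shifted sincs,
    # truncated to the available samples.
    return sum(s * sinc(t * fs - n) for n, s in enumerate(samples))

# The reconstruction recovers the signal *between* the sample points,
# e.g. halfway between samples 10 and 11:
mid = reconstruct(10.5 / fs)
exact = math.sin(2 * math.pi * f * 10.5 / fs)
```

At the original sample points the sum is exact (every other sinc is zero there); between them, the truncated sum is close to the true signal value, which is the sense in which point samples can be upsampled without loss.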
Pixels, however, are actually displayed by the hardware as little physical rectangles. Take a magnifying glass and check. Treating them as points is a bad approximation that can only result in unnecessarily blurry images.
I have no idea why this article is quoted so often. Maybe "everybody is doing it wrong" is just a popular article genre. Or maybe not everyone is familiar enough with sampling theory to know exactly why it works for audio, and therefore can't see why those reasons don't apply to graphics.
This signal processing applies to images as well. Resampling is used very often for upscaling, for example. Here's an example: https://en.wikipedia.org/wiki/Lanczos_resampling
> It was already wrong in 1995 when monitors where CRTs, and it's way wrong in 2025 in the LCD/OLED era where pixels are truly discrete.
I don't think it has anything to do with display technologies though. Imagine this: there is a computer that is dedicated to image processing. It has no display, no CRT, no LCD, nothing. The computer is running a service that is resizing images from 100x100 pixels to 200x200 pixels. Would the programmer of this server be better off thinking in terms of samples or rectangular subdivisions of a display?
Alvy Ray Smith, the author of this paper, was coming from the background of developing Renderman for Pixar. In that case, there were render farms doing all sorts of graphics processing before the final image was displayed anywhere.
I think your two examples nicely illustrate that it's all about the display technology.
> The computer is running a service that is resizing images from 100x100 pixels to 200x200 pixels. Would the programmer of this server be better off thinking in terms of samples or rectangular subdivisions of a display?
That entirely depends on how the resizing is done. Usually people choose nearest neighbor in scenarios like that to be faithful to the original 100x100 display, and to keep the images sharp. This treats the pixels as squares, which means the programmer should do so as well.
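Nearest-neighbor upscaling makes the square interpretation explicit: each source pixel is replicated into a block of identical values. A minimal sketch:

```python
def upscale_nearest(img, factor):
    # Each source pixel becomes a factor x factor block:
    # the "pixel as a little square" view, made literal.
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]

src = [[1, 2],
       [3, 4]]
dst = upscale_nearest(src, 2)
# Each original pixel now covers a 2x2 square of identical values.
```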
> Alvy Ray Smith, the author of this paper, was coming from the background of developing Renderman for Pixar.
That's meaningful context. I'm sure that in 1995, Pixar movies were exposed onto analog film before being shown in theatres. I'm almost certain this process didn't preserve sharp pixels, so "pixels aren't squares" was perhaps literally true for this technology.
How about a counterexample: as part of a vectorization engine you need to trace the outline of all pixels of the same color in a bitmap. What other choice do you have than to think of pixels as squares with four sides?
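That kind of outline tracing can be sketched directly in terms of square sides: emit all four edges of every pixel's unit square, and cancel any edge shared by two pixels in the region. What survives is the boundary. (A sketch of the idea, not any particular engine's code.)

```python
def region_outline(pixels):
    """Boundary edges of a set of (x, y) pixel coordinates, treating
    each pixel as a unit square. Edges shared by two pixels of the
    region cancel; the edges that remain form the outline."""
    edges = set()
    for (x, y) in pixels:
        # The four sides of the square for pixel (x, y), as corner pairs.
        for edge in [((x, y), (x + 1, y)),          # top
                     ((x + 1, y), (x + 1, y + 1)),  # right
                     ((x, y + 1), (x + 1, y + 1)),  # bottom
                     ((x, y), (x, y + 1))]:         # left
            if edge in edges:
                edges.remove(edge)  # interior edge, shared with a neighbor
            else:
                edges.add(edge)
    return edges

# A 2x1 pixel region: the shared edge cancels, 6 outline edges remain.
outline = region_outline({(0, 0), (1, 0)})
```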
> a bad approximation that can only result in unnecessarily blurry images
There is also the complication of composite video signals, where you can't treat pixels as linearly independent components.
Lots of past discussions:
https://news.ycombinator.com/item?id=35076487 74 points, 2 years ago, 69 comments
https://news.ycombinator.com/item?id=26950455 81 points, 4 years ago, 70 comments
https://news.ycombinator.com/item?id=20535984 143 points, 6 years ago, 79 comments
https://news.ycombinator.com/item?id=8614159 118 points, 10 years ago, 64 comments
https://news.ycombinator.com/item?id=1472175 46 points, 15 years ago, 20 comments
Screen pixels are (nowadays) usually three vertical rectangles that occupy a square spot on the grid that forms the screen. This is sometimes exploited for sub-pixel font smoothing purposes.
Digital photography pixels are reconstructed from sensors, arranged in a Bayer grid, that each perceive a cone of incoming light within a certain frequency band.
Rendered 3D scene pixels are point samples, unless they approximate cones by sampling a neighborhood of the pixel center.
In any case, Nyquist will tear your head off and spit into your neck hole as soon as you come close to any kind of pixel. Square or point.
If you light up pixels in a row, you get a line - a long thin rectangle - and not a chain of blobs. If you light them up diagonally, you get a jagged line. For me that is proof that they are squares - at least close enough to squares. Heck, even on old displays that don't have a square pixel aspect ratio they are just squished squares ;-). And you have to treat them like little squares if you want to understand antialiasing, or why you sometimes have to add (0.5, 0.5) to get sharp lines.
(And a counterpoint: The signal-theoretical view that they are point samples is useful if you want to understand the role of gamma in anti-aliasing, or if you want to do things like superresolution with RGB-sub-pixels.)
See also https://www.reddit.com/r/apple/comments/9fp1ty/did_you_ever_....
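The gamma point can be shown in a few lines: a pixel half covered by white over black should emit half the light, but averaging the stored (gamma-encoded) values gives something dimmer. A sketch assuming a simple power-law gamma of 2.2, not the exact piecewise sRGB transfer function:

```python
# Encoded values in [0, 1]; "coverage" blending of black and white.
gamma = 2.2
black, white = 0.0, 1.0

naive = (black + white) / 2  # averaging encoded values: 0.5
linear = ((black ** gamma + white ** gamma) / 2) ** (1 / gamma)
# Averaging in linear light and re-encoding gives about 0.73:
# half the photons is not half the code value.
```

This is why anti-aliasing done in gamma space makes diagonal edges look too dark or too thin.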
But these samples are usually called fragments, not pixels. They turn into little square pixels later in the pipeline, so yeah, I guess that pixels really are little squares, or maybe little rectangles.
Consider the Direct3D rasterization rules[1], which offset each sample point by 0.5 on each axis to sample "at the pixel center". Why are the "pixel centers" at half-integer coordinates in the first place? Because if you think of pixels as little squares, it's tempting to align the "corners" with integer coordinates like graph paper. If instead the specifiers had thought of pixels as a lattice of sample points, it would have been natural to align the sample points with integer coordinates. "Little square" pixels thus introduced an unneeded complication to sampling - an extra translation by a fractional distance - so now every use of the API for pixel-perfect rendering must apply the inverse transform.
[1]: https://learn.microsoft.com/en-us/windows/win32/direct3d11/d...
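The two conventions side by side, as a sketch (hypothetical helper names, not the actual D3D API):

```python
def pixel_to_ndc(i, width):
    # D3D-style: pixel i covers [i, i+1) and is sampled at its center
    # i + 0.5, then mapped into normalized device coordinates [-1, 1].
    return 2.0 * (i + 0.5) / width - 1.0

# With a 4-pixel-wide viewport the centers land at half-integer spots:
centers = [pixel_to_ndc(i, 4) for i in range(4)]
# -> [-0.75, -0.25, 0.25, 0.75]

# A pure lattice-of-samples convention would put samples at integer
# coordinates 0..width-1, with no half-pixel shift:
lattice = [2.0 * i / 4 - 1.0 for i in range(4)]
# -> [-1.0, -0.5, 0.0, 0.5]
```

The 0.5 difference between the two mappings is exactly the fractional translation pixel-perfect rendering code has to undo.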
To a user, usually.
To a home entertainment customer, never (even if they wouldn't really notice!).
To an optical engineer like myself, never true.
See, for example: https://bartwronski.com/2021/02/15/bilinear-down-upsampling-...
Implementations of resizing based on aligning pixel centers resulted in slight shifts, which caused a lot of trouble.
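One way to see the shift: compare the two common coordinate mappings used when resampling, here for a 2x upscale. (A sketch using the names these conventions usually go by in image-processing and deep-learning libraries, not any particular library's code.)

```python
def src_coord_half_pixel(dst_i, scale):
    # "Half-pixel centers": destination center (i + 0.5) maps to a
    # source center; the image content stays put, no net shift.
    return (dst_i + 0.5) / scale - 0.5

def src_coord_align_corners(dst_i, dst_size, src_size):
    # "Align corners": first and last sample points coincide, but the
    # interior spacing is stretched from 1 to (src-1)/(dst-1).
    return dst_i * (src_size - 1) / (dst_size - 1)

# Upscaling 4 -> 8 pixels with the half-pixel convention:
half = [src_coord_half_pixel(i, 2.0) for i in range(8)]
# -> [-0.25, 0.25, 0.75, ..., 3.25], symmetric about the image center
```

Mixing the two conventions between a downsample and the matching upsample is what produces the half-pixel drift the linked post dissects.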
I think it actually depends on what you define as a "pixel". Sure, the pixel on your screen emits light from a tiny square into space. And sure, a sensor pixel measures the intensity over a tiny square.
But let's say I calculate something like:
# samples at 0, 0.1, ..., 1
x = range(0, 1, length=11)
# evaluate the sin function at each point
y = sin.(x)
Then each pixel (or entry in the array) is not a tiny square. It represents the value of sin at that specific location. A real pixelated detector would instead have integrated sin over each interval, `y[u] = int_{x[u]}^{x[u] + 0.1} sin(t) dt`, which is entirely different from the pointwise evaluation above. For me that's the main difference to understand.
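The same contrast, sketched in Python for concreteness (the box value is normalized by the interval width so it's directly comparable to the point sample):

```python
import math

h = 0.1
# Point sample: the value of sin at the left edge of each interval.
point = [math.sin(u * h) for u in range(10)]

# Box "pixel": the average of sin over [u*h, (u+1)*h], using the
# closed form (1/h) * integral = (cos(u*h) - cos((u+1)*h)) / h.
box = [(math.cos(u * h) - math.cos((u + 1) * h)) / h for u in range(10)]

# The two disagree by roughly (h/2) * cos(u*h) for small h: a
# box-integrating detector is not a point-sampling one.
```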