1. Construct the underlying continuous field. This is just a function f(x,y) that returns 1 if the point is inside the text and 0 otherwise.
2. Convolve f with an anti-aliasing filter. The filter could be tall and skinny to account for the fact that the horizontal resolution is 3x the vertical resolution.
3. Sample the resulting image at sub-pixel positions to produce the red, green, and blue values.
In the special case where the anti-aliasing filter is a box filter, this is exactly the same as computing the average for each subpixel. For the technique proposed in the article, the filter kernel would be the sum of six shifted impulses (Dirac deltas).
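The box-filter special case can be sketched as follows. This is a minimal illustration, not a real-time implementation: the coverage function f is a hypothetical stand-in (a disk rather than actual glyph coverage), and the filter is the box filter, so convolving and then sampling reduces to averaging sample points within each of the three subpixels per pixel.

```python
import numpy as np

def f(x, y):
    # Step 1: continuous field — 1 inside the shape, 0 outside.
    # A disk stands in for "inside the text" here.
    return ((x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.2 ** 2).astype(float)

def render_subpixel(width, height, samples=4):
    # Steps 2-3 with a box filter: convolving then sampling is the
    # same as averaging samples x samples points per subpixel.
    # Each pixel has 3 subpixels (R, G, B) side by side, so the
    # horizontal sampling rate is 3x the vertical rate.
    out = np.zeros((height, width, 3))
    for py in range(height):
        for px in range(width):
            for c in range(3):  # red, green, blue subpixels
                # Sample positions centered within this subpixel,
                # normalized to [0, 1] image coordinates.
                xs = (px + (c + (np.arange(samples) + 0.5) / samples) / 3) / width
                ys = (py + (np.arange(samples) + 0.5) / samples) / height
                gx, gy = np.meshgrid(xs, ys)
                out[py, px, c] = f(gx, gy).mean()
    return out

img = render_subpixel(16, 16)
```

A non-box kernel (such as the six shifted deltas above) would replace the plain mean with a weighted sum over shifted sample positions.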
Anyway, I liked the article and wasn't trying to be critical of it. The convolution approach described above is of theoretical interest, but implementing it with any non-trivial kernel in real time is almost certainly intractable. What I meant was that every implementation of anti-aliased vector graphics is a kludge, and it's pretty easy to coerce aliasing artifacts out of all of them using zone plates as inputs.
Edit: your article -> the article