Well, compute shaders have the drawback that you have to know in advance how many work items (in this case, fragments) you need to dispatch. You can figure this out for triangles and quads, but it's a pain, and it basically means redoing in software what the rasterization hardware already does. It's much simpler and faster to use the rasterization hardware built into GPUs to dispatch fragment work groups dynamically via a triangle draw call, and to override only the sample processing step.
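To make the "redoing it in software" point concrete, here is a minimal CPU sketch of what precomputing the fragment count for a single triangle involves: walk the clipped bounding box and test each pixel center against the three edge functions. The function names and the 64-wide group size are hypothetical, and the fill rule is simplified (pixels exactly on an edge are counted, whereas real hardware uses a top-left rule).

```python
import math


def edge(ax, ay, bx, by, px, py):
    # Twice the signed area of triangle (a, b, p): its sign says which side
    # of the edge a->b the point p lies on.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covered_pixels(v0, v1, v2, width, height):
    """Count pixel centers inside the triangle, clipped to a width x height target."""
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return 0  # degenerate triangle covers nothing
    # Bounding box of the triangle, clipped to the render target.
    min_x = max(0, math.floor(min(x0, x1, x2)))
    max_x = min(width - 1, math.ceil(max(x0, x1, x2)))
    min_y = max(0, math.floor(min(y0, y1, y2)))
    max_y = min(height - 1, math.ceil(max(y0, y1, y2)))
    count = 0
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Inside if every edge function has the same sign as the area,
            # which handles both windings. Simplified fill rule: on-edge counts.
            if area > 0 and w0 >= 0 and w1 >= 0 and w2 >= 0:
                count += 1
            elif area < 0 and w0 <= 0 and w1 <= 0 and w2 <= 0:
                count += 1
    return count


def workgroups_needed(fragment_count, group_size=64):
    # Round up so the last, partially filled group is still dispatched.
    return math.ceil(fragment_count / group_size)
```

For example, `covered_pixels((0, 0), (4, 0), (0, 4), 8, 8)` counts 10 pixel centers, so `workgroups_needed(10)` is 1. Doing this per primitive, for every draw, before you can even issue the dispatch is exactly the work the rasterizer already does for free.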
You know that GPUs can dispatch work dynamically by themselves to get around this problem, right? So you totally can have one computation step determine how many instances the next one needs.
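The reply doesn't name a mechanism, but one common way to do this is indirect dispatch (e.g. Vulkan's `vkCmdDispatchIndirect` or OpenGL's `glDispatchComputeIndirect`): the first pass writes the group counts into a GPU buffer and the second dispatch reads them from there, so the CPU never needs to know the count. The sketch below only simulates that data flow on the CPU. The helper names and `GROUP_SIZE` are hypothetical; the 12-byte args layout mirrors `VkDispatchIndirectCommand` (three `uint32`: x, y, z).

```python
import math
import struct

GROUP_SIZE = 64  # hypothetical local workgroup size of the second pass


def pass1_write_dispatch_args(items_produced):
    """Stand-in for the first compute pass: it writes the next dispatch's
    group counts into an args buffer (three little-endian uint32: x, y, z)."""
    groups_x = math.ceil(items_produced / GROUP_SIZE)
    return struct.pack("<3I", groups_x, 1, 1)


def dispatch_indirect(args_buffer, item_count, kernel):
    """Stand-in for an indirect dispatch: the group counts come from the
    buffer written by the previous pass, not from the CPU."""
    x, y, z = struct.unpack_from("<3I", args_buffer, 0)
    for group in range(x * y * z):  # flattened group index
        for local in range(GROUP_SIZE):
            gid = group * GROUP_SIZE + local
            # Bounds check: the last group is usually only partially used.
            if gid < item_count:
                kernel(gid)
```

For instance, if the first pass produces 130 items, it writes `(3, 1, 1)`, and the second pass runs 3 groups of 64 invocations, with the bounds check discarding the 62 surplus ones.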