Please relax a bit, geeze! For resequencing, you have to input all of the reads, and this should be a significantly larger number of bytes than the number of bytes in the unindexed reference. So let's look at the I/O costs of inputting the read data.
Essentially, any alignment technique that comes close enough to I/O-saturating the reading of both the dominant factor (the reads) and the minor factor (the reference) is optimal.
When (for a slightly different kind of gapped, error-prone read) I wrote alignment software, I also beat MAQ by 10-20X, like the BWA algorithm but without using anything like the BWT hair. I contemplated how the gapping rules and error rules looked in a brute-force NFA translation, and I found simple ways to optimize that NFA very cheaply. I found that it paid to maximize read-data throughput by devoting all of memory to the NFA built from as many reads as possible.

It was not hard, this way, to get in the ballpark (a small enough constant factor) of I/O-bounding the read processing (where "not hard" means something like 3-6 months of staring at walls and scribbling stuff on paper until I found solutions simple enough that it is only bad luck I didn't find them right off the bat).

Building a prefix tree, a suffix tree, or any other kind of index to the reference was the approach my boss suggested at the start and that I explicitly set out to beat (and did!). Streaming the reference is not the bottleneck if you can do it in an I/O-bound way because (for resequencing, at least) the read data is much larger, and you can do alignment while coming within a small constant of I/O-saturating the input of that read data using a simpler NFA approach.
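To make the NFA idea concrete: the classic bit-parallel way to simulate such an NFA is the shift-or/Wu-Manber "bitap" technique, where each NFA state layer is one machine word and each streamed reference character costs a few bit operations per layer. This is a minimal illustrative sketch (mismatches only, no gaps, one read), not the software described above:

```python
def bitap_fuzzy(text, pattern, k):
    """Return end positions in text where pattern matches with <= k mismatches.

    Bit-parallel NFA simulation (shift-or / Wu-Manber bitap): a 0 bit at
    position i of a state word means "a prefix of pattern of length i+1
    matches, ending at the current text position".
    """
    m = len(pattern)
    all_ones = (1 << m) - 1
    # For each character c, bit i of miss[c] is 0 iff pattern[i] == c.
    miss = {}
    for i, pc in enumerate(pattern):
        miss[pc] = miss.get(pc, all_ones) & ~(1 << i)
    # R[d] is the state layer that tolerates up to d mismatches.
    R = [all_ones] * (k + 1)
    hits = []
    for j, c in enumerate(text):
        cm = miss.get(c, all_ones)
        old = R[0]
        R[0] = ((R[0] << 1) | cm) & all_ones          # extend with a match
        for d in range(1, k + 1):
            tmp = R[d]
            # extend with a match, OR substitute (consume c from layer d-1)
            R[d] = (((R[d] << 1) | cm) & (old << 1)) & all_ones
            old = tmp
        if R[k] & (1 << (m - 1)) == 0:
            hits.append(j)                            # match ends at text[j]
    return hits
```

For example, `bitap_fuzzy("GATTACAxxGATTTCA", "GATTACA", 1)` reports matches ending at positions 6 (exact) and 15 (one substitution). The trick the comment describes is the next step up from this: pack state vectors for as many reads as fit in memory and stream the reference past all of them in one pass, so the reference scan is cheap and the reads dominate I/O.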
Looking at the description of MAQ, besting its performance by any number of means is not so surprising. Reaching for the hair of using BWT does surprise me.