Standard AES-128 has a throughput of around 16 bytes per 8 clock cycles, or even fewer cycles on recent CPUs, which can execute 2 or 4 AES instructions per clock cycle (in modes of operation that are not limited by latency).
AES-128 can easily be modified to encrypt four 32-bit words independently per execution, instead of one 128-bit block, by cancelling the byte permutation that extends the AES mixing function from 32 bits to 128 bits. This would at least double the throughput, depending on whether PSHUFB is executed concurrently or not.
You have given the latencies of the instructions, not their throughput. Using AES in a way that is limited by latency is normally a mistake. Cryptographic libraries provide multi-buffer functions, which compute e.g. 8 AES values at a time, so that they are not limited by latency.
Regarding the parent article, if you want an unpredictable identifier for a record, you should not obtain it by encrypting some value with the intent of decrypting it in the future. Instead, you should use an unpredictable random number as the identifier. Such identifiers can be generated with AES in batches, at maximum throughput, and stored until they are needed for assignment to a record.
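To make the batching idea concrete, here is a rough Python sketch of a pool that pre-generates identifiers and hands them out on demand. Everything in it is an assumption for illustration: the `IdPool` class, the 16-byte identifier size and the batch size are made up, and it draws from the OS CSPRNG via `secrets.token_bytes` as a stand-in for running AES over many blocks, since the standard library has no AES.

```python
import secrets
from collections import deque

ID_BYTES = 16      # assumed identifier size (128 bits)
BATCH_SIZE = 1024  # assumed number of identifiers generated per refill


class IdPool:
    """Hypothetical pool of pre-generated unpredictable identifiers."""

    def __init__(self, batch_size=BATCH_SIZE):
        self._batch_size = batch_size
        self._pool = deque()

    def _refill(self):
        # One bulk request for the whole batch, then slice it into IDs.
        # Stand-in for generating many AES blocks at full throughput.
        raw = secrets.token_bytes(ID_BYTES * self._batch_size)
        for i in range(0, len(raw), ID_BYTES):
            self._pool.append(raw[i:i + ID_BYTES].hex())

    def take(self):
        # Hand out a stored identifier; refill only when the pool is empty.
        if not self._pool:
            self._refill()
        return self._pool.popleft()


pool = IdPool()
record_id = pool.take()  # 32 hex characters, unrelated to any record data
```

The point is only that the identifier is produced ahead of time and never needs to be decrypted; nothing about the record can be recovered from it.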
If your record needs some information that you consider private, such as the time of creation or a monotonically increasing number, that information should be put in separate fields that you do not expose externally, instead of being encrypted into a record identifier, which would then have to be decrypted to access it.
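As a sketch of that layout, assuming a hypothetical `Record` type (the field names and the `public_view` helper are invented for illustration), the private values sit in their own fields and only the random identifier leaves the system:

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class Record:
    payload: str
    sequence: int  # private: monotonically increasing number
    created_at: float = field(default_factory=time.time)  # private: creation time
    # Public identifier: random, carries no information about the record.
    public_id: str = field(default_factory=lambda: secrets.token_hex(16))

    def public_view(self):
        # Only fields meant for external consumers are returned here.
        return {"id": self.public_id, "payload": self.payload}


rec = Record(payload="hello", sequence=42)
external = rec.public_view()  # contains "id" and "payload" only
```

Internally you can still sort or query by `sequence` or `created_at`; externally there is nothing to decrypt.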