undefined | Better HN

0 points2019841mo ago0 comments

Are you certain using AES is still faster? Let's say for a 32-bit block size and 64-bit key.

From https://en.wikipedia.org/wiki/Speck_(cipher), that Speck combination would use 22 rounds, and using the instruction timings for Zen 5 from https://instlatx64.github.io/InstLatx64/AuthenticAMD/Authent..., it looks like each round would take at most 3 cycles. (Dependency chain for each round is 3 instructions long, ror+add+xor). 22*3 = ~66 cycles.

Using AES with a pshufb to take out the ShiftRows step would be 2 cycles for the pshufb and 4 cycles for each aesenc, and at 10 rounds, you have ~60 cycles.

It's quite close, and to say which one wins, we'd need to actually benchmark it. One is not clearly much faster than the other.

0 comments

adrian_b1mo ago

Standard AES-128 has a throughput of around 16 bytes per 8 clock cycles or even less in recent CPUs, because they can do 2 or 4 AES instructions per clock cycle (in the modes of operation that are not limited by latency).

AES-128 can be easily modified to independently encrypting four 32-bit words per execution, instead of one 128-bit block, by cancelling the byte permutation that extends the AES mixing function from 32-bit to 128-bit. this would increase the throughput at least twice, depending on whether PSHUFB is done concurrently or not.

You have given the latencies of the instructions, not their throughput. When you use AES in such a way that you are limited by latency, that is normally wrong. The cryptographic libraries have multi-buffer functions, which compute e.g. 8 AES values, so that they are not limited by latencies.

Regarding the parent article, if you want an unpredictable identifier for a record, you should not do this by encrypting some value with the intent of decrypting it in the future. Instead of this, you should use as identifier an unpredictable random number. Such identifiers can be generated with AES in batches, at maximum throughput, and stored until they are needed for assignment to a record.

If you need in your record some information like time of creation or a monotonically increasing number, which you consider private, such information should be put in distinct fields, that you do not give externally, instead of attempting to encrypt them in a record identifier, which would need to be decrypted to access such information.

201984OP1mo ago

>You have given the latencies of the instructions, not their throughput. When you use AES in such a way that you are limited by latency, that is normally wrong.

I did that because TFA is talking about encrypting 32 bit IDs, which is 1/4th of an AES block. There aren't multiple blocks to do at once in this scenario, and throughput numbers do not apply because each instruction depends on the result of the one before.

You mention doing multiple IDs at once, but the overhead of pulling multiple IDs into a single batch from something akin to URLs in web requests is likely gonna be worse than any gains.

>Instead of this, you should use as identifier an unpredictable random number. Such identifiers can be generated with AES in batches, at maximum throughput, and stored until they are needed for assignment to a record.

Now you lose the ability to sort the records in a database, and I fail to see what AES gives you here over any other random number generator.

botusaurus1mo ago

maybe the reason they are so close is that the AES microcode is inplementing exactly those operations

201984OP1mo ago

There's nothing similar about AES and Speck, and the "microcode" for AES isn't like what you're thinking of. If you want to learn more about it, you can look up the specifications for AES and Intel's AES instruction set.

j / k navigate · click thread line to collapse

0 points2019841mo ago0 comments

Are you certain using AES is still faster? Let's say for a 32-bit block size and 64-bit key.

Using AES with a pshufb to take out the ShiftRows step would be 2 cycles for the pshufb and 4 cycles for each aesenc, and at 10 rounds, you have ~60 cycles.

It's quite close, and to say which one wins, we'd need to actually benchmark it. One is not clearly much faster than the other.

0 comments

adrian_b1mo ago

201984OP1mo ago

>You have given the latencies of the instructions, not their throughput. When you use AES in such a way that you are limited by latency, that is normally wrong.

You mention doing multiple IDs at once, but the overhead of pulling multiple IDs into a single batch from something akin to URLs in web requests is likely gonna be worse than any gains.

Now you lose the ability to sort the records in a database, and I fail to see what AES gives you here over any other random number generator.

botusaurus1mo ago

maybe the reason they are so close is that the AES microcode is inplementing exactly those operations

201984OP1mo ago

j / k navigate · click thread line to collapse