This is an efficiency hack for fpga.
As an example, suppose you were checking a 256-bit number for equality. This would be unacceptable:
bool eq256(uint32_t a[8], uint32_t b[8]) {
for (int i = 0; i < 8; ++i) {
if (a[i] != b[i]) return false;
}
return true;
}
Why? Because depending on the data this function takes longer or shorter, and timing attacks might be used to figure out secret data. Instead you need an implementation like this: bool eq256(uint32_t a[8], uint32_t b[8]) {
uint32_t diff = 0;
for (int i = 0; i < 8; ++i) {
diff |= a[i] ^ b[i];
}
return diff == 0;
}
This implementation always takes the same amount of time regardless of the contents of a and b.It goes further than this as well, you're not supposed to take branches based on secret data nor access memory locations based on secret data as those can be recovered through the branch predictor and cache respectively.
Wouldn't most implementations of fixed length integers have constant time though? It barely seems worth optimizing unless your integers vary massively in size, at which point using fixed size integers is clearly suboptimal.
https://github.com/mratsim/constantine/wiki/Constant-time-ar...
Edit: actually, nth_element is a great example here: https://eel.is/c++draft/alg.nth.element#5
The complexity requirements for container member functions are often more vague an implicit, but I believe those are also not expressed in terms of time either. It's some other implicitly implied operation. In [2] for example this operation is the construction of a single object.
[1] https://eel.is/c++draft/sort#5 [2] https://eel.is/c++draft/sequences#vector.cons-5
An other tangentially related part of C++ is <chrono>, but I do not think that there are strict requirements there for precision. At most there is monotonic requirement for steady_clock[3].
Modulo is also similarly slow but rarely needed as well. For example in elliptic curve cryptography it's only needed at the very beginning of a computation when hashing to the elliptic curve or when producing a secret key from a random input key material.
In terms of speed, LLVM iXYZ wide-integer code is usually faster for basic operations (modular addition, substraction, multiplication) the big slowness is for modular inversion where constant-time inversion is about 20x slower than GMP inversion.
Source: https://github.com/herumi/mcl#how-to-build-without-gmp this code has GMP backend, LLVM i254 / LLVM i381 backend, or it's own JIT compiler that uses the MULX, ADCX and ADOX instructions.
Plain bigint in general, including GMP are bottlenecked by memory allocation and the iXXX from LLVM are purely stack-based.
Regarding constant-time, I know that there have been petition to provide a constant-time flag to LLVM but no guarantees so far. Unfortunately you have to take an after the fact verification approach today unless you dropdown to assembly (see https://github.com/mratsim/constantine/wiki/Constant-time-ar... on a couple of constant-time verifiers available)
Does LLVM's bigint implementation use those instructions? Last I tried LLVM can not handle the required data-dependencies on individual carry bits, making it impossible to use ADCX/ADOX in LLVM without inline assembly.
For the iXYZ themselves, the carries are properly handled and it can generate mulx at least: https://github.com/herumi/mcl/blob/master/src/asm/x86-64.bmi... (from straight LLVM IR). I don't think ADCX/ADOX are possible though even in LLVM IR.
I think you are mixing with one comment on the GCC mailing list about GCC not having the adequate representation of carry and even less a representation that can separate a carry and overflow chains[2]
[1] https://gcc.godbolt.org/z/2h768y [2] https://gcc.gnu.org/legacy-ml/gcc-help/2017-08/msg00100.html
Side-channel resistant algorithms are only required when you are handling sensitive data. This is often not the case when you are verifying signatures, verifying proofs or generating succinct proofs of computation without a zero-knowledge requirement.