Devs are too scared to be nice (ie not return errors) to clients when they misbehave.
But, rather than 409, I'd say that you should be using opportunistic concurrency control if you adopt this perspective. There should be a resource context for the request, so the client can obtain an ETag and send If-None-Match headers, and get a 412 response if things are out of sync. That allows them to retry a failed/lost request and safely prevent a double action.
Under a 412, they have to step back and retry a larger loop where they GET some new state and prepare a new action. Just like in DB transaction programming, where your failed commit means you roll back, clean the slate, and start a whole new interrogation of transaction-protected state leading up to your new mutation request.
That doesn't mean that idempotency keys have to be used. You can certainly hash message content if that is documented behavior. That probably only makes sense when there is already some logical session or transaction identifier that makes dedupe semantics clear.
The system you propose might be sound and might be necessary in some systems, but I can't think of what they might be that wouldn't be better served by the simpler solution that is already widely used for this purpose.
You need to store the payment state at each relevant step and process it asynchronously. If requests time out, you check the status of it using the key you store (with the processor) to see if it was even received.
It’s not perfect, some processors will 500 while processing the payment (Braintree), so you still need reconciliation on the backend.
If it processed 99% of the request and the final bookkeeping failed because of a duplicate, that's still a failed request.
Arguably this should be the primary way you check for idempotent requests - you shouldn't have a separate check for existence, you should have the insert/update fail atomically.
This is the same thing you see on filesystems for TOCTOU security holes - the right way is to atomically access and modify once, and you only know the request was already processed because that fails.
The fact that payments have a settlement process is not relevant to this discussion.
Even if you have a complex long-running multistep orchestration problem, you can break it down into simpler transactions. Eg you could start with a "lock the resources" txn.
But 99% of these conversations around idempotence are simple POST operations like "create order" that regular old database concurrency management handles just fine.
That doesn't answer my question. What response do you return to the client in the case I described?
So one will complete with 200, one will complete with 409. It doesn't matter which.
That said, there's something odd about the way you phrased this question. If the original request hasn't gotten a response yet, why is it sending a retry? What you're asking is more general: What happens when two conflicting requests come in? This is something we've been solving with RDBMSes since the 1970s.
But your follow up responses here are making me rethink. Now you have to have all these special cases where the original request is still in process. I think or assertion of "99% are simple POST operations" is bullshit. For the times where idempotency is hard and really matters, often times you're calling a third party API, like a payment processing API.
I would think a better approach would be to always return a 409 on a subsequent request, regardless of whether it passed or failed, and then have a separate standard API that lets you get the result of any request by its idempotency key.
I.e. idempotent DELETE with proper protocol behavior requires that one request see the 200 OK or 204 No Content and the other sees 404 Not Found, because the delete has already happened. It would be misleading to say 200 OK to both, because that answer means the resource was there when the request arrived.
Honestly, the whole HTTP resource model has a different conceptual backing for state management than the independently developed "idempotence" concepts in distributed systems. Those non-HTTP concepts came from more message-based rather than resource-based architectural assumptions.
The cleanest mapping in the spirit of HTTP would be that you do multiple round trips. A POST creates a new idempotence context, a bit like "start a transaction". The new URI is the key for coordinating state change and allowing restart/recovery.
As I remember it, the idea of idempotence keys in headers really came from the SOAP RPC mindset. It's kind of funny to see it persisting in some hybrid SOAP + REST mental model.
This isn't a special case, and it's the same problem if you want to replay the original response on conflict. If the original request isn't complete, what are you going to replay?