I'll be interested in the post-mortem from Azure on this one.
Well, that's interesting. We occasionally see getaddrinfo() calls fail, claiming that a domain doesn't exist when we know it did at the time of the failure (b/c the records are completely static). (We don't have a reproducible case for this yet, and it's incredibly rare for any given VM/service. But across our fleet, it crops up fairly regularly.)
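For anyone chasing something similar, here is a rough sketch of how such failures could be logged with the exact error code, so "name does not exist" (EAI_NONAME) can be told apart from a temporary resolver failure (EAI_AGAIN). The helper name `resolve_with_logging` and its `resolver`/`log` parameters are made up for illustration; only `socket.getaddrinfo` and `socket.gaierror` are real standard-library names.

```python
import socket
import time

# Hypothetical helper (not from the thread): wraps a resolver call and records
# which gaierror code came back, so rare failures can be correlated later.
def resolve_with_logging(host, resolver=socket.getaddrinfo, log=print):
    try:
        return resolver(host, None)
    except socket.gaierror as exc:
        # EAI_NONAME: resolver says the name does not exist.
        # EAI_AGAIN: temporary failure, worth retrying.
        kind = {
            socket.EAI_NONAME: "EAI_NONAME",
            socket.EAI_AGAIN: "EAI_AGAIN",
        }.get(exc.errno, "other")
        log(f"{time.time():.3f} getaddrinfo({host!r}) failed: {kind} ({exc})")
        return None
```

Passing the resolver in as a parameter is just a convenience so the wrapper can be exercised without real network access.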
That said, the most common cause of an authoritative NXDOMAIN is adding/deleting records and querying them before propagation is complete. You may want to log/poll your rrset change status separately to correlate.
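A minimal sketch of that polling, assuming Route 53 and a boto3 client (the comment above doesn't say which provider's API is in play, so treat that as an assumption). `get_change` and the PENDING/INSYNC statuses are Route 53's; the function name `wait_for_insync` and its `log`, `sleep`, and `clock` parameters are hypothetical.

```python
import time

# Sketch only: `client` is assumed to be boto3.client("route53") and
# `change_id` the Id returned by change_resource_record_sets.
# `log`, `sleep`, and `clock` are injectable purely to make this testable.
def wait_for_insync(client, change_id, interval=5.0, timeout=300.0,
                    log=print, sleep=time.sleep, clock=time.monotonic):
    deadline = clock() + timeout
    while True:
        status = client.get_change(Id=change_id)["ChangeInfo"]["Status"]
        # Timestamped line to correlate against NXDOMAIN sightings.
        log(f"{time.time():.3f} change {change_id} status={status}")
        if status == "INSYNC":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

If NXDOMAIN responses only ever show up while a change is still PENDING, propagation is the likely culprit; if they show up long after INSYNC, look elsewhere.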
The other is that, depending on the network, intermediate DNS tampering happens all the time. Qname, rname, rtype all get modified. Responses and queries are duplicated, intercepted, and manipulated. There's some good research on this out of DNS-OARC and a dude out of Australia (iirc).
That could be a failure in whatever resolvers you're hitting rather than an issue with the Route 53 authoritative nameservers, though. The resolving DNS servers in EC2 are not actually part of Route 53, for example.
(I had to, see username!)
edit: seriously, -3? it was a joke.