That queue is about as fast as it gets (<10 ns). The timestamp is taken before queueing.
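To illustrate "timestamp before queueing", here is a minimal sketch. It is not the code that was measured: `Event`, `read_ticks` and `stamp_and_push` are hypothetical names, `std::queue` stands in for the real queue, and the TSC read assumes GCC or Clang on x86 (with a `steady_clock` fallback elsewhere).

```cpp
#include <chrono>
#include <cstdint>
#include <queue>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc (GCC/Clang header; an assumption about the toolchain)
#endif

// Hypothetical record: the raw tick count travels with the payload.
struct Event {
    std::uint64_t stamp;
    int payload;
};

// Read the TSC on x86; fall back to steady_clock ticks on other targets.
inline std::uint64_t read_ticks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Take the timestamp first, then enqueue, so the stamp does not
// include the time spent inside the queue operation.
template <class Queue>
void stamp_and_push(Queue& q, int payload) {
    Event e{read_ticks(), payload};
    q.push(e);
}
```

Usage would look like `std::queue<Event> q; stamp_and_push(q, 42);`. Any queue type with a `push(Event)` member should fit the template.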
Again, due to bad calibration code, the measured timestamps have quite a bit of jitter.
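For reference, one way to get a steadier tick-to-nanosecond factor is to calibrate several times against `std::chrono::steady_clock` and keep the median. This is only a sketch under assumptions, not the calibration code used for the numbers above: the function names, the sample count and the window length are all made up for illustration, and the TSC read again assumes GCC or Clang on x86.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc (GCC/Clang header; an assumption about the toolchain)
#endif

// Read the TSC on x86; fall back to steady_clock ticks on other targets.
inline std::uint64_t read_ticks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// One calibration sample: ticks elapsed divided by nanoseconds elapsed
// over a short sleep window.
inline double ticks_per_ns_once(std::chrono::milliseconds window) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    const std::uint64_t c0 = read_ticks();
    std::this_thread::sleep_for(window);
    const std::uint64_t c1 = read_ticks();
    const auto t1 = clock::now();
    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    return static_cast<double>(c1 - c0) / ns;
}

// Take several samples and return the median, which is less sensitive to
// a single sample disturbed by preemption than a single measurement is.
inline double calibrate_ticks_per_ns(
    int samples = 5,
    std::chrono::milliseconds window = std::chrono::milliseconds(10)) {
    std::vector<double> ratios;
    for (int i = 0; i < samples; ++i) {
        ratios.push_back(ticks_per_ns_once(window));
    }
    const auto mid = ratios.begin() + ratios.size() / 2;
    std::nth_element(ratios.begin(), mid, ratios.end());
    return *mid;
}
```

A tick difference is then converted with `double ns = (t1 - t0) / calibrate_ticks_per_ns();`. Longer windows and more samples trade startup time for a more stable factor.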
Edit: The TSC might not be synchronized across sockets in multi-socket systems (multiple physical CPU sockets). That can introduce a large error.