I worked for many years measuring and instrumenting systems at the nanosecond level on Linux.
Windows is generally not considered viable for work at that level. Sadly, I have not used other Unix systems in that fashion.
Python's datetime functions just wrap the Linux clock_gettime() call, which populates a timespec. On 32-bit that has traditionally been a real syscall, but on 64-bit it goes through the vDSO, so it costs little more than a memory access and avoids the user/kernel switch and its stack swap (so it is very fast).
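As a rough way to see this for yourself, here is a minimal sketch that reads the wall clock through the time module and through datetime, then estimates the per-call cost. The helper names clock_snapshot and cost_per_call_ns are my own, not anything from the standard library, and time.clock_gettime_ns() assumes a Unix-like system with Python 3.7 or later. The numbers include interpreter overhead, so treat them as an upper bound on the cost of the underlying call rather than a precise measurement.

```python
import time
from datetime import datetime, timezone


def clock_snapshot():
    """Read the wall clock three ways: raw clock_gettime(), time_ns(), datetime."""
    raw_ns = time.clock_gettime_ns(time.CLOCK_REALTIME)  # Unix only
    via_time_ns = time.time_ns()
    via_datetime = datetime.now(timezone.utc)
    return raw_ns, via_time_ns, via_datetime


def cost_per_call_ns(fn, n=200_000):
    """Average wall time per call of fn in nanoseconds, loop overhead included."""
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    return (time.perf_counter_ns() - start) / n


if __name__ == "__main__":
    # Reports which OS call backs time.time() and its advertised resolution.
    info = time.get_clock_info("time")
    print("time.time() backed by:", info.implementation)
    print("reported resolution (s):", info.resolution)

    raw_ns, via_time_ns, via_datetime = clock_snapshot()
    print("clock_gettime_ns:", raw_ns)
    print("time_ns:         ", via_time_ns)
    print("datetime (UTC):  ", via_datetime.isoformat())

    print("approx ns/call, time.time_ns:", cost_per_call_ns(time.time_ns))
    print("approx ns/call, datetime.now:", cost_per_call_ns(datetime.now))
```

On a 64-bit Linux box I would expect the first line to name clock_gettime(CLOCK_REALTIME), but check it on your own system rather than taking my word for it.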
If you are using it to measure anything reliably, you need a fairly solid PTP time infrastructure or it is pointless.