I convinced myself that a viable way to measure the round-trip time (RTT)
between a request packet and its response packet is to take the difference
between two Intel rdtsc reads.

The restrictions for valid use include:

   - both reads for an RTT (time difference) must be taken on the same core
   - fencing instructions (lfence) could be required

The time difference is OK provided:

   - it delivers at least microsecond resolution (rdtsc does)
   - the difference (end - start) is always positive or zero
   - whether or not the counter runs at the processor's clock speed is not
   material, so long as there's sufficient resolution
   - DPDK gives me the TSC frequency via rte_get_tsc_hz(), so I can convert
   an rdtsc difference to elapsed time
   - the OS doesn't reset the counter or pause it for interrupts or on halts
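
To make the conversion step concrete, here is a minimal sketch in plain C.
It assumes the two cycle counts were read on the same core (e.g. with
rte_rdtsc()) and that hz is the counter frequency (e.g. from
rte_get_tsc_hz()). The helper names tsc_delta() and cycles_to_ns() are
hypothetical, not part of DPDK.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Elapsed cycles between two reads of the same counter.
 * Unsigned subtraction stays correct across a single wrap. */
static inline uint64_t tsc_delta(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds (truncating).
 * hz must be nonzero. Splitting into whole seconds and a remainder
 * avoids overflowing 64 bits for long intervals; the remainder term
 * is safe for any hz below roughly 18 GHz. */
static inline uint64_t cycles_to_ns(uint64_t cycles, uint64_t hz)
{
    uint64_t secs = cycles / hz;
    uint64_t rem  = cycles % hz;

    return secs * NS_PER_SEC + (rem * NS_PER_SEC) / hz;
}
```

Usage would be something like
cycles_to_ns(tsc_delta(t_request, t_response), hz), with hz read once at
startup rather than per packet.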

I think rdtsc does all this. But then I read [1]:

   - The TSC is not always invariant
   - And of course context switches (if a thread is not pinned to a core)
   will invalidate any time difference
   - The TSC is not incremented when the processor enters a deep sleep. I
   don't care about this because I'll turn off the power saving modes anyway
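
One way to check the invariance concern on a given box: on Linux, the
flags line in /proc/cpuinfo lists constant_tsc and nonstop_tsc when the
kernel considers the TSC to tick at a constant rate and to keep running in
deep sleep states. The sketch below only does the string check on a flags
line that has already been read; the function names are hypothetical, and
reading the file is left out.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears as a whole whitespace-separated token
 * in `flags`, 0 otherwise. */
static int has_cpu_flag(const char *flags, const char *name)
{
    size_t n = strlen(name);
    const char *p = flags;

    if (n == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends_token = p[n] == '\0' || p[n] == ' ' ||
                         p[n] == '\t' || p[n] == '\n';

        if (starts_token && ends_token)
            return 1;
        p++;    /* partial match (e.g. tsc inside tsc_known_freq); keep looking */
    }
    return 0;
}

/* Both flags present: constant rate and not stopped in deep sleep states. */
static int tsc_looks_invariant(const char *flags)
{
    return has_cpu_flag(flags, "constant_tsc") &&
           has_cpu_flag(flags, "nonstop_tsc");
}
```

This says nothing about thread migration; pinning the thread to a core is
still needed for the context-switch case.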

So I am not so sure.

Now, of course, Mellanox NICs can report timestamps. Is it actually possible
to get HW NIC timestamps reported for every packet sent and received without
overburdening the NIC? Based on what I can see for my case (ConnectX-4 Lx),
the resolution is nanoseconds. So I am tempted to not fool around with rdtsc
and just use NIC timestamps.

What is the usual practice in DPDK programming when one needs RTTs?

[1]
https://stackoverflow.com/questions/42189976/calculate-system-time-using-rdtsc