DPDK patches and discussions
From: "N. Benes" <nbenes@eso.org>
To: dev@dpdk.org
Subject: Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
Date: Fri, 29 May 2020 20:46:00 +0000
Message-ID: <660c0552-ded2-7454-c6fc-43db9844f06c@eso.org>
In-Reply-To: <91af9f42-9477-b27f-c5c0-cb0e44a95573@kth.se>

Hi everyone,

Tom Barbette:
> 
> On 22/05/2020 at 20:43, PATRICK KEROULAS wrote:
>>>>>> The mlx5 part of libibverbs includes a ts-to-ns converter which takes
>>>>>> the instantaneous clock info. It's unused in DPDK so far. I've tested
>>>>>> it in the device/port init routine and the result looks reliable.
>>>>>> Since this approach looks very simple compared to the time sync
>>>>>> mechanism, I'm trying to integrate it.
>>>>>>
>>>>>> The conversion should occur in the primary process (testpmd), I
>>>>>> suppose.
>>>>>> 1) The needed clock info derives from the ethernet device. Is it
>>>>>>    possible to access that struct from an Rx callback?
>>>>>> 2) How can the nanosecond value be attached to the mbuf so that
>>>>>>    `pdump` catches it? (workaround: copy `mbuf->udata64` in
>>>>>>    forwarded packets.)
>>>>>> 3) Any other idea?
>>>>> The timestamp is carried in mbuf.
>>>>> Then the conversion must be done by the ethdev caller (application or
>>>>> any other upper layer).
>>>> What if the converter function needs a clock_info?
>>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>>>
>>>> I'm afraid this info may change by the time the converter is called
>>>> by the upper layer.
>>> Indeed, the clock in the device is not an atomic one :)
>>> We need to adjust the time conversion continuously.
>>> I am not an expert in time synchronization, so I am adding to Cc more
>>> people who could help with getting a precise timestamp.
>> Thanks Thomas.
>> Not sure this is a synchronization issue. We have dedicated processes
>> (linuxptp) to keep both NIC and sys clocks in sync with an external
>> clock.
>> It is "just" a matter of unit conversion.
>>
>> If it has to be performed in dpdk-pdump, I would need some help to
>> retrieve mlx5_clock_info from inside a secondary process. Looking at
>> mlx5_read_clock(), this info is extracted from ibv_context which looks
>> reachable in a primary process only (segfault, if I try in pdump).

The normal phc2sys can not only synchronise NIC -> system but also sys
-> NIC and (I believe it does but have not tried) NIC1 -> NIC2.
If I understand your proposal correctly, you want to use a free running
NIC counter and calibrate out the drift afterwards.
It may be easier to adapt phc2sys to use a NIC through DPDK and sync the
NIC's timewheel/VCO in a proven, reliable manner (e.g. low-pass filtering
of excursions). Then you could use the NIC counter value directly.

> I don't know about the integrated ts-to-ns, but we implemented a
> translation mechanism that mimics what NTP does in Linux to translate a
> given clock (TSC at first) to a wall time. You'll find more info at
> https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf chapter
> 3.4.1. This is an often forgotten matter: we saw in real switches that
> the time spent in time-related vDSO calls is enormous.

Do you have measurements of vDSO clock_gettime and how much is
"enormous" to you?
To my knowledge, clock_gettime via vDSO on Linux only takes a few
nanoseconds in the average case. However, it can go up to ~10 or even
~50 microseconds every few (~10) seconds, depending on the number of
CPUs (for example single vs. dual socket; note that my hardware for this
test is quite old: Dell R210-II and R610). Presumably this is when the
kernel locks the struct in VVAR to update the TSC drift compensation
parameters.

Linux clock_gettime implementation is here (different versions):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/vdso/vclock_gettime.c?h=linux-3.10.y#n193
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/entry/vdso/vclock_gettime.c?h=linux-4.19.y#n241
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/lib/vdso/gettimeofday.c#n98

I use busy waiting on clock_gettime in a packet generator application
(so far 10 GbE only) to pace jumbo frames according to a spec
(simulating the traffic pattern of to-be-developed FPGA-based hardware),
and COTS sniffer hardware with absolute timestamping to verify my
generator's performance. I can observe the 10-50 us artefacts mentioned
above, and an average clock_gettime execution time that is sufficiently
low for my needs. The only downside is that CLOCK_TAI does not go through
the vDSO, so I cannot use it.

> We wanted to do a very precise capture too, so we made that clock able
> to synchronize itself with the ConnectX-5 internal clock as a base
> instead of TSC. FYI the clock in CX5 is running at 800 MHz, so pure
> nanosecond resolution is impossible, but close enough. It is for that
> purpose that I proposed the rte_eth_read_clock() patch in DPDK. We need
> to be able to read the current clock (like the rdtsc() instruction for
> TSC) to compute the frequency.

Doesn't this mean that you need to wait for a PCIe read from the NIC?
Is this really faster than an rdtsc, a memory/cache read, an integer
multiplication and a shift?

Cheers,
nicolas


Thread overview: 14+ messages
2020-05-19 18:20 PATRICK KEROULAS
2020-05-21 15:33 ` Thomas Monjalon
2020-05-21 19:57   ` PATRICK KEROULAS
2020-05-21 20:09     ` Thomas Monjalon
2020-05-22 18:43       ` PATRICK KEROULAS
2020-05-26  7:44         ` Tom Barbette
2020-05-29 20:46           ` N. Benes [this message]
2020-05-26 16:00         ` Slava Ovsiienko
2020-05-29 20:56           ` PATRICK KEROULAS
2020-05-31 19:47             ` Slava Ovsiienko
2020-06-02 19:18           ` PATRICK KEROULAS
2020-06-03  7:48             ` Slava Ovsiienko
2020-06-05  0:09               ` PATRICK KEROULAS
2020-06-05 16:30                 ` Slava Ovsiienko
