From: "N. Benes" <nbenes@eso.org>
To: dev@dpdk.org
Subject: Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
Date: Fri, 29 May 2020 20:46:00 +0000
Message-ID: <660c0552-ded2-7454-c6fc-43db9844f06c@eso.org>
In-Reply-To: <91af9f42-9477-b27f-c5c0-cb0e44a95573@kth.se>
Hi everyone,
Tom Barbette:
>
> On 22/05/2020 at 20:43, PATRICK KEROULAS wrote:
>>>>>> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
>>>>>> instantaneous clock info. It's unused in dpdk so far. I've tested it in
>>>>>> the device/port init routine and the result looks reliable. Since this
>>>>>> approach looks very simple, compared to the time sync mechanism, I'm
>>>>>> trying to integrate.
>>>>>>
>>>>>> The conversion should occur in the primary process (testpmd) I suppose.
>>>>>> 1) The needed clock info derives from ethernet device. Is it possible to
>>>>>> access that struct from a rx callback?
>>>>>> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>>>>>> (workaround: copy `mbuf->udata64` in forwarded packets.)
>>>>>> 3) any other idea?
>>>>> The timestamp is carried in mbuf.
>>>>> Then the conversion must be done by the ethdev caller (application or
>>>>> any other upper layer).
>>>> What if the converter function needs a clock_info?
>>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>>>
>>>> I'm afraid this info may change by the time the converter is called
>>>> by upper layer.
>>> Indeed, the clock in the device is not an atomic one :)
>>> We need to adjust the time conversion continuously.
>>> I am not an expert of time synchronization, so I add more people Cc
>>> who could help for having a precise timestamp.
>> Thanks Thomas.
>> Not sure this is a synchronization issue. We have dedicated processes
>> (linuxptp) to keep both NIC and sys clocks in sync with an external
>> clock.
>> It is "just" a matter of unit conversion.
>>
>> If it has to be performed in dpdk-pdump, I would need some help to
>> retrieve mlx5_clock_info from inside a secondary process. Looking at
>> mlx5_read_clock(), this info is extracted from ibv_context which looks
>> reachable in a primary process only (segfault, if I try in pdump).
The stock phc2sys can synchronise not only NIC -> system but also
system -> NIC and (I believe, though I have not tried it) NIC1 -> NIC2.
If I understand your proposal correctly, you want to use a free running
NIC counter and calibrate out the drift afterwards.
It may be easier to adapt phc2sys to use a NIC through DPDK and sync the
NIC's timewheel/VCO in a proven/reliable manner (e.g. low pass filtering
excursions). Then you could directly use the NIC counter value.
> I don't know about the integrated ts-to-ns, but we implemented a
> translation mechanism that mimics what NTP does in Linux to translate a
> given clock (TSC at first) to a wall time. You'll find more info at
> https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf chapter
> 3.4.1. This is an often forgotten matter, as we saw in real switches
> that the time spent in time-related VDSO is enormous.
Do you have measurements of vDSO clock_gettime and how much is
"enormous" to you?
To my knowledge, clock_gettime via vDSO on Linux takes only a few
nanoseconds in the average case. However, every few (~10) seconds a
call can take ~10 or even ~50 microseconds, depending on the number of
CPUs (for example single vs. dual socket; my test hardware is quite
old, Dell R210-II and R610). Presumably this happens when the kernel
locks the struct in VVAR to update the TSC drift compensation
parameters.
The Linux clock_gettime implementation is here (different versions):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/vdso/vclock_gettime.c?h=linux-3.10.y#n193
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/entry/vdso/vclock_gettime.c?h=linux-4.19.y#n241
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/lib/vdso/gettimeofday.c#n98
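The latency behaviour described above can be probed with a tight
polling loop. The following is only a minimal sketch, not the test I
actually ran: the names gap_stats, now_ns and measure_gaps are made up
for illustration, and the largest gap it reports also includes
interrupts and scheduler preemption, not just vDSO effects, so the
thread should be pinned to an isolated core for meaningful numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Summary of call-to-call gaps seen while polling clock_gettime().
 * Hypothetical helper type, introduced only for this sketch. */
struct gap_stats {
    uint64_t samples;   /* number of gaps measured */
    uint64_t total_ns;  /* sum of all gaps, used for the average */
    uint64_t max_ns;    /* largest single gap (the outlier of interest) */
};

/* Read the given clock and return it as nanoseconds. */
static inline uint64_t now_ns(clockid_t clk)
{
    struct timespec ts;
    int rc = clock_gettime(clk, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Call clock_gettime() back to back and record the time between
 * consecutive readings. With a monotonic clock each gap is >= 0. */
static struct gap_stats measure_gaps(clockid_t clk, uint64_t iterations)
{
    struct gap_stats s = {0, 0, 0};
    uint64_t prev = now_ns(clk);

    for (uint64_t i = 0; i < iterations; i++) {
        uint64_t cur = now_ns(clk);
        uint64_t gap = cur - prev;

        s.samples++;
        s.total_ns += gap;
        if (gap > s.max_ns)
            s.max_ns = gap;
        prev = cur;
    }
    return s;
}
```

Calling measure_gaps(CLOCK_MONOTONIC, n) for a run of more than ~10
seconds and comparing total_ns / samples with max_ns should show
whether the occasional long calls appear on a given machine.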
I use busy waiting on clock_gettime in a packet generator application
(so far 10 GbE only) to pace jumbo frames according to a spec
(simulating the traffic pattern of FPGA-based hardware that is yet to
be developed), and COTS sniffer hardware with absolute timestamping to
verify my generator's performance. I can observe the 10-50 us artefacts
described above, and the average execution time of clock_gettime is
low enough for my needs. The only sad thing is that the TAI clock does
not go through vDSO and therefore I cannot use it.
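For illustration, busy-wait pacing of this kind could look roughly
like the sketch below. It is not my generator's code: mono_ns,
busy_wait_until, pace_frames and the count_frame callback are
hypothetical names, and the callback merely stands in for the real
transmit call (for example a DPDK TX burst).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static inline uint64_t mono_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Spin on the clock until the deadline has passed; return the time
 * actually observed, so the caller can compute the overshoot. */
static inline uint64_t busy_wait_until(uint64_t deadline_ns)
{
    uint64_t t;
    do {
        t = mono_ns();
    } while (t < deadline_ns);
    return t;
}

/* Placeholder for the real transmit function (assumption). */
typedef void (*send_fn)(void *ctx);

/* Stand-in sender that only counts the frames it was asked to send. */
static void count_frame(void *ctx)
{
    (*(uint64_t *)ctx)++;
}

/* Send n frames, one every interval_ns. Deadlines are computed from
 * the start time rather than from the previous send, so a late frame
 * does not shift all later ones. Returns the worst overshoot in ns. */
static uint64_t pace_frames(uint64_t n, uint64_t interval_ns,
                            send_fn send, void *ctx)
{
    uint64_t start = mono_ns();
    uint64_t worst_late_ns = 0;

    for (uint64_t i = 0; i < n; i++) {
        uint64_t deadline = start + i * interval_ns;
        uint64_t t = busy_wait_until(deadline);

        send(ctx);
        if (t - deadline > worst_late_ns)
            worst_late_ns = t - deadline;
    }
    return worst_late_ns;
}
```

The returned worst overshoot is where a 10-50 us clock_gettime
outlier would show up on the generator side, before checking it
against the sniffer's timestamps.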
> We wanted to do a very precise capture too, so we made that clock able
> to synchronize itself with the ConnectX 5 internal clock as a base
> instead of TSC. FYI the clock in CX5 is running at 800MHz, so pure
> nanosecond is impossible, but close enough. It is for that purpose that
> I proposed the rte_eth_read_clock() patch in DPDK. We need to be able to
> read the current clock (like rdtsc() instruction for TSC) to compute the
> frequency.
Doesn't this mean that you need to wait for the PCIe op from the NIC?
Is this really faster than an rdtsc, a memory/cache read, an integer
multiplication and a shift?
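To make the "multiplication and shift" concrete, here is a minimal
sketch of converting a tick delta to nanoseconds with a precomputed
multiplier, similar in spirit to the mult/shift scheme used by Linux
clocksources. The names tick_conv, make_conv and ticks_to_ns are
hypothetical, and the 800 MHz figure is simply the CX5 clock rate
quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point scale factor: ns = (ticks * mult) >> shift.
 * Hypothetical type, introduced only for this sketch. */
struct tick_conv {
    uint32_t mult;
    uint32_t shift;
};

/* Derive mult for a counter running at freq_hz, rounded to nearest.
 * A larger shift gives finer resolution but overflows sooner. */
static struct tick_conv make_conv(uint64_t freq_hz, uint32_t shift)
{
    struct tick_conv c;
    uint64_t mult;

    assert(freq_hz > 0 && shift < 32);
    mult = ((1000000000ull << shift) + freq_hz / 2) / freq_hz;
    assert(mult <= UINT32_MAX);

    c.mult = (uint32_t)mult;
    c.shift = shift;
    return c;
}

/* Convert a tick delta to nanoseconds. The 64-bit product overflows
 * for large deltas, so this should be applied to the ticks elapsed
 * since a recent reference point, not to an absolute counter value. */
static inline uint64_t ticks_to_ns(const struct tick_conv *c,
                                   uint64_t delta_ticks)
{
    return (delta_ticks * c->mult) >> c->shift;
}
```

With make_conv(800000000, 24), 800 ticks convert to 1000 ns and a
single tick (1.25 ns) truncates to 1 ns, which is the "not pure
nanosecond, but close enough" resolution mentioned above.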
Cheers,
nicolas
Thread overview: 14+ messages
2020-05-19 18:20 PATRICK KEROULAS
2020-05-21 15:33 ` Thomas Monjalon
2020-05-21 19:57 ` PATRICK KEROULAS
2020-05-21 20:09 ` Thomas Monjalon
2020-05-22 18:43 ` PATRICK KEROULAS
2020-05-26 7:44 ` Tom Barbette
2020-05-29 20:46 ` N. Benes [this message]
2020-05-26 16:00 ` Slava Ovsiienko
2020-05-29 20:56 ` PATRICK KEROULAS
2020-05-31 19:47 ` Slava Ovsiienko
2020-06-02 19:18 ` PATRICK KEROULAS
2020-06-03 7:48 ` Slava Ovsiienko
2020-06-05 0:09 ` PATRICK KEROULAS
2020-06-05 16:30 ` Slava Ovsiienko