From: Tom Barbette <barbette@kth.se>
To: PATRICK KEROULAS <patrick.keroulas@radio-canada.ca>,
Thomas Monjalon <thomas@monjalon.net>
Cc: dev@dpdk.org, Vivien Didelot <vivien.didelot@gmail.com>,
shahafs@mellanox.com, rasland@mellanox.com, matan@mellanox.com
Subject: Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
Date: Tue, 26 May 2020 09:44:55 +0200
Message-ID: <91af9f42-9477-b27f-c5c0-cb0e44a95573@kth.se>
In-Reply-To: <CALEF-=AGL=rCSYhZyWYcD8G7Z1d5r+fUX_NZWpp9FzH6UDuwiw@mail.gmail.com>
On 22/05/2020 at 20:43, PATRICK KEROULAS wrote:
>>>>> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
>>>>> instantaneous clock info. It's unused in dpdk so far. I've tested it in the
>>>>> device/port init routine and the result looks reliable. Since this approach
>>>>> looks very simple, compared to the time sync mechanism, I'm trying to
>>>>> integrate.
>>>>>
>>>>> The conversion should occur in the primary process (testpmd) I suppose.
>>>>> 1) The needed clock info derives from ethernet device. Is it possible to
>>>>> access that struct from a rx callback?
>>>>> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>>>>> (workaround: copy `mbuf->udata64` in forwarded packets.)
>>>>> 3) any other idea?
>>>> The timestamp is carried in mbuf.
>>>> Then the conversion must be done by the ethdev caller (application or
>>>> any other upper layer).
>>> What if the converter function needs a clock_info?
>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>> I'm afraid this info may change by the time the converter is called
>>> by upper layer.
>> Indeed, the clock in the device is not an atomic one :)
>> We need to adjust the time conversion continuously.
>> I am not an expert of time synchronization, so I add more people Cc
>> who could help for having a precise timestamp.
> Thanks Thomas.
> Not sure this is a synchronization issue. We have dedicated processes
> (linuxptp) to keep both NIC and sys clocks in sync with an external clock.
> It is "just" a matter of unit conversion.
>
> If it has to be performed in dpdk-pdump, I would need some help to
> retrieve mlx5_clock_info from inside a secondary process. Looking at
> mlx5_read_clock(), this info is extracted from ibv_context which looks
> reachable in a primary process only (segfault, if I try in pdump).
I don't know about the integrated ts-to-ns converter, but we implemented a
translation mechanism that mimics what NTP does in Linux to translate a
given clock (TSC at first) to wall time. You'll find more info at
https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf, chapter
3.4.1. This is an often forgotten matter, as we saw in real switches
that the time spent in time-related vDSO calls is enormous.
We wanted to do a very precise capture too, so we made that clock able
to synchronize itself with the ConnectX-5 internal clock as a base
instead of TSC. FYI the clock in the CX5 is running at 800MHz, so pure
nanosecond resolution is impossible, but it is close enough. It is for
that purpose that I proposed the rte_eth_read_clock() patch in DPDK. We
need to be able to read the current clock (like the rdtsc instruction
for TSC) to compute the frequency.
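As a rough illustration of that frequency computation, here is a minimal C sketch. It assumes two paired readings of the device clock and of wall time are already available (in DPDK the tick values would come from rte_eth_read_clock(), the wall time from something like clock_gettime()). `struct clock_sample`, `estimate_hz()` and `ticks_to_ns()` are hypothetical names for this example, not part of DPDK or FastClick.

```c
#include <assert.h>
#include <stdint.h>

/* One paired reading: device ticks and wall time taken at (roughly)
 * the same instant. Hypothetical type, for illustration only. */
struct clock_sample {
    uint64_t ticks;   /* raw device counter value */
    uint64_t wall_ns; /* wall-clock time in nanoseconds */
};

/* Estimate the device clock frequency in Hz from two samples. */
static double estimate_hz(struct clock_sample a, struct clock_sample b)
{
    assert(b.wall_ns > a.wall_ns); /* samples must be ordered in time */
    return (double)(b.ticks - a.ticks) * 1e9 /
           (double)(b.wall_ns - a.wall_ns);
}

/* Convert a raw device timestamp to nanoseconds, relative to a
 * reference sample and an estimated frequency. The result is
 * truncated to whole nanoseconds. */
static uint64_t ticks_to_ns(struct clock_sample ref, double hz,
                            uint64_t ticks)
{
    return ref.wall_ns +
           (uint64_t)((double)(ticks - ref.ticks) * 1e9 / hz);
}
```

With an 800MHz clock one tick is 1.25 ns, so a single tick converts to 1 ns after truncation, which is why the result is only "close enough" to nanosecond resolution.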
The "converter" code is here:
https://github.com/tbarbette/fastclick/blob/master/elements/userlevel/tscclock.cc,
the source is configurable (TSC, rte_eth_read_clock, GPS Meinberg clock,
...), and the DPDK one is here:
https://github.com/tbarbette/fastclick/blob/2ab021283b82d0b980551480c505ec8dced98e0a/elements/userlevel/dpdkdevclock.cc#L27
One important thing is that the conversion factor must be changed from
time to time to correct for drift. That is the reason why we can't just
push a bunch of code to DPDK (and it's probably not as simple as using
the ts-to-ns in mlx5): you must have a timer, and use RCU to
update "atomically" a struct larger than 64 bits. Most of that is
available in DPDK now, but there will always be some setup (RCU barrier,
timer init, ...).
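To illustrate why the update needs care: the conversion parameters (a base tick value, a base time and a scale factor) do not fit in one 64-bit word, so a reader must never see a half-updated set. The sketch below uses a two-slot (double-buffered) publish with C11 atomics as a simplified stand-in for RCU. It is not the FastClick code and does not use the DPDK RCU library. It assumes a single writer (the timer callback) and that updates are rare enough that no reader is still copying a slot when the writer reuses it; real RCU enforces that with a grace period. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Conversion parameters: larger than 64 bits, so they cannot be
 * replaced with a single atomic store. */
struct conv_params {
    uint64_t base_ticks;  /* device counter at the last resync */
    uint64_t base_ns;     /* wall time (ns) at the last resync */
    double   ns_per_tick; /* current scale factor */
};

struct conv_state {
    struct conv_params slot[2]; /* one active, one being prepared */
    atomic_uint active;         /* index of the slot readers use */
};

static void conv_init(struct conv_state *s, struct conv_params p)
{
    s->slot[0] = p;
    atomic_init(&s->active, 0u);
}

/* Writer side (single writer assumed, e.g. a periodic timer):
 * fill the inactive slot, then publish its index with release
 * ordering so readers see the fully written parameters. */
static void conv_publish(struct conv_state *s, struct conv_params p)
{
    unsigned next =
        1u - atomic_load_explicit(&s->active, memory_order_relaxed);
    s->slot[next] = p;
    atomic_store_explicit(&s->active, next, memory_order_release);
}

/* Reader side: pick the active slot with acquire ordering, copy the
 * parameters and convert. The signed delta tolerates timestamps taken
 * slightly before the most recent resync point. */
static uint64_t conv_ticks_to_ns(struct conv_state *s, uint64_t ticks)
{
    unsigned idx =
        atomic_load_explicit(&s->active, memory_order_acquire);
    struct conv_params p = s->slot[idx];
    int64_t delta = (int64_t)(ticks - p.base_ticks);
    return p.base_ns + (uint64_t)(int64_t)((double)delta * p.ns_per_tick);
}
```

The timer would call conv_publish() with freshly estimated parameters, while the RX path calls conv_ticks_to_ns() on each mbuf timestamp without taking a lock.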
In the end it's not hard science... It worked like a charm to do a
campus trace capture on 100G hardware.