DPDK patches and discussions
From: Tom Barbette <barbette@kth.se>
To: PATRICK KEROULAS <patrick.keroulas@radio-canada.ca>,
	Thomas Monjalon <thomas@monjalon.net>
Cc: dev@dpdk.org, Vivien Didelot <vivien.didelot@gmail.com>,
	shahafs@mellanox.com, rasland@mellanox.com, matan@mellanox.com
Subject: Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
Date: Tue, 26 May 2020 09:44:55 +0200
Message-ID: <91af9f42-9477-b27f-c5c0-cb0e44a95573@kth.se>
In-Reply-To: <CALEF-=AGL=rCSYhZyWYcD8G7Z1d5r+fUX_NZWpp9FzH6UDuwiw@mail.gmail.com>


On 22/05/2020 at 20:43, PATRICK KEROULAS wrote:
>>>>> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
>>>>> instantaneous clock info. It's unused in dpdk so far. I've tested it in the
>>>>> device/port init routine and the result looks reliable. Since this approach
>>>>> looks very simple, compared to the time sync mechanism, I'm trying to
>>>>> integrate.
>>>>>
>>>>> The conversion should occur in the primary process (testpmd) I suppose.
>>>>> 1) The needed clock info derives from ethernet device. Is it possible to
>>>>>     access that struct from a rx callback?
>>>>> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>>>>>     (workaround: copy `mbuf->udata64` in forwarded packets.)
>>>>> 3) any other idea?
>>>> The timestamp is carried in mbuf.
>>>> Then the conversion must be done by the ethdev caller (application or
>>>> any other upper layer).
>>> What if the converter function needs a clock_info?
>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>> I'm afraid this info may change by the time the converter is called
>>> by upper layer.
>> Indeed, the clock in the device is not an atomic one :)
>> We need to adjust the time conversion continuously.
>> I am not an expert of time synchronization, so I add more people Cc
>> who could help for having a precise timestamp.
> Thanks Thomas.
> Not sure this is a synchronization issue. We have dedicated processes
> (linuxptp) to keep both NIC and sys clocks in sync with an external clock.
> It is "just" a matter of unit conversion.
>
> If it has to be performed in dpdk-pdump, I would need some help to
> retrieve mlx5_clock_info from inside a secondary process. Looking at
> mlx5_read_clock(), this info is extracted from ibv_context which looks
> reachable in a primary process only (segfault, if I try in pdump).


I don't know about the integrated ts-to-ns, but we implemented a
translation mechanism that mimics what NTP does in Linux to translate a
given clock (TSC at first) to wall time. You'll find more info at
https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf, chapter
3.4.1. This matter is often forgotten: we saw in real switches that the
time spent in time-related vDSO calls is enormous.

We wanted to do a very precise capture too, so we made that clock able
to synchronize itself with the ConnectX 5 internal clock as a base
instead of TSC. FYI the clock in the CX5 runs at 800 MHz, i.e. 1.25 ns
per tick, so pure nanosecond resolution is impossible, but it is close
enough. It is for that purpose that I proposed the rte_eth_read_clock()
patch in DPDK. We need to be able to read the current clock (like the
rdtsc instruction for TSC) to compute the frequency.
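
To make the frequency computation concrete, here is a minimal sketch
(not the FastClick code). It assumes two paired readings of the device
counter and of a wall clock in nanoseconds; in a DPDK application the
tick values would come from rte_eth_read_clock(), but here they are
plain inputs so the example stands alone. The names clock_sample,
tick_conv, estimate_freq_hz and ticks_to_ns are hypothetical.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* One paired observation: device ticks and wall-clock nanoseconds,
 * assumed to be read back to back. */
struct clock_sample {
    uint64_t ticks;   /* raw device counter */
    uint64_t wall_ns; /* wall-clock time in nanoseconds */
};

/* Linear conversion parameters: a reference point plus a frequency. */
struct tick_conv {
    uint64_t base_ticks;
    uint64_t base_ns;
    uint64_t freq_hz;
};

/* floor(a * b / c) without a 128-bit intermediate.
 * Only valid while b * c fits in 64 bits (assumption of this sketch). */
static uint64_t mul_div(uint64_t a, uint64_t b, uint64_t c)
{
    return (a / c) * b + (a % c) * b / c;
}

/* Estimate the device frequency in Hz from two samples.
 * Returns 0 if the samples are not strictly increasing.
 * The interval must stay under roughly 18 s for mul_div() to be valid. */
static uint64_t estimate_freq_hz(struct clock_sample a, struct clock_sample b)
{
    if (b.wall_ns <= a.wall_ns || b.ticks <= a.ticks)
        return 0;
    return mul_div(b.ticks - a.ticks, NS_PER_SEC, b.wall_ns - a.wall_ns);
}

/* Convert a raw tick value (at or after base_ticks) to nanoseconds.
 * Valid for frequencies below roughly 18 GHz. */
static uint64_t ticks_to_ns(const struct tick_conv *c, uint64_t ticks)
{
    if (c->freq_hz == 0)
        return c->base_ns;
    return c->base_ns + mul_div(ticks - c->base_ticks, NS_PER_SEC, c->freq_hz);
}
```

With an 800 MHz counter, 4 ticks after the reference point map to 5 ns,
which is the 1.25 ns granularity mentioned above.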

The "converter" code is here:
https://github.com/tbarbette/fastclick/blob/master/elements/userlevel/tscclock.cc
The source is configurable (TSC, rte_eth_read_clock, GPS Meinberg clock,
...). The DPDK one is here:
https://github.com/tbarbette/fastclick/blob/2ab021283b82d0b980551480c505ec8dced98e0a/elements/userlevel/dpdkdevclock.cc#L27


One important thing is that the conversion factor must be updated from
time to time to correct for drift. That is the reason why we can't just
push a bunch of code to DPDK (and it's probably not as simple as using
the ts-to-ns in mlx5): you must have a timer, and use RCU to update a
struct larger than 64 bits "atomically". Most of that is available in
DPDK now, but there will always be some setup (RCU barrier, timer
init, ...).
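
To illustrate the "atomic update of a struct larger than 64 bits"
problem, here is a minimal sketch. Note that it swaps in a sequence
lock built on C11 atomics instead of RCU, and assumes a single writer
(for example a periodic timer callback); it is not the FastClick
implementation. All names (conv_params, conv_seqlock, conv_publish,
conv_snapshot) are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Conversion parameters: three words, so too wide for one atomic store. */
struct conv_params {
    uint64_t base_ticks;
    uint64_t base_ns;
    uint64_t freq_hz;
};

/* Shared state. The sequence counter is odd while an update is in flight. */
struct conv_seqlock {
    atomic_uint seq;
    _Atomic uint64_t base_ticks;
    _Atomic uint64_t base_ns;
    _Atomic uint64_t freq_hz;
};

/* Writer side. Assumption: exactly one writer thread. */
static void conv_publish(struct conv_seqlock *l, const struct conv_params *p)
{
    unsigned s = atomic_load_explicit(&l->seq, memory_order_relaxed);

    /* Mark the update as in progress (odd value). */
    atomic_store_explicit(&l->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&l->base_ticks, p->base_ticks, memory_order_relaxed);
    atomic_store_explicit(&l->base_ns, p->base_ns, memory_order_relaxed);
    atomic_store_explicit(&l->freq_hz, p->freq_hz, memory_order_relaxed);

    /* Publish: even value again, ordered after the field stores. */
    atomic_store_explicit(&l->seq, s + 2, memory_order_release);
}

/* Reader side: retry until a consistent snapshot is observed. */
static struct conv_params conv_snapshot(struct conv_seqlock *l)
{
    struct conv_params p;
    unsigned s1, s2;

    do {
        s1 = atomic_load_explicit(&l->seq, memory_order_acquire);
        p.base_ticks = atomic_load_explicit(&l->base_ticks, memory_order_relaxed);
        p.base_ns = atomic_load_explicit(&l->base_ns, memory_order_relaxed);
        p.freq_hz = atomic_load_explicit(&l->freq_hz, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&l->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);

    return p;
}
```

The RX path would call conv_snapshot() per burst and never block the
writer; the trade-off compared with RCU is that a reader may have to
retry if an update lands while it is reading.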

In the end it's not hard science... It worked like a charm to do a 
campus trace capture on 100G hardware.


