DPDK patches and discussions
* [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
@ 2020-05-19 18:20 PATRICK KEROULAS
  2020-05-21 15:33 ` Thomas Monjalon
  0 siblings, 1 reply; 14+ messages in thread
From: PATRICK KEROULAS @ 2020-05-19 18:20 UTC (permalink / raw)
  To: dev

Hello,

I'm trying to build an accurate capture device based on Mellanox
ConnectX-5 with the following requirements:
- capture every incoming packet with hardware timestamps
- output: pcap with timestamps in nanoseconds
My problem is that the packets forwarded to `dpdk-pdump` carry raw
timestamps from the NIC clock.

The mlx5 part of libibverbs includes a ts-to-ns converter which takes the
instantaneous clock info. It's unused in DPDK so far. I've tested it in the
device/port init routine and the result looks reliable. Since this approach
looks very simple compared to the time sync mechanism, I'm trying to
integrate it.

The conversion should occur in the primary process (testpmd) I suppose.
1) The needed clock info derives from ethernet device. Is it possible to
   access that struct from a rx callback?
2) how to attach the nanosecond to mbuf so that `pdump` catches it?
   (workaround: copy `mbuf->udata64` in forwarded packets.)
3) any other idea?

Regards,

Patrick


* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-05-19 18:20 [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds PATRICK KEROULAS
@ 2020-05-21 15:33 ` Thomas Monjalon
  2020-05-21 19:57   ` PATRICK KEROULAS
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Monjalon @ 2020-05-21 15:33 UTC (permalink / raw)
  To: PATRICK KEROULAS; +Cc: dev

19/05/2020 20:20, PATRICK KEROULAS:
> Hello,
> 
> I'm trying to build an accurate capture device based on Mellanox
> Connect-X5 with following requirements:
> - capture every incoming packets with hardware timestamps
> - output: pcap with timestamps in nanoseconds
> My problem is that the packets forwarded to `dpdk-pdump` carry raw
> timestamps from NIC clock.

Please could you describe how you use dpdk-pdump?
Is it using the mlx5 PMD or pcap PMD?


> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
> instantaneous clock info. It's unused in dpdk so far. I've tested it in the
> device/port init routine and the result looks reliable. Since this approach
> looks very simple, compared to the time sync mechanism, I'm trying to
> integrate.
> 
> The conversion should occur in the primary process (testpmd) I suppose.
> 1) The needed clock info derives from ethernet device. Is it possible to
>    access that struct from a rx callback?
> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>    (workaround: copy `mbuf->udata64` in forwarded packets.)
> 3) any other idea?

The timestamp is carried in mbuf.
Then the conversion must be done by the ethdev caller (application or
any other upper layer).




* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-05-21 15:33 ` Thomas Monjalon
@ 2020-05-21 19:57   ` PATRICK KEROULAS
  2020-05-21 20:09     ` Thomas Monjalon
  0 siblings, 1 reply; 14+ messages in thread
From: PATRICK KEROULAS @ 2020-05-21 19:57 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: Vivien Didelot

> > I'm trying to build an accurate capture device based on Mellanox
> > Connect-X5 with following requirements:
> > - capture every incoming packets with hardware timestamps
> > - output: pcap with timestamps in nanoseconds
> > My problem is that the packets forwarded to `dpdk-pdump` carry raw
> > timestamps from NIC clock.
>
> Please could you describe how you use dpdk-pdump?
> Is it using the mlx5 PMD or pcap PMD?

We're actually using both:
# Rx, receive from NIC
CONFIG_RTE_LIBRTE_MLX5_PMD=y
# Tx, output to pcap file
CONFIG_RTE_LIBRTE_PMD_PCAP=y

$ sudo ./build/app/testpmd -w 0000:01:00.0 -w 0000:01:00.1 -n4 --
--enable-rx-timestamp
$ dpdk-pdump -- --pdump 'port=0,queue=*,rx-dev=capture.pcap'

We've sent this placeholder with the intention to start the discussion.
https://github.com/DPDK/dpdk/commit/bd371e1ba5bfc5b7092d712a01bbc28799fd58bc.patch
https://github.com/DPDK/dpdk/commit/e6f5c731c4ab27ab80b229af98c9b3d257e13843.patch

> > mlx5 part of libibverbs includes a ts-to-ns converter which takes the
> > instantaneous clock info. It's unused in dpdk so far. I've tested it in the
> > device/port init routine and the result looks reliable. Since this approach
> > looks very simple, compared to the time sync mechanism, I'm trying to
> > integrate.
> >
> > The conversion should occur in the primary process (testpmd) I suppose.
> > 1) The needed clock info derives from ethernet device. Is it possible to
> >    access that struct from a rx callback?
> > 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
> >    (workaround: copy `mbuf->udata64` in forwarded packets.)
> > 3) any other idea?
>
> The timestamp is carried in mbuf.
> Then the conversion must be done by the ethdev caller (application or
> any other upper layer).

What if the converter function needs a clock_info?
https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
I'm afraid this info may change by the time the converter is called
by the upper layer.

Thanks,
PK


* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-05-21 19:57   ` PATRICK KEROULAS
@ 2020-05-21 20:09     ` Thomas Monjalon
  2020-05-22 18:43       ` PATRICK KEROULAS
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Monjalon @ 2020-05-21 20:09 UTC (permalink / raw)
  To: PATRICK KEROULAS; +Cc: dev, Vivien Didelot, shahafs, rasland, matan

21/05/2020 21:57, PATRICK KEROULAS:
> > > I'm trying to build an accurate capture device based on Mellanox
> > > Connect-X5 with following requirements:
> > > - capture every incoming packets with hardware timestamps
> > > - output: pcap with timestamps in nanoseconds
> > > My problem is that the packets forwarded to `dpdk-pdump` carry raw
> > > timestamps from NIC clock.
> >
> > Please could you describe how you use dpdk-pdump?
> > Is it using the mlx5 PMD or pcap PMD?
> 
> We're actually using both:
> # Rx, receive from NIC
> CONFIG_RTE_LIBRTE_MLX5_PMD=y
> # Tx, output to pcap file
> CONFIG_RTE_LIBRTE_PMD_PCAP=y
> 
> $ sudo ./build/app/testpmd -w 0000:01:00.0 -w 0000:01:00.1 -n4 --
> --enable-rx-timestamp
> $ dpdk-pdump -- --pdump 'port=0,queue=*,rx-dev=capture.pcap'

OK thanks

> We've sent this placeholder with the intention to start the discussion.
> https://github.com/DPDK/dpdk/commit/bd371e1ba5bfc5b7092d712a01bbc28799fd58bc.patch
> https://github.com/DPDK/dpdk/commit/e6f5c731c4ab27ab80b229af98c9b3d257e13843.patch

We don't use GitHub. It is just a mirror.
Thanks for having started the discussion in the mailing list,
it is more efficient :)


> > > mlx5 part of libibverbs includes a ts-to-ns converter which takes the
> > > instantaneous clock info. It's unused in dpdk so far. I've tested it in the
> > > device/port init routine and the result looks reliable. Since this approach
> > > looks very simple, compared to the time sync mechanism, I'm trying to
> > > integrate.
> > >
> > > The conversion should occur in the primary process (testpmd) I suppose.
> > > 1) The needed clock info derives from ethernet device. Is it possible to
> > >    access that struct from a rx callback?
> > > 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
> > >    (workaround: copy `mbuf->udata64` in forwarded packets.)
> > > 3) any other idea?
> >
> > The timestamp is carried in mbuf.
> > Then the conversion must be done by the ethdev caller (application or
> > any other upper layer).
> 
> What if the converter function needs a clock_info?
> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
> I'm affraid this info may change by the time the converter is called
> by upper layer.

Indeed, the clock in the device is not an atomic one :)
We need to adjust the time conversion continuously.
I am not an expert in time synchronization, so I am adding more people
in Cc who could help with getting a precise timestamp.




* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-05-21 20:09     ` Thomas Monjalon
@ 2020-05-22 18:43       ` PATRICK KEROULAS
  2020-05-26  7:44         ` Tom Barbette
  2020-05-26 16:00         ` Slava Ovsiienko
  0 siblings, 2 replies; 14+ messages in thread
From: PATRICK KEROULAS @ 2020-05-22 18:43 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Vivien Didelot, shahafs, rasland, matan

> > > > mlx5 part of libibverbs includes a ts-to-ns converter which takes the
> > > > instantaneous clock info. It's unused in dpdk so far. I've tested it in the
> > > > device/port init routine and the result looks reliable. Since this approach
> > > > looks very simple, compared to the time sync mechanism, I'm trying to
> > > > integrate.
> > > >
> > > > The conversion should occur in the primary process (testpmd) I suppose.
> > > > 1) The needed clock info derives from ethernet device. Is it possible to
> > > >    access that struct from a rx callback?
> > > > 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
> > > >    (workaround: copy `mbuf->udata64` in forwarded packets.)
> > > > 3) any other idea?
> > >
> > > The timestamp is carried in mbuf.
> > > Then the conversion must be done by the ethdev caller (application or
> > > any other upper layer).
> >
> > What if the converter function needs a clock_info?
> > https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
> > I'm affraid this info may change by the time the converter is called
> > by upper layer.
>
> Indeed, the clock in the device is not an atomic one :)
> We need to adjust the time conversion continuously.
> I am not an expert of time synchronization, so I add more people Cc
> who could help for having a precise timestamp.

Thanks Thomas.
Not sure this is a synchronization issue. We have dedicated processes
(linuxptp) to keep both NIC and sys clocks in sync with an external clock.
It is "just" a matter of unit conversion.

If it has to be performed in dpdk-pdump, I would need some help to
retrieve mlx5_clock_info from inside a secondary process. Looking at
mlx5_read_clock(), this info is extracted from ibv_context which looks
reachable in a primary process only (segfault, if I try in pdump).


* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-05-22 18:43       ` PATRICK KEROULAS
@ 2020-05-26  7:44         ` Tom Barbette
  2020-05-29 20:46           ` N. Benes
  2020-05-26 16:00         ` Slava Ovsiienko
  1 sibling, 1 reply; 14+ messages in thread
From: Tom Barbette @ 2020-05-26  7:44 UTC (permalink / raw)
  To: PATRICK KEROULAS, Thomas Monjalon
  Cc: dev, Vivien Didelot, shahafs, rasland, matan


Le 22/05/2020 à 20:43, PATRICK KEROULAS a écrit :
>>>>> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
>>>>> instantaneous clock info. It's unused in dpdk so far. I've tested it in the
>>>>> device/port init routine and the result looks reliable. Since this approach
>>>>> looks very simple, compared to the time sync mechanism, I'm trying to
>>>>> integrate.
>>>>>
>>>>> The conversion should occur in the primary process (testpmd) I suppose.
>>>>> 1) The needed clock info derives from ethernet device. Is it possible to
>>>>>     access that struct from a rx callback?
>>>>> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>>>>>     (workaround: copy `mbuf->udata64` in forwarded packets.)
>>>>> 3) any other idea?
>>>> The timestamp is carried in mbuf.
>>>> Then the conversion must be done by the ethdev caller (application or
>>>> any other upper layer).
>>> What if the converter function needs a clock_info?
>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>> I'm affraid this info may change by the time the converter is called
>>> by upper layer.
>> Indeed, the clock in the device is not an atomic one :)
>> We need to adjust the time conversion continuously.
>> I am not an expert of time synchronization, so I add more people Cc
>> who could help for having a precise timestamp.
> Thanks Thomas.
> Not sure this is a synchronization issue. We have dedicated processes
> (linuxptp) to keep both NIC and sys clocks in sync with an external clock.
> It is "just" a matter of unit conversion.
>
> If it has to be performed in dpdk-pdump, I would need some help to
> retrieve mlx5_clock_info from inside a secondary process. Looking at
> mlx5_read_clock(), this info is extracted from ibv_context which looks
> reachable in a primary process only (segfault, if I try in pdump).


I don't know about the integrated ts-to-ns, but we implemented a 
translation mechanism that mimics what NTP does in Linux to translate a 
given clock (TSC at first) to a wall time. You'll find more info at 
https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf chapter 
3.4.1.  This is an often forgotten matter, as we saw in real switches 
that the time spent in time-related VDSO is enormous.

We wanted to do a very precise capture too, so we made that clock able
to synchronize itself with the ConnectX-5 internal clock as a base
instead of TSC. FYI the clock in the CX5 is running at 800MHz, so pure
nanosecond resolution is impossible, but it is close enough. It is for
that purpose that I proposed the rte_eth_read_clock() patch in DPDK. We
need to be able to read the current clock (like the rdtsc instruction
for TSC) to compute the frequency.

The "converter" code is there : 
https://github.com/tbarbette/fastclick/blob/master/elements/userlevel/tscclock.cc, 
the source is configurable (TSC, rte_eth_read_clock, GPS meinberg clock, 
...), the DPDK one is there : 
https://github.com/tbarbette/fastclick/blob/2ab021283b82d0b980551480c505ec8dced98e0a/elements/userlevel/dpdkdevclock.cc#L27 


One important thing is that the conversion factor must be changed from
time to time to correct for drift. That is the reason why we can't just
push a bunch of code to DPDK (and it's probably not as simple as using
the ts-to-ns in mlx5): you must have a timer, and use RCU to
update "atomically" a > 64-bit struct. Most of that is available
in DPDK now, but there will always be some setup (RCU barrier, timer
init, ...).
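
A minimal sketch of such an "atomic" update of a calibration record wider than 64 bits follows. It deliberately swaps RCU for a plain C11 sequence counter (seqlock) so that it stays self-contained; it is not the FastClick code. The struct layout, the function names and the 2^20 fixed-point scale are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CALIB_SHIFT 20  /* assumed format: mult is ns-per-tick scaled by 2^20 */

struct clock_calib {
    atomic_uint seq;              /* even: stable, odd: update in progress */
    _Atomic uint64_t base_ticks;  /* device counter at the last calibration */
    _Atomic uint64_t base_ns;     /* wall time in ns at the last calibration */
    _Atomic uint64_t mult;        /* fixed-point ns per tick */
};

static void calib_init(struct clock_calib *c, uint64_t ticks, uint64_t ns,
                       uint64_t mult)
{
    assert(mult != 0);
    atomic_init(&c->seq, 0);
    atomic_init(&c->base_ticks, ticks);
    atomic_init(&c->base_ns, ns);
    atomic_init(&c->mult, mult);
}

/* Single writer, e.g. called from a periodic timer to correct drift. */
static void calib_update(struct clock_calib *c, uint64_t ticks, uint64_t ns,
                         uint64_t mult)
{
    assert(mult != 0);
    unsigned seq = atomic_load_explicit(&c->seq, memory_order_relaxed);
    /* Mark the record as "being updated" (odd sequence number). */
    atomic_store_explicit(&c->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->base_ticks, ticks, memory_order_relaxed);
    atomic_store_explicit(&c->base_ns, ns, memory_order_relaxed);
    atomic_store_explicit(&c->mult, mult, memory_order_relaxed);
    /* Publish: the sequence number is even again. */
    atomic_store_explicit(&c->seq, seq + 2, memory_order_release);
}

/* Any number of readers; a reader retries if an update raced with it. */
static uint64_t calib_to_ns(struct clock_calib *c, uint64_t ticks)
{
    unsigned s1, s2;
    uint64_t bt, bn, m;

    do {
        s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
        bt = atomic_load_explicit(&c->base_ticks, memory_order_relaxed);
        bn = atomic_load_explicit(&c->base_ns, memory_order_relaxed);
        m  = atomic_load_explicit(&c->mult, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);

    /* (ticks - bt) * m must fit in 64 bits, so recalibrate often enough. */
    return bn + (((ticks - bt) * m) >> CALIB_SHIFT);
}
```

With an 800MHz counter the multiplier would be 1.25 * 2^20 = 1310720, so 8 ticks past the base map to 10 ns past the base time.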

In the end it's not hard science... It worked like a charm to do a 
campus trace capture on 100G hardware.



* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-05-22 18:43       ` PATRICK KEROULAS
  2020-05-26  7:44         ` Tom Barbette
@ 2020-05-26 16:00         ` Slava Ovsiienko
  2020-05-29 20:56           ` PATRICK KEROULAS
  2020-06-02 19:18           ` PATRICK KEROULAS
  1 sibling, 2 replies; 14+ messages in thread
From: Slava Ovsiienko @ 2020-05-26 16:00 UTC (permalink / raw)
  To: PATRICK KEROULAS, Thomas Monjalon
  Cc: dev, Vivien Didelot, Shahaf Shuler, Raslan Darawsheh, Matan Azrad

Hi, Patrick

The ConnectX HW timestamp is the captured value of an internal 64-bit counter running at the frequency
reported in the device_frequency_khz field of struct mlx5_ifc_cmd_hca_cap_bits{}.
This structure is queried in the mlx5_devx_cmd_query_hca_attr() routine.
So, with a known frequency, it is possible to convert timestamp ticks to the desired units.
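
A minimal sketch of that conversion, assuming the frequency has already been read as a plain integer in kHz. The helper name ticks_to_ns() is hypothetical and not part of DPDK or rdma-core.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert raw counter ticks to nanoseconds for a counter running at
 * freq_khz (the unit of device_frequency_khz).
 * Splitting into whole milliseconds plus a remainder keeps the
 * intermediate products within 64 bits, as long as the result itself
 * fits in 64 bits.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t freq_khz)
{
    assert(freq_khz != 0);
    uint64_t ms  = ticks / freq_khz;  /* freq_khz ticks elapse per millisecond */
    uint64_t rem = ticks % freq_khz;  /* leftover ticks, less than 1 ms worth */

    return ms * UINT64_C(1000000) + rem * UINT64_C(1000000) / freq_khz;
}
```

For a counter running at 800MHz, ticks_to_ns(8, 800000) gives 10. A single tick (1.25 ns) is truncated to 1, so sub-nanosecond precision is lost by design.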

With best regards, Slava

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of PATRICK KEROULAS
> Sent: Friday, May 22, 2020 21:43
> To: Thomas Monjalon <thomas@monjalon.net>
> Cc: dev@dpdk.org; Vivien Didelot <vivien.didelot@gmail.com>; Shahaf
> Shuler <shahafs@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Matan Azrad <matan@mellanox.com>
> Subject: Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to
> nanoseconds
> 
> > > > > mlx5 part of libibverbs includes a ts-to-ns converter which
> > > > > takes the instantaneous clock info. It's unused in dpdk so far.
> > > > > I've tested it in the device/port init routine and the result
> > > > > looks reliable. Since this approach looks very simple, compared
> > > > > to the time sync mechanism, I'm trying to integrate.
> > > > >
> > > > > The conversion should occur in the primary process (testpmd) I
> suppose.
> > > > > 1) The needed clock info derives from ethernet device. Is it possible to
> > > > >    access that struct from a rx callback?
> > > > > 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
> > > > >    (workaround: copy `mbuf->udata64` in forwarded packets.)
> > > > > 3) any other idea?
> > > >
> > > > The timestamp is carried in mbuf.
> > > > Then the conversion must be done by the ethdev caller (application
> > > > or any other upper layer).
> > >
> > > What if the converter function needs a clock_info?
> > > > https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
> > > I'm affraid this info may change by the time the converter is called
> > > by upper layer.
> >
> > Indeed, the clock in the device is not an atomic one :) We need to
> > adjust the time conversion continuously.
> > I am not an expert of time synchronization, so I add more people Cc
> > who could help for having a precise timestamp.
> 
> Thanks Thomas.
> Not sure this is a synchronization issue. We have dedicated processes
> (linuxptp) to keep both NIC and sys clocks in sync with an external clock.
> It is "just" a matter of unit conversion.
> 
> If it has to be performed in dpdk-pdump, I would need some help to retrieve
> mlx5_clock_info from inside a secondary process. Looking at
> mlx5_read_clock(), this info is extracted from ibv_context which looks
> reachable in a primary process only (segfault, if I try in pdump).


* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-05-26  7:44         ` Tom Barbette
@ 2020-05-29 20:46           ` N. Benes
  0 siblings, 0 replies; 14+ messages in thread
From: N. Benes @ 2020-05-29 20:46 UTC (permalink / raw)
  To: dev

Hi everyone,

Tom Barbette:
> 
> Le 22/05/2020 à 20:43, PATRICK KEROULAS a écrit :
>>>>>> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
>>>>>> instantaneous clock info. It's unused in dpdk so far. I've tested
>>>>>> it in the
>>>>>> device/port init routine and the result looks reliable. Since this
>>>>>> approach
>>>>>> looks very simple, compared to the time sync mechanism, I'm trying to
>>>>>> integrate.
>>>>>>
>>>>>> The conversion should occur in the primary process (testpmd) I
>>>>>> suppose.
>>>>>> 1) The needed clock info derives from ethernet device. Is it
>>>>>> possible to
>>>>>>     access that struct from a rx callback?
>>>>>> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>>>>>>     (workaround: copy `mbuf->udata64` in forwarded packets.)
>>>>>> 3) any other idea?
>>>>> The timestamp is carried in mbuf.
>>>>> Then the conversion must be done by the ethdev caller (application or
>>>>> any other upper layer).
>>>> What if the converter function needs a clock_info?
>>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>>>
>>>> I'm affraid this info may change by the time the converter is called
>>>> by upper layer.
>>> Indeed, the clock in the device is not an atomic one :)
>>> We need to adjust the time conversion continuously.
>>> I am not an expert of time synchronization, so I add more people Cc
>>> who could help for having a precise timestamp.
>> Thanks Thomas.
>> Not sure this is a synchronization issue. We have dedicated processes
>> (linuxptp) to keep both NIC and sys clocks in sync with an external
>> clock.
>> It is "just" a matter of unit conversion.
>>
>> If it has to be performed in dpdk-pdump, I would need some help to
>> retrieve mlx5_clock_info from inside a secondary process. Looking at
>> mlx5_read_clock(), this info is extracted from ibv_context which looks
>> reachable in a primary process only (segfault, if I try in pdump).

The normal phc2sys can not only synchronise NIC -> system but also sys
-> NIC and (I believe it does but have not tried) NIC1 -> NIC2.
If I understand your proposal correctly, you want to use a free running
NIC counter and calibrate out the drift afterwards.
It may be easier to adapt phc2sys to use a NIC through DPDK and sync the
NIC's timewheel/VCO in a proven/reliable manner (e.g. low pass filtering
excursions). Then you could directly use the NIC counter value.

> I don't know about the integrated ts-to-ns, but we implemented a
> translation mechanism that mimics what NTP does in Linux to translate a
> given clock (TSC at first) to a wall time. You'll find more info at
> https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf chapter
> 3.4.1.  This is an often forgotten matter, as we saw in real switches
> that the time spent in time-related VDSO is enormous.

Do you have measurements of vDSO clock_gettime and how much is
"enormous" to you?
To my knowledge, clock_gettime via vDSO on Linux only takes a few
nanoseconds in the average case. However, it can go up to ~10 or even
~50 microseconds every few (~10) seconds, depending on the number of
CPUs (for example single vs. dual socket, though my hardware for this
test is quite old, Dell R210-II, R610). Presumably this is when the
kernel locks the struct in VVAR to update the TSC drift compensation
parameters.

Linux clock_gettime implementation is here (different versions):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/vdso/vclock_gettime.c?h=linux-3.10.y#n193
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/entry/vdso/vclock_gettime.c?h=linux-4.19.y#n241
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/lib/vdso/gettimeofday.c#n98

I use busy waiting on clock_gettime in a packet generator application
(so far 10 GbE only) to pace jumbo frames according to a spec
(simulating the traffic pattern of a to-be-developed hardware with
FPGA), and COTS sniffer hardware with absolute timestamping to verify my
generator's performance. I can observe the above 10-50 us artefacts and
sufficiently good/low (for my needs) average execution time of
clock_gettime. The only sad thing is that TAI clock does not go through
vDSO and therefore I cannot use it.

> We wanted to do a very precise capture too, se we made that clock able
> to synchronize itself with the ConnectX 5 internal clock as a base
> instead of TSC. FYI the clock in CX5 si running at 800MHz, so pure
> nanosecond is impossible, but close enough. It is for that purpose that
> I proposed the rte_eth_read_clock() patch in DPDK. We need to be able to
> read the current clock (like rdtsc() instruction for TSC) to compute the
> frequency.

Doesn't this mean that you need to wait for the PCIe op from the NIC?
Is this really faster than a rdtsc, memory/cache read, integer
multiplication and shift?

Cheers,
nicolas



* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-05-26 16:00         ` Slava Ovsiienko
@ 2020-05-29 20:56           ` PATRICK KEROULAS
  2020-05-31 19:47             ` Slava Ovsiienko
  2020-06-02 19:18           ` PATRICK KEROULAS
  1 sibling, 1 reply; 14+ messages in thread
From: PATRICK KEROULAS @ 2020-05-29 20:56 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dev, Vivien Didelot, Shahaf Shuler, Raslan Darawsheh, Matan Azrad

On Tue, May 26, 2020 at 12:00 PM Slava Ovsiienko <viacheslavo@mellanox.com> wrote:
> Hi, Patrick
>
> ConnectX HW timestamp is the captured value of internal 64-bit counter running at the frequency,
> reported in the device_frequency_khz field of struct mlx5_ifc_cmd_hca_cap_bits{}.
> This structure is queried in mlx5_devx_cmd_query_hca_attr() routine.
> So, with known frequency it is possible to recalculate timestamp ticks to desired units.

Hello Slava,

Assuming that the NIC clock is already synced thanks to a PTP client,
does the bit counter give an absolute time value (0 => 1 January 1970
00:00:00)? Or do I need to calculate a time duration from the process
start time?

I just want to validate the path from mlx5 eth dev(Rx) to eth pcap (Tx) :
- query the oscillator frequency at the mlx5_eth_dev init step
  (mlx5_devx_cmd_query_hca_attr())
- store the freq with other hca_attr, carried by dev config which should
  be shared with the secondary process
- in eth_pcap_tx_dumper(), retrieve the freq from the dev given by
  mbuf->port
- convert all the incoming mbuf->timestamp using this freq whose
  variation should be negligible over the capture duration

Last question: what is your opinion about this other method?
https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201

Thanks a lot!


* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-05-29 20:56           ` PATRICK KEROULAS
@ 2020-05-31 19:47             ` Slava Ovsiienko
  0 siblings, 0 replies; 14+ messages in thread
From: Slava Ovsiienko @ 2020-05-31 19:47 UTC (permalink / raw)
  To: PATRICK KEROULAS
  Cc: dev, Vivien Didelot, Shahaf Shuler, Raslan Darawsheh, Matan Azrad

Hi, Patrick

Please, see below.

>From: PATRICK KEROULAS <patrick.keroulas@radio-canada.ca> 
>Sent: Friday, May 29, 2020 23:56
>To: Slava Ovsiienko <viacheslavo@mellanox.com>
>Cc: dev@dpdk.org; Vivien Didelot <vivien.didelot@gmail.com>; Shahaf Shuler <shahafs@mellanox.com>; Raslan Darawsheh <rasland@mellanox.com>; Matan Azrad <matan@mellanox.com>
>Subject: Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
>
>
>On Tue, May 26, 2020 at 12:00 PM Slava Ovsiienko <mailto:viacheslavo@mellanox.com> wrote:
>>> Hi, Patrick
>>
>> ConnectX HW timestamp is the captured value of internal 64-bit counter running at the frequency,
>> reported in the device_frequency_khz field of struct mlx5_ifc_cmd_hca_cap_bits{}.
>> This structure is queried in mlx5_devx_cmd_query_hca_attr() routine.
>> So, with known frequency it is possible to recalculate timestamp ticks to desired units.
>
>Hello Slava,
>
>Assuming that the NIC clock is already synced thanks to a PTP client,
>does the bit counter give an absolute time value (0 => 1 January 1970
>00:00:00)? Or do I need to calculate a time duration from the process
>start time?
[SO]
I would not make any assumptions about the internal clock phase and its relation to time (UTC?).
I suppose that getting the initial value of the clock counter and calculating the actual time at app start is a valid approach.
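
A minimal sketch of that anchoring approach, assuming the application samples the device counter and the wall clock once at start (for instance with rte_eth_read_clock() and clock_gettime(CLOCK_REALTIME)) and that the counter frequency is known in kHz. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct clock_anchor {
    uint64_t ticks0;    /* device counter sampled at app start */
    uint64_t wall0_ns;  /* wall-clock ns since the epoch, sampled at the same moment */
    uint32_t freq_khz;  /* counter frequency in kHz */
};

/* Map a raw packet timestamp to wall-clock nanoseconds. */
static uint64_t anchor_to_wall_ns(const struct clock_anchor *a, uint64_t ts)
{
    assert(a->freq_khz != 0);
    uint64_t delta = ts - a->ticks0;       /* ticks elapsed since the anchor */
    uint64_t ms    = delta / a->freq_khz;  /* whole milliseconds */
    uint64_t rem   = delta % a->freq_khz;  /* ticks within the last millisecond */

    return a->wall0_ns + ms * UINT64_C(1000000)
           + rem * UINT64_C(1000000) / a->freq_khz;
}
```

This treats the frequency as constant, so any drift between the NIC oscillator and the wall clock accumulates over the capture duration.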

>I just want to validate the path from mlx5 eth dev(Rx) to eth pcap (Tx) :
>- query the oscillator frequency at the mlx5_eth_dev init step
>  (mlx5_devx_cmd_query_hca_attr())
>- store the freq with other hca_attr, carried by dev config which should
>  be shared with the secondary process
>- in eth_pcap_tx_dumper(), retrieve the freq from the dev given by
>  mbuf->port
>- convert all the incoming mbuf->timestamp using this freq whose
>  variation should be negligible over the capture duration
>
This should work OK, in my opinion.

>Last question: what is your opinion about this other method?
>https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>
>Thanks a lot!
This code checks the counter periodically to track counter wraparound, and it also handles conversion of older timestamps (captured before the clock base update).
If you have a stream of packets with monotonically increasing timestamps, you could track this counter wrap in your own code (save the last timestamp conversion result and counter value).
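
A minimal sketch of such wrap tracking, assuming the raw value seen by the application is narrower than 64 bits and that timestamps arrive in increasing order with less than one full wrap between consecutive packets. The counter width is a hypothetical parameter and all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

struct wrap_tracker {
    unsigned bits;      /* width of the raw counter (hypothetical parameter) */
    uint64_t last_raw;  /* last raw value seen */
    uint64_t wraps;     /* number of wraparounds observed so far */
};

/* Extend a wrapping raw counter value to a monotonic 64-bit value. */
static uint64_t wrap_extend(struct wrap_tracker *t, uint64_t raw)
{
    assert(t->bits > 0 && t->bits < 64);
    uint64_t mask = (UINT64_C(1) << t->bits) - 1;

    raw &= mask;
    if (raw < t->last_raw)  /* the value went backwards: the counter wrapped */
        t->wraps++;
    t->last_raw = raw;
    return (t->wraps << t->bits) | raw;
}
```

With an 8-bit counter, the raw sequence 250, 3, 10 extends to 250, 259, 266.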

With best regards, Slava



* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-05-26 16:00         ` Slava Ovsiienko
  2020-05-29 20:56           ` PATRICK KEROULAS
@ 2020-06-02 19:18           ` PATRICK KEROULAS
  2020-06-03  7:48             ` Slava Ovsiienko
  1 sibling, 1 reply; 14+ messages in thread
From: PATRICK KEROULAS @ 2020-06-02 19:18 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dev, Vivien Didelot, Shahaf Shuler, Raslan Darawsheh, Matan Azrad

On Tue, May 26, 2020 at 12:00 PM Slava Ovsiienko
<viacheslavo@mellanox.com> wrote:
> ConnectX HW timestamp is the captured value of internal 64-bit counter running at the frequency,
> reported in the device_frequency_khz field of struct mlx5_ifc_cmd_hca_cap_bits{}.
> This structure is queried in mlx5_devx_cmd_query_hca_attr() routine.

I can't query this because "DevX is NOT supported".
As a matter of fact, mlx5dv_open_device() returns NULL.
Not sure if this limitation comes from HW/firmware config or capability.
* rdma-core, libibverbs-dev: 28.0
* NIC Part Number:      MCX516A-CDA_Ax
* ConnectX-5 Ex EN
* FW: 16.25.1020


* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-06-02 19:18           ` PATRICK KEROULAS
@ 2020-06-03  7:48             ` Slava Ovsiienko
  2020-06-05  0:09               ` PATRICK KEROULAS
  0 siblings, 1 reply; 14+ messages in thread
From: Slava Ovsiienko @ 2020-06-03  7:48 UTC (permalink / raw)
  To: PATRICK KEROULAS
  Cc: dev, Vivien Didelot, Shahaf Shuler, Raslan Darawsheh, Matan Azrad


> -----Original Message-----
> From: PATRICK KEROULAS <patrick.keroulas@radio-canada.ca>
> Sent: Tuesday, June 2, 2020 22:18
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Vivien Didelot <vivien.didelot@gmail.com>; Shahaf
> Shuler <shahafs@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Matan Azrad <matan@mellanox.com>
> Subject: Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to
> nanoseconds
> 
> On Tue, May 26, 2020 at 12:00 PM Slava Ovsiienko
> <viacheslavo@mellanox.com> wrote:
> > ConnectX HW timestamp is the captured value of internal 64-bit counter
> > running at the frequency, reported in the device_frequency_khz field of
> struct mlx5_ifc_cmd_hca_cap_bits{}.
> > This structure is queried in mlx5_devx_cmd_query_hca_attr() routine.
> 
> I can't query this because "DevX is NOT supported".
> As a matter of fact, mlx5dv_open_device() returns NULL.
> Not sure if this limitation comes from HW/firmware config or capability.
> * rdma-code, libibverbs-dev: 28.0
> * NIC Part Number:      MCX516A-CDA_Ax
> * ConnectX-5 Ex EN
> * FW: 16.25.1020

It looks like outdated firmware. Please:
- update the firmware - at least 16.27.2008 is GA. I would recommend installing OFED, which also updates the FW
- make sure the UCTX_EN option in the FW configuration is set to "true"

With best regards, Slava



* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-06-03  7:48             ` Slava Ovsiienko
@ 2020-06-05  0:09               ` PATRICK KEROULAS
  2020-06-05 16:30                 ` Slava Ovsiienko
  0 siblings, 1 reply; 14+ messages in thread
From: PATRICK KEROULAS @ 2020-06-05  0:09 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dev, Vivien Didelot, Shahaf Shuler, Raslan Darawsheh, Matan Azrad

On Wed, Jun 3, 2020 at 3:48 AM Slava Ovsiienko <viacheslavo@mellanox.com> wrote:
>
> > From: PATRICK KEROULAS <patrick.keroulas@radio-canada.ca>
> > * rdma-code, libibverbs-dev: 28.0
> > * NIC Part Number:      MCX516A-CDA_Ax
> > * ConnectX-5 Ex EN
> > * FW: 16.25.1020
>
> It looks like outdated firmware, please:
> - update the firmware - at least 16.27.2008 is GA. I would recommend to install OFED - it updates the FW
> - make sure the UCTX_EN option in FW configuration is set to "true"

Hello Slava,

I managed to query device_frequency_khz by simply setting UCTX_EN=1,
convert the mbuf->timestamp to nsec and write a pcap. However, the
accuracy is quite disappointing, compared to libvma or even SW TS.
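
For reference, the conversion described above can be sketched as follows. This
is only an illustration, not the code used for the test: the helper name
ticks_to_ns is hypothetical, and it assumes the raw timestamp is a plain tick
count of a counter running at device_frequency_khz, with no offset or drift
correction.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of DPDK or rdma-core): convert a raw
 * free-running counter value to nanoseconds, given the counter
 * frequency in kHz.
 *
 *   ns = ticks * 1e6 / freq_khz
 *
 * The division is split into quotient and remainder so that the
 * intermediate product does not overflow 64 bits for large tick
 * counts. The result still wraps once it exceeds 2^64 ns.
 */
static uint64_t
ticks_to_ns(uint64_t ticks, uint32_t freq_khz)
{
	uint64_t q = ticks / freq_khz;	/* whole milliseconds */
	uint64_t r = ticks % freq_khz;	/* leftover ticks, < freq_khz */

	return q * 1000000ULL + (r * 1000000ULL) / freq_khz;
}
```

At 78125 kHz one tick is 12.8 ns, so ticks_to_ns(5, 78125) gives 64 and a
single tick truncates to 12. The result is relative to the counter's own
origin, not to wall-clock time.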

The freq value looks constant (=78125kHz). Correct me if I'm wrong, a
ptp client is supposed to continuously adjust some kind of VCO on the
NIC. And even setting a crazy value through /dev/ptp interface manually
doesn't affect device_frequency_khz. Please could you clarify?

This leads me back to mlx5dv_clock_info->nsec. If this is a valid method,
I think the only missing piece is to access it from the secondary process,
which implies sharing ibv_context.

Best Regards,

PK

* Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
  2020-06-05  0:09               ` PATRICK KEROULAS
@ 2020-06-05 16:30                 ` Slava Ovsiienko
  0 siblings, 0 replies; 14+ messages in thread
From: Slava Ovsiienko @ 2020-06-05 16:30 UTC (permalink / raw)
  To: PATRICK KEROULAS
  Cc: dev, Vivien Didelot, Shahaf Shuler, Raslan Darawsheh, Matan Azrad

> -----Original Message-----
> From: PATRICK KEROULAS <patrick.keroulas@radio-canada.ca>
> Sent: Friday, June 5, 2020 3:10
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Vivien Didelot <vivien.didelot@gmail.com>; Shahaf
> Shuler <shahafs@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Matan Azrad <matan@mellanox.com>
> Subject: Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to
> nanoseconds
> 
> On Wed, Jun 3, 2020 at 3:48 AM Slava Ovsiienko
> <viacheslavo@mellanox.com> wrote:
> >
> > > From: PATRICK KEROULAS <patrick.keroulas@radio-canada.ca>
> > > * rdma-code, libibverbs-dev: 28.0
> > > * NIC Part Number:      MCX516A-CDA_Ax
> > > * ConnectX-5 Ex EN
> > > * FW: 16.25.1020
> >
> > It looks like outdated firmware, please:
> > - update the firmware - at least 16.27.2008 is GA. I would recommend
> > to install OFED - it updates the FW
> > - make sure the UCTX_EN option in FW configuration is set to "true"
> 
> Hello Slava,
> 
> I managed to query device_frequency_khz by simply setting UCTX_EN=1,
> convert the mbuf->timestamp to nsec and write a pcap. However, the
> accuracy is quite disappointing, compared to libvma or even SW TS.
> 
> The freq value looks constant (=78125kHz). Correct me if I'm wrong, a ptp
> client is supposed to continuously adjust some kind of VCO on the NIC. And

AFAIK, that is not the case for ConnectX-5: there is no clock adjustment, just a free-running counter.
ConnectX-6 Dx will provide an option for adjustable nanosecond UTC timestamps.
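
With a free-running counter, one possible approach (an assumption for
illustration, not something confirmed in this thread) is to record a reference
pair of counter value and wall-clock time once, then extrapolate from it. The
struct ts_anchor and the function anchored_ts_to_ns below are hypothetical
names, and the sketch applies no drift correction, so the error grows with the
time elapsed since the reference was taken.

```c
#include <stdint.h>

/* Hypothetical reference point: a counter value sampled together
 * with the wall-clock time (in ns) at that same instant. */
struct ts_anchor {
	uint64_t ref_ticks;	/* raw counter value at the reference */
	uint64_t ref_wall_ns;	/* wall-clock ns at the reference */
	uint32_t freq_khz;	/* counter frequency in kHz */
};

/*
 * Extrapolate wall-clock nanoseconds for a raw timestamp taken
 * after the reference point:
 *
 *   ns = ref_wall_ns + (ticks - ref_ticks) * 1e6 / freq_khz
 *
 * The unsigned subtraction stays correct across a single counter
 * wrap. No frequency correction is applied.
 */
static uint64_t
anchored_ts_to_ns(const struct ts_anchor *a, uint64_t ticks)
{
	uint64_t delta = ticks - a->ref_ticks;
	uint64_t q = delta / a->freq_khz;
	uint64_t r = delta % a->freq_khz;

	return a->ref_wall_ns + q * 1000000ULL +
	       (r * 1000000ULL) / a->freq_khz;
}
```

Refreshing the reference pair periodically would bound the accumulated drift,
at the cost of small steps in the output at each refresh.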

> even setting a crazy value through /dev/ptp interface manually doesn't affect
> device_frequency_khz. Please could you clarify?
> 
> This leads me back to mlx5dv_clock_info->nsec. If this is a valid method, I
> think the only missing piece is to access it from the secondary process, which
> implies to share ibv_context.
> 
> Best Regards,
> 
> PK

Thread overview: 14+ messages
2020-05-19 18:20 [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds PATRICK KEROULAS
2020-05-21 15:33 ` Thomas Monjalon
2020-05-21 19:57   ` PATRICK KEROULAS
2020-05-21 20:09     ` Thomas Monjalon
2020-05-22 18:43       ` PATRICK KEROULAS
2020-05-26  7:44         ` Tom Barbette
2020-05-29 20:46           ` N. Benes
2020-05-26 16:00         ` Slava Ovsiienko
2020-05-29 20:56           ` PATRICK KEROULAS
2020-05-31 19:47             ` Slava Ovsiienko
2020-06-02 19:18           ` PATRICK KEROULAS
2020-06-03  7:48             ` Slava Ovsiienko
2020-06-05  0:09               ` PATRICK KEROULAS
2020-06-05 16:30                 ` Slava Ovsiienko
