From: "N. Benes"
To: dev@dpdk.org
Subject: Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
Date: Fri, 29 May 2020 20:46:00 +0000
Message-ID: <660c0552-ded2-7454-c6fc-43db9844f06c@eso.org>
In-Reply-To: <91af9f42-9477-b27f-c5c0-cb0e44a95573@kth.se>
References: <3986292.sUUuQTochr@thomas> <2766519.SlWGiteSXv@thomas> <91af9f42-9477-b27f-c5c0-cb0e44a95573@kth.se>

Hi everyone,

Tom Barbette:
>
> On 22/05/2020 at 20:43, PATRICK KEROULAS wrote:
>>>>>> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
>>>>>> instantaneous clock info. It's unused in dpdk so far. I've tested it in the
>>>>>> device/port init routine and the result looks reliable. Since this approach
>>>>>> looks very simple, compared to the time sync mechanism, I'm trying to
>>>>>> integrate.
>>>>>>
>>>>>> The conversion should occur in the primary process (testpmd) I suppose.
>>>>>> 1) The needed clock info derives from ethernet device. Is it possible to
>>>>>>    access that struct from a rx callback?
>>>>>> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>>>>>>    (workaround: copy `mbuf->udata64` in forwarded packets.)
>>>>>> 3) any other idea?
>>>>> The timestamp is carried in mbuf.
>>>>> Then the conversion must be done by the ethdev caller (application or
>>>>> any other upper layer).
>>>> What if the converter function needs a clock_info?
>>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>>>
>>>> I'm afraid this info may change by the time the converter is called
>>>> by upper layer.
>>> Indeed, the clock in the device is not an atomic one :)
>>> We need to adjust the time conversion continuously.
>>> I am not an expert of time synchronization, so I add more people Cc
>>> who could help for having a precise timestamp.
>> Thanks Thomas.
>> Not sure this is a synchronization issue. We have dedicated processes
>> (linuxptp) to keep both NIC and sys clocks in sync with an external
>> clock.
>> It is "just" a matter of unit conversion.
>>
>> If it has to be performed in dpdk-pdump, I would need some help to
>> retrieve mlx5_clock_info from inside a secondary process. Looking at
>> mlx5_read_clock(), this info is extracted from ibv_context which looks
>> reachable in a primary process only (segfault, if I try in pdump).

The stock phc2sys can synchronise not only NIC -> system but also
system -> NIC and (I believe, though I have not tried it) NIC1 -> NIC2.

If I understand your proposal correctly, you want to use a free-running
NIC counter and calibrate out the drift afterwards. It may be easier to
adapt phc2sys to use a NIC through DPDK and sync the NIC's
timewheel/VCO in a proven/reliable manner (e.g. low-pass filtering
excursions). Then you could use the NIC counter value directly.

> I don't know about the integrated ts-to-ns, but we implemented a
> translation mechanism that mimics what NTP does in Linux to translate a
> given clock (TSC at first) to a wall time. You'll find more info at
> https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf chapter
> 3.4.1.  This is an often forgotten matter, as we saw in real switches
> that the time spent in time-related VDSO is enormous.

Do you have measurements of vDSO clock_gettime, and how much is
"enormous" to you?

To my knowledge, clock_gettime via vDSO on Linux only takes a few
nanoseconds in the average case. However, it can go up to ~10 or even
~50 microseconds every few (~10) seconds, depending on the number of
CPUs (for example single vs. dual socket, though my hardware for this
test is quite old, Dell R210-II, R610).
Presumably this is when the kernel locks the struct in VVAR to update
the TSC drift compensation parameters.

The Linux clock_gettime implementation is here (different versions):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/vdso/vclock_gettime.c?h=linux-3.10.y#n193
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/entry/vdso/vclock_gettime.c?h=linux-4.19.y#n241
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/lib/vdso/gettimeofday.c#n98

I use busy waiting on clock_gettime in a packet generator application
(so far 10 GbE only) to pace jumbo frames according to a spec
(simulating the traffic pattern of to-be-developed FPGA hardware),
and COTS sniffer hardware with absolute timestamping to verify
my generator's performance. I can observe the above 10-50 us artefacts
and a sufficiently low (for my needs) average execution time of
clock_gettime. The only sad thing is that the TAI clock does not go
through the vDSO and therefore I cannot use it.

> We wanted to do a very precise capture too, so we made that clock able
> to synchronize itself with the ConnectX 5 internal clock as a base
> instead of TSC. FYI the clock in CX5 is running at 800MHz, so pure
> nanosecond is impossible, but close enough. It is for that purpose that
> I proposed the rte_eth_read_clock() patch in DPDK. We need to be able to
> read the current clock (like rdtsc() instruction for TSC) to compute the
> frequency.

Doesn't this mean that you need to wait for the PCIe op from the NIC? Is
this really faster than a rdtsc, memory/cache read, integer
multiplication and shift?

Cheers,
nicolas
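
PS: To make the "integer multiplication and shift" above concrete, here
is a minimal sketch of a fixed-point tick-to-nanosecond conversion. The
names tick_conv, tick_conv_init and ticks_to_ns are made up for
illustration; they are not DPDK, rdma-core or kernel APIs. The sketch
assumes a fixed nominal counter frequency (for example the 800 MHz
quoted above), does no drift compensation at all, and relies on the
GCC/Clang unsigned __int128 extension for the intermediate product.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point scale: ns = (ticks * mult) >> shift. */
struct tick_conv {
    uint64_t mult;   /* nanoseconds per tick, scaled by 2^shift */
    unsigned shift;  /* number of fractional bits in mult */
};

/* Derive the scale factor from a nominal frequency in Hz.
 * Assumption: the frequency is constant; a real converter would have
 * to re-estimate mult periodically to follow clock drift. */
static inline struct tick_conv tick_conv_init(uint64_t freq_hz, unsigned shift)
{
    /* shift < 32 keeps (1e9 << shift) inside 64 bits. */
    assert(freq_hz > 0 && shift < 32);
    struct tick_conv c;
    /* Round to nearest instead of truncating. */
    c.mult = ((UINT64_C(1000000000) << shift) + freq_hz / 2) / freq_hz;
    c.shift = shift;
    return c;
}

/* Convert a tick delta to nanoseconds: one multiply and one shift.
 * The 128-bit intermediate (compiler extension) avoids overflow for
 * large deltas. */
static inline uint64_t ticks_to_ns(const struct tick_conv *c, uint64_t delta_ticks)
{
    return (uint64_t)(((unsigned __int128)delta_ticks * c->mult) >> c->shift);
}
```

With freq_hz = 800000000 and shift = 24, mult represents 1.25 ns per
tick exactly, so a delta of 8 ticks converts to 10 ns. The per-packet
cost is then a counter read plus one multiply and one shift; keeping
mult up to date is the part that would need the periodic
synchronisation discussed in this thread.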