From: Tom Barbette
To: dev@dpdk.org
Cc: bruce.richardson@intel.com, john.mcnamara@intel.com, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Shahaf Shuler, Yongseok Koh, olivier.matz@6wind.com
Subject: Re: [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API
Date: Wed, 8 May 2019 09:49:59 +0200
Message-ID: <2715295a-43fa-cc8b-8271-ded8b237501a@kth.se>
In-Reply-To: <20190502121135.18775-1-barbette@kth.se>
References: <20190502121135.18775-1-barbette@kth.se>
List-Id: DPDK patches and discussions

Maybe a (last) motivation point.

We just did a 100G link traffic capture with time-stamping of all packets in HW using a Mellanox CX5. SW time-stamping fails to reveal queueing delays and, since multi-queue is needed to write 100G traffic to multiple NVMe drives, it cannot recover the original packet ordering that multi-queuing scrambles.

Here, we timestamped traffic in hardware (FYI, the timestamps are given in ticks of the internal CX5 clock, not in a unit of time) and, thanks to the new API, converted them to real time values (through frequency + base).

But precision is not the only improvement. As DPDK runs in userspace, calling gettimeofday for millions of packets pretty much kills the capture. Here we do simple math per packet to convert the packet's timestamp in ticks to the real clock time, which is (very) much cheaper than even a vDSO call.

Tom

On 2019-05-02 14:11, Tom Barbette wrote:
> Some NICs can timestamp packets, but do not support the full
> PTP synchronization process. Hence, the value set in the mbuf
> timestamp field is only the raw value of an internal clock.
>
> To make sense of this value, one at least needs to be able to query
> the current hardware clock value. This patch series adds a new API to do
> so, rte_eth_read_clock. As with the TSC, from there
> a frequency can be derived by querying the current value of the
> internal clock multiple times with some known delay between the queries (example
> provided in the API doc).
>
> This patch series adds support of read_clock for MLX5.
>
> An example is provided in the rxtx_callbacks application.
> It has been updated to display, on top of the software latency
> in cycles, the total latency since the packet was received in hardware.
> The API is used to compute a delta in the Tx callback. The raw number of
> ticks is converted to cycles using a variation of the technique described above.
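
The frequency derivation mentioned in the quoted text can be sketched roughly as below. This is not the code from the patch or from the API doc, just a minimal illustration. The helper name estimate_freq_hz is hypothetical, and the two clock readings are passed in as plain arguments so the snippet stays standalone; in a real application they would presumably come from two rte_eth_read_clock() calls separated by a known delay measured on the host.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Estimate the frequency (in Hz) of a free-running device clock from two
 * raw readings taken elapsed_ns nanoseconds apart.
 *
 * Hypothetical helper, not part of DPDK. Assumes the counter did not wrap
 * between the two readings and that elapsed_ns is below roughly 18 s, so
 * the intermediate product fits in 64 bits.
 */
static uint64_t
estimate_freq_hz(uint64_t ticks_start, uint64_t ticks_end, uint64_t elapsed_ns)
{
	assert(elapsed_ns > 0);

	/* Number of ticks elapsed between the two readings. */
	uint64_t delta = ticks_end - ticks_start;

	/* Split the division so delta * NSEC_PER_SEC cannot overflow. */
	return (delta / elapsed_ns) * NSEC_PER_SEC +
	       (delta % elapsed_ns) * NSEC_PER_SEC / elapsed_ns;
}
```

With made-up numbers, two readings 15625000 ticks apart over a 100 ms delay would give an estimate of 156250000 Hz. A longer delay reduces the relative error introduced by the time spent reading the clock itself.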
>
> Aside from offloading timestamping, which relieves the
> software of a few operations, this gives much more precision
> when studying the source of latency in a system.
> E.g. in our 100G CX5 setup, the rxtx_callbacks application shows that
> SW latency is around 74 cycles (TSC is 3.2 GHz), but the latency
> including NIC processing, PCIe, and queuing is around 196 cycles.
>
> One may think at first that this API overlaps with rte_eth_timesync_read_time.
> rte_eth_timesync_read_time is clearly identified as part of a set of functions
> for PTP synchronization.
> The device's raw clock is not synchronized in any way. More importantly, the returned
> value is not a timespec, but a number of ticks. We could have a cast-based
> solution, but on top of being ugly, some people seeing the timespec
> type of rte_eth_timesync_read_time could use it blindly.
>
> Changes in v2:
> - Rebase on current master
>
> Changes in v3:
> - Address comments from Ferruh Yigit
>
> Changes in v4:
> - Address comments from Keith Wiles and Andrew Rybchenko
> - Use "clock" as argument name everywhere.
> - Expand the API description to make clear that read_clock gives an
>   amount in ticks, and that it has no unit.
>
> Tom Barbette (3):
>   rte_ethdev: Add API function to read dev clock
>   mlx5: Implement support for read_clock
>   rxtx_callbacks: Add support for HW timestamp
>
>  doc/guides/nics/features.rst                |  1 +
>  doc/guides/sample_app_ug/rxtx_callbacks.rst |  9 ++-
>  drivers/net/mlx5/mlx5.c                     |  1 +
>  drivers/net/mlx5/mlx5.h                     |  1 +
>  drivers/net/mlx5/mlx5_ethdev.c              | 30 +++++++
>  drivers/net/mlx5/mlx5_glue.c                |  8 ++
>  drivers/net/mlx5/mlx5_glue.h                |  2 +
>  examples/rxtx_callbacks/Makefile            |  3 +
>  examples/rxtx_callbacks/main.c              | 87 ++++++++++++++++++++-
>  examples/rxtx_callbacks/meson.build         |  3 +
>  lib/librte_ethdev/rte_ethdev.c              | 12 +++
>  lib/librte_ethdev/rte_ethdev.h              | 47 +++++++++++
>  lib/librte_ethdev/rte_ethdev_core.h         |  6 ++
>  lib/librte_ethdev/rte_ethdev_version.map    |  1 +
>  lib/librte_mbuf/rte_mbuf.h                  |  2 +
>  15 files changed, 208 insertions(+), 5 deletions(-)
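
For reference, the per-packet "frequency + base" conversion described in the reply at the top of this message could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code used for the capture. The struct clock_calib and the helper ticks_to_ns are hypothetical names. The base is assumed to be one device clock reading paired with a host wall-clock time taken at about the same moment, and the frequency is assumed to have been estimated beforehand.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical calibration record: one (ticks, wall-clock) pair plus frequency. */
struct clock_calib {
	uint64_t base_ticks; /* device clock value at calibration time */
	uint64_t base_ns;    /* host wall-clock time (ns) at the same moment */
	uint64_t freq_hz;    /* estimated device clock frequency */
};

/*
 * Convert a raw hardware timestamp (in ticks) to wall-clock nanoseconds.
 *
 * Assumes ticks >= base_ticks (the packet arrived after calibration) and
 * freq_hz below roughly 18 GHz so the remainder product fits in 64 bits.
 */
static uint64_t
ticks_to_ns(const struct clock_calib *c, uint64_t ticks)
{
	assert(c->freq_hz > 0);

	uint64_t delta = ticks - c->base_ticks;

	/* Whole seconds and the sub-second remainder are scaled separately. */
	return c->base_ns +
	       (delta / c->freq_hz) * NSEC_PER_SEC +
	       (delta % c->freq_hz) * NSEC_PER_SEC / c->freq_hz;
}
```

This is a handful of integer operations per packet and involves no call into the kernel or the vDSO. The divisions could be replaced by a precomputed multiplier and shift if they show up in profiles, and the calibration record would need refreshing periodically if the device clock drifts against the host clock.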