DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] MLX5 should define the timestamp field in the doc
@ 2018-09-04 14:26 Tom Barbette
  2018-09-05  5:29 ` Shahaf Shuler
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Barbette @ 2018-09-04 14:26 UTC (permalink / raw)
  To: dev; +Cc: shahafs, yskoh, john.mcnamara, marko.kovacevic

Hi all,


As far as I know, MLX5 is the only driver to support hardware timestamping.


It would be great to update the doc to explain what the hardware timestamp is supposed to be. If it's nanoseconds, then just a shift regarding system time is enough ? Does it also need a multiplication? Can we query that from hardware? Or provide a piece of code to be used. As it, the feature is useless...


It would be interesting to normalize the hardware timestamping. I guess for any driver, an offset and a multiplication(shift+multiplier eventually) would be enough, and the API should be updated to provide a function to convert a hardware timestamp to software (or that should be part of the driver and done automatically if offloading is enabled?) and probably one to initialize the time, much like the Linux one at https://elixir.bootlin.com/linux/v4.18.5/source/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c#L488 .


Thanks,


Tom

?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] MLX5 should define the timestamp field in the doc
  2018-09-04 14:26 [dpdk-dev] MLX5 should define the timestamp field in the doc Tom Barbette
@ 2018-09-05  5:29 ` Shahaf Shuler
  2018-09-05  7:11   ` Tom Barbette
  0 siblings, 1 reply; 8+ messages in thread
From: Shahaf Shuler @ 2018-09-05  5:29 UTC (permalink / raw)
  To: Tom Barbette, dev; +Cc: Yongseok Koh, john.mcnamara, marko.kovacevic

Hi Tom,

Tuesday, September 4, 2018 5:26 PM, Tom Barbette:
>

>Hi all,

>

>As far as I know, MLX5 is the only driver to support hardware timestamping.

>

>It would be great to update the doc to explain what the hardware timestamp is supposed to be.



We probably need to update the doc, but the current timestamp support is very basic. It just provide a raw timestamp from the NIC running clock. The granularity of each “tick” is undefined.

It is not yet the full IEEE1588 you seek to put all the events in the system on a single time axis.



>If it's nanoseconds, then just a shift regarding system time is enough ? Does it also need a >multiplication? Can we query that from hardware? Or provide a piece of code to be used. As it, the feature >is useless...



Unfortunately we lack the APIs to query the offset and multiplication factor from the device/kernel. This is why the current feature is very basic.

It can be used only to understand the order of the events at the NIC (for example if one packet passed the NIC before the other), however one cannot understand the absolute diff between 2 events in a common clock granularity.





>

>It would be interesting to normalize the hardware timestamping. I guess for any driver, an offset and a multiplication(shift+multiplier eventually) would be enough, and the API should be updated to >provide a function to convert a hardware timestamp to software (or that should be part of the driver and done automatically if offloading is enabled?) and probably one to initialize the time, much >like the Linux one at https://elixir.bootlin.com/linux/v4.18.5/source/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c#L488<https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv4.18.5%2Fsource%2Fdrivers%2Fnet%2Fethernet%2Fmellanox%2Fmlx5%2Fcore%2Flib%2Fclock.c%23L488&data=02%7C01%7Cshahafs%40mellanox.com%7Ce1623f43e33c4c9e2a3308d612726753%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636716679902892854&sdata=Xus3oma6Rig2AAxrBCUNgSvWx1NSYSbiFO0ei%2Bz0R40%3D&reserved=0> .



Of course, when we will have the APIs to support it, the PMD will do the conversion into the real clock units, no need for the application to do it by itself.



It will be interesting to me to completely understand your use case in order to evaluate the need and priority for such API.



>

>Thanks,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] MLX5 should define the timestamp field in the doc
  2018-09-05  5:29 ` Shahaf Shuler
@ 2018-09-05  7:11   ` Tom Barbette
  2018-09-05  8:18     ` Shahaf Shuler
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Barbette @ 2018-09-05  7:11 UTC (permalink / raw)
  To: Shahaf Shuler, dev; +Cc: Yongseok Koh, john.mcnamara, marko.kovacevic

Thanks for your answer Shahaf !


We're trying to measure the latency of packets going through various service chains inside individual "server".  Eg. we can see that on Server 1, the latency for the service chain handling HTTP packets is ~800ns (+ max and mins, tail latency, etc). What we do now is to timestamp packets right after they are received, and compute the difference with the timestamp just before they are sent. Over a cluster this shows us where the latency is happening.


We would like this "box" latency to include the time spent in queues, and for that the hardware timestamp seems fit-for-purpose as it would timestamp the packets before the software queues. Moreover, as we use batching, we lose a lot of precision as we timestamp a whole batch at once.


I'm pretty sure this use case is of interest for many others. Tail latency is of the essence nowadays, and finding where packets get delayed precisely is important.

?

Instead of converting the timestamp to real time, in this very use case it seems the Mellanox card could actually be our unique source of time, we just need to be able to convert ticks to seconds.


Any chance we can run an equivalent of mlx5_read_internal_timer (https://elixir.bootlin.com/linux/v4.18.5/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L623) ?from userspace ? Are these registers also mapped, or can be done so with a few changes? With only that we can actually derive the frequency and the offset easily.?


Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] MLX5 should define the timestamp field in the doc
  2018-09-05  7:11   ` Tom Barbette
@ 2018-09-05  8:18     ` Shahaf Shuler
  2018-09-05  9:00       ` Tom Barbette
  0 siblings, 1 reply; 8+ messages in thread
From: Shahaf Shuler @ 2018-09-05  8:18 UTC (permalink / raw)
  To: Tom Barbette, dev, Alex Rosenbaum
  Cc: Yongseok Koh, john.mcnamara, marko.kovacevic

Thanks for the details.

The use case is clear. We will take it internally to see when we can support it.
AFAIK we cannot read the internal time from userspace.

Adding also AlexR to comment

From: Tom Barbette <barbette@kth.se>
Sent: Wednesday, September 5, 2018 10:11 AM
To: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
Cc: Yongseok Koh <yskoh@mellanox.com>; john.mcnamara@intel.com; marko.kovacevic@intel.com
Subject: RE: MLX5 should define the timestamp field in the doc


Thanks for your answer Shahaf !



We're trying to measure the latency of packets going through various service chains inside individual "server".  Eg. we can see that on Server 1, the latency for the service chain handling HTTP packets is ~800ns (+ max and mins, tail latency, etc). What we do now is to timestamp packets right after they are received, and compute the difference with the timestamp just before they are sent. Over a cluster this shows us where the latency is happening.



We would like this "box" latency to include the time spent in queues, and for that the hardware timestamp seems fit-for-purpose as it would timestamp the packets before the software queues. Moreover, as we use batching, we lose a lot of precision as we timestamp a whole batch at once.



I'm pretty sure this use case is of interest for many others. Tail latency is of the essence nowadays, and finding where packets get delayed precisely is important.

​

Instead of converting the timestamp to real time, in this very use case it seems the Mellanox card could actually be our unique source of time, we just need to be able to convert ticks to seconds.



Any chance we can run an equivalent of mlx5_read_internal_timer (https://elixir.bootlin.com/linux/v4.18.5/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L623) ​from userspace ? Are these registers also mapped, or can be done so with a few changes? With only that we can actually derive the frequency and the offset easily.​



Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] MLX5 should define the timestamp field in the doc
  2018-09-05  8:18     ` Shahaf Shuler
@ 2018-09-05  9:00       ` Tom Barbette
  2018-09-06  9:07         ` Shahaf Shuler
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Barbette @ 2018-09-05  9:00 UTC (permalink / raw)
  To: Shahaf Shuler, dev, Alex Rosenbaum
  Cc: Yongseok Koh, john.mcnamara, marko.kovacevic

Actually I managed this patch to implement support for rte_eth_timesync_read_time.


Please tell me potential modifications, and if I shall submit it again as a "normal" patch to dev ?


---
 drivers/net/mlx5/mlx5.c        |  1 +
 drivers/net/mlx5/mlx5.h        |  1 +
 drivers/net/mlx5/mlx5_ethdev.c | 30 ++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_glue.c   |  8 ++++++++
 drivers/net/mlx5/mlx5_glue.h   |  2 ++
 5 files changed, 42 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index c933e27..8c34794 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -324,6 +324,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
  .xstats_reset = mlx5_xstats_reset,
  .xstats_get_names = mlx5_xstats_get_names,
  .dev_infos_get = mlx5_dev_infos_get,
+ .timesync_read_time = mlx5_timesync_read_time,
  .dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
  .vlan_filter_set = mlx5_vlan_filter_set,
  .rx_queue_setup = mlx5_rx_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 997b04a..5747304 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -217,6 +217,7 @@ int mlx5_set_flags(struct rte_eth_dev *dev, unsigned int keep,
     unsigned int flags);
 int mlx5_dev_configure(struct rte_eth_dev *dev);
 void mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info);
+int mlx5_timesync_read_time(struct rte_eth_dev *dev, struct timespec* time);
 const uint32_t *mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev);
 int mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete);
 int mlx5_force_link_status_change(struct rte_eth_dev *dev, int status);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 90488af..b7f0d91 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -480,6 +480,36 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 }

 /**
+ * Get device current time
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @param[out] time
+ *   Time output value.
+ *
+ * @return
+ *   0 if the time has correctly been set
+ */
+int
+mlx5_timesync_read_time(struct rte_eth_dev *dev, struct timespec *time)
+{
+    struct priv *priv = dev->data->dev_private;
+    struct ibv_values_ex values;
+    int err = 0;
+
+    values.comp_mask = IBV_VALUES_MASK_RAW_CLOCK;
+    if ((err = mlx5_glue->query_rt_values_ex(priv->ctx, &values)) != 0) {
+ DRV_LOG(WARNING, "Could not query time !");
+        return err;
+    }
+
+    *time = values.raw_clock;
+    return 0;
+}
+
+
+/**
  * Get supported packet types.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index c7965e5..3c72f5b 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -84,6 +84,13 @@ mlx5_glue_query_device_ex(struct ibv_context *context,
 }

 static int
+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
+   struct ibv_values_ex* values)
+{
+ return ibv_query_rt_values_ex(context, values);
+}
+
+static int
 mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
       struct ibv_port_attr *port_attr)
 {
@@ -354,6 +361,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
  .close_device = mlx5_glue_close_device,
  .query_device = mlx5_glue_query_device,
  .query_device_ex = mlx5_glue_query_device_ex,
+ .query_rt_values_ex = mlx5_glue_query_rt_values_ex,
  .query_port = mlx5_glue_query_port,
  .create_comp_channel = mlx5_glue_create_comp_channel,
  .destroy_comp_channel = mlx5_glue_destroy_comp_channel,
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index e584d36..0582e95 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -54,6 +54,8 @@ struct mlx5_glue {
  int (*query_device_ex)(struct ibv_context *context,
         const struct ibv_query_device_ex_input *input,
         struct ibv_device_attr_ex *attr);
+ int (*query_rt_values_ex)(struct ibv_context *context,
+        struct ibv_values_ex *values);
  int (*query_port)(struct ibv_context *context, uint8_t port_num,
    struct ibv_port_attr *port_attr);
  struct ibv_comp_channel *(*create_comp_channel)
--
2.7.4





________________________________
De : Shahaf Shuler <shahafs@mellanox.com>
Envoyé : mercredi 5 septembre 2018 10:18
À : Tom Barbette; dev@dpdk.org; Alex Rosenbaum
Cc : Yongseok Koh; john.mcnamara@intel.com; marko.kovacevic@intel.com
Objet : RE: MLX5 should define the timestamp field in the doc

Thanks for the details.

The use case is clear. We will take it internally to see when we can support it.
AFAIK we cannot read the internal time from userspace.

Adding also AlexR to comment

From: Tom Barbette <barbette@kth.se>
Sent: Wednesday, September 5, 2018 10:11 AM
To: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
Cc: Yongseok Koh <yskoh@mellanox.com>; john.mcnamara@intel.com; marko.kovacevic@intel.com
Subject: RE: MLX5 should define the timestamp field in the doc


Thanks for your answer Shahaf !



We're trying to measure the latency of packets going through various service chains inside individual "server".  Eg. we can see that on Server 1, the latency for the service chain handling HTTP packets is ~800ns (+ max and mins, tail latency, etc). What we do now is to timestamp packets right after they are received, and compute the difference with the timestamp just before they are sent. Over a cluster this shows us where the latency is happening.



We would like this "box" latency to include the time spent in queues, and for that the hardware timestamp seems fit-for-purpose as it would timestamp the packets before the software queues. Moreover, as we use batching, we lose a lot of precision as we timestamp a whole batch at once.



I'm pretty sure this use case is of interest for many others. Tail latency is of the essence nowadays, and finding where packets get delayed precisely is important.

?

Instead of converting the timestamp to real time, in this very use case it seems the Mellanox card could actually be our unique source of time, we just need to be able to convert ticks to seconds.



Any chance we can run an equivalent of mlx5_read_internal_timer (https://elixir.bootlin.com/linux/v4.18.5/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L623) ?from userspace ? Are these registers also mapped, or can be done so with a few changes? With only that we can actually derive the frequency and the offset easily.?



Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] MLX5 should define the timestamp field in the doc
  2018-09-05  9:00       ` Tom Barbette
@ 2018-09-06  9:07         ` Shahaf Shuler
  2018-09-06  9:33           ` Tom Barbette
  0 siblings, 1 reply; 8+ messages in thread
From: Shahaf Shuler @ 2018-09-06  9:07 UTC (permalink / raw)
  To: Tom Barbette, dev, Alex Rosenbaum
  Cc: Yongseok Koh, john.mcnamara, marko.kovacevic

Wednesday, September 5, 2018 12:00 PM, Tom Barbette:
>Actually I managed this patch to implement support for rte_eth_timesync_read_time.

I am not fully familiar w/ this API, but it looks like the timespec returned from this call is expected to be in real time values (i.e. seconds and nano seconds),
at least this is what I see on the ptpclient example on the DPDK tree.

>
>Please tell me potential modifications, and if I shall submit it again as a "normal" patch to dev ?
>
>---

[...]

> }
>
> /**
>+ * Get device current time
>+ *
>+ * @param dev
>+ *   Pointer to Ethernet device structure.
>+ *
>+ * @param[out] time
>+ *   Time output value.
>+ *
>+ * @return
>+ *   0 if the time has correctly been set
>+ */
>+int
>+mlx5_timesync_read_time(struct rte_eth_dev *dev, struct timespec *time)
>+{
>+    struct priv *priv = dev->data->dev_private;
>+    struct ibv_values_ex values;
>+    int err = 0;
>+
>+    values.comp_mask = IBV_VALUES_MASK_RAW_CLOCK;
>+    if ((err = mlx5_glue->query_rt_values_ex(priv->ctx, &values)) != 0) {

The use of this function will not bring you the outcome the API defines.
see the man page of ibv_query_rt_values_ex:
struct ibv_values_ex {
        uint32_t             comp_mask;    /* Compatibility mask that defines the query/queried fields [in/out]

        struct timespec      raw_clock;    /* HW raw clock */
};

enum ibv_values_mask {
        IBV_VALUES_MASK_RAW_CLOCK = 1 << 0, /* HW raw clock */
};

The output is the HW raw clock (just like you have in the mbuf).

In order it to work the application needs to understand the PTP coefficients for the raw->real time conversion. this can be done, just need some more work.
do you have a ptp daemon implemented to calc the coefficients?

>+ DRV_LOG(WARNING, "Could not query time !");
>+        return err;
>+    }
>+
>+    *time = values.raw_clock;
>+    return 0;
>+}
>+
>+
>+/**
>  * Get supported packet types.
>  *
>  * @param dev
>diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
>index c7965e5..3c72f5b 100644
>--- a/drivers/net/mlx5/mlx5_glue.c
>+++ b/drivers/net/mlx5/mlx5_glue.c
>@@ -84,6 +84,13 @@ mlx5_glue_query_device_ex(struct ibv_context *context,
> }
>
> static int
>+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
>+   struct ibv_values_ex* values)
>+{
>+ return ibv_query_rt_values_ex(context, values);
>+}
>+
>+static int
> mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
>       struct ibv_port_attr *port_attr)
> {
>@@ -354,6 +361,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
>  .close_device = mlx5_glue_close_device,
>  .query_device = mlx5_glue_query_device,
>  .query_device_ex = mlx5_glue_query_device_ex,
>+ .query_rt_values_ex = mlx5_glue_query_rt_values_ex,
>  .query_port = mlx5_glue_query_port,
>  .create_comp_channel = mlx5_glue_create_comp_channel,
>  .destroy_comp_channel = mlx5_glue_destroy_comp_channel,
>diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
>index e584d36..0582e95 100644
>--- a/drivers/net/mlx5/mlx5_glue.h
>+++ b/drivers/net/mlx5/mlx5_glue.h
>@@ -54,6 +54,8 @@ struct mlx5_glue {
>  int (*query_device_ex)(struct ibv_context *context,
>         const struct ibv_query_device_ex_input *input,
>         struct ibv_device_attr_ex *attr);
>+ int (*query_rt_values_ex)(struct ibv_context *context,
>+        struct ibv_values_ex *values);
>  int (*query_port)(struct ibv_context *context, uint8_t port_num,
>    struct ibv_port_attr *port_attr);
>  struct ibv_comp_channel *(*create_comp_channel)
>--
>2.7.4
>
>
>________________________________________
>De : Shahaf Shuler <shahafs@mellanox.com>
>Envoyé : mercredi 5 septembre 2018 10:18
>À : Tom Barbette; dev@dpdk.org; Alex Rosenbaum
>Cc : Yongseok Koh; john.mcnamara@intel.com; marko.kovacevic@intel.com
>Objet : RE: MLX5 should define the timestamp field in the doc
>
>Thanks for the details.
>
>The use case is clear. We will take it internally to see when we can support it.
>AFAIK we cannot read the internal time from userspace.
>
>Adding also AlexR to comment
>
>From: Tom Barbette <barbette@kth.se>
>Sent: Wednesday, September 5, 2018 10:11 AM
>To: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
>Cc: Yongseok Koh <yskoh@mellanox.com>; john.mcnamara@intel.com; marko.kovacevic@intel.com
>Subject: RE: MLX5 should define the timestamp field in the doc
>
>Thanks for your answer Shahaf !
>
>We're trying to measure the latency of packets going through various service chains inside individual "server".  Eg. we can see that on Server 1, the latency for the service chain handling HTTP packets is ~800ns (+ max and mins, tail latency, etc). What we do now is to timestamp packets right after they are received, and compute the difference with the timestamp just before they are sent. Over a cluster this shows us where the latency is happening.
>
>We would like this "box" latency to include the time spent in queues, and for that the hardware timestamp seems fit-for-purpose as it would timestamp the packets before the software queues. Moreover, as we use batching, we lose a lot of precision as we timestamp a whole batch at once.
>
>I'm pretty sure this use case is of interest for many others. Tail latency is of the essence nowadays, and finding where packets get delayed precisely is important.
>
>Instead of converting the timestamp to real time, in this very use case it seems the Mellanox card could actually be our unique source of time, we just need to be able to convert ticks to seconds.
>
>Any chance we can run an equivalent of mlx5_read_internal_timer (https://elixir.bootlin.com/linux/v4.18.5/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L623) from userspace ? Are these registers also mapped, or can be done so with a few changes? With only that we can actually derive the frequency and the offset easily.
>
>Tom



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] MLX5 should define the timestamp field in the doc
  2018-09-06  9:07         ` Shahaf Shuler
@ 2018-09-06  9:33           ` Tom Barbette
  2018-09-06 14:21             ` Shahaf Shuler
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Barbette @ 2018-09-06  9:33 UTC (permalink / raw)
  To: Shahaf Shuler, dev, Alex Rosenbaum
  Cc: Yongseok Koh, john.mcnamara, marko.kovacevic, thomas

​It's true that it is a little bit a distortion of the original purpose. Here I want to query the time from the device (ie, the device's current clock). Maybe a new function in the API would be more suited?  CCing Thomas Mojalon for that part of the discussion.


I guess there is a case to query the device's timestamp to make our own precise time computations.


I also just saw that patch from two years ago that did not made it to the main branch : http://mails.dpdk.org/archives/dev/2016-October/048810.html​ , I guess it's because it is approximative in the time computation instead of a real synchronization? But now timestamp is in the rte_mbuf, so it could also technically go in.​


Tom


________________________________
De : Shahaf Shuler <shahafs@mellanox.com>
Envoyé : jeudi 6 septembre 2018 11:07
À : Tom Barbette; dev@dpdk.org; Alex Rosenbaum
Cc : Yongseok Koh; john.mcnamara@intel.com; marko.kovacevic@intel.com
Objet : RE: MLX5 should define the timestamp field in the doc

Wednesday, September 5, 2018 12:00 PM, Tom Barbette:
>Actually I managed this patch to implement support for rte_eth_timesync_read_time.

I am not fully familiar w/ this API, but it looks like the timespec returned from this call is expected to be in real time values (i.e. seconds and nano seconds),
at least this is what I see on the ptpclient example on the DPDK tree.

>
>Please tell me potential modifications, and if I shall submit it again as a "normal" patch to dev ?
>
>---

[...]

> }
>
> /**
>+ * Get device current time
>+ *
>+ * @param dev
>+ *   Pointer to Ethernet device structure.
>+ *
>+ * @param[out] time
>+ *   Time output value.
>+ *
>+ * @return
>+ *   0 if the time has correctly been set
>+ */
>+int
>+mlx5_timesync_read_time(struct rte_eth_dev *dev, struct timespec *time)
>+{
>+    struct priv *priv = dev->data->dev_private;
>+    struct ibv_values_ex values;
>+    int err = 0;
>+
>+    values.comp_mask = IBV_VALUES_MASK_RAW_CLOCK;
>+    if ((err = mlx5_glue->query_rt_values_ex(priv->ctx, &values)) != 0) {

The use of this function will not bring you the outcome the API defines.
see the man page of ibv_query_rt_values_ex:
struct ibv_values_ex {
        uint32_t             comp_mask;    /* Compatibility mask that defines the query/queried fields [in/out]

        struct timespec      raw_clock;    /* HW raw clock */
};

enum ibv_values_mask {
        IBV_VALUES_MASK_RAW_CLOCK = 1 << 0, /* HW raw clock */
};

The output is the HW raw clock (just like you have in the mbuf).

In order it to work the application needs to understand the PTP coefficients for the raw->real time conversion. this can be done, just need some more work.
do you have a ptp daemon implemented to calc the coefficients?

>+ DRV_LOG(WARNING, "Could not query time !");
>+        return err;
>+    }
>+
>+    *time = values.raw_clock;
>+    return 0;
>+}
>+
>+
>+/**
>  * Get supported packet types.
>  *
>  * @param dev
>diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
>index c7965e5..3c72f5b 100644
>--- a/drivers/net/mlx5/mlx5_glue.c
>+++ b/drivers/net/mlx5/mlx5_glue.c
>@@ -84,6 +84,13 @@ mlx5_glue_query_device_ex(struct ibv_context *context,
> }
>
> static int
>+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
>+   struct ibv_values_ex* values)
>+{
>+ return ibv_query_rt_values_ex(context, values);
>+}
>+
>+static int
> mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
>       struct ibv_port_attr *port_attr)
> {
>@@ -354,6 +361,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
>  .close_device = mlx5_glue_close_device,
>  .query_device = mlx5_glue_query_device,
>  .query_device_ex = mlx5_glue_query_device_ex,
>+ .query_rt_values_ex = mlx5_glue_query_rt_values_ex,
>  .query_port = mlx5_glue_query_port,
>  .create_comp_channel = mlx5_glue_create_comp_channel,
>  .destroy_comp_channel = mlx5_glue_destroy_comp_channel,
>diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
>index e584d36..0582e95 100644
>--- a/drivers/net/mlx5/mlx5_glue.h
>+++ b/drivers/net/mlx5/mlx5_glue.h
>@@ -54,6 +54,8 @@ struct mlx5_glue {
>  int (*query_device_ex)(struct ibv_context *context,
>         const struct ibv_query_device_ex_input *input,
>         struct ibv_device_attr_ex *attr);
>+ int (*query_rt_values_ex)(struct ibv_context *context,
>+        struct ibv_values_ex *values);
>  int (*query_port)(struct ibv_context *context, uint8_t port_num,
>    struct ibv_port_attr *port_attr);
>  struct ibv_comp_channel *(*create_comp_channel)
>--
>2.7.4
>
>
>________________________________________
>De : Shahaf Shuler <shahafs@mellanox.com>
>Envoyé : mercredi 5 septembre 2018 10:18
>À : Tom Barbette; dev@dpdk.org; Alex Rosenbaum
>Cc : Yongseok Koh; john.mcnamara@intel.com; marko.kovacevic@intel.com
>Objet : RE: MLX5 should define the timestamp field in the doc
>
>Thanks for the details.
>
>The use case is clear. We will take it internally to see when we can support it.
>AFAIK we cannot read the internal time from userspace.
>
>Adding also AlexR to comment
>
>From: Tom Barbette <barbette@kth.se>
>Sent: Wednesday, September 5, 2018 10:11 AM
>To: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
>Cc: Yongseok Koh <yskoh@mellanox.com>; john.mcnamara@intel.com; marko.kovacevic@intel.com
>Subject: RE: MLX5 should define the timestamp field in the doc
>
>Thanks for your answer Shahaf !
>
>We're trying to measure the latency of packets going through various service chains inside individual "server".  Eg. we can see that on Server 1, the latency for the service chain handling HTTP packets is ~800ns (+ max and mins, tail latency, etc). What we do now is to timestamp packets right after they are received, and compute the difference with the timestamp just before they are sent. Over a cluster this shows us where the latency is happening.
>
>We would like this "box" latency to include the time spent in queues, and for that the hardware timestamp seems fit-for-purpose as it would timestamp the packets before the software queues. Moreover, as we use batching, we lose a lot of precision as we timestamp a whole batch at once.
>
>I'm pretty sure this use case is of interest for many others. Tail latency is of the essence nowadays, and finding where packets get delayed precisely is important.
>
>Instead of converting the timestamp to real time, in this very use case it seems the Mellanox card could actually be our unique source of time, we just need to be able to convert ticks to seconds.
>
>Any chance we can run an equivalent of mlx5_read_internal_timer (https://elixir.bootlin.com/linux/v4.18.5/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L623) from userspace ? Are these registers also mapped, or can be done so with a few changes? With only that we can actually derive the frequency and the offset easily.
>
>Tom



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] MLX5 should define the timestamp field in the doc
  2018-09-06  9:33           ` Tom Barbette
@ 2018-09-06 14:21             ` Shahaf Shuler
  0 siblings, 0 replies; 8+ messages in thread
From: Shahaf Shuler @ 2018-09-06 14:21 UTC (permalink / raw)
  To: Tom Barbette, dev, Alex Rosenbaum
  Cc: Yongseok Koh, john.mcnamara, marko.kovacevic, Thomas Monjalon

Thursday, September 6, 2018 12:33 PM, Tom Barbette:
Subject: RE: MLX5 should define the timestamp field in the doc
>
>It's true that it is a little bit a distortion of the original purpose. Here I want to query the time from the device (ie, the device's current clock). Maybe a new function in the API would be more suited?  CCing Thomas Mojalon for that part of the discussion.

yes, we cannot use the current API for that.

>
>I guess there is a case to query the device's timestamp to make our own precise time computations.

yes this is a valid use case. it will enable to implement ptp daemon for the clocks sync on top of DPDK.

>
>I also just saw that patch from two years ago that did not made it to the main branch : http://mails.dpdk.org/archives/dev/2016-October/048810.html , I guess it's because it is approximative in the time computation instead of a real synchronization? But now timestamp is in the rte_mbuf, so it could also technically go in.
>

i need to refresh my memory about this one (too long ago).

anyway, for your case there is a way to go, just need an app to sync the clocks.
i can help w/ the reviews and guidance on mlx5/ethdev if you wish to push such support upstream.

>Tom

From: Tom Barbette <barbette@kth.se>
Sent: Thursday, September 6, 2018 12:33 PM
To: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org; Alex Rosenbaum <Alexr@mellanox.com>
Cc: Yongseok Koh <yskoh@mellanox.com>; john.mcnamara@intel.com; marko.kovacevic@intel.com; Thomas Monjalon <thomas@monjalon.net>
Subject: RE: MLX5 should define the timestamp field in the doc


​It's true that it is a little bit a distortion of the original purpose. Here I want to query the time from the device (ie, the device's current clock). Maybe a new function in the API would be more suited?  CCing Thomas Mojalon for that part of the discussion.



I guess there is a case to query the device's timestamp to make our own precise time computations.



I also just saw that patch from two years ago that did not made it to the main branch : http://mails.dpdk.org/archives/dev/2016-October/048810.html<https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2016-October%2F048810.html&data=02%7C01%7Cshahafs%40mellanox.com%7Cc69ee4cf172f44d0918508d613dbd051%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636718232173908110&sdata=X3INk2WYSUHKPKRomoJMxpMQA2LZxZCqde4KQ%2BrKmho%3D&reserved=0>​ , I guess it's because it is approximative in the time computation instead of a real synchronization? But now timestamp is in the rte_mbuf, so it could also technically go in.​



Tom



________________________________
De : Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>>
Envoyé : jeudi 6 septembre 2018 11:07
À : Tom Barbette; dev@dpdk.org<mailto:dev@dpdk.org>; Alex Rosenbaum
Cc : Yongseok Koh; john.mcnamara@intel.com<mailto:john.mcnamara@intel.com>; marko.kovacevic@intel.com<mailto:marko.kovacevic@intel.com>
Objet : RE: MLX5 should define the timestamp field in the doc

Wednesday, September 5, 2018 12:00 PM, Tom Barbette:
>Actually I managed this patch to implement support for rte_eth_timesync_read_time.

I am not fully familiar w/ this API, but it looks like the timespec returned from this call is expected to be in real time values (i.e. seconds and nano seconds),
at least this is what I see on the ptpclient example on the DPDK tree.

>
>Please tell me potential modifications, and if I shall submit it again as a "normal" patch to dev ?
>
>---

[...]

> }
>
> /**
>+ * Get device current time
>+ *
>+ * @param dev
>+ *   Pointer to Ethernet device structure.
>+ *
>+ * @param[out] time
>+ *   Time output value.
>+ *
>+ * @return
>+ *   0 if the time has correctly been set
>+ */
>+int
>+mlx5_timesync_read_time(struct rte_eth_dev *dev, struct timespec *time)
>+{
>+    struct priv *priv = dev->data->dev_private;
>+    struct ibv_values_ex values;
>+    int err = 0;
>+
>+    values.comp_mask = IBV_VALUES_MASK_RAW_CLOCK;
>+    if ((err = mlx5_glue->query_rt_values_ex(priv->ctx, &values)) != 0) {

The use of this function will not bring you the outcome the API defines.
see the man page of ibv_query_rt_values_ex:
struct ibv_values_ex {
        uint32_t             comp_mask;    /* Compatibility mask that defines the query/queried fields [in/out]

        struct timespec      raw_clock;    /* HW raw clock */
};

enum ibv_values_mask {
        IBV_VALUES_MASK_RAW_CLOCK = 1 << 0, /* HW raw clock */
};

The output is the HW raw clock (just like you have in the mbuf).

In order it to work the application needs to understand the PTP coefficients for the raw->real time conversion. this can be done, just need some more work.
do you have a ptp daemon implemented to calc the coefficients?

>+ DRV_LOG(WARNING, "Could not query time !");
>+        return err;
>+    }
>+
>+    *time = values.raw_clock;
>+    return 0;
>+}
>+
>+
>+/**
>  * Get supported packet types.
>  *
>  * @param dev
>diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
>index c7965e5..3c72f5b 100644
>--- a/drivers/net/mlx5/mlx5_glue.c
>+++ b/drivers/net/mlx5/mlx5_glue.c
>@@ -84,6 +84,13 @@ mlx5_glue_query_device_ex(struct ibv_context *context,
> }
>
> static int
>+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
>+   struct ibv_values_ex* values)
>+{
>+ return ibv_query_rt_values_ex(context, values);
>+}
>+
>+static int
> mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
>       struct ibv_port_attr *port_attr)
> {
>@@ -354,6 +361,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
>  .close_device = mlx5_glue_close_device,
>  .query_device = mlx5_glue_query_device,
>  .query_device_ex = mlx5_glue_query_device_ex,
>+ .query_rt_values_ex = mlx5_glue_query_rt_values_ex,
>  .query_port = mlx5_glue_query_port,
>  .create_comp_channel = mlx5_glue_create_comp_channel,
>  .destroy_comp_channel = mlx5_glue_destroy_comp_channel,
>diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
>index e584d36..0582e95 100644
>--- a/drivers/net/mlx5/mlx5_glue.h
>+++ b/drivers/net/mlx5/mlx5_glue.h
>@@ -54,6 +54,8 @@ struct mlx5_glue {
>  int (*query_device_ex)(struct ibv_context *context,
>         const struct ibv_query_device_ex_input *input,
>         struct ibv_device_attr_ex *attr);
>+ int (*query_rt_values_ex)(struct ibv_context *context,
>+        struct ibv_values_ex *values);
>  int (*query_port)(struct ibv_context *context, uint8_t port_num,
>    struct ibv_port_attr *port_attr);
>  struct ibv_comp_channel *(*create_comp_channel)
>--
>2.7.4
>
>
>________________________________________
>De : Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>>
>Envoyé : mercredi 5 septembre 2018 10:18
>À : Tom Barbette; dev@dpdk.org<mailto:dev@dpdk.org>; Alex Rosenbaum
>Cc : Yongseok Koh; john.mcnamara@intel.com<mailto:john.mcnamara@intel.com>; marko.kovacevic@intel.com<mailto:marko.kovacevic@intel.com>
>Objet : RE: MLX5 should define the timestamp field in the doc
>
>Thanks for the details.
>
>The use case is clear. We will take it internally to see when we can support it.
>AFAIK we cannot read the internal time from userspace.
>
>Adding also AlexR to comment
>
>From: Tom Barbette <barbette@kth.se<mailto:barbette@kth.se>>
>Sent: Wednesday, September 5, 2018 10:11 AM
>To: Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>>; dev@dpdk.org<mailto:dev@dpdk.org>
>Cc: Yongseok Koh <yskoh@mellanox.com<mailto:yskoh@mellanox.com>>; john.mcnamara@intel.com<mailto:john.mcnamara@intel.com>; marko.kovacevic@intel.com<mailto:marko.kovacevic@intel.com>
>Subject: RE: MLX5 should define the timestamp field in the doc
>
>Thanks for your answer Shahaf !
>
>We're trying to measure the latency of packets going through various service chains inside individual "server".  Eg. we can see that on Server 1, the latency for the service chain handling HTTP packets is ~800ns (+ max and mins, tail latency, etc). What we do now is to timestamp packets right after they are received, and compute the difference with the timestamp just before they are sent. Over a cluster this shows us where the latency is happening.
>
>We would like this "box" latency to include the time spent in queues, and for that the hardware timestamp seems fit-for-purpose as it would timestamp the packets before the software queues. Moreover, as we use batching, we lose a lot of precision as we timestamp a whole batch at once.
>
>I'm pretty sure this use case is of interest for many others. Tail latency is of the essence nowadays, and finding where packets get delayed precisely is important.
>
>Instead of converting the timestamp to real time, in this very use case it seems the Mellanox card could actually be our unique source of time, we just need to be able to convert ticks to seconds.
>
>Any chance we can run an equivalent of mlx5_read_internal_timer (https://elixir.bootlin.com/linux/v4.18.5/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L623) from userspace ? Are these registers also mapped, or can be done so with a few changes? With only that we can actually derive the frequency and the offset easily.
>
>Tom



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-09-06 14:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-04 14:26 [dpdk-dev] MLX5 should define the timestamp field in the doc Tom Barbette
2018-09-05  5:29 ` Shahaf Shuler
2018-09-05  7:11   ` Tom Barbette
2018-09-05  8:18     ` Shahaf Shuler
2018-09-05  9:00       ` Tom Barbette
2018-09-06  9:07         ` Shahaf Shuler
2018-09-06  9:33           ` Tom Barbette
2018-09-06 14:21             ` Shahaf Shuler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).