DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support
@ 2016-10-13 14:35 Oleg Kuporosov
  2016-10-13 14:35 ` [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet Oleg Kuporosov
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Oleg Kuporosov @ 2016-10-13 14:35 UTC (permalink / raw)
  To: dev


Hello DPDK Developers,

Financial Services Industry which is pretty eager for several DPDK
features especially low latency while high throughput. The major issue
so far for increasing DPDK adoption there is requirement for several
applications (capturers, some trading algorithms) to support of accurate
timestamping. The requirement mostly came from regulatory and customers
need strictly follow it.

Current state of timestamping support in DPDK looks pretty good:
 - there is API to enable/disable timestamping acquisition by 
   rte_eth_timesync_enable/disable
 - get timestamps itself by timesync_read_rx/tx_timestamp
 - and even implementation of NTP IEEE 1588 for time synchronization
   by rte_eth_timesync_adjust_read/write_time APIs.
   
But it misses the most important feature there – embedding timestamp
in rte_mbuf aligning it with packet.

We would like to change this to increase DPDK adoption for several new
DPDK-based applications for FSI segment. It also might be very
applicable for several RT media and debugging purposes of network
flows/streams in other segments like HPC.

There are several thoughts how to improve there:
 - include uint64_t timestamp field into rte_mbuf with minimal impact
   to throughput/latency. Keep it just simple uint64_t in ns (more than
   580 years) would be enough for immediate needs while using full
   struct timespec with twice bigger size would have much stronger
   performance impact as missed cacheline0. It is possible as there is
   6-bytes gap in 1st cacheline (fast path) and moving uint16_t
   vlan_tci_outer field to 2nd cacheline. 
 - such move will only impact for pretty rare usable VLAN RX stripping
   mode for outer TCI (it used only for one NIC i40e from the whole
   set and keep minimal performance impact for timestamps.  
 - PMD can fill the field in their callback completion routines
   depending on enabling this feature in runtime.

We evaluated other possible options but looks it will have even worse
performance impact.


Oleg Kuporosov (3):
  mbuf: embedding timestamp into the packet
  app/testpmd: enabled control for packet timestamps
  net/mlx5: implementation of Rx packet timestamping support

 app/test-pmd/cmdline.c                      | 122 +++++++++++++++
 app/test-pmd/parameters.c                   |   4 +
 app/test-pmd/testpmd.c                      |   5 +
 app/test-pmd/testpmd.h                      |   1 +
 doc/guides/testpmd_app_ug/run_app.rst       |   4 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 +
 drivers/net/mlx5/mlx5.c                     |   7 +-
 drivers/net/mlx5/mlx5.h                     |  10 +-
 drivers/net/mlx5/mlx5_defs.h                |   4 +
 drivers/net/mlx5/mlx5_ethdev.c              | 222 +++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_rxq.c                 |   2 +
 drivers/net/mlx5/mlx5_rxtx.c                |  19 ++-
 drivers/net/mlx5/mlx5_rxtx.h                |   7 +-
 drivers/net/mlx5/mlx5_time.h                |  53 +++++++
 drivers/net/mlx5/mlx5_trigger.c             |   1 +
 lib/librte_eal/common/include/rte_time.h    |  45 ++++++
 lib/librte_mbuf/rte_mbuf.h                  |   6 +-
 17 files changed, 507 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_time.h

Thanks,
Oleg.
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  2016-10-13 14:35 [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support Oleg Kuporosov
@ 2016-10-13 14:35 ` Oleg Kuporosov
  2016-10-18 15:43   ` Olivier Matz
  2017-01-24 15:27   ` Olivier MATZ
  2016-10-13 14:35 ` [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps Oleg Kuporosov
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 20+ messages in thread
From: Oleg Kuporosov @ 2016-10-13 14:35 UTC (permalink / raw)
  To: dev

The hard requirement of financial services industry is accurate
timestamping aligned with the packet itself. This patch is to satisfy
this requirement:

- include uint64_t timestamp field into rte_mbuf with minimal impact to
  throughput/latency. Keep it just simple uint64_t in ns (more than 580
  years) would be enough for immediate needs while using full
  struct timespec with twice bigger size would have much stronger
  performance impact as missed cacheline0.

- it is possible as there is 6-bytes gap in 1st cacheline (fast path)
  and moving uint16_t vlan_tci_outer field to 2nd cacheline.

- such move will only impact for pretty rare usable VLAN RX stripping
  mode for outer TCI (it used only for one NIC i40e from the whole set and
  allows to keep minimal performance impact for RX/TX timestamps.

Signed-off-by: Oleg Kuporosov <olegk@mellanox.com>
---
 lib/librte_mbuf/rte_mbuf.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 23b7bf8..1e1f2ed 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -851,8 +851,7 @@ struct rte_mbuf {
 
 	uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */
 
-	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
-	uint16_t vlan_tci_outer;
+	uint64_t timestamp;       /**< Packet's timestamp, usually in ns */
 
 	/* second cache line - fields only used in slow path or on TX */
 	MARKER cacheline1 __rte_cache_min_aligned;
@@ -885,6 +884,9 @@ struct rte_mbuf {
 		};
 	};
 
+	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
+	uint16_t vlan_tci_outer;
+
 	/** Size of the application private data. In case of an indirect
 	 * mbuf, it stores the direct mbuf private data size. */
 	uint16_t priv_size;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps
  2016-10-13 14:35 [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support Oleg Kuporosov
  2016-10-13 14:35 ` [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet Oleg Kuporosov
@ 2016-10-13 14:35 ` Oleg Kuporosov
  2017-04-25 14:02   ` Wu, Jingjing
  2016-10-13 14:35 ` [dpdk-dev] [PATCH 3/3] net/mlx5: implementation of Rx packet timestamping support Oleg Kuporosov
  2016-10-17 11:24 ` [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support Nélio Laranjeiro
  3 siblings, 1 reply; 20+ messages in thread
From: Oleg Kuporosov @ 2016-10-13 14:35 UTC (permalink / raw)
  To: dev

Implemented two methods of control

- by --enable-timestamps CL testpmd application we can enable timestamping
  for all ports;
- in interactive mode port config <port> timestamps on|off is able to
  configure timestamping per specific port.

The control doesn't interact with IEEE1588 PTP implementation there as
it is under macro compilation but can be extended in the future.

This feature is required for debugging/testing purposes for real time HW
packet timestamping.

Signed-off-by: Oleg Kuporosov <olegk@mellanox.com>
---
 app/test-pmd/cmdline.c                      | 122 ++++++++++++++++++++++++++++
 app/test-pmd/parameters.c                   |   4 +
 app/test-pmd/testpmd.c                      |   5 ++
 app/test-pmd/testpmd.h                      |   1 +
 doc/guides/testpmd_app_ug/run_app.rst       |   4 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 ++
 6 files changed, 143 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 17d238f..9b202ce 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -561,6 +561,9 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Set crc-strip/scatter/rx-checksum/hardware-vlan/drop_en"
 			" for ports.\n\n"
 
+			"port config (port_id|all) timestamps (on|off)\n"
+			"    Enable/disable packet timestamps.\n\n"
+
 			"port config all rss (all|ip|tcp|udp|sctp|ether|port|vxlan|geneve|nvgre|none)\n"
 			"    Set the RSS mode.\n\n"
 
@@ -10188,6 +10191,123 @@ cmdline_parse_inst_t cmd_config_l2_tunnel_en_dis_specific = {
 	},
 };
 
+/* Configure port timestamping */
+struct cmd_config_timestamps_result {
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t config;
+	cmdline_fixed_string_t all;
+	uint8_t id;
+	cmdline_fixed_string_t timestamps;
+	cmdline_fixed_string_t mode;
+};
+
+cmdline_parse_token_string_t cmd_config_timestamps_port =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_timestamps_result,
+		 port, "port");
+cmdline_parse_token_string_t cmd_config_timestamps_config =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_timestamps_result,
+		 config, "config");
+cmdline_parse_token_string_t cmd_config_timestamps_all_str =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_timestamps_result,
+		 all, "all");
+cmdline_parse_token_num_t cmd_config_timestamps_id =
+	TOKEN_NUM_INITIALIZER
+		(struct cmd_config_timestamps_result,
+		 id, UINT8);
+cmdline_parse_token_string_t cmd_config_timestamps_ts_str =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_timestamps_result,
+		timestamps, "timestamps");
+cmdline_parse_token_string_t cmd_config_timestamps_path =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_timestamps_result,
+		mode, "on#off");
+
+/* enable/disable timestamps (on/off) for a port */
+static void
+cmd_config_timestamps_all_parsed(
+	void *parsed_result,
+	__attribute__((unused)) struct cmdline *cl,
+	__attribute__((unused)) void *data)
+{
+	struct cmd_config_timestamps_result *res = parsed_result;
+	portid_t pid;
+	uint8_t mode = 0;
+
+	if (!strcmp("on", res->mode))
+		mode = 1;
+	else if (!strcmp("off", res->mode))
+		mode = 0;
+	else {
+		printf("Unknown timestamps mode parameter\n");
+		return;
+	}
+	FOREACH_PORT(pid, ports) {
+		if (mode)
+			rte_eth_timesync_enable(pid);
+		else
+			rte_eth_timesync_disable(pid);
+	}
+}
+
+cmdline_parse_inst_t cmd_config_timestamps_all = {
+	.f = cmd_config_timestamps_all_parsed,
+	.data = NULL,
+	.help_str = "port config all timestamps on|off",
+	.tokens = {
+		(void *)&cmd_config_timestamps_port,
+		(void *)&cmd_config_timestamps_config,
+		(void *)&cmd_config_timestamps_all_str,
+		(void *)&cmd_config_timestamps_ts_str,
+		(void *)&cmd_config_timestamps_path,
+		NULL,
+	},
+};
+
+/* enable/disable timestamps (rx/tx/both) for a port */
+static void
+cmd_config_timestamps_specific_parsed(
+	void *parsed_result,
+	__attribute__((unused)) struct cmdline *cl,
+	__attribute__((unused)) void *data)
+{
+	struct cmd_config_timestamps_result *res =
+		parsed_result;
+	uint8_t mode = 0;
+
+	if (port_id_is_invalid(res->id, ENABLED_WARN))
+		return;
+	if (!strcmp("on", res->mode))
+		mode = 1;
+	else if (!strcmp("off", res->mode))
+		mode = 0;
+	else {
+		printf("Unknown timestamps mode parameter\n");
+		return;
+	}
+	if (mode)
+		rte_eth_timesync_enable(res->id);
+	else
+		rte_eth_timesync_disable(res->id);
+}
+
+cmdline_parse_inst_t cmd_config_timestamps_specific = {
+	.f = cmd_config_timestamps_specific_parsed,
+	.data = NULL,
+	.help_str = "port config <port> timestamps on|off",
+	.tokens = {
+		(void *)&cmd_config_timestamps_port,
+		(void *)&cmd_config_timestamps_config,
+		(void *)&cmd_config_timestamps_id,
+		(void *)&cmd_config_timestamps_ts_str,
+		(void *)&cmd_config_timestamps_path,
+		NULL,
+	},
+};
+
 /* E-tag configuration */
 
 /* Common result structure for all E-tag configuration */
@@ -10739,6 +10859,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_config_e_tag_forwarding_en_dis,
 	(cmdline_parse_inst_t *)&cmd_config_e_tag_filter_add,
 	(cmdline_parse_inst_t *)&cmd_config_e_tag_filter_del,
+	(cmdline_parse_inst_t *)&cmd_config_timestamps_all,
+	(cmdline_parse_inst_t *)&cmd_config_timestamps_specific,
 	NULL,
 };
 
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 6a6a07e..f15cd5a 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -150,6 +150,7 @@ usage(char* progname)
 	       "By default drop-queue=127.\n");
 	printf("  --crc-strip: enable CRC stripping by hardware.\n");
 	printf("  --enable-rx-cksum: enable rx hardware checksum offload.\n");
+	printf("  --enable-timestamps: enable rx hardware timestamp.\n");
 	printf("  --disable-hw-vlan: disable hardware vlan.\n");
 	printf("  --disable-hw-vlan-filter: disable hardware vlan filter.\n");
 	printf("  --disable-hw-vlan-strip: disable hardware vlan strip.\n");
@@ -525,6 +526,7 @@ launch_args_parse(int argc, char** argv)
 		{ "pkt-filter-drop-queue",      1, 0, 0 },
 		{ "crc-strip",                  0, 0, 0 },
 		{ "enable-rx-cksum",            0, 0, 0 },
+		{ "enable-timestamps",          0, 0, 0 },
 		{ "enable-scatter",             0, 0, 0 },
 		{ "disable-hw-vlan",            0, 0, 0 },
 		{ "disable-hw-vlan-filter",     0, 0, 0 },
@@ -768,6 +770,8 @@ launch_args_parse(int argc, char** argv)
 				rx_mode.enable_scatter = 1;
 			if (!strcmp(lgopts[opt_idx].name, "enable-rx-cksum"))
 				rx_mode.hw_ip_checksum = 1;
+			if (!strcmp(lgopts[opt_idx].name, "enable-timestamps"))
+				timestamps_enabled = 1;
 
 			if (!strcmp(lgopts[opt_idx].name, "disable-hw-vlan")) {
 				rx_mode.hw_vlan_filter = 0;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index e2403c3..ee76282 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -189,6 +189,8 @@ uint8_t dcb_config = 0;
 /* Whether the dcb is in testing status */
 uint8_t dcb_test = 0;
 
+/**< Enabling runtime packet timestamps by CL: --enable-timestamps */
+uint8_t timestamps_enabled = 0;
 /*
  * Configurable number of RX/TX queues.
  */
@@ -1851,6 +1853,9 @@ init_port_config(void)
 
 		rte_eth_macaddr_get(pid, &port->eth_addr);
 
+		if (timestamps_enabled)
+			rte_eth_timesync_enable(pid);
+
 		map_port_queue_stats_mapping_registers(pid, port);
 #ifdef RTE_NIC_BYPASS
 		rte_eth_dev_bypass_init(pid);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 2b281cc..489661e 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -353,6 +353,7 @@ extern int32_t txq_flags;
 
 extern uint8_t dcb_config;
 extern uint8_t dcb_test;
+extern uint8_t timestamps_enabled; /**< Timestamps by --enable-timestamps */
 extern enum dcb_queue_mapping_mode dcb_q_mapping;
 
 extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 7712bd2..fb56846 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -313,6 +313,10 @@ The commandline options are:
 
     Enable per-queue packet drop for packets with no descriptors.
 
+*   ``--enable-timestamps``
+
+    Enable timesync and per packet timestamping for all ports.
+
 *   ``--disable-rss``
 
     Disable RSS (Receive Side Scaling).
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index f87e0c2..ed84c6d 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1312,6 +1312,13 @@ Set the DCB mode for an individual port::
 
 The traffic class should be 4 or 8.
 
+port config - Timesync/timestamping
+~~~~~~~~~~~~~~~~~~~
+
+Enable/disable time synhronzation/packet timestamping for specific port or all ports::
+
+   testpmd> port config (port_id|all) timestamps (on|off)
+
 port config - Burst
 ~~~~~~~~~~~~~~~~~~~
 
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 3/3] net/mlx5: implementation of Rx packet timestamping support
  2016-10-13 14:35 [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support Oleg Kuporosov
  2016-10-13 14:35 ` [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet Oleg Kuporosov
  2016-10-13 14:35 ` [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps Oleg Kuporosov
@ 2016-10-13 14:35 ` Oleg Kuporosov
  2016-10-17 11:24 ` [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support Nélio Laranjeiro
  3 siblings, 0 replies; 20+ messages in thread
From: Oleg Kuporosov @ 2016-10-13 14:35 UTC (permalink / raw)
  To: dev

Accurate RX timestaping is minimal requirement for several important
financial services industry requirement for several applications like
packet capture.

Support of periodic time synchronization with system time and adjusting
clock deviation was also added. We assume the system is synchronized by
PTP itself, so there is no links to PTP client implementation in DPDK.
Time synchronization is run by control thread by rte_alarm for each 5
seconds.  New synchronization object is copied to each configured RXQ
for TS calculation.

RX timestamp is calculated in nanoseconds and stored in rte_mbuf.
RX timestamp doesn't require additional API as simply can be
read from timestamp field of rte_mbuf.

TX timestamps were not added due to several reasons - there is no API yet,
issue with burst mode as TS will be provided only for the latest packet in
the burst and rte_mbuf will be discarded immediately after completion.

Enabling/disabling time synchronization DPDK API was added.

Signed-off-by: Oleg Kuporosov <olegk@mellanox.com>
---
 drivers/net/mlx5/mlx5.c                  |   7 +-
 drivers/net/mlx5/mlx5.h                  |  10 +-
 drivers/net/mlx5/mlx5_defs.h             |   4 +
 drivers/net/mlx5/mlx5_ethdev.c           | 222 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_rxq.c              |   2 +
 drivers/net/mlx5/mlx5_rxtx.c             |  19 ++-
 drivers/net/mlx5/mlx5_rxtx.h             |   7 +-
 drivers/net/mlx5/mlx5_time.h             |  53 ++++++++
 drivers/net/mlx5/mlx5_trigger.c          |   1 +
 lib/librte_eal/common/include/rte_time.h |  45 +++++++
 10 files changed, 360 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_time.h

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 83bdf65..64dab28 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2015 6WIND S.A.
- *   Copyright 2015 Mellanox.
+ *   Copyright 2015-2016 6WIND S.A.
+ *   Copyright 2015-2016 Mellanox.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -122,6 +122,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
 	/* In case mlx5_dev_stop() has not been called. */
+	mlx5_timesync_disable(dev);
 	priv_dev_interrupt_handler_uninstall(priv, dev);
 	priv_special_flow_disable_all(priv);
 	priv_mac_addrs_disable(priv);
@@ -219,6 +220,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
 	.rss_hash_update = mlx5_rss_hash_update,
 	.rss_hash_conf_get = mlx5_rss_hash_conf_get,
 	.filter_ctrl = mlx5_dev_filter_ctrl,
+	.timesync_enable = mlx5_timesync_enable,
+	.timesync_disable = mlx5_timesync_disable,
 };
 
 static struct {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d4fb5ff..49447c4 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2015 6WIND S.A.
- *   Copyright 2015 Mellanox.
+ *   Copyright 2015-2016 6WIND S.A.
+ *   Copyright 2015-2016 Mellanox.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -54,6 +54,7 @@
 #ifdef PEDANTIC
 #pragma GCC diagnostic ignored "-pedantic"
 #endif
+#include <rte_alarm.h>
 #include <rte_ether.h>
 #include <rte_ethdev.h>
 #include <rte_spinlock.h>
@@ -64,6 +65,7 @@
 #endif
 
 #include "mlx5_utils.h"
+#include "mlx5_time.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"
@@ -113,6 +115,7 @@ struct priv {
 	unsigned int mps:1; /* Whether multi-packet send is supported. */
 	unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */
 	unsigned int pending_alarm:1; /* An alarm is pending. */
+	unsigned int timesync_en:1; /* Timesync (timestamping) enabled */
 	unsigned int txq_inline; /* Maximum packet size for inlining. */
 	unsigned int txqs_inline; /* Queue number threshold for inlining. */
 	/* RX/TX queues. */
@@ -120,6 +123,7 @@ struct priv {
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
 	struct txq *(*txqs)[]; /* TX queues. */
+	struct mlx5_timesync timesync; /* time synronization object */
 	/* Indirection tables referencing all RX WQs. */
 	struct ibv_exp_rwq_ind_table *(*ind_tables)[];
 	unsigned int ind_tables_n; /* Number of indirection tables. */
@@ -203,6 +207,8 @@ int mlx5_set_link_up(struct rte_eth_dev *dev);
 struct priv *mlx5_secondary_data_setup(struct priv *priv);
 void priv_select_tx_function(struct priv *);
 void priv_select_rx_function(struct priv *);
+int mlx5_timesync_enable(struct rte_eth_dev *dev);
+int mlx5_timesync_disable(struct rte_eth_dev *dev);
 
 /* mlx5_mac.c */
 
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index cc2a6f3..10b3f04 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -79,4 +79,8 @@
 /* Alarm timeout. */
 #define MLX5_ALARM_TIMEOUT_US 100000
 
+/* Clock deviation fix alarm timeout*/
+#define MLX5_ALARM_CLOCK_DEVIATION_US 5000000
+#define MLX5_CLOCK_DEVIATION_THRESHOLD 10
+
 #endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 264299a..16ba733 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2015 6WIND S.A.
- *   Copyright 2015 Mellanox.
+ *   Copyright 2015-2016 6WIND S.A.
+ *   Copyright 2015-2016 Mellanox.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -59,6 +59,7 @@
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
 #include <rte_malloc.h>
+#include <rte_time.h>
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-pedantic"
 #endif
@@ -1412,3 +1413,220 @@ priv_select_rx_function(struct priv *priv)
 {
 	priv->dev->rx_pkt_burst = mlx5_rx_burst;
 }
+
+/**
+ * Synchronize with system time. We get the best (min) from 10 attempts
+ * to minimize shift from sys time and HCA clocks.
+ *
+ * @param ibv_context
+ *   Pointer to IB verbs context.
+ * @param st
+ *   Pointer to store system time.
+ * @param st_ns
+ *   Pointer to store system time represented in ns.
+ * @param hw_clock
+ *   Pointer to store HCA HW clock.
+ *
+ * @return
+ *   0 on success, -1 value on failure.
+ */
+static int
+mlx5_sync_clocks(struct ibv_context *ibv_ctx, struct timespec *st,
+		 volatile uint64_t *st_ns, volatile uint64_t *hw_clock)
+{
+	struct timespec st1, st2, diff, st_min = TIMESPEC_INITIALIZER;
+	struct ibv_exp_values query_val = {0};
+	int64_t interval, best_interval = 0;
+	uint64_t hw_clock_min = 0;
+
+	memset(&query_val, 0, sizeof(query_val));
+	query_val.comp_mask = IBV_EXP_VALUES_HW_CLOCK;
+	for (int i = 0 ; i < 10 ; ++i) {
+		clock_gettime(CLOCK_REALTIME, &st1);
+		if (ibv_exp_query_values(ibv_ctx, IBV_EXP_VALUES_HW_CLOCK,
+					 &query_val) || !query_val.hwclock)
+			return -1;
+		clock_gettime(CLOCK_REALTIME, &st2);
+		interval = (st2.tv_sec - st1.tv_sec) * NSEC_PER_SEC +
+			   (st2.tv_nsec - st1.tv_nsec);
+
+		if (!best_interval || interval < best_interval) {
+			best_interval = interval;
+			hw_clock_min = query_val.hwclock;
+
+			interval /= 2;
+			diff.tv_sec = interval / NSEC_PER_SEC;
+			diff.tv_nsec = interval - (diff.tv_sec * NSEC_PER_SEC);
+			rte_timespec_add(&st1, &diff, &st_min);
+		}
+	}
+	*st = st_min;
+	*st_ns = st->tv_sec * NSEC_PER_SEC + st->tv_nsec;
+	*hw_clock = hw_clock_min;
+	return 0;
+}
+
+/**
+ * Periodic function to run by rte_eal_alarm and to synchronize with system
+ * time and calculate HCA HW сlock deviation. The deviation will be included
+ * into timestamp calculation in RX/TX callbacks.
+ *
+ * @param arg
+ *   Void pointer to struct priv.
+ *
+ */
+static void
+mlx5_fix_hw_clock_deviation_handler(void *arg)
+{
+	struct priv *pv = arg;
+	struct timespec current_time, diff_systime;
+	uint64_t diff_hw_clock, hw_clock, estimated_hw_clock;
+	uint64_t systime_ns, diff_systime_ns;
+	int64_t clock_deviation_hw;
+	volatile struct mlx5_timestamp_sync *ts = &pv->timesync.sync_timestamp;
+
+	if (!ts->port_clock_frequency)
+		return;
+	if (mlx5_sync_clocks(pv->ctx, &current_time, &systime_ns, &hw_clock))
+		return;
+	/* time between current and previous time sync */
+	rte_timespec_sub(&current_time, &pv->timesync.sync_systime,
+			 &diff_systime);
+	/* also clocks */
+	diff_hw_clock = hw_clock - ts->sync_hw_clock;
+	diff_systime_ns = rte_timespec_to_ns(&diff_systime);
+	estimated_hw_clock = (diff_systime.tv_sec * ts->port_clock_frequency) +
+			     (diff_systime.tv_nsec * ts->port_clock_frequency /
+			     NSEC_PER_SEC);
+	clock_deviation_hw = estimated_hw_clock - diff_hw_clock;
+	priv_lock(pv);
+	if (abs(clock_deviation_hw) >= MLX5_CLOCK_DEVIATION_THRESHOLD) {
+		ts->port_clock_frequency = (diff_hw_clock * NSEC_PER_SEC) /
+					   diff_systime_ns;
+		ts->mskd_duration = (NSEC_PER_SEC << 30) /
+				    ts->port_clock_frequency;
+	}
+	ts->sync_hw_clock = hw_clock;
+	ts->sync_time_ns = systime_ns;
+	pv->timesync.sync_systime = current_time;
+	DEBUG("%ld.%09ld since last fix, time_ns: %lu estimate_hw_clock = %ld,"
+	      "diff_hw_clock = %ld, deviation = %ld, freq = %ld durat: %lu",
+	      diff_systime.tv_sec, diff_systime.tv_nsec, systime_ns,
+	      estimated_hw_clock, diff_hw_clock, clock_deviation_hw,
+	      ts->port_clock_frequency, ts->mskd_duration);
+	/* update all queues */
+	for (uint32_t i = 0; i != pv->rxqs_n; i++) {
+		struct rxq *rxq = (*pv->rxqs)[i];
+
+		if (rxq == NULL)
+			continue;
+		rxq->timestamps_enabled = pv->timesync_en;
+		rxq->timesync = *ts;
+	}
+	priv_unlock(pv);
+	rte_eal_alarm_set(MLX5_ALARM_CLOCK_DEVIATION_US,
+			  mlx5_fix_hw_clock_deviation_handler,
+			  (void *)pv);
+}
+
+/**
+ * Return HCA port clock frequency in Hz.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+static uint64_t
+mlx5_get_port_clock_frequency(struct priv *pv)
+{
+	struct ibv_exp_device_attr exp_device_attr;
+	exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_WITH_HCA_CORE_CLOCK;
+	if (ibv_exp_query_device(pv->ctx, &exp_device_attr)) {
+		ERROR("ibv_exp_query_device() failed");
+		return 0;
+	}
+	return exp_device_attr.hca_core_clock * 1000; /* orig in KHz */
+}
+
+/**
+ *  DPDK callback to enable timestamping. rte_mbuf.timestamp will hold
+ *  value of the packet in ns.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+mlx5_timesync_enable(struct rte_eth_dev *dev)
+{
+	struct priv *pv = dev->data->dev_private;
+	struct mlx5_timesync *tso = &pv->timesync;
+	volatile struct mlx5_timestamp_sync *sts = &tso->sync_timestamp;
+
+	priv_lock(pv);
+	sts->port_clock_frequency = mlx5_get_port_clock_frequency(pv);
+	if (!sts->port_clock_frequency) {
+		pv->timesync_en = 0;
+		INFO("Timesync disabled: %d as port clock frequency is 0",
+		     pv->timesync_en);
+		priv_unlock(pv);
+		return -ENOTSUP;
+	}
+	INFO("port %u Clock frequency: %lu Hz", pv->port,
+	     sts->port_clock_frequency);
+	if (mlx5_sync_clocks(pv->ctx, &tso->sync_systime, &sts->sync_time_ns,
+			     &sts->sync_hw_clock)) {
+		pv->timesync_en = 0;
+		INFO("Timesync disabled: %d", pv->timesync_en);
+		priv_unlock(pv);
+		return -ENOTSUP;
+	}
+	sts->mskd_duration = (NSEC_PER_SEC << 30) / sts->port_clock_frequency;
+	pv->timesync_en = 1;
+	DEBUG("%p: Timesync enabled, masked duration: %lu", (void *)dev,
+	      sts->mskd_duration);
+	rte_eal_alarm_set(1000,
+			  mlx5_fix_hw_clock_deviation_handler,
+			  pv);
+	DEBUG("%p: sync_systime: %lu.%lu, time_ns: %lu sync_hw_clock: %lu",
+	      (void *)dev, tso->sync_systime.tv_sec,
+	      tso->sync_systime.tv_nsec, sts->sync_time_ns, sts->sync_hw_clock);
+	/* update all queues */
+	for (uint32_t i = 0; i != pv->rxqs_n; i++) {
+		struct rxq *rxq = (*pv->rxqs)[i];
+
+		if (rxq == NULL)
+			continue;
+		rxq->timestamps_enabled = pv->timesync_en;
+		rxq->timesync = *sts;
+	}
+	priv_unlock(pv);
+	return 0;
+}
+
+/**
+ *  DPDK callback to disable timestamping. Value of rte_mbuf.timestamp is
+ *  undefined.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+mlx5_timesync_disable(struct rte_eth_dev *dev)
+{
+	struct priv *pv = dev->data->dev_private;
+
+	pv->timesync_en = 0;
+	rte_eal_alarm_cancel(mlx5_fix_hw_clock_deviation_handler, pv);
+	for (uint32_t i = 0; i != pv->rxqs_n; i++) {
+		struct rxq *rxq = (*pv->rxqs)[i];
+		if (rxq == NULL)
+			continue;
+		rxq->timestamps_enabled = pv->timesync_en;
+	}
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index d32ad68..b9a5fe6 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1258,6 +1258,8 @@ mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		rte_free(rxq_ctrl);
 	else {
 		rxq_ctrl->rxq.stats.idx = idx;
+		rxq_ctrl->rxq.timestamps_enabled = priv->timesync_en;
+		rxq_ctrl->rxq.timesync = priv->timesync.sync_timestamp;
 		DEBUG("%p: adding RX queue %p to list",
 		      (void *)dev, (void *)rxq_ctrl);
 		(*priv->rxqs)[idx] = &rxq_ctrl->rxq;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 91b0131..54f284d 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2015 6WIND S.A.
- *   Copyright 2015 Mellanox.
+ *   Copyright 2015-2016 6WIND S.A.
+ *   Copyright 2015-2016 Mellanox.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -59,6 +59,7 @@
 #include <rte_branch_prediction.h>
 #include <rte_ether.h>
 #include <rte_vect.h>
+#include <rte_time.h>
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-pedantic"
 #endif
@@ -1385,6 +1386,20 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 					len -= ETHER_CRC_LEN;
 			}
 			PKT_LEN(pkt) = len;
+			/* Calculate synchronized timestamp in ns */
+			if (unlikely(rxq->timestamps_enabled)) {
+				volatile struct mlx5_timestamp_sync *tso =
+					&rxq->timesync;
+				uint64_t clock_diff;
+				rte_prefetch0(tso);
+				clock_diff = ntohll(cqe->timestamp) -
+					tso->sync_hw_clock;
+				clock_diff = (clock_diff * tso->mskd_duration)
+					>> 30;
+				pkt->timestamp = tso->sync_time_ns +
+					clock_diff;
+				pkt->ol_flags |= PKT_RX_IEEE1588_TMST;
+			}
 		}
 		DATA_LEN(rep) = DATA_LEN(seg);
 		PKT_LEN(rep) = PKT_LEN(seg);
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 844cabc..2ff873c 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2015 6WIND S.A.
- *   Copyright 2015 Mellanox.
+ *   Copyright 2015-2016 6WIND S.A.
+ *   Copyright 2015-2016 Mellanox.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -60,6 +60,7 @@
 #endif
 
 #include "mlx5_utils.h"
+#include "mlx5_time.h"
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"
@@ -109,6 +110,7 @@ struct rxq {
 	unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
 	unsigned int vlan_strip:1; /* Enable VLAN stripping. */
 	unsigned int crc_present:1; /* CRC must be subtracted. */
+	unsigned int timestamps_enabled:1; /* timestamping enabled */
 	unsigned int sges_n:2; /* Log 2 of SGEs (max buffers per packet). */
 	unsigned int cqe_n:4; /* Log 2 of CQ elements. */
 	unsigned int elts_n:4; /* Log 2 of Mbufs. */
@@ -125,6 +127,7 @@ struct rxq {
 	struct rte_mbuf *(*elts)[];
 	struct rte_mempool *mp;
 	struct mlx5_rxq_stats stats;
+	volatile struct mlx5_timestamp_sync timesync; /* per queue copy */
 } __rte_cache_aligned;
 
 /* RX queue control descriptor. */
diff --git a/drivers/net/mlx5/mlx5_time.h b/drivers/net/mlx5/mlx5_time.h
new file mode 100644
index 0000000..b44bcb1
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_time.h
@@ -0,0 +1,53 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_TIME_H_
+#define RTE_PMD_MLX5_TIME_H_
+
+#include <stddef.h>
+#include <stdint.h>
+#include <limits.h>
+
+
+struct mlx5_timestamp_sync {
+	uint64_t sync_hw_clock; /* the last HW clocks */
+	uint64_t sync_time_ns; /* the last system time in ns */
+	uint64_t mskd_duration; /* adjusted masked duration */
+	uint64_t port_clock_frequency; /* in Hz */
+};
+
+struct mlx5_timesync {
+	volatile struct mlx5_timestamp_sync sync_timestamp;
+	struct timespec sync_systime; /* the last system time */
+};
+
+#endif /* RTE_PMD_MLX5_TIME_H_ */
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index e9b9a29..ff94e6b 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -116,6 +116,7 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 		return;
 	}
 	DEBUG("%p: cleaning up and destroying hash RX queues", (void *)dev);
+	mlx5_timesync_disable(dev);
 	priv_special_flow_disable_all(priv);
 	priv_mac_addrs_disable(priv);
 	priv_destroy_hash_rxqs(priv);
diff --git a/lib/librte_eal/common/include/rte_time.h b/lib/librte_eal/common/include/rte_time.h
index 28c6274..7fa8e64 100644
--- a/lib/librte_eal/common/include/rte_time.h
+++ b/lib/librte_eal/common/include/rte_time.h
@@ -37,6 +37,8 @@
 #include <stdint.h>
 #include <time.h>
 
+#define TIMESPEC_INITIALIZER    {0, 0}
+
 #define NSEC_PER_SEC             1000000000L
 
 /**
@@ -127,4 +129,47 @@ rte_ns_to_timespec(uint64_t nsec)
 	return ts;
 }
 
+/**
+ * Addition of two timespec times to result.
+ *
+ * @param a
+ *    Pointer to the first time
+ * @param b
+ *    Pointer to the second time
+ * @param res
+ *    Pointer to result
+ *
+ */
+static inline void rte_timespec_add(struct timespec *a, struct timespec *b,
+	struct timespec *res)
+{
+	res->tv_sec = a->tv_sec + b->tv_sec;
+	res->tv_nsec = a->tv_nsec + b->tv_nsec;
+	if (res->tv_nsec >= NSEC_PER_SEC) {
+		++res->tv_sec;
+		res->tv_nsec -= NSEC_PER_SEC;
+	}
+}
+
+/**
+ * Substruction of the first timespec by second to result.
+ *
+ * @param a
+ *    Pointer to the first time
+ * @param b
+ *    Pointer to the second time
+ * @param res
+ *    Pointer to result
+ */
+static inline void rte_timespec_sub(struct timespec *a, struct timespec *b,
+	struct timespec *res)
+{
+	res->tv_sec = a->tv_sec - b->tv_sec;
+	res->tv_nsec = a->tv_nsec - b->tv_nsec;
+	if (res->tv_nsec < 0) {
+		--res->tv_sec;
+		res->tv_nsec += NSEC_PER_SEC;
+	}
+}
+
 #endif /* _RTE_TIME_H_ */
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support
  2016-10-13 14:35 [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support Oleg Kuporosov
                   ` (2 preceding siblings ...)
  2016-10-13 14:35 ` [dpdk-dev] [PATCH 3/3] net/mlx5: implementation of Rx packet timestamping support Oleg Kuporosov
@ 2016-10-17 11:24 ` Nélio Laranjeiro
  2017-10-17 13:57   ` Thomas Monjalon
  3 siblings, 1 reply; 20+ messages in thread
From: Nélio Laranjeiro @ 2016-10-17 11:24 UTC (permalink / raw)
  To: Oleg Kuporosov; +Cc: dev

On Thu, Oct 13, 2016 at 02:35:05PM +0000, Oleg Kuporosov wrote:
> 
> Hello DPDK Developers,
> 
> Financial Services Industry which is pretty eager for several DPDK
> features especially low latency while high throughput. The major issue
> so far for increasing DPDK adoption there is requirement for several
> applications (capturers, some trading algorithms) to support of accurate
> timestamping. The requirement mostly came from regulatory and customers
> need strictly follow it.
> 
> Current state of timestamping support in DPDK looks pretty good:
>  - there is API to enable/disable timestamping acquisition by 
>    rte_eth_timesync_enable/disable
>  - get timestamps itself by timesync_read_rx/tx_timestamp
>  - and even implementation of NTP IEEE 1588 for time synchronization
>    by rte_eth_timesync_adjust_read/write_time APIs.
>    
> But it misses the most important feature there – embedding timestamp
> in rte_mbuf aligning it with packet.
> 
> We would like to change this to increase DPDK adoption for several new
> DPDK-based applications for FSI segment. It also might be very
> applicable for several RT media and debugging purposes of network
> flows/streams in other segments like HPC.
> 
> There are several thoughts how to improve there:
>  - include uint64_t timestamp field into rte_mbuf with minimal impact
>    to throughput/latency. Keep it just simple uint64_t in ns (more than
>    580 years) would be enough for immediate needs while using full
>    struct timespec with twice bigger size would have much stronger
>    performance impact as missed cacheline0. It is possible as there is
>    6-bytes gap in 1st cacheline (fast path) and moving uint16_t
>    vlan_tci_outer field to 2nd cacheline. 
>  - such move will only impact for pretty rare usable VLAN RX stripping
>    mode for outer TCI (it used only for one NIC i40e from the whole
>    set and keep minimal performance impact for timestamps.  
>  - PMD can fill the field in their callback completion routines
>    depending on enabling this feature in runtime.
> 
> We evaluated other possible options but looks it will have even worse
> performance impact.
> 
> 
> Oleg Kuporosov (3):
>   mbuf: embedding timestamp into the packet
>   app/testpmd: enabled control for packet timestamps
>   net/mlx5: implementation of Rx packet timestamping support
> 
>  app/test-pmd/cmdline.c                      | 122 +++++++++++++++
>  app/test-pmd/parameters.c                   |   4 +
>  app/test-pmd/testpmd.c                      |   5 +
>  app/test-pmd/testpmd.h                      |   1 +
>  doc/guides/testpmd_app_ug/run_app.rst       |   4 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 +
>  drivers/net/mlx5/mlx5.c                     |   7 +-
>  drivers/net/mlx5/mlx5.h                     |  10 +-
>  drivers/net/mlx5/mlx5_defs.h                |   4 +
>  drivers/net/mlx5/mlx5_ethdev.c              | 222 +++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_rxq.c                 |   2 +
>  drivers/net/mlx5/mlx5_rxtx.c                |  19 ++-
>  drivers/net/mlx5/mlx5_rxtx.h                |   7 +-
>  drivers/net/mlx5/mlx5_time.h                |  53 +++++++
>  drivers/net/mlx5/mlx5_trigger.c             |   1 +
>  lib/librte_eal/common/include/rte_time.h    |  45 ++++++
>  lib/librte_mbuf/rte_mbuf.h                  |   6 +-
>  17 files changed, 507 insertions(+), 12 deletions(-)
>  create mode 100644 drivers/net/mlx5/mlx5_time.h
> 
> Thanks,
> Oleg.
> -- 
> 1.8.3.1

Reviewed-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  2016-10-13 14:35 ` [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet Oleg Kuporosov
@ 2016-10-18 15:43   ` Olivier Matz
  2016-10-19 10:14     ` Pattan, Reshma
                       ` (2 more replies)
  2017-01-24 15:27   ` Olivier MATZ
  1 sibling, 3 replies; 20+ messages in thread
From: Olivier Matz @ 2016-10-18 15:43 UTC (permalink / raw)
  To: dev



On 10/13/2016 04:35 PM, Oleg Kuporosov wrote:
> The hard requirement of financial services industry is accurate
> timestamping aligned with the packet itself. This patch is to satisfy
> this requirement:
> 
> - include uint64_t timestamp field into rte_mbuf with minimal impact to
>   throughput/latency. Keep it just simple uint64_t in ns (more than 580
>   years) would be enough for immediate needs while using full
>   struct timespec with twice bigger size would have much stronger
>   performance impact as missed cacheline0.
> 
> - it is possible as there is 6-bytes gap in 1st cacheline (fast path)
>   and moving uint16_t vlan_tci_outer field to 2nd cacheline.
> 
> - such move will only impact for pretty rare usable VLAN RX stripping
>   mode for outer TCI (it used only for one NIC i40e from the whole set and
>   allows to keep minimal performance impact for RX/TX timestamps.

This argument is difficult to accept. One can say you are adding
a field for a pretty rare case used by only one NIC :)

Honestly, I'm not able to judge whether timestamp is more important than
vlan_tci_outer. As room is tight in the first cache line, your patch
submission is the occasion to raise the question: how to decide what
should be in the first part of the mbuf? There are also some other
candidates for moving: m->seqn is only used in librte_reorder and it
is not set in the RX part of a driver.

About the timestamp, it would be valuable to have other opinions,
not only about the placement of the field in the structure, but also
to check that this API is also usable for other NICs.

Have you measured the impact of having the timestamp in the second part
of the mbuf?

Changing the mbuf structure should happen as rarely as possible, and we
have to make sure we take the correct decisions. I think we will
discuss this at dpdk userland 2016.


Apart from that, I wonder if an ol_flag should be added to tell that
the timestamp field is valid in the mbuf.

Regards,
Olivier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  2016-10-18 15:43   ` Olivier Matz
@ 2016-10-19 10:14     ` Pattan, Reshma
  2016-10-19 17:40       ` Oleg Kuporosov
  2016-10-19 13:31     ` Ananyev, Konstantin
  2016-10-19 17:08     ` Oleg Kuporosov
  2 siblings, 1 reply; 20+ messages in thread
From: Pattan, Reshma @ 2016-10-19 10:14 UTC (permalink / raw)
  To: Olivier Matz, Oleg Kuporosov; +Cc: Richardson, Bruce, Ananyev, Konstantin, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Tuesday, October 18, 2016 4:44 PM
> To: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the
> packet
> 
> 
> 
> On 10/13/2016 04:35 PM, Oleg Kuporosov wrote:
> > The hard requirement of financial services industry is accurate
> > timestamping aligned with the packet itself. This patch is to satisfy
> > this requirement:
> >
> > - include uint64_t timestamp field into rte_mbuf with minimal impact to
> >   throughput/latency. Keep it just simple uint64_t in ns (more than 580
> >   years) would be enough for immediate needs while using full
> >   struct timespec with twice bigger size would have much stronger
> >   performance impact as missed cacheline0.
> >
> > - it is possible as there is 6-bytes gap in 1st cacheline (fast path)
> >   and moving uint16_t vlan_tci_outer field to 2nd cacheline.
> >
> > - such move will only impact for pretty rare usable VLAN RX stripping
> >   mode for outer TCI (it used only for one NIC i40e from the whole set and
> >   allows to keep minimal performance impact for RX/TX timestamps.
> 
> This argument is difficult to accept. One can say you are adding a field for a
> pretty rare case used by only one NIC :)
> 
> Honestly, I'm not able to judge whether timestamp is more important than
> vlan_tci_outer. As room is tight in the first cache line, your patch submission is
> the occasion to raise the question: how to decide what should be in the first part
> of the mbuf? There are also some other candidates for moving: m->seqn is only
> used in librte_reorder and it is not set in the RX part of a driver.
> 
> About the timestamp, it would be valuable to have other opinions, not only
> about the placement of the field in the structure, but also to check that this API
> is also usable for other NICs.
> 
> Have you measured the impact of having the timestamp in the second part of
> the mbuf?
> 
> Changing the mbuf structure should happen as rarely as possible, and we have to
> make sure we take the correct decisions. I think we will discuss this at dpdk
> userland 2016.
> 
> 

I just read this mail chain, to make every one aware again, I am emphasizing on the point that I am also adding new "time_arraived" field 
to mbuf struct as part of  below 17.02 Road map item. 

" Extended Stats (Latency and Bit Rate Statistics): Enhance the Extended NIC Stats (Xstats) implementation to support the collection and reporting of latency and bit rate measurements. Latency statistics will include min, max and average latency, and jitter. Bit rate statistics will include peak and average bit rate aggregated over a user-defined time period. This will be implemented for IXGBE and I40E."

 Adding new field  " time_arrived" to the 2nd cache line of rte_mbuf struct to mark the packet arrival time for latency measurements. 
Adding this new filed to second cache line will not break the ABI. I implemented a new library to mark the time as part of rte_eth_rx callbacks
and use that time stamp inside rte_eth_tx callback for measuring the latency. Below is the latest v3 RFC patch for reference. 
http://dpdk.org/dev/patchwork/patch/16655/

Comments are welcome.

Thanks,
Reshma

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  2016-10-18 15:43   ` Olivier Matz
  2016-10-19 10:14     ` Pattan, Reshma
@ 2016-10-19 13:31     ` Ananyev, Konstantin
  2016-10-20  8:03       ` Oleg Kuporosov
  2016-10-19 17:08     ` Oleg Kuporosov
  2 siblings, 1 reply; 20+ messages in thread
From: Ananyev, Konstantin @ 2016-10-19 13:31 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: Oleg Kuporosov


Hi lads,
> 
> 
> 
> On 10/13/2016 04:35 PM, Oleg Kuporosov wrote:
> > The hard requirement of financial services industry is accurate
> > timestamping aligned with the packet itself. This patch is to satisfy
> > this requirement:
> >
> > - include uint64_t timestamp field into rte_mbuf with minimal impact to
> >   throughput/latency. Keep it just simple uint64_t in ns (more than 580
> >   years) would be enough for immediate needs while using full
> >   struct timespec with twice bigger size would have much stronger
> >   performance impact as missed cacheline0.
> >
> > - it is possible as there is 6-bytes gap in 1st cacheline (fast path)
> >   and moving uint16_t vlan_tci_outer field to 2nd cacheline.
> >
> > - such move will only impact for pretty rare usable VLAN RX stripping
> >   mode for outer TCI (it used only for one NIC i40e from the whole set and
> >   allows to keep minimal performance impact for RX/TX timestamps.
> 
> This argument is difficult to accept. One can say you are adding
> a field for a pretty rare case used by only one NIC :)
> 
> Honestly, I'm not able to judge whether timestamp is more important than
> vlan_tci_outer. As room is tight in the first cache line, your patch
> submission is the occasion to raise the question: how to decide what
> should be in the first part of the mbuf? There are also some other
> candidates for moving: m->seqn is only used in librte_reorder and it
> is not set in the RX part of a driver.
> 
> About the timestamp, it would be valuable to have other opinions,
> not only about the placement of the field in the structure, but also
> to check that this API is also usable for other NICs.
> 
> Have you measured the impact of having the timestamp in the second part
> of the mbuf?

My vote also would be to have timestamp in the second cache line.
About moving seqn to the 2-nd cache line too - that's probably a fair point.

About the rest of the patch: 
Do you really need to put that code into the PMDs itself?
Can't the same result be achieved by using RX callbacks?
Again that approach would work with any PMD and
there would be no need to modify PMD code itself.

Another thing, that I am in doubt why to use system time?
Wouldn't raw HW TSC value (rte_rdtsc()) do here?
Konstantin

> 
> Changing the mbuf structure should happen as rarely as possible, and we
> have to make sure we take the correct decisions. I think we will
> discuss this at dpdk userland 2016.
> 
> 
> Apart from that, I wonder if an ol_flag should be added to tell that
> the timestamp field is valid in the mbuf.
> 
> Regards,
> Olivier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  2016-10-18 15:43   ` Olivier Matz
  2016-10-19 10:14     ` Pattan, Reshma
  2016-10-19 13:31     ` Ananyev, Konstantin
@ 2016-10-19 17:08     ` Oleg Kuporosov
  2 siblings, 0 replies; 20+ messages in thread
From: Oleg Kuporosov @ 2016-10-19 17:08 UTC (permalink / raw)
  To: Olivier Matz, dev

Hello Oliver

Great thanks for your review and con

> > - include uint64_t timestamp field into rte_mbuf with minimal impact to
> >   throughput/latency. Keep it just simple uint64_t in ns (more than 580
> >   years) would be enough for immediate needs while using full
> >   struct timespec with twice bigger size would have much stronger
> >   performance impact as missed cacheline0.
> >
> > - it is possible as there is 6-bytes gap in 1st cacheline (fast path)
> >   and moving uint16_t vlan_tci_outer field to 2nd cacheline.
> >
> > - such move will only impact for pretty rare usable VLAN RX stripping
> >   mode for outer TCI (it used only for one NIC i40e from the whole set and
> >   allows to keep minimal performance impact for RX/TX timestamps.
> 
> This argument is difficult to accept. One can say you are adding a field for a
> pretty rare case used by only one NIC :)

It  may be looks so, but in fact not only for one NIC. Absence of timestamp there 
Required from developers implement its support in out of DPDK data path with
Local DPDK patching which also may lead some penalty in accuracy.
Good example here is open source implementation of tracing library - 
https://github.com/wanduow/libtrace/tree/libtrace4

These folks patched DPDK to have timestamp for every packet (Intel DPDK Patches folder)
And used it in out of band to store and later processing (lib/format_dpdk.c).
That was actually the starting point of my investigations.
Such approach is working for 1GBb case but not for 10-100 cases.

> 
> Honestly, I'm not able to judge whether timestamp is more important than
> vlan_tci_outer. As room is tight in the first cache line, your patch submission
> is the occasion to raise the question: how to decide what should be in the
> first part of the mbuf? There are also some other candidates for moving: m-
> >seqn is only used in librte_reorder and it is not set in the RX part of a driver.

Agree, it is difficult to decide, my thoughts were:
- there is hole (6 bytes) which wasn't marked as reserved for any planned extensions;
-vlan_tci_outer is being used by only one NIC (i40e) so far, 2nd NIC (fm10k) is using it
With comment:
		 * mbuf->vlan_tci_outer is an idle field in fm10k driver,
		 * so it can be selected to store sglort value.
To store some another value under some specific "if".

Also for i40e it is under #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC which per doc should be
Enabled for high throughput of small packets. So in default case (disabled) it anyway has
some performance penalty with using 32 bytes descriptor and moving it to 2nd CL would
not hit big additional penalty. Unfortunately I have no such NIC to measure.
Is there any data how often double tagging in being used  in  DPDK applications?

Another my thought was to have at the end of 1st CL enum which may hold
Reserved fields per specific use cases and data widths (uint8, 2xuint4, 4xuint2, 8xbytes).
 
> 
> About the timestamp, it would be valuable to have other opinions, not only
> about the placement of the field in the structure, but also to check that this
> API is also usable for other NICs.

Sure, but I didn't change timesync/timestamping API itself.

> Have you measured the impact of having the timestamp in the second part
> of the mbuf?

Yes, the worst case with prefetching of 2nd CL it is 5.7-5.9 % penalty for combined RX+TX
And expectedly much worse without prefetching.
In the best case it is 0.3..0.5 % for RX only. It can be explained by much harder cache
trashing when TX is "on".
 
> Changing the mbuf structure should happen as rarely as possible, and we
> have to make sure we take the correct decisions. I think we will discuss this at
> dpdk userland 2016.

Oh, yes, please discuss, I would not be able to join. :(

> Apart from that, I wonder if an ol_flag should be added to tell that the
> timestamp field is valid in the mbuf.

Oliver, there is PKT_RX_IEEE1588_TMST flag already.

Best regards,
Oleg.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  2016-10-19 10:14     ` Pattan, Reshma
@ 2016-10-19 17:40       ` Oleg Kuporosov
  2016-10-25 12:39         ` Pattan, Reshma
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Kuporosov @ 2016-10-19 17:40 UTC (permalink / raw)
  To: Pattan, Reshma, Olivier Matz; +Cc: Richardson, Bruce, Ananyev, Konstantin, dev

Hello Reshma

> I just read this mail chain, to make every one aware again, I am emphasizing
> on the point that I am also adding new "time_arraived" field to mbuf struct as
> part of  below 17.02 Road map item.

Thanks for your work for extending statistics support in DPDK.

"time_arrived" points here for mostly RX path, while more common term used "timestamp"
Allows also using it for TX path as well in the future.

Best regards,
Oleg.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  2016-10-19 13:31     ` Ananyev, Konstantin
@ 2016-10-20  8:03       ` Oleg Kuporosov
  2016-10-20 10:57         ` Jan Blunck
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Kuporosov @ 2016-10-20  8:03 UTC (permalink / raw)
  To: Ananyev, Konstantin, Olivier Matz, dev

Hello Konstantin,

> 
> My vote also would be to have timestamp in the second cache line.
> About moving seqn to the 2-nd cache line too - that's probably a fair point.

It may impact throughput till ~6% for applications required such embedded 
Timestamps.
 
> About the rest of the patch:
> Do you really need to put that code into the PMDs itself?
> Can't the same result be achieved by using RX callbacks?
> Again that approach would work with any PMD and there would be no need
> to modify PMD code itself.

Correct, the approach with using callbacs (rte_eth_timesync_read_[r|t]x_timestamp())
Has also some Cons for this use case:
- FSI needs the most accurate stamping as possible by reasons were described in
Cover letter
- callback will be called from user app and so you have to take into account 
Difference between time when packet was released by NIC and callback call
- such difference is not easy to estimate correctly due to dependency on CPU type,
Its frequency and current load conditions
- even estimated it would be hard without additional performance penalty to align
Packet with timestamp, taking account that call may actually placed from
Different thread or even process.

It looks the least impacting and correct way is to have timestamp in rte_mbuf and fill
It in Rx burst procedure.

> Another thing, that I am in doubt why to use system time?
> Wouldn't raw HW TSC value (rte_rdtsc()) do here?

System time is being used for periodic clock synchronization between wall
clock and NIC to estimate NIC clock deviation. It is in assumption the system itself is
synchronized by PTP from master clock. It is run on context of control thread.

Thanks,
Oleg.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  2016-10-20  8:03       ` Oleg Kuporosov
@ 2016-10-20 10:57         ` Jan Blunck
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Blunck @ 2016-10-20 10:57 UTC (permalink / raw)
  To: Oleg Kuporosov; +Cc: Ananyev, Konstantin, Olivier Matz, dev

On Thu, Oct 20, 2016 at 4:03 AM, Oleg Kuporosov <olegk@mellanox.com> wrote:
> Hello Konstantin,
>
>>
>> My vote also would be to have timestamp in the second cache line.
>> About moving seqn to the 2-nd cache line too - that's probably a fair point.
>
> It may impact throughput till ~6% for applications required such embedded
> Timestamps.
>
>> About the rest of the patch:
>> Do you really need to put that code into the PMDs itself?
>> Can't the same result be achieved by using RX callbacks?
>> Again that approach would work with any PMD and there would be no need
>> to modify PMD code itself.
>
> Correct, the approach with using callbacs (rte_eth_timesync_read_[r|t]x_timestamp())
> Has also some Cons for this use case:
> - FSI needs the most accurate stamping as possible by reasons were described in
> Cover letter

>From my experience this is only true if there is near-zero performance
impact. From my perspective this is only relevant if the used hardware
supports offloading of writing the timestamps. Everything else is a
huge impact if its unconditionally enabled.

The regulatory requirements are already covered by the exchange
protocols which means that timestamps are already present in the
network packet payload (generated by the exchange trading system
and/or the trading application itself). In the end it is the exchange
itself and its members that are regulated. I can see that this might
be interesting for exchange members allowing sponsored naked access
(for non-exchange members) to generate data that they are not
front-running their clients.

I doubt that this non-functional requirement is important enough to
sacrifice the functional requirement of supporting QinQ.

> - callback will be called from user app and so you have to take into account
> Difference between time when packet was released by NIC and callback call

Have you looked at using dedicated preallocated trace buffers that are
filled with timestamps values? This should work fine for getting some
inside into the latency between application readiness and the actual
time the burst happened.

Thanks,
Jan

> - such difference is not easy to estimate correctly due to dependency on CPU type,
> Its frequency and current load conditions
> - even estimated it would be hard without additional performance penalty to align
> Packet with timestamp, taking account that call may actually placed from
> Different thread or even process.
>
> It looks the least impacting and correct way is to have timestamp in rte_mbuf and fill
> It in Rx burst procedure.
>
>> Another thing, that I am in doubt why to use system time?
>> Wouldn't raw HW TSC value (rte_rdtsc()) do here?
>
> System time is being used for periodic clock synchronization between wall
> clock and NIC to estimate NIC clock deviation. It is in assumption the system itself is
> synchronized by PTP from master clock. It is run on context of control thread.
>
> Thanks,
> Oleg.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  2016-10-19 17:40       ` Oleg Kuporosov
@ 2016-10-25 12:39         ` Pattan, Reshma
  0 siblings, 0 replies; 20+ messages in thread
From: Pattan, Reshma @ 2016-10-25 12:39 UTC (permalink / raw)
  To: Oleg Kuporosov, Olivier Matz; +Cc: Richardson, Bruce, Ananyev, Konstantin, dev

Hi,

> -----Original Message-----
> From: Oleg Kuporosov [mailto:olegk@mellanox.com]
> Sent: Wednesday, October 19, 2016 6:40 PM
> To: Pattan, Reshma <reshma.pattan@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the
> packet
> 
> Hello Reshma
> 
> > I just read this mail chain, to make every one aware again, I am
> > emphasizing on the point that I am also adding new "time_arraived"
> > field to mbuf struct as part of  below 17.02 Road map item.
> 
> Thanks for your work for extending statistics support in DPDK.
> 
> "time_arrived" points here for mostly RX path, while more common term used
> "timestamp"
> Allows also using it for TX path as well in the future.
> 

I will take a note of this for next revision.

Thanks,
Reshma

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  2016-10-13 14:35 ` [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet Oleg Kuporosov
  2016-10-18 15:43   ` Olivier Matz
@ 2017-01-24 15:27   ` Olivier MATZ
  1 sibling, 0 replies; 20+ messages in thread
From: Olivier MATZ @ 2017-01-24 15:27 UTC (permalink / raw)
  To: Oleg Kuporosov; +Cc: dev

On Thu, 13 Oct 2016 14:35:06 +0000, Oleg Kuporosov <olegk@mellanox.com>
wrote:
> The hard requirement of financial services industry is accurate
> timestamping aligned with the packet itself. This patch is to satisfy
> this requirement:
> 
> - include uint64_t timestamp field into rte_mbuf with minimal impact
> to throughput/latency. Keep it just simple uint64_t in ns (more than
> 580 years) would be enough for immediate needs while using full
>   struct timespec with twice bigger size would have much stronger
>   performance impact as missed cacheline0.
> 
> - it is possible as there is 6-bytes gap in 1st cacheline (fast path)
>   and moving uint16_t vlan_tci_outer field to 2nd cacheline.
> 
> - such move will only impact for pretty rare usable VLAN RX stripping
>   mode for outer TCI (it used only for one NIC i40e from the whole
> set and allows to keep minimal performance impact for RX/TX
> timestamps.
> 
> Signed-off-by: Oleg Kuporosov <olegk@mellanox.com>
> ---
>  lib/librte_mbuf/rte_mbuf.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 23b7bf8..1e1f2ed 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -851,8 +851,7 @@ struct rte_mbuf {
>  
>  	uint32_t seqn; /**< Sequence number. See also
> rte_reorder_insert() */ 
> -	/** Outer VLAN TCI (CPU order), valid if
> PKT_RX_QINQ_STRIPPED is set. */
> -	uint16_t vlan_tci_outer;
> +	uint64_t timestamp;       /**< Packet's timestamp, usually
> in ns */ 
>  	/* second cache line - fields only used in slow path or on
> TX */ MARKER cacheline1 __rte_cache_min_aligned;
> @@ -885,6 +884,9 @@ struct rte_mbuf {
>  		};
>  	};
>  
> +	/** Outer VLAN TCI (CPU order), valid if
> PKT_RX_QINQ_STRIPPED is set. */
> +	uint16_t vlan_tci_outer;
> +
>  	/** Size of the application private data. In case of an
> indirect
>  	 * mbuf, it stores the direct mbuf private data size. */
>  	uint16_t priv_size;

FYI, I posted a RFC patchset that introduces the timestamp field in the
mbuf for v17.05:
http://dpdk.org/ml/archives/dev/2017-January/056187.html


Regards,
Olivier

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps
  2016-10-13 14:35 ` [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps Oleg Kuporosov
@ 2017-04-25 14:02   ` Wu, Jingjing
  2017-04-25 16:22     ` Thomas Monjalon
  0 siblings, 1 reply; 20+ messages in thread
From: Wu, Jingjing @ 2017-04-25 14:02 UTC (permalink / raw)
  To: Oleg Kuporosov, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Oleg Kuporosov
> Sent: Thursday, October 13, 2016 10:35 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet
> timestamps
> 
> Implemented two methods of control
> 
> - by --enable-timestamps CL testpmd application we can enable timestamping
>   for all ports;
> - in interactive mode port config <port> timestamps on|off is able to
>   configure timestamping per specific port.
> 
> The control doesn't interact with IEEE1588 PTP implementation there as it is
> under macro compilation but can be extended in the future.
> 
> This feature is required for debugging/testing purposes for real time HW packet
> timestamping.

We have ieee1588fwd.c to demo the timesync enable/disable, can we reuse
The fwd engine instead of defining new commands?

Thanks
Jingjing

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps
  2017-04-25 14:02   ` Wu, Jingjing
@ 2017-04-25 16:22     ` Thomas Monjalon
  2017-04-28  0:19       ` Wu, Jingjing
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Monjalon @ 2017-04-25 16:22 UTC (permalink / raw)
  To: Wu, Jingjing; +Cc: dev, Oleg Kuporosov, olivier.matz

Hi,

25/04/2017 16:02, Wu, Jingjing:
> From: Oleg Kuporosov
> > Implemented two methods of control
> > 
> > - by --enable-timestamps CL testpmd application we can enable timestamping
> >   for all ports;
> > - in interactive mode port config <port> timestamps on|off is able to
> >   configure timestamping per specific port.
> > 
> > The control doesn't interact with IEEE1588 PTP implementation there as it is
> > under macro compilation but can be extended in the future.
> > 
> > This feature is required for debugging/testing purposes for real time HW packet
> > timestamping.
> 
> We have ieee1588fwd.c to demo the timesync enable/disable, can we reuse
> The fwd engine instead of defining new commands?

Yes for IEEE1588 feature, we should use app/test-pmd/ieee1588fwd.c.

There is more to say about this feature.

The main goal of this patchset was to add a timestamp in the mbuf.
It has been done by another patchset in 17.05.
Do we know how to test this timestamp in testpmd?

About IEEE1588 feature, why is there a config option?
	CONFIG_RTE_LIBRTE_IEEE1588
A feature should never be disabled at compile time.
There is also a runtime enablement with rte_eth_timesync_enable().

I think we need some discussions here.
Thanks

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps
  2017-04-25 16:22     ` Thomas Monjalon
@ 2017-04-28  0:19       ` Wu, Jingjing
  2017-04-28  9:12         ` Thomas Monjalon
  0 siblings, 1 reply; 20+ messages in thread
From: Wu, Jingjing @ 2017-04-28  0:19 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Oleg Kuporosov, olivier.matz

> 25/04/2017 16:02, Wu, Jingjing:
> > From: Oleg Kuporosov
> > > Implemented two methods of control
> > >
> > > - by --enable-timestamps CL testpmd application we can enable
> timestamping
> > >   for all ports;
> > > - in interactive mode port config <port> timestamps on|off is able to
> > >   configure timestamping per specific port.
> > >
> > > The control doesn't interact with IEEE1588 PTP implementation there
> > > as it is under macro compilation but can be extended in the future.
> > >
> > > This feature is required for debugging/testing purposes for real
> > > time HW packet timestamping.
> >
> > We have ieee1588fwd.c to demo the timesync enable/disable, can we
> > reuse The fwd engine instead of defining new commands?
> 
> Yes for IEEE1588 feature, we should use app/test-pmd/ieee1588fwd.c.
> 
> There is more to say about this feature.
> 
> The main goal of this patchset was to add a timestamp in the mbuf.
> It has been done by another patchset in 17.05.
OK. But it is not clear now what is the timestamp for, right?

> Do we know how to test this timestamp in testpmd?
>
Mbuf dump can show this value. The problem is if we can use the
rte_eth_timesync_enable/disable to indicate the timestamp
is in mbuf or not.

> About IEEE1588 feature, why is there a config option?
> 	CONFIG_RTE_LIBRTE_IEEE1588
> A feature should never be disabled at compile time.
> There is also a runtime enablement with rte_eth_timesync_enable().
> 
> I think we need some discussions here.

Yes, I agree.

> Thanks

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps
  2017-04-28  0:19       ` Wu, Jingjing
@ 2017-04-28  9:12         ` Thomas Monjalon
  2017-04-28 10:04           ` Olivier Matz
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Monjalon @ 2017-04-28  9:12 UTC (permalink / raw)
  To: Wu, Jingjing, Daniel Mrzyglod, Pablo de Lara
  Cc: dev, Oleg Kuporosov, olivier.matz

28/04/2017 02:19, Wu, Jingjing:
> > 25/04/2017 16:02, Wu, Jingjing:
> > > From: Oleg Kuporosov
> > > > Implemented two methods of control
> > > >
> > > > - by --enable-timestamps CL testpmd application we can enable
> > timestamping
> > > >   for all ports;
> > > > - in interactive mode port config <port> timestamps on|off is able to
> > > >   configure timestamping per specific port.
> > > >
> > > > The control doesn't interact with IEEE1588 PTP implementation there
> > > > as it is under macro compilation but can be extended in the future.
> > > >
> > > > This feature is required for debugging/testing purposes for real
> > > > time HW packet timestamping.
> > >
> > > We have ieee1588fwd.c to demo the timesync enable/disable, can we
> > > reuse The fwd engine instead of defining new commands?
> > 
> > Yes for IEEE1588 feature, we should use app/test-pmd/ieee1588fwd.c.
> > 
> > There is more to say about this feature.
> > 
> > The main goal of this patchset was to add a timestamp in the mbuf.
> > It has been done by another patchset in 17.05.
> 
> OK. But it is not clear now what is the timestamp for, right?

Olivier can comment on the Rx timestamp (PKT_RX_TIMESTAMP).

Please Jingjing (or Pablo or Daniel),
could you explain the relation with PKT_TX_IEEE1588_TMST?

> > Do we know how to test this timestamp in testpmd?
> 
> Mbuf dump can show this value. The problem is if we can use the
> rte_eth_timesync_enable/disable to indicate the timestamp
> is in mbuf or not.

Please could you describe the problem?

> > About IEEE1588 feature, why is there a config option?
> > 	CONFIG_RTE_LIBRTE_IEEE1588
> > A feature should never be disabled at compile time.
> > There is also a runtime enablement with rte_eth_timesync_enable().
> > 
> > I think we need some discussions here.
> 
> Yes, I agree.

What is your opinion?
What prevents from removing CONFIG_RTE_LIBRTE_IEEE1588?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps
  2017-04-28  9:12         ` Thomas Monjalon
@ 2017-04-28 10:04           ` Olivier Matz
  0 siblings, 0 replies; 20+ messages in thread
From: Olivier Matz @ 2017-04-28 10:04 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Wu, Jingjing, Daniel Mrzyglod, Pablo de Lara, dev, Oleg Kuporosov

Hi,

On Fri, 28 Apr 2017 11:12:29 +0200, Thomas Monjalon <thomas@monjalon.net> wrote:
> 28/04/2017 02:19, Wu, Jingjing:
> > > 25/04/2017 16:02, Wu, Jingjing:  
> > > > From: Oleg Kuporosov  
> > > > > Implemented two methods of control
> > > > >
> > > > > - by --enable-timestamps CL testpmd application we can enable  
> > > timestamping  
> > > > >   for all ports;
> > > > > - in interactive mode port config <port> timestamps on|off is able to
> > > > >   configure timestamping per specific port.
> > > > >
> > > > > The control doesn't interact with IEEE1588 PTP implementation there
> > > > > as it is under macro compilation but can be extended in the future.
> > > > >
> > > > > This feature is required for debugging/testing purposes for real
> > > > > time HW packet timestamping.  
> > > >
> > > > We have ieee1588fwd.c to demo the timesync enable/disable, can we
> > > > reuse The fwd engine instead of defining new commands?  
> > > 
> > > Yes for IEEE1588 feature, we should use app/test-pmd/ieee1588fwd.c.
> > > 
> > > There is more to say about this feature.
> > > 
> > > The main goal of this patchset was to add a timestamp in the mbuf.
> > > It has been done by another patchset in 17.05.  
> > 
> > OK. But it is not clear now what is the timestamp for, right?  
> 
> Olivier can comment on the Rx timestamp (PKT_RX_TIMESTAMP).

The timestamp can be added in a mbuf by a PMD or by software.
The unit and time reference (the 0) is not normalized: they
can be different between 2 PMDs but must be the same for one
PMD on the same port.

As of now, this field could be used by an application to know when
a packet was effectively received by the hardware. The use case could
be high frequency trading for instance. The PMD can provide an API to
convert timestamp value into another unit.

Possible evolutions are:
- a library to normalize and compare timestamps from different
  sources
- implementations in more PMDs



> 
> Please Jingjing (or Pablo or Daniel),
> could you explain the relation with PKT_TX_IEEE1588_TMST?
> 
> > > Do we know how to test this timestamp in testpmd?  
> > 
> > Mbuf dump can show this value. The problem is if we can use the
> > rte_eth_timesync_enable/disable to indicate the timestamp
> > is in mbuf or not.  
> 
> Please could you describe the problem?
> 
> > > About IEEE1588 feature, why is there a config option?
> > > 	CONFIG_RTE_LIBRTE_IEEE1588
> > > A feature should never be disabled at compile time.
> > > There is also a runtime enablement with rte_eth_timesync_enable().
> > > 
> > > I think we need some discussions here.  
> > 
> > Yes, I agree.  
> 
> What is your opinion?
> What prevents from removing CONFIG_RTE_LIBRTE_IEEE1588?
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support
  2016-10-17 11:24 ` [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support Nélio Laranjeiro
@ 2017-10-17 13:57   ` Thomas Monjalon
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Monjalon @ 2017-10-17 13:57 UTC (permalink / raw)
  To: Oleg Kuporosov; +Cc: dev, Nélio Laranjeiro

> On Thu, Oct 13, 2016 at 02:35:05PM +0000, Oleg Kuporosov wrote:
> > Oleg Kuporosov (3):
> >   mbuf: embedding timestamp into the packet
> >   app/testpmd: enabled control for packet timestamps
> >   net/mlx5: implementation of Rx packet timestamping support

One year later, it has been superseded by Raslan's patches.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-10-17 13:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-13 14:35 [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support Oleg Kuporosov
2016-10-13 14:35 ` [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet Oleg Kuporosov
2016-10-18 15:43   ` Olivier Matz
2016-10-19 10:14     ` Pattan, Reshma
2016-10-19 17:40       ` Oleg Kuporosov
2016-10-25 12:39         ` Pattan, Reshma
2016-10-19 13:31     ` Ananyev, Konstantin
2016-10-20  8:03       ` Oleg Kuporosov
2016-10-20 10:57         ` Jan Blunck
2016-10-19 17:08     ` Oleg Kuporosov
2017-01-24 15:27   ` Olivier MATZ
2016-10-13 14:35 ` [dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps Oleg Kuporosov
2017-04-25 14:02   ` Wu, Jingjing
2017-04-25 16:22     ` Thomas Monjalon
2017-04-28  0:19       ` Wu, Jingjing
2017-04-28  9:12         ` Thomas Monjalon
2017-04-28 10:04           ` Olivier Matz
2016-10-13 14:35 ` [dpdk-dev] [PATCH 3/3] net/mlx5: implementation of Rx packet timestamping support Oleg Kuporosov
2016-10-17 11:24 ` [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support Nélio Laranjeiro
2017-10-17 13:57   ` Thomas Monjalon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).