DPDK patches and discussions
* [dpdk-dev] [PATCH v2 0/6] sched: bugfixes and enhancements
@ 2015-03-10 16:13 Stephen Hemminger
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 1/6] rte_sched: don't put tabs in log messages Stephen Hemminger
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-10 16:13 UTC (permalink / raw)
  To: cristian.dumitrescu; +Cc: dev

These are bug fixes and enhancements to the RTE sched
code, based on the latest 2.0 git.

Stephen Hemminger (6):
  rte_sched: don't put tabs in log messages
  rte_sched: make RED optional at runtime
  rte_sched: expand scheduler hierarchy for more VLAN's
  rte_sched: keep track of RED drops
  rte_sched: allow reading statistics without clearing
  rte_sched: eliminate floating point in calculating byte clock

 app/test/test_sched.c        |   4 +-
 examples/qos_sched/stats.c   |  16 +++++--
 lib/librte_mbuf/rte_mbuf.h   |   7 ++-
 lib/librte_sched/rte_sched.c | 109 ++++++++++++++++++++++++++++---------------
 lib/librte_sched/rte_sched.h |  62 +++++++++++++++---------
 5 files changed, 129 insertions(+), 69 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH v2 1/6] rte_sched: don't put tabs in log messages
  2015-03-10 16:13 [dpdk-dev] [PATCH v2 0/6] sched: bugfixes and enhancements Stephen Hemminger
@ 2015-03-10 16:13 ` Stephen Hemminger
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 2/6] rte_sched: make RED optional at runtime Stephen Hemminger
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-10 16:13 UTC (permalink / raw)
  To: cristian.dumitrescu; +Cc: dev, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

syslog does not handle tabs in log messages; each tab gets translated to "#011"

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 -- no changes

 lib/librte_sched/rte_sched.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 95dee27..8249dbb 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -495,10 +495,10 @@ rte_sched_port_log_pipe_profile(struct rte_sched_port *port, uint32_t i)
 	struct rte_sched_pipe_profile *p = port->pipe_profiles + i;
 
 	RTE_LOG(INFO, SCHED, "Low level config for pipe profile %u:\n"
-		"\tToken bucket: period = %u, credits per period = %u, size = %u\n"
-		"\tTraffic classes: period = %u, credits per period = [%u, %u, %u, %u]\n"
-		"\tTraffic class 3 oversubscription: weight = %hhu\n"
-		"\tWRR cost: [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu]\n",
+		"    Token bucket: period = %u, credits per period = %u, size = %u\n"
+		"    Traffic classes: period = %u, credits per period = [%u, %u, %u, %u]\n"
+		"    Traffic class 3 oversubscription: weight = %hhu\n"
+		"    WRR cost: [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu]\n",
 		i,
 
 		/* Token bucket */
@@ -710,9 +710,9 @@ rte_sched_port_log_subport_config(struct rte_sched_port *port, uint32_t i)
 	struct rte_sched_subport *s = port->subport + i;
 
 	RTE_LOG(INFO, SCHED, "Low level config for subport %u:\n"
-		"\tToken bucket: period = %u, credits per period = %u, size = %u\n"
-		"\tTraffic classes: period = %u, credits per period = [%u, %u, %u, %u]\n"
-		"\tTraffic class 3 oversubscription: wm min = %u, wm max = %u\n",
+		"    Token bucket: period = %u, credits per period = %u, size = %u\n"
+		"    Traffic classes: period = %u, credits per period = [%u, %u, %u, %u]\n"
+		"    Traffic class 3 oversubscription: wm min = %u, wm max = %u\n",
 		i,
 
 		/* Token bucket */
-- 
2.1.4


* [dpdk-dev] [PATCH v2 2/6] rte_sched: make RED optional at runtime
  2015-03-10 16:13 [dpdk-dev] [PATCH v2 0/6] sched: bugfixes and enhancements Stephen Hemminger
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 1/6] rte_sched: don't put tabs in log messages Stephen Hemminger
@ 2015-03-10 16:13 ` Stephen Hemminger
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 3/6] rte_sched: expand scheduler hierarchy for more VLAN's Stephen Hemminger
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-10 16:13 UTC (permalink / raw)
  To: cristian.dumitrescu; +Cc: dev, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

Allow building with RTE_SCHED_RED enabled while still being
able to disable RED on a per-queue basis at runtime.

RED is disabled for a queue unless its min/max thresholds are set.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 -- no changes

 lib/librte_sched/rte_sched.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 8249dbb..1fd2cce 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -636,6 +636,12 @@ rte_sched_port_config(struct rte_sched_port_params *params)
 		uint32_t j;
 
 		for (j = 0; j < e_RTE_METER_COLORS; j++) {
+			/* if min/max are both zero, then RED is disabled */
+			if ((params->red_params[i][j].min_th |
+			     params->red_params[i][j].max_th) == 0) {
+				continue;
+			}
+
 			if (rte_red_config_init(&port->red_config[i][j],
 				params->red_params[i][j].wq_log2,
 				params->red_params[i][j].min_th,
@@ -1069,6 +1075,9 @@ rte_sched_port_red_drop(struct rte_sched_port *port, struct rte_mbuf *pkt, uint3
 	color = rte_sched_port_pkt_read_color(pkt);
 	red_cfg = &port->red_config[tc_index][color];
 
+	if ( (red_cfg->min_th | red_cfg->max_th) == 0)
+		return 0;
+
 	qe = port->queue_extra + qindex;
 	red = &qe->red;
 
-- 
2.1.4


* [dpdk-dev] [PATCH v2 3/6] rte_sched: expand scheduler hierarchy for more VLAN's
  2015-03-10 16:13 [dpdk-dev] [PATCH v2 0/6] sched: bugfixes and enhancements Stephen Hemminger
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 1/6] rte_sched: don't put tabs in log messages Stephen Hemminger
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 2/6] rte_sched: make RED optional at runtime Stephen Hemminger
@ 2015-03-10 16:13 ` Stephen Hemminger
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 4/6] rte_sched: keep track of RED drops Stephen Hemminger
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-10 16:13 UTC (permalink / raw)
  To: cristian.dumitrescu; +Cc: dev, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

In the original code, the QoS subport field is limited to 8 bits,
but customers demanded the ability to support the full number of
VLANs (4096); therefore, use the full tag field of the mbuf.

Resize the pipe field as well, to allow for more pipes in the future
and to avoid expensive bitfield access.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 -- no changes

 lib/librte_mbuf/rte_mbuf.h   |  7 +++++--
 lib/librte_sched/rte_sched.h | 38 ++++++++++++++++++++++++--------------
 2 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 17ba791..ef420ec 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -284,8 +284,11 @@ struct rte_mbuf {
 			/**< First 4 flexible bytes or FD ID, dependent on
 			     PKT_RX_FDIR_* flag in ol_flags. */
 		} fdir;           /**< Filter identifier if FDIR enabled */
-		uint32_t sched;   /**< Hierarchical scheduler */
-		uint32_t usr;	  /**< User defined tags. See rte_distributor_process() */
+		struct {
+			uint32_t lo;
+			uint32_t hi;
+		} sched;     	  /**< Hierarchical scheduler */
+		uint32_t usr;	  /**< User defined tags. See rte_distributor_process */
 	} hash;                   /**< hash information */
 
 	uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */
diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
index e6bba22..dda287f 100644
--- a/lib/librte_sched/rte_sched.h
+++ b/lib/librte_sched/rte_sched.h
@@ -195,16 +195,20 @@ struct rte_sched_port_params {
 #endif
 };
 
-/** Path through the scheduler hierarchy used by the scheduler enqueue operation to
-identify the destination queue for the current packet. Stored in the field hash.sched
-of struct rte_mbuf of each packet, typically written by the classification stage and read by
-scheduler enqueue.*/
+/*
+ * Path through the scheduler hierarchy used by the scheduler enqueue
+ * operation to identify the destination queue for the current
+ * packet. Stored in the field pkt.hash.sched of struct rte_mbuf of
+ * each packet, typically written by the classification stage and read
+ * by scheduler enqueue.
+ */
 struct rte_sched_port_hierarchy {
-	uint32_t queue:2;                /**< Queue ID (0 .. 3) */
-	uint32_t traffic_class:2;        /**< Traffic class ID (0 .. 3)*/
-	uint32_t pipe:20;                /**< Pipe ID */
-	uint32_t subport:6;              /**< Subport ID */
-	uint32_t color:2;                /**< Color */
+	uint16_t queue:2;                /**< Queue ID (0 .. 3) */
+	uint16_t traffic_class:2;        /**< Traffic class ID (0 .. 3)*/
+	uint16_t color:2;                /**< Color */
+	uint16_t unused:10;
+	uint16_t subport;              	 /**< Subport ID */
+	uint32_t pipe;		         /**< Pipe ID */
 };
 
 /*
@@ -350,12 +354,15 @@ rte_sched_queue_read_stats(struct rte_sched_port *port,
  */
 static inline void
 rte_sched_port_pkt_write(struct rte_mbuf *pkt,
-	uint32_t subport, uint32_t pipe, uint32_t traffic_class, uint32_t queue, enum rte_meter_color color)
+			 uint32_t subport, uint32_t pipe,
+			 uint32_t traffic_class,
+			 uint32_t queue, enum rte_meter_color color)
 {
-	struct rte_sched_port_hierarchy *sched = (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
+	struct rte_sched_port_hierarchy *sched
+		= (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
 
-	sched->color = (uint32_t) color;
 	sched->subport = subport;
+	sched->color = (uint32_t) color;
 	sched->pipe = pipe;
 	sched->traffic_class = traffic_class;
 	sched->queue = queue;
@@ -379,9 +386,12 @@ rte_sched_port_pkt_write(struct rte_mbuf *pkt,
  *
  */
 static inline void
-rte_sched_port_pkt_read_tree_path(struct rte_mbuf *pkt, uint32_t *subport, uint32_t *pipe, uint32_t *traffic_class, uint32_t *queue)
+rte_sched_port_pkt_read_tree_path(struct rte_mbuf *pkt, uint32_t *subport,
+				  uint32_t *pipe, uint32_t *traffic_class,
+				  uint32_t *queue)
 {
-	struct rte_sched_port_hierarchy *sched = (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
+	struct rte_sched_port_hierarchy *sched
+		= (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
 
 	*subport = sched->subport;
 	*pipe = sched->pipe;
-- 
2.1.4


* [dpdk-dev] [PATCH v2 4/6] rte_sched: keep track of RED drops
  2015-03-10 16:13 [dpdk-dev] [PATCH v2 0/6] sched: bugfixes and enhancements Stephen Hemminger
                   ` (2 preceding siblings ...)
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 3/6] rte_sched: expand scheduler hierarchy for more VLAN's Stephen Hemminger
@ 2015-03-10 16:13 ` Stephen Hemminger
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing Stephen Hemminger
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte clock Stephen Hemminger
  5 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-10 16:13 UTC (permalink / raw)
  To: cristian.dumitrescu; +Cc: dev, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

Add new statistic to keep track of drops due to RED.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 -- no changes

 lib/librte_sched/rte_sched.c | 28 +++++++++++++++++++++++-----
 lib/librte_sched/rte_sched.h |  6 ++++++
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 1fd2cce..2071414 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -1028,7 +1028,9 @@ rte_sched_port_update_subport_stats(struct rte_sched_port *port, uint32_t qindex
 }
 
 static inline void
-rte_sched_port_update_subport_stats_on_drop(struct rte_sched_port *port, uint32_t qindex, struct rte_mbuf *pkt)
+rte_sched_port_update_subport_stats_on_drop(struct rte_sched_port *port,
+					    uint32_t qindex,
+					    struct rte_mbuf *pkt, uint32_t red)
 {
 	struct rte_sched_subport *s = port->subport + (qindex / rte_sched_port_queues_per_subport(port));
 	uint32_t tc_index = (qindex >> 2) & 0x3;
@@ -1036,6 +1038,9 @@ rte_sched_port_update_subport_stats_on_drop(struct rte_sched_port *port, uint32_
 
 	s->stats.n_pkts_tc_dropped[tc_index] += 1;
 	s->stats.n_bytes_tc_dropped[tc_index] += pkt_len;
+#ifdef RTE_SCHED_RED
+	s->stats.n_pkts_red_dropped[tc_index] += red;
+#endif
 }
 
 static inline void
@@ -1049,13 +1054,18 @@ rte_sched_port_update_queue_stats(struct rte_sched_port *port, uint32_t qindex,
 }
 
 static inline void
-rte_sched_port_update_queue_stats_on_drop(struct rte_sched_port *port, uint32_t qindex, struct rte_mbuf *pkt)
+rte_sched_port_update_queue_stats_on_drop(struct rte_sched_port *port,
+					  uint32_t qindex,
+					  struct rte_mbuf *pkt, uint32_t red)
 {
 	struct rte_sched_queue_extra *qe = port->queue_extra + qindex;
 	uint32_t pkt_len = pkt->pkt_len;
 
 	qe->stats.n_pkts_dropped += 1;
 	qe->stats.n_bytes_dropped += pkt_len;
+#ifdef RTE_SCHED_RED
+	qe->stats.n_pkts_red_dropped += red;
+#endif
 }
 
 #endif /* RTE_SCHED_COLLECT_STATS */
@@ -1206,12 +1216,20 @@ rte_sched_port_enqueue_qwa(struct rte_sched_port *port, uint32_t qindex, struct
 	qlen = q->qw - q->qr;
 
 	/* Drop the packet (and update drop stats) when queue is full */
-	if (unlikely(rte_sched_port_red_drop(port, pkt, qindex, qlen) || (qlen >= qsize))) {
+	if (unlikely(rte_sched_port_red_drop(port, pkt, qindex, qlen))) {
+#ifdef RTE_SCHED_COLLECT_STATS
+		rte_sched_port_update_subport_stats_on_drop(port, qindex, pkt, 1);
+		rte_sched_port_update_queue_stats_on_drop(port, qindex, pkt, 1);
+#endif
 		rte_pktmbuf_free(pkt);
+	}
+
+	if (qlen >= qsize) {
 #ifdef RTE_SCHED_COLLECT_STATS
-		rte_sched_port_update_subport_stats_on_drop(port, qindex, pkt);
-		rte_sched_port_update_queue_stats_on_drop(port, qindex, pkt);
+		rte_sched_port_update_subport_stats_on_drop(port, qindex, pkt, 0);
+		rte_sched_port_update_queue_stats_on_drop(port, qindex, pkt, 0);
 #endif
+		rte_pktmbuf_free(pkt);
 		return 0;
 	}
 
diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
index dda287f..e9bf18a 100644
--- a/lib/librte_sched/rte_sched.h
+++ b/lib/librte_sched/rte_sched.h
@@ -140,6 +140,9 @@ struct rte_sched_subport_stats {
 	                                      subport for each traffic class*/
 	uint32_t n_bytes_tc_dropped[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE]; /**< Number of bytes dropped by the current
                                           subport for each traffic class due to subport queues being full or congested */
+#ifdef RTE_SCHED_RED
+	uint32_t n_pkts_red_dropped[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE]; /**< Number of packets dropped by red */
+#endif
 };
 
 /** Pipe configuration parameters. The period and credits_per_period parameters are measured
@@ -168,6 +171,9 @@ struct rte_sched_queue_stats {
 	/* Packets */
 	uint32_t n_pkts;                 /**< Number of packets successfully written to current queue */
 	uint32_t n_pkts_dropped;         /**< Number of packets dropped due to current queue being full or congested */
+#ifdef RTE_SCHED_RED
+	uint32_t n_pkts_red_dropped;
+#endif
 
 	/* Bytes */
 	uint32_t n_bytes;                /**< Number of bytes successfully written to current queue */
-- 
2.1.4


* [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
  2015-03-10 16:13 [dpdk-dev] [PATCH v2 0/6] sched: bugfixes and enhancements Stephen Hemminger
                   ` (3 preceding siblings ...)
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 4/6] rte_sched: keep track of RED drops Stephen Hemminger
@ 2015-03-10 16:13 ` Stephen Hemminger
  2015-03-12 19:28   ` Dumitrescu, Cristian
  2015-03-16 20:00   ` Dumitrescu, Cristian
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte clock Stephen Hemminger
  5 siblings, 2 replies; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-10 16:13 UTC (permalink / raw)
  To: cristian.dumitrescu; +Cc: dev, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

The rte_sched statistics API should allow reading statistics without
clearing. Make auto-clear optional.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 -- changed from separate read/clear functions to a clear flag

 app/test/test_sched.c        |  4 ++--
 examples/qos_sched/stats.c   | 16 +++++++++++-----
 lib/librte_sched/rte_sched.c | 44 ++++++++++++++++++++++----------------------
 lib/librte_sched/rte_sched.h | 18 ++++++++++--------
 4 files changed, 45 insertions(+), 37 deletions(-)

diff --git a/app/test/test_sched.c b/app/test/test_sched.c
index 60c62de..3c9c9e3 100644
--- a/app/test/test_sched.c
+++ b/app/test/test_sched.c
@@ -208,13 +208,13 @@ test_sched(void)
 
 	struct rte_sched_subport_stats subport_stats;
 	uint32_t tc_ov;
-	rte_sched_subport_read_stats(port, SUBPORT, &subport_stats, &tc_ov);
+	rte_sched_subport_read_stats(port, SUBPORT, &subport_stats, &tc_ov, 1);
 #if 0
 	TEST_ASSERT_EQUAL(subport_stats.n_pkts_tc[TC-1], 10, "Wrong subport stats\n");
 #endif
 	struct rte_sched_queue_stats queue_stats;
 	uint16_t qlen;
-	rte_sched_queue_read_stats(port, QUEUE, &queue_stats, &qlen);
+	rte_sched_queue_read_stats(port, QUEUE, &queue_stats, &qlen, 1);
 #if 0
 	TEST_ASSERT_EQUAL(queue_stats.n_pkts, 10, "Wrong queue stats\n");
 #endif
diff --git a/examples/qos_sched/stats.c b/examples/qos_sched/stats.c
index b4db7b5..88caa54 100644
--- a/examples/qos_sched/stats.c
+++ b/examples/qos_sched/stats.c
@@ -61,7 +61,7 @@ qavg_q(uint8_t port_id, uint32_t subport_id, uint32_t pipe_id, uint8_t tc, uint8
         average = 0;
 
         for (count = 0; count < qavg_ntimes; count++) {
-                rte_sched_queue_read_stats(port, queue_id, &stats, &qlen);
+                rte_sched_queue_read_stats(port, queue_id, &stats, &qlen, 1);
                 average += qlen;
                 usleep(qavg_period);
         }
@@ -99,7 +99,9 @@ qavg_tcpipe(uint8_t port_id, uint32_t subport_id, uint32_t pipe_id, uint8_t tc)
         for (count = 0; count < qavg_ntimes; count++) {
                 part_average = 0;
                 for (i = 0; i < RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; i++) {
-                        rte_sched_queue_read_stats(port, queue_id + (tc * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + i), &stats, &qlen);
+                        rte_sched_queue_read_stats(port,
+				   queue_id + (tc * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + i),
+				   &stats, &qlen, 1);
                         part_average += qlen;
                 }
                 average += part_average / RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS;
@@ -138,7 +140,8 @@ qavg_pipe(uint8_t port_id, uint32_t subport_id, uint32_t pipe_id)
         for (count = 0; count < qavg_ntimes; count++) {
                 part_average = 0;
                 for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; i++) {
-                        rte_sched_queue_read_stats(port, queue_id + i, &stats, &qlen);
+                        rte_sched_queue_read_stats(port, queue_id + i,
+						   &stats, &qlen, 1);
                         part_average += qlen;
                 }
                 average += part_average / (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS);
@@ -178,7 +181,9 @@ qavg_tcsubport(uint8_t port_id, uint32_t subport_id, uint8_t tc)
                         queue_id = RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS * (subport_id * port_params.n_pipes_per_subport + i);
 
                         for (j = 0; j < RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; j++) {
-                                rte_sched_queue_read_stats(port, queue_id + (tc * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + j), &stats, &qlen);
+                                rte_sched_queue_read_stats(port,
+							   queue_id + (tc * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + j),
+							   &stats, &qlen, 1);
                                 part_average += qlen;
                         }
                 }
@@ -220,7 +225,8 @@ qavg_subport(uint8_t port_id, uint32_t subport_id)
                         queue_id = RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS * (subport_id * port_params.n_pipes_per_subport + i);
 
                         for (j = 0; j < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; j++) {
-                                rte_sched_queue_read_stats(port, queue_id + j, &stats, &qlen);
+                                rte_sched_queue_read_stats(port, queue_id + j,
+							   &stats, &qlen, 1);
                                 part_average += qlen;
                         }
                 }
diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 2071414..74d0e0a 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -909,56 +909,56 @@ rte_sched_pipe_config(struct rte_sched_port *port,
 
 int
 rte_sched_subport_read_stats(struct rte_sched_port *port,
-	uint32_t subport_id,
-	struct rte_sched_subport_stats *stats,
-	uint32_t *tc_ov)
+			     uint32_t subport_id,
+			     struct rte_sched_subport_stats *stats,
+			     uint32_t *tc_ov, int clear)
 {
 	struct rte_sched_subport *s;
 
 	/* Check user parameters */
-	if ((port == NULL) ||
-	    (subport_id >= port->n_subports_per_port) ||
-		(stats == NULL) ||
-		(tc_ov == NULL)) {
+	if (port == NULL || subport_id >= port->n_subports_per_port) 
 		return -1;
-	}
+
 	s = port->subport + subport_id;
 
 	/* Copy subport stats and clear */
-	memcpy(stats, &s->stats, sizeof(struct rte_sched_subport_stats));
-	memset(&s->stats, 0, sizeof(struct rte_sched_subport_stats));
+	if (stats)
+		*stats = s->stats;
+	if (clear)
+		memset(&s->stats, 0, sizeof(struct rte_sched_subport_stats));
 
 	/* Subport TC ovesubscription status */
-	*tc_ov = s->tc_ov;
+	if (tc_ov)
+		*tc_ov = s->tc_ov;
 
 	return 0;
 }
 
 int
 rte_sched_queue_read_stats(struct rte_sched_port *port,
-	uint32_t queue_id,
-	struct rte_sched_queue_stats *stats,
-	uint16_t *qlen)
+			   uint32_t queue_id,
+			   struct rte_sched_queue_stats *stats,
+			   uint16_t *qlen, int clear)
 {
 	struct rte_sched_queue *q;
 	struct rte_sched_queue_extra *qe;
 
 	/* Check user parameters */
-	if ((port == NULL) ||
-	    (queue_id >= rte_sched_port_queues_per_port(port)) ||
-		(stats == NULL) ||
-		(qlen == NULL)) {
+	if (port == NULL || queue_id >= rte_sched_port_queues_per_port(port))
 		return -1;
-	}
+
 	q = port->queue + queue_id;
 	qe = port->queue_extra + queue_id;
 
 	/* Copy queue stats and clear */
-	memcpy(stats, &qe->stats, sizeof(struct rte_sched_queue_stats));
-	memset(&qe->stats, 0, sizeof(struct rte_sched_queue_stats));
+	if (stats)
+		*stats = qe->stats;
+	if (clear)
+		memset(&qe->stats, 0, sizeof(struct rte_sched_queue_stats));
 
 	/* Queue length */
-	*qlen = q->qw - q->qr;
+	if (qlen)
+		*qlen = q->qw - q->qr;
 
 	return 0;
 }
diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
index e9bf18a..f0bf088 100644
--- a/lib/librte_sched/rte_sched.h
+++ b/lib/librte_sched/rte_sched.h
@@ -308,14 +308,15 @@ rte_sched_port_get_memory_footprint(struct rte_sched_port_params *params);
  * @param tc_ov
  *   Pointer to pre-allocated 4-entry array where the oversubscription status for
  *   each of the 4 subport traffic classes should be stored.
+ * @param clear
+ *   Reset statistics after read
  * @return
  *   0 upon success, error code otherwise
  */
 int
-rte_sched_subport_read_stats(struct rte_sched_port *port,
-	uint32_t subport_id,
-	struct rte_sched_subport_stats *stats,
-	uint32_t *tc_ov);
+rte_sched_subport_read_stats(struct rte_sched_port *port, uint32_t subport_id,
+			     struct rte_sched_subport_stats *stats,
+			     uint32_t *tc_ov, int clear);
 
 /**
  * Hierarchical scheduler queue statistics read
@@ -329,14 +330,15 @@ rte_sched_subport_read_stats(struct rte_sched_port *port,
  *   counters should be stored
  * @param qlen
  *   Pointer to pre-allocated variable where the current queue length should be stored.
+ * @param clear
+ *   Reset statistics after read
  * @return
  *   0 upon success, error code otherwise
  */
 int
-rte_sched_queue_read_stats(struct rte_sched_port *port,
-	uint32_t queue_id,
-	struct rte_sched_queue_stats *stats,
-	uint16_t *qlen);
+rte_sched_queue_read_stats(struct rte_sched_port *port, uint32_t queue_id,
+			   struct rte_sched_queue_stats *stats,
+			   uint16_t *qlen, int clear);
 
 /*
  * Run-time
-- 
2.1.4


* [dpdk-dev] [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte clock
  2015-03-10 16:13 [dpdk-dev] [PATCH v2 0/6] sched: bugfixes and enhancements Stephen Hemminger
                   ` (4 preceding siblings ...)
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing Stephen Hemminger
@ 2015-03-10 16:13 ` Stephen Hemminger
  2015-03-12 19:06   ` Dumitrescu, Cristian
  5 siblings, 1 reply; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-10 16:13 UTC (permalink / raw)
  To: cristian.dumitrescu; +Cc: dev, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

The old code was doing a floating point divide for each rte_dequeue(),
which is very expensive. Change to using fixed-point scaled math instead.
This improved performance from 5 Gbit/sec to 10 Gbit/sec.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 -- no changes
      despite objections, the performance observation is real
      on Intel(R) Core(TM) i7-3770 CPU

 lib/librte_sched/rte_sched.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 74d0e0a..522a647 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -102,6 +102,9 @@
 
 #define RTE_SCHED_BMP_POS_INVALID             UINT32_MAX
 
+/* For cycles_per_byte calculation */
+#define RTE_SCHED_TIME_SHIFT		      20
+
 struct rte_sched_subport {
 	/* Token bucket (TB) */
 	uint64_t tb_time; /* time of last update */
@@ -239,7 +242,7 @@ struct rte_sched_port {
 	uint64_t time_cpu_cycles;     /* Current CPU time measured in CPU cyles */
 	uint64_t time_cpu_bytes;      /* Current CPU time measured in bytes */
 	uint64_t time;                /* Current NIC TX time measured in bytes */
-	double cycles_per_byte;       /* CPU cycles per byte */
+	uint32_t cycles_per_byte;       /* CPU cycles per byte (scaled) */
 
 	/* Scheduling loop detection */
 	uint32_t pipe_loop;
@@ -657,7 +660,9 @@ rte_sched_port_config(struct rte_sched_port_params *params)
 	port->time_cpu_cycles = rte_get_tsc_cycles();
 	port->time_cpu_bytes = 0;
 	port->time = 0;
-	port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double) params->rate);
+
+	port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT)
+		/ params->rate;
 
 	/* Scheduling loop detection */
 	port->pipe_loop = RTE_SCHED_PIPE_INVALID;
@@ -2126,11 +2131,12 @@ rte_sched_port_time_resync(struct rte_sched_port *port)
 {
 	uint64_t cycles = rte_get_tsc_cycles();
 	uint64_t cycles_diff = cycles - port->time_cpu_cycles;
-	double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
+	uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT)
+		/ port->cycles_per_byte;
 
 	/* Advance port time */
 	port->time_cpu_cycles = cycles;
-	port->time_cpu_bytes += (uint64_t) bytes_diff;
+	port->time_cpu_bytes += bytes_diff;
 	if (port->time < port->time_cpu_bytes) {
 		port->time = port->time_cpu_bytes;
 	}
-- 
2.1.4


* Re: [dpdk-dev] [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte clock
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte clock Stephen Hemminger
@ 2015-03-12 19:06   ` Dumitrescu, Cristian
  2015-03-12 23:09     ` Stephen Hemminger
  0 siblings, 1 reply; 16+ messages in thread
From: Dumitrescu, Cristian @ 2015-03-12 19:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, March 10, 2015 4:14 PM
> To: Dumitrescu, Cristian
> Cc: dev@dpdk.org; Stephen Hemminger; Stephen Hemminger
> Subject: [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte
> clock
> 
> From: Stephen Hemminger <shemming@brocade.com>
> 
> The old code was doing a floating point divide for each rte_dequeue()
> which is very expensive. Change to using fixed point scaled math instead.
> This improved performance from 5Gbit/sec to 10 Gbit/sec
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> v2 -- no changes
>       despite objections, the performance observation is real
>       on Intel(R) Core(TM) i7-3770 CPU
> 
>  lib/librte_sched/rte_sched.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> index 74d0e0a..522a647 100644
> --- a/lib/librte_sched/rte_sched.c
> +++ b/lib/librte_sched/rte_sched.c
> @@ -102,6 +102,9 @@
> 
>  #define RTE_SCHED_BMP_POS_INVALID             UINT32_MAX
> 
> +/* For cycles_per_byte calculation */
> +#define RTE_SCHED_TIME_SHIFT		      20
> +
>  struct rte_sched_subport {
>  	/* Token bucket (TB) */
>  	uint64_t tb_time; /* time of last update */
> @@ -239,7 +242,7 @@ struct rte_sched_port {
>  	uint64_t time_cpu_cycles;     /* Current CPU time measured in CPU
> cyles */
>  	uint64_t time_cpu_bytes;      /* Current CPU time measured in bytes
> */
>  	uint64_t time;                /* Current NIC TX time measured in bytes */
> -	double cycles_per_byte;       /* CPU cycles per byte */
> +	uint32_t cycles_per_byte;       /* CPU cycles per byte (scaled) */
> 
>  	/* Scheduling loop detection */
>  	uint32_t pipe_loop;
> @@ -657,7 +660,9 @@ rte_sched_port_config(struct
> rte_sched_port_params *params)
>  	port->time_cpu_cycles = rte_get_tsc_cycles();
>  	port->time_cpu_bytes = 0;
>  	port->time = 0;
> -	port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double)
> params->rate);
> +
> +	port->cycles_per_byte = (rte_get_tsc_hz() <<
> RTE_SCHED_TIME_SHIFT)
> +		/ params->rate;
> 
>  	/* Scheduling loop detection */
>  	port->pipe_loop = RTE_SCHED_PIPE_INVALID;
> @@ -2126,11 +2131,12 @@ rte_sched_port_time_resync(struct
> rte_sched_port *port)
>  {
>  	uint64_t cycles = rte_get_tsc_cycles();
>  	uint64_t cycles_diff = cycles - port->time_cpu_cycles;
> -	double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
> +	uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT)
> +		/ port->cycles_per_byte;
> 
>  	/* Advance port time */
>  	port->time_cpu_cycles = cycles;
> -	port->time_cpu_bytes += (uint64_t) bytes_diff;
> +	port->time_cpu_bytes += bytes_diff;
>  	if (port->time < port->time_cpu_bytes) {
>  		port->time = port->time_cpu_bytes;
>  	}
> --
> 2.1.4

Stephen,

We agreed during the previous round to look at the 64-bit multiplication option, but it looks like this patch is identical to the previous one. Did you run into any issues implementing that approach? As stated before, I do not think this is the best solution, for the reasons listed previously, and this part of the code is too sensitive to take the risk.

Since Thomas indicated these patches will be considered for the 2.1 rather than the 2.0 release, it looks like we have some time to refine this work. I would reiterate the proposal that Thomas made: re-submit the patches where we have consensus, and keep this one out for the moment; you and I can sync up offline and come back with an implementation proposal that would hopefully address the previous concerns for the 2.1 release. Would this work for you?

Thank you for your work and for your understanding!

Regards,
Cristian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing Stephen Hemminger
@ 2015-03-12 19:28   ` Dumitrescu, Cristian
  2015-03-12 23:06     ` Stephen Hemminger
  2015-03-16 20:00   ` Dumitrescu, Cristian
  1 sibling, 1 reply; 16+ messages in thread
From: Dumitrescu, Cristian @ 2015-03-12 19:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, March 10, 2015 4:14 PM
> To: Dumitrescu, Cristian
> Cc: dev@dpdk.org; Stephen Hemminger; Stephen Hemminger
> Subject: [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
> 
> From: Stephen Hemminger <shemming@brocade.com>
> 
> The rte_sched statistics API should allow reading statistics without
> clearing. Make auto-clear optional.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> v2 -- change to flag for clear from separate read/clear
> 
>  app/test/test_sched.c        |  4 ++--
>  examples/qos_sched/stats.c   | 16 +++++++++++-----
>  lib/librte_sched/rte_sched.c | 44 ++++++++++++++++++++++-----------------
> -----
>  lib/librte_sched/rte_sched.h | 18 ++++++++++--------
>  4 files changed, 45 insertions(+), 37 deletions(-)
> 
> diff --git a/app/test/test_sched.c b/app/test/test_sched.c
> index 60c62de..3c9c9e3 100644
> --- a/app/test/test_sched.c
> +++ b/app/test/test_sched.c
> @@ -208,13 +208,13 @@ test_sched(void)
> 
>  	struct rte_sched_subport_stats subport_stats;
>  	uint32_t tc_ov;
> -	rte_sched_subport_read_stats(port, SUBPORT, &subport_stats,
> &tc_ov);
> +	rte_sched_subport_read_stats(port, SUBPORT, &subport_stats,
> &tc_ov, 1);
>  #if 0
>  	TEST_ASSERT_EQUAL(subport_stats.n_pkts_tc[TC-1], 10, "Wrong
> subport stats\n");
>  #endif
>  	struct rte_sched_queue_stats queue_stats;
>  	uint16_t qlen;
> -	rte_sched_queue_read_stats(port, QUEUE, &queue_stats, &qlen);
> +	rte_sched_queue_read_stats(port, QUEUE, &queue_stats, &qlen,
> 1);
>  #if 0
>  	TEST_ASSERT_EQUAL(queue_stats.n_pkts, 10, "Wrong queue
> stats\n");
>  #endif
> diff --git a/examples/qos_sched/stats.c b/examples/qos_sched/stats.c
> index b4db7b5..88caa54 100644
> --- a/examples/qos_sched/stats.c
> +++ b/examples/qos_sched/stats.c
> @@ -61,7 +61,7 @@ qavg_q(uint8_t port_id, uint32_t subport_id, uint32_t
> pipe_id, uint8_t tc, uint8
>          average = 0;
> 
>          for (count = 0; count < qavg_ntimes; count++) {
> -                rte_sched_queue_read_stats(port, queue_id, &stats, &qlen);
> +                rte_sched_queue_read_stats(port, queue_id, &stats, &qlen, 1);
>                  average += qlen;
>                  usleep(qavg_period);
>          }
> @@ -99,7 +99,9 @@ qavg_tcpipe(uint8_t port_id, uint32_t subport_id,
> uint32_t pipe_id, uint8_t tc)
>          for (count = 0; count < qavg_ntimes; count++) {
>                  part_average = 0;
>                  for (i = 0; i < RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; i++) {
> -                        rte_sched_queue_read_stats(port, queue_id + (tc *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + i), &stats, &qlen);
> +                        rte_sched_queue_read_stats(port,
> +				   queue_id + (tc *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + i),
> +				   &stats, &qlen, 1);
>                          part_average += qlen;
>                  }
>                  average += part_average /
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS;
> @@ -138,7 +140,8 @@ qavg_pipe(uint8_t port_id, uint32_t subport_id,
> uint32_t pipe_id)
>          for (count = 0; count < qavg_ntimes; count++) {
>                  part_average = 0;
>                  for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; i++) {
> -                        rte_sched_queue_read_stats(port, queue_id + i, &stats,
> &qlen);
> +                        rte_sched_queue_read_stats(port, queue_id + i,
> +						   &stats, &qlen, 1);
>                          part_average += qlen;
>                  }
>                  average += part_average /
> (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS);
> @@ -178,7 +181,9 @@ qavg_tcsubport(uint8_t port_id, uint32_t subport_id,
> uint8_t tc)
>                          queue_id = RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS * (subport_id *
> port_params.n_pipes_per_subport + i);
> 
>                          for (j = 0; j < RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; j++) {
> -                                rte_sched_queue_read_stats(port, queue_id + (tc *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + j), &stats, &qlen);
> +                                rte_sched_queue_read_stats(port,
> +							   queue_id + (tc *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + j),
> +							   &stats, &qlen, 1);
>                                  part_average += qlen;
>                          }
>                  }
> @@ -220,7 +225,8 @@ qavg_subport(uint8_t port_id, uint32_t subport_id)
>                          queue_id = RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS * (subport_id *
> port_params.n_pipes_per_subport + i);
> 
>                          for (j = 0; j < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; j++) {
> -                                rte_sched_queue_read_stats(port, queue_id + j, &stats,
> &qlen);
> +                                rte_sched_queue_read_stats(port, queue_id + j,
> +							   &stats, &qlen, 1);
>                                  part_average += qlen;
>                          }
>                  }
> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> index 2071414..74d0e0a 100644
> --- a/lib/librte_sched/rte_sched.c
> +++ b/lib/librte_sched/rte_sched.c
> @@ -909,56 +909,56 @@ rte_sched_pipe_config(struct rte_sched_port
> *port,
> 
>  int
>  rte_sched_subport_read_stats(struct rte_sched_port *port,
> -	uint32_t subport_id,
> -	struct rte_sched_subport_stats *stats,
> -	uint32_t *tc_ov)
> +			     uint32_t subport_id,
> +			     struct rte_sched_subport_stats *stats,
> +			     uint32_t *tc_ov, int clear)
>  {
>  	struct rte_sched_subport *s;
> 
>  	/* Check user parameters */
> -	if ((port == NULL) ||
> -	    (subport_id >= port->n_subports_per_port) ||
> -		(stats == NULL) ||
> -		(tc_ov == NULL)) {
> +	if (port == NULL || subport_id >= port->n_subports_per_port)
>  		return -1;
> -	}
> +
>  	s = port->subport + subport_id;
> 
>  	/* Copy subport stats and clear */
> -	memcpy(stats, &s->stats, sizeof(struct rte_sched_subport_stats));
> -	memset(&s->stats, 0, sizeof(struct rte_sched_subport_stats));
> +	if (stats)
> +		*stats = s->stats;
> +	if (clear)
> +		memset(&s->stats, 0, sizeof(struct
> rte_sched_subport_stats));
> 
>  	/* Subport TC ovesubscription status */
> -	*tc_ov = s->tc_ov;
> +	if (tc_ov)
> +		*tc_ov = s->tc_ov;
> 
>  	return 0;
>  }
> 
>  int
>  rte_sched_queue_read_stats(struct rte_sched_port *port,
> -	uint32_t queue_id,
> -	struct rte_sched_queue_stats *stats,
> -	uint16_t *qlen)
> +			   uint32_t queue_id,
> +			   struct rte_sched_queue_stats *stats,
> +			   uint16_t *qlen, int clear)
>  {
>  	struct rte_sched_queue *q;
>  	struct rte_sched_queue_extra *qe;
> 
>  	/* Check user parameters */
> -	if ((port == NULL) ||
> -	    (queue_id >= rte_sched_port_queues_per_port(port)) ||
> -		(stats == NULL) ||
> -		(qlen == NULL)) {
> +	if (port == NULL || queue_id >=
> rte_sched_port_queues_per_port(port))
>  		return -1;
> -	}
> +
>  	q = port->queue + queue_id;
>  	qe = port->queue_extra + queue_id;
> 
>  	/* Copy queue stats and clear */
> -	memcpy(stats, &qe->stats, sizeof(struct rte_sched_queue_stats));
> -	memset(&qe->stats, 0, sizeof(struct rte_sched_queue_stats));
> +	if (stats)
> +		*stats = qe->stats;
> +	if (clear)
> +		memset(&qe->stats, 0, sizeof(struct
> rte_sched_queue_stats));
> 
>  	/* Queue length */
> -	*qlen = q->qw - q->qr;
> +	if (qlen)
> +		*qlen = q->qw - q->qr;
> 
>  	return 0;
>  }
> diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
> index e9bf18a..f0bf088 100644
> --- a/lib/librte_sched/rte_sched.h
> +++ b/lib/librte_sched/rte_sched.h
> @@ -308,14 +308,15 @@ rte_sched_port_get_memory_footprint(struct
> rte_sched_port_params *params);
>   * @param tc_ov
>   *   Pointer to pre-allocated 4-entry array where the oversubscription status
> for
>   *   each of the 4 subport traffic classes should be stored.
> + * @parm clear
> + *   Reset statistics after read
>   * @return
>   *   0 upon success, error code otherwise
>   */
>  int
> -rte_sched_subport_read_stats(struct rte_sched_port *port,
> -	uint32_t subport_id,
> -	struct rte_sched_subport_stats *stats,
> -	uint32_t *tc_ov);
> +rte_sched_subport_read_stats(struct rte_sched_port *port, uint32_t
> subport_id,
> +			     struct rte_sched_subport_stats *stats,
> +			     uint32_t *tc_ov, int clear);
> 
>  /**
>   * Hierarchical scheduler queue statistics read
> @@ -329,14 +330,15 @@ rte_sched_subport_read_stats(struct
> rte_sched_port *port,
>   *   counters should be stored
>   * @param qlen
>   *   Pointer to pre-allocated variable where the current queue length should
> be stored.
> + * @parm clear
> + *   Reset statistics after read
>   * @return
>   *   0 upon success, error code otherwise
>   */
>  int
> -rte_sched_queue_read_stats(struct rte_sched_port *port,
> -	uint32_t queue_id,
> -	struct rte_sched_queue_stats *stats,
> -	uint16_t *qlen);
> +rte_sched_queue_read_stats(struct rte_sched_port *port, uint32_t
> queue_id,
> +			   struct rte_sched_queue_stats *stats,
> +			   uint16_t *qlen, int clear);
> 
>  /*
>   * Run-time
> --
> 2.1.4

Hi Stephen,

Thank you for adding flexibility over clearing the stats or not.

I have one concern though: why change the stats read API to add a clear parameter rather than keep the prototype of the stats functions unchanged and add the flag as part of the port creation parameters in struct rte_sched_port_params? This parameter could be saved into the internal struct rte_sched_port, which is passed (as a pointer) to the stats read functions. In my opinion, this approach is slightly more elegant and it keeps the changes to a minimum.

int
rte_sched_queue_read_stats(struct rte_sched_port *port,
	uint32_t queue_id,
	struct rte_sched_queue_stats *stats,
	uint16_t *qlen)
{
	...
	if (port->clear_stats_on_read)
		memset(...);
}

I made this suggestion during the previous round, but I did not get any opinion from you on it yet.

Regards,
Cristian

* Re: [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
  2015-03-12 19:28   ` Dumitrescu, Cristian
@ 2015-03-12 23:06     ` Stephen Hemminger
  2015-03-16 19:58       ` Dumitrescu, Cristian
  0 siblings, 1 reply; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-12 23:06 UTC (permalink / raw)
  To: Dumitrescu, Cristian; +Cc: dev, Stephen Hemminger

On Thu, 12 Mar 2015 19:28:11 +0000
"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:

> Hi Stephen,
> 
> Thank you for adding flexibility over clearing the stats or not.
> 
> I have one concern though: why change the stats read API to add a clear parameter rather than keep the prototype of the stats functions unchanged and add the flag as part of the port creation parameters in struct rte_sched_port_params? This parameter could be saved into the internal struct rte_sched_port, which is passed (as a pointer) to the stats read functions. In my opinion, this approach is slightly more elegant and it keeps the changes to a minimum.
> 
> int
> rte_sched_queue_read_stats(struct rte_sched_port *port,
> 	uint32_t queue_id,
> 	struct rte_sched_queue_stats *stats,
> 	uint16_t *qlen)
> {
> 	...
> 	if (port->clear_stats_on_read)
> 		memset(...);
> }
> 
> I made this suggestion during the previous round, but I did not get any opinion from you on it yet.
> 
> Regards,
> Cristian

I rejected the config parameter idea because it is inconsistent
with other statistics in DPDK and in related software. There is no
config parameter that changes what the BSD or Linux kernel API does.

The only reasons for keeping the read and clear in one operation are
that you like it, and that somebody might have built code
that expects it.

Changing the function signature is a nice red flag so that people will
notice the change.
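For readers skimming the thread, the per-call flag pattern under discussion boils down to something like the sketch below. The struct and function are simplified stand-ins for illustration only, not the actual rte_sched API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the queue stats structure; illustrative only. */
struct queue_stats {
	uint64_t n_pkts;
	uint64_t n_bytes;
};

/* Stand-in for the per-queue counters kept inside the scheduler. */
static struct queue_stats queue_counters;

/* Read a snapshot of the counters; reset them only if 'clear' is non-zero.
 * Passing clear=0 gives a pure, non-destructive read. */
static int
queue_read_stats(struct queue_stats *stats, int clear)
{
	if (stats != NULL)
		*stats = queue_counters;   /* struct copy instead of memcpy */
	if (clear)
		memset(&queue_counters, 0, sizeof(queue_counters));
	return 0;
}
```

A caller that only monitors passes clear=0 and leaves the counters accumulating; a caller that wants the legacy read-and-clear behavior passes clear=1.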

* Re: [dpdk-dev] [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte clock
  2015-03-12 19:06   ` Dumitrescu, Cristian
@ 2015-03-12 23:09     ` Stephen Hemminger
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-12 23:09 UTC (permalink / raw)
  To: Dumitrescu, Cristian; +Cc: dev, Stephen Hemminger

On Thu, 12 Mar 2015 19:06:39 +0000
"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:

> 
> 
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Tuesday, March 10, 2015 4:14 PM
> > To: Dumitrescu, Cristian
> > Cc: dev@dpdk.org; Stephen Hemminger; Stephen Hemminger
> > Subject: [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte
> > clock
> > 
> > From: Stephen Hemminger <shemming@brocade.com>
> > 
> > The old code was doing a floating point divide for each rte_dequeue()
> > which is very expensive. Change to using fixed point scaled math instead.
> > This improved performance from 5Gbit/sec to 10 Gbit/sec
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > v2 -- no changes
> >       despite objections, the performance observation is real
> >       on Intel(R) Core(TM) i7-3770 CPU
> > 
> >  lib/librte_sched/rte_sched.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> > index 74d0e0a..522a647 100644
> > --- a/lib/librte_sched/rte_sched.c
> > +++ b/lib/librte_sched/rte_sched.c
> > @@ -102,6 +102,9 @@
> > 
> >  #define RTE_SCHED_BMP_POS_INVALID             UINT32_MAX
> > 
> > +/* For cycles_per_byte calculation */
> > +#define RTE_SCHED_TIME_SHIFT		      20
> > +
> >  struct rte_sched_subport {
> >  	/* Token bucket (TB) */
> >  	uint64_t tb_time; /* time of last update */
> > @@ -239,7 +242,7 @@ struct rte_sched_port {
> >  	uint64_t time_cpu_cycles;     /* Current CPU time measured in CPU
> > cyles */
> >  	uint64_t time_cpu_bytes;      /* Current CPU time measured in bytes
> > */
> >  	uint64_t time;                /* Current NIC TX time measured in bytes */
> > -	double cycles_per_byte;       /* CPU cycles per byte */
> > +	uint32_t cycles_per_byte;       /* CPU cycles per byte (scaled) */
> > 
> >  	/* Scheduling loop detection */
> >  	uint32_t pipe_loop;
> > @@ -657,7 +660,9 @@ rte_sched_port_config(struct
> > rte_sched_port_params *params)
> >  	port->time_cpu_cycles = rte_get_tsc_cycles();
> >  	port->time_cpu_bytes = 0;
> >  	port->time = 0;
> > -	port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double)
> > params->rate);
> > +
> > +	port->cycles_per_byte = (rte_get_tsc_hz() <<
> > RTE_SCHED_TIME_SHIFT)
> > +		/ params->rate;
> > 
> >  	/* Scheduling loop detection */
> >  	port->pipe_loop = RTE_SCHED_PIPE_INVALID;
> > @@ -2126,11 +2131,12 @@ rte_sched_port_time_resync(struct
> > rte_sched_port *port)
> >  {
> >  	uint64_t cycles = rte_get_tsc_cycles();
> >  	uint64_t cycles_diff = cycles - port->time_cpu_cycles;
> > -	double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
> > +	uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT)
> > +		/ port->cycles_per_byte;
> > 
> >  	/* Advance port time */
> >  	port->time_cpu_cycles = cycles;
> > -	port->time_cpu_bytes += (uint64_t) bytes_diff;
> > +	port->time_cpu_bytes += bytes_diff;
> >  	if (port->time < port->time_cpu_bytes) {
> >  		port->time = port->time_cpu_bytes;
> >  	}
> > --
> > 2.1.4
> 
> Stephen,
> 
> We agreed during the previous round to look at the 64-bit multiplication option, but it looks like this patch is identical to the previous one. Did you run into any issues implementing this approach? As stated before, I do not think this is the best solution, for the reasons listed previously, and this part of the code is too sensitive to take the risk.
> 
> Since Thomas indicated these patches will be considered for the 2.1 rather than the 2.0 release, it looks like we have some time to refine this work. I would reiterate the proposal that Thomas made: re-submit the patches where we have consensus, and keep this one out for the moment; you and I can sync up offline and come back with an implementation proposal that would hopefully address the previous concerns for the 2.1 release. Would this work for you?
> 
> Thank you for your work and for your understanding!
> 
> Regards,
> Cristian
> 

I haven't had time to do the analysis to figure out how to keep the maths
in range so that a 64-bit multiplicative divide would work. When I first tried
it, the issue was that cycles_diff and bytes_diff would both have to be
scaled so that they are 32-bit values.

Should be possible, might get back to it (or someone else might take up
the cause).
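For reference, the multiplicative approach under discussion could be sketched roughly as follows: precompute a scaled reciprocal once at configuration time, then replace the per-resync divide with a multiply and a shift. This is only an illustration of the idea; the names, the shift value, and the overflow bound are assumptions, not DPDK code, and the precision/overflow trade-off is exactly the open analysis mentioned above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shift value; the real choice would have to balance precision
 * against 64-bit overflow, which is the unfinished analysis above. */
#define BYTE_CLOCK_SHIFT 20

struct byte_clock {
	uint64_t inv_cycles_per_byte;  /* (rate << SHIFT) / hz, computed once */
};

static void
byte_clock_init(struct byte_clock *bc, uint64_t tsc_hz,
		uint64_t rate_bytes_per_sec)
{
	/* The only divide, done once at configuration time. */
	bc->inv_cycles_per_byte =
		(rate_bytes_per_sec << BYTE_CLOCK_SHIFT) / tsc_hz;
}

static uint64_t
cycles_to_bytes(const struct byte_clock *bc, uint64_t cycles_diff)
{
	/* Multiply + shift per resync instead of a divide. This overflows
	 * only if cycles_diff * inv_cycles_per_byte exceeds 2^64, i.e. after
	 * hours without a resync at typical TSC rates. */
	return (cycles_diff * bc->inv_cycles_per_byte) >> BYTE_CLOCK_SHIFT;
}
```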

* Re: [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
  2015-03-12 23:06     ` Stephen Hemminger
@ 2015-03-16 19:58       ` Dumitrescu, Cristian
  2015-03-16 20:11         ` Stephen Hemminger
  0 siblings, 1 reply; 16+ messages in thread
From: Dumitrescu, Cristian @ 2015-03-16 19:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, March 12, 2015 11:06 PM
> To: Dumitrescu, Cristian
> Cc: dev@dpdk.org; Stephen Hemminger
> Subject: Re: [PATCH v2 5/6] rte_sched: allow reading statistics without
> clearing
> 
> On Thu, 12 Mar 2015 19:28:11 +0000
> "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:
> 
> > Hi Stephen,
> >
> > Thank you for adding flexibility over clearing the stats or not.
> >
> > I have one concern though: why change the stats read API to add a clear
> parameter rather than keep prototype for the stats functions unchanged and
> add the flag as part of the port creation parameters in struct
> rte_sched_port_params? This parameter could be saved into the internal
> struct rte_sched_port, which is passed (as a pointer) to the stats read
> functions. In my opinion, this approach is slightly more elegant and it keeps
> the changes to a minimum.
> >
> > int
> > rte_sched_queue_read_stats(struct rte_sched_port *port,
> > 	uint32_t queue_id,
> > 	struct rte_sched_queue_stats *stats,
> > 	uint16_t *qlen)
> > {
> > 	...
> > 	if (port->clear_stats_on_read)
> > 		memset(...);
> > }
> >
> > I made this suggestion during the previous round, but I did not get any
> opinion from you on it yet.
> >
> > Regards,
> > Cristian
> 
> I rejected the config parameter idea because it is inconsistent
> with other statistics in DPDK and in related software. There is no
> config parameter that changes what the BSD or Linux kernel API does.

Your approach has the advantage of allowing the stats to be cleared or preserved per read operation rather than configuring the behavior globally. I think this approach allows for the ultimate flexibility, so I am OK to go with it.

> 
> The only reasons for keeping the read and clear in one operation are
> that you like it, and that somebody might have built code
> that expects it.
>

Clearing the stats with a delay after they have been read is prone to a race condition: during that window more packets could be processed, and those packets will not show up in the counters the user read.
I think it depends on the needs of each particular application whether this race condition matters: if the counters are read rarely (e.g. once per day) and only coarse accuracy is required, the error is probably irrelevant; if the app is looking for fine-grained accuracy (e.g. rate measurement, debugging, etc.), then the error is not acceptable. You seem to favour the former and ignore the latter case.


> Changing the function signature is a nice red flag so that people will
> notice the change.

There is a small API change here. I am OK with it for the above reasons, provided that there are no objections to this from other contributors.
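As a side note on the race discussion: with a non-clearing read, an application that needs rates can difference two snapshots of a monotonically increasing counter instead of ever clearing it, which sidesteps the read-then-clear window entirely. A minimal sketch with illustrative names (not the rte_sched API):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative monotonic packet counter; in a real application this would
 * be the n_pkts field returned by a non-clearing stats read. */
static uint64_t total_pkts;

/* Return the number of packets seen since the previous snapshot and
 * remember the new snapshot; the shared counter is never reset, so no
 * packets can fall into a clear window. */
static uint64_t
pkts_since(uint64_t *last_snapshot)
{
	uint64_t now = total_pkts;
	uint64_t diff = now - *last_snapshot;

	*last_snapshot = now;
	return diff;
}
```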

* Re: [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
  2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing Stephen Hemminger
  2015-03-12 19:28   ` Dumitrescu, Cristian
@ 2015-03-16 20:00   ` Dumitrescu, Cristian
  1 sibling, 0 replies; 16+ messages in thread
From: Dumitrescu, Cristian @ 2015-03-16 20:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, March 10, 2015 4:14 PM
> To: Dumitrescu, Cristian
> Cc: dev@dpdk.org; Stephen Hemminger; Stephen Hemminger
> Subject: [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
> 
> From: Stephen Hemminger <shemming@brocade.com>
> 
> The rte_sched statistics API should allow reading statistics without
> clearing. Make auto-clear optional.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> v2 -- change to flag for clear from separate read/clear
> 
>  app/test/test_sched.c        |  4 ++--
>  examples/qos_sched/stats.c   | 16 +++++++++++-----
>  lib/librte_sched/rte_sched.c | 44 ++++++++++++++++++++++-----------------
> -----
>  lib/librte_sched/rte_sched.h | 18 ++++++++++--------
>  4 files changed, 45 insertions(+), 37 deletions(-)
> 
> diff --git a/app/test/test_sched.c b/app/test/test_sched.c
> index 60c62de..3c9c9e3 100644
> --- a/app/test/test_sched.c
> +++ b/app/test/test_sched.c
> @@ -208,13 +208,13 @@ test_sched(void)
> 
>  	struct rte_sched_subport_stats subport_stats;
>  	uint32_t tc_ov;
> -	rte_sched_subport_read_stats(port, SUBPORT, &subport_stats,
> &tc_ov);
> +	rte_sched_subport_read_stats(port, SUBPORT, &subport_stats,
> &tc_ov, 1);
>  #if 0
>  	TEST_ASSERT_EQUAL(subport_stats.n_pkts_tc[TC-1], 10, "Wrong
> subport stats\n");
>  #endif
>  	struct rte_sched_queue_stats queue_stats;
>  	uint16_t qlen;
> -	rte_sched_queue_read_stats(port, QUEUE, &queue_stats, &qlen);
> +	rte_sched_queue_read_stats(port, QUEUE, &queue_stats, &qlen,
> 1);
>  #if 0
>  	TEST_ASSERT_EQUAL(queue_stats.n_pkts, 10, "Wrong queue
> stats\n");
>  #endif
> diff --git a/examples/qos_sched/stats.c b/examples/qos_sched/stats.c
> index b4db7b5..88caa54 100644
> --- a/examples/qos_sched/stats.c
> +++ b/examples/qos_sched/stats.c
> @@ -61,7 +61,7 @@ qavg_q(uint8_t port_id, uint32_t subport_id, uint32_t
> pipe_id, uint8_t tc, uint8
>          average = 0;
> 
>          for (count = 0; count < qavg_ntimes; count++) {
> -                rte_sched_queue_read_stats(port, queue_id, &stats, &qlen);
> +                rte_sched_queue_read_stats(port, queue_id, &stats, &qlen, 1);
>                  average += qlen;
>                  usleep(qavg_period);
>          }
> @@ -99,7 +99,9 @@ qavg_tcpipe(uint8_t port_id, uint32_t subport_id,
> uint32_t pipe_id, uint8_t tc)
>          for (count = 0; count < qavg_ntimes; count++) {
>                  part_average = 0;
>                  for (i = 0; i < RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; i++) {
> -                        rte_sched_queue_read_stats(port, queue_id + (tc *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + i), &stats, &qlen);
> +                        rte_sched_queue_read_stats(port,
> +				   queue_id + (tc *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + i),
> +				   &stats, &qlen, 1);
>                          part_average += qlen;
>                  }
>                  average += part_average /
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS;
> @@ -138,7 +140,8 @@ qavg_pipe(uint8_t port_id, uint32_t subport_id,
> uint32_t pipe_id)
>          for (count = 0; count < qavg_ntimes; count++) {
>                  part_average = 0;
>                  for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; i++) {
> -                        rte_sched_queue_read_stats(port, queue_id + i, &stats,
> &qlen);
> +                        rte_sched_queue_read_stats(port, queue_id + i,
> +						   &stats, &qlen, 1);
>                          part_average += qlen;
>                  }
>                  average += part_average /
> (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS);
> @@ -178,7 +181,9 @@ qavg_tcsubport(uint8_t port_id, uint32_t subport_id,
> uint8_t tc)
>                          queue_id = RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS * (subport_id *
> port_params.n_pipes_per_subport + i);
> 
>                          for (j = 0; j < RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; j++) {
> -                                rte_sched_queue_read_stats(port, queue_id + (tc *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + j), &stats, &qlen);
> +                                rte_sched_queue_read_stats(port,
> +							   queue_id + (tc *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + j),
> +							   &stats, &qlen, 1);
>                                  part_average += qlen;
>                          }
>                  }
> @@ -220,7 +225,8 @@ qavg_subport(uint8_t port_id, uint32_t subport_id)
>                          queue_id = RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS * (subport_id *
> port_params.n_pipes_per_subport + i);
> 
>                          for (j = 0; j < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE *
> RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS; j++) {
> -                                rte_sched_queue_read_stats(port, queue_id + j, &stats,
> &qlen);
> +                                rte_sched_queue_read_stats(port, queue_id + j,
> +							   &stats, &qlen, 1);
>                                  part_average += qlen;
>                          }
>                  }
> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> index 2071414..74d0e0a 100644
> --- a/lib/librte_sched/rte_sched.c
> +++ b/lib/librte_sched/rte_sched.c
> @@ -909,56 +909,56 @@ rte_sched_pipe_config(struct rte_sched_port
> *port,
> 
>  int
>  rte_sched_subport_read_stats(struct rte_sched_port *port,
> -	uint32_t subport_id,
> -	struct rte_sched_subport_stats *stats,
> -	uint32_t *tc_ov)
> +			     uint32_t subport_id,
> +			     struct rte_sched_subport_stats *stats,
> +			     uint32_t *tc_ov, int clear)
>  {
>  	struct rte_sched_subport *s;
> 
>  	/* Check user parameters */
> -	if ((port == NULL) ||
> -	    (subport_id >= port->n_subports_per_port) ||
> -		(stats == NULL) ||
> -		(tc_ov == NULL)) {
> +	if (port == NULL || subport_id >= port->n_subports_per_port)
>  		return -1;
> -	}
> +
>  	s = port->subport + subport_id;
> 
>  	/* Copy subport stats and clear */
> -	memcpy(stats, &s->stats, sizeof(struct rte_sched_subport_stats));
> -	memset(&s->stats, 0, sizeof(struct rte_sched_subport_stats));
> +	if (stats)
> +		*stats = s->stats;
> +	if (clear)
> +		memset(&s->stats, 0, sizeof(struct
> rte_sched_subport_stats));
> 
>  	/* Subport TC ovesubscription status */
> -	*tc_ov = s->tc_ov;
> +	if (tc_ov)
> +		*tc_ov = s->tc_ov;
> 
>  	return 0;
>  }
> 
>  int
>  rte_sched_queue_read_stats(struct rte_sched_port *port,
> -	uint32_t queue_id,
> -	struct rte_sched_queue_stats *stats,
> -	uint16_t *qlen)
> +			   uint32_t queue_id,
> +			   struct rte_sched_queue_stats *stats,
> +			   uint16_t *qlen, int clear)
>  {
>  	struct rte_sched_queue *q;
>  	struct rte_sched_queue_extra *qe;
> 
>  	/* Check user parameters */
> -	if ((port == NULL) ||
> -	    (queue_id >= rte_sched_port_queues_per_port(port)) ||
> -		(stats == NULL) ||
> -		(qlen == NULL)) {
> +	if (port == NULL || queue_id >=
> rte_sched_port_queues_per_port(port))
>  		return -1;
> -	}
> +
>  	q = port->queue + queue_id;
>  	qe = port->queue_extra + queue_id;
> 
>  	/* Copy queue stats and clear */
> -	memcpy(stats, &qe->stats, sizeof(struct rte_sched_queue_stats));
> -	memset(&qe->stats, 0, sizeof(struct rte_sched_queue_stats));
> +	if (stats)
> +		*stats = qe->stats;
> +	if (clear)
> +		memset(&qe->stats, 0, sizeof(struct
> rte_sched_queue_stats));
> 
>  	/* Queue length */
> -	*qlen = q->qw - q->qr;
> +	if (qlen)
> +		*qlen = q->qw - q->qr;
> 
>  	return 0;
>  }
> diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
> index e9bf18a..f0bf088 100644
> --- a/lib/librte_sched/rte_sched.h
> +++ b/lib/librte_sched/rte_sched.h
> @@ -308,14 +308,15 @@ rte_sched_port_get_memory_footprint(struct
> rte_sched_port_params *params);
>   * @param tc_ov
>   *   Pointer to pre-allocated 4-entry array where the oversubscription status
> for
>   *   each of the 4 subport traffic classes should be stored.
> + * @parm clear
> + *   Reset statistics after read
>   * @return
>   *   0 upon success, error code otherwise
>   */
>  int
> -rte_sched_subport_read_stats(struct rte_sched_port *port,
> -	uint32_t subport_id,
> -	struct rte_sched_subport_stats *stats,
> -	uint32_t *tc_ov);
> +rte_sched_subport_read_stats(struct rte_sched_port *port, uint32_t
> subport_id,
> +			     struct rte_sched_subport_stats *stats,
> +			     uint32_t *tc_ov, int clear);
> 
>  /**
>   * Hierarchical scheduler queue statistics read
> @@ -329,14 +330,15 @@ rte_sched_subport_read_stats(struct
> rte_sched_port *port,
>   *   counters should be stored
>   * @param qlen
>   *   Pointer to pre-allocated variable where the current queue length should
> be stored.
> + * @parm clear
> + *   Reset statistics after read
>   * @return
>   *   0 upon success, error code otherwise
>   */
>  int
> -rte_sched_queue_read_stats(struct rte_sched_port *port,
> -	uint32_t queue_id,
> -	struct rte_sched_queue_stats *stats,
> -	uint16_t *qlen);
> +rte_sched_queue_read_stats(struct rte_sched_port *port, uint32_t
> queue_id,
> +			   struct rte_sched_queue_stats *stats,
> +			   uint16_t *qlen, int clear);
> 
>  /*
>   * Run-time
> --
> 2.1.4

Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

* Re: [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
  2015-03-16 19:58       ` Dumitrescu, Cristian
@ 2015-03-16 20:11         ` Stephen Hemminger
  2015-03-16 20:21           ` Dumitrescu, Cristian
  0 siblings, 1 reply; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-16 20:11 UTC (permalink / raw)
  To: Dumitrescu, Cristian; +Cc: dev, Stephen Hemminger

On Mon, 16 Mar 2015 19:58:51 +0000
"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:

> 
> 
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Thursday, March 12, 2015 11:06 PM
> > To: Dumitrescu, Cristian
> > Cc: dev@dpdk.org; Stephen Hemminger
> > Subject: Re: [PATCH v2 5/6] rte_sched: allow reading statistics without
> > clearing
> > 
> > On Thu, 12 Mar 2015 19:28:11 +0000
> > "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:
> > 
> > > Hi Stephen,
> > >
> > > Thank you for adding flexibility over clearing the stats or not.
> > >
> > > I have one concern though: why change the stats read API to add a clear
> > parameter rather than keep the prototype of the stats functions unchanged and
> > add the flag as part of the port creation parameters in struct
> > rte_sched_port_params? This parameter could be saved into the internal
> > struct rte_sched_port, which is passed (as a pointer) to the stats read
> > functions. In my opinion, this approach is slightly more elegant and it keeps
> > the changes to a minimum.
> > >
> > > int
> > > rte_sched_queue_read_stats(struct rte_sched_port *port,
> > > 	uint32_t queue_id,
> > > 	struct rte_sched_queue_stats *stats,
> > > 	uint16_t *qlen)
> > > {
> > > 	...
> > > 	if (port->clear_stats_on_read)
> > > 		memset(...);
> > > }
> > >
> > > I made this suggestion during the previous round, but I did not get any
> > opinion from you on it yet.
> > >
> > > Regards,
> > > Cristian
> > 
> > I rejected the config parameter idea because it is inconsistent
> > with how other statistics are handled in DPDK and in related software. There is no
> > config parameter that changes what the BSD or Linux kernel APIs do.
> 
> Your approach has the advantage of being able to clear/not clear the stats per each read operation rather than configuring the behavior globally. I think this approach allows for the ultimate flexibility, so I am OK to go with it.
> 
> > 
> > The only reason for keeping the read and clear in one operation is
> > because you like it, and because somebody might have built code
> > that expects it.
> >
> 
> Clearing the stats with a delay after they have been read is prone to a race condition: during that window more packets can be processed, and those packets will not show up in the counters that the user read.
> I think it depends on the needs of each particular application whether this race condition matters: if the counters are read rarely (e.g. once per day) and only coarse accuracy is required, the error is probably irrelevant; if the app is looking for fine-grained accuracy (e.g. rate measurement, debugging, etc.), then the error is not acceptable. You seem to favour the former and ignore the latter case.
> 
> 
> > Changing the function signature is a nice red flag so that people will
> > notice the change.
> 
> There is a small API change here. I am OK with it for the above reasons, provided that there are no objections on this from other contributors.
> 

If you really wanted a fast/safe read and clear operation, how about using
an exchange instruction to swap in a zero?
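
Concretely, with the GCC/Clang `__atomic` builtins that idea looks like the sketch below; the counter and helper name are hypothetical stand-ins, not actual rte_sched fields or API.

```c
#include <stdint.h>

/* Read-and-clear in one indivisible step: __atomic_exchange_n stores
 * zero into the counter and returns the previous value, so an
 * increment racing with the reader cannot fall into a gap between a
 * separate read and a later clear. */
static inline uint64_t
stats_read_and_clear(uint64_t *counter)
{
	return __atomic_exchange_n(counter, 0, __ATOMIC_RELAXED);
}
```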


* Re: [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
  2015-03-16 20:11         ` Stephen Hemminger
@ 2015-03-16 20:21           ` Dumitrescu, Cristian
  2015-03-16 20:33             ` Stephen Hemminger
  0 siblings, 1 reply; 16+ messages in thread
From: Dumitrescu, Cristian @ 2015-03-16 20:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Monday, March 16, 2015 8:11 PM
> To: Dumitrescu, Cristian
> Cc: dev@dpdk.org; Stephen Hemminger
> Subject: Re: [PATCH v2 5/6] rte_sched: allow reading statistics without
> clearing
> 
> On Mon, 16 Mar 2015 19:58:51 +0000
> "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Thursday, March 12, 2015 11:06 PM
> > > To: Dumitrescu, Cristian
> > > Cc: dev@dpdk.org; Stephen Hemminger
> > > Subject: Re: [PATCH v2 5/6] rte_sched: allow reading statistics without
> > > clearing
> > >
> > > On Thu, 12 Mar 2015 19:28:11 +0000
> > > "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:
> > >
> > > > Hi Stephen,
> > > >
> > > > Thank you for adding flexibility over clearing the stats or not.
> > > >
> > > > I have one concern though: why change the stats read API to add a
> clear
> > > parameter rather than keep the prototype of the stats functions unchanged
> and
> > > add the flag as part of the port creation parameters in struct
> > > rte_sched_port_params? This parameter could be saved into the internal
> > > struct rte_sched_port, which is passed (as a pointer) to the stats read
> > > functions. In my opinion, this approach is slightly more elegant and it
> keeps
> > > the changes to a minimum.
> > > >
> > > > int
> > > > rte_sched_queue_read_stats(struct rte_sched_port *port,
> > > > 	uint32_t queue_id,
> > > > 	struct rte_sched_queue_stats *stats,
> > > > 	uint16_t *qlen)
> > > > {
> > > > 	...
> > > > 	if (port->clear_stats_on_read)
> > > > 		memset(...);
> > > > }
> > > >
> > > > I made this suggestion during the previous round, but I did not get any
> > > opinion from you on it yet.
> > > >
> > > > Regards,
> > > > Cristian
> > >
> > > I rejected the config parameter idea because it is inconsistent
> > > with how other statistics are handled in DPDK and in related software. There is no
> > > config parameter that changes what the BSD or Linux kernel APIs do.
> >
> > Your approach has the advantage of being able to clear/not clear the stats
> per each read operation rather than configuring the behavior globally. I think
> this approach allows for the ultimate flexibility, so I am OK to go with it.
> >
> > >
> > > The only reason for keeping the read and clear in one operation is
> > > because you like it, and because somebody might have built code
> > > that expects it.
> > >
> >
> > Clearing the stats with a delay after the stats have been read is prone to a
> race condition, as during this time more packets could be processed, and
> these packets will not show up in the counters that the user read.
> > I think it depends on the need of each particular application whether this
> race condition is important or not: if the counters are read rarely (e.g. once
> per day) and only coarse accuracy is required, the error is probably irrelevant;
> if the app is looking for fine-grained accuracy (e.g. rate measurement,
> debugging, etc.), then the error is not acceptable. You seem to favour the
> former and ignore the latter case.
> >
> >
> > > Changing the function signature is a nice red flag so that people will
> > > notice the change.
> >
> > There is a small API change here. I am OK with it for the above reasons,
> provided that there are no objections on this from other contributors.
> >
> 
> If you really wanted a fast/safe read and clear operation, how about using
> an exchange instruction to swap in a zero?

When stats read is decoupled from stats clear, the exchange operation cannot fix this.

To your point: when stats are cleared on read, would it make sense to use the exchange operation? In my opinion it does not, as the rte_sched API is explicitly stated to be not thread safe (for performance reasons). As clearly documented, the scheduler enqueue and dequeue operations need to run on the same CPU core. Therefore, IMHO having a thread-safe stats read function while the main functional API is not thread safe is not going to add value.


* Re: [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing
  2015-03-16 20:21           ` Dumitrescu, Cristian
@ 2015-03-16 20:33             ` Stephen Hemminger
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2015-03-16 20:33 UTC (permalink / raw)
  To: Dumitrescu, Cristian; +Cc: dev, Stephen Hemminger

On Mon, 16 Mar 2015 20:21:39 +0000
"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:

> When stats read is decoupled from stats clear, the exchange operation cannot fix this.
> 
> To your point: when stats are cleared on read, would it make sense to use the exchange operation? In my opinion it does not, as the rte_sched API is explicitly stated to be not thread safe (for performance reasons). As clearly documented, the scheduler enqueue and dequeue operations need to run on the same CPU core. Therefore, IMHO having a thread-safe stats read function while the main functional API is not thread safe is not going to add value.

Agreed, but for a different reason.
It is safe, though not necessarily consistent, to read statistics from another
management thread. As long as no other thread is making changes to the
underlying configuration, only the values might change while packets are in
flight. Making all the statistics fully atomic_t would be a performance killer.


end of thread, other threads:[~2015-03-16 20:33 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-10 16:13 [dpdk-dev] [PATCH v2 0/6] sched: bugfixes and enhancements Stephen Hemminger
2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 1/6] rte_sched: don't put tabs in log messages Stephen Hemminger
2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 2/6] rte_sched: make RED optional at runtime Stephen Hemminger
2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 3/6] rte_sched: expand scheduler hierarchy for more VLAN's Stephen Hemminger
2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 4/6] rte_sched: keep track of RED drops Stephen Hemminger
2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 5/6] rte_sched: allow reading statistics without clearing Stephen Hemminger
2015-03-12 19:28   ` Dumitrescu, Cristian
2015-03-12 23:06     ` Stephen Hemminger
2015-03-16 19:58       ` Dumitrescu, Cristian
2015-03-16 20:11         ` Stephen Hemminger
2015-03-16 20:21           ` Dumitrescu, Cristian
2015-03-16 20:33             ` Stephen Hemminger
2015-03-16 20:00   ` Dumitrescu, Cristian
2015-03-10 16:13 ` [dpdk-dev] [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte clock Stephen Hemminger
2015-03-12 19:06   ` Dumitrescu, Cristian
2015-03-12 23:09     ` Stephen Hemminger
