- * [dpdk-dev] [PATCH v3 1/4] event/dlb2: remove references to deferred scheduling
  2021-05-21  9:11 [dpdk-dev] [PATCH v3 0/4] DLB2 fixes for 21.05 David Marchand
@ 2021-05-21  9:11 ` David Marchand
  2021-05-21  9:11 ` [dpdk-dev] [PATCH v3 2/4] doc: fix runtime options in DLB2 guide David Marchand
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: David Marchand @ 2021-05-21  9:11 UTC
  To: dev; +Cc: thomas, jerinj, timothy.mcdaniel, stable
From: Timothy McDaniel <timothy.mcdaniel@intel.com>
Deferred scheduling is a DLB v1.0 feature, and is not valid for
DLB v2.0 or v2.5.
Fixes: bc62748bd7d4 ("event/dlb2: add private data structures and constants")
Fixes: a2e4f1f5e79f ("event/dlb2: add dequeue and its burst variants")
Cc: stable@dpdk.org
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
---
 doc/guides/eventdevs/dlb2.rst  | 21 ---------------------
 drivers/event/dlb2/dlb2_priv.h |  3 ---
 2 files changed, 24 deletions(-)
diff --git a/doc/guides/eventdevs/dlb2.rst b/doc/guides/eventdevs/dlb2.rst
index 31de6bc470..c60c454d6b 100644
--- a/doc/guides/eventdevs/dlb2.rst
+++ b/doc/guides/eventdevs/dlb2.rst
@@ -293,27 +293,6 @@ The PMD does not support the following configuration sequences:
 This sequence is not supported because the event device must be reconfigured
 before its ports or queues can be.
 
-Deferred Scheduling
-~~~~~~~~~~~~~~~~~~~
-
-The DLB PMD's default behavior for managing a CQ is to "pop" the CQ once per
-dequeued event before returning from rte_event_dequeue_burst(). This frees the
-corresponding entries in the CQ, which enables the DLB to schedule more events
-to it.
-
-To support applications seeking finer-grained scheduling control -- for example
-deferring scheduling to get the best possible priority scheduling and
-load-balancing -- the PMD supports a deferred scheduling mode. In this mode,
-the CQ entry is not popped until the *subsequent* rte_event_dequeue_burst()
-call. This mode only applies to load-balanced event ports with dequeue depth of
-1.
-
-To enable deferred scheduling, use the defer_sched vdev argument like so:
-
-    .. code-block:: console
-
-       --vdev=dlb2_event,defer_sched=on
-
 Atomic Inflights Allocation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/event/dlb2/dlb2_priv.h b/drivers/event/dlb2/dlb2_priv.h
index 3140764a59..b1225af37e 100644
--- a/drivers/event/dlb2/dlb2_priv.h
+++ b/drivers/event/dlb2/dlb2_priv.h
@@ -32,7 +32,6 @@
 #define DLB2_MAX_NUM_EVENTS "max_num_events"
 #define DLB2_NUM_DIR_CREDITS "num_dir_credits"
 #define DEV_ID_ARG "dev_id"
-#define DLB2_DEFER_SCHED_ARG "defer_sched"
 #define DLB2_QID_DEPTH_THRESH_ARG "qid_depth_thresh"
 #define DLB2_COS_ARG "cos"
 #define DLB2_POLL_INTERVAL_ARG "poll_interval"
@@ -585,7 +584,6 @@ struct dlb2_eventdev {
 	uint16_t num_dir_ports; /* total num of dir ports requested */
 	bool umwait_allowed;
 	bool global_dequeue_wait; /* Not using per dequeue wait if true */
-	bool defer_sched;
 	enum dlb2_cq_poll_modes poll_mode;
 	int poll_interval;
 	int sw_credit_quanta;
@@ -620,7 +618,6 @@ struct dlb2_devargs {
 	int max_num_events;
 	int num_dir_credits_override;
 	int dev_id;
-	int defer_sched;
 	struct dlb2_qid_depth_thresholds qid_depth_thresholds;
 	enum dlb2_cos cos_id;
 	int poll_interval;
-- 
2.23.0
- * [dpdk-dev] [PATCH v3 2/4] doc: fix runtime options in DLB2 guide
  2021-05-21  9:11 [dpdk-dev] [PATCH v3 0/4] DLB2 fixes for 21.05 David Marchand
  2021-05-21  9:11 ` [dpdk-dev] [PATCH v3 1/4] event/dlb2: remove references to deferred scheduling David Marchand
@ 2021-05-21  9:11 ` David Marchand
  2021-05-21  9:11 ` [dpdk-dev] [PATCH v3 3/4] event/dlb2: fix extraction of HW scheduling type David Marchand
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: David Marchand @ 2021-05-21  9:11 UTC
  To: dev; +Cc: thomas, jerinj, timothy.mcdaniel, stable
From: Timothy McDaniel <timothy.mcdaniel@intel.com>
Convert to PCI "--allow" devarg format.
The documentation was previously using the "--vdev" form, which cannot
be used with the DLB2 PF PMD.
Fixes: f3cad285bb88 ("event/dlb2: add infos get and configure")
Fixes: f7cc194b0f7e ("event/dlb2: add enqueue and its burst variants")
Fixes: a2e4f1f5e79f ("event/dlb2: add dequeue and its burst variants")
Fixes: 95aa7101cd3c ("doc: add some features to DLB2 guide")
Cc: stable@dpdk.org
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v2:
- updated title,
- fixed Fixes: lines,
- rebased with patch introducing vector option,
---
 doc/guides/eventdevs/dlb2.rst | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/doc/guides/eventdevs/dlb2.rst b/doc/guides/eventdevs/dlb2.rst
index c60c454d6b..99ea8418a5 100644
--- a/doc/guides/eventdevs/dlb2.rst
+++ b/doc/guides/eventdevs/dlb2.rst
@@ -152,19 +152,19 @@ These pools' sizes are controlled by the nb_events_limit field in struct
 rte_event_dev_config. The load-balanced pool is sized to contain
 nb_events_limit credits, and the directed pool is sized to contain
 nb_events_limit/4 credits. The directed pool size can be overridden with the
-num_dir_credits vdev argument, like so:
+num_dir_credits devargs argument, like so:
 
     .. code-block:: console
 
-       --vdev=dlb2_event,num_dir_credits=<value>
+       --allow ea:00.0,num_dir_credits=<value>
 
 This can be used if the default allocation is too low or too high for the
-specific application needs. The PMD also supports a vdev arg that limits the
+specific application needs. The PMD also supports a devarg that limits the
 max_num_events reported by rte_event_dev_info_get():
 
     .. code-block:: console
 
-       --vdev=dlb2_event,max_num_events=<value>
+       --allow ea:00.0,max_num_events=<value>
 
 By default, max_num_events is reported as the total available load-balanced
 credits. If multiple DLB-based applications are being used, it may be desirable
@@ -315,11 +315,11 @@ buffer space (e.g. if not all queues are used, or aren't used for atomic
 scheduling).
 
 The PMD provides a dev arg to override the default per-queue allocation. To
-increase a vdev's per-queue atomic-inflight allocation to (for example) 64:
+increase per-queue atomic-inflight allocation to (for example) 64:
 
     .. code-block:: console
 
-       --vdev=dlb2_event,atm_inflights=64
+       --allow ea:00.0,atm_inflights=64
 
 QID Depth Threshold
 ~~~~~~~~~~~~~~~~~~~
@@ -342,9 +342,9 @@ shown below.
 
     .. code-block:: console
 
-       --vdev=dlb2_event,qid_depth_thresh=all:<threshold_value>
-       --vdev=dlb2_event,qid_depth_thresh=qidA-qidB:<threshold_value>
-       --vdev=dlb2_event,qid_depth_thresh=qid:<threshold_value>
+       --allow ea:00.0,qid_depth_thresh=all:<threshold_value>
+       --allow ea:00.0,qid_depth_thresh=qidA-qidB:<threshold_value>
+       --allow ea:00.0,qid_depth_thresh=qid:<threshold_value>
 
 Class of service
 ~~~~~~~~~~~~~~~~
@@ -366,4 +366,4 @@ Class of service can be specified in the devargs, as follows
 
     .. code-block:: console
 
-       --vdev=dlb2_event,cos=<0..4>
+       --allow ea:00.0,cos=<0..4>
-- 
2.23.0
- * [dpdk-dev] [PATCH v3 3/4] event/dlb2: fix extraction of HW scheduling type
  2021-05-21  9:11 [dpdk-dev] [PATCH v3 0/4] DLB2 fixes for 21.05 David Marchand
  2021-05-21  9:11 ` [dpdk-dev] [PATCH v3 1/4] event/dlb2: remove references to deferred scheduling David Marchand
  2021-05-21  9:11 ` [dpdk-dev] [PATCH v3 2/4] doc: fix runtime options in DLB2 guide David Marchand
@ 2021-05-21  9:11 ` David Marchand
  2021-05-21  9:11 ` [dpdk-dev] [PATCH v3 4/4] event/dlb2: select scalar dequeue by default David Marchand
  2021-05-21 10:26 ` [dpdk-dev] [PATCH v3 0/4] DLB2 fixes for 21.05 Ferruh Yigit
  4 siblings, 0 replies; 7+ messages in thread
From: David Marchand @ 2021-05-21  9:11 UTC
  To: dev; +Cc: thomas, jerinj, timothy.mcdaniel
From: Timothy McDaniel <timothy.mcdaniel@intel.com>
The HW scheduling type was not being extracted properly
in the vector-optimized dequeue path. It was also not
being recorded in the xstats.
Fixes: 000a7b8e7582 ("event/dlb2: optimize dequeue operation")
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
---
 drivers/event/dlb2/dlb2.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index 5696f568cd..2a3e4ddb47 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3561,6 +3561,11 @@ _process_deq_qes_vec_impl(struct dlb2_port *qm_port,
 	int ev_qid2 = qm_port->qid_mappings[hw_qid2];
 	int ev_qid3 = qm_port->qid_mappings[hw_qid3];
 
+	int hw_sched0 = _mm_extract_epi8(v_qe_meta, 3) & 3ul;
+	int hw_sched1 = _mm_extract_epi8(v_qe_meta, 7) & 3ul;
+	int hw_sched2 = _mm_extract_epi8(v_qe_meta, 11) & 3ul;
+	int hw_sched3 = _mm_extract_epi8(v_qe_meta, 15) & 3ul;
+
 	v_qid_done = _mm_insert_epi8(v_qid_done, ev_qid0, 2);
 	v_qid_done = _mm_insert_epi8(v_qid_done, ev_qid1, 6);
 	v_qid_done = _mm_insert_epi8(v_qid_done, ev_qid2, 10);
@@ -3682,19 +3687,27 @@ _process_deq_qes_vec_impl(struct dlb2_port *qm_port,
 		v_ev_3 = _mm_blend_epi16(v_unpk_ev_23, v_qe_3, 0x0F);
 		v_ev_3 = _mm_alignr_epi8(v_ev_3, v_ev_3, 8);
 		_mm_storeu_si128((__m128i *)&events[3], v_ev_3);
+		DLB2_INC_STAT(qm_port->ev_port->stats.rx_sched_cnt[hw_sched3],
+			      1);
 		/* fallthrough */
 	case 3:
 		v_ev_2 = _mm_unpacklo_epi64(v_unpk_ev_23, v_qe_2);
 		_mm_storeu_si128((__m128i *)&events[2], v_ev_2);
+		DLB2_INC_STAT(qm_port->ev_port->stats.rx_sched_cnt[hw_sched2],
+			      1);
 		/* fallthrough */
 	case 2:
 		v_ev_1 = _mm_blend_epi16(v_unpk_ev_01, v_qe_1, 0x0F);
 		v_ev_1 = _mm_alignr_epi8(v_ev_1, v_ev_1, 8);
 		_mm_storeu_si128((__m128i *)&events[1], v_ev_1);
+		DLB2_INC_STAT(qm_port->ev_port->stats.rx_sched_cnt[hw_sched1],
+			      1);
 		/* fallthrough */
 	case 1:
 		v_ev_0 = _mm_unpacklo_epi64(v_unpk_ev_01, v_qe_0);
 		_mm_storeu_si128((__m128i *)&events[0], v_ev_0);
+		DLB2_INC_STAT(qm_port->ev_port->stats.rx_sched_cnt[hw_sched0],
+			      1);
 	}
 }
 
-- 
2.23.0
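For readers following patch 3/4 without the SSE intrinsics at hand, the four `_mm_extract_epi8(v_qe_meta, N) & 3ul` lines can be read as a plain byte-wise loop. The sketch below is a scalar illustration only, not driver code. It assumes, from the indices 3, 7, 11 and 15 used in the patch, that `v_qe_meta` packs four 32-bit metadata words and that the scheduling type sits in the low two bits of the last byte of each word. The function name `extract_hw_sched` and the byte-array form of the metadata are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar equivalent of the four extract-and-mask lines in
 * the patch. meta[] stands in for the 16 bytes of v_qe_meta.
 */
static void
extract_hw_sched(const uint8_t meta[16], int hw_sched[4])
{
	size_t i;

	assert(meta != NULL && hw_sched != NULL);

	for (i = 0; i < 4; i++) {
		/* Bytes 3, 7, 11, 15: keep the low two bits, as "& 3ul" does. */
		hw_sched[i] = meta[4 * i + 3] & 0x3;
	}
}
```

Each result is masked to the range 0..3, which is presumably why the patch can use `hw_sched0` through `hw_sched3` directly as indices into `rx_sched_cnt[]`.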
- * [dpdk-dev] [PATCH v3 4/4] event/dlb2: select scalar dequeue by default
  2021-05-21  9:11 [dpdk-dev] [PATCH v3 0/4] DLB2 fixes for 21.05 David Marchand
                   ` (2 preceding siblings ...)
  2021-05-21  9:11 ` [dpdk-dev] [PATCH v3 3/4] event/dlb2: fix extraction of HW scheduling type David Marchand
@ 2021-05-21  9:11 ` David Marchand
  2021-05-21 10:26 ` [dpdk-dev] [PATCH v3 0/4] DLB2 fixes for 21.05 Ferruh Yigit
  4 siblings, 0 replies; 7+ messages in thread
From: David Marchand @ 2021-05-21  9:11 UTC
  To: dev; +Cc: thomas, jerinj, timothy.mcdaniel
From: Timothy McDaniel <timothy.mcdaniel@intel.com>
Optimized dequeue using x86 vector instructions was added
in 21.05, but due to limited testing the default has been
changed back to the scalar mode implementation. The vector mode
implementation can be enabled via the devargs option
"vector_opts_enabled=<y/Y>".
Fixes: 000a7b8e7582 ("event/dlb2: optimize dequeue operation")
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v2:
- updated title,
- rebased with doc patch,
---
 doc/guides/eventdevs/dlb2.rst  | 13 +++++++++++++
 drivers/event/dlb2/dlb2.c      | 24 ++++++++++++------------
 drivers/event/dlb2/dlb2_priv.h |  6 +++---
 3 files changed, 28 insertions(+), 15 deletions(-)
diff --git a/doc/guides/eventdevs/dlb2.rst b/doc/guides/eventdevs/dlb2.rst
index 99ea8418a5..bce984ca08 100644
--- a/doc/guides/eventdevs/dlb2.rst
+++ b/doc/guides/eventdevs/dlb2.rst
@@ -367,3 +367,16 @@ Class of service can be specified in the devargs, as follows
     .. code-block:: console
 
        --allow ea:00.0,cos=<0..4>
+
+Use X86 Vector Instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+DLB supports using x86 vector instructions to optimize the data path.
+
+The default mode of operation is to use scalar instructions, but
+the use of vector instructions can be enabled in the devargs, as
+follows
+
+    .. code-block:: console
+
+       --allow ea:00.0,vector_opts_enabled=<y/Y>
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index 2a3e4ddb47..eca183753f 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -376,11 +376,11 @@ set_default_depth_thresh(const char *key __rte_unused,
 }
 
 static int
-set_vector_opts_disab(const char *key __rte_unused,
+set_vector_opts_enab(const char *key __rte_unused,
 	const char *value,
 	void *opaque)
 {
-	bool *dlb2_vector_opts_disabled = opaque;
+	bool *dlb2_vector_opts_enabled = opaque;
 
 	if (value == NULL || opaque == NULL) {
 		DLB2_LOG_ERR("NULL pointer\n");
@@ -388,9 +388,9 @@ set_vector_opts_disab(const char *key __rte_unused,
 	}
 
 	if ((*value == 'y') || (*value == 'Y'))
-		*dlb2_vector_opts_disabled = true;
+		*dlb2_vector_opts_enabled = true;
 	else
-		*dlb2_vector_opts_disabled = false;
+		*dlb2_vector_opts_enabled = false;
 
 	return 0;
 }
@@ -1469,7 +1469,7 @@ dlb2_hw_create_ldb_port(struct dlb2_eventdev *dlb2,
 #else
 	if ((qm_port->cq_depth > 64) ||
 	    (!rte_is_power_of_2(qm_port->cq_depth)) ||
-	    (dlb2->vector_opts_disabled == true))
+	    (dlb2->vector_opts_enabled == false))
 		qm_port->use_scalar = true;
 #endif
 
@@ -1665,7 +1665,7 @@ dlb2_hw_create_dir_port(struct dlb2_eventdev *dlb2,
 #else
 	if ((qm_port->cq_depth > 64) ||
 	    (!rte_is_power_of_2(qm_port->cq_depth)) ||
-	    (dlb2->vector_opts_disabled == true))
+	    (dlb2->vector_opts_enabled == false))
 		qm_port->use_scalar = true;
 #endif
 
@@ -4434,7 +4434,7 @@ dlb2_primary_eventdev_probe(struct rte_eventdev *dev,
 	dlb2->poll_interval = dlb2_args->poll_interval;
 	dlb2->sw_credit_quanta = dlb2_args->sw_credit_quanta;
 	dlb2->default_depth_thresh = dlb2_args->default_depth_thresh;
-	dlb2->vector_opts_disabled = dlb2_args->vector_opts_disabled;
+	dlb2->vector_opts_enabled = dlb2_args->vector_opts_enabled;
 
 	err = dlb2_iface_open(&dlb2->qm_instance, name);
 	if (err < 0) {
@@ -4538,7 +4538,7 @@ dlb2_parse_params(const char *params,
 					     DLB2_POLL_INTERVAL_ARG,
 					     DLB2_SW_CREDIT_QUANTA_ARG,
 					     DLB2_DEPTH_THRESH_ARG,
-					     DLB2_VECTOR_OPTS_DISAB_ARG,
+					     DLB2_VECTOR_OPTS_ENAB_ARG,
 					     NULL };
 
 	if (params != NULL && params[0] != '\0') {
@@ -4653,11 +4653,11 @@ dlb2_parse_params(const char *params,
 			}
 
 			ret = rte_kvargs_process(kvlist,
-					DLB2_VECTOR_OPTS_DISAB_ARG,
-					set_vector_opts_disab,
-					&dlb2_args->vector_opts_disabled);
+					DLB2_VECTOR_OPTS_ENAB_ARG,
+					set_vector_opts_enab,
+					&dlb2_args->vector_opts_enabled);
 			if (ret != 0) {
-				DLB2_LOG_ERR("%s: Error parsing vector opts disabled",
+				DLB2_LOG_ERR("%s: Error parsing vector opts enabled",
 					     name);
 				rte_kvargs_free(kvlist);
 				return ret;
diff --git a/drivers/event/dlb2/dlb2_priv.h b/drivers/event/dlb2/dlb2_priv.h
index b1225af37e..bb87072da0 100644
--- a/drivers/event/dlb2/dlb2_priv.h
+++ b/drivers/event/dlb2/dlb2_priv.h
@@ -37,7 +37,7 @@
 #define DLB2_POLL_INTERVAL_ARG "poll_interval"
 #define DLB2_SW_CREDIT_QUANTA_ARG "sw_credit_quanta"
 #define DLB2_DEPTH_THRESH_ARG "default_depth_thresh"
-#define DLB2_VECTOR_OPTS_DISAB_ARG "vector_opts_disable"
+#define DLB2_VECTOR_OPTS_ENAB_ARG "vector_opts_enable"
 
 /* Begin HW related defines and structs */
 
@@ -565,7 +565,7 @@ struct dlb2_eventdev {
 	uint32_t new_event_limit;
 	int max_num_events_override;
 	int num_dir_credits_override;
-	bool vector_opts_disabled;
+	bool vector_opts_enabled;
 	volatile enum dlb2_run_state run_state;
 	uint16_t num_dir_queues; /* total num of evdev dir queues requested */
 	union {
@@ -623,7 +623,7 @@ struct dlb2_devargs {
 	int poll_interval;
 	int sw_credit_quanta;
 	int default_depth_thresh;
-	bool vector_opts_disabled;
+	bool vector_opts_enabled;
 };
 
 /* End Eventdev related defines and structs */
-- 
2.23.0
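The net effect of patch 4/4 is easier to see with the devarg parsing and the port-creation check side by side. The following is a minimal standalone sketch, not the driver code: `parse_vector_opts_enab`, `port_uses_scalar` and `is_power_of_2` are hypothetical names, `is_power_of_2` stands in for `rte_is_power_of_2()`, the `-1` error value is a placeholder, and `use_scalar` is assumed to start out false. Only the `#else` branch visible in the diff is modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for rte_is_power_of_2(); zero is not treated as a power of two. */
static bool
is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Mirrors set_vector_opts_enab(): only a leading 'y' or 'Y' enables. */
static int
parse_vector_opts_enab(const char *value, bool *enabled)
{
	if (value == NULL || enabled == NULL)
		return -1; /* placeholder error code for this sketch */

	*enabled = (*value == 'y' || *value == 'Y');
	return 0;
}

/* Mirrors the check in the #else branch of the port-creation functions. */
static bool
port_uses_scalar(uint32_t cq_depth, bool vector_opts_enabled)
{
	return cq_depth > 64 ||
	       !is_power_of_2(cq_depth) ||
	       !vector_opts_enabled;
}
```

With the flag left at its default of false, `port_uses_scalar()` is true for every CQ depth, which matches the "scalar by default" intent of the commit message. One detail worth checking when testing: the header in this patch defines the key string as "vector_opts_enable", while the commit message and the guide text write `vector_opts_enabled`. The string in `DLB2_VECTOR_OPTS_ENAB_ARG` is the one that `rte_kvargs` matches.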
- * Re: [dpdk-dev] [PATCH v3 0/4] DLB2 fixes for 21.05
  2021-05-21  9:11 [dpdk-dev] [PATCH v3 0/4] DLB2 fixes for 21.05 David Marchand
                   ` (3 preceding siblings ...)
  2021-05-21  9:11 ` [dpdk-dev] [PATCH v3 4/4] event/dlb2: select scalar dequeue by default David Marchand
@ 2021-05-21 10:26 ` Ferruh Yigit
  2021-05-21 13:46   ` Thomas Monjalon
  4 siblings, 1 reply; 7+ messages in thread
From: Ferruh Yigit @ 2021-05-21 10:26 UTC
  To: David Marchand, dev; +Cc: thomas, jerinj, timothy.mcdaniel, John McNamara
On 5/21/2021 10:11 AM, David Marchand wrote:
> Just sending a series with 4 ordered fixes (versioned v3 since some were
> marked as v2).
> Fixed rebase damage, fixes lines and updated titles.
> 
Thanks David!
- * Re: [dpdk-dev] [PATCH v3 0/4] DLB2 fixes for 21.05
  2021-05-21 10:26 ` [dpdk-dev] [PATCH v3 0/4] DLB2 fixes for 21.05 Ferruh Yigit
@ 2021-05-21 13:46   ` Thomas Monjalon
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Monjalon @ 2021-05-21 13:46 UTC
  To: David Marchand, jerinj, timothy.mcdaniel; +Cc: dev, John McNamara, Ferruh Yigit
21/05/2021 12:26, Ferruh Yigit:
> On 5/21/2021 10:11 AM, David Marchand wrote:
> > Just sending a series with 4 ordered fixes (versioned v3 since some were
> > marked as v2).
> > Fixed rebase damage, fixes lines and updated titles.
> 
> Thanks David!
Thanks David for cleaning up the last-minute DLB patches.
Applied
DLB and DLB2 were introduced 2 releases ago but it is already a long story.
We should be more cautious when integrating DLB patches in future,
but it will take time and could delay integration.