DPDK patches and discussions
* [dpdk-dev] [DPDK] [PATCH 1/3] qat: remove atomics
       [not found] <cover.1503651900.git.anatoly.burakov@intel.com>
@ 2017-08-25  9:30 ` Anatoly Burakov
  2017-09-04 14:39   ` De Lara Guarch, Pablo
  2017-09-12  9:31   ` [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver Anatoly Burakov
  2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 2/3] qat: enable RX head writes coalescing Anatoly Burakov
  2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 3/3] qat: enable TX tail " Anatoly Burakov
  2 siblings, 2 replies; 12+ messages in thread
From: Anatoly Burakov @ 2017-08-25  9:30 UTC (permalink / raw)
  To: dev
  Cc: john.griffin, fiona.trahe, deepak.k.jain, pablo.de.lara.guarch,
	Burakov, Anatoly

From: "Burakov, Anatoly" <anatoly.burakov@intel.com>

Replacing atomics in the qat driver with simple 16-bit integers for
number of inflight packets.

This adds a new limitation to the QAT driver: each queue pair is
now explicitly single-threaded.

Signed-off-by: Burakov, Anatoly <anatoly.burakov@intel.com>
---
 doc/guides/cryptodevs/qat.rst          |  1 +
 doc/guides/rel_notes/release_17_11.rst |  6 ++++++
 drivers/crypto/qat/qat_crypto.c        | 12 +++++-------
 drivers/crypto/qat/qat_crypto.h        |  2 +-
 drivers/crypto/qat/qat_qp.c            |  4 ++--
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index a3fce7b..cb17b6b 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -90,6 +90,7 @@ Limitations
 * No BSD support as BSD QAT kernel driver not available.
 * ZUC EEA3/EIA3 is not supported by dh895xcc devices
 * Maximum additional authenticated data (AAD) for GCM is 240 bytes long.
+* Queue pairs are not thread-safe (that is, within a single queue pair, RX and TX from different lcores is not supported).
 
 
 Installation
diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 170f4f9..67b6f68 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -41,6 +41,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Updated qat crypto PMD.**
+
+  Performance enhancements:
+
+  * Removed atomics from the internal queue pair structure.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index 1f52cab..2ee5866 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -52,7 +52,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
@@ -946,10 +945,10 @@ qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
 	tail = queue->tail;
 
 	/* Find how many can actually fit on the ring */
-	overflow = rte_atomic16_add_return(&tmp_qp->inflights16, nb_ops)
-				- queue->max_inflights;
+	tmp_qp->inflights16 += nb_ops;
+	overflow = tmp_qp->inflights16 - queue->max_inflights;
 	if (overflow > 0) {
-		rte_atomic16_sub(&tmp_qp->inflights16, overflow);
+		tmp_qp->inflights16 -= overflow;
 		nb_ops_possible = nb_ops - overflow;
 		if (nb_ops_possible == 0)
 			return 0;
@@ -964,8 +963,7 @@ qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
 			 * This message cannot be enqueued,
 			 * decrease number of ops that wasn't sent
 			 */
-			rte_atomic16_sub(&tmp_qp->inflights16,
-					nb_ops_possible - nb_ops_sent);
+			tmp_qp->inflights16 -= nb_ops_possible - nb_ops_sent;
 			if (nb_ops_sent == 0)
 				return 0;
 			goto kick_tail;
@@ -1037,7 +1035,7 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 		WRITE_CSR_RING_HEAD(tmp_qp->mmap_bar_addr,
 					queue->hw_bundle_number,
 					queue->hw_queue_number, queue->head);
-		rte_atomic16_sub(&tmp_qp->inflights16, msg_counter);
+		tmp_qp->inflights16 -= msg_counter;
 		tmp_qp->stats.dequeued_count += msg_counter;
 	}
 	return msg_counter;
diff --git a/drivers/crypto/qat/qat_crypto.h b/drivers/crypto/qat/qat_crypto.h
index 3f35a00..7773b57 100644
--- a/drivers/crypto/qat/qat_crypto.h
+++ b/drivers/crypto/qat/qat_crypto.h
@@ -77,7 +77,7 @@ struct qat_queue {
 
 struct qat_qp {
 	void			*mmap_bar_addr;
-	rte_atomic16_t		inflights16;
+	uint16_t		inflights16;
 	struct	qat_queue	tx_q;
 	struct	qat_queue	rx_q;
 	struct	rte_cryptodev_stats stats;
diff --git a/drivers/crypto/qat/qat_qp.c b/drivers/crypto/qat/qat_qp.c
index 5048d21..e98bffe 100644
--- a/drivers/crypto/qat/qat_qp.c
+++ b/drivers/crypto/qat/qat_qp.c
@@ -186,7 +186,7 @@ int qat_crypto_sym_qp_setup(struct rte_cryptodev *dev, uint16_t queue_pair_id,
 			RTE_CACHE_LINE_SIZE);
 
 	qp->mmap_bar_addr = pci_dev->mem_resource[0].addr;
-	rte_atomic16_init(&qp->inflights16);
+	qp->inflights16 = 0;
 
 	if (qat_tx_queue_create(dev, &(qp->tx_q),
 		queue_pair_id, qp_conf->nb_descriptors, socket_id) != 0) {
@@ -269,7 +269,7 @@ int qat_crypto_sym_qp_release(struct rte_cryptodev *dev, uint16_t queue_pair_id)
 	}
 
 	/* Don't free memory if there are still responses to be processed */
-	if (rte_atomic16_read(&(qp->inflights16)) == 0) {
+	if (qp->inflights16 == 0) {
 		qat_queue_delete(&(qp->tx_q));
 		qat_queue_delete(&(qp->rx_q));
 	} else {
-- 
2.7.4
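
For readers following the change, the counter handling on the enqueue and
dequeue paths can be sketched in isolation as below. This is a minimal
illustration, not the driver code: struct sketch_qp, sketch_reserve() and
sketch_complete() are hypothetical names, and unlike the patch the sketch
computes the overflow in a wider signed type before it touches the counter.
It is only safe when a single thread owns the queue pair, which is the
limitation this patch documents.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the queue pair; only the counter matters here. */
struct sketch_qp {
	uint16_t inflights;     /* plain integer, owned by exactly one thread */
	uint16_t max_inflights; /* how many ops the ring can hold */
};

/*
 * Reserve room for up to nb_ops requests and return how many actually fit.
 * No atomics: correctness relies on single-threaded use of the queue pair.
 */
static uint16_t
sketch_reserve(struct sketch_qp *qp, uint16_t nb_ops)
{
	int32_t overflow;

	assert(qp->inflights <= qp->max_inflights);
	/* Widen before subtracting so the result can go negative. */
	overflow = (int32_t)qp->inflights + nb_ops - qp->max_inflights;
	if (overflow > 0)
		nb_ops -= (uint16_t)overflow;
	qp->inflights += nb_ops;
	return nb_ops;
}

/* Account for nb_done responses taken off the ring. */
static void
sketch_complete(struct sketch_qp *qp, uint16_t nb_done)
{
	assert(nb_done <= qp->inflights);
	qp->inflights -= nb_done;
}
```

With max_inflights set to 8, reserving 5 and then 5 again admits 5 and then
3 ops; a further request admits none until sketch_complete() frees slots.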

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [dpdk-dev] [DPDK] [PATCH 2/3] qat: enable RX head writes coalescing
       [not found] <cover.1503651900.git.anatoly.burakov@intel.com>
  2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 1/3] qat: remove atomics Anatoly Burakov
@ 2017-08-25  9:30 ` Anatoly Burakov
  2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 3/3] qat: enable TX tail " Anatoly Burakov
  2 siblings, 0 replies; 12+ messages in thread
From: Anatoly Burakov @ 2017-08-25  9:30 UTC (permalink / raw)
  To: dev
  Cc: john.griffin, fiona.trahe, deepak.k.jain, pablo.de.lara.guarch,
	Burakov, Anatoly

From: "Burakov, Anatoly" <anatoly.burakov@intel.com>

Don't write the head CSR until enough RX descriptors have been processed.
Also delay marking them as free until the head CSR is written.

Signed-off-by: Burakov, Anatoly <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_17_11.rst |  1 +
 drivers/crypto/qat/qat_crypto.c        | 49 ++++++++++++++++++++++++++--------
 drivers/crypto/qat/qat_crypto.h        |  6 +++++
 3 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 67b6f68..0a400cd 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -46,6 +46,7 @@ New Features
   Performance enhancements:
 
   * Removed atomics from the internal queue pair structure.
+  * Coalesce writes to HEAD CSR on response processing.
 
 
 Resolved Issues
diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index 2ee5866..e520049 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -981,6 +981,33 @@ qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
 	return nb_ops_sent;
 }
 
+static inline
+void rxq_free_desc(struct qat_qp *qp, struct qat_queue *q)
+{
+	uint32_t old_head, new_head;
+	uint32_t max_head;
+
+	old_head = q->csr_head;
+	new_head = q->head;
+	max_head = qp->nb_descriptors * q->msg_size;
+
+	/* write out free descriptors */
+	void *cur_desc = (uint8_t *)q->base_addr + old_head;
+
+	if (new_head < old_head) {
+		memset(cur_desc, ADF_RING_EMPTY_SIG, max_head - old_head);
+		memset(q->base_addr, ADF_RING_EMPTY_SIG, new_head);
+	} else {
+		memset(cur_desc, ADF_RING_EMPTY_SIG, new_head - old_head);
+	}
+	q->nb_processed_responses = 0;
+	q->csr_head = new_head;
+
+	/* write current head to CSR */
+	WRITE_CSR_RING_HEAD(qp->mmap_bar_addr, q->hw_bundle_number,
+			    q->hw_queue_number, new_head);
+}
+
 uint16_t
 qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 		uint16_t nb_ops)
@@ -990,10 +1017,12 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 	uint32_t msg_counter = 0;
 	struct rte_crypto_op *rx_op;
 	struct icp_qat_fw_comn_resp *resp_msg;
+	uint32_t head;
 
 	queue = &(tmp_qp->rx_q);
+	head = queue->head;
 	resp_msg = (struct icp_qat_fw_comn_resp *)
-			((uint8_t *)queue->base_addr + queue->head);
+			((uint8_t *)queue->base_addr + head);
 
 	while (*(uint32_t *)resp_msg != ADF_RING_EMPTY_SIG &&
 			msg_counter != nb_ops) {
@@ -1020,23 +1049,21 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 			rx_op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
 		}
 
-		*(uint32_t *)resp_msg = ADF_RING_EMPTY_SIG;
-		queue->head = adf_modulo(queue->head +
-				queue->msg_size,
-				ADF_RING_SIZE_MODULO(queue->queue_size));
+		head = adf_modulo(head + queue->msg_size, queue->modulo);
 		resp_msg = (struct icp_qat_fw_comn_resp *)
-					((uint8_t *)queue->base_addr +
-							queue->head);
+				((uint8_t *)queue->base_addr + head);
 		*ops = rx_op;
 		ops++;
 		msg_counter++;
 	}
 	if (msg_counter > 0) {
-		WRITE_CSR_RING_HEAD(tmp_qp->mmap_bar_addr,
-					queue->hw_bundle_number,
-					queue->hw_queue_number, queue->head);
-		tmp_qp->inflights16 -= msg_counter;
+		queue->head = head;
 		tmp_qp->stats.dequeued_count += msg_counter;
+		queue->nb_processed_responses += msg_counter;
+		tmp_qp->inflights16 -= msg_counter;
+
+		if (queue->nb_processed_responses > QAT_CSR_HEAD_WRITE_THRESH)
+			rxq_free_desc(tmp_qp, queue);
 	}
 	return msg_counter;
 }
diff --git a/drivers/crypto/qat/qat_crypto.h b/drivers/crypto/qat/qat_crypto.h
index 7773b57..d78957c 100644
--- a/drivers/crypto/qat/qat_crypto.h
+++ b/drivers/crypto/qat/qat_crypto.h
@@ -50,6 +50,9 @@
 	(((num) + (align) - 1) & ~((align) - 1))
 #define QAT_64_BTYE_ALIGN_MASK (~0x3f)
 
+#define QAT_CSR_HEAD_WRITE_THRESH 32U
+/* number of requests to accumulate before writing head CSR */
+
 struct qat_session;
 
 enum qat_device_gen {
@@ -73,6 +76,9 @@ struct qat_queue {
 	uint8_t		hw_bundle_number;
 	uint8_t		hw_queue_number;
 	/* HW queue aka ring offset on bundle */
+	uint32_t	csr_head;		/* last written head value */
+	uint16_t	nb_processed_responses;
+	/* number of responses processed since last CSR head write */
 };
 
 struct qat_qp {
-- 
2.7.4
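
The head coalescing above can be tried outside the driver with the sketch
below. It is an illustration under assumptions, not the patch itself: the
sketch_* names are hypothetical, the CSR write is replaced by a counter, and
the fill byte 0x7F is assumed to be what the empty signature reduces to when
passed through memset().

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_EMPTY_BYTE  0x7F /* assumed fill byte marking a free slot */
#define SKETCH_HEAD_THRESH 32U  /* mirrors QAT_CSR_HEAD_WRITE_THRESH */

/* Hypothetical stand-in for the RX queue state used by the patch. */
struct sketch_rxq {
	uint8_t *base;         /* ring memory */
	uint32_t ring_bytes;   /* total ring size in bytes */
	uint32_t head;         /* software head, byte offset into the ring */
	uint32_t csr_head;     /* last head value handed to hardware */
	uint16_t nb_processed; /* responses consumed since that write */
	uint32_t csr_writes;   /* counts simulated register writes */
};

/* Mark [csr_head, head) as free, handling wraparound, then publish head. */
static void
sketch_free_desc(struct sketch_rxq *q)
{
	uint32_t old_head = q->csr_head;
	uint32_t new_head = q->head;

	if (new_head < old_head) {
		/* Range wraps: clear to the end of the ring, then the start. */
		memset(q->base + old_head, SKETCH_EMPTY_BYTE,
				q->ring_bytes - old_head);
		memset(q->base, SKETCH_EMPTY_BYTE, new_head);
	} else {
		memset(q->base + old_head, SKETCH_EMPTY_BYTE,
				new_head - old_head);
	}
	q->nb_processed = 0;
	q->csr_head = new_head;
	q->csr_writes++; /* stands in for WRITE_CSR_RING_HEAD() */
}

/* Consume nb_msgs responses; publish the head only past the threshold. */
static void
sketch_consume(struct sketch_rxq *q, uint16_t nb_msgs, uint32_t msg_size)
{
	q->head = (q->head + nb_msgs * msg_size) % q->ring_bytes;
	q->nb_processed += nb_msgs;
	if (q->nb_processed > SKETCH_HEAD_THRESH)
		sketch_free_desc(q);
}
```

Consuming exactly 32 responses leaves the ring untouched; the 33rd triggers
one bulk clear and a single simulated register write.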


* [dpdk-dev] [DPDK] [PATCH 3/3] qat: enable TX tail writes coalescing
       [not found] <cover.1503651900.git.anatoly.burakov@intel.com>
  2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 1/3] qat: remove atomics Anatoly Burakov
  2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 2/3] qat: enable RX head writes coalescing Anatoly Burakov
@ 2017-08-25  9:30 ` Anatoly Burakov
  2 siblings, 0 replies; 12+ messages in thread
From: Anatoly Burakov @ 2017-08-25  9:30 UTC (permalink / raw)
  To: dev
  Cc: john.griffin, fiona.trahe, deepak.k.jain, pablo.de.lara.guarch,
	Burakov, Anatoly

From: "Burakov, Anatoly" <anatoly.burakov@intel.com>

Don't write the tail CSR until enough TX descriptors have been queued.

To avoid crypto operations sitting in the TX ring indefinitely,
a "force write" threshold is used:
 - on TX, no tail write coalescing occurs if the number of inflights
   is below the force write threshold
 - on RX, if the number of inflights is at or below the force write
   threshold and some enqueued crypto ops have not yet been submitted
   for processing, the tail CSR is written.

Signed-off-by: Burakov, Anatoly <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_17_11.rst |  1 +
 drivers/crypto/qat/qat_crypto.c        | 41 ++++++++++++++++++++++++----------
 drivers/crypto/qat/qat_crypto.h        |  7 ++++++
 3 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 0a400cd..ad9474c 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -47,6 +47,7 @@ New Features
 
   * Removed atomics from the internal queue pair structure.
   * Coalesce writes to HEAD CSR on response processing.
+  * Coalesce writes to TAIL CSR on request processing.
 
 
 Resolved Issues
diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index e520049..fba699e 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -922,6 +922,14 @@ qat_bpicipher_postprocess(struct qat_session *ctx,
 	return sym_op->cipher.data.length - last_block_len;
 }
 
+static inline void
+txq_write_tail(struct qat_qp *qp, struct qat_queue *q) {
+	WRITE_CSR_RING_TAIL(qp->mmap_bar_addr, q->hw_bundle_number,
+			q->hw_queue_number, q->tail);
+	q->nb_pending_requests = 0;
+	q->csr_tail = q->tail;
+}
+
 uint16_t
 qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
 		uint16_t nb_ops)
@@ -974,10 +982,13 @@ qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
 		cur_op++;
 	}
 kick_tail:
-	WRITE_CSR_RING_TAIL(tmp_qp->mmap_bar_addr, queue->hw_bundle_number,
-			queue->hw_queue_number, tail);
 	queue->tail = tail;
 	tmp_qp->stats.enqueued_count += nb_ops_sent;
+	queue->nb_pending_requests += nb_ops_sent;
+	if (tmp_qp->inflights16 < QAT_CSR_TAIL_FORCE_WRITE_THRESH ||
+			queue->nb_pending_requests > QAT_CSR_TAIL_WRITE_THRESH) {
+		txq_write_tail(tmp_qp, queue);
+	}
 	return nb_ops_sent;
 }
 
@@ -1012,17 +1023,18 @@ uint16_t
 qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 		uint16_t nb_ops)
 {
-	struct qat_queue *queue;
+	struct qat_queue *rx_queue, *tx_queue;
 	struct qat_qp *tmp_qp = (struct qat_qp *)qp;
 	uint32_t msg_counter = 0;
 	struct rte_crypto_op *rx_op;
 	struct icp_qat_fw_comn_resp *resp_msg;
 	uint32_t head;
 
-	queue = &(tmp_qp->rx_q);
-	head = queue->head;
+	rx_queue = &(tmp_qp->rx_q);
+	tx_queue = &(tmp_qp->tx_q);
+	head = rx_queue->head;
 	resp_msg = (struct icp_qat_fw_comn_resp *)
-			((uint8_t *)queue->base_addr + head);
+			((uint8_t *)rx_queue->base_addr + head);
 
 	while (*(uint32_t *)resp_msg != ADF_RING_EMPTY_SIG &&
 			msg_counter != nb_ops) {
@@ -1049,21 +1061,26 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 			rx_op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
 		}
 
-		head = adf_modulo(head + queue->msg_size, queue->modulo);
+		head = adf_modulo(head + rx_queue->msg_size, rx_queue->modulo);
 		resp_msg = (struct icp_qat_fw_comn_resp *)
-				((uint8_t *)queue->base_addr + head);
+				((uint8_t *)rx_queue->base_addr + head);
 		*ops = rx_op;
 		ops++;
 		msg_counter++;
 	}
 	if (msg_counter > 0) {
-		queue->head = head;
+		rx_queue->head = head;
 		tmp_qp->stats.dequeued_count += msg_counter;
-		queue->nb_processed_responses += msg_counter;
+		rx_queue->nb_processed_responses += msg_counter;
 		tmp_qp->inflights16 -= msg_counter;
 
-		if (queue->nb_processed_responses > QAT_CSR_HEAD_WRITE_THRESH)
-			rxq_free_desc(tmp_qp, queue);
+		if (rx_queue->nb_processed_responses > QAT_CSR_HEAD_WRITE_THRESH)
+			rxq_free_desc(tmp_qp, rx_queue);
+	}
+	/* also check if tail needs to be advanced */
+	if (tmp_qp->inflights16 <= QAT_CSR_TAIL_FORCE_WRITE_THRESH &&
+			tx_queue->tail != tx_queue->csr_tail) {
+		txq_write_tail(tmp_qp, tx_queue);
 	}
 	return msg_counter;
 }
diff --git a/drivers/crypto/qat/qat_crypto.h b/drivers/crypto/qat/qat_crypto.h
index d78957c..0ebb083 100644
--- a/drivers/crypto/qat/qat_crypto.h
+++ b/drivers/crypto/qat/qat_crypto.h
@@ -52,6 +52,10 @@
 
 #define QAT_CSR_HEAD_WRITE_THRESH 32U
 /* number of requests to accumulate before writing head CSR */
+#define QAT_CSR_TAIL_WRITE_THRESH 32U
+/* number of requests to accumulate before writing tail CSR */
+#define QAT_CSR_TAIL_FORCE_WRITE_THRESH 256U
+/* number of inflights below which no tail write coalescing should occur */
 
 struct qat_session;
 
@@ -77,8 +81,11 @@ struct qat_queue {
 	uint8_t		hw_queue_number;
 	/* HW queue aka ring offset on bundle */
 	uint32_t	csr_head;		/* last written head value */
+	uint32_t	csr_tail;		/* last written tail value */
 	uint16_t	nb_processed_responses;
 	/* number of responses processed since last CSR head write */
+	uint16_t	nb_pending_requests;
+	/* number of requests pending since last CSR tail write */
 };
 
 struct qat_qp {
-- 
2.7.4
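
The two tail-write decisions in this patch can be exercised with the sketch
below. It is a simplified model under stated assumptions: the sketch_* names
are hypothetical, the tail is counted in ops rather than bytes and never
wraps, and the CSR write is replaced by a counter.

```c
#include <stdint.h>

#define SKETCH_TAIL_THRESH       32U  /* mirrors QAT_CSR_TAIL_WRITE_THRESH */
#define SKETCH_TAIL_FORCE_THRESH 256U /* mirrors ..._TAIL_FORCE_WRITE_THRESH */

/* Hypothetical stand-in for the TX queue state used by the patch. */
struct sketch_txq {
	uint32_t tail;       /* software tail (in ops, for simplicity) */
	uint32_t csr_tail;   /* last tail value handed to hardware */
	uint16_t nb_pending; /* requests queued since that write */
	uint32_t csr_writes; /* counts simulated register writes */
};

static void
sketch_write_tail(struct sketch_txq *q)
{
	q->csr_writes++; /* stands in for WRITE_CSR_RING_TAIL() */
	q->nb_pending = 0;
	q->csr_tail = q->tail;
}

/* Enqueue side: write at once when lightly loaded, else batch the writes. */
static void
sketch_after_enqueue(struct sketch_txq *q, uint16_t inflights,
		uint16_t nb_sent)
{
	q->tail += nb_sent;
	q->nb_pending += nb_sent;
	if (inflights < SKETCH_TAIL_FORCE_THRESH ||
			q->nb_pending > SKETCH_TAIL_THRESH)
		sketch_write_tail(q);
}

/* Dequeue side: flush requests that were queued but never submitted. */
static void
sketch_after_dequeue(struct sketch_txq *q, uint16_t inflights)
{
	if (inflights <= SKETCH_TAIL_FORCE_THRESH &&
			q->tail != q->csr_tail)
		sketch_write_tail(q);
}
```

The dequeue-side check is what keeps a small leftover batch from waiting
indefinitely: once the load drains to the force write threshold, the next
dequeue call publishes the tail.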


* Re: [dpdk-dev] [DPDK] [PATCH 1/3] qat: remove atomics
  2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 1/3] qat: remove atomics Anatoly Burakov
@ 2017-09-04 14:39   ` De Lara Guarch, Pablo
  2017-09-12  9:31   ` [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver Anatoly Burakov
  1 sibling, 0 replies; 12+ messages in thread
From: De Lara Guarch, Pablo @ 2017-09-04 14:39 UTC (permalink / raw)
  To: Burakov, Anatoly, dev; +Cc: Griffin, John, Trahe, Fiona, Jain, Deepak K



> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Friday, August 25, 2017 10:31 AM
> To: dev@dpdk.org
> Cc: Griffin, John <john.griffin@intel.com>; Trahe, Fiona
> <fiona.trahe@intel.com>; Jain, Deepak K <deepak.k.jain@intel.com>; De
> Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>
> Subject: [DPDK] [PATCH 1/3] qat: remove atomics

Title should be "crypto/qat: remove..."

> 
> From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
> 
> Replacing atomics in the qat driver with simple 16-bit integers for number
> of inflight packets.
> 
> This adds a new limitation to the QAT driver: each queue pair is now
> explicitly single-threaded.
> 
> Signed-off-by: Burakov, Anatoly <anatoly.burakov@intel.com>
> ---
>  doc/guides/cryptodevs/qat.rst          |  1 +
>  doc/guides/rel_notes/release_17_11.rst |  6 ++++++
>  drivers/crypto/qat/qat_crypto.c        | 12 +++++-------
>  drivers/crypto/qat/qat_crypto.h        |  2 +-
>  drivers/crypto/qat/qat_qp.c            |  4 ++--
>  5 files changed, 15 insertions(+), 10 deletions(-)
> 

...

> diff --git a/doc/guides/rel_notes/release_17_11.rst
> b/doc/guides/rel_notes/release_17_11.rst
> index 170f4f9..67b6f68 100644
> --- a/doc/guides/rel_notes/release_17_11.rst
> +++ b/doc/guides/rel_notes/release_17_11.rst
> @@ -41,6 +41,12 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
> 
> +* **Updated qat crypto PMD.**

"qat" should be in capital letters.

> +
> +  Performance enhancements:
> +
> +  * Removed atomics from the internal queue pair structure.
> +
> 


* [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver
  2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 1/3] qat: remove atomics Anatoly Burakov
  2017-09-04 14:39   ` De Lara Guarch, Pablo
@ 2017-09-12  9:31   ` Anatoly Burakov
  2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 1/3] crypto/qat: remove atomics Anatoly Burakov
                       ` (3 more replies)
  1 sibling, 4 replies; 12+ messages in thread
From: Anatoly Burakov @ 2017-09-12  9:31 UTC (permalink / raw)
  To: dev; +Cc: fiona.trahe, john.griffin, deepak.k.jain, pablo.de.lara.guarch

A few performance enhancements for the QAT crypto driver. These include:
- Removing reliance on atomics on the hot path
  - This adds a new limitation, making queue pairs single-threaded
- Coalescing RX head and TX tail CSR writes

v2: added cover letter
    fixed commit messages
    fixed documentation

Anatoly Burakov (3):
  crypto/qat: remove atomics
  crypto/qat: enable RX head writes coalescing
  crypto/qat: enable TX tail writes coalescing

 doc/guides/cryptodevs/qat.rst          |  1 +
 doc/guides/rel_notes/release_17_11.rst |  8 ++++
 drivers/crypto/qat/qat_crypto.c        | 84 +++++++++++++++++++++++++---------
 drivers/crypto/qat/qat_crypto.h        | 15 +++++-
 drivers/crypto/qat/qat_qp.c            |  4 +-
 5 files changed, 88 insertions(+), 24 deletions(-)

-- 
2.7.4
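
Since queue pairs become single-threaded with this series, an application
needs some way to make sure each queue pair is used by one lcore only. One
possible application-side guard is sketched below. It is not part of the
driver or of any DPDK API: the sketch_* names and the fixed queue pair count
are hypothetical, and the claim function is assumed to be called from a
single thread during setup.

```c
#include <stdint.h>

#define SKETCH_NB_QPS   4          /* hypothetical number of queue pairs */
#define SKETCH_NO_OWNER UINT32_MAX /* marks a queue pair nobody has claimed */

/* Records which lcore owns each queue pair. */
struct sketch_qp_owners {
	uint32_t owner[SKETCH_NB_QPS];
};

static void
sketch_owners_init(struct sketch_qp_owners *m)
{
	uint16_t i;

	for (i = 0; i < SKETCH_NB_QPS; i++)
		m->owner[i] = SKETCH_NO_OWNER;
}

/*
 * Return 0 if lcore_id may use qp_id, -1 if qp_id is out of range or
 * already belongs to a different lcore.
 */
static int
sketch_claim_qp(struct sketch_qp_owners *m, uint16_t qp_id,
		uint32_t lcore_id)
{
	if (qp_id >= SKETCH_NB_QPS)
		return -1;
	if (m->owner[qp_id] == SKETCH_NO_OWNER)
		m->owner[qp_id] = lcore_id;
	return m->owner[qp_id] == lcore_id ? 0 : -1;
}
```

Each lcore would claim its queue pair before its first enqueue or dequeue
and treat a failed claim as a configuration error.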


* [dpdk-dev] [PATCH v2 1/3] crypto/qat: remove atomics
  2017-09-12  9:31   ` [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver Anatoly Burakov
@ 2017-09-12  9:31     ` Anatoly Burakov
  2017-09-15 11:35       ` Trahe, Fiona
  2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 2/3] crypto/qat: enable RX head writes coalescing Anatoly Burakov
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Anatoly Burakov @ 2017-09-12  9:31 UTC (permalink / raw)
  To: dev; +Cc: fiona.trahe, john.griffin, deepak.k.jain, pablo.de.lara.guarch

Replace the atomic counter of inflight packets in the QAT driver with
a plain 16-bit integer.

This adds a new limitation to the QAT driver: each queue pair is
now explicitly single-threaded.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
v2: fixed commit message
    fixed documentation

 doc/guides/cryptodevs/qat.rst          |  1 +
 doc/guides/rel_notes/release_17_11.rst |  6 ++++++
 drivers/crypto/qat/qat_crypto.c        | 12 +++++-------
 drivers/crypto/qat/qat_crypto.h        |  2 +-
 drivers/crypto/qat/qat_qp.c            |  4 ++--
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index a3fce7b..cb17b6b 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -90,6 +90,7 @@ Limitations
 * No BSD support as BSD QAT kernel driver not available.
 * ZUC EEA3/EIA3 is not supported by dh895xcc devices
 * Maximum additional authenticated data (AAD) for GCM is 240 bytes long.
+* Queue pairs are not thread-safe (that is, within a single queue pair, RX and TX from different lcores is not supported).
 
 
 Installation
diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 170f4f9..96f954f 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -41,6 +41,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Updated QAT crypto PMD.**
+
+  Performance enhancements:
+
+  * Removed atomics from the internal queue pair structure.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index 62ee175..bb199ae 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -51,7 +51,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
@@ -945,10 +944,10 @@ qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
 	tail = queue->tail;
 
 	/* Find how many can actually fit on the ring */
-	overflow = rte_atomic16_add_return(&tmp_qp->inflights16, nb_ops)
-				- queue->max_inflights;
+	tmp_qp->inflights16 += nb_ops;
+	overflow = tmp_qp->inflights16 - queue->max_inflights;
 	if (overflow > 0) {
-		rte_atomic16_sub(&tmp_qp->inflights16, overflow);
+		tmp_qp->inflights16 -= overflow;
 		nb_ops_possible = nb_ops - overflow;
 		if (nb_ops_possible == 0)
 			return 0;
@@ -963,8 +962,7 @@ qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
 			 * This message cannot be enqueued,
 			 * decrease number of ops that wasn't sent
 			 */
-			rte_atomic16_sub(&tmp_qp->inflights16,
-					nb_ops_possible - nb_ops_sent);
+			tmp_qp->inflights16 -= nb_ops_possible - nb_ops_sent;
 			if (nb_ops_sent == 0)
 				return 0;
 			goto kick_tail;
@@ -1036,7 +1034,7 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 		WRITE_CSR_RING_HEAD(tmp_qp->mmap_bar_addr,
 					queue->hw_bundle_number,
 					queue->hw_queue_number, queue->head);
-		rte_atomic16_sub(&tmp_qp->inflights16, msg_counter);
+		tmp_qp->inflights16 -= msg_counter;
 		tmp_qp->stats.dequeued_count += msg_counter;
 	}
 	return msg_counter;
diff --git a/drivers/crypto/qat/qat_crypto.h b/drivers/crypto/qat/qat_crypto.h
index 3f35a00..7773b57 100644
--- a/drivers/crypto/qat/qat_crypto.h
+++ b/drivers/crypto/qat/qat_crypto.h
@@ -77,7 +77,7 @@ struct qat_queue {
 
 struct qat_qp {
 	void			*mmap_bar_addr;
-	rte_atomic16_t		inflights16;
+	uint16_t		inflights16;
 	struct	qat_queue	tx_q;
 	struct	qat_queue	rx_q;
 	struct	rte_cryptodev_stats stats;
diff --git a/drivers/crypto/qat/qat_qp.c b/drivers/crypto/qat/qat_qp.c
index 5048d21..e98bffe 100644
--- a/drivers/crypto/qat/qat_qp.c
+++ b/drivers/crypto/qat/qat_qp.c
@@ -186,7 +186,7 @@ int qat_crypto_sym_qp_setup(struct rte_cryptodev *dev, uint16_t queue_pair_id,
 			RTE_CACHE_LINE_SIZE);
 
 	qp->mmap_bar_addr = pci_dev->mem_resource[0].addr;
-	rte_atomic16_init(&qp->inflights16);
+	qp->inflights16 = 0;
 
 	if (qat_tx_queue_create(dev, &(qp->tx_q),
 		queue_pair_id, qp_conf->nb_descriptors, socket_id) != 0) {
@@ -269,7 +269,7 @@ int qat_crypto_sym_qp_release(struct rte_cryptodev *dev, uint16_t queue_pair_id)
 	}
 
 	/* Don't free memory if there are still responses to be processed */
-	if (rte_atomic16_read(&(qp->inflights16)) == 0) {
+	if (qp->inflights16 == 0) {
 		qat_queue_delete(&(qp->tx_q));
 		qat_queue_delete(&(qp->rx_q));
 	} else {
-- 
2.7.4


* [dpdk-dev] [PATCH v2 2/3] crypto/qat: enable RX head writes coalescing
  2017-09-12  9:31   ` [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver Anatoly Burakov
  2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 1/3] crypto/qat: remove atomics Anatoly Burakov
@ 2017-09-12  9:31     ` Anatoly Burakov
  2017-09-15 11:55       ` Trahe, Fiona
  2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 3/3] crypto/qat: enable TX tail " Anatoly Burakov
  2017-09-18 11:03     ` [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver De Lara Guarch, Pablo
  3 siblings, 1 reply; 12+ messages in thread
From: Anatoly Burakov @ 2017-09-12  9:31 UTC (permalink / raw)
  To: dev; +Cc: fiona.trahe, john.griffin, deepak.k.jain, pablo.de.lara.guarch

Don't write the head CSR until enough RX descriptors have been processed.
Also delay marking them as free until the head CSR is written.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
v2: fixed commit message

 doc/guides/rel_notes/release_17_11.rst |  1 +
 drivers/crypto/qat/qat_crypto.c        | 49 ++++++++++++++++++++++++++--------
 drivers/crypto/qat/qat_crypto.h        |  6 +++++
 3 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 96f954f..0b77095 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -46,6 +46,7 @@ New Features
   Performance enhancements:
 
   * Removed atomics from the internal queue pair structure.
+  * Coalesce writes to HEAD CSR on response processing.
 
 
 Resolved Issues
diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index bb199ae..1656e0f 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -980,6 +980,33 @@ qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
 	return nb_ops_sent;
 }
 
+static inline
+void rxq_free_desc(struct qat_qp *qp, struct qat_queue *q)
+{
+	uint32_t old_head, new_head;
+	uint32_t max_head;
+
+	old_head = q->csr_head;
+	new_head = q->head;
+	max_head = qp->nb_descriptors * q->msg_size;
+
+	/* write out free descriptors */
+	void *cur_desc = (uint8_t *)q->base_addr + old_head;
+
+	if (new_head < old_head) {
+		memset(cur_desc, ADF_RING_EMPTY_SIG, max_head - old_head);
+		memset(q->base_addr, ADF_RING_EMPTY_SIG, new_head);
+	} else {
+		memset(cur_desc, ADF_RING_EMPTY_SIG, new_head - old_head);
+	}
+	q->nb_processed_responses = 0;
+	q->csr_head = new_head;
+
+	/* write current head to CSR */
+	WRITE_CSR_RING_HEAD(qp->mmap_bar_addr, q->hw_bundle_number,
+			    q->hw_queue_number, new_head);
+}
+
 uint16_t
 qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 		uint16_t nb_ops)
@@ -989,10 +1016,12 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 	uint32_t msg_counter = 0;
 	struct rte_crypto_op *rx_op;
 	struct icp_qat_fw_comn_resp *resp_msg;
+	uint32_t head;
 
 	queue = &(tmp_qp->rx_q);
+	head = queue->head;
 	resp_msg = (struct icp_qat_fw_comn_resp *)
-			((uint8_t *)queue->base_addr + queue->head);
+			((uint8_t *)queue->base_addr + head);
 
 	while (*(uint32_t *)resp_msg != ADF_RING_EMPTY_SIG &&
 			msg_counter != nb_ops) {
@@ -1019,23 +1048,21 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 			rx_op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
 		}
 
-		*(uint32_t *)resp_msg = ADF_RING_EMPTY_SIG;
-		queue->head = adf_modulo(queue->head +
-				queue->msg_size,
-				ADF_RING_SIZE_MODULO(queue->queue_size));
+		head = adf_modulo(head + queue->msg_size, queue->modulo);
 		resp_msg = (struct icp_qat_fw_comn_resp *)
-					((uint8_t *)queue->base_addr +
-							queue->head);
+				((uint8_t *)queue->base_addr + head);
 		*ops = rx_op;
 		ops++;
 		msg_counter++;
 	}
 	if (msg_counter > 0) {
-		WRITE_CSR_RING_HEAD(tmp_qp->mmap_bar_addr,
-					queue->hw_bundle_number,
-					queue->hw_queue_number, queue->head);
-		tmp_qp->inflights16 -= msg_counter;
+		queue->head = head;
 		tmp_qp->stats.dequeued_count += msg_counter;
+		queue->nb_processed_responses += msg_counter;
+		tmp_qp->inflights16 -= msg_counter;
+
+		if (queue->nb_processed_responses > QAT_CSR_HEAD_WRITE_THRESH)
+			rxq_free_desc(tmp_qp, queue);
 	}
 	return msg_counter;
 }
diff --git a/drivers/crypto/qat/qat_crypto.h b/drivers/crypto/qat/qat_crypto.h
index 7773b57..d78957c 100644
--- a/drivers/crypto/qat/qat_crypto.h
+++ b/drivers/crypto/qat/qat_crypto.h
@@ -50,6 +50,9 @@
 	(((num) + (align) - 1) & ~((align) - 1))
 #define QAT_64_BTYE_ALIGN_MASK (~0x3f)
 
+#define QAT_CSR_HEAD_WRITE_THRESH 32U
+/* number of requests to accumulate before writing head CSR */
+
 struct qat_session;
 
 enum qat_device_gen {
@@ -73,6 +76,9 @@ struct qat_queue {
 	uint8_t		hw_bundle_number;
 	uint8_t		hw_queue_number;
 	/* HW queue aka ring offset on bundle */
+	uint32_t	csr_head;		/* last written head value */
+	uint16_t	nb_processed_responses;
+	/* number of responses processed since last CSR head write */
 };
 
 struct qat_qp {
-- 
2.7.4


* [dpdk-dev] [PATCH v2 3/3] crypto/qat: enable TX tail writes coalescing
  2017-09-12  9:31   ` [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver Anatoly Burakov
  2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 1/3] crypto/qat: remove atomics Anatoly Burakov
  2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 2/3] crypto/qat: enable RX head writes coalescing Anatoly Burakov
@ 2017-09-12  9:31     ` Anatoly Burakov
  2017-09-15 13:17       ` Trahe, Fiona
  2017-09-18 11:03     ` [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver De Lara Guarch, Pablo
  3 siblings, 1 reply; 12+ messages in thread
From: Anatoly Burakov @ 2017-09-12  9:31 UTC (permalink / raw)
  To: dev; +Cc: fiona.trahe, john.griffin, deepak.k.jain, pablo.de.lara.guarch

Don't write the tail CSR until enough TX descriptors have been queued.

To avoid crypto operations sitting in the TX ring indefinitely,
a "force write" threshold is used:
 - on TX, no tail write coalescing occurs if the number of inflights
   is below the force write threshold
 - on RX, if the number of inflights is at or below the force write
   threshold and some enqueued crypto ops have not yet been submitted
   for processing, the tail CSR is written.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
v2: fixed commit message

 doc/guides/rel_notes/release_17_11.rst |  1 +
 drivers/crypto/qat/qat_crypto.c        | 41 ++++++++++++++++++++++++----------
 drivers/crypto/qat/qat_crypto.h        |  7 ++++++
 3 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 0b77095..f0d3960 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -47,6 +47,7 @@ New Features
 
   * Removed atomics from the internal queue pair structure.
   * Coalesce writes to HEAD CSR on response processing.
+  * Coalesce writes to TAIL CSR on request processing.
 
 
 Resolved Issues
diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index 1656e0f..a2b202f 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -921,6 +921,14 @@ qat_bpicipher_postprocess(struct qat_session *ctx,
 	return sym_op->cipher.data.length - last_block_len;
 }
 
+static inline void
+txq_write_tail(struct qat_qp *qp, struct qat_queue *q) {
+	WRITE_CSR_RING_TAIL(qp->mmap_bar_addr, q->hw_bundle_number,
+			q->hw_queue_number, q->tail);
+	q->nb_pending_requests = 0;
+	q->csr_tail = q->tail;
+}
+
 uint16_t
 qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
 		uint16_t nb_ops)
@@ -973,10 +981,13 @@ qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
 		cur_op++;
 	}
 kick_tail:
-	WRITE_CSR_RING_TAIL(tmp_qp->mmap_bar_addr, queue->hw_bundle_number,
-			queue->hw_queue_number, tail);
 	queue->tail = tail;
 	tmp_qp->stats.enqueued_count += nb_ops_sent;
+	queue->nb_pending_requests += nb_ops_sent;
+	if (tmp_qp->inflights16 < QAT_CSR_TAIL_FORCE_WRITE_THRESH ||
+			queue->nb_pending_requests > QAT_CSR_TAIL_WRITE_THRESH) {
+		txq_write_tail(tmp_qp, queue);
+	}
 	return nb_ops_sent;
 }
 
@@ -1011,17 +1022,18 @@ uint16_t
 qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 		uint16_t nb_ops)
 {
-	struct qat_queue *queue;
+	struct qat_queue *rx_queue, *tx_queue;
 	struct qat_qp *tmp_qp = (struct qat_qp *)qp;
 	uint32_t msg_counter = 0;
 	struct rte_crypto_op *rx_op;
 	struct icp_qat_fw_comn_resp *resp_msg;
 	uint32_t head;
 
-	queue = &(tmp_qp->rx_q);
-	head = queue->head;
+	rx_queue = &(tmp_qp->rx_q);
+	tx_queue = &(tmp_qp->tx_q);
+	head = rx_queue->head;
 	resp_msg = (struct icp_qat_fw_comn_resp *)
-			((uint8_t *)queue->base_addr + head);
+			((uint8_t *)rx_queue->base_addr + head);
 
 	while (*(uint32_t *)resp_msg != ADF_RING_EMPTY_SIG &&
 			msg_counter != nb_ops) {
@@ -1048,21 +1060,26 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op **ops,
 			rx_op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
 		}
 
-		head = adf_modulo(head + queue->msg_size, queue->modulo);
+		head = adf_modulo(head + rx_queue->msg_size, rx_queue->modulo);
 		resp_msg = (struct icp_qat_fw_comn_resp *)
-				((uint8_t *)queue->base_addr + head);
+				((uint8_t *)rx_queue->base_addr + head);
 		*ops = rx_op;
 		ops++;
 		msg_counter++;
 	}
 	if (msg_counter > 0) {
-		queue->head = head;
+		rx_queue->head = head;
 		tmp_qp->stats.dequeued_count += msg_counter;
-		queue->nb_processed_responses += msg_counter;
+		rx_queue->nb_processed_responses += msg_counter;
 		tmp_qp->inflights16 -= msg_counter;
 
-		if (queue->nb_processed_responses > QAT_CSR_HEAD_WRITE_THRESH)
-			rxq_free_desc(tmp_qp, queue);
+		if (rx_queue->nb_processed_responses > QAT_CSR_HEAD_WRITE_THRESH)
+			rxq_free_desc(tmp_qp, rx_queue);
+	}
+	/* also check if tail needs to be advanced */
+	if (tmp_qp->inflights16 <= QAT_CSR_TAIL_FORCE_WRITE_THRESH &&
+			tx_queue->tail != tx_queue->csr_tail) {
+		txq_write_tail(tmp_qp, tx_queue);
 	}
 	return msg_counter;
 }
diff --git a/drivers/crypto/qat/qat_crypto.h b/drivers/crypto/qat/qat_crypto.h
index d78957c..0ebb083 100644
--- a/drivers/crypto/qat/qat_crypto.h
+++ b/drivers/crypto/qat/qat_crypto.h
@@ -52,6 +52,10 @@
 
 #define QAT_CSR_HEAD_WRITE_THRESH 32U
 /* number of requests to accumulate before writing head CSR */
+#define QAT_CSR_TAIL_WRITE_THRESH 32U
+/* number of requests to accumulate before writing tail CSR */
+#define QAT_CSR_TAIL_FORCE_WRITE_THRESH 256U
+/* number of inflights below which no tail write coalescing should occur */
 
 struct qat_session;
 
@@ -77,8 +81,11 @@ struct qat_queue {
 	uint8_t		hw_queue_number;
 	/* HW queue aka ring offset on bundle */
 	uint32_t	csr_head;		/* last written head value */
+	uint32_t	csr_tail;		/* last written tail value */
 	uint16_t	nb_processed_responses;
 	/* number of responses processed since last CSR head write */
+	uint16_t	nb_pending_requests;
+	/* number of requests pending since last CSR tail write */
 };
 
 struct qat_qp {
-- 
2.7.4

* Re: [dpdk-dev] [PATCH v2 1/3] crypto/qat: remove atomics
  2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 1/3] crypto/qat: remove atomics Anatoly Burakov
@ 2017-09-15 11:35       ` Trahe, Fiona
  0 siblings, 0 replies; 12+ messages in thread
From: Trahe, Fiona @ 2017-09-15 11:35 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Griffin, John, Jain, Deepak K, De Lara Guarch, Pablo



> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, September 12, 2017 10:31 AM
> To: dev@dpdk.org
> Cc: Trahe, Fiona <fiona.trahe@intel.com>; Griffin, John <john.griffin@intel.com>; Jain, Deepak K
> <deepak.k.jain@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: [PATCH v2 1/3] crypto/qat: remove atomics
> 
> Replacing atomics in the QAT driver with simple 16-bit integers for
> number of inflight packets.
> 
> This adds a new limitation to the QAT driver: each queue pair is
> now explicitly single-threaded.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>

* Re: [dpdk-dev] [PATCH v2 2/3] crypto/qat: enable RX head writes coalescing
  2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 2/3] crypto/qat: enable RX head writes coalescing Anatoly Burakov
@ 2017-09-15 11:55       ` Trahe, Fiona
  0 siblings, 0 replies; 12+ messages in thread
From: Trahe, Fiona @ 2017-09-15 11:55 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Griffin, John, Jain, Deepak K, De Lara Guarch, Pablo



> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, September 12, 2017 10:31 AM
> To: dev@dpdk.org
> Cc: Trahe, Fiona <fiona.trahe@intel.com>; Griffin, John <john.griffin@intel.com>; Jain, Deepak K
> <deepak.k.jain@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: [PATCH v2 2/3] crypto/qat: enable RX head writes coalescing
> 
> Don't write CSR head until we processed enough RX descriptors.
> Also delay marking them as free until we are writing CSR head.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>

* Re: [dpdk-dev] [PATCH v2 3/3] crypto/qat: enable TX tail writes coalescing
  2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 3/3] crypto/qat: enable TX tail " Anatoly Burakov
@ 2017-09-15 13:17       ` Trahe, Fiona
  0 siblings, 0 replies; 12+ messages in thread
From: Trahe, Fiona @ 2017-09-15 13:17 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Griffin, John, Jain, Deepak K, De Lara Guarch, Pablo



> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, September 12, 2017 10:31 AM
> To: dev@dpdk.org
> Cc: Trahe, Fiona <fiona.trahe@intel.com>; Griffin, John <john.griffin@intel.com>; Jain, Deepak K
> <deepak.k.jain@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: [PATCH v2 3/3] crypto/qat: enable TX tail writes coalescing
> 
> Don't write CSR tail until we processed enough TX descriptors.
> 
> To avoid crypto operations sitting in the TX ring indefinitely,
> the "force write" threshold is used:
>  - on TX, no tail write coalescing will occur if number of inflights
>    is below force write threshold
>  - on RX, check if we have a number of crypto ops enqueued that is
>    below force write threshold that are not yet submitted to
>    processing.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>

* Re: [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver
  2017-09-12  9:31   ` [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver Anatoly Burakov
                       ` (2 preceding siblings ...)
  2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 3/3] crypto/qat: enable TX tail " Anatoly Burakov
@ 2017-09-18 11:03     ` De Lara Guarch, Pablo
  3 siblings, 0 replies; 12+ messages in thread
From: De Lara Guarch, Pablo @ 2017-09-18 11:03 UTC (permalink / raw)
  To: Burakov, Anatoly, dev; +Cc: Trahe, Fiona, Griffin, John, Jain, Deepak K



> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, September 12, 2017 10:31 AM
> To: dev@dpdk.org
> Cc: Trahe, Fiona <fiona.trahe@intel.com>; Griffin, John
> <john.griffin@intel.com>; Jain, Deepak K <deepak.k.jain@intel.com>; De
> Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: [PATCH v2 0/3] performance enhancements for QAT driver
> 
> A few performance enhancements for QAT crypto driver. These include:
> - Removing reliance on atomics on hot path
>   - This adds a new limitation, making queue pairs single-threaded
> - Coalesce RX and TX CSR writes
> 
> v2: added cover letter
>     fixed commit messages
>     fixed documentation
> 
> Anatoly Burakov (3):
>   crypto/qat: remove atomics
>   crypto/qat: enable RX head writes coalescing
>   crypto/qat: enable TX tail writes coalescing
> 
>  doc/guides/cryptodevs/qat.rst          |  1 +
>  doc/guides/rel_notes/release_17_11.rst |  8 ++++
>  drivers/crypto/qat/qat_crypto.c        | 84 +++++++++++++++++++++++++---------
>  drivers/crypto/qat/qat_crypto.h        | 15 +++++-
>  drivers/crypto/qat/qat_qp.c            |  4 +-
>  5 files changed, 88 insertions(+), 24 deletions(-)
> 
> --
> 2.7.4

Applied to dpdk-next-crypto.
Thanks,

Pablo

Thread overview: 12+ messages
     [not found] <cover.1503651900.git.anatoly.burakov@intel.com>
2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 1/3] qat: remove atomics Anatoly Burakov
2017-09-04 14:39   ` De Lara Guarch, Pablo
2017-09-12  9:31   ` [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver Anatoly Burakov
2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 1/3] crypto/qat: remove atomics Anatoly Burakov
2017-09-15 11:35       ` Trahe, Fiona
2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 2/3] crypto/qat: enable RX head writes coalescing Anatoly Burakov
2017-09-15 11:55       ` Trahe, Fiona
2017-09-12  9:31     ` [dpdk-dev] [PATCH v2 3/3] crypto/qat: enable TX tail " Anatoly Burakov
2017-09-15 13:17       ` Trahe, Fiona
2017-09-18 11:03     ` [dpdk-dev] [PATCH v2 0/3] performance enhancements for QAT driver De Lara Guarch, Pablo
2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 2/3] qat: enable RX head writes coalescing Anatoly Burakov
2017-08-25  9:30 ` [dpdk-dev] [DPDK] [PATCH 3/3] qat: enable TX tail " Anatoly Burakov
