DPDK patches and discussions
 help / color / Atom feed
From: Arek Kusztal <arkadiuszx.kusztal@intel.com>
To: dev@dpdk.org
Cc: akhil.goyal@nxp.com, fiona.trahe@intel.com,
	declan.doherty@intel.com,
	Arek Kusztal <arkadiuszx.kusztal@intel.com>
Subject: [dpdk-dev] [PATCH v3 3/4] common/qat: add dual thread support
Date: Wed, 11 Dec 2019 15:50:03 +0100
Message-ID: <20191211145004.11672-4-arkadiuszx.kusztal@intel.com> (raw)
In-Reply-To: <20191211145004.11672-1-arkadiuszx.kusztal@intel.com>

From: Fiona Trahe <fiona.trahe@intel.com>

Remove the limitation whereby enqueue and dequeue must be
done in same thread.
The inflight calculation is reworked to be thread-safe for 2
threads - note this is not general multi-thread support, i.e
all enqueues to a qp must still be done in one thread and
all dequeues must be done in one thread, but enqueues and
dequeues may be in separate threads.
Documentation updated.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
---
 doc/guides/compressdevs/qat_comp.rst |  5 ++++-
 doc/guides/cryptodevs/qat.rst        | 10 +++++++--
 drivers/common/qat/qat_qp.c          | 40 +++++++++++++++++++++---------------
 drivers/common/qat/qat_qp.h          |  3 ++-
 4 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/doc/guides/compressdevs/qat_comp.rst b/doc/guides/compressdevs/qat_comp.rst
index 6421f76..757611a 100644
--- a/doc/guides/compressdevs/qat_comp.rst
+++ b/doc/guides/compressdevs/qat_comp.rst
@@ -37,7 +37,10 @@ Limitations
 -----------
 
 * Compressdev level 0, no compression, is not supported.
-* Queue pairs are not thread-safe (that is, within a single queue pair, RX and TX from different lcores is not supported).
+* Queue-pairs are thread-safe on Intel CPUs but Queues are not (that is, within a single
+  queue-pair all enqueues to the TX queue must be done from one thread and all dequeues
+  from the RX queue must be done from one thread, but enqueues and dequeues may be done
+  in different threads.)
 * No BSD support as BSD QAT kernel driver not available.
 * When using Deflate dynamic huffman encoding for compression, the input size (op.src.length)
   must be < CONFIG_RTE_PMD_QAT_COMP_IM_BUFFER_SIZE from the config file,
diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index 6197875..3a4a189 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -81,7 +81,10 @@ Limitations
 * No BSD support as BSD QAT kernel driver not available.
 * ZUC EEA3/EIA3 is not supported by dh895xcc devices
 * Maximum additional authenticated data (AAD) for GCM is 240 bytes long and must be passed to the device in a buffer rounded up to the nearest block-size multiple (x16) and padded with zeros.
-* Queue pairs are not thread-safe (that is, within a single queue pair, RX and TX from different lcores is not supported).
+* Queue-pairs are thread-safe on Intel CPUs but Queues are not (that is, within a single
+  queue-pair all enqueues to the TX queue must be done from one thread and all dequeues
+  from the RX queue must be done from one thread, but enqueues and dequeues may be done
+  in different threads.)
 * A GCM limitation exists, but only in the case where there are multiple
   generations of QAT devices on a single platform.
   To optimise performance, the GCM crypto session should be initialised for the
@@ -133,7 +136,10 @@ Limitations
 ~~~~~~~~~~~
 
 * Big integers longer than 4096 bits are not supported.
-* Queue pairs are not thread-safe (that is, within a single queue pair, RX and TX from different lcores is not supported).
+* Queue-pairs are thread-safe on Intel CPUs but Queues are not (that is, within a single
+  queue-pair all enqueues to the TX queue must be done from one thread and all dequeues
+  from the RX queue must be done from one thread, but enqueues and dequeues may be done
+  in different threads.)
 * RSA-2560, RSA-3584 are not supported
 
 .. _building_qat:
diff --git a/drivers/common/qat/qat_qp.c b/drivers/common/qat/qat_qp.c
index 8e4c74a..30cdc61 100644
--- a/drivers/common/qat/qat_qp.c
+++ b/drivers/common/qat/qat_qp.c
@@ -230,7 +230,7 @@ int qat_qp_setup(struct qat_pci_device *qat_dev,
 	}
 
 	qp->mmap_bar_addr = pci_dev->mem_resource[0].addr;
-	qp->inflights16 = 0;
+	qp->enqueued = qp->dequeued = 0;
 
 	if (qat_queue_create(qat_dev, &(qp->tx_q), qat_qp_conf,
 					ADF_RING_DIR_TX) != 0) {
@@ -321,7 +321,7 @@ int qat_qp_release(struct qat_qp **qp_addr)
 				qp->qat_dev->qat_dev_id);
 
 	/* Don't free memory if there are still responses to be processed */
-	if (qp->inflights16 == 0) {
+	if ((qp->enqueued - qp->dequeued) == 0) {
 		qat_queue_delete(&(qp->tx_q));
 		qat_queue_delete(&(qp->rx_q));
 	} else {
@@ -579,7 +579,6 @@ qat_enqueue_op_burst(void *qp, void **ops, uint16_t nb_ops)
 	uint16_t nb_ops_possible = nb_ops;
 	register uint8_t *base_addr;
 	register uint32_t tail;
-	int overflow;
 
 	if (unlikely(nb_ops == 0))
 		return 0;
@@ -590,13 +589,25 @@ qat_enqueue_op_burst(void *qp, void **ops, uint16_t nb_ops)
 	tail = queue->tail;
 
 	/* Find how many can actually fit on the ring */
-	tmp_qp->inflights16 += nb_ops;
-	overflow = tmp_qp->inflights16 - tmp_qp->max_inflights;
-	if (overflow > 0) {
-		tmp_qp->inflights16 -= overflow;
-		nb_ops_possible = nb_ops - overflow;
-		if (nb_ops_possible == 0)
-			return 0;
+	{
+		/* dequeued can only be written by one thread, but it may not
+		 * be this thread. As it's 4-byte aligned it will be read
+		 * atomically here by any Intel CPU.
+		 * enqueued can wrap before dequeued, but cannot
+		 * lap it as var size of enq/deq (uint32_t) > var size of
+		 * max_inflights (uint16_t). In reality inflights is never
+		 * even as big as max uint16_t, as it's <= ADF_MAX_DESC.
+		 * On wrapping, the calculation still returns the correct
+		 * positive value as all three vars are unsigned.
+		 */
+		uint32_t inflights =
+			tmp_qp->enqueued - tmp_qp->dequeued;
+
+		if ((inflights + nb_ops) > tmp_qp->max_inflights) {
+			nb_ops_possible = tmp_qp->max_inflights - inflights;
+			if (nb_ops_possible == 0)
+				return 0;
+		}
 	}
 
 	while (nb_ops_sent != nb_ops_possible) {
@@ -605,11 +616,7 @@ qat_enqueue_op_burst(void *qp, void **ops, uint16_t nb_ops)
 				tmp_qp->qat_dev_gen);
 		if (ret != 0) {
 			tmp_qp->stats.enqueue_err_count++;
-			/*
-			 * This message cannot be enqueued,
-			 * decrease number of ops that wasn't sent
-			 */
-			tmp_qp->inflights16 -= nb_ops_possible - nb_ops_sent;
+			/* This message cannot be enqueued */
 			if (nb_ops_sent == 0)
 				return 0;
 			goto kick_tail;
@@ -621,6 +628,7 @@ qat_enqueue_op_burst(void *qp, void **ops, uint16_t nb_ops)
 	}
 kick_tail:
 	queue->tail = tail;
+	tmp_qp->enqueued += nb_ops_sent;
 	tmp_qp->stats.enqueued_count += nb_ops_sent;
 	txq_write_tail(tmp_qp, queue);
 	return nb_ops_sent;
@@ -664,9 +672,9 @@ qat_dequeue_op_burst(void *qp, void **ops, uint16_t nb_ops)
 	}
 	if (resp_counter > 0) {
 		rx_queue->head = head;
+		tmp_qp->dequeued += resp_counter;
 		tmp_qp->stats.dequeued_count += resp_counter;
 		rx_queue->nb_processed_responses += resp_counter;
-		tmp_qp->inflights16 -= resp_counter;
 
 		if (rx_queue->nb_processed_responses >
 						QAT_CSR_HEAD_WRITE_THRESH)
diff --git a/drivers/common/qat/qat_qp.h b/drivers/common/qat/qat_qp.h
index 5066f06..8b9ab79 100644
--- a/drivers/common/qat/qat_qp.h
+++ b/drivers/common/qat/qat_qp.h
@@ -63,7 +63,6 @@ struct qat_queue {
 
 struct qat_qp {
 	void			*mmap_bar_addr;
-	uint16_t		inflights16;
 	struct qat_queue	tx_q;
 	struct qat_queue	rx_q;
 	struct qat_common_stats stats;
@@ -75,6 +74,8 @@ struct qat_qp {
 	enum qat_service_type service_type;
 	struct qat_pci_device *qat_dev;
 	/**< qat device this qp is on */
+	uint32_t enqueued;
+	uint32_t dequeued __rte_aligned(4);
 	uint16_t max_inflights;
 } __rte_cache_aligned;
 
-- 
2.1.0


  parent reply index

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-11 14:50 [dpdk-dev] [PATCH v3 0/4] Add dual threading in QAT PMD Arek Kusztal
2019-12-11 14:50 ` [dpdk-dev] [PATCH v3 1/4] common/qat: remove tail write coalescing feature Arek Kusztal
2020-01-14 17:37   ` Trahe, Fiona
2019-12-11 14:50 ` [dpdk-dev] [PATCH v3 2/4] common/qat: move max inflights param into qp Arek Kusztal
2020-01-14 17:38   ` Trahe, Fiona
2019-12-11 14:50 ` Arek Kusztal [this message]
2020-01-14 17:38   ` [dpdk-dev] [PATCH v3 3/4] common/qat: add dual thread support Trahe, Fiona
2019-12-11 14:50 ` [dpdk-dev] [PATCH v3 4/4] crypto/qat: add minimum enq threshold to qat pmd Arek Kusztal
2020-01-14 12:59   ` Trahe, Fiona
2019-12-11 14:56 ` [dpdk-dev] [PATCH v3 0/4] Add dual threading in QAT PMD Trahe, Fiona

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191211145004.11672-4-arkadiuszx.kusztal@intel.com \
    --to=arkadiuszx.kusztal@intel.com \
    --cc=akhil.goyal@nxp.com \
    --cc=declan.doherty@intel.com \
    --cc=dev@dpdk.org \
    --cc=fiona.trahe@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

Archives are clonable:
	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox