From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 663DCA04B0
	for ; Fri, 7 Aug 2020 14:59:53 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 4EC221C0C0;
	Fri, 7 Aug 2020 14:59:53 +0200 (CEST)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
	by dpdk.org (Postfix) with ESMTP id 8407A1C031
	for ; Fri, 7 Aug 2020 14:59:52 +0200 (CEST)
IronPort-SDR: lN34bay4nfxH06pfvYCDofDKCn+5gtxpbz/Xi7i7hJAW17B18So1snmj8fukCudflDjxppfjJU
	ApJko0reZ72Q==
X-IronPort-AV: E=McAfee;i="6000,8403,9705"; a="132628754"
X-IronPort-AV: E=Sophos;i="5.75,445,1589266800"; d="scan'208";a="132628754"
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga002.jf.intel.com ([10.7.209.21])
	by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Aug 2020 05:59:52 -0700
IronPort-SDR: ElUoO/VzP5vTWE2Hu+i0HhHN1SJMmVRgycXsamAy7BVnJ+eEKbKarygpk/1TOwYdUIiAdVOcn/
	b+7ExRyauZTA==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.75,445,1589266800"; d="scan'208";a="307368557"
Received: from akusztax-mobl1.ger.corp.intel.com ([10.104.121.21])
	by orsmga002.jf.intel.com with ESMTP; 07 Aug 2020 05:59:50 -0700
From: Arek Kusztal
To: fiona.trahe@intel.com
Cc: stable@dpdk.org, Arek Kusztal
Date: Fri, 7 Aug 2020 14:57:00 +0200
Message-Id: <20200807125701.1764-3-arkadiuszx.kusztal@intel.com>
X-Mailer: git-send-email 2.28.0.windows.1
In-Reply-To: <20200807125701.1764-1-arkadiuszx.kusztal@intel.com>
References: <20200807125701.1764-1-arkadiuszx.kusztal@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [dpdk-stable] [19.11 3/4] common/qat: support dual threads for enqueue/dequeue
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: stable-bounces@dpdk.org
Sender: "stable"

From: Fiona Trahe

[ upstream commit 026f21c0b95120a4e249af4480a0ddad75838ff9 ]

Remove the limitation whereby enqueue and dequeue must be
done in same thread.
The inflight calculation is reworked to be thread-safe for 2
threads - note this is not general multi-thread support, i.e all
enqueues to a qp must still be done in one thread and
all dequeues must be done in one thread, but enqueues and
dequeues may be in separate threads.
Documentation updated.

Cc: stable@dpdk.org

Signed-off-by: Fiona Trahe
Signed-off-by: Arek Kusztal
Acked-by: Fiona Trahe
---
 doc/guides/compressdevs/qat_comp.rst |  5 ++++-
 doc/guides/cryptodevs/qat.rst        | 10 +++++++--
 drivers/common/qat/qat_qp.c          | 40 +++++++++++++++++++++---------------
 drivers/common/qat/qat_qp.h          |  3 ++-
 4 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/doc/guides/compressdevs/qat_comp.rst b/doc/guides/compressdevs/qat_comp.rst
index 6421f76..757611a 100644
--- a/doc/guides/compressdevs/qat_comp.rst
+++ b/doc/guides/compressdevs/qat_comp.rst
@@ -37,7 +37,10 @@ Limitations
 -----------
 
 * Compressdev level 0, no compression, is not supported.
-* Queue pairs are not thread-safe (that is, within a single queue pair, RX and TX from different lcores is not supported).
+* Queue-pairs are thread-safe on Intel CPUs but Queues are not (that is, within a single
+  queue-pair all enqueues to the TX queue must be done from one thread and all dequeues
+  from the RX queue must be done from one thread, but enqueues and dequeues may be done
+  in different threads.)
 * No BSD support as BSD QAT kernel driver not available.
 * When using Deflate dynamic huffman encoding for compression, the input size (op.src.length)
   must be < CONFIG_RTE_PMD_QAT_COMP_IM_BUFFER_SIZE from the config file,
diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index 8981a1e..5135703 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -109,7 +109,10 @@ Limitations
 * No BSD support as BSD QAT kernel driver not available.
 * ZUC EEA3/EIA3 is not supported by dh895xcc devices
 * Maximum additional authenticated data (AAD) for GCM is 240 bytes long and must be passed to the device in a buffer rounded up to the nearest block-size multiple (x16) and padded with zeros.
-* Queue pairs are not thread-safe (that is, within a single queue pair, RX and TX from different lcores is not supported).
+* Queue-pairs are thread-safe on Intel CPUs but Queues are not (that is, within a single
+  queue-pair all enqueues to the TX queue must be done from one thread and all dequeues
+  from the RX queue must be done from one thread, but enqueues and dequeues may be done
+  in different threads.)
 * A GCM limitation exists, but only in the case where there are multiple
   generations of QAT devices on a single platform.
   To optimise performance, the GCM crypto session should be initialised for the
@@ -163,7 +166,10 @@ Limitations
 ~~~~~~~~~~~
 
 * Big integers longer than 4096 bits are not supported.
-* Queue pairs are not thread-safe (that is, within a single queue pair, RX and TX from different lcores is not supported).
+* Queue-pairs are thread-safe on Intel CPUs but Queues are not (that is, within a single
+  queue-pair all enqueues to the TX queue must be done from one thread and all dequeues
+  from the RX queue must be done from one thread, but enqueues and dequeues may be done
+  in different threads.)
 * RSA-2560, RSA-3584 are not supported
 
 .. _building_qat:
diff --git a/drivers/common/qat/qat_qp.c b/drivers/common/qat/qat_qp.c
index 791c469..0725708 100644
--- a/drivers/common/qat/qat_qp.c
+++ b/drivers/common/qat/qat_qp.c
@@ -233,7 +233,7 @@ int qat_qp_setup(struct qat_pci_device *qat_dev,
 	}
 
 	qp->mmap_bar_addr = pci_dev->mem_resource[0].addr;
-	qp->inflights16 = 0;
+	qp->enqueued = qp->dequeued = 0;
 
 	if (qat_queue_create(qat_dev, &(qp->tx_q), qat_qp_conf,
 					ADF_RING_DIR_TX) != 0) {
@@ -324,7 +324,7 @@ int qat_qp_release(struct qat_qp **qp_addr)
 				qp->qat_dev->qat_dev_id);
 
 	/* Don't free memory if there are still responses to be processed */
-	if (qp->inflights16 == 0) {
+	if ((qp->enqueued - qp->dequeued) == 0) {
 		qat_queue_delete(&(qp->tx_q));
 		qat_queue_delete(&(qp->rx_q));
 	} else {
@@ -583,7 +583,6 @@ qat_enqueue_op_burst(void *qp, void **ops, uint16_t nb_ops)
 	uint16_t nb_ops_possible = nb_ops;
 	register uint8_t *base_addr;
 	register uint32_t tail;
-	int overflow;
 
 	if (unlikely(nb_ops == 0))
 		return 0;
@@ -594,13 +593,25 @@ qat_enqueue_op_burst(void *qp, void **ops, uint16_t nb_ops)
 	tail = queue->tail;
 
 	/* Find how many can actually fit on the ring */
-	tmp_qp->inflights16 += nb_ops;
-	overflow = tmp_qp->inflights16 - tmp_qp->max_inflights;
-	if (overflow > 0) {
-		tmp_qp->inflights16 -= overflow;
-		nb_ops_possible = nb_ops - overflow;
-		if (nb_ops_possible == 0)
-			return 0;
+	{
+		/* dequeued can only be written by one thread, but it may not
+		 * be this thread. As it's 4-byte aligned it will be read
+		 * atomically here by any Intel CPU.
+		 * enqueued can wrap before dequeued, but cannot
+		 * lap it as var size of enq/deq (uint32_t) > var size of
+		 * max_inflights (uint16_t). In reality inflights is never
+		 * even as big as max uint16_t, as it's <= ADF_MAX_DESC.
+		 * On wrapping, the calculation still returns the correct
+		 * positive value as all three vars are unsigned.
+		 */
+		uint32_t inflights =
+			tmp_qp->enqueued - tmp_qp->dequeued;
+
+		if ((inflights + nb_ops) > tmp_qp->max_inflights) {
+			nb_ops_possible = tmp_qp->max_inflights - inflights;
+			if (nb_ops_possible == 0)
+				return 0;
+		}
 	}
 
 	while (nb_ops_sent != nb_ops_possible) {
@@ -624,11 +635,7 @@ qat_enqueue_op_burst(void *qp, void **ops, uint16_t nb_ops)
 
 		if (ret != 0) {
 			tmp_qp->stats.enqueue_err_count++;
-			/*
-			 * This message cannot be enqueued,
-			 * decrease number of ops that wasn't sent
-			 */
-			tmp_qp->inflights16 -= nb_ops_possible - nb_ops_sent;
+			/* This message cannot be enqueued */
 			if (nb_ops_sent == 0)
 				return 0;
 			goto kick_tail;
@@ -640,6 +647,7 @@ qat_enqueue_op_burst(void *qp, void **ops, uint16_t nb_ops)
 	}
 kick_tail:
 	queue->tail = tail;
+	tmp_qp->enqueued += nb_ops_sent;
 	tmp_qp->stats.enqueued_count += nb_ops_sent;
 	txq_write_tail(tmp_qp, queue);
 	return nb_ops_sent;
@@ -683,9 +691,9 @@ qat_dequeue_op_burst(void *qp, void **ops, uint16_t nb_ops)
 	}
 	if (resp_counter > 0) {
 		rx_queue->head = head;
+		tmp_qp->dequeued += resp_counter;
 		tmp_qp->stats.dequeued_count += resp_counter;
 		rx_queue->nb_processed_responses += resp_counter;
-		tmp_qp->inflights16 -= resp_counter;
 
 		if (rx_queue->nb_processed_responses >
 						QAT_CSR_HEAD_WRITE_THRESH)
diff --git a/drivers/common/qat/qat_qp.h b/drivers/common/qat/qat_qp.h
index 02a613a..973e883 100644
--- a/drivers/common/qat/qat_qp.h
+++ b/drivers/common/qat/qat_qp.h
@@ -63,7 +63,6 @@ struct qat_queue {
 
 struct qat_qp {
 	void			*mmap_bar_addr;
-	uint16_t		inflights16;
 	struct qat_queue	tx_q;
 	struct qat_queue	rx_q;
 	struct qat_common_stats stats;
@@ -75,6 +74,8 @@ struct qat_qp {
 	enum qat_service_type service_type;
 	struct qat_pci_device *qat_dev;
 	/**< qat device this qp is on */
+	uint32_t enqueued;
+	uint32_t dequeued __rte_aligned(4);
 	uint16_t max_inflights;
 } __rte_cache_aligned;
 
-- 
2.1.0