From: Nelio Laranjeiro
To: dev@dpdk.org
Cc: Ferruh Yigit, Yaacov Hazan, Adrien Mazarguil
Date: Wed, 22 Jun 2016 11:05:47 +0200
Message-Id: <1466586355-30777-18-git-send-email-nelio.laranjeiro@6wind.com>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1466586355-30777-1-git-send-email-nelio.laranjeiro@6wind.com>
References: <1466493818-1877-1-git-send-email-nelio.laranjeiro@6wind.com>
 <1466586355-30777-1-git-send-email-nelio.laranjeiro@6wind.com>
Subject: [dpdk-dev] [PATCH v4 17/25] mlx5: add support for inline send

From: Yaacov Hazan

Implement the send inline feature, which copies packet data directly into
WQEs to improve latency. The maximum packet size eligible for inlining and
the minimum number of TX queues required before inline send is used are
both user-configurable. This feature is effective when the hardware is the
performance bottleneck.
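The idea, as a minimal sketch (hypothetical structure and names, not the
driver's actual WQE layout, which is added in mlx5_rxtx.c below): packets
short enough to fit under the configured threshold are copied into the work
queue entry itself, so the NIC does not have to fetch the buffer through a
separate DMA read; larger packets keep the usual address/lkey descriptor.

#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified WQE: the real layout has control, Ethernet
 * and data segments (see mlx5_rxtx.c below). */
struct toy_wqe {
	uint32_t inl_len;   /* > 0 when data is inlined */
	uint8_t  inl[128];  /* room for inlined packet bytes */
	uint64_t addr;      /* buffer address when not inlined */
	uint32_t lkey;      /* memory region key when not inlined */
};

/* Per-packet decision made in the inline TX burst path: inline small
 * packets, fall back to the pointer + lkey descriptor otherwise. */
static void
toy_post(struct toy_wqe *wqe, const uint8_t *pkt, uint32_t len,
	 uint32_t lkey, uint32_t max_inline)
{
	if (len <= max_inline && len <= sizeof(wqe->inl)) {
		memcpy(wqe->inl, pkt, len); /* data travels inside the WQE */
		wqe->inl_len = len;
	} else {
		wqe->inl_len = 0;
		wqe->addr = (uintptr_t)pkt; /* NIC reads the buffer via DMA */
		wqe->lkey = lkey;
	}
}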
Signed-off-by: Yaacov Hazan
Signed-off-by: Adrien Mazarguil
Signed-off-by: Nelio Laranjeiro
---
 doc/guides/nics/mlx5.rst       |  17 +++
 drivers/net/mlx5/mlx5.c        |  13 ++
 drivers/net/mlx5/mlx5.h        |   2 +
 drivers/net/mlx5/mlx5_ethdev.c |   5 +
 drivers/net/mlx5/mlx5_rxtx.c   | 271 +++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h   |   2 +
 drivers/net/mlx5/mlx5_txq.c    |   4 +
 7 files changed, 314 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 756153b..9ada221 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -154,6 +154,23 @@ Run-time configuration
   allows to save PCI bandwidth and improve performance at the cost of a
   slightly higher CPU usage. Enabled by default.
 
+- ``txq_inline`` parameter [int]
+
+  Amount of data to be inlined during TX operations. Improves latency.
+  Can improve PPS performance when PCI back pressure is detected and may be
+  useful for scenarios involving heavy traffic on many queues.
+
+  It is not enabled by default (set to 0) since the additional software
+  logic necessary to handle this mode can lower performance when back
+  pressure is not expected.
+
+- ``txqs_min_inline`` parameter [int]
+
+  Enable inline send only when the number of TX queues is greater than or
+  equal to this value.
+
+  This option should be used in combination with ``txq_inline`` above.
+
 Prerequisites
 -------------
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ec4e0b6..49c7ae8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -72,6 +72,13 @@
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
 
+/* Device parameter to configure inline send. */
+#define MLX5_TXQ_INLINE "txq_inline"
+
+/* Device parameter to configure the number of TX queues threshold for
+ * enabling inline send. */
+#define MLX5_TXQS_MIN_INLINE "txqs_min_inline"
+
 /**
  * Retrieve integer value from environment variable.
  *
@@ -269,6 +276,10 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 	}
 	if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0)
 		priv->cqe_comp = !!tmp;
+	else if (strcmp(MLX5_TXQ_INLINE, key) == 0)
+		priv->txq_inline = tmp;
+	else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0)
+		priv->txqs_inline = tmp;
 	else {
 		WARN("%s: unknown parameter", key);
 		return -EINVAL;
@@ -292,6 +303,8 @@ mlx5_args(struct priv *priv, struct rte_devargs *devargs)
 {
 	static const char *params[] = {
 		MLX5_RXQ_CQE_COMP_EN,
+		MLX5_TXQ_INLINE,
+		MLX5_TXQS_MIN_INLINE,
 	};
 	struct rte_kvargs *kvlist;
 	int ret = 0;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8f5a6df..3a86609 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -113,6 +113,8 @@ struct priv {
 	unsigned int mps:1; /* Whether multi-packet send is supported. */
 	unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */
 	unsigned int pending_alarm:1; /* An alarm is pending. */
+	unsigned int txq_inline; /* Maximum packet size for inlining. */
+	unsigned int txqs_inline; /* Queue number threshold for inlining. */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
 	unsigned int txqs_n; /* TX queues array size. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 47e64b2..aeea4ff 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1318,6 +1318,11 @@ void
 priv_select_tx_function(struct priv *priv)
 {
 	priv->dev->tx_pkt_burst = mlx5_tx_burst;
+	if (priv->txq_inline && (priv->txqs_n >= priv->txqs_inline)) {
+		priv->dev->tx_pkt_burst = mlx5_tx_burst_inline;
+		DEBUG("selected inline TX function (%u >= %u queues)",
+		      priv->txqs_n, priv->txqs_inline);
+	}
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index d56c9e9..43fe532 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -374,6 +374,139 @@ mlx5_wqe_write_vlan(struct txq *txq, volatile union mlx5_wqe *wqe,
 }
 
 /**
+ * Write an inline WQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param wqe
+ *   Pointer to the WQE to fill.
+ * @param addr
+ *   Buffer data address.
+ * @param length
+ *   Packet length.
+ * @param lkey
+ *   Memory region lkey.
+ */
+static inline void
+mlx5_wqe_write_inline(struct txq *txq, volatile union mlx5_wqe *wqe,
+		      uintptr_t addr, uint32_t length)
+{
+	uint32_t size;
+	uint16_t wqe_cnt = txq->wqe_n - 1;
+	uint16_t wqe_ci = txq->wqe_ci + 1;
+
+	/* Copy the first 16 bytes into inline header. */
+	rte_memcpy((void *)(uintptr_t)wqe->inl.eseg.inline_hdr_start,
+		   (void *)(uintptr_t)addr,
+		   MLX5_ETH_INLINE_HEADER_SIZE);
+	addr += MLX5_ETH_INLINE_HEADER_SIZE;
+	length -= MLX5_ETH_INLINE_HEADER_SIZE;
+	size = 3 + ((4 + length + 15) / 16);
+	wqe->inl.byte_cnt = htonl(length | MLX5_INLINE_SEG);
+	rte_memcpy((void *)(uintptr_t)&wqe->inl.data[0],
+		   (void *)addr, MLX5_WQE64_INL_DATA);
+	addr += MLX5_WQE64_INL_DATA;
+	length -= MLX5_WQE64_INL_DATA;
+	while (length) {
+		volatile union mlx5_wqe *wqe_next =
+			&(*txq->wqes)[wqe_ci & wqe_cnt];
+		uint32_t copy_bytes = (length > sizeof(*wqe)) ?
+				      sizeof(*wqe) :
+				      length;
+
+		rte_mov64((uint8_t *)(uintptr_t)&wqe_next->data[0],
+			  (uint8_t *)addr);
+		addr += copy_bytes;
+		length -= copy_bytes;
+		++wqe_ci;
+	}
+	assert(size < 64);
+	wqe->inl.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND);
+	wqe->inl.ctrl.data[1] = htonl(txq->qp_num_8s | size);
+	wqe->inl.ctrl.data[3] = 0;
+	wqe->inl.eseg.rsvd0 = 0;
+	wqe->inl.eseg.rsvd1 = 0;
+	wqe->inl.eseg.mss = 0;
+	wqe->inl.eseg.rsvd2 = 0;
+	wqe->inl.eseg.inline_hdr_sz = htons(MLX5_ETH_INLINE_HEADER_SIZE);
+	/* Increment consumer index. */
+	txq->wqe_ci = wqe_ci;
+}
+
+/**
+ * Write an inline WQE with VLAN.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param wqe
+ *   Pointer to the WQE to fill.
+ * @param addr
+ *   Buffer data address.
+ * @param length
+ *   Packet length.
+ * @param lkey
+ *   Memory region lkey.
+ * @param vlan_tci
+ *   VLAN field to insert in packet.
+ */
+static inline void
+mlx5_wqe_write_inline_vlan(struct txq *txq, volatile union mlx5_wqe *wqe,
+			   uintptr_t addr, uint32_t length, uint16_t vlan_tci)
+{
+	uint32_t size;
+	uint32_t wqe_cnt = txq->wqe_n - 1;
+	uint16_t wqe_ci = txq->wqe_ci + 1;
+	uint32_t vlan = htonl(0x81000000 | vlan_tci);
+
+	/*
+	 * Copy 12 bytes of source & destination MAC address.
+	 * Copy 4 bytes of VLAN.
+	 * Copy 2 bytes of Ether type.
+	 */
+	rte_memcpy((uint8_t *)(uintptr_t)wqe->inl.eseg.inline_hdr_start,
+		   (uint8_t *)addr, 12);
+	rte_memcpy((uint8_t *)(uintptr_t)wqe->inl.eseg.inline_hdr_start + 12,
+		   &vlan, sizeof(vlan));
+	rte_memcpy((uint8_t *)(uintptr_t)wqe->inl.eseg.inline_hdr_start + 16,
+		   ((uint8_t *)addr + 12), 2);
+	addr += MLX5_ETH_VLAN_INLINE_HEADER_SIZE - sizeof(vlan);
+	length -= MLX5_ETH_VLAN_INLINE_HEADER_SIZE - sizeof(vlan);
+	size = (sizeof(wqe->inl.ctrl.ctrl) +
+		sizeof(wqe->inl.eseg) +
+		sizeof(wqe->inl.byte_cnt) +
+		length + 15) / 16;
+	wqe->inl.byte_cnt = htonl(length | MLX5_INLINE_SEG);
+	rte_memcpy((void *)(uintptr_t)&wqe->inl.data[0],
+		   (void *)addr, MLX5_WQE64_INL_DATA);
+	addr += MLX5_WQE64_INL_DATA;
+	length -= MLX5_WQE64_INL_DATA;
+	while (length) {
+		volatile union mlx5_wqe *wqe_next =
+			&(*txq->wqes)[wqe_ci & wqe_cnt];
+		uint32_t copy_bytes = (length > sizeof(*wqe)) ?
+				      sizeof(*wqe) :
+				      length;
+
+		rte_mov64((uint8_t *)(uintptr_t)&wqe_next->data[0],
+			  (uint8_t *)addr);
+		addr += copy_bytes;
+		length -= copy_bytes;
+		++wqe_ci;
+	}
+	assert(size < 64);
+	wqe->inl.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND);
+	wqe->inl.ctrl.data[1] = htonl(txq->qp_num_8s | size);
+	wqe->inl.ctrl.data[3] = 0;
+	wqe->inl.eseg.rsvd0 = 0;
+	wqe->inl.eseg.rsvd1 = 0;
+	wqe->inl.eseg.mss = 0;
+	wqe->inl.eseg.rsvd2 = 0;
+	wqe->inl.eseg.inline_hdr_sz = htons(MLX5_ETH_VLAN_INLINE_HEADER_SIZE);
+	/* Increment consumer index. */
+	txq->wqe_ci = wqe_ci;
+}
+
+/**
  * Ring TX queue doorbell.
  *
  * @param txq
@@ -415,6 +548,23 @@ tx_prefetch_cqe(struct txq *txq, uint16_t ci)
 }
 
 /**
+ * Prefetch a WQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param wqe_ci
+ *   WQE consumer index.
+ */
+static inline void
+tx_prefetch_wqe(struct txq *txq, uint16_t ci)
+{
+	volatile union mlx5_wqe *wqe;
+
+	wqe = &(*txq->wqes)[ci & (txq->wqe_n - 1)];
+	rte_prefetch0(wqe);
+}
+
+/**
  * DPDK callback for TX.
  *
  * @param dpdk_txq
@@ -525,6 +675,127 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 }
 
 /**
+ * DPDK callback for TX with inline support.
+ *
+ * @param dpdk_txq
+ *   Generic pointer to TX queue structure.
+ * @param[in] pkts
+ *   Packets to transmit.
+ * @param pkts_n
+ *   Number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully transmitted (<= pkts_n).
+ */
+uint16_t
+mlx5_tx_burst_inline(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+	struct txq *txq = (struct txq *)dpdk_txq;
+	uint16_t elts_head = txq->elts_head;
+	const unsigned int elts_n = txq->elts_n;
+	unsigned int i;
+	unsigned int max;
+	unsigned int comp;
+	volatile union mlx5_wqe *wqe;
+	struct rte_mbuf *buf;
+	unsigned int max_inline = txq->max_inline;
+
+	if (unlikely(!pkts_n))
+		return 0;
+	buf = pkts[0];
+	/* Prefetch first packet cacheline. */
+	tx_prefetch_cqe(txq, txq->cq_ci);
+	tx_prefetch_cqe(txq, txq->cq_ci + 1);
+	rte_prefetch0(buf);
+	/* Start processing. */
+	txq_complete(txq);
+	max = (elts_n - (elts_head - txq->elts_tail));
+	if (max > elts_n)
+		max -= elts_n;
+	assert(max >= 1);
+	assert(max <= elts_n);
+	/* Always leave one free entry in the ring. */
+	--max;
+	if (max == 0)
+		return 0;
+	if (max > pkts_n)
+		max = pkts_n;
+	for (i = 0; (i != max); ++i) {
+		unsigned int elts_head_next = (elts_head + 1) & (elts_n - 1);
+		uintptr_t addr;
+		uint32_t length;
+		uint32_t lkey;
+
+		wqe = &(*txq->wqes)[txq->wqe_ci & (txq->wqe_n - 1)];
+		tx_prefetch_wqe(txq, txq->wqe_ci);
+		tx_prefetch_wqe(txq, txq->wqe_ci + 1);
+		if (i + 1 < max)
+			rte_prefetch0(pkts[i + 1]);
+		/* Should we enable HW CKSUM offload */
+		if (buf->ol_flags &
+		    (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) {
+			wqe->inl.eseg.cs_flags =
+				MLX5_ETH_WQE_L3_CSUM |
+				MLX5_ETH_WQE_L4_CSUM;
+		} else
+			wqe->inl.eseg.cs_flags = 0;
+		/* Retrieve buffer information. */
+		addr = rte_pktmbuf_mtod(buf, uintptr_t);
+		length = DATA_LEN(buf);
+		/* Update element. */
+		(*txq->elts)[elts_head] = buf;
+		/* Prefetch next buffer data. */
+		if (i + 1 < max)
+			rte_prefetch0(rte_pktmbuf_mtod(pkts[i + 1],
+						       volatile void *));
+		if (length <= max_inline) {
+			if (buf->ol_flags & PKT_TX_VLAN_PKT)
+				mlx5_wqe_write_inline_vlan(txq, wqe,
+							   addr, length,
+							   buf->vlan_tci);
+			else
+				mlx5_wqe_write_inline(txq, wqe, addr, length);
+		} else {
+			/* Retrieve Memory Region key for this memory pool. */
+			lkey = txq_mp2mr(txq, txq_mb2mp(buf));
+			if (buf->ol_flags & PKT_TX_VLAN_PKT)
+				mlx5_wqe_write_vlan(txq, wqe, addr, length,
+						    lkey, buf->vlan_tci);
+			else
+				mlx5_wqe_write(txq, wqe, addr, length, lkey);
+		}
+		wqe->inl.ctrl.data[2] = 0;
+		elts_head = elts_head_next;
+		buf = pkts[i + 1];
+#ifdef MLX5_PMD_SOFT_COUNTERS
+		/* Increment sent bytes counter. */
+		txq->stats.obytes += length;
+#endif
+	}
+	/* Take a shortcut if nothing must be sent. */
+	if (unlikely(i == 0))
+		return 0;
+	/* Check whether completion threshold has been reached. */
+	comp = txq->elts_comp + i;
+	if (comp >= MLX5_TX_COMP_THRESH) {
+		/* Request completion on last WQE. */
+		wqe->inl.ctrl.data[2] = htonl(8);
+		/* Save elts_head in unused "immediate" field of WQE. */
+		wqe->inl.ctrl.data[3] = elts_head;
+		txq->elts_comp = 0;
+	} else
+		txq->elts_comp = comp;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+	/* Increment sent packets counter. */
+	txq->stats.opackets += i;
+#endif
+	/* Ring QP doorbell. */
+	mlx5_tx_dbrec(txq);
+	txq->elts_head = elts_head;
+	return i;
+}
+
+/**
  * Translate RX completion flags to packet type.
  *
  * @param[in] cqe
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index f900e65..3c83148 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -246,6 +246,7 @@ struct txq {
 	uint16_t wqe_n; /* Number of WQ elements. */
 	uint16_t bf_offset; /* Blueflame offset. */
 	uint16_t bf_buf_size; /* Blueflame size. */
+	uint16_t max_inline; /* Maximum size to inline in a WQE. */
 	uint32_t qp_num_8s; /* QP number shifted by 8. */
 	volatile struct mlx5_cqe (*cqes)[]; /* Completion queue. */
 	volatile union mlx5_wqe (*wqes)[]; /* Work queue. */
@@ -310,6 +311,7 @@ uint16_t mlx5_tx_burst_secondary_setup(void *, struct rte_mbuf **, uint16_t);
 /* mlx5_rxtx.c */
 
 uint16_t mlx5_tx_burst(void *, struct rte_mbuf **, uint16_t);
+uint16_t mlx5_tx_burst_inline(void *, struct rte_mbuf **, uint16_t);
 uint16_t mlx5_rx_burst(void *, struct rte_mbuf **, uint16_t);
 uint16_t removed_tx_burst(void *, struct rte_mbuf **, uint16_t);
 uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 7b2dc7c..6a4a96e 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -332,6 +332,10 @@ txq_ctrl_setup(struct rte_eth_dev *dev, struct txq_ctrl *txq_ctrl,
 		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
 			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
 	};
+	if (priv->txq_inline && priv->txqs_n >= priv->txqs_inline) {
+		tmpl.txq.max_inline = priv->txq_inline;
+		attr.init.cap.max_inline_data = tmpl.txq.max_inline;
+	}
 	tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init);
 	if (tmpl.qp == NULL) {
 		ret = (errno ? errno : EINVAL);
-- 
2.1.4
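For readers of mlx5_wqe_write_inline() above, the size value written into
ctrl.data[1] counts 16-byte units. A sketch of the arithmetic, under the
assumption (consistent with the sizeof-based formula in the VLAN variant)
that the control and Ethernet segments together occupy 48 bytes, followed
by the 4-byte byte_cnt word and the data left after the inline header:

#include <stdint.h>

/* Sketch of the WQE size computation in mlx5_wqe_write_inline():
 * 3 units (48 B) for the control + Ethernet segments, plus the 4-byte
 * byte_cnt word and the data remaining after the inline header,
 * rounded up to the next 16-byte unit.  Illustration only. */
static inline uint32_t
inline_wqe_size_units(uint32_t len_after_inline_hdr)
{
	return 3 + ((4 + len_after_inline_hdr + 15) / 16);
}

/* Example: 60 bytes left after the inline header gives
 * 3 + (79 / 16) = 3 + 4 = 7 units, i.e. a 112-byte WQE, well under
 * the assert(size < 64) limit of 64 units (1024 bytes). */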
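As a usage note, txq_inline and txqs_min_inline are device kvargs parsed by
mlx5_args(), so an application passes them together with the device. A
sketch assuming the usual EAL PCI whitelist syntax of that release and a
made-up PCI address and thresholds:

#include <rte_eal.h>

int
main(void)
{
	/* Placeholder PCI address and values, for illustration only:
	 * inline packets up to 128 bytes once at least 4 TX queues are
	 * configured (mlx5_tx_burst_inline is then selected). */
	char *argv[] = {
		"app", "-w",
		"0000:83:00.0,txq_inline=128,txqs_min_inline=4",
	};

	if (rte_eal_init(3, argv) < 0)
		return -1;
	/* ... regular port and queue setup follows ... */
	return 0;
}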