From: Adrien Mazarguil
To: dev@dpdk.org
Date: Tue, 30 Jun 2015 11:28:03 +0200
Message-Id: <1435656489-27986-18-git-send-email-adrien.mazarguil@6wind.com>
X-Mailer: git-send-email 2.1.0
In-Reply-To: <1435656489-27986-1-git-send-email-adrien.mazarguil@6wind.com>
References: <1433546120-2254-1-git-send-email-adrien.mazarguil@6wind.com>
 <1435656489-27986-1-git-send-email-adrien.mazarguil@6wind.com>
Cc: Alex Rosenbaum
Subject: [dpdk-dev] [PATCH v2 17/23] mlx4: shrink TX queue elements for better performance

From: Alex Rosenbaum

TX queue elements (struct txq_elt) contain WR and SGE structures required by
ibv_post_send(). This commit replaces them with a single pointer to the
related TX mbuf, considering that:

- There is no need to keep these structures around forever since the
  hardware doesn't access them after ibv_post_send() and send_pending*()
  have returned.

- The TX queue index stored in the WR ID field is no longer used for
  completions since they rely on a separate counter (elts_comp_cd).

- The WR structure itself was only useful for ibv_post_send(); it is
  currently only used to store the mbuf data address and an offset to the
  mbuf structure in the WR ID field, while send_pending*() callbacks only
  require SGEs or buffer pointers.

Therefore, for single-segment mbufs, send_pending() or send_pending_inline()
can be used directly without involving SGEs. For scattered mbufs, SGEs are
allocated on the stack and passed to send_pending_sg_list().

Signed-off-by: Alex Rosenbaum
Signed-off-by: Adrien Mazarguil
---
 drivers/net/mlx4/mlx4.c | 244 +++++++++++++++++-------------------------------
 1 file changed, 84 insertions(+), 160 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index acf1290..f251eb4 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -203,9 +203,7 @@ struct rxq {
 
 /* TX element. */
 struct txq_elt {
-	struct ibv_send_wr wr; /* Work Request. */
-	struct ibv_sge sges[MLX4_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
-	/* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
+	struct rte_mbuf *buf;
 };
 
 /* Linear buffer type. It is used when transmitting buffers with too many
@@ -790,14 +788,8 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 	}
 	for (i = 0; (i != elts_n); ++i) {
 		struct txq_elt *elt = &(*elts)[i];
-		struct ibv_send_wr *wr = &elt->wr;
 
-		/* Configure WR. */
-		WR_ID(wr->wr_id).id = i;
-		WR_ID(wr->wr_id).offset = 0;
-		wr->sg_list = &elt->sges[0];
-		wr->opcode = IBV_WR_SEND;
-		/* Other fields are updated during TX. */
+		elt->buf = NULL;
 	}
 	DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
 	txq->elts_n = elts_n;
@@ -856,10 +848,9 @@ txq_free_elts(struct txq *txq)
 	for (i = 0; (i != elemof(*elts)); ++i) {
 		struct txq_elt *elt = &(*elts)[i];
 
-		if (WR_ID(elt->wr.wr_id).offset == 0)
+		if (elt->buf == NULL)
 			continue;
-		rte_pktmbuf_free((void *)((uintptr_t)elt->sges[0].addr -
-			WR_ID(elt->wr.wr_id).offset));
+		rte_pktmbuf_free(elt->buf);
 	}
 	rte_free(elts);
 }
@@ -1072,35 +1063,37 @@ linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
  *   Buffer to process.
  * @param elts_head
  *   Index of the linear buffer to use if necessary (normally txq->elts_head).
+ * @param[out] sges
+ *   Array filled with SGEs on success.
  *
  * @return
- *   Processed packet size in bytes or (unsigned int)-1 in case of failure.
+ *   A structure containing the processed packet size in bytes and the
+ *   number of SGEs. Both fields are set to (unsigned int)-1 in case of
+ *   failure.
  */
-static unsigned int
+static struct tx_burst_sg_ret {
+	unsigned int length;
+	unsigned int num;
+}
 tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
-	    struct rte_mbuf *buf, unsigned int elts_head)
+	    struct rte_mbuf *buf, unsigned int elts_head,
+	    struct ibv_sge (*sges)[MLX4_PMD_SGE_WR_N])
 {
-	struct ibv_send_wr *wr = &elt->wr;
 	unsigned int sent_size = 0;
 	unsigned int j;
 	int linearize = 0;
 
 	/* When there are too many segments, extra segments are
 	 * linearized in the last SGE. */
-	if (unlikely(segs > elemof(elt->sges))) {
-		segs = (elemof(elt->sges) - 1);
+	if (unlikely(segs > elemof(*sges))) {
+		segs = (elemof(*sges) - 1);
 		linearize = 1;
 	}
-	/* Set WR fields. */
-	assert((rte_pktmbuf_mtod(buf, uintptr_t) -
-		(uintptr_t)buf) <= 0xffff);
-	WR_ID(wr->wr_id).offset =
-		(rte_pktmbuf_mtod(buf, uintptr_t) -
-		 (uintptr_t)buf);
-	wr->num_sge = segs;
+	/* Update element. */
+	elt->buf = buf;
 	/* Register segments as SGEs. */
 	for (j = 0; (j != segs); ++j) {
-		struct ibv_sge *sge = &elt->sges[j];
+		struct ibv_sge *sge = &(*sges)[j];
 		uint32_t lkey;
 
 		/* Retrieve Memory Region key for this memory pool. */
@@ -1110,24 +1103,9 @@ tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
 			DEBUG("%p: unable to get MP <-> MR association",
 			      (void *)txq);
 			/* Clean up TX element. */
-			WR_ID(elt->wr.wr_id).offset = 0;
-#ifndef NDEBUG
-			/* For assert(). */
-			while (j) {
-				--j;
-				--sge;
-				sge->addr = 0;
-				sge->length = 0;
-				sge->lkey = 0;
-			}
-			wr->num_sge = 0;
-#endif
+			elt->buf = NULL;
 			goto stop;
 		}
-		/* Sanity checks, only relevant with debugging enabled. */
-		assert(sge->addr == 0);
-		assert(sge->length == 0);
-		assert(sge->lkey == 0);
 		/* Update SGE. */
 		sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
 		if (txq->priv->vf)
@@ -1144,57 +1122,44 @@ tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
 	assert((buf == NULL) || (linearize));
 	/* Linearize extra segments. */
 	if (linearize) {
-		struct ibv_sge *sge = &elt->sges[segs];
+		struct ibv_sge *sge = &(*sges)[segs];
 		linear_t *linear = &(*txq->elts_linear)[elts_head];
 		unsigned int size = linearize_mbuf(linear, buf);
 
-		assert(segs == (elemof(elt->sges) - 1));
+		assert(segs == (elemof(*sges) - 1));
 		if (size == 0) {
 			/* Invalid packet. */
 			DEBUG("%p: packet too large to be linearized.",
 			      (void *)txq);
 			/* Clean up TX element. */
-			WR_ID(elt->wr.wr_id).offset = 0;
-#ifndef NDEBUG
-			/* For assert(). */
-			while (j) {
-				--j;
-				--sge;
-				sge->addr = 0;
-				sge->length = 0;
-				sge->lkey = 0;
-			}
-			wr->num_sge = 0;
-#endif
+			elt->buf = NULL;
 			goto stop;
 		}
-		/* If MLX4_PMD_SGE_WR_N is 1, free mbuf immediately
-		 * and clear offset from WR ID. */
-		if (elemof(elt->sges) == 1) {
+		/* If MLX4_PMD_SGE_WR_N is 1, free mbuf immediately. */
+		if (elemof(*sges) == 1) {
 			do {
 				struct rte_mbuf *next = NEXT(buf);
 
 				rte_pktmbuf_free_seg(buf);
 				buf = next;
 			} while (buf != NULL);
-			WR_ID(wr->wr_id).offset = 0;
+			elt->buf = NULL;
 		}
-		/* Set WR fields and fill SGE with linear buffer. */
-		++wr->num_sge;
-		/* Sanity checks, only relevant with debugging
-		 * enabled. */
-		assert(sge->addr == 0);
-		assert(sge->length == 0);
-		assert(sge->lkey == 0);
 		/* Update SGE. */
 		sge->addr = (uintptr_t)&(*linear)[0];
 		sge->length = size;
 		sge->lkey = txq->mr_linear->lkey;
 		sent_size += size;
 	}
-	return sent_size;
+	return (struct tx_burst_sg_ret){
+		.length = sent_size,
+		.num = segs,
+	};
 stop:
-	return -1;
+	return (struct tx_burst_sg_ret){
+		.length = -1,
+		.num = -1,
+	};
 }
 
 #endif /* MLX4_PMD_SGE_WR_N > 1 */
@@ -1216,8 +1181,6 @@ static uint16_t
 mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 {
 	struct txq *txq = (struct txq *)dpdk_txq;
-	struct ibv_send_wr head;
-	struct ibv_send_wr **wr_next = &head.next;
 	unsigned int elts_head = txq->elts_head;
 	const unsigned int elts_tail = txq->elts_tail;
 	const unsigned int elts_n = txq->elts_n;
@@ -1243,21 +1206,15 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	for (i = 0; (i != max); ++i) {
 		struct rte_mbuf *buf = pkts[i];
 		struct txq_elt *elt = &(*txq->elts)[elts_head];
-		struct ibv_send_wr *wr = &elt->wr;
 		unsigned int segs = NB_SEGS(buf);
 #ifdef MLX4_PMD_SOFT_COUNTERS
 		unsigned int sent_size = 0;
 #endif
-#ifndef NDEBUG
-		unsigned int j;
-#endif
 		uint32_t send_flags = 0;
 
 		/* Clean up old buffer. */
-		if (likely(WR_ID(wr->wr_id).offset != 0)) {
-			struct rte_mbuf *tmp = (void *)
-				((uintptr_t)elt->sges[0].addr -
-				 WR_ID(wr->wr_id).offset);
+		if (likely(elt->buf != NULL)) {
+			struct rte_mbuf *tmp = elt->buf;
 
 			/* Faster than rte_pktmbuf_free(). */
 			do {
@@ -1267,38 +1224,20 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				tmp = next;
 			} while (tmp != NULL);
 		}
-#ifndef NDEBUG
-		/* For assert(). */
-		WR_ID(wr->wr_id).offset = 0;
-		for (j = 0; ((int)j < wr->num_sge); ++j) {
-			elt->sges[j].addr = 0;
-			elt->sges[j].length = 0;
-			elt->sges[j].lkey = 0;
+		/* Request TX completion. */
+		if (unlikely(--elts_comp_cd == 0)) {
+			elts_comp_cd = txq->elts_comp_cd_init;
+			++elts_comp;
+			send_flags |= IBV_EXP_QP_BURST_SIGNALED;
 		}
-		wr->next = NULL;
-		wr->num_sge = 0;
-#endif
-		/* Sanity checks, most of which are only relevant with
-		 * debugging enabled. */
-		assert(WR_ID(wr->wr_id).id == elts_head);
-		assert(WR_ID(wr->wr_id).offset == 0);
-		assert(wr->next == NULL);
-		assert(wr->sg_list == &elt->sges[0]);
-		assert(wr->num_sge == 0);
-		assert(wr->opcode == IBV_WR_SEND);
 		if (likely(segs == 1)) {
-			struct ibv_sge *sge = &elt->sges[0];
+			uintptr_t addr;
+			uint32_t length;
 			uint32_t lkey;
 
-			/* Set WR fields. */
-			assert((rte_pktmbuf_mtod(buf, uintptr_t) -
-				(uintptr_t)buf) <= 0xffff);
-			WR_ID(wr->wr_id).offset =
-				(rte_pktmbuf_mtod(buf, uintptr_t) -
-				 (uintptr_t)buf);
-			wr->num_sge = segs;
-			/* Register segment as SGE. */
-			sge = &elt->sges[0];
+			/* Retrieve buffer information. */
+			addr = rte_pktmbuf_mtod(buf, uintptr_t);
+			length = DATA_LEN(buf);
 			/* Retrieve Memory Region key for this memory pool. */
 			lkey = txq_mp2mr(txq, buf->pool);
 			if (unlikely(lkey == (uint32_t)-1)) {
@@ -1306,40 +1245,54 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				DEBUG("%p: unable to get MP <-> MR"
 				      " association", (void *)txq);
 				/* Clean up TX element. */
-				WR_ID(elt->wr.wr_id).offset = 0;
-#ifndef NDEBUG
-				/* For assert(). */
-				sge->addr = 0;
-				sge->length = 0;
-				sge->lkey = 0;
-				wr->num_sge = 0;
-#endif
+				elt->buf = NULL;
 				goto stop;
 			}
-			/* Sanity checks, only relevant with debugging
-			 * enabled. */
-			assert(sge->addr == 0);
-			assert(sge->length == 0);
-			assert(sge->lkey == 0);
-			/* Update SGE. */
-			sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
+			/* Update element. */
+			elt->buf = buf;
 			if (txq->priv->vf)
 				rte_prefetch0((volatile void *)
-					      (uintptr_t)sge->addr);
-			sge->length = DATA_LEN(buf);
-			sge->lkey = lkey;
+					      (uintptr_t)addr);
+			/* Put packet into send queue. */
+#if MLX4_PMD_MAX_INLINE > 0
+			if (length <= txq->max_inline)
+				err = txq->if_qp->send_pending_inline
+					(txq->qp,
+					 (void *)addr,
+					 length,
+					 send_flags);
+			else
+#endif
+				err = txq->if_qp->send_pending
+					(txq->qp,
+					 addr,
+					 length,
+					 lkey,
+					 send_flags);
+			if (unlikely(err))
+				goto stop;
 #ifdef MLX4_PMD_SOFT_COUNTERS
-			sent_size += sge->length;
+			sent_size += length;
 #endif
 		} else {
 #if MLX4_PMD_SGE_WR_N > 1
-			unsigned int ret;
+			struct ibv_sge sges[MLX4_PMD_SGE_WR_N];
+			struct tx_burst_sg_ret ret;
 
-			ret = tx_burst_sg(txq, segs, elt, buf, elts_head);
-			if (ret == (unsigned int)-1)
+			ret = tx_burst_sg(txq, segs, elt, buf, elts_head,
+					  &sges);
+			if (ret.length == (unsigned int)-1)
+				goto stop;
+			/* Put SG list into send queue. */
+			err = txq->if_qp->send_pending_sg_list
+				(txq->qp,
+				 sges,
+				 ret.num,
+				 send_flags);
+			if (unlikely(err))
 				goto stop;
 #ifdef MLX4_PMD_SOFT_COUNTERS
-			sent_size += ret;
+			sent_size += ret.length;
#endif
 #else /* MLX4_PMD_SGE_WR_N > 1 */
 			DEBUG("%p: TX scattered buffers support not"
@@ -1347,40 +1300,12 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			goto stop;
 #endif /* MLX4_PMD_SGE_WR_N > 1 */
 		}
-		/* Link WRs together for ibv_post_send(). */
-		*wr_next = wr;
-		wr_next = &wr->next;
-		assert(wr->send_flags == 0);
-		/* Request TX completion. */
-		if (unlikely(--elts_comp_cd == 0)) {
-			elts_comp_cd = txq->elts_comp_cd_init;
-			++elts_comp;
-			send_flags |= IBV_EXP_QP_BURST_SIGNALED;
-		}
 		if (++elts_head >= elts_n)
 			elts_head = 0;
 #ifdef MLX4_PMD_SOFT_COUNTERS
 		/* Increment sent bytes counter. */
 		txq->stats.obytes += sent_size;
 #endif
-		/* Put SG list into send queue and ask for completion event. */
-#if MLX4_PMD_MAX_INLINE > 0
-		if ((segs == 1) &&
-		    (elt->sges[0].length <= txq->max_inline))
-			err = txq->if_qp->send_pending_inline
-				(txq->qp,
-				 (void *)(uintptr_t)elt->sges[0].addr,
-				 elt->sges[0].length,
-				 send_flags);
-		else
-#endif
-			err = txq->if_qp->send_pending_sg_list
-				(txq->qp,
-				 elt->sges,
-				 segs,
-				 send_flags);
-		if (unlikely(err))
-			goto stop;
 	}
 stop:
 	/* Take a shortcut if nothing must be sent. */
@@ -1390,7 +1315,6 @@ stop:
 	/* Increment sent packets counter. */
 	txq->stats.opackets += i;
 #endif
-	*wr_next = NULL;
 	/* Ring QP doorbell. */
 	err = txq->if_qp->send_flush(txq->qp);
 	if (unlikely(err)) {
-- 
2.1.0
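
[Editor's sketch, not part of the patch.] For readers who want the resulting
control flow in isolation, below is a minimal standalone C sketch of the
single-segment TX path described in the commit message: each queue element
keeps only a bare mbuf pointer, and a completion is requested once every
elts_comp_cd_init packets. All sketch_* names, the fixed ring size, and the
stubbed send function are hypothetical placeholders, not mlx4 PMD or Verbs
APIs; only the element/completion bookkeeping mirrors the patch.

/*
 * Illustrative sketch only -- hypothetical types, not the DPDK or
 * libibverbs API.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_ELTS_N 4U

struct sketch_mbuf { uintptr_t data_addr; uint32_t data_len; };

/* Mirrors the simplified struct txq_elt { struct rte_mbuf *buf; }. */
struct sketch_txq_elt { struct sketch_mbuf *buf; };

struct sketch_txq {
	struct sketch_txq_elt elts[SKETCH_ELTS_N];
	unsigned int elts_head;          /* next element to use */
	unsigned int elts_comp_cd;       /* countdown to next signaled send */
	unsigned int elts_comp_cd_init;  /* completion request interval */
};

/* Stand-in for if_qp->send_pending(); only reports what would be posted. */
static int sketch_send_pending(uintptr_t addr, uint32_t len, int signaled)
{
	printf("post addr=0x%" PRIxPTR " len=%" PRIu32 "%s\n",
	       addr, len, signaled ? " [signaled]" : "");
	return 0;
}

/* Single-segment fast path: no WR, no SGE array, just the buffer fields. */
static int sketch_tx_one(struct sketch_txq *txq, struct sketch_mbuf *buf)
{
	struct sketch_txq_elt *elt = &txq->elts[txq->elts_head];
	int signaled = 0;

	if (--txq->elts_comp_cd == 0) {
		txq->elts_comp_cd = txq->elts_comp_cd_init;
		signaled = 1;
	}
	elt->buf = buf; /* kept so completion handling can free the mbuf */
	if (sketch_send_pending(buf->data_addr, buf->data_len, signaled))
		return -1;
	if (++txq->elts_head == SKETCH_ELTS_N)
		txq->elts_head = 0;
	return 0;
}

int main(void)
{
	struct sketch_txq txq = {
		.elts_comp_cd = 2,
		.elts_comp_cd_init = 2,
	};
	struct sketch_mbuf m = { .data_addr = 0x1000, .data_len = 64 };

	sketch_tx_one(&txq, &m); /* not signaled */
	sketch_tx_one(&txq, &m); /* requests a completion */
	return 0;
}

The point of the sketch is the data-structure change the patch makes: once
the WR and SGE arrays are gone, per-element state collapses to a single
pointer, and completion signaling depends only on the countdown counter.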