From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 25 Oct 2017 18:49:38 +0200
From: Adrien Mazarguil
To: Ophir Munk
Cc: dev@dpdk.org, Thomas Monjalon, Olga Shern, Matan Azrad
Message-ID: <20171025164938.GH26782@6wind.com>
References: <1508752838-30408-1-git-send-email-ophirmu@mellanox.com>
 <1508768520-4810-1-git-send-email-ophirmu@mellanox.com>
 <1508768520-4810-3-git-send-email-ophirmu@mellanox.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1508768520-4810-3-git-send-email-ophirmu@mellanox.com>
Subject: Re: [dpdk-dev] [PATCH v2 2/7] net/mlx4: inline more Tx functions
List-Id: DPDK patches and discussions
X-List-Received-Date: Wed, 25 Oct 2017 16:49:50 -0000

Hi Ophir,

On Mon, Oct 23, 2017 at 02:21:55PM +0000, Ophir Munk wrote:
> Change functions to inline on Tx fast path to improve performance
> 
> Inside the inline function call other functions to handle "unlikely"
> cases such that the inline function code footprint is small.
> 
> Signed-off-by: Ophir Munk

Reading this, it sounds as if adding __rte_always_inline by itself
improves performance, which I doubt unless you can show proof through
performance results. When in doubt, leave it to the compiler; the static
keyword is usually enough of a hint. Too much forced inlining may
actually be harmful.

What this patch really does is split the heavy lookup/registration
function into two halves, with one small static inline function for the
lookup part that calls the separate registration part in the unlikely
event the MR is not already registered.
Thankfully the compiler doesn't inline the large registration function
back, which results in the perceived performance improvement for the
time being; however, there is no guarantee it won't happen in the future
(you didn't use the noinline keyword on the registration function to
prevent that).

Therefore I have a bunch of comments and suggestions, see below.

> ---
>  drivers/net/mlx4/mlx4_rxtx.c | 43 ++++++------------------------------
>  drivers/net/mlx4/mlx4_rxtx.h | 52 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 58 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
> index 011ea79..ae37f9b 100644
> --- a/drivers/net/mlx4/mlx4_rxtx.c
> +++ b/drivers/net/mlx4/mlx4_rxtx.c
> @@ -220,54 +220,25 @@ mlx4_txq_complete(struct txq *txq)
>  	return 0;
>  }
>  
> -/**
> - * Get memory pool (MP) from mbuf. If mbuf is indirect, the pool from which
> - * the cloned mbuf is allocated is returned instead.
> - *
> - * @param buf
> - *   Pointer to mbuf.
> - *
> - * @return
> - *   Memory pool where data is located for given mbuf.
> - */
> -static struct rte_mempool *
> -mlx4_txq_mb2mp(struct rte_mbuf *buf)
> -{
> -	if (unlikely(RTE_MBUF_INDIRECT(buf)))
> -		return rte_mbuf_from_indirect(buf)->pool;
> -	return buf->pool;
> -}
> 
>  /**
> - * Get memory region (MR) <-> memory pool (MP) association from txq->mp2mr[].
> - * Add MP to txq->mp2mr[] if it's not registered yet. If mp2mr[] is full,
> - * remove an entry first.
> + * Add memory region (MR) <-> memory pool (MP) association to txq->mp2mr[].
> + * If mp2mr[] is full, remove an entry first.
>   *
>   * @param txq
>   *   Pointer to Tx queue structure.
>   * @param[in] mp
> - *   Memory pool for which a memory region lkey must be returned.
> + *   Memory pool for which a memory region lkey must be added
> + * @param[in] i
> + *   Index in memory pool (MP) where to add memory region (MR)
>   *
>   * @return
> - *   mr->lkey on success, (uint32_t)-1 on failure.
> + *   Added mr->lkey on success, (uint32_t)-1 on failure.
>   */
> -uint32_t
> -mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
> +uint32_t mlx4_txq_add_mr(struct txq *txq, struct rte_mempool *mp, uint32_t i)
>  {
> -	unsigned int i;
>  	struct ibv_mr *mr;
>  
> -	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
> -		if (unlikely(txq->mp2mr[i].mp == NULL)) {
> -			/* Unknown MP, add a new MR for it. */
> -			break;
> -		}
> -		if (txq->mp2mr[i].mp == mp) {
> -			assert(txq->mp2mr[i].lkey != (uint32_t)-1);
> -			assert(txq->mp2mr[i].mr->lkey == txq->mp2mr[i].lkey);
> -			return txq->mp2mr[i].lkey;
> -		}
> -	}
>  	/* Add a new entry, register MR first. */
>  	DEBUG("%p: discovered new memory pool \"%s\" (%p)",
>  	      (void *)txq, mp->name, (void *)mp);
> diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
> index e10bbca..719ef45 100644
> --- a/drivers/net/mlx4/mlx4_rxtx.h
> +++ b/drivers/net/mlx4/mlx4_rxtx.h
> @@ -53,6 +53,7 @@
>  
>  #include "mlx4.h"
>  #include "mlx4_prm.h"
> +#include "mlx4_utils.h"

Why?

>  
>  /** Rx queue counters. */
>  struct mlx4_rxq_stats {
> @@ -160,7 +161,6 @@ void mlx4_rx_queue_release(void *dpdk_rxq);
>  
>  /* mlx4_rxtx.c */
>  
> -uint32_t mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp);
>  uint16_t mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts,
>  		       uint16_t pkts_n);
>  uint16_t mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts,
> @@ -169,6 +169,8 @@ uint16_t mlx4_tx_burst_removed(void *dpdk_txq, struct rte_mbuf **pkts,
>  			       uint16_t pkts_n);
>  uint16_t mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
>  			       uint16_t pkts_n);
> +uint32_t mlx4_txq_add_mr(struct txq *txq, struct rte_mempool *mp,
> +			 unsigned int i);
>  
>  /* mlx4_txq.c */
>  
> @@ -177,4 +179,52 @@ int mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
>  			const struct rte_eth_txconf *conf);
>  void mlx4_tx_queue_release(void *dpdk_txq);
>  
> +/**
> + * Get memory pool (MP) from mbuf.
> + * If mbuf is indirect, the pool from which
> + * the cloned mbuf is allocated is returned instead.
> + *
> + * @param buf
> + *   Pointer to mbuf.
> + *
> + * @return
> + *   Memory pool where data is located for given mbuf.
> + */
> +static __rte_always_inline struct rte_mempool *
> +mlx4_txq_mb2mp(struct rte_mbuf *buf)
> +{
> +	if (unlikely(RTE_MBUF_INDIRECT(buf)))
> +		return rte_mbuf_from_indirect(buf)->pool;
> +	return buf->pool;
> +}
> +
> +/**
> + * Get memory region (MR) <-> memory pool (MP) association from txq->mp2mr[].
> + * Call mlx4_txq_add_mr() if MP is not registered yet.
> + *
> + * @param txq
> + *   Pointer to Tx queue structure.
> + * @param[in] mp
> + *   Memory pool for which a memory region lkey must be returned.
> + *
> + * @return
> + *   mr->lkey on success, (uint32_t)-1 on failure.
> + */
> +static __rte_always_inline uint32_t

Note __rte_always_inline is defined in rte_common.h and should be
explicitly included (however don't do that, see below).

> +mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
> +		if (unlikely(txq->mp2mr[i].mp == NULL)) {
> +			/* Unknown MP, add a new MR for it. */
> +			break;
> +		}
> +		if (txq->mp2mr[i].mp == mp) {
> +			assert(txq->mp2mr[i].lkey != (uint32_t)-1);
> +			assert(txq->mp2mr[i].mr->lkey == txq->mp2mr[i].lkey);

assert() requires assert.h (but don't include it, see subsequent
suggestion).

> +			return txq->mp2mr[i].lkey;
> +		}
> +	}
> +	return mlx4_txq_add_mr(txq, mp, i);
> +}
>  #endif /* MLX4_RXTX_H_ */

So as described above, these functions do not need __rte_always_inline;
please remove it. They also do not need to be located in a header file.
The reason this is the case for their mlx5 counterparts is that they
have to be shared between the vectorized and non-vectorized code; no
such requirement exists here, so you should move them back to their
original spot.
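For reference, the lookup/registration split described above can be
sketched generically as below. This is a standalone toy, not the actual
mlx4 code: the names (cache_get_lkey, cache_register), the cache layout
and the fake lkey computation are all invented, and the "registration"
is a stand-in for ibv_reg_mr(). The fast path is a plain static
function (no forced inlining), while the slow path is kept out of line:

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 8

/* Toy lkey cache; mp == NULL marks a free slot. */
static struct {
	const void *mp;
	uint32_t lkey;
} cache[CACHE_SIZE];

static int slow_path_calls; /* instrumentation for this example only */

/*
 * Slow path: registration. In the real driver this would live in a
 * separate file so it can never be inlined into the data path;
 * noinline conveys the same intent within a single file.
 */
static __attribute__((noinline)) uint32_t
cache_register(const void *mp, size_t i)
{
	++slow_path_calls;
	if (i == CACHE_SIZE)
		i = 0; /* cache full: naive eviction of slot 0 */
	cache[i].mp = mp;
	cache[i].lkey = (uint32_t)(uintptr_t)mp; /* stand-in for ibv_reg_mr() */
	return cache[i].lkey;
}

/*
 * Fast path: a small lookup loop; plain static is enough of a hint
 * for the compiler, no __rte_always_inline needed.
 */
static uint32_t
cache_get_lkey(const void *mp)
{
	size_t i;

	for (i = 0; i != CACHE_SIZE; ++i) {
		if (cache[i].mp == NULL)
			break; /* unknown pool, register it */
		if (cache[i].mp == mp)
			return cache[i].lkey; /* hit: no outgoing call */
	}
	return cache_register(mp, i);
}
```

The point of the shape is that the common case (a cache hit) executes
only the short loop, while the cold registration path stays behind a
real function call that the compiler is told not to pull back in.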
My suggestion for this performance improvement is to move
mlx4_txq_add_mr() to a different file; mlx4_mr.c looks like a good
candidate. This will ensure it's never inlined and stays far away from
the data path.

-- 
Adrien Mazarguil
6WIND