From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR03-VE1-obe.outbound.protection.outlook.com (mail-eopbgr50065.outbound.protection.outlook.com [40.107.5.65]) by dpdk.org (Postfix) with ESMTP id D3A0B1B39B for ; Sun, 29 Oct 2017 20:30:51 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=kHWC02ong+Uzi+Sv6EtnUykjvada1XN+OYmX7euPse0=; b=Y28ie2OsJKwsDIUNCw2AzowUQIynykXirZLFJS5IjD3MaXzP1DSaxr7XccNMA1meAE4LPHkQ1LxJTzL9Ed/Mm1NMbH05PG40kzN0C1R8JrilMPDxv30Rq1x6iyWDLZO8yYkjFzWgww+pA6cdsk9e5FwVCPtDXKl0ouWCTP+VaXU= Received: from DB5PR05MB1254.eurprd05.prod.outlook.com (10.162.157.140) by DB5SPR8PMB304.eurprd05.prod.outlook.com (10.164.43.139) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.20.178.6; Sun, 29 Oct 2017 19:30:48 +0000 Received: from DB5PR05MB1254.eurprd05.prod.outlook.com ([fe80::69a8:7948:e0cb:c16e]) by DB5PR05MB1254.eurprd05.prod.outlook.com ([fe80::69a8:7948:e0cb:c16e%13]) with mapi id 15.20.0178.012; Sun, 29 Oct 2017 19:30:48 +0000 From: Ophir Munk To: Adrien Mazarguil CC: "dev@dpdk.org" , Thomas Monjalon , "Olga Shern" , Matan Azrad Thread-Topic: [PATCH v2 2/7] net/mlx4: inline more Tx functions Thread-Index: AQHTTApqth6x0LV1TkeHYFXL14f/5qL0ymgAgABGhrCAALS5gIAAZu8ggAUSosA= Date: Sun, 29 Oct 2017 19:30:48 +0000 Message-ID: References: <1508752838-30408-1-git-send-email-ophirmu@mellanox.com> <1508768520-4810-1-git-send-email-ophirmu@mellanox.com> <1508768520-4810-3-git-send-email-ophirmu@mellanox.com> <20171025164938.GH26782@6wind.com> <20171026074853.GL26782@6wind.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=ophirmu@mellanox.com; x-originating-ip: [93.173.17.154] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; DB5SPR8PMB304; 6:Da1YoKOMEGP7QK0xWm+tdHGfchIOcitgGas3oT/zw265KLRn2G4c7i5UrHmmM/UsyS5g52h1Ap02hxrPTqIJ6Fy5qae7hvTFN+KPYXDZ4d3eLcB3Og4pT7s4ICJG5iGPgAkefZytMe3sJTPf7IzxfVfQEfql4bUp6cy+69i5htbjZMDoJoSojI2vQqXTQWezoBZdCLEJFAhAlrlxQwG52+Jf+i1DI8cnMWlkeZ5/tUovpbxxYlphFx4vBm0gcg2KfNE8TqtgMo6gvHOBSwE0jmH0vOmLspgzhsy6BJwQZ+nx/MYbPyCmZxVpQ1kJTRdGnAxQmb9L42iucsYAIsz2vWio/xa34C8wOh+jl5yL3Lc=; 5:FBJaL2eeHxZzoEVbDn/NPjkEfUvN1mrvjbi4jk7owrWbCbVrWzG4hfB6r0B+Ktb+qPe7tu/9fm+YkGEtUltwKJc/I9AAnNzEfvJex6mnfgyIB1VDzuzAeom2iY/7l3DPPJ8QlWqgMY46Ca5uTKnCck/oUZP20+XKNAkjYEiBaXQ=; 24:eE+TWse5OdQp9BUiYgkmDlOy8X2jJtyiP1D6399P9FgJ+rEU3XyhJw05ld+UJuyumQJ2VCiN8fbFZMN7j4+EzwuVVoRZEibcrQaiQDRCNgA=; 7:IESEwowM1mfA3XNJ0WlCn5pL8rlofvTqH5mmPBpgUR8t6X6eYH+bShRN9TF58JL3N8ZbTmTwKGY6Z0GoYp+jASDGWSgDB1LUVFhw2s/FkLXpG7LQ2mKACfBQ4KuTnfsI+kzoqgJgRNGPyOOjfmWOfJTk6G5S1pK23HycMqF/nbixKPiWneHJ941b64xs5kXOAMhvvioAn0PGjPHtWezY8JJ0IsUmh7REuSOp5tnDCoxLKAqO9QkH0hEHad1xZVLs x-ms-exchange-antispam-srfa-diagnostics: SSOS; x-ms-office365-filtering-correlation-id: 7856d350-0329-4f9e-9b1e-08d51f038eff x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(22001)(4534020)(4602075)(48565401081)(2017052603199); SRVR:DB5SPR8PMB304; x-ms-traffictypediagnostic: DB5SPR8PMB304: x-ld-processed: a652971c-7d2e-4d9b-a6a4-d149256f461b,ExtAddr x-exchange-antispam-report-test: UriScan:(17755550239193); x-microsoft-antispam-prvs: x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(100000700101)(100105000095)(100000701101)(100105300095)(100000702101)(100105100095)(6040450)(2401047)(8121501046)(5005006)(3231020)(93006095)(93001095)(100000703101)(100105400095)(3002001)(10201501046)(6055026)(6041248)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(20161123562025)(20161123560025)(20161123555025)(20161123564025)(20161123558100)(6072148)(201708071742011)(100000704101)(100105200095)(100000705101)(100105500095); SRVR:DB5SPR8PMB304; BCL:0; PCL:0; RULEID:(100000800101)(100110000095)(100000801101)(100110300095)(100000802101)(100110100095)(100000803101)(100110400095)(100000804101)(100110200095)(100000805101)(100110500095); SRVR:DB5SPR8PMB304; x-forefront-prvs: 0475418F50 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(6009001)(376002)(346002)(189002)(24454002)(199003)(6436002)(4326008)(97736004)(3846002)(5250100002)(102836003)(6116002)(53546010)(101416001)(9686003)(50986999)(107886003)(76176999)(54906003)(54356999)(99286003)(316002)(33656002)(55016002)(6246003)(86362001)(68736007)(66066001)(93886005)(14454004)(53936002)(189998001)(2900100001)(2950100002)(6916009)(105586002)(229853002)(8676002)(81156014)(3660700001)(81166006)(5660300001)(7696004)(106356001)(3280700002)(478600001)(74316002)(305945005)(7736002)(8936002)(25786009)(2906002)(6506006); DIR:OUT; SFP:1101; SCL:1; SRVR:DB5SPR8PMB304; H:DB5PR05MB1254.eurprd05.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords; MX:1; A:1; LANG:en; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 7856d350-0329-4f9e-9b1e-08d51f038eff X-MS-Exchange-CrossTenant-originalarrivaltime: 29 Oct 2017 19:30:48.5240 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB5SPR8PMB304 Subject: Re: [dpdk-dev] [PATCH v2 2/7] net/mlx4: inline more Tx functions X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Oct 2017 19:30:52 -0000 On Thursday, October 26, 2017 5:28 PM, Ophir Munk wrote: >=20 > Hi, > Please see inside >=20 > On Thursday, October 26, 2017 10:49 AM Adrien Mazarguil wrote: > > To: Ophir Munk > > Cc: dev@dpdk.org; Thomas Monjalon ; Olga > Shern > > ; Matan Azrad > > Subject: Re: [PATCH v2 2/7] net/mlx4: inline more Tx functions > > > > Hi Ophir, > > > > Please see below. > > > > On Wed, Oct 25, 2017 at 09:42:46PM +0000, Ophir Munk wrote: > > > Hi Adrien, > > > > > > On Wednesday, October 25, 2017 7:50 PM, Adrien Mazarguil wrote: > > > > > > > > Hi Ophir, > > > > > > > > On Mon, Oct 23, 2017 at 02:21:55PM +0000, Ophir Munk wrote: > > > > > Change functions to inline on Tx fast path to improve > > > > > performance > > > > > > > > > > Inside the inline function call other functions to handle "unlike= ly" > > > > > cases such that the inline function code footprint is small. > > > > > > > > > > Signed-off-by: Ophir Munk > > > > > > > > Reading this, it's like adding __rte_always_inline improves > > > > performance at all, which I doubt unless you can show proof > > > > through > > performance results. > > > > > > > > When in doubt, leave it to the compiler, the static keyword is > > > > usually enough of a hint. Too much forced inlining may actually be > > harmful. > > > > > > > > What this patch really does is splitting the heavy > > > > lookup/registration function in two halves with one small static > > > > inline function for the lookup part that calls the separate > > > > registration part in the unlikely event MR is not already registere= d. > > > > > > > > Thankfully the compiler doesn't inline the large registration > > > > function back, which results in the perceived performance > > > > improvement for the time being, however there is no guarantee it > > > > won't happen in the future (you didn't use the noinline keyword on > > > > the > > registration function for that). > > > > > > > > Therefore I have a bunch of comments and suggestions, see below. > > > > > > > > > --- > > > > > drivers/net/mlx4/mlx4_rxtx.c | 43 > > > > > ++++++------------------------------ > > > > > drivers/net/mlx4/mlx4_rxtx.h | 52 > > > > > +++++++++++++++++++++++++++++++++++++++++++- > > > > > 2 files changed, 58 insertions(+), 37 deletions(-) > > > > > > > > > > diff --git a/drivers/net/mlx4/mlx4_rxtx.c > > > > > b/drivers/net/mlx4/mlx4_rxtx.c index 011ea79..ae37f9b 100644 > > > > > --- a/drivers/net/mlx4/mlx4_rxtx.c > > > > > +++ b/drivers/net/mlx4/mlx4_rxtx.c > > > > > @@ -220,54 +220,25 @@ mlx4_txq_complete(struct txq *txq) > > > > > return 0; > > > > > } > > > > > > > > > > -/** > > > > > - * Get memory pool (MP) from mbuf. If mbuf is indirect, the > > > > > pool from which > > > > > - * the cloned mbuf is allocated is returned instead. > > > > > - * > > > > > - * @param buf > > > > > - * Pointer to mbuf. > > > > > - * > > > > > - * @return > > > > > - * Memory pool where data is located for given mbuf. > > > > > - */ > > > > > -static struct rte_mempool * > > > > > -mlx4_txq_mb2mp(struct rte_mbuf *buf) -{ > > > > > - if (unlikely(RTE_MBUF_INDIRECT(buf))) > > > > > - return rte_mbuf_from_indirect(buf)->pool; > > > > > - return buf->pool; > > > > > -} > > > > > > > > > > /** > > > > > - * Get memory region (MR) <-> memory pool (MP) association from > > > > >txq- mp2mr[]. > > > > > - * Add MP to txq->mp2mr[] if it's not registered yet. If > > > > >mp2mr[] is full, > > > > > - * remove an entry first. > > > > > + * Add memory region (MR) <-> memory pool (MP) association to > > > > > + txq- > > > > >mp2mr[]. > > > > > + * If mp2mr[] is full, remove an entry first. > > > > > * > > > > > * @param txq > > > > > * Pointer to Tx queue structure. > > > > > * @param[in] mp > > > > > - * Memory pool for which a memory region lkey must be returned= . > > > > > + * Memory pool for which a memory region lkey must be added > > > > > + * @param[in] i > > > > > + * Index in memory pool (MP) where to add memory region (MR) > > > > > * > > > > > * @return > > > > > - * mr->lkey on success, (uint32_t)-1 on failure. > > > > > + * Added mr->lkey on success, (uint32_t)-1 on failure. > > > > > */ > > > > > -uint32_t > > > > > -mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp) > > > > > +uint32_t mlx4_txq_add_mr(struct txq *txq, struct rte_mempool > > > > > +*mp, uint32_t i) > > > > > { > > > > > - unsigned int i; > > > > > struct ibv_mr *mr; > > > > > > > > > > - for (i =3D 0; (i !=3D RTE_DIM(txq->mp2mr)); ++i) { > > > > > - if (unlikely(txq->mp2mr[i].mp =3D=3D NULL)) { > > > > > - /* Unknown MP, add a new MR for it. */ > > > > > - break; > > > > > - } > > > > > - if (txq->mp2mr[i].mp =3D=3D mp) { > > > > > - assert(txq->mp2mr[i].lkey !=3D (uint32_t)-1); > > > > > - assert(txq->mp2mr[i].mr->lkey =3D=3D txq- > > > > >mp2mr[i].lkey); > > > > > - return txq->mp2mr[i].lkey; > > > > > - } > > > > > - } > > > > > /* Add a new entry, register MR first. */ > > > > > DEBUG("%p: discovered new memory pool \"%s\" (%p)", > > > > > (void *)txq, mp->name, (void *)mp); diff --git > > > > >a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h > > > > >index > > > > > e10bbca..719ef45 100644 > > > > > --- a/drivers/net/mlx4/mlx4_rxtx.h > > > > > +++ b/drivers/net/mlx4/mlx4_rxtx.h > > > > > @@ -53,6 +53,7 @@ > > > > > > > > > > #include "mlx4.h" > > > > > #include "mlx4_prm.h" > > > > > +#include "mlx4_utils.h" > > > > > > > > Why? > > > > > > > > > > > > > > /** Rx queue counters. */ > > > > > struct mlx4_rxq_stats { > > > > > @@ -160,7 +161,6 @@ void mlx4_rx_queue_release(void > *dpdk_rxq); > > > > > > > > > > /* mlx4_rxtx.c */ > > > > > > > > > > -uint32_t mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool > > > > > *mp); uint16_t mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pk= ts, > > > > > uint16_t pkts_n); > > > > > uint16_t mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, > > > > > @@ > > > > > -169,6 +169,8 @@ uint16_t mlx4_tx_burst_removed(void *dpdk_txq, > > > > struct rte_mbuf **pkts, > > > > > uint16_t pkts_n); > > > > > uint16_t mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf > > **pkts, > > > > > uint16_t pkts_n); > > > > > +uint32_t mlx4_txq_add_mr(struct txq *txq, struct rte_mempool > *mp, > > > > > + unsigned int i); > > > > > > > > > > /* mlx4_txq.c */ > > > > > > > > > > @@ -177,4 +179,52 @@ int mlx4_tx_queue_setup(struct rte_eth_dev > > > > *dev, uint16_t idx, > > > > > const struct rte_eth_txconf *conf); void > > > > > mlx4_tx_queue_release(void *dpdk_txq); > > > > > > > > > > +/** > > > > > + * Get memory pool (MP) from mbuf. If mbuf is indirect, the > > > > > +pool from which > > > > > + * the cloned mbuf is allocated is returned instead. > > > > > + * > > > > > + * @param buf > > > > > + * Pointer to mbuf. > > > > > + * > > > > > + * @return > > > > > + * Memory pool where data is located for given mbuf. > > > > > + */ > > > > > +static __rte_always_inline struct rte_mempool * > > > > > +mlx4_txq_mb2mp(struct rte_mbuf *buf) { > > > > > + if (unlikely(RTE_MBUF_INDIRECT(buf))) > > > > > + return rte_mbuf_from_indirect(buf)->pool; > > > > > + return buf->pool; > > > > > +} > > > > > + > > > > > +/** > > > > > + * Get memory region (MR) <-> memory pool (MP) association from > > > > > +txq- > > > > >mp2mr[]. > > > > > + * Call mlx4_txq_add_mr() if MP is not registered yet. > > > > > + * > > > > > + * @param txq > > > > > + * Pointer to Tx queue structure. > > > > > + * @param[in] mp > > > > > + * Memory pool for which a memory region lkey must be returned= . > > > > > + * > > > > > + * @return > > > > > + * mr->lkey on success, (uint32_t)-1 on failure. > > > > > + */ > > > > > +static __rte_always_inline uint32_t > > > > > > > > Note __rte_always_inline is defined in rte_common.h and should be > > > > explicitly included (however don't do that, see below). > > > > > > > > > +mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp) { > > > > > + unsigned int i; > > > > > + > > > > > + for (i =3D 0; (i !=3D RTE_DIM(txq->mp2mr)); ++i) { > > > > > + if (unlikely(txq->mp2mr[i].mp =3D=3D NULL)) { > > > > > + /* Unknown MP, add a new MR for it. */ > > > > > + break; > > > > > + } > > > > > + if (txq->mp2mr[i].mp =3D=3D mp) { > > > > > + assert(txq->mp2mr[i].lkey !=3D (uint32_t)-1); > > > > > + assert(txq->mp2mr[i].mr->lkey =3D=3D txq- > > > > >mp2mr[i].lkey); > > > > > > > > assert() requires assert.h (but don't include it, see subsequent > > suggestion). > > > > > > > > > + return txq->mp2mr[i].lkey; > > > > > + } > > > > > + } > > > > > + return mlx4_txq_add_mr(txq, mp, i); } > > > > > #endif /* MLX4_RXTX_H_ */ > > > > > > > > So as described above, these functions do not need the > > > > __rte_always_inline, please remove it. They also do not need to be > > > > located in a header file; the reason it's the case for their mlx5 > > > > counterparts is that they have to be shared between > > > > vectorized/non-vectorized code. No such requirement here, you > > > > should > > move them back to their original spot. > > > > > > > > > > Static function mlx4_txq_mp2mr() must be in a header file because it > > > is > > shared by 2 files: mlx4_txq.c and mlx4_rxtx.c. > > > It is not related to vectorized/non-vectorized code in mlx5. > > > Having said that -__rte_always_inline is required as well otherwise > > > compilation fails with > > > drivers/net/mlx4/mlx4_rxtx.h:200:1: error: 'mlx4_txq_mp2mr' defined > > > but not used [-Werror=3Dunused-function] for files which include > > > mlx4_rxtx.h > > > > All right, then what you were looking or was static inline, not *force* > inline. > > The former is a hint, the latter doesn't leave much of a choice to the > > compiler, it means you're sure this way brings the most performance, > > however for this patch I really think inlining plays a really minor > > part (even changes anything at all) compared to dividing this > > function, which is the real performance improvement. >=20 > Without inline I get ~0.2Mpps degradation on my setup, therefore I sugges= t > keeping inline. > The final call is yours. Please let me know if to leave inline or remove. >=20 Till further indication - function mlx4_txq_mp2mr() is left static in next = v3 of this series. > > > > > > My suggestion for this performance improvement is to move > > > > mlx4_txq_add_mr() to a different file, mlx4_mr.c looks like a good > > > > candidate. This fact will ensure it's never inlined and far away > > > > from the data path. > > > > > > > > > > Function mlx4_txq_add_mr() is relatively small. > > > What do you say about preceding it with __attribute((noinline)) > > > instead of > > creating a new file? > > > > What I mean is you should declare mlx4_txq_add_mr() which does the > > heavy lifting inside mlx4_mr.c and provide its definition in mlx4.h > > instead of mlx4_rxtx.h. > > > > Then, mlx4_txq_mp2mr() can remain defined in mlx4_rxtx.c in its > > original spot as a non-static function with its public declaration > > remaining in mlx4_rxtx.h for users outside of this file. > > >=20 > That's done before. mlx4_txq_mp2mr() is defined in mlx4_rxtx.c as > described. >=20 > > The fact mlx4_txq_mp2mr() remains defined in that file *before* > > mlx4_post_send()'s definition where it's needed allows the compiler to > > optimize it away as if it was static inline thanks to -O3, that is, > > unless it thinks doing so would hurt performance, but as a (now) small > > function this shouldn't be an issue. > > > > Other reasons includes that doing so would make a smaller diff that > > focuses on the performance improvement itself. The extra performance > > brought by a statically inlined version of mlx4_txq_mp2mr() is not > > needed in mlx4_txq.c, whose only purpose is to set up queues. >=20 > I have moved mlx4_txq_add_mr() to mlx_mr.c. One of the disadvantages of > moving it is that now it requires adding #include "mlx4_rxtx.h" in the C = file in > order to know the details of txq struct. > I liked mlx4_mr.c for being encapsulated from any Tx/Rx specific structur= es > and just handling MR issues. >=20 > > > > -- > > Adrien Mazarguil > > 6WIND