From: Matan Azrad
To: Nélio Laranjeiro
CC: Ophir Munk, Adrien Mazarguil, dev@dpdk.org, Thomas Monjalon, Olga Shern, Mordechay Haimovsky
Date: Thu, 26 Oct 2017 16:21:37 +0000
Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions

Hi Nelio

> -----Original Message-----
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> Sent: Thursday, October 26, 2017 4:44 PM
> To: Matan Azrad
> Cc: Ophir Munk; Adrien Mazarguil; dev@dpdk.org; Thomas Monjalon;
> Olga Shern; Mordechay Haimovsky
> Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions
>
> On Thu, Oct 26, 2017 at 12:30:54PM +0000, Matan Azrad wrote:
> > Hi Nelio
> > Please see my comments below (3).
> >
> > > -----Original Message-----
> > > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > > Sent: Thursday, October 26, 2017 3:12 PM
> > > To: Matan Azrad
> > > Cc: Ophir Munk; Adrien Mazarguil; dev@dpdk.org; Thomas Monjalon;
> > > Olga Shern; Mordechay Haimovsky
> > > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path
> > > functions
> > >
> > > On Thu, Oct 26, 2017 at 10:31:06AM +0000, Matan Azrad wrote:
> > > > Hi Nelio
> > > >
> > > > I think the memory barrier discussion is not relevant to this
> > > > patch (if it becomes relevant, I will create a new one).
> > > > Please see my comments inline.
> > >
> > > It was not my only comment. There is also useless code, such as
> > > handling null segments in packets, which DPDK does not allow.
> >
> > Sorry, but I can't find those comments in the previous mails.
>
> You should search in the series,
>
> > Moreover, this comment (the first time I have seen it) is not relevant
> > to this patch and asks about something else.
> > All this patch does is merge 2 functions to avoid checking the
> > remaining WQ space twice...
>
> Again, in the series itself.
>
> The point is, this series embeds 7 patches for "performance improvement",
> whereas the only improvement is avoiding a call to an outside function by
> copy/pasting it into the PMD.
> In fact it will save a few cycles, but the improvement could have been much
> bigger if it were not a bare copy/paste.
>

This simple merge improves throughput by 0.2 MPPS in my setup.
If you have more improvements (other than reducing if statements) regarding
this merge, please suggest them.

> The real question is: what is the improvement? If the improvement is
> significant, it is worth having this series; otherwise it is not, as it may
> also bring bugs that get fixed in the original source while this copy
> keeps them.
>

Each commit in this series improves performance; all of them improve
performance significantly and brought us to our target.

By the way, I think series-level discussion should be in patch 0 :)

> > Removing memory/compiler barriers or dealing with null segments is not
> > in the scope here.
> >
> > > > Regarding this specific patch, I didn't see any comment from you.
> > > > Do you agree with it?
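A minimal C sketch of the idea behind this patch: compute the remaining WQ space once per burst instead of once in tx_burst and again in mlx4_post_send. This is not the mlx4 code; `struct txq`, `txq_free()` and `tx_burst_merged()` are hypothetical names, and the sketch assumes every packet occupies exactly one ring slot (real mlx4 WQEs can span several TXBBs, so a real budget would be counted in TXBBs).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified Tx queue: a ring of `size` WQE slots. */
struct txq {
	uint32_t head;         /* slots ever filled by the driver */
	uint32_t tail;         /* slots ever completed by the NIC */
	uint32_t size;         /* ring capacity in slots */
	uint32_t space_checks; /* how many times free space was computed */
};

/* Free slots; unsigned subtraction stays correct across wrap-around. */
static uint32_t txq_free(struct txq *q)
{
	q->space_checks++;
	return q->size - (q->head - q->tail);
}

/*
 * Merged burst: free space is computed once up front, so the per-packet
 * loop needs no further capacity check. Assumes one slot per packet.
 */
static uint16_t tx_burst_merged(struct txq *q, uint16_t pkts_n)
{
	uint32_t avail = txq_free(q);
	uint16_t n = pkts_n < avail ? pkts_n : (uint16_t)avail;

	for (uint16_t i = 0; i < n; i++)
		q->head++; /* stand-in for writing one packet's WQE */
	assert(q->head - q->tail <= q->size); /* ring never overfills */
	return n;
}
```

The `space_checks` counter exists only to make the "one check per burst" property visible; with the budget fixed up front, the per-packet loop carries no capacity branch.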
> > > >
> > > > > -----Original Message-----
> > > > > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > > > > Sent: Wednesday, October 25, 2017 10:50 AM
> > > > > To: Ophir Munk
> > > > > Cc: Adrien Mazarguil; dev@dpdk.org; Thomas Monjalon; Olga Shern;
> > > > > Matan Azrad
> > > > > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path
> > > > > functions
> > > > >
> > > > > On Tue, Oct 24, 2017 at 08:36:52PM +0000, Ophir Munk wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On Tuesday, October 24, 2017 4:52 PM, Nélio Laranjeiro wrote:
> > > > > > >
> > > > > > > On Mon, Oct 23, 2017 at 02:21:57PM +0000, Ophir Munk wrote:
> > > > > > > > From: Matan Azrad
> > > > > > > >
> > > > > > > > Merge tx_burst and mlx4_post_send functions to avoid
> > > > > > > > checking the remaining WQ space twice.
> > > > > > > >
> > > > > > > > This should improve performance.
> > > > > > > >
> > > > > > > > Signed-off-by: Matan Azrad
> > > > > > > > ---
> > > > > > > >  drivers/net/mlx4/mlx4_rxtx.c | 353 +++++++++++++++++++++----------------------
> > > > > > > >  1 file changed, 170 insertions(+), 183 deletions(-)
> > > > > > >
> > > > > > > What real expectations do you have for the remaining
> > > > > > > patches of the series?
> > > > > > >
> > > > > > > According to this commit log ("This should improve
> > > > > > > performance"), there are too many barriers at the
> > > > > > > packet/segment level for anything to improve.
> > > > > > >
> > > > > > > The point is, mlx4_burst_tx() should write all the WQEs
> > > > > > > without any barrier, as it processes a burst of packets
> > > > > > > (whereas Verbs functions may only process a single packet).
> > > > > > > The only barrier that should be present is the one ensuring
> > > > > > > that all host memory is flushed before triggering the Tx
> > > > > > > doorbell.
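The single-barrier pattern argued for here can be sketched in plain C: write every descriptor of the burst with ordinary stores, issue one release fence, then write the doorbell once. The queue, the descriptor layout and the `doorbell` variable are hypothetical, and `atomic_thread_fence(memory_order_release)` is only a portable stand-in for `rte_wmb()`; the C11 memory model says nothing about ordering stores to device (MMIO) memory, which is why DPDK provides its own barrier macros. The sketch also ignores the ConnectX-3 TXBB write-order constraint raised in the reply.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 16u /* power of two, so masking wraps the index */

/* Hypothetical descriptor and queue layout, for illustration only. */
struct desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

struct sim_txq {
	struct desc ring[RING_SIZE];
	uint32_t head;              /* descriptors posted so far */
	volatile uint32_t doorbell; /* stand-in for the MMIO doorbell register */
	unsigned int fences;        /* barriers issued, to show "one per burst" */
};

/*
 * Write all descriptors of the burst with plain stores, then issue a
 * single release fence and ring the doorbell once. The caller is assumed
 * to have already checked that n descriptors fit in the ring.
 */
static uint32_t sim_tx_burst(struct sim_txq *q, const uint64_t *addrs,
			     const uint32_t *lens, uint32_t n)
{
	assert(n <= RING_SIZE);
	if (n == 0)
		return 0;
	for (uint32_t i = 0; i < n; i++) {
		struct desc *d = &q->ring[(q->head + i) & (RING_SIZE - 1)];

		d->addr = addrs[i];
		d->len = lens[i];
		d->flags = 1; /* hypothetical "valid" flag */
	}
	q->head += n;
	/* The only barrier: descriptor stores are ordered before the doorbell. */
	atomic_thread_fence(memory_order_release);
	q->fences++;
	q->doorbell = q->head;
	return n;
}
```

Whatever the burst size, `fences` grows by exactly one per non-empty burst, which is the property being argued for in the thread.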
> > > > > >
> > > > > > There is a known ConnectX-3 HW limitation: the first 4 bytes
> > > > > > of every TXWBB (64-byte chunk) must be written in reverse
> > > > > > order (from the last TXWBB to the first TXWBB).
> > > > >
> > > > > This means the first WQE filled by the burst function is the
> > > > > doorbell. In that situation, its first four bytes can be written
> > > > > just before leaving the burst function, after a write memory barrier.
> > > > >
> > > > > As long as this first WQE is not complete, the NIC won't start
> > > > > processing the packets, so per-packet memory barriers become useless.
> > > >
> > > > I think this is not true, since mlx4 HW can prefetch TXBBs ahead
> > > > if their first 4 bytes are valid, even though the first WQE is
> > > > still not valid (please read the spec).
> > >
> > > A compiler barrier is enough on x86 to forbid the CPU from reordering
> > > the instructions; on ARM you need a memory barrier, and there is a
> > > macro in DPDK for that, rte_io_wmb().
> >
> > We are also using a compiler barrier here.
> >
> > > Before triggering the doorbell you must flush the cache; this is the
> > > only place where rte_wmb() should be used.
> > >
> >
> > We use a memory barrier only for this reason, too.
> >
> > > > > It gives something like:
> > > > >
> > > > >  uint32_t tx_bb_db = 0;
> > > > >  void *first_wqe = NULL;
> > > > >
> > > > >  /*
> > > > >   * Prepare all packets by writing the WQEs without the first
> > > > >   * 4 bytes of the first WQE.
> > > > >   */
> > > > >  for (...) {
> > > > >          /* wqe: WQE being written for the current packet. */
> > > > >          if (!first_wqe) {
> > > > >                  first_wqe = wqe;
> > > > >                  tx_bb_db = foo;
> > > > >          }
> > > > >  }
> > > > >  /* Leaving. */
> > > > >  rte_wmb();
> > > > >  *(uint32_t *)first_wqe = tx_bb_db;
> > > > >  return n;
> > > >
> > > > I will check whether we can do 2 loops:
> > > > Write the last 60B of every TXBB.
> > > > Memory barrier.
> > > > Write the first 4B of every TXBB.
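The two-loop ordering proposed above (bodies first, one barrier, then the first 4 bytes of each TXBB from the last to the first) can be modeled as follows. This is an assumption-laden sketch, not driver code: `struct txbb_run`, `write_txbbs_two_loops()` and the `order[]` log are hypothetical, the TXBBs live in ordinary host memory, and the C11 fence stands in for the DPDK barrier a PMD would use (such as `rte_wmb()`). It does not settle whether the reversed 4-byte stores need their own ordering on weakly ordered CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define TXBB_SIZE 64u  /* ConnectX-3 TXBB size, per the thread */
#define HEAD_BYTES 4u  /* leading bytes written last, in reverse order */
#define MAX_TXBBS 8u

/* Hypothetical model of the TXBBs used by one burst, plus a log of the
 * order in which each TXBB's first 4 bytes were written. */
struct txbb_run {
	uint8_t bb[MAX_TXBBS][TXBB_SIZE];
	unsigned int n;                /* TXBBs used by this burst */
	unsigned int order[MAX_TXBBS]; /* order[k]: TXBB whose head was k-th */
	unsigned int logged;
};

/* src holds n complete 64-byte TXBB images prepared by the caller. */
static void write_txbbs_two_loops(struct txbb_run *r, const uint8_t *src)
{
	assert(r->n <= MAX_TXBBS);
	/* Loop 1: last 60 bytes of every TXBB; any order is allowed. */
	for (unsigned int i = 0; i < r->n; i++)
		memcpy(&r->bb[i][HEAD_BYTES], src + i * TXBB_SIZE + HEAD_BYTES,
		       TXBB_SIZE - HEAD_BYTES);
	/* One barrier between all bodies and all heads (stand-in only). */
	atomic_thread_fence(memory_order_release);
	/* Loop 2: first 4 bytes, from the last TXBB back to the first. */
	for (unsigned int i = r->n; i-- > 0;) {
		memcpy(&r->bb[i][0], src + i * TXBB_SIZE, HEAD_BYTES);
		r->order[r->logged++] = i;
	}
}
```

Compared with a barrier per TXBB, this issues one barrier per burst while still making the first TXBB the last one to become valid.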
> > > > > >
> > > > > > The last 60 bytes of any TXWBB can be written in any order
> > > > > > (before writing the first 4 bytes).
> > > > > > Is your last statement (using a single barrier) in accordance
> > > > > > with this limitation? Please explain.
> > > > > >
> > > > > > > There are also too many cases handled which are useless in
> > > > > > > a burst situation; this function needs to be rewritten for
> > > > > > > its minimal use case, i.e. processing a valid burst of
> > > > > > > packets/segments and triggering the Tx doorbell at the end
> > > > > > > of the burst.
> > > > >
> > > > > Regards,
> > > > >
> > > > > --
> > > > > Nélio Laranjeiro
> > > > > 6WIND
> > >
> > > Regards,
> > >
> > > --
> > > Nélio Laranjeiro
> > > 6WIND
>
> --
> Nélio Laranjeiro
> 6WIND