From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <matan@mellanox.com>
Received: from EUR01-HE1-obe.outbound.protection.outlook.com
 (mail-he1eur01on0065.outbound.protection.outlook.com [104.47.0.65])
 by dpdk.org (Postfix) with ESMTP id 5D1371BA03
 for <dev@dpdk.org>; Thu, 26 Oct 2017 18:21:40 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com;
 s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;
 bh=eZrrfsLYJzws/ajY45nvEVnawjwXPGKE2wRyZtfTDD0=;
 b=lMPxZi7NP9c8bouM4xslNSiLKjS+BsNpOzEJnQQkgjn1FVgH8a9DsA2VSrcY26qzLD4dSHdSRrFt55GUI+j8gSNbfMZPM84E6gNFxsR9yodDl0n8AQE43yWKX9VCLoKQz/UvoQXSAWou3m+2HbRRPAqT+oEP2b9q7n5I4loILXs=
Received: from HE1PR0502MB3659.eurprd05.prod.outlook.com (10.167.127.17) by
 VI1PR05MB1920.eurprd05.prod.outlook.com (10.166.44.147) with Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id
 15.20.156.4; Thu, 26 Oct 2017 16:21:38 +0000
Received: from HE1PR0502MB3659.eurprd05.prod.outlook.com
 ([fe80::c524:908c:b99c:3f4b]) by HE1PR0502MB3659.eurprd05.prod.outlook.com
 ([fe80::c524:908c:b99c:3f4b%13]) with mapi id 15.20.0156.007; Thu, 26 Oct
 2017 16:21:37 +0000
From: Matan Azrad <matan@mellanox.com>
To: =?iso-8859-1?Q?N=E9lio_Laranjeiro?= <nelio.laranjeiro@6wind.com>
CC: Ophir Munk <ophirmu@mellanox.com>, Adrien Mazarguil
 <adrien.mazarguil@6wind.com>, "dev@dpdk.org" <dev@dpdk.org>, Thomas Monjalon
 <thomas@monjalon.net>, Olga Shern <olgas@mellanox.com>, Mordechay Haimovsky
 <motih@mellanox.com>
Thread-Topic: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions
Thread-Index: AQHTTApr4imkSYrit0WXMr28NXPk2aLzBmWAgABxKwCAALwaAIABuooggAAhDoCAAAJoAIAAF1IAgAAn+xA=
Date: Thu, 26 Oct 2017 16:21:37 +0000
Message-ID: <HE1PR0502MB3659806FDE4AF1D103584860D2450@HE1PR0502MB3659.eurprd05.prod.outlook.com>
References: <1508752838-30408-1-git-send-email-ophirmu@mellanox.com>
 <1508768520-4810-1-git-send-email-ophirmu@mellanox.com>
 <1508768520-4810-5-git-send-email-ophirmu@mellanox.com>
 <20171024135149.fyg4nzcbygo2amtz@laranjeiro-vm>
 <DB5PR05MB1254D9B4C27C02D4512D56E3D1470@DB5PR05MB1254.eurprd05.prod.outlook.com>
 <20171025075006.znxl7mezy4pfyzsj@laranjeiro-vm>
 <HE1PR0502MB365998C9ABE7E943F60382CBD2450@HE1PR0502MB3659.eurprd05.prod.outlook.com>
 <20171026121219.ke3dz7hv4a5zfpih@laranjeiro-vm>
 <HE1PR0502MB365971EF45928921B608F49DD2450@HE1PR0502MB3659.eurprd05.prod.outlook.com>
 <20171026134424.6hww2zyc3crbe322@laranjeiro-vm>
In-Reply-To: <20171026134424.6hww2zyc3crbe322@laranjeiro-vm>
Accept-Language: en-US, he-IL
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
authentication-results: spf=none (sender IP is )
 smtp.mailfrom=matan@mellanox.com; 
x-originating-ip: [193.47.165.251]
x-ms-publictraffictype: Email
received-spf: None (protection.outlook.com: mellanox.com does not designate
 permitted sender hosts)
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: Mellanox.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 830bc8f9-96f5-4f7b-4f58-08d51c8da222
X-MS-Exchange-CrossTenant-originalarrivaltime: 26 Oct 2017 16:21:37.7198 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b
X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR05MB1920
Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 26 Oct 2017 16:21:40 -0000

Hi Nelio

> -----Original Message-----
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> Sent: Thursday, October 26, 2017 4:44 PM
> To: Matan Azrad <matan@mellanox.com>
> Cc: Ophir Munk <ophirmu@mellanox.com>; Adrien Mazarguil
> <adrien.mazarguil@6wind.com>; dev@dpdk.org; Thomas Monjalon
> <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>; Mordechay
> Haimovsky <motih@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions
>
> On Thu, Oct 26, 2017 at 12:30:54PM +0000, Matan Azrad wrote:
> > Hi Nelio
> > Please see my comments below (3).
> >
> >
> > > -----Original Message-----
> > > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > > Sent: Thursday, October 26, 2017 3:12 PM
> > > To: Matan Azrad <matan@mellanox.com>
> > > Cc: Ophir Munk <ophirmu@mellanox.com>; Adrien Mazarguil
> > > <adrien.mazarguil@6wind.com>; dev@dpdk.org; Thomas Monjalon
> > > <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
> Mordechay
> > > Haimovsky <motih@mellanox.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path
> > > functions
> > >
> > > On Thu, Oct 26, 2017 at 10:31:06AM +0000, Matan Azrad wrote:
> > > > Hi Nelio
> > > >
> > > > I think the memory barrier discussion is not relevant for this
> > > > patch (if it will be relevant I will create new one).
> > > > Please see my comments inline.
> > >
> > > That was not my only comment.  There is also useless code, such as
> > > handling null segments in packets, which is not allowed in DPDK.
> >
> > Sorry, but I can't find comments in the previous mails.
>
> You should search in the series,
>
> > Moreover, this comment (the first time I see it) is not relevant to this
> > patch and asks for something else.
> > All this patch does is merge 2 functions to avoid asking twice
> > about the remaining WQ space...
>
> Again in the series itself.
>
> The point is that this series embeds 7 patches for "performance improvement",
> whereas the only improvement is avoiding a call to an outside function by
> copy/pasting it into the PMD.
> It will indeed save a few cycles, but the improvement could have been much
> greater if it were not a bare copy/paste.
>

This simple merge improves throughput by 0.2 MPPS in my setup.
If you have further improvements to suggest for this merge (other than removing
an if statement), please do.
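
To make the "double asking" concrete, here is a minimal, self-contained sketch.
It is not the mlx4 PMD code: struct sq, sq_free(), post_send(), burst_split()
and burst_merged() are hypothetical names, the ring is assumed to use
free-running head/tail counters, and the checks field exists only to count how
often the free space is computed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical send queue with free-running head/tail counters. */
struct sq {
	uint32_t head;   /* slots posted so far */
	uint32_t tail;   /* slots completed so far */
	uint32_t size;   /* ring size in slots */
	uint32_t checks; /* illustration only: free-space computations */
};

/* Compute the remaining space and count how often this is asked. */
static uint32_t sq_free(struct sq *q)
{
	assert(q->head - q->tail <= q->size);
	q->checks++;
	return q->size - (q->head - q->tail);
}

/* Separate post function: it has to re-check the space on its own. */
static int post_send(struct sq *q, uint32_t slots)
{
	if (sq_free(q) < slots)
		return -1;
	q->head += slots;
	return 0;
}

/* Split path: the burst loop checks, then post_send() checks again. */
static unsigned int burst_split(struct sq *q, const uint32_t *slots,
				unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (sq_free(q) < slots[i])
			break;
		if (post_send(q, slots[i]) != 0)
			break;
	}
	return i;
}

/* Merged path: the post logic lives in the loop, one check per packet. */
static unsigned int burst_merged(struct sq *q, const uint32_t *slots,
				 unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (sq_free(q) < slots[i])
			break;
		q->head += slots[i];
	}
	return i;
}
```

Both variants post the same packets; in this sketch the merged one simply
computes the remaining space once per packet instead of twice.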

> The real question is: what is the improvement?  If the improvement is
> significant, it is worth having this series; otherwise it is not, as it may
> also bring bugs that get fixed in the original source while remaining in
> this copy.
>

Each commit in this series improves performance significantly, and together
they brought us to our target.

By the way, I think discussion of the whole series should be on patch 0 :)

> > Removing memory/compiler barriers or dealing with null segments is not in
> > the scope here.
> >
> > >
> > > > Regarding this specific patch, I didn't see any comment from you.
> > > > Do you agree with it?
> > > >
> > > > > -----Original Message-----
> > > > > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > > > > Sent: Wednesday, October 25, 2017 10:50 AM
> > > > > To: Ophir Munk <ophirmu@mellanox.com>
> > > > > Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org;
> > > > > Thomas Monjalon <thomas@monjalon.net>; Olga Shern
> > > > > <olgas@mellanox.com>; Matan Azrad <matan@mellanox.com>
> > > > > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path
> > > > > functions
> > > > >
> > > > > On Tue, Oct 24, 2017 at 08:36:52PM +0000, Ophir Munk wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On Tuesday, October 24, 2017 4:52 PM, Nélio Laranjeiro wrote:
> > > > > > >
> > > > > > > On Mon, Oct 23, 2017 at 02:21:57PM +0000, Ophir Munk wrote:
> > > > > > > > From: Matan Azrad <matan@mellanox.com>
> > > > > > > >
> > > > > > > > Merge tx_burst and mlx4_post_send functions to avoid
> > > > > > > > checking the remaining WQ space twice.
> > > > > > > >
> > > > > > > > This should improve performance.
> > > > > > > >
> > > > > > > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > > > > > > ---
> > > > > > > >  drivers/net/mlx4/mlx4_rxtx.c | 353
> > > > > > > > +++++++++++++++++++++----------------------
> > > > > > > >  1 file changed, 170 insertions(+), 183 deletions(-)
> > > > > > >
> > > > > > > What are the real expectations you have for the remaining
> > > > > > > patches of the series?
> > > > > > >
> > > > > > > Regarding the commit log comment "This should improve
> > > > > > > performance": there are too many barriers at the
> > > > > > > packet/segment level to improve anything.
> > > > > > >
> > > > > > > The point is, mlx4_burst_tx() should write all the WQEs
> > > > > > > without any barrier, as it is processing a burst of packets
> > > > > > > (whereas Verbs functions may only process a single packet).
> > > > > >
> > > > > > > The only barrier that should be present is the one that
> > > > > > > ensures all the host memory is flushed before triggering the
> > > > > > > Tx doorbell.
> > > > > > >
> > > > > >
> > > > > > There is a known ConnectX-3 HW limitation: the first 4 bytes
> > > > > > of every TXWBB (64-byte chunk) should be written in
> > > > > > reverse order (from last TXWBB to first TXWBB).
> > > > >
> > > > > This means the first WQE filled by the burst function is the doorbell.
> > > > > In such a situation, its first four bytes can be written
> > > > > before leaving the burst function and after a write memory barrier.
> > > > >
> > > > > Until this first WQE is complete, the NIC won't start
> > > > > processing the packets.  Per-packet memory barriers become
> > > > > useless.
> > > >
> > > > I think this is not true, since mlx4 HW can prefetch advanced
> > > > TXbbs if their first 4 bytes are valid, even though the first WQE
> > > > is still not valid (please read the spec).
> > >
> > > A compiler barrier is enough on x86 to forbid the CPU to re-order
> > > the instructions, on arm you need a memory barrier, there is a macro
> > > in DPDK for that, rte_io_wmb().
> > >
> > We are also using a compiler barrier here.
> >
> > > Before triggering the doorbell you must flush the cache; this is the
> > > only place where rte_wmb() should be used.
> > >
> >
> > We also use a memory barrier only for this reason.
> >
> > > > > It gives something like:
> > > > >
> > > > >  uint32_t tx_bb_db = 0;
> > > > >  void *first_wqe = NULL;
> > > > >
> > > > >  /*
> > > > >   * Prepare all packets by writing the WQEs without the first 4 bytes of
> > > > >   * the first WQE.
> > > > >   */
> > > > >  for () {
> > > > >  	if (!first_wqe) {
> > > > > 		first_wqe = wqe;
> > > > > 		tx_bb_db = foo;
> > > > > 	}
> > > > >  }
> > > > >  /* Leaving. */
> > > > >  rte_wmb();
> > > > >  *(uint32_t *)first_wqe = tx_bb_db;
> > > > >  return n;
> > > > >
> > > >
> > > > I will check whether we can do 2 loops:
> > > > Write the last 60B of every TXbb.
> > > > Memory barrier.
> > > > Write the first 4B of every TXbb.
> > > >
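
For reference, a minimal sketch of the two-loop idea above, under stated
assumptions: a TXBB is modeled as 16 32-bit words (64 bytes),
txbb_write_two_pass() is a hypothetical helper and not code from the PMD, and
the C11 atomic_thread_fence() only stands in for the barrier (rte_wmb() or
rte_io_wmb() in DPDK). The second loop runs from the last TXBB to the first,
following the ConnectX-3 limitation described in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TXBB_WORDS 16u /* 64-byte TXBB as 32-bit words (assumption) */

/*
 * Hypothetical helper: copy n prepared TXBBs from src into the ring.
 * Pass 1 writes words 1..15 (the last 60 bytes) of every TXBB.
 * Pass 2 writes word 0 (the first 4 bytes), from last TXBB to first.
 */
static void txbb_write_two_pass(volatile uint32_t *ring, const uint32_t *src,
				unsigned int n)
{
	unsigned int i;
	unsigned int w;

	assert(ring != NULL && src != NULL);

	/* Pass 1: the last 60 bytes may be written in any order. */
	for (i = 0; i < n; i++)
		for (w = 1; w < TXBB_WORDS; w++)
			ring[i * TXBB_WORDS + w] = src[i * TXBB_WORDS + w];

	/* Stand-in for the write barrier separating the two passes. */
	atomic_thread_fence(memory_order_release);

	/* Pass 2: the first 4 bytes of each TXBB, in reverse order. */
	for (i = n; i-- > 0;)
		ring[i * TXBB_WORDS] = src[i * TXBB_WORDS];
}
```

Whether a single barrier between the two passes is enough for the hardware is
exactly the open question here; the sketch only shows the write ordering.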
> > > > > > The last 60 bytes of any TXWBB can be written in any order
> > > > > > (before writing the first 4 bytes).
> > > > > > Is your last statement (using a single barrier) in accordance
> > > > > > with this limitation? Please explain.
> > > > > >
> > > > > > > There are also too many cases handled that are useless in a
> > > > > > > burst situation; this function needs to be rewritten for its
> > > > > > > minimal use case, i.e. processing a valid burst of
> > > > > > > packets/segments and triggering the Tx doorbell at the end
> > > > > > > of the burst.
> > > > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > --
> > > > > > Nélio Laranjeiro
> > > > > 6WIND
> > >
> > > Regards,
> > >
> > > --
> > > > Nélio Laranjeiro
> > > > 6WIND
>
> --
> Nélio Laranjeiro
> 6WIND