From: Alexander Kozyrev
To: Phil Yang, Honnappa Nagarahalli, Alexander Kozyrev, Matan Azrad, Shahaf Shuler, Slava Ovsiienko
CC: "drc@linux.vnet.ibm.com", nd, "dev@dpdk.org"
Thread-Topic: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt
Date: Wed, 9 Sep 2020 13:29:36 +0000
References: <20200410164127.54229-7-gavin.hu@arm.com> <1592900807-13289-1-git-send-email-phil.yang@arm.com>
Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt
List-Id: DPDK patches and discussions

> > > > > > > > > > > @@ -1790,9 +1792,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
> > > > > > > > > > >  void *buf_addr;
> > > > > > > > > > >
> > > > > > > > > > >  	/* Increment the refcnt of the whole chunk. */
> > > > > > > > > > > -	rte_atomic16_add_return(&buf->refcnt, 1);
> > > > > > > >
> > > > > > > > rte_atomic16_add_return includes a full barrier along with the
> > > > > > > > atomic operation. But is a full barrier required here? For example,
> > > > > > > > __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED) will offer
> > > > > > > > atomicity, but no barrier. Would that be enough?
> > > > > > > >
> > > > > > > > > > > -	MLX5_ASSERT((uint16_t)rte_atomic16_read(&buf->refcnt) <=
> > > > > > > > > > > -		    strd_n + 1);
> > > > > > > > > > > +	__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_ACQUIRE);
> > > > > > >
> > > > > > > The atomic load in MLX5_ASSERT() accesses the same memory space
> > > > > > > as the previous __atomic_add_fetch() does.
> > > > > > > They will access this memory space in program order when
> > > > > > > MLX5_PMD_DEBUG is enabled, so the ACQUIRE barrier in
> > > > > > > __atomic_add_fetch() becomes unnecessary.
> > > > > > > By changing it to RELAXED ordering, this patch got a 7.6%
> > > > > > > performance improvement on N1 (making it generate A72-alike
> > > > > > > instructions).
> > > > > > >
> > > > > > > Could you please also try it on your testbed, Alex?
> > > > > >
> > > > > > The situation got better with this modification; here are the results:
> > > > > > - no patch: 3.0 Mpps, CPU cycles/packet=51.52
> > > > > > - original patch: 2.1 Mpps, CPU cycles/packet=71.05
> > > > > > - modified patch: 2.9 Mpps, CPU cycles/packet=52.79
> > > > > > Also, I found that the degradation is there only in case I enable
> > > > > > burst stats.
> > > > >
> > > > > Great! So this patch will not hurt the normal datapath performance.
> > > > >
> > > > > > Could you please turn on the following config options and see
> > > > > > if you can reproduce this as well?
> > > > > > CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=y
> > > > > > CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=y
> > > > >
> > > > > Thanks, Alex. Some updates.
> > > > >
> > > > > A slight (about 1%) throughput degradation was detected after we
> > > > > enabled these two config options on the N1 SoC.
> > > > >
> > > > > If we look inside the perf stats results, with this patch both
> > > > > mlx5_rx_burst and mlx5_tx_burst consume fewer CPU cycles than the
> > > > > original code. However, __memcpy_generic takes more cycles. I think
> > > > > that might be the reason for the CPU cycles per packet increment
> > > > > after applying this patch.
> > > > >
> > > > > Original code:
> > > > > 98.07%--pkt_burst_io_forward
> > > > >         |
> > > > >         |--44.53%--__memcpy_generic
> > > > >         |
> > > > >         |--35.85%--mlx5_rx_burst_mprq
> > > > >         |
> > > > >         |--15.94%--mlx5_tx_burst_none_empw
> > > > >         |          |
> > > > >         |          |--7.32%--mlx5_tx_handle_completion.isra.0
> > > > >         |          |
> > > > >         |           --0.50%--__memcpy_generic
> > > > >         |
> > > > >          --1.14%--memcpy@plt
> > > > >
> > > > > Use C11 with RELAXED ordering:
> > > > > 99.36%--pkt_burst_io_forward
> > > > >         |
> > > > >         |--47.40%--__memcpy_generic
> > > > >         |
> > > > >         |--34.62%--mlx5_rx_burst_mprq
> > > > >         |
> > > > >         |--15.55%--mlx5_tx_burst_none_empw
> > > > >         |          |
> > > > >         |           --7.08%--mlx5_tx_handle_completion.isra.0
> > > > >         |
> > > > >          --1.17%--memcpy@plt
> > > > >
> > > > > BTW, all the atomic operations in this patch are not the hotspot.
> > > >
> > > > Phil, we are seeing much worse degradation on our ARM platform,
> > > > unfortunately. I don't think that the discrepancy in memcpy can
> > > > explain this behavior. Your patch is not touching this area of code.
> > > > Let me collect some perf stats on our side.
> > >
> > > Are you testing the patch as is, or have you made the changes that
> > > were discussed in the thread?
> >
> > Yes, I made the changes you suggested. It really gets better with them.
> > Could you please respin the patch to make sure I got it right in my
> > environment?
>
> Thanks, Alex.
> Please check the new version here:
> http://patchwork.dpdk.org/patch/76335/

This patch is definitely better; I do not see a degradation anymore, thank you.
Acked-by: Alexander Kozyrev

> > > > > > > > Can you replace just the above line with the following lines
> > > > > > > > and test it?
> > > > > > > >
> > > > > > > > __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);
> > > > > > > > __atomic_thread_fence(__ATOMIC_ACQ_REL);
> > > > > > > >
> > > > > > > > This should make the generated code the same as before this patch.
> > > > > > > > Let me know if you would prefer us to re-spin the patch instead
> > > > > > > > (for testing).
> > > > > > > >
> > > > > > > > > > > +	MLX5_ASSERT(__atomic_load_n(&buf->refcnt,
> > > > > > > > > > > +		    __ATOMIC_RELAXED) <= strd_n + 1);
> > > > > > > > > > >  	buf_addr = RTE_PTR_SUB(addr, RTE_PKTMBUF_HEADROOM);
> > > > > > > > > > >  	/*
> > > > > > > > > > >  	 * MLX5 device doesn't use iova but it is necessary in a
> > > > > > > > > > > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> > > > > > > > > > > index 26621ff..0fc15f3 100644
> > > > > > > > > > > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > > > > > > > > > > +++ b/drivers/net/mlx5/mlx5_rxtx.h