From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Kozyrev
To: Honnappa Nagarahalli, Alexander Kozyrev, Phil Yang, Matan Azrad, Shahaf Shuler, Slava Ovsiienko
CC: "drc@linux.vnet.ibm.com", nd, "dev@dpdk.org", nd
Date: Wed, 2 Sep 2020 21:52:44 +0000
References: <20200410164127.54229-7-gavin.hu@arm.com> <1592900807-13289-1-git-send-email-phil.yang@arm.com>
Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt
List-Id: DPDK patches and discussions

> > > > > > > > > @@ -1790,9 +1792,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
> > > > > > > > >  void *buf_addr;
> > > > > > > > >
> > > > > > > > >  /* Increment the refcnt of the whole chunk. */
> > > > > > > > > -rte_atomic16_add_return(&buf->refcnt, 1);
> > > > > > rte_atomic16_add_return includes a full barrier along with atomic
> > > > > > operation. But is full barrier required here? For ex:
> > > > > > __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED) will offer
> > > > > > atomicity, but no barrier. Would that be enough?
> > > > > >
> > > > > > > > > -MLX5_ASSERT((uint16_t)rte_atomic16_read(&buf->refcnt) <=
> > > > > > > > > -    strd_n + 1);
> > > > > > > > > +__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_ACQUIRE);
> > > > >
> > > > > The atomic load in MLX5_ASSERT() accesses the same memory space as
> > > > > the previous __atomic_add_fetch() does.
> > > > > They will access this memory space in the program order when we
> > > > > enabled MLX5_PMD_DEBUG. So the ACQUIRE barrier in
> > > > > __atomic_add_fetch() becomes unnecessary.
> > > > >
> > > > > By changing it to RELAXED ordering, this patch got 7.6% performance
> > > > > improvement on N1 (making it generate A72 alike instructions).
> > > > >
> > > > > Could you please also try it on your testbed, Alex?
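
For reference, the relaxed-ordering variant discussed in the quoted text can be sketched in isolation. This is a minimal sketch rather than the mlx5 code: `struct mprq_buf_sketch` and `mprq_buf_ref()` are hypothetical names, plain `assert()` stands in for MLX5_ASSERT(), and the GCC/Clang `__atomic` builtins are assumed to be available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's multi-packet RQ buffer. */
struct mprq_buf_sketch {
	uint16_t refcnt; /* reference count shared between threads */
};

/*
 * Take one more reference on the whole chunk.
 * RELAXED keeps the increment atomic but requests no ordering against
 * surrounding loads and stores.
 */
static inline uint16_t
mprq_buf_ref(struct mprq_buf_sketch *buf, uint16_t strd_n)
{
	uint16_t now = __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);

	/*
	 * The load below targets the same variable from the same thread, so
	 * it cannot observe a value older than the increment above, even
	 * with relaxed ordering on both accesses.
	 */
	assert(__atomic_load_n(&buf->refcnt, __ATOMIC_RELAXED) <= strd_n + 1);
	return now;
}
```

The debug check reads only the counter that was just incremented, which is the argument made above for dropping the ACQUIRE ordering on the increment.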
> > > >
> > > > Situation got better with this modification, here are the results:
> > > > - no patch:       3.0 Mpps CPU cycles/packet=51.52
> > > > - original patch: 2.1 Mpps CPU cycles/packet=71.05
> > > > - modified patch: 2.9 Mpps CPU cycles/packet=52.79
> > > > Also, I found that the degradation is there only in case I enable
> > > > bursts stats.
> > >
> > > Great! So this patch will not hurt the normal datapath performance.
> > >
> > > > Could you please turn on the following config options and see if
> > > > you can reproduce this as well?
> > > > CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=y
> > > > CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=y
> > >
> > > Thanks, Alex. Some updates.
> > >
> > > Slightly (about 1%) throughput degradation was detected after we
> > > enabled these two config options on N1 SoC.
> > >
> > > If we look insight the perf stats results, with this patch, both
> > > mlx5_rx_burst and mlx5_tx_burst consume fewer CPU cycles than the
> > > original code.
> > > However, __memcpy_generic takes more cycles. I think that might be
> > > the reason for CPU cycles per packet increment after applying this patch.
> > >
> > > Original code:
> > > 98.07%--pkt_burst_io_forward
> > >         |
> > >         |--44.53%--__memcpy_generic
> > >         |
> > >         |--35.85%--mlx5_rx_burst_mprq
> > >         |
> > >         |--15.94%--mlx5_tx_burst_none_empw
> > >         |          |
> > >         |          |--7.32%--mlx5_tx_handle_completion.isra.0
> > >         |          |
> > >         |           --0.50%--__memcpy_generic
> > >         |
> > >          --1.14%--memcpy@plt
> > >
> > > Use C11 with RELAXED ordering:
> > > 99.36%--pkt_burst_io_forward
> > >         |
> > >         |--47.40%--__memcpy_generic
> > >         |
> > >         |--34.62%--mlx5_rx_burst_mprq
> > >         |
> > >         |--15.55%--mlx5_tx_burst_none_empw
> > >         |          |
> > >         |           --7.08%--mlx5_tx_handle_completion.isra.0
> > >         |
> > >          --1.17%--memcpy@plt
> > >
> > > BTW, all the atomic operations in this patch are not the hotspot.
> >
> > Phil, we are seeing much worse degradation on our ARM platform
> > unfortunately.
> > I don't think that discrepancy in memcpy can explain this behavior.
> > Your patch is not touching this area of code. Let me collect some perf
> > stat on our side.
> Are you testing the patch as is or have you made the changes that were
> discussed in the thread?
>

Yes, I made the changes you suggested. It really gets better with them.
Could you please respin the patch to make sure I got it right in my environment?

> > > > > >
> > > > > > Can you replace just the above line with the following lines and
> > > > > > test it?
> > > > > >
> > > > > > __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);
> > > > > > __atomic_thread_fence(__ATOMIC_ACQ_REL);
> > > > > >
> > > > > > This should make the generated code same as before this patch.
> > > > > > Let me know if you would prefer us to re-spin the patch instead
> > > > > > (for testing).
> > > > > >
> > > > > > > > > +MLX5_ASSERT(__atomic_load_n(&buf->refcnt,
> > > > > > > > > +    __ATOMIC_RELAXED) <= strd_n + 1);
> > > > > > > > >  buf_addr = RTE_PTR_SUB(addr, RTE_PKTMBUF_HEADROOM);
> > > > > > > > >  /*
> > > > > > > > >   * MLX5 device doesn't use iova but it is necessary in a
> > > > > > > > > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> > > > > > > > > index 26621ff..0fc15f3 100644
> > > > > > > > > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > > > > > > > > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > > > > > > > >
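
For reference, the two-line experiment quoted above (a relaxed increment followed by an explicit fence) can be wrapped in a small helper for A/B testing. This is a sketch under assumptions: `refcnt_inc_fenced()` is a hypothetical name that does not appear in the patch, the wrap-around `assert()` is an added illustration, and the GCC/Clang `__atomic` builtins are assumed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Atomically increment a 16-bit reference count with RELAXED ordering,
 * then issue a separate acquire-release fence. The fence restores
 * ordering that the relaxed increment alone does not provide.
 */
static inline uint16_t
refcnt_inc_fenced(uint16_t *refcnt)
{
	uint16_t now = __atomic_add_fetch(refcnt, 1, __ATOMIC_RELAXED);

	__atomic_thread_fence(__ATOMIC_ACQ_REL);

	/* A result of 0 would mean the 16-bit counter wrapped around. */
	assert(now != 0);
	return now;
}
```

Timing this helper against the plain relaxed increment is one way to check whether the barrier, rather than the atomic add itself, accounts for a measured difference.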