From: Phil Yang
To: Alexander Kozyrev, Honnappa Nagarahalli, Alexander Kozyrev, Matan Azrad, Shahaf Shuler, Slava Ovsiienko
CC: "drc@linux.vnet.ibm.com", nd, "dev@dpdk.org", nd, nd
Date: Thu, 3 Sep 2020 02:55:18 +0000
References: <20200410164127.54229-7-gavin.hu@arm.com> <1592900807-13289-1-git-send-email-phil.yang@arm.com>
Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt
List-Id: DPDK patches and discussions

> > > > > >
> > > > > > > > > > @@ -1790,9 +1792,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
> > > > > > > > > >  void *buf_addr;
> > > > > > > > > >
> > > > > > > > > >  /* Increment the refcnt of the whole chunk. */
> > > > > > > > > > -rte_atomic16_add_return(&buf->refcnt, 1);
> > > > > > > rte_atomic16_add_return includes a full barrier along with the
> > > > > > > atomic operation.
> > > > > > > But is a full barrier required here? For ex:
> > > > > > > __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED)
> > > > > > > will offer atomicity, but no barrier. Would that be enough?
> > > > > > >
> > > > > > > > > > -MLX5_ASSERT((uint16_t)rte_atomic16_read(&buf->refcnt) <=
> > > > > > > > > > - strd_n + 1);
> > > > > > > > > > +__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_ACQUIRE);
> > > > > >
> > > > > > The atomic load in MLX5_ASSERT() accesses the same memory space
> > > > > > as the previous __atomic_add_fetch() does.
> > > > > > They will access this memory space in program order when
> > > > > > MLX5_PMD_DEBUG is enabled. So the ACQUIRE barrier in
> > > > > > __atomic_add_fetch() becomes unnecessary.
> > > > > >
> > > > > > By changing it to RELAXED ordering, this patch got a 7.6%
> > > > > > performance improvement on N1 (making it generate A72-like
> > > > > > instructions).
> > > > > >
> > > > > > Could you please also try it on your testbed, Alex?
> > > > >
> > > > > Situation got better with this modification, here are the results:
> > > > > - no patch: 3.0 Mpps CPU cycles/packet=51.52
> > > > > - original patch: 2.1 Mpps CPU cycles/packet=71.05
> > > > > - modified patch: 2.9 Mpps CPU cycles/packet=52.79
> > > > > Also, I found that the degradation is there only in case I enable
> > > > > bursts stats.
> > > >
> > > > Great! So this patch will not hurt the normal datapath performance.
> > > >
> > > > > Could you please turn on the following config options and see if
> > > > > you can reproduce this as well?
> > > > > CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=y
> > > > > CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=y
> > > >
> > > > Thanks, Alex. Some updates.
> > > >
> > > > A slight (about 1%) throughput degradation was detected after we
> > > > enabled these two config options on the N1 SoC.
> > > >
> > > > If we look into the perf stats results, with this patch, both
> > > > mlx5_rx_burst and mlx5_tx_burst consume fewer CPU cycles than the
> > > > original code.
> > > > However, __memcpy_generic takes more cycles. I think that might be
> > > > the reason for the increase in CPU cycles per packet after applying
> > > > this patch.
> > > >
> > > > Original code:
> > > > 98.07%--pkt_burst_io_forward
> > > >         |
> > > >         |--44.53%--__memcpy_generic
> > > >         |
> > > >         |--35.85%--mlx5_rx_burst_mprq
> > > >         |
> > > >         |--15.94%--mlx5_tx_burst_none_empw
> > > >         |          |
> > > >         |          |--7.32%--mlx5_tx_handle_completion.isra.0
> > > >         |          |
> > > >         |           --0.50%--__memcpy_generic
> > > >         |
> > > >          --1.14%--memcpy@plt
> > > >
> > > > Use C11 with RELAXED ordering:
> > > > 99.36%--pkt_burst_io_forward
> > > >         |
> > > >         |--47.40%--__memcpy_generic
> > > >         |
> > > >         |--34.62%--mlx5_rx_burst_mprq
> > > >         |
> > > >         |--15.55%--mlx5_tx_burst_none_empw
> > > >         |          |
> > > >         |           --7.08%--mlx5_tx_handle_completion.isra.0
> > > >         |
> > > >          --1.17%--memcpy@plt
> > > >
> > > > BTW, none of the atomic operations in this patch is a hotspot.
> > >
> > > Phil, we are seeing much worse degradation on our ARM platform
> > > unfortunately.
> > > I don't think that discrepancy in memcpy can explain this behavior.
> > > Your patch is not touching this area of code. Let me collect some perf
> > > stat on our side.
> > Are you testing the patch as is or have you made the changes that were
> > discussed in the thread?
> >
>
> Yes, I made the changes you suggested. It really gets better with them.
> Could you please respin the patch to make sure I got it right in my
> environment?

Thanks, Alex.
Please check the new version here:
http://patchwork.dpdk.org/patch/76335/

>
> > > > > > >
> > > > > > > Can you replace just the above line with the following lines
> > > > > > > and test it?
> > > > > > >
> > > > > > > __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);
> > > > > > > __atomic_thread_fence(__ATOMIC_ACQ_REL);
> > > > > > >
> > > > > > > This should make the generated code same as before this patch.
> > > > > > > Let me know if you would prefer us to re-spin the patch
> > > > > > > instead (for testing).
> > > > > > >
> > > > > > > > > > +MLX5_ASSERT(__atomic_load_n(&buf->refcnt,
> > > > > > > > > > + __ATOMIC_RELAXED) <= strd_n + 1);
> > > > > > > > > >  buf_addr = RTE_PTR_SUB(addr, RTE_PKTMBUF_HEADROOM);
> > > > > > > > > >  /*
> > > > > > > > > >  * MLX5 device doesn't use iova but it is necessary in a
> > > > > > > > > > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> > > > > > > > > > index 26621ff..0fc15f3 100644
> > > > > > > > > > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > > > > > > > > > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > > > > > > > > >
>
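
For readers following the thread, here is a minimal sketch of the two refcnt increment variants discussed above: the plain RELAXED add, and the RELAXED add followed by an explicit ACQ_REL fence that was proposed for comparison testing. It is not the mlx5 driver code. The struct mprq_buf_sketch type and the buf_ref_relaxed(), buf_ref_fenced() and buf_ref_in_range() helpers are hypothetical names made up for illustration; only the GCC/Clang __atomic builtins are real. Which variant performs better depends on the platform, as the numbers in this thread show.

```c
#include <stdint.h>

/* Hypothetical stand-in for the multi-packet RQ buffer header. */
struct mprq_buf_sketch {
	uint16_t refcnt; /* number of users holding the chunk */
};

/* Take one reference: atomic read-modify-write, no ordering constraint. */
static inline uint16_t
buf_ref_relaxed(struct mprq_buf_sketch *buf)
{
	return __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);
}

/* Same increment, followed by an explicit fence (the variant suggested
 * in the thread for comparison testing). */
static inline uint16_t
buf_ref_fenced(struct mprq_buf_sketch *buf)
{
	uint16_t v = __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);

	__atomic_thread_fence(__ATOMIC_ACQ_REL);
	return v;
}

/* Debug-style range check using a relaxed load, analogous to the
 * condition inside the assertion in the quoted diff. */
static inline int
buf_ref_in_range(struct mprq_buf_sketch *buf, uint16_t strd_n)
{
	return __atomic_load_n(&buf->refcnt, __ATOMIC_RELAXED) <= strd_n + 1;
}
```

Both helpers return the post-increment value, so a caller can compare the two variants by swapping one call for the other and re-running the same benchmark.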