From: Alexander Kozyrev
To: Phil Yang, Honnappa Nagarahalli, Matan Azrad, Shahaf Shuler, Slava Ovsiienko
CC: "drc@linux.vnet.ibm.com", nd, "dev@dpdk.org", nd
Date: Thu, 6 Aug 2020 02:43:25 +0000
Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt

> Phil Yang writes:
>
> >
> > > > > > > @@ -1790,9 +1792,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
> > > > > > >  void *buf_addr;
> > > > > > >
> > > > > > >  /* Increment the refcnt of the whole chunk. */
> > > > > > > -rte_atomic16_add_return(&buf->refcnt, 1);
> > > > rte_atomic16_add_return includes a full barrier along with atomic
> > > > operation.
> > > > But is full barrier required here? For ex:
> > > > __atomic_add_fetch(&buf->refcnt, 1,
> > > > __ATOMIC_RELAXED) will offer atomicity, but no barrier. Would that
> > > > be enough?
> > > >
> > > > > > > -MLX5_ASSERT((uint16_t)rte_atomic16_read(&buf->refcnt) <=
> > > > > > > -   strd_n + 1);
> > > > > > > +__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_ACQUIRE);
> > >
> > > The atomic load in MLX5_ASSERT() accesses the same memory space as
> > > the previous __atomic_add_fetch() does.
> > > They will access this memory space in the program order when we
> > > enabled MLX5_PMD_DEBUG. So the ACQUIRE barrier in
> > > __atomic_add_fetch() becomes unnecessary.
> > >
> > > By changing it to RELAXED ordering, this patch got 7.6% performance
> > > improvement on N1 (making it generate A72 alike instructions).
> > >
> > > Could you please also try it on your testbed, Alex?
> >
> > Situation got better with this modification, here are the results:
> > - no patch: 3.0 Mpps CPU cycles/packet=51.52
> > - original patch: 2.1 Mpps CPU cycles/packet=71.05
> > - modified patch: 2.9 Mpps CPU cycles/packet=52.79
> > Also, I found that the degradation is there only in case I enable bursts stats.
>
>
> Great! So this patch will not hurt the normal datapath performance.
>
>
> > Could you please turn on the following config options and see if you
> > can reproduce this as well?
> > CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=y
> > CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=y
>
> Thanks, Alex. Some updates.
>
> Slightly (about 1%) throughput degradation was detected after we enabled
> these two config options on N1 SoC.
>
> If we look insight the perf stats results, with this patch, both mlx5_rx_burst
> and mlx5_tx_burst consume fewer CPU cycles than the original code.
> However, __memcpy_generic takes more cycles. I think that might be the
> reason for CPU cycles per packet increment after applying this patch.
>
> Original code:
> 98.07%--pkt_burst_io_forward
>         |
>         |--44.53%--__memcpy_generic
>         |
>         |--35.85%--mlx5_rx_burst_mprq
>         |
>         |--15.94%--mlx5_tx_burst_none_empw
>         |          |
>         |          |--7.32%--mlx5_tx_handle_completion.isra.0
>         |          |
>         |           --0.50%--__memcpy_generic
>         |
>          --1.14%--memcpy@plt
>
> Use C11 with RELAXED ordering:
> 99.36%--pkt_burst_io_forward
>         |
>         |--47.40%--__memcpy_generic
>         |
>         |--34.62%--mlx5_rx_burst_mprq
>         |
>         |--15.55%--mlx5_tx_burst_none_empw
>         |          |
>         |           --7.08%--mlx5_tx_handle_completion.isra.0
>         |
>          --1.17%--memcpy@plt
>
> BTW, all the atomic operations in this patch are not the hotspot.

Phil, we are seeing much worse degradation on our ARM platform, unfortunately.
I don't think the discrepancy in memcpy can explain this behavior; your patch
does not touch that area of code. Let me collect some perf stats on our side.
>
> > > >
> > > > Can you replace just the above line with the following lines and test it?
> > > >
> > > > __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);
> > > > __atomic_thread_fence(__ATOMIC_ACQ_REL);
> > > >
> > > > This should make the generated code same as before this patch. Let
> > > > me know if you would prefer us to re-spin the patch instead (for testing).
> > > >
> > > > > > > +MLX5_ASSERT(__atomic_load_n(&buf->refcnt,
> > > > > > > +   __ATOMIC_RELAXED) <= strd_n + 1);
> > > > > > >  buf_addr = RTE_PTR_SUB(addr, RTE_PKTMBUF_HEADROOM);
> > > > > > >  /*
> > > > > > >   * MLX5 device doesn't use iova but it is necessary in a
> > > > > > > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> > > > > > > index 26621ff..0fc15f3 100644
> > > > > > > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > > > > > > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > > > >