From: Konstantin Ananyev
To: huangdengdui, "dev@dpdk.org"
CC: "wathsala.vithanage@arm.com", "stephen@networkplumber.org", liuyonglong, Fengchengwen, haijie, "lihuisong (C)"
Subject: RE: [PATCH] examples/l3fwd: optimize packet prefetch
Date: Wed, 8 Jan 2025 13:42:20 +0000
In-Reply-To: <20241225075302.353013-1-huangdengdui@huawei.com>
>
> The prefetch window depends on the hardware platform. The current prefetch
> policy may not be applicable to all platforms. In most cases, the number of
> packets received by Rx burst is small (64 is used in most performance reports).
> In L3fwd, the maximum value cannot exceed 512. Therefore, prefetching all
> packets before processing can achieve better performance.

As you mentioned, 'prefetch' behavior differs a lot from one HW platform to another.
So it could easily be that the changes you are suggesting will cause a performance
boost on one platform and a degradation on another.
In fact, right now l3fwd 'prefetch' usage is a bit of a mess:
- l3fwd_lpm_neon.h uses FWDSTEP as a prefetch window.
- l3fwd_fib.c uses FIB_PREFETCH_OFFSET for that purpose.
- the rest of the code either uses PREFETCH_OFFSET or doesn't use 'prefetch' at all.

Probably what we need here is some unified approach: a prefetch_window_size,
configurable at run time, that all code paths will obey.

> Signed-off-by: Dengdui Huang
> ---
>  examples/l3fwd/l3fwd_lpm_neon.h | 42 ++++-----------------------------
>  1 file changed, 5 insertions(+), 37 deletions(-)
>
> diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
> index 3c1f827424..0b51782b8c 100644
> --- a/examples/l3fwd/l3fwd_lpm_neon.h
> +++ b/examples/l3fwd/l3fwd_lpm_neon.h
> @@ -91,53 +91,21 @@ l3fwd_lpm_process_packets(int nb_rx, struct rte_mbuf **pkts_burst,
>  	const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
>  	const int32_t m = nb_rx % FWDSTEP;
>
> -	if (k) {
> -		for (i = 0; i < FWDSTEP; i++) {
> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
> -						void *));
> -		}
> -		for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
> -			for (i = 0; i < FWDSTEP; i++) {
> -				rte_prefetch0(rte_pktmbuf_mtod(
> -						pkts_burst[j + i + FWDSTEP],
> -						void *));
> -			}
> +	/* The number of packets is small. Prefetch all packets. */
> +	for (i = 0; i < nb_rx; i++)
> +		rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));
>
> +	if (k) {
> +		for (j = 0; j != k; j += FWDSTEP) {
>  			processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>  			processx4_step2(qconf, dip, ipv4_flag, portid,
>  					&pkts_burst[j], &dst_port[j]);
>  			if (do_step3)
>  				processx4_step3(&pkts_burst[j], &dst_port[j]);
>  		}
> -
> -		processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
> -		processx4_step2(qconf, dip, ipv4_flag, portid, &pkts_burst[j],
> -				&dst_port[j]);
> -		if (do_step3)
> -			processx4_step3(&pkts_burst[j], &dst_port[j]);
> -
> -		j += FWDSTEP;
>  	}
>
>  	if (m) {
> -		/* Prefetch last up to 3 packets one by one */
> -		switch (m) {
> -		case 3:
> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
> -							void *));
> -			j++;
> -			/* fallthrough */
> -		case 2:
> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
> -							void *));
> -			j++;
> -			/* fallthrough */
> -		case 1:
> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
> -							void *));
> -			j++;
> -		}
> -		j -= m;
>  		/* Classify last up to 3 packets one by one */
>  		switch (m) {
>  		case 3:
> --
> 2.33.0
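To make the run-time prefetch window idea a bit more concrete, here is a minimal, self-contained sketch. It is not DPDK code and not a proposal for the exact l3fwd API. The names `prefetch_window`, `process_one()` and `process_burst()` are hypothetical. Plain byte buffers stand in for `struct rte_mbuf` data, and the GCC/Clang builtin `__builtin_prefetch(p, 0, 3)` stands in for `rte_prefetch0()`. With a single window value, a window of 0 disables prefetch. A window of 4 (FWDSTEP) roughly approximates the current l3fwd_lpm_neon.h behaviour, although the sketch advances per packet rather than per 4-packet step. A window at least as large as the burst reproduces the "prefetch everything first" policy from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical run-time knob (not an existing l3fwd option): how many
 * packets ahead of the one being processed should be prefetched.
 * 0 disables prefetch; a value >= the burst size prefetches the whole
 * burst before any processing starts.
 */
static unsigned int prefetch_window = 4;

/* Stand-in for rte_prefetch0(): read access, highest temporal locality. */
static inline void prefetch0(const void *p)
{
	__builtin_prefetch(p, 0, 3);
}

/* Hypothetical per-packet work; here it just accumulates the first byte. */
static inline void process_one(const uint8_t *pkt, uint64_t *acc)
{
	*acc += pkt[0];
}

/* Process one burst, obeying prefetch_window on every code path. */
static uint64_t process_burst(uint8_t **pkts, unsigned int nb_rx)
{
	unsigned int w = prefetch_window < nb_rx ? prefetch_window : nb_rx;
	unsigned int i;
	uint64_t acc = 0;

	assert(pkts != NULL || nb_rx == 0);

	/* Warm-up: prefetch the first w packets of the burst. */
	for (i = 0; i < w; i++)
		prefetch0(pkts[i]);

	for (i = 0; i < nb_rx; i++) {
		/* Keep the prefetch front w packets ahead of processing. */
		if (w != 0 && i + w < nb_rx)
			prefetch0(pkts[i + w]);
		process_one(pkts[i], &acc);
	}
	return acc;
}
```

Whatever the window size, the processing result is identical. Only the timing of memory accesses changes, so the right default (and whether "prefetch all" wins) would still have to be measured per platform.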