From: huangdengdui
Date: Thu, 9 Jan 2025 19:31:00 +0800
Subject: Re: [PATCH] examples/l3fwd: optimize packet prefetch
To: Konstantin Ananyev, "dev@dpdk.org"
CC: "wathsala.vithanage@arm.com", "stephen@networkplumber.org", liuyonglong, Fengchengwen, haijie, "lihuisong (C)"
Message-ID: <68164537-b1a3-4135-a8c0-5aad66b7ce73@huawei.com>
References: <20241225075302.353013-1-huangdengdui@huawei.com>

On 2025/1/8 21:42, Konstantin Ananyev wrote:
>> The prefetch window depends on the hardware platform. The current prefetch
>> policy may not be applicable to all platforms. In most cases, the number of
>> packets received by an Rx burst is small (64 is used in most performance reports).
>> In l3fwd, this number cannot exceed 512. Therefore, prefetching all
>> packets before processing can achieve better performance.
>
> As you mentioned, 'prefetch' behavior differs a lot from one HW platform to another.
> So it could easily be that the changes you are suggesting will cause a performance
> boost on one platform and a degradation on another.
> In fact, right now l3fwd 'prefetch' usage is a bit of a mess:
> - l3fwd_lpm_neon.h uses FWDSTEP as a prefetch window.
> - l3fwd_fib.c uses FIB_PREFETCH_OFFSET for that purpose.
> - the rest of the code uses either PREFETCH_OFFSET or doesn't use 'prefetch' at all.
>
> Probably what we need here is some unified approach:
> a prefetch_window_size, configurable at run-time, that all code paths will obey.

Agreed, I'll add a parameter to configure the prefetch window.
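
To illustrate the idea of a single run-time prefetch window, here is a rough sketch only, not the follow-up patch. Everything in it is an assumption made for illustration: the names clamp_prefetch_window(), process_burst(), pkt_handler_t and count_handler() are hypothetical, and prefetch0() is a stand-in for rte_prefetch0() built on the GCC/Clang __builtin_prefetch so that the sketch compiles without DPDK. A window of 0 disables prefetch, and a window at least as large as the burst prefetches every packet before processing, which is the behavior of the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for rte_prefetch0() so the sketch builds without DPDK
 * (GCC/Clang builtin: read access, high temporal locality). */
static inline void
prefetch0(const void *p)
{
	__builtin_prefetch(p, 0, 3);
}

/* Hypothetical helper: clamp a user-supplied window to the burst limit. */
uint16_t
clamp_prefetch_window(uint32_t requested, uint16_t max_burst)
{
	return requested > max_burst ? max_burst : (uint16_t)requested;
}

/* Hypothetical per-packet callback type. */
typedef void (*pkt_handler_t)(void *pkt, void *ctx);

/* Example handler: counts the packets it is given. */
void
count_handler(void *pkt, void *ctx)
{
	(void)pkt;
	(*(uint32_t *)ctx)++;
}

/*
 * Process a burst while keeping up to `window` packets prefetched ahead.
 * window == 0 issues no prefetch; window >= nb_rx prefetches the whole
 * burst up front. Returns the number of prefetches issued.
 */
uint32_t
process_burst(void **pkts, uint16_t nb_rx, uint16_t window,
	      pkt_handler_t handle, void *ctx)
{
	const uint16_t ahead = window < nb_rx ? window : nb_rx;
	uint32_t issued = 0;
	uint16_t i;

	assert(pkts != NULL && handle != NULL);

	/* Warm-up: prefetch the first `ahead` packets. */
	for (i = 0; i < ahead; i++, issued++)
		prefetch0(pkts[i]);

	for (i = 0; i < nb_rx; i++) {
		/* Stay `window` packets ahead of the one being handled. */
		if (window != 0 && (uint32_t)i + window < nb_rx) {
			prefetch0(pkts[i + window]);
			issued++;
		}
		handle(pkts[i], ctx);
	}
	return issued;
}
```

With this shape each packet is prefetched at most once, whatever the window, so every code path could share one loop and differ only in the configured value.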
>
>> Signed-off-by: Dengdui Huang
>> ---
>>  examples/l3fwd/l3fwd_lpm_neon.h | 42 ++++-----------------------------
>>  1 file changed, 5 insertions(+), 37 deletions(-)
>>
>> diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
>> index 3c1f827424..0b51782b8c 100644
>> --- a/examples/l3fwd/l3fwd_lpm_neon.h
>> +++ b/examples/l3fwd/l3fwd_lpm_neon.h
>> @@ -91,53 +91,21 @@ l3fwd_lpm_process_packets(int nb_rx, struct rte_mbuf **pkts_burst,
>>  	const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
>>  	const int32_t m = nb_rx % FWDSTEP;
>>
>> -	if (k) {
>> -		for (i = 0; i < FWDSTEP; i++) {
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
>> -							void *));
>> -		}
>> -		for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
>> -			for (i = 0; i < FWDSTEP; i++) {
>> -				rte_prefetch0(rte_pktmbuf_mtod(
>> -						pkts_burst[j + i + FWDSTEP],
>> -						void *));
>> -			}
>> +	/* The number of packets is small. Prefetch all packets. */
>> +	for (i = 0; i < nb_rx; i++)
>> +		rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));
>>
>> +	if (k) {
>> +		for (j = 0; j != k; j += FWDSTEP) {
>>  			processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>>  			processx4_step2(qconf, dip, ipv4_flag, portid,
>>  					&pkts_burst[j], &dst_port[j]);
>>  			if (do_step3)
>>  				processx4_step3(&pkts_burst[j], &dst_port[j]);
>>  		}
>> -
>> -		processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>> -		processx4_step2(qconf, dip, ipv4_flag, portid, &pkts_burst[j],
>> -				&dst_port[j]);
>> -		if (do_step3)
>> -			processx4_step3(&pkts_burst[j], &dst_port[j]);
>> -
>> -		j += FWDSTEP;
>>  	}
>>
>>  	if (m) {
>> -		/* Prefetch last up to 3 packets one by one */
>> -		switch (m) {
>> -		case 3:
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -							void *));
>> -			j++;
>> -			/* fallthrough */
>> -		case 2:
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -							void *));
>> -			j++;
>> -			/* fallthrough */
>> -		case 1:
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -							void *));
>> -			j++;
>> -		}
>> -		j -= m;
>>  		/* Classify last up to 3 packets one by one */
>>  		switch (m) {
>>  		case 3:
>> --
>> 2.33.0
>
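
For readers following the quoted hunk, the split of a burst into k and m can be sketched as below. This is an illustration under assumptions, not code from the patch: split_burst() and struct burst_split are hypothetical names, FWDSTEP is taken to be 4 (consistent with the processx4_* helpers and the "last up to 3 packets" comment), and the subtraction form is used in place of RTE_ALIGN_FLOOR, which gives the same result for a non-negative nb_rx.

```c
#include <stdint.h>

/* Group size assumed for the NEON path in the quoted patch. */
#define FWDSTEP 4

/* Hypothetical container for the two values computed in the hunk. */
struct burst_split {
	int32_t k; /* packets covered by full FWDSTEP groups */
	int32_t m; /* leftover packets handled one by one (0..3) */
};

/* Hypothetical helper mirroring the k and m computation in the hunk. */
struct burst_split
split_burst(int32_t nb_rx)
{
	struct burst_split s;

	s.m = nb_rx % FWDSTEP;
	/* Equivalent to RTE_ALIGN_FLOOR(nb_rx, FWDSTEP) for nb_rx >= 0. */
	s.k = nb_rx - s.m;
	return s;
}
```

For a burst of 7 packets this yields k = 4 and m = 3: one pass through the processx4 steps, then three packets classified individually. After the patch, all 7 are prefetched before either stage runs.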