From: Dengdui Huang
Subject: [PATCH] examples/l3fwd: optimize packet prefetch
Date: Wed, 25 Dec 2024 15:53:02 +0800
Message-ID: <20241225075302.353013-1-huangdengdui@huawei.com>
X-Mailer: git-send-email 2.33.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
List-Id: DPDK patches and discussions

The prefetch window depends on the hardware platform, so the current
prefetch policy may not be applicable to all platforms. In most cases,
the number of packets received by an Rx burst is small (64 is used in
most performance reports).
In l3fwd, this number cannot exceed 512. Therefore, prefetching all
packets before processing can achieve better performance.

Signed-off-by: Dengdui Huang
---
 examples/l3fwd/l3fwd_lpm_neon.h | 42 ++++-----------------------------
 1 file changed, 5 insertions(+), 37 deletions(-)

diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
index 3c1f827424..0b51782b8c 100644
--- a/examples/l3fwd/l3fwd_lpm_neon.h
+++ b/examples/l3fwd/l3fwd_lpm_neon.h
@@ -91,53 +91,21 @@ l3fwd_lpm_process_packets(int nb_rx, struct rte_mbuf **pkts_burst,
 	const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
 	const int32_t m = nb_rx % FWDSTEP;
 
-	if (k) {
-		for (i = 0; i < FWDSTEP; i++) {
-			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
-							void *));
-		}
-		for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
-			for (i = 0; i < FWDSTEP; i++) {
-				rte_prefetch0(rte_pktmbuf_mtod(
-						pkts_burst[j + i + FWDSTEP],
-						void *));
-			}
+	/* The number of packets is small. Prefetch all packets. */
+	for (i = 0; i < nb_rx; i++)
+		rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));
+	if (k) {
+		for (j = 0; j != k; j += FWDSTEP) {
 
 			processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
 			processx4_step2(qconf, dip, ipv4_flag, portid,
 					&pkts_burst[j], &dst_port[j]);
 			if (do_step3)
 				processx4_step3(&pkts_burst[j], &dst_port[j]);
 		}
-
-		processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
-		processx4_step2(qconf, dip, ipv4_flag, portid, &pkts_burst[j],
-				&dst_port[j]);
-		if (do_step3)
-			processx4_step3(&pkts_burst[j], &dst_port[j]);
-
-		j += FWDSTEP;
 	}
 
 	if (m) {
-		/* Prefetch last up to 3 packets one by one */
-		switch (m) {
-		case 3:
-			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
-							void *));
-			j++;
-			/* fallthrough */
-		case 2:
-			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
-							void *));
-			j++;
-			/* fallthrough */
-		case 1:
-			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
-							void *));
-			j++;
-		}
-		j -= m;
 		/* Classify last up to 3 packets one by one */
 		switch (m) {
 		case 3:
-- 
2.33.0
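
The access pattern the patch moves to (prefetch the whole burst first, then
process it in groups of FWDSTEP plus a tail) can be sketched in plain C
without a DPDK tree. This is an illustration under stated assumptions, not
the l3fwd code: struct pkt, process_burst() and STEP are hypothetical
stand-ins for struct rte_mbuf, l3fwd_lpm_process_packets() and FWDSTEP, the
"processing" is a toy byte sum, and __builtin_prefetch (a GCC/Clang builtin)
stands in for rte_prefetch0().

```c
#include <assert.h>
#include <stdint.h>

#define STEP 4 /* hypothetical stand-in for FWDSTEP */

/* Hypothetical stand-in for struct rte_mbuf: just a payload buffer. */
struct pkt {
	uint8_t data[64];
};

/*
 * Toy "classification": returns the sum of the first payload byte of
 * every packet in the burst. Only the loop structure mirrors the patch.
 */
uint32_t
process_burst(struct pkt **burst, int nb_rx)
{
	uint32_t sum = 0;
	int i, j;

	assert(nb_rx >= 0);

	/* Number of packets that fit in full groups of STEP. */
	const int k = nb_rx - (nb_rx % STEP);

	/* Prefetch every packet of the burst up front (a hint only). */
	for (i = 0; i < nb_rx; i++)
		__builtin_prefetch(burst[i]->data, 0, 3);

	/* Process full groups of STEP packets. */
	for (j = 0; j != k; j += STEP)
		for (i = 0; i < STEP; i++)
			sum += burst[j + i]->data[0];

	/* Process the remaining (at most STEP - 1) packets one by one. */
	for (; j < nb_rx; j++)
		sum += burst[j]->data[0];

	return sum;
}
```

With a burst of seven packets whose first bytes are 1 through 7, the function
returns 28: one full group of four followed by a three-packet tail. Whether
the up-front prefetch is faster than the windowed one depends on the
platform, so any gain has to be measured on the target hardware.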