DPDK patches and discussions
From: huangdengdui <huangdengdui@huawei.com>
To: Konstantin Ananyev <konstantin.ananyev@huawei.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "wathsala.vithanage@arm.com" <wathsala.vithanage@arm.com>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	liuyonglong <liuyonglong@huawei.com>,
	Fengchengwen <fengchengwen@huawei.com>,
	haijie <haijie1@huawei.com>,
	"lihuisong (C)" <lihuisong@huawei.com>
Subject: Re: [PATCH] examples/l3fwd: optimize packet prefetch
Date: Thu, 9 Jan 2025 19:31:00 +0800	[thread overview]
Message-ID: <68164537-b1a3-4135-a8c0-5aad66b7ce73@huawei.com> (raw)
In-Reply-To: <a966e66c538946d9b22ed337c77d9b25@huawei.com>


On 2025/1/8 21:42, Konstantin Ananyev wrote:
> 
> 
>>
>> The prefetch window depends on the hardware platform, so the current prefetch
>> policy may not suit all platforms. In most cases the number of packets
>> returned by an Rx burst is small (64 is used in most performance reports),
>> and in l3fwd it cannot exceed 512. Therefore, prefetching all packets
>> before processing them can achieve better performance.
> 
> As you mentioned, 'prefetch' behavior differs a lot from one HW platform to another.
> So it could easily be that the changes you are suggesting will cause a performance
> boost on one platform and a degradation on another.
> In fact, right now l3fwd 'prefetch' usage is a bit of a mess:
> - l3fwd_lpm_neon.h uses FWDSTEP as a prefetch window.
> - l3fwd_fib.c uses FIB_PREFETCH_OFFSET for that purpose
> - the rest of the code uses either PREFETCH_OFFSET or doesn't use 'prefetch' at all
>  
> Probably what we need here is a unified approach:
> a prefetch_window_size configurable at run-time that all code paths will obey.

Agreed, I'll add a parameter to configure the prefetch window.
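
Roughly something like the sketch below (just an illustration, not the final
implementation; the option name, the l3fwd_prefetch_ahead() helper and the
default value are placeholders):

    #include <rte_common.h>
    #include <rte_mbuf.h>
    #include <rte_prefetch.h>

    /* Set from a new command-line option (name to be decided), replacing
     * the compile-time FWDSTEP/FIB_PREFETCH_OFFSET/PREFETCH_OFFSET constants.
     */
    static uint16_t prefetch_window = 4;

    /* Hypothetical helper: prefetch the headers of the first 'window'
     * packets before the processing loop starts.
     */
    static inline void
    l3fwd_prefetch_ahead(struct rte_mbuf **pkts, int nb_rx)
    {
            int i;
            int n = RTE_MIN((int)prefetch_window, nb_rx);

            for (i = 0; i < n; i++)
                    rte_prefetch0(rte_pktmbuf_mtod(pkts[i], void *));
    }

    /* Then, inside the per-packet loop, keep the window full:
     *
     *      for (j = 0; j < nb_rx; j++) {
     *              if (j + prefetch_window < nb_rx)
     *                      rte_prefetch0(rte_pktmbuf_mtod(
     *                              pkts[j + prefetch_window], void *));
     *              ... process pkts[j] ...
     *      }
     */

A value of 0 could mean "prefetch the whole burst", which would keep the
behaviour proposed in this patch available on platforms where it helps.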

> 
>> Signed-off-by: Dengdui Huang <huangdengdui@huawei.com>
>> ---
>>  examples/l3fwd/l3fwd_lpm_neon.h | 42 ++++-----------------------------
>>  1 file changed, 5 insertions(+), 37 deletions(-)
>>
>> diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
>> index 3c1f827424..0b51782b8c 100644
>> --- a/examples/l3fwd/l3fwd_lpm_neon.h
>> +++ b/examples/l3fwd/l3fwd_lpm_neon.h
>> @@ -91,53 +91,21 @@ l3fwd_lpm_process_packets(int nb_rx, struct rte_mbuf **pkts_burst,
>>  	const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
>>  	const int32_t m = nb_rx % FWDSTEP;
>>
>> -	if (k) {
>> -		for (i = 0; i < FWDSTEP; i++) {
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
>> -							void *));
>> -		}
>> -		for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
>> -			for (i = 0; i < FWDSTEP; i++) {
>> -				rte_prefetch0(rte_pktmbuf_mtod(
>> -						pkts_burst[j + i + FWDSTEP],
>> -						void *));
>> -			}
>> +	/* The number of packets is small. Prefetch all packets. */
>> +	for (i = 0; i < nb_rx; i++)
>> +		rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));
>>
>> +	if (k) {
>> +		for (j = 0; j != k; j += FWDSTEP) {
>>  			processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>>  			processx4_step2(qconf, dip, ipv4_flag, portid,
>>  					&pkts_burst[j], &dst_port[j]);
>>  			if (do_step3)
>>  				processx4_step3(&pkts_burst[j], &dst_port[j]);
>>  		}
>> -
>> -		processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>> -		processx4_step2(qconf, dip, ipv4_flag, portid, &pkts_burst[j],
>> -				&dst_port[j]);
>> -		if (do_step3)
>> -			processx4_step3(&pkts_burst[j], &dst_port[j]);
>> -
>> -		j += FWDSTEP;
>>  	}
>>
>>  	if (m) {
>> -		/* Prefetch last up to 3 packets one by one */
>> -		switch (m) {
>> -		case 3:
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -							void *));
>> -			j++;
>> -			/* fallthrough */
>> -		case 2:
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -							void *));
>> -			j++;
>> -			/* fallthrough */
>> -		case 1:
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -							void *));
>> -			j++;
>> -		}
>> -		j -= m;
>>  		/* Classify last up to 3 packets one by one */
>>  		switch (m) {
>>  		case 3:
>> --
>> 2.33.0
> 

Thread overview: 4+ messages
2024-12-25  7:53 Dengdui Huang
2024-12-25 21:21 ` Stephen Hemminger
2025-01-08 13:42 ` Konstantin Ananyev
2025-01-09 11:31   ` huangdengdui [this message]
