DPDK CI discussions
* Intel E810 Performance Regression - ARM Grace Server
@ 2025-07-02 21:09 Manit Mahajan
  2025-07-03  7:03 ` Honnappa Nagarahalli
  0 siblings, 1 reply; 13+ messages in thread
From: Manit Mahajan @ 2025-07-02 21:09 UTC (permalink / raw)
  To: anatoly.burakov; +Cc: ci, bruce.richardson, honnappa.nagarahalli

[-- Attachment #1: Type: text/plain, Size: 1689 bytes --]

Hi, we have an update about the single core forwarding test on the ARM Grace
server with the E810 100G Ice card. An Intel PMD series that was merged a
week ago had some performance failures while it was going through the CI:
https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc872a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/

and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html

As you can see, it causes roughly a 6% decrease in packets forwarded in the
single core forwarding test with 64-byte frames and 512 txd/rxd. The delta
tolerance on the single core forwarding test is 5%, so a 6% reduction in
MPPS forwarded is a failure.

This was merged into mainline 6 days ago, which is why some failures
started to come in this week for the E810 Grace test.

To double-check this, I checked out DPDK at:

test/event: fix event vector adapter timeouts
(2eca0f4cd5daf6cd54b8705f6f76f3003c923912), which directly precedes the
Intel PMD patch series, ran the test, and it forwarded the pre-regression
MPPS that we expected.

Then I checked out net/intel: add common Tx mbuf recycle
(f5fd081c86ae415515ab55cbacf10c9c50536ca1), ran the test, and saw the 6%
reduction in MPPS forwarded.
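
For reference, the comparison ran roughly like this (our usual build options
and the traffic-generator side of the test are omitted here; the harness
drives those separately):

  git checkout 2eca0f4cd5daf6cd54b8705f6f76f3003c923912   # commit preceding the series
  meson setup build && ninja -C build
  # run the single core forwarding test -> pre-regression MPPS
  git checkout f5fd081c86ae415515ab55cbacf10c9c50536ca1   # net/intel: add common Tx mbuf recycle
  meson setup --wipe build && ninja -C build
  # run the same test again -> roughly 6% fewer MPPS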

Another thing to note is that, regrettably, the ARM Grace E810 test did not
get run on the v7 (the final version) of this series, which meant the
failure was not displayed on that version, and that is probably why it was
merged. We will look back into our job history and see why this test failed
to report.

Please let me know if you have any questions about the test, the testbed
environment info, or anything else.

Thanks,
Manit Mahajan

[-- Attachment #2: Type: text/html, Size: 2022 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel E810 Performance Regression - ARM Grace Server
  2025-07-02 21:09 Intel E810 Performance Regression - ARM Grace Server Manit Mahajan
@ 2025-07-03  7:03 ` Honnappa Nagarahalli
  2025-07-03  8:42   ` Richardson, Bruce
  0 siblings, 1 reply; 13+ messages in thread
From: Honnappa Nagarahalli @ 2025-07-03  7:03 UTC (permalink / raw)
  To: Manit Mahajan
  Cc: anatoly.burakov, ci, bruce.richardson,
	Wathsala Wathawana Vithanage, Paul Szczepanek

+ Wathsala, Paul

> On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu> wrote:
>
> Hi we have an update about the single core forwarding test on the ARM Grace server with the E810 100G Ice card. There was an intel PMDs series that was merged a week ago which had some performance failures when it was going through the CI: https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc872a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
>
> and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
>
> As you can see it causes roughly a 6% decrease in packets forwarded in the single core forwarding test with 64Byte frames and 512 txd/rxd. The delta tolerance on the single core forwarding test is 5%, so a 6% reduction in MPPS forwarded is a failure.
>
> This was merged into mainline 6 days ago, which is why some failures started to come in this week for the E810 Grace test.
>
> To double check this, on DPDK I checked out to:
>
> test/event: fix event vector adapter timeouts (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the Intel PMD patchseries, and ran the test and it forwarded the pre-regression MPPS that we expected.
>
> Then I checked out to net/intel: add common Tx mbuf recycle (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
>
> and I ran the test and it had the 6% reduction in MPPS forwarded.
>
> Another thing to note is that regrettably the ARM Grace E810 test did not get run on the v7 (the final version) of this series, which meant the failure was not displayed on that version and that's probably why it was merged. We will look back into our job history and see why this test failed to report.
>
> Please let me know if you have any questions about the test, the testbed environment info, or anything else.
Thanks, Manit, for looking into this. Adding a few folks from Arm to follow up.

>
> Thanks,
> Manit Mahajan

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03  7:03 ` Honnappa Nagarahalli
@ 2025-07-03  8:42   ` Richardson, Bruce
  2025-07-03 12:53     ` Patrick Robb
  0 siblings, 1 reply; 13+ messages in thread
From: Richardson, Bruce @ 2025-07-03  8:42 UTC (permalink / raw)
  To: Nagarahalli, Honnappa, Manit Mahajan
  Cc: Burakov, Anatoly, ci, Wathsala Wathawana Vithanage, Paul Szczepanek

Hi Manit,

Can you identify which patch exactly within the series is causing the regression? We were not expecting performance to change with the patchset, but obviously something got missed.
I will follow up on our end to see if we see any regressions. 

I must say, though, that 512 entries is a pretty small ring size to use for 100G traffic. The slightest stall would cause those rings to overflow. What is perf like at other ring sizes, e.g. 1k or 2k?
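
For reference, a ring-size sweep like that can be driven through testpmd's
--rxd/--txd options, along these lines (the core list, PCI address and
forwarding mode below are just placeholders, not your actual setup):

  dpdk-testpmd -l 0-1 -a 0000:01:00.0 -- \
      --forward-mode=io --rxq=1 --txq=1 --rxd=2048 --txd=2048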

/Bruce


> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Thursday, July 3, 2025 8:03 AM
> To: Manit Mahajan <mmahajan@iol.unh.edu>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Richardson,
> Bruce <bruce.richardson@intel.com>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com>; Paul Szczepanek
> <Paul.Szczepanek@arm.com>
> Subject: Re: Intel E810 Performance Regression - ARM Grace Server
> 
> + Wathsala, Paul
> 
> > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu>
> wrote:
> >
> > Hi we have an update about the single core forwarding test on the ARM
> Grace server with the E810 100G Ice card. There was an intel PMDs series that
> was merged a week ago which had some performance failures when it was
> going through the CI:
> https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
> 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
> >
> > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> >
> > As you can see it causes roughly a 6% decrease in packets forwarded in the
> single core forwarding test with 64Byte frames and 512 txd/rxd. The delta
> tolerance on the single core forwarding test is 5%, so a 6% reduction in MPPS
> forwarded is a failure.
> >
> > This was merged into mainline 6 days ago, which is why some failures started
> to come in this week for the E810 Grace test.
> >
> > To double check this, on DPDK I checked out to:
> >
> > test/event: fix event vector adapter timeouts
> (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
> Intel PMD patchseries, and ran the test and it forwarded the pre-regression
> MPPS that we expected.
> >
> > Then I checked out to net/intel: add common Tx mbuf recycle
> (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
> >
> > and I ran the test and it had the 6% reduction in MPPS forwarded.
> >
> > Another thing to note is that regrettably the ARM Grace E810 test did not get
> run on the v7 (the final version) of this series, which meant the failure was not
> displayed on that version and that's probably why it was merged. We will look
> back into our job history and see why this test failed to report.
> >
> > Please let me know if you have any questions about the test, the testbed
> environment info, or anything else.
> Thanks Manit for looking into this. Adding few folks from Arm to follow up.
> 
> >
> > Thanks,
> > Manit Mahajan
> 
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended recipient,
> please notify the sender immediately and do not disclose the contents to any
> other person, use it for any purpose, or store or copy the information in any
> medium. Thank you.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03  8:42   ` Richardson, Bruce
@ 2025-07-03 12:53     ` Patrick Robb
  2025-07-03 13:11       ` Richardson, Bruce
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick Robb @ 2025-07-03 12:53 UTC (permalink / raw)
  To: Richardson, Bruce
  Cc: Nagarahalli, Honnappa, Manit Mahajan, Burakov, Anatoly, ci,
	Wathsala Wathawana Vithanage, Paul Szczepanek

[-- Attachment #1: Type: text/plain, Size: 4140 bytes --]

Hi Bruce,

Manit can identify the specific commit this morning.

You raise a good point about the descriptor count. It is worth us assessing
the performance with a broader set of descriptor counts and deciding what
set of test configurations will yield helpful results for developers going
forward. My understanding is that we want to choose descriptor counts that
are appropriate for the given traffic flow, rather than the other way
around. We will gather more info this morning and share it back to you.

On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <bruce.richardson@intel.com>
wrote:

> Hi Manit,
>
> Can you identify which patch exactly within the series is causing the
> regression? We were not expecting performance to change with the patchset,
> but obviously something got missed.
> I will follow up on our end to see if we see any regressions.
>
> I must say, though, that 512 entries is pretty small rings sizes to use
> for 100G traffic. The slightest stall would cause those rings to overflow.
> What is perf like at other ring sizes, e.g. 1k or 2k?
>
> /Bruce
>
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Sent: Thursday, July 3, 2025 8:03 AM
> > To: Manit Mahajan <mmahajan@iol.unh.edu>
> > Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org;
> Richardson,
> > Bruce <bruce.richardson@intel.com>; Wathsala Wathawana Vithanage
> > <wathsala.vithanage@arm.com>; Paul Szczepanek
> > <Paul.Szczepanek@arm.com>
> > Subject: Re: Intel E810 Performance Regression - ARM Grace Server
> >
> > + Wathsala, Paul
> >
> > > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu>
> > wrote:
> > >
> > > Hi we have an update about the single core forwarding test on the ARM
> > Grace server with the E810 100G Ice card. There was an intel PMDs series
> that
> > was merged a week ago which had some performance failures when it was
> > going through the CI:
> > https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
> > 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
> > >
> > > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> > >
> > > As you can see it causes roughly a 6% decrease in packets forwarded in
> the
> > single core forwarding test with 64Byte frames and 512 txd/rxd. The delta
> > tolerance on the single core forwarding test is 5%, so a 6% reduction in
> MPPS
> > forwarded is a failure.
> > >
> > > This was merged into mainline 6 days ago, which is why some failures
> started
> > to come in this week for the E810 Grace test.
> > >
> > > To double check this, on DPDK I checked out to:
> > >
> > > test/event: fix event vector adapter timeouts
> > (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
> > Intel PMD patchseries, and ran the test and it forwarded the
> pre-regression
> > MPPS that we expected.
> > >
> > > Then I checked out to net/intel: add common Tx mbuf recycle
> > (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
> > >
> > > and I ran the test and it had the 6% reduction in MPPS forwarded.
> > >
> > > Another thing to note is that regrettably the ARM Grace E810 test did
> not get
> > run on the v7 (the final version) of this series, which meant the
> failure was not
> > displayed on that version and that's probably why it was merged. We will
> look
> > back into our job history and see why this test failed to report.
> > >
> > > Please let me know if you have any questions about the test, the
> testbed
> > environment info, or anything else.
> > Thanks Manit for looking into this. Adding few folks from Arm to follow
> up.
> >
> > >
> > > Thanks,
> > > Manit Mahajan
> >
> > IMPORTANT NOTICE: The contents of this email and any attachments are
> > confidential and may also be privileged. If you are not the intended
> recipient,
> > please notify the sender immediately and do not disclose the contents to
> any
> > other person, use it for any purpose, or store or copy the information
> in any
> > medium. Thank you.
>

[-- Attachment #2: Type: text/html, Size: 5706 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03 12:53     ` Patrick Robb
@ 2025-07-03 13:11       ` Richardson, Bruce
  2025-07-03 15:21         ` Manit Mahajan
  0 siblings, 1 reply; 13+ messages in thread
From: Richardson, Bruce @ 2025-07-03 13:11 UTC (permalink / raw)
  To: Patrick Robb
  Cc: Nagarahalli, Honnappa, Manit Mahajan, Burakov, Anatoly, ci,
	Wathsala Wathawana Vithanage, Paul Szczepanek, Mcnamara, John

[-- Attachment #1: Type: text/plain, Size: 5641 bytes --]

Thanks Patrick, I’m planning on checking some performance numbers again on our end too.

My thoughts on the ring size are that the total number of ring slots across all rings should be enough to ride out an expected stall. Back in the 10G days (max packet arrival rate of ~67ns), we would use ring sizes of 512 entries, which would give us just short of 35usec of buffering. Even with a 4k ring size, at 100G we only have 27.5 usec of buffering. Now, admittedly, CPUs are faster too, so they should be less likely to stop polling for that amount of time, but they aren’t 10x as fast as in the 10G days, so I find a ring size of 512 a little small. For 100G, I would expect 2k to be a reasonable minimum ring size to test with, if testing single queue. Obviously, the more queues and cores we test with, the smaller each ring can be, since the arrival rate per ring should be lower.
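
For clarity, the arithmetic behind those numbers, assuming minimum-size
64-byte frames plus the 20 bytes of preamble and inter-frame gap (84 bytes,
i.e. 672 bits per packet on the wire):

  10G:  672 bits / 10 Gbps  = ~67 ns per packet;  512 slots * 67 ns   = ~34 usec
  100G: 672 bits / 100 Gbps = ~6.7 ns per packet; 4096 slots * 6.7 ns = ~27.5 usec
                                                   512 slots * 6.7 ns = ~3.4 usec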

/Bruce

From: Patrick Robb <probb@iol.unh.edu>
Sent: Thursday, July 3, 2025 1:53 PM
To: Richardson, Bruce <bruce.richardson@intel.com>
Cc: Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Manit Mahajan <mmahajan@iol.unh.edu>; Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

Manit can identify the specific commit this morning.

You raise a good point about the descriptor count. It is worth us assessing the performance with a broader set of descriptor counts and deciding what set of test configurations will yield helpful results for developers going forward. By my understanding, we want to test with a set of descriptor counts which are basically appropriate for the given traffic flow, not the other way around. We will gather more info this morning and share it back to you.

On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>> wrote:
Hi Manit,

Can you identify which patch exactly within the series is causing the regression? We were not expecting performance to change with the patchset, but obviously something got missed.
I will follow up on our end to see if we see any regressions.

I must say, though, that 512 entries is pretty small rings sizes to use for 100G traffic. The slightest stall would cause those rings to overflow. What is perf like at other ring sizes, e.g. 1k or 2k?

/Bruce


> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
> Sent: Thursday, July 3, 2025 8:03 AM
> To: Manit Mahajan <mmahajan@iol.unh.edu<mailto:mmahajan@iol.unh.edu>>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com<mailto:anatoly.burakov@intel.com>>; ci@dpdk.org<mailto:ci@dpdk.org>; Richardson,
> Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com<mailto:wathsala.vithanage@arm.com>>; Paul Szczepanek
> <Paul.Szczepanek@arm.com<mailto:Paul.Szczepanek@arm.com>>
> Subject: Re: Intel E810 Performance Regression - ARM Grace Server
>
> + Wathsala, Paul
>
> > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu<mailto:mmahajan@iol.unh.edu>>
> wrote:
> >
> > Hi we have an update about the single core forwarding test on the ARM
> Grace server with the E810 100G Ice card. There was an intel PMDs series that
> was merged a week ago which had some performance failures when it was
> going through the CI:
> https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
> 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/<http://72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/>
> >
> > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> >
> > As you can see it causes roughly a 6% decrease in packets forwarded in the
> single core forwarding test with 64Byte frames and 512 txd/rxd. The delta
> tolerance on the single core forwarding test is 5%, so a 6% reduction in MPPS
> forwarded is a failure.
> >
> > This was merged into mainline 6 days ago, which is why some failures started
> to come in this week for the E810 Grace test.
> >
> > To double check this, on DPDK I checked out to:
> >
> > test/event: fix event vector adapter timeouts
> (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
> Intel PMD patchseries, and ran the test and it forwarded the pre-regression
> MPPS that we expected.
> >
> > Then I checked out to net/intel: add common Tx mbuf recycle
> (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
> >
> > and I ran the test and it had the 6% reduction in MPPS forwarded.
> >
> > Another thing to note is that regrettably the ARM Grace E810 test did not get
> run on the v7 (the final version) of this series, which meant the failure was not
> displayed on that version and that's probably why it was merged. We will look
> back into our job history and see why this test failed to report.
> >
> > Please let me know if you have any questions about the test, the testbed
> environment info, or anything else.
> Thanks Manit for looking into this. Adding few folks from Arm to follow up.
>
> >
> > Thanks,
> > Manit Mahajan
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended recipient,
> please notify the sender immediately and do not disclose the contents to any
> other person, use it for any purpose, or store or copy the information in any
> medium. Thank you.

[-- Attachment #2: Type: text/html, Size: 10126 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03 15:22           ` Richardson, Bruce
@ 2025-07-03 15:20             ` Patrick Robb
  2025-07-03 15:30               ` Richardson, Bruce
  2025-07-03 15:31               ` Manit Mahajan
  0 siblings, 2 replies; 13+ messages in thread
From: Patrick Robb @ 2025-07-03 15:20 UTC (permalink / raw)
  To: Richardson, Bruce
  Cc: Manit Mahajan, Nagarahalli, Honnappa, Burakov, Anatoly, ci,
	Wathsala Wathawana Vithanage, Paul Szczepanek, Mcnamara, John

[-- Attachment #1: Type: text/plain, Size: 7209 bytes --]

Hi Bruce,

When the NIC is E810, the test runs the meson setup with
flag -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC
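
That is, the setup step is effectively the following (the build directory
name here is just illustrative):

  meson setup build -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC
  ninja -C build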

I think that is what you mean? Is this setup correct?

On Thu, Jul 3, 2025 at 11:22 AM Richardson, Bruce <
bruce.richardson@intel.com> wrote:

> Is the test you are running setting the 16B descriptor flag, and does it
> need updating to take account of the new flag name?
>
>
>
> *From:* Manit Mahajan <mmahajan@iol.unh.edu>
> *Sent:* Thursday, July 3, 2025 4:22 PM
> *To:* Richardson, Bruce <bruce.richardson@intel.com>
> *Cc:* Patrick Robb <probb@iol.unh.edu>; Nagarahalli, Honnappa <
> Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <anatoly.burakov@intel.com>;
> ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>;
> Paul Szczepanek <Paul.Szczepanek@arm.com>; Mcnamara, John <
> john.mcnamara@intel.com>
> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>
>
>
> Hi Bruce,
>
> This morning, I was able to narrow down the performance issue to a
> specific commit. I ran performance tests on the following two commits:
>
>    - d1a350c089e0 – net/ice: rename 16-byte descriptor flag
>    - 4c4b9ce017fe – net/i40e: rename 16-byte descriptor flag
>
> The net/i40e commit directly precedes the net/ice commit. I observed a
> significant drop in mpps beginning with commit d1a350c089e0, confirming
> that this commit introduced the regression.
>
> Thanks,
> Manit
>
>
>
> On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce <
> bruce.richardson@intel.com> wrote:
>
> Thanks Patrick, I’m planning on checking some performance numbers again on
> our end too.
>
>
>
> My thoughts on the ring size, is that the total number of ring slots
> across all rings should be enough to ride out an expected stall. So back in
> the 10G days (max packet arrival rate of ~67ns), we would use ring sizes of
> 512 entries, which would give us just short of 35usec of buffering. Even
> with 4k of a ring size, at 100G we only have 27.5 usec of buffering. Now,
> admittedly CPUs are faster too, so should be less likely to stop polling
> for that amount of time, but they aren’t 10x as fast as in the 10G days so
> I find 512 of a ring size a little small. For 100G, I would expect 2k to be
> a reasonable min ring size to test with – if testing single queue.
> Obviously the more queues and cores we test with, the smaller each ring can
> be, since the arrival rate per-ring should be lower.
>
>
>
> /Bruce
>
>
>
> *From:* Patrick Robb <probb@iol.unh.edu>
> *Sent:* Thursday, July 3, 2025 1:53 PM
> *To:* Richardson, Bruce <bruce.richardson@intel.com>
> *Cc:* Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Manit Mahajan
> <mmahajan@iol.unh.edu>; Burakov, Anatoly <anatoly.burakov@intel.com>;
> ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>;
> Paul Szczepanek <Paul.Szczepanek@arm.com>
> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>
>
>
> Hi Bruce,
>
>
>
> Manit can identify the specific commit this morning.
>
>
>
> You raise a good point about the descriptor count. It is worth us
> assessing the performance with a broader set of descriptor counts and
> deciding what set of test configurations will yield helpful results for
> developers going forward. By my understanding, we want to test with a set
> of descriptor counts which are basically appropriate for the given traffic
> flow, not the other way around. We will gather more info this morning and
> share it back to you.
>
>
>
> On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <
> bruce.richardson@intel.com> wrote:
>
> Hi Manit,
>
> Can you identify which patch exactly within the series is causing the
> regression? We were not expecting performance to change with the patchset,
> but obviously something got missed.
> I will follow up on our end to see if we see any regressions.
>
> I must say, though, that 512 entries is pretty small rings sizes to use
> for 100G traffic. The slightest stall would cause those rings to overflow.
> What is perf like at other ring sizes, e.g. 1k or 2k?
>
> /Bruce
>
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Sent: Thursday, July 3, 2025 8:03 AM
> > To: Manit Mahajan <mmahajan@iol.unh.edu>
> > Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org;
> Richardson,
> > Bruce <bruce.richardson@intel.com>; Wathsala Wathawana Vithanage
> > <wathsala.vithanage@arm.com>; Paul Szczepanek
> > <Paul.Szczepanek@arm.com>
> > Subject: Re: Intel E810 Performance Regression - ARM Grace Server
> >
> > + Wathsala, Paul
> >
> > > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu>
> > wrote:
> > >
> > > Hi we have an update about the single core forwarding test on the ARM
> > Grace server with the E810 100G Ice card. There was an intel PMDs series
> that
> > was merged a week ago which had some performance failures when it was
> > going through the CI:
> > https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
> > 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
> > >
> > > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> > >
> > > As you can see it causes roughly a 6% decrease in packets forwarded in
> the
> > single core forwarding test with 64Byte frames and 512 txd/rxd. The delta
> > tolerance on the single core forwarding test is 5%, so a 6% reduction in
> MPPS
> > forwarded is a failure.
> > >
> > > This was merged into mainline 6 days ago, which is why some failures
> started
> > to come in this week for the E810 Grace test.
> > >
> > > To double check this, on DPDK I checked out to:
> > >
> > > test/event: fix event vector adapter timeouts
> > (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
> > Intel PMD patchseries, and ran the test and it forwarded the
> pre-regression
> > MPPS that we expected.
> > >
> > > Then I checked out to net/intel: add common Tx mbuf recycle
> > (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
> > >
> > > and I ran the test and it had the 6% reduction in MPPS forwarded.
> > >
> > > Another thing to note is that regrettably the ARM Grace E810 test did
> not get
> > run on the v7 (the final version) of this series, which meant the
> failure was not
> > displayed on that version and that's probably why it was merged. We will
> look
> > back into our job history and see why this test failed to report.
> > >
> > > Please let me know if you have any questions about the test, the
> testbed
> > environment info, or anything else.
> > Thanks Manit for looking into this. Adding few folks from Arm to follow
> up.
> >
> > >
> > > Thanks,
> > > Manit Mahajan
> >
> > IMPORTANT NOTICE: The contents of this email and any attachments are
> > confidential and may also be privileged. If you are not the intended
> recipient,
> > please notify the sender immediately and do not disclose the contents to
> any
> > other person, use it for any purpose, or store or copy the information
> in any
> > medium. Thank you.
>
>

[-- Attachment #2: Type: text/html, Size: 12964 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03 13:11       ` Richardson, Bruce
@ 2025-07-03 15:21         ` Manit Mahajan
  2025-07-03 15:22           ` Richardson, Bruce
  0 siblings, 1 reply; 13+ messages in thread
From: Manit Mahajan @ 2025-07-03 15:21 UTC (permalink / raw)
  To: Richardson, Bruce
  Cc: Patrick Robb, Nagarahalli, Honnappa, Burakov, Anatoly, ci,
	Wathsala Wathawana Vithanage, Paul Szczepanek, Mcnamara, John

[-- Attachment #1: Type: text/plain, Size: 6223 bytes --]

Hi Bruce,

This morning, I was able to narrow down the performance issue to a specific
commit. I ran performance tests on the following two commits:

   - d1a350c089e0 – net/ice: rename 16-byte descriptor flag
   - 4c4b9ce017fe – net/i40e: rename 16-byte descriptor flag

The net/i40e commit directly precedes the net/ice commit. I observed a
significant drop in MPPS beginning with commit d1a350c089e0, confirming
that this commit introduced the regression.
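
For reference, the narrowing-down was roughly as follows, with the hashes
taken from the merged series (build and test-run steps omitted):

  git log --oneline 2eca0f4cd5da..f5fd081c86ae   # list the commits in the series
  git checkout 4c4b9ce017fe   # net/i40e: rename 16-byte descriptor flag -> expected MPPS
  git checkout d1a350c089e0   # net/ice: rename 16-byte descriptor flag  -> ~6% MPPS drop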

Thanks,
Manit

On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce <bruce.richardson@intel.com>
wrote:

> Thanks Patrick, I’m planning on checking some performance numbers again on
> our end too.
>
>
>
> My thoughts on the ring size, is that the total number of ring slots
> across all rings should be enough to ride out an expected stall. So back in
> the 10G days (max packet arrival rate of ~67ns), we would use ring sizes of
> 512 entries, which would give us just short of 35usec of buffering. Even
> with 4k of a ring size, at 100G we only have 27.5 usec of buffering. Now,
> admittedly CPUs are faster too, so should be less likely to stop polling
> for that amount of time, but they aren’t 10x as fast as in the 10G days so
> I find 512 of a ring size a little small. For 100G, I would expect 2k to be
> a reasonable min ring size to test with – if testing single queue.
> Obviously the more queues and cores we test with, the smaller each ring can
> be, since the arrival rate per-ring should be lower.
>
>
>
> /Bruce
>
>
>
> *From:* Patrick Robb <probb@iol.unh.edu>
> *Sent:* Thursday, July 3, 2025 1:53 PM
> *To:* Richardson, Bruce <bruce.richardson@intel.com>
> *Cc:* Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Manit Mahajan
> <mmahajan@iol.unh.edu>; Burakov, Anatoly <anatoly.burakov@intel.com>;
> ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>;
> Paul Szczepanek <Paul.Szczepanek@arm.com>
> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>
>
>
> Hi Bruce,
>
>
>
> Manit can identify the specific commit this morning.
>
>
>
> You raise a good point about the descriptor count. It is worth us
> assessing the performance with a broader set of descriptor counts and
> deciding what set of test configurations will yield helpful results for
> developers going forward. By my understanding, we want to test with a set
> of descriptor counts which are basically appropriate for the given traffic
> flow, not the other way around. We will gather more info this morning and
> share it back to you.
>
>
>
> On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <
> bruce.richardson@intel.com> wrote:
>
> Hi Manit,
>
> Can you identify which patch exactly within the series is causing the
> regression? We were not expecting performance to change with the patchset,
> but obviously something got missed.
> I will follow up on our end to see if we see any regressions.
>
> I must say, though, that 512 entries is pretty small rings sizes to use
> for 100G traffic. The slightest stall would cause those rings to overflow.
> What is perf like at other ring sizes, e.g. 1k or 2k?
>
> /Bruce
>
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Sent: Thursday, July 3, 2025 8:03 AM
> > To: Manit Mahajan <mmahajan@iol.unh.edu>
> > Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org;
> Richardson,
> > Bruce <bruce.richardson@intel.com>; Wathsala Wathawana Vithanage
> > <wathsala.vithanage@arm.com>; Paul Szczepanek
> > <Paul.Szczepanek@arm.com>
> > Subject: Re: Intel E810 Performance Regression - ARM Grace Server
> >
> > + Wathsala, Paul
> >
> > > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu>
> > wrote:
> > >
> > > Hi we have an update about the single core forwarding test on the ARM
> > Grace server with the E810 100G Ice card. There was an intel PMDs series
> that
> > was merged a week ago which had some performance failures when it was
> > going through the CI:
> > https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
> > 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
> > >
> > > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> > >
> > > As you can see it causes roughly a 6% decrease in packets forwarded in
> the
> > single core forwarding test with 64Byte frames and 512 txd/rxd. The delta
> > tolerance on the single core forwarding test is 5%, so a 6% reduction in
> MPPS
> > forwarded is a failure.
> > >
> > > This was merged into mainline 6 days ago, which is why some failures
> started
> > to come in this week for the E810 Grace test.
> > >
> > > To double check this, on DPDK I checked out to:
> > >
> > > test/event: fix event vector adapter timeouts
> > (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
> > Intel PMD patchseries, and ran the test and it forwarded the
> pre-regression
> > MPPS that we expected.
> > >
> > > Then I checked out to net/intel: add common Tx mbuf recycle
> > (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
> > >
> > > and I ran the test and it had the 6% reduction in MPPS forwarded.
> > >
> > > Another thing to note is that regrettably the ARM Grace E810 test did
> not get
> > run on the v7 (the final version) of this series, which meant the
> failure was not
> > displayed on that version and that's probably why it was merged. We will
> look
> > back into our job history and see why this test failed to report.
> > >
> > > Please let me know if you have any questions about the test, the
> testbed
> > environment info, or anything else.
> > Thanks Manit for looking into this. Adding few folks from Arm to follow
> up.
> >
> > >
> > > Thanks,
> > > Manit Mahajan
> >
> > IMPORTANT NOTICE: The contents of this email and any attachments are
> > confidential and may also be privileged. If you are not the intended
> recipient,
> > please notify the sender immediately and do not disclose the contents to
> any
> > other person, use it for any purpose, or store or copy the information
> in any
> > medium. Thank you.
>
>

[-- Attachment #2: Type: text/html, Size: 9967 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03 15:21         ` Manit Mahajan
@ 2025-07-03 15:22           ` Richardson, Bruce
  2025-07-03 15:20             ` Patrick Robb
  0 siblings, 1 reply; 13+ messages in thread
From: Richardson, Bruce @ 2025-07-03 15:22 UTC (permalink / raw)
  To: Manit Mahajan
  Cc: Patrick Robb, Nagarahalli, Honnappa, Burakov, Anatoly, ci,
	Wathsala Wathawana Vithanage, Paul Szczepanek, Mcnamara, John

[-- Attachment #1: Type: text/plain, Size: 7129 bytes --]

Is the test you are running setting the 16B descriptor flag, and does it need updating to take account of the new flag name?

From: Manit Mahajan <mmahajan@iol.unh.edu>
Sent: Thursday, July 3, 2025 4:22 PM
To: Richardson, Bruce <bruce.richardson@intel.com>
Cc: Patrick Robb <probb@iol.unh.edu>; Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>; Mcnamara, John <john.mcnamara@intel.com>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

This morning, I was able to narrow down the performance issue to a specific commit. I ran performance tests on the following two commits:

  *   d1a350c089e0 – net/ice: rename 16-byte descriptor flag
  *   4c4b9ce017fe – net/i40e: rename 16-byte descriptor flag
The net/i40e commit directly precedes the net/ice commit. I observed a significant drop in mpps beginning with commit d1a350c089e0, confirming that this commit introduced the regression.

Thanks,
Manit

On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>> wrote:
Thanks Patrick, I’m planning on checking some performance numbers again on our end too.

My thoughts on the ring size, is that the total number of ring slots across all rings should be enough to ride out an expected stall. So back in the 10G days (max packet arrival rate of ~67ns), we would use ring sizes of 512 entries, which would give us just short of 35usec of buffering. Even with 4k of a ring size, at 100G we only have 27.5 usec of buffering. Now, admittedly CPUs are faster too, so should be less likely to stop polling for that amount of time, but they aren’t 10x as fast as in the 10G days so I find 512 of a ring size a little small. For 100G, I would expect 2k to be a reasonable min ring size to test with – if testing single queue. Obviously the more queues and cores we test with, the smaller each ring can be, since the arrival rate per-ring should be lower.

/Bruce

From: Patrick Robb <probb@iol.unh.edu<mailto:probb@iol.unh.edu>>
Sent: Thursday, July 3, 2025 1:53 PM
To: Richardson, Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>>
Cc: Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>; Manit Mahajan <mmahajan@iol.unh.edu<mailto:mmahajan@iol.unh.edu>>; Burakov, Anatoly <anatoly.burakov@intel.com<mailto:anatoly.burakov@intel.com>>; ci@dpdk.org<mailto:ci@dpdk.org>; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com<mailto:wathsala.vithanage@arm.com>>; Paul Szczepanek <Paul.Szczepanek@arm.com<mailto:Paul.Szczepanek@arm.com>>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

Manit can identify the specific commit this morning.

You raise a good point about the descriptor count. It is worth us assessing the performance with a broader set of descriptor counts and deciding what set of test configurations will yield helpful results for developers going forward. By my understanding, we want to test with a set of descriptor counts which are basically appropriate for the given traffic flow, not the other way around. We will gather more info this morning and share it back to you.

On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>> wrote:
Hi Manit,

Can you identify which patch exactly within the series is causing the regression? We were not expecting performance to change with the patchset, but obviously something got missed.
I will follow up on our end to see if we see any regressions.

I must say, though, that 512 entries is pretty small rings sizes to use for 100G traffic. The slightest stall would cause those rings to overflow. What is perf like at other ring sizes, e.g. 1k or 2k?

/Bruce


> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
> Sent: Thursday, July 3, 2025 8:03 AM
> To: Manit Mahajan <mmahajan@iol.unh.edu<mailto:mmahajan@iol.unh.edu>>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com<mailto:anatoly.burakov@intel.com>>; ci@dpdk.org<mailto:ci@dpdk.org>; Richardson,
> Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com<mailto:wathsala.vithanage@arm.com>>; Paul Szczepanek
> <Paul.Szczepanek@arm.com<mailto:Paul.Szczepanek@arm.com>>
> Subject: Re: Intel E810 Performance Regression - ARM Grace Server
>
> + Wathsala, Paul
>
> > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu<mailto:mmahajan@iol.unh.edu>>
> wrote:
> >
> > Hi we have an update about the single core forwarding test on the ARM
> Grace server with the E810 100G Ice card. There was an intel PMDs series that
> was merged a week ago which had some performance failures when it was
> going through the CI:
> https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
> 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/<http://72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/>
> >
> > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> >
> > As you can see it causes roughly a 6% decrease in packets forwarded in the
> single core forwarding test with 64Byte frames and 512 txd/rxd. The delta
> tolerance on the single core forwarding test is 5%, so a 6% reduction in MPPS
> forwarded is a failure.
> >
> > This was merged into mainline 6 days ago, which is why some failures started
> to come in this week for the E810 Grace test.
> >
> > To double check this, on DPDK I checked out to:
> >
> > test/event: fix event vector adapter timeouts
> (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
> Intel PMD patchseries, and ran the test and it forwarded the pre-regression
> MPPS that we expected.
> >
> > Then I checked out to net/intel: add common Tx mbuf recycle
> (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
> >
> > and I ran the test and it had the 6% reduction in MPPS forwarded.
> >
> > Another thing to note is that regrettably the ARM Grace E810 test did not get
> run on the v7 (the final version) of this series, which meant the failure was not
> displayed on that version and that's probably why it was merged. We will look
> back into our job history and see why this test failed to report.
> >
> > Please let me know if you have any questions about the test, the testbed
> environment info, or anything else.
> Thanks Manit for looking into this. Adding few folks from Arm to follow up.
>
> >
> > Thanks,
> > Manit Mahajan
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended recipient,
> please notify the sender immediately and do not disclose the contents to any
> other person, use it for any purpose, or store or copy the information in any
> medium. Thank you.

[-- Attachment #2: Type: text/html, Size: 16425 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03 15:20             ` Patrick Robb
@ 2025-07-03 15:30               ` Richardson, Bruce
  2025-07-03 15:31               ` Manit Mahajan
  1 sibling, 0 replies; 13+ messages in thread
From: Richardson, Bruce @ 2025-07-03 15:30 UTC (permalink / raw)
  To: Patrick Robb
  Cc: Manit Mahajan, Nagarahalli, Honnappa, Burakov, Anatoly, ci,
	Wathsala Wathawana Vithanage, Paul Szczepanek, Mcnamara, John

[-- Attachment #1: Type: text/plain, Size: 8815 bytes --]

That will need to be renamed based on the merged patch. We have switched from one flag per driver to one flag across both i40e and ice. Use RTE_NET_INTEL_USE_16BYTE_DESC now. For safety, in your script I recommend just using both the old and new flags.

As an aside, I’m not happy in general about having this build-time switch at all. I’d really like to get away from it, as I consider it an unrealistic scenario. The flag for the ICE driver is not documented anywhere, 32-byte descriptors are the default, and to enable a number of offloads we need the extra descriptor space that 32-byte descriptors provide.
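
For example, a minimal sketch of setting both flag names until everything
has moved over (the build directory name is just illustrative):

  meson setup build \
      -Dc_args='-DRTE_LIBRTE_ICE_16BYTE_RX_DESC -DRTE_NET_INTEL_USE_16BYTE_DESC'
  ninja -C build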

From: Patrick Robb <probb@iol.unh.edu>
Sent: Thursday, July 3, 2025 4:21 PM
To: Richardson, Bruce <bruce.richardson@intel.com>
Cc: Manit Mahajan <mmahajan@iol.unh.edu>; Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>; Mcnamara, John <john.mcnamara@intel.com>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

When the NIC is E810, the test runs the meson setup with flag -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC

I think that is what you mean? Is this setup correct?

On Thu, Jul 3, 2025 at 11:22 AM Richardson, Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>> wrote:
Is the test you are running setting the 16B descriptor flag, and does it need updating to take account of the new flag name?

From: Manit Mahajan <mmahajan@iol.unh.edu<mailto:mmahajan@iol.unh.edu>>
Sent: Thursday, July 3, 2025 4:22 PM
To: Richardson, Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>>
Cc: Patrick Robb <probb@iol.unh.edu<mailto:probb@iol.unh.edu>>; Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>; Burakov, Anatoly <anatoly.burakov@intel.com<mailto:anatoly.burakov@intel.com>>; ci@dpdk.org<mailto:ci@dpdk.org>; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com<mailto:wathsala.vithanage@arm.com>>; Paul Szczepanek <Paul.Szczepanek@arm.com<mailto:Paul.Szczepanek@arm.com>>; Mcnamara, John <john.mcnamara@intel.com<mailto:john.mcnamara@intel.com>>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

This morning, I was able to narrow down the performance issue to a specific commit. I ran performance tests on the following two commits:

  *   d1a350c089e0 – net/ice: rename 16-byte descriptor flag
  *   4c4b9ce017fe – net/i40e: rename 16-byte descriptor flag
The net/i40e commit directly precedes the net/ice commit. I observed a significant drop in mpps beginning with commit d1a350c089e0, confirming that this commit introduced the regression.

Thanks,
Manit

On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>> wrote:
Thanks Patrick, I’m planning on checking some performance numbers again on our end too.

My thoughts on the ring size, is that the total number of ring slots across all rings should be enough to ride out an expected stall. So back in the 10G days (max packet arrival rate of ~67ns), we would use ring sizes of 512 entries, which would give us just short of 35usec of buffering. Even with 4k of a ring size, at 100G we only have 27.5 usec of buffering. Now, admittedly CPUs are faster too, so should be less likely to stop polling for that amount of time, but they aren’t 10x as fast as in the 10G days so I find 512 of a ring size a little small. For 100G, I would expect 2k to be a reasonable min ring size to test with – if testing single queue. Obviously the more queues and cores we test with, the smaller each ring can be, since the arrival rate per-ring should be lower.

/Bruce

From: Patrick Robb <probb@iol.unh.edu<mailto:probb@iol.unh.edu>>
Sent: Thursday, July 3, 2025 1:53 PM
To: Richardson, Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>>
Cc: Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>; Manit Mahajan <mmahajan@iol.unh.edu<mailto:mmahajan@iol.unh.edu>>; Burakov, Anatoly <anatoly.burakov@intel.com<mailto:anatoly.burakov@intel.com>>; ci@dpdk.org<mailto:ci@dpdk.org>; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com<mailto:wathsala.vithanage@arm.com>>; Paul Szczepanek <Paul.Szczepanek@arm.com<mailto:Paul.Szczepanek@arm.com>>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

Manit can identify the specific commit this morning.

You raise a good point about the descriptor count. It is worth us assessing the performance with a broader set of descriptor counts and deciding what set of test configurations will yield helpful results for developers going forward. By my understanding, we want to test with a set of descriptor counts which are basically appropriate for the given traffic flow, not the other way around. We will gather more info this morning and share it back to you.

On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>> wrote:
Hi Manit,

Can you identify which patch exactly within the series is causing the regression? We were not expecting performance to change with the patchset, but obviously something got missed.
I will follow up on our end to see if we see any regressions.

I must say, though, that 512 entries is pretty small rings sizes to use for 100G traffic. The slightest stall would cause those rings to overflow. What is perf like at other ring sizes, e.g. 1k or 2k?

/Bruce


> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
> Sent: Thursday, July 3, 2025 8:03 AM
> To: Manit Mahajan <mmahajan@iol.unh.edu<mailto:mmahajan@iol.unh.edu>>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com<mailto:anatoly.burakov@intel.com>>; ci@dpdk.org<mailto:ci@dpdk.org>; Richardson,
> Bruce <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com<mailto:wathsala.vithanage@arm.com>>; Paul Szczepanek
> <Paul.Szczepanek@arm.com<mailto:Paul.Szczepanek@arm.com>>
> Subject: Re: Intel E810 Performance Regression - ARM Grace Server
>
> + Wathsala, Paul
>
> > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu<mailto:mmahajan@iol.unh.edu>>
> wrote:
> >
> > Hi we have an update about the single core forwarding test on the ARM
> Grace server with the E810 100G Ice card. There was an intel PMDs series that
> was merged a week ago which had some performance failures when it was
> going through the CI:
> https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
> 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/<http://72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/>
> >
> > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> >
> > As you can see it causes roughly a 6% decrease in packets forwarded in the
> single core forwarding test with 64Byte frames and 512 txd/rxd. The delta
> tolerance on the single core forwarding test is 5%, so a 6% reduction in MPPS
> forwarded is a failure.
> >
> > This was merged into mainline 6 days ago, which is why some failures started
> to come in this week for the E810 Grace test.
> >
> > To double check this, on DPDK I checked out to:
> >
> > test/event: fix event vector adapter timeouts
> (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
> Intel PMD patchseries, and ran the test and it forwarded the pre-regression
> MPPS that we expected.
> >
> > Then I checked out to net/intel: add common Tx mbuf recycle
> (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
> >
> > and I ran the test and it had the 6% reduction in MPPS forwarded.
> >
> > Another thing to note is that regrettably the ARM Grace E810 test did not get
> run on the v7 (the final version) of this series, which meant the failure was not
> displayed on that version and that's probably why it was merged. We will look
> back into our job history and see why this test failed to report.
> >
> > Please let me know if you have any questions about the test, the testbed
> environment info, or anything else.
> Thanks Manit for looking into this. Adding few folks from Arm to follow up.
>
> >
> > Thanks,
> > Manit Mahajan
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended recipient,
> please notify the sender immediately and do not disclose the contents to any
> other person, use it for any purpose, or store or copy the information in any
> medium. Thank you.

[-- Attachment #2: Type: text/html, Size: 20231 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03 15:20             ` Patrick Robb
  2025-07-03 15:30               ` Richardson, Bruce
@ 2025-07-03 15:31               ` Manit Mahajan
  2025-07-03 15:39                 ` Richardson, Bruce
  1 sibling, 1 reply; 13+ messages in thread
From: Manit Mahajan @ 2025-07-03 15:31 UTC (permalink / raw)
  To: Patrick Robb
  Cc: Richardson, Bruce, Nagarahalli, Honnappa, Burakov, Anatoly, ci,
	Wathsala Wathawana Vithanage, Paul Szczepanek, Mcnamara, John

[-- Attachment #1: Type: text/plain, Size: 7782 bytes --]

Hi Bruce,

I looked at the commit and see that it renames
RTE_LIBRTE_ICE_16BYTE_RX_DESC to RTE_NET_INTEL_USE_16BYTE_DESC. The test I
ran does the meson setup with the flag
-Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC. I will run another test with the
new flag name.

Thanks,
Manit


On Thu, Jul 3, 2025 at 11:26 AM Patrick Robb <probb@iol.unh.edu> wrote:

> Hi Bruce,
>
> When the NIC is E810, the test runs the meson setup with
> flag -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC
>
> I think that is what you mean? Is this setup correct?
>
> On Thu, Jul 3, 2025 at 11:22 AM Richardson, Bruce <
> bruce.richardson@intel.com> wrote:
>
>> Is the test you are running setting the 16B descriptor flag, and does it
>> need updating to take account of the new flag name?
>>
>>
>>
>> *From:* Manit Mahajan <mmahajan@iol.unh.edu>
>> *Sent:* Thursday, July 3, 2025 4:22 PM
>> *To:* Richardson, Bruce <bruce.richardson@intel.com>
>> *Cc:* Patrick Robb <probb@iol.unh.edu>; Nagarahalli, Honnappa <
>> Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <
>> anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <
>> wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>;
>> Mcnamara, John <john.mcnamara@intel.com>
>> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>>
>>
>>
>> Hi Bruce,
>>
>> This morning, I was able to narrow down the performance issue to a
>> specific commit. I ran performance tests on the following two commits:
>>
>>    - d1a350c089e0 – net/ice: rename 16-byte descriptor flag
>>    - 4c4b9ce017fe – net/i40e: rename 16-byte descriptor flag
>>
>> The net/i40e commit directly precedes the net/ice commit. I observed a
>> significant drop in mpps beginning with commit d1a350c089e0, confirming
>> that this commit introduced the regression.
>>
>> Thanks,
>> Manit
>>
>>
>>
>> On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce <
>> bruce.richardson@intel.com> wrote:
>>
>> Thanks Patrick, I’m planning on checking some performance numbers again
>> on our end too.
>>
>>
>>
>> My thoughts on the ring size, is that the total number of ring slots
>> across all rings should be enough to ride out an expected stall. So back in
>> the 10G days (max packet arrival rate of ~67ns), we would use ring sizes of
>> 512 entries, which would give us just short of 35usec of buffering. Even
>> with 4k of a ring size, at 100G we only have 27.5 usec of buffering. Now,
>> admittedly CPUs are faster too, so should be less likely to stop polling
>> for that amount of time, but they aren’t 10x as fast as in the 10G days so
>> I find 512 of a ring size a little small. For 100G, I would expect 2k to be
>> a reasonable min ring size to test with – if testing single queue.
>> Obviously the more queues and cores we test with, the smaller each ring can
>> be, since the arrival rate per-ring should be lower.
>>
>>
>>
>> /Bruce
>>
>>
>>
>> *From:* Patrick Robb <probb@iol.unh.edu>
>> *Sent:* Thursday, July 3, 2025 1:53 PM
>> *To:* Richardson, Bruce <bruce.richardson@intel.com>
>> *Cc:* Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Manit
>> Mahajan <mmahajan@iol.unh.edu>; Burakov, Anatoly <
>> anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <
>> wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>
>> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>>
>>
>>
>> Hi Bruce,
>>
>>
>>
>> Manit can identify the specific commit this morning.
>>
>>
>>
>> You raise a good point about the descriptor count. It is worth us
>> assessing the performance with a broader set of descriptor counts and
>> deciding what set of test configurations will yield helpful results for
>> developers going forward. By my understanding, we want to test with a set
>> of descriptor counts which are basically appropriate for the given traffic
>> flow, not the other way around. We will gather more info this morning and
>> share it back to you.
>>
>>
>>
>> On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <
>> bruce.richardson@intel.com> wrote:
>>
>> Hi Manit,
>>
>> Can you identify which patch exactly within the series is causing the
>> regression? We were not expecting performance to change with the patchset,
>> but obviously something got missed.
>> I will follow up on our end to see if we see any regressions.
>>
>> I must say, though, that 512 entries is pretty small rings sizes to use
>> for 100G traffic. The slightest stall would cause those rings to overflow.
>> What is perf like at other ring sizes, e.g. 1k or 2k?
>>
>> /Bruce
>>
>>
>> > -----Original Message-----
>> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>> > Sent: Thursday, July 3, 2025 8:03 AM
>> > To: Manit Mahajan <mmahajan@iol.unh.edu>
>> > Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org;
>> Richardson,
>> > Bruce <bruce.richardson@intel.com>; Wathsala Wathawana Vithanage
>> > <wathsala.vithanage@arm.com>; Paul Szczepanek
>> > <Paul.Szczepanek@arm.com>
>> > Subject: Re: Intel E810 Performance Regression - ARM Grace Server
>> >
>> > + Wathsala, Paul
>> >
>> > > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu>
>> > wrote:
>> > >
>> > > Hi we have an update about the single core forwarding test on the ARM
>> > Grace server with the E810 100G Ice card. There was an intel PMDs
>> series that
>> > was merged a week ago which had some performance failures when it was
>> > going through the CI:
>> > https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
>> > 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
>> > >
>> > > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
>> > >
>> > > As you can see it causes roughly a 6% decrease in packets forwarded
>> in the
>> > single core forwarding test with 64Byte frames and 512 txd/rxd. The
>> delta
>> > tolerance on the single core forwarding test is 5%, so a 6% reduction
>> in MPPS
>> > forwarded is a failure.
>> > >
>> > > This was merged into mainline 6 days ago, which is why some failures
>> started
>> > to come in this week for the E810 Grace test.
>> > >
>> > > To double check this, on DPDK I checked out to:
>> > >
>> > > test/event: fix event vector adapter timeouts
>> > (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
>> > Intel PMD patchseries, and ran the test and it forwarded the
>> pre-regression
>> > MPPS that we expected.
>> > >
>> > > Then I checked out to net/intel: add common Tx mbuf recycle
>> > (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
>> > >
>> > > and I ran the test and it had the 6% reduction in MPPS forwarded.
>> > >
>> > > Another thing to note is that regrettably the ARM Grace E810 test did
>> not get
>> > run on the v7 (the final version) of this series, which meant the
>> failure was not
>> > displayed on that version and that's probably why it was merged. We
>> will look
>> > back into our job history and see why this test failed to report.
>> > >
>> > > Please let me know if you have any questions about the test, the
>> testbed
>> > environment info, or anything else.
> Thanks Manit for looking into this. Adding a few folks from Arm to follow
>> up.
>> >
>> > >
>> > > Thanks,
>> > > Manit Mahajan
>> >
>> > IMPORTANT NOTICE: The contents of this email and any attachments are
>> > confidential and may also be privileged. If you are not the intended
>> recipient,
>> > please notify the sender immediately and do not disclose the contents
>> to any
>> > other person, use it for any purpose, or store or copy the information
>> in any
>> > medium. Thank you.
>>
>>

[-- Attachment #2: Type: text/html, Size: 13529 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03 15:31               ` Manit Mahajan
@ 2025-07-03 15:39                 ` Richardson, Bruce
  2025-07-03 17:38                   ` Patrick Robb
  0 siblings, 1 reply; 13+ messages in thread
From: Richardson, Bruce @ 2025-07-03 15:39 UTC (permalink / raw)
  To: Manit Mahajan, Patrick Robb
  Cc: Nagarahalli, Honnappa, Burakov, Anatoly, ci,
	Wathsala Wathawana Vithanage, Paul Szczepanek, Mcnamara, John

[-- Attachment #1: Type: text/plain, Size: 8727 bytes --]

Suggest testing with both flag names set, just for safety and backward compatibility. Having the old flag still defined is harmless.
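
For example, a sketch of a meson invocation with both defines passed (the exact arguments the UNH harness uses may differ):

    meson setup build -Dc_args='-DRTE_LIBRTE_ICE_16BYTE_RX_DESC -DRTE_NET_INTEL_USE_16BYTE_DESC'
    ninja -C build
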

From: Manit Mahajan <mmahajan@iol.unh.edu>
Sent: Thursday, July 3, 2025 4:31 PM
To: Patrick Robb <probb@iol.unh.edu>
Cc: Richardson, Bruce <bruce.richardson@intel.com>; Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>; Mcnamara, John <john.mcnamara@intel.com>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

I looked at the commit and I see that it changes RTE_LIBRTE_ICE_16BYTE_RX_DESC to RTE_NET_INTEL_USE_16BYTE_DESC. The test I ran runs the meson setup with flag -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC. I will run another test with the new flag name.

Thanks,
Manit

On Thu, Jul 3, 2025 at 11:26 AM Patrick Robb <probb@iol.unh.edu> wrote:
Hi Bruce,

When the NIC is E810, the test runs the meson setup with flag -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC

I think that is what you mean? Is this setup correct?

On Thu, Jul 3, 2025 at 11:22 AM Richardson, Bruce <bruce.richardson@intel.com> wrote:
Is the test you are running setting the 16B descriptor flag, and does it need updating to take account of the new flag name?

From: Manit Mahajan <mmahajan@iol.unh.edu>
Sent: Thursday, July 3, 2025 4:22 PM
To: Richardson, Bruce <bruce.richardson@intel.com>
Cc: Patrick Robb <probb@iol.unh.edu>; Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>; Mcnamara, John <john.mcnamara@intel.com>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

This morning, I was able to narrow down the performance issue to a specific commit. I ran performance tests on the following two commits:

  *   d1a350c089e0 – net/ice: rename 16-byte descriptor flag
  *   4c4b9ce017fe – net/i40e: rename 16-byte descriptor flag
The net/i40e commit directly precedes the net/ice commit. I observed a significant drop in mpps beginning with commit d1a350c089e0, confirming that this commit introduced the regression.
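
A sketch of the per-commit check described here, using the commit ids from this thread (the build flags and the forwarding-test run itself are the testbed's own and are only summarized in comments):

    # baseline: net/i40e rename, the commit directly before the suspect one
    git checkout 4c4b9ce017fe
    meson setup build -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC && ninja -C build
    # run the single-core forwarding test and record MPPS

    # suspect: net/ice rename
    git checkout d1a350c089e0
    ninja -C build
    # re-run the test: MPPS drops by roughly 6% starting at this commit
    # (per the rest of the thread, the drop traced back to the test still
    #  passing only the old define name)
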

Thanks,
Manit

On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce <bruce.richardson@intel.com> wrote:
Thanks Patrick, I’m planning on checking some performance numbers again on our end too.

My thoughts on the ring size, is that the total number of ring slots across all rings should be enough to ride out an expected stall. So back in the 10G days (max packet arrival rate of ~67ns), we would use ring sizes of 512 entries, which would give us just short of 35usec of buffering. Even with 4k of a ring size, at 100G we only have 27.5 usec of buffering. Now, admittedly CPUs are faster too, so should be less likely to stop polling for that amount of time, but they aren’t 10x as fast as in the 10G days so I find 512 of a ring size a little small. For 100G, I would expect 2k to be a reasonable min ring size to test with – if testing single queue. Obviously the more queues and cores we test with, the smaller each ring can be, since the arrival rate per-ring should be lower.

/Bruce

From: Patrick Robb <probb@iol.unh.edu>
Sent: Thursday, July 3, 2025 1:53 PM
To: Richardson, Bruce <bruce.richardson@intel.com>
Cc: Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Manit Mahajan <mmahajan@iol.unh.edu>; Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

Manit can identify the specific commit this morning.

You raise a good point about the descriptor count. It is worth us assessing the performance with a broader set of descriptor counts and deciding what set of test configurations will yield helpful results for developers going forward. By my understanding, we want to test with a set of descriptor counts which are basically appropriate for the given traffic flow, not the other way around. We will gather more info this morning and share it back to you.

On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <bruce.richardson@intel.com> wrote:
Hi Manit,

Can you identify which patch exactly within the series is causing the regression? We were not expecting performance to change with the patchset, but obviously something got missed.
I will follow up on our end to see if we see any regressions.

I must say, though, that 512 entries is a pretty small ring size to use for 100G traffic. The slightest stall would cause those rings to overflow. What is perf like at other ring sizes, e.g. 1k or 2k?

/Bruce


> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Thursday, July 3, 2025 8:03 AM
> To: Manit Mahajan <mmahajan@iol.unh.edu>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Richardson,
> Bruce <bruce.richardson@intel.com>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com>; Paul Szczepanek
> <Paul.Szczepanek@arm.com>
> Subject: Re: Intel E810 Performance Regression - ARM Grace Server
>
> + Wathsala, Paul
>
> > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu>
> wrote:
> >
> > Hi we have an update about the single core forwarding test on the ARM
> Grace server with the E810 100G Ice card. There was an intel PMDs series that
> was merged a week ago which had some performance failures when it was
> going through the CI:
> https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
> 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
> >
> > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> >
> > As you can see it causes roughly a 6% decrease in packets forwarded in the
> single core forwarding test with 64Byte frames and 512 txd/rxd. The delta
> tolerance on the single core forwarding test is 5%, so a 6% reduction in MPPS
> forwarded is a failure.
> >
> > This was merged into mainline 6 days ago, which is why some failures started
> to come in this week for the E810 Grace test.
> >
> > To double check this, on DPDK I checked out to:
> >
> > test/event: fix event vector adapter timeouts
> (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
> Intel PMD patchseries, and ran the test and it forwarded the pre-regression
> MPPS that we expected.
> >
> > Then I checked out to net/intel: add common Tx mbuf recycle
> (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
> >
> > and I ran the test and it had the 6% reduction in MPPS forwarded.
> >
> > Another thing to note is that regrettably the ARM Grace E810 test did not get
> run on the v7 (the final version) of this series, which meant the failure was not
> displayed on that version and that's probably why it was merged. We will look
> back into our job history and see why this test failed to report.
> >
> > Please let me know if you have any questions about the test, the testbed
> environment info, or anything else.
> Thanks Manit for looking into this. Adding a few folks from Arm to follow up.
>
> >
> > Thanks,
> > Manit Mahajan
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended recipient,
> please notify the sender immediately and do not disclose the contents to any
> other person, use it for any purpose, or store or copy the information in any
> medium. Thank you.

[-- Attachment #2: Type: text/html, Size: 20400 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03 15:39                 ` Richardson, Bruce
@ 2025-07-03 17:38                   ` Patrick Robb
  2025-07-03 18:54                     ` Mcnamara, John
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick Robb @ 2025-07-03 17:38 UTC (permalink / raw)
  To: Richardson, Bruce
  Cc: Manit Mahajan, Nagarahalli, Honnappa, Burakov, Anatoly, ci,
	Wathsala Wathawana Vithanage, Paul Szczepanek, Mcnamara, John

[-- Attachment #1: Type: text/plain, Size: 8557 bytes --]

We finished up on Slack but I'm just noting for the CI mailing list that
this is resolved now as we are using the new flag in the test, thanks.

On Thu, Jul 3, 2025 at 11:39 AM Richardson, Bruce <
bruce.richardson@intel.com> wrote:

> Suggest testing with both flag names set, just for safety and backward
> compatibility. Having the old flag still defined is harmless.
>
>
>
> *From:* Manit Mahajan <mmahajan@iol.unh.edu>
> *Sent:* Thursday, July 3, 2025 4:31 PM
> *To:* Patrick Robb <probb@iol.unh.edu>
> *Cc:* Richardson, Bruce <bruce.richardson@intel.com>; Nagarahalli,
> Honnappa <Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <
> anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <
> wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>;
> Mcnamara, John <john.mcnamara@intel.com>
> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>
>
>
> Hi Bruce,
>
> I looked at the commit and I see that it
> changes RTE_LIBRTE_ICE_16BYTE_RX_DESC to RTE_NET_INTEL_USE_16BYTE_DESC. The
> test I ran runs the meson setup with flag
> -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC. I will run another test with the
> new flag name.
>
> Thanks,
> Manit
>
>
>
> On Thu, Jul 3, 2025 at 11:26 AM Patrick Robb <probb@iol.unh.edu> wrote:
>
> Hi Bruce,
>
>
>
> When the NIC is E810, the test runs the meson setup with
> flag -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC
>
> I think that is what you mean? Is this setup correct?
>
>
>
> On Thu, Jul 3, 2025 at 11:22 AM Richardson, Bruce <
> bruce.richardson@intel.com> wrote:
>
> Is the test you are running setting the 16B descriptor flag, and does it
> need updating to take account of the new flag name?
>
>
>
> *From:* Manit Mahajan <mmahajan@iol.unh.edu>
> *Sent:* Thursday, July 3, 2025 4:22 PM
> *To:* Richardson, Bruce <bruce.richardson@intel.com>
> *Cc:* Patrick Robb <probb@iol.unh.edu>; Nagarahalli, Honnappa <
> Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <anatoly.burakov@intel.com>;
> ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>;
> Paul Szczepanek <Paul.Szczepanek@arm.com>; Mcnamara, John <
> john.mcnamara@intel.com>
> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>
>
>
> Hi Bruce,
>
> This morning, I was able to narrow down the performance issue to a
> specific commit. I ran performance tests on the following two commits:
>
>    - d1a350c089e0 – net/ice: rename 16-byte descriptor flag
>    - 4c4b9ce017fe – net/i40e: rename 16-byte descriptor flag
>
> The net/i40e commit directly precedes the net/ice commit. I observed a
> significant drop in mpps beginning with commit d1a350c089e0, confirming
> that this commit introduced the regression.
>
> Thanks,
> Manit
>
>
>
> On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce <
> bruce.richardson@intel.com> wrote:
>
> Thanks Patrick, I’m planning on checking some performance numbers again on
> our end too.
>
>
>
> My thoughts on the ring size, is that the total number of ring slots
> across all rings should be enough to ride out an expected stall. So back in
> the 10G days (max packet arrival rate of ~67ns), we would use ring sizes of
> 512 entries, which would give us just short of 35usec of buffering. Even
> with 4k of a ring size, at 100G we only have 27.5 usec of buffering. Now,
> admittedly CPUs are faster too, so should be less likely to stop polling
> for that amount of time, but they aren’t 10x as fast as in the 10G days so
> I find 512 of a ring size a little small. For 100G, I would expect 2k to be
> a reasonable min ring size to test with – if testing single queue.
> Obviously the more queues and cores we test with, the smaller each ring can
> be, since the arrival rate per-ring should be lower.
>
>
>
> /Bruce
>
>
>
> *From:* Patrick Robb <probb@iol.unh.edu>
> *Sent:* Thursday, July 3, 2025 1:53 PM
> *To:* Richardson, Bruce <bruce.richardson@intel.com>
> *Cc:* Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Manit Mahajan
> <mmahajan@iol.unh.edu>; Burakov, Anatoly <anatoly.burakov@intel.com>;
> ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>;
> Paul Szczepanek <Paul.Szczepanek@arm.com>
> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>
>
>
> Hi Bruce,
>
>
>
> Manit can identify the specific commit this morning.
>
>
>
> You raise a good point about the descriptor count. It is worth us
> assessing the performance with a broader set of descriptor counts and
> deciding what set of test configurations will yield helpful results for
> developers going forward. By my understanding, we want to test with a set
> of descriptor counts which are basically appropriate for the given traffic
> flow, not the other way around. We will gather more info this morning and
> share it back to you.
>
>
>
> On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <
> bruce.richardson@intel.com> wrote:
>
> Hi Manit,
>
> Can you identify which patch exactly within the series is causing the
> regression? We were not expecting performance to change with the patchset,
> but obviously something got missed.
> I will follow up on our end to see if we see any regressions.
>
> I must say, though, that 512 entries is a pretty small ring size to use
> for 100G traffic. The slightest stall would cause those rings to overflow.
> What is perf like at other ring sizes, e.g. 1k or 2k?
>
> /Bruce
>
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Sent: Thursday, July 3, 2025 8:03 AM
> > To: Manit Mahajan <mmahajan@iol.unh.edu>
> > Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org;
> Richardson,
> > Bruce <bruce.richardson@intel.com>; Wathsala Wathawana Vithanage
> > <wathsala.vithanage@arm.com>; Paul Szczepanek
> > <Paul.Szczepanek@arm.com>
> > Subject: Re: Intel E810 Performance Regression - ARM Grace Server
> >
> > + Wathsala, Paul
> >
> > > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu>
> > wrote:
> > >
> > > Hi we have an update about the single core forwarding test on the ARM
> > Grace server with the E810 100G Ice card. There was an intel PMDs series
> that
> > was merged a week ago which had some performance failures when it was
> > going through the CI:
> > https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
> > 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
> > >
> > > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> > >
> > > As you can see it causes roughly a 6% decrease in packets forwarded in
> the
> > single core forwarding test with 64Byte frames and 512 txd/rxd. The delta
> > tolerance on the single core forwarding test is 5%, so a 6% reduction in
> MPPS
> > forwarded is a failure.
> > >
> > > This was merged into mainline 6 days ago, which is why some failures
> started
> > to come in this week for the E810 Grace test.
> > >
> > > To double check this, on DPDK I checked out to:
> > >
> > > test/event: fix event vector adapter timeouts
> > (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
> > Intel PMD patchseries, and ran the test and it forwarded the
> pre-regression
> > MPPS that we expected.
> > >
> > > Then I checked out to net/intel: add common Tx mbuf recycle
> > (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
> > >
> > > and I ran the test and it had the 6% reduction in MPPS forwarded.
> > >
> > > Another thing to note is that regrettably the ARM Grace E810 test did
> not get
> > run on the v7 (the final version) of this series, which meant the
> failure was not
> > displayed on that version and that's probably why it was merged. We will
> look
> > back into our job history and see why this test failed to report.
> > >
> > > Please let me know if you have any questions about the test, the
> testbed
> > environment info, or anything else.
> > Thanks Manit for looking into this. Adding a few folks from Arm to follow
> up.
> >
> > >
> > > Thanks,
> > > Manit Mahajan
> >
> > IMPORTANT NOTICE: The contents of this email and any attachments are
> > confidential and may also be privileged. If you are not the intended
> recipient,
> > please notify the sender immediately and do not disclose the contents to
> any
> > other person, use it for any purpose, or store or copy the information
> in any
> > medium. Thank you.
>
>

[-- Attachment #2: Type: text/html, Size: 16633 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Intel E810 Performance Regression - ARM Grace Server
  2025-07-03 17:38                   ` Patrick Robb
@ 2025-07-03 18:54                     ` Mcnamara, John
  0 siblings, 0 replies; 13+ messages in thread
From: Mcnamara, John @ 2025-07-03 18:54 UTC (permalink / raw)
  To: Patrick Robb, Richardson, Bruce
  Cc: Manit Mahajan, Nagarahalli, Honnappa, Burakov, Anatoly, ci,
	Wathsala Wathawana Vithanage, Paul Szczepanek

[-- Attachment #1: Type: text/plain, Size: 9849 bytes --]

Thanks everyone for spotting and then resolving this issue.

John


From: Patrick Robb <probb@iol.unh.edu>
Sent: Thursday, July 3, 2025 6:38 PM
To: Richardson, Bruce <bruce.richardson@intel.com>
Cc: Manit Mahajan <mmahajan@iol.unh.edu>; Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>; Mcnamara, John <john.mcnamara@intel.com>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

We finished up on Slack but I'm just noting for the CI mailing list that this is resolved now as we are using the new flag in the test, thanks.

On Thu, Jul 3, 2025 at 11:39 AM Richardson, Bruce <bruce.richardson@intel.com> wrote:
Suggest testing with both flag names set, just for safety and backward compatibility. Having the old flag still defined is harmless.

From: Manit Mahajan <mmahajan@iol.unh.edu>
Sent: Thursday, July 3, 2025 4:31 PM
To: Patrick Robb <probb@iol.unh.edu>
Cc: Richardson, Bruce <bruce.richardson@intel.com>; Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>; Mcnamara, John <john.mcnamara@intel.com>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

I looked at the commit and I see that it changes RTE_LIBRTE_ICE_16BYTE_RX_DESC to RTE_NET_INTEL_USE_16BYTE_DESC. The test I ran runs the meson setup with flag -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC. I will run another test with the new flag name.

Thanks,
Manit

On Thu, Jul 3, 2025 at 11:26 AM Patrick Robb <probb@iol.unh.edu> wrote:
Hi Bruce,

When the NIC is E810, the test runs the meson setup with flag -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC

I think that is what you mean? Is this setup correct?

On Thu, Jul 3, 2025 at 11:22 AM Richardson, Bruce <bruce.richardson@intel.com> wrote:
Is the test you are running setting the 16B descriptor flag, and does it need updating to take account of the new flag name?

From: Manit Mahajan <mmahajan@iol.unh.edu>
Sent: Thursday, July 3, 2025 4:22 PM
To: Richardson, Bruce <bruce.richardson@intel.com>
Cc: Patrick Robb <probb@iol.unh.edu>; Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>; Mcnamara, John <john.mcnamara@intel.com>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

This morning, I was able to narrow down the performance issue to a specific commit. I ran performance tests on the following two commits:

  *   d1a350c089e0 – net/ice: rename 16-byte descriptor flag
  *   4c4b9ce017fe – net/i40e: rename 16-byte descriptor flag
The net/i40e commit directly precedes the net/ice commit. I observed a significant drop in mpps beginning with commit d1a350c089e0, confirming that this commit introduced the regression.

Thanks,
Manit

On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce <bruce.richardson@intel.com> wrote:
Thanks Patrick, I’m planning on checking some performance numbers again on our end too.

My thoughts on the ring size, is that the total number of ring slots across all rings should be enough to ride out an expected stall. So back in the 10G days (max packet arrival rate of ~67ns), we would use ring sizes of 512 entries, which would give us just short of 35usec of buffering. Even with 4k of a ring size, at 100G we only have 27.5 usec of buffering. Now, admittedly CPUs are faster too, so should be less likely to stop polling for that amount of time, but they aren’t 10x as fast as in the 10G days so I find 512 of a ring size a little small. For 100G, I would expect 2k to be a reasonable min ring size to test with – if testing single queue. Obviously the more queues and cores we test with, the smaller each ring can be, since the arrival rate per-ring should be lower.

/Bruce

From: Patrick Robb <probb@iol.unh.edu>
Sent: Thursday, July 3, 2025 1:53 PM
To: Richardson, Bruce <bruce.richardson@intel.com>
Cc: Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Manit Mahajan <mmahajan@iol.unh.edu>; Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>
Subject: Re: Intel E810 Performance Regression - ARM Grace Server

Hi Bruce,

Manit can identify the specific commit this morning.

You raise a good point about the descriptor count. It is worth us assessing the performance with a broader set of descriptor counts and deciding what set of test configurations will yield helpful results for developers going forward. By my understanding, we want to test with a set of descriptor counts which are basically appropriate for the given traffic flow, not the other way around. We will gather more info this morning and share it back to you.

On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <bruce.richardson@intel.com> wrote:
Hi Manit,

Can you identify which patch exactly within the series is causing the regression? We were not expecting performance to change with the patchset, but obviously something got missed.
I will follow up on our end to see if we see any regressions.

I must say, though, that 512 entries is a pretty small ring size to use for 100G traffic. The slightest stall would cause those rings to overflow. What is perf like at other ring sizes, e.g. 1k or 2k?

/Bruce


> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Thursday, July 3, 2025 8:03 AM
> To: Manit Mahajan <mmahajan@iol.unh.edu>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Richardson,
> Bruce <bruce.richardson@intel.com>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com>; Paul Szczepanek
> <Paul.Szczepanek@arm.com>
> Subject: Re: Intel E810 Performance Regression - ARM Grace Server
>
> + Wathsala, Paul
>
> > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu>
> wrote:
> >
> > Hi we have an update about the single core forwarding test on the ARM
> Grace server with the E810 100G Ice card. There was an intel PMDs series that
> was merged a week ago which had some performance failures when it was
> going through the CI:
> https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc8
> 72a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
> >
> > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> >
> > As you can see it causes roughly a 6% decrease in packets forwarded in the
> single core forwarding test with 64Byte frames and 512 txd/rxd. The delta
> tolerance on the single core forwarding test is 5%, so a 6% reduction in MPPS
> forwarded is a failure.
> >
> > This was merged into mainline 6 days ago, which is why some failures started
> to come in this week for the E810 Grace test.
> >
> > To double check this, on DPDK I checked out to:
> >
> > test/event: fix event vector adapter timeouts
> (2eca0f4cd5daf6cd54b8705f6f76f3003c923912) which directly precedes the
> Intel PMD patchseries, and ran the test and it forwarded the pre-regression
> MPPS that we expected.
> >
> > Then I checked out to net/intel: add common Tx mbuf recycle
> (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
> >
> > and I ran the test and it had the 6% reduction in MPPS forwarded.
> >
> > Another thing to note is that regrettably the ARM Grace E810 test did not get
> run on the v7 (the final version) of this series, which meant the failure was not
> displayed on that version and that's probably why it was merged. We will look
> back into our job history and see why this test failed to report.
> >
> > Please let me know if you have any questions about the test, the testbed
> environment info, or anything else.
> Thanks Manit for looking into this. Adding a few folks from Arm to follow up.
>
> >
> > Thanks,
> > Manit Mahajan
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended recipient,
> please notify the sender immediately and do not disclose the contents to any
> other person, use it for any purpose, or store or copy the information in any
> medium. Thank you.

[-- Attachment #2: Type: text/html, Size: 24254 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-07-03 18:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-02 21:09 Intel E810 Performance Regression - ARM Grace Server Manit Mahajan
2025-07-03  7:03 ` Honnappa Nagarahalli
2025-07-03  8:42   ` Richardson, Bruce
2025-07-03 12:53     ` Patrick Robb
2025-07-03 13:11       ` Richardson, Bruce
2025-07-03 15:21         ` Manit Mahajan
2025-07-03 15:22           ` Richardson, Bruce
2025-07-03 15:20             ` Patrick Robb
2025-07-03 15:30               ` Richardson, Bruce
2025-07-03 15:31               ` Manit Mahajan
2025-07-03 15:39                 ` Richardson, Bruce
2025-07-03 17:38                   ` Patrick Robb
2025-07-03 18:54                     ` Mcnamara, John

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).