Hi Bruce,

I looked at the commit and I see that it renames RTE_LIBRTE_ICE_16BYTE_RX_DESC
to RTE_NET_INTEL_USE_16BYTE_DESC. The test I ran invokes meson setup with
-Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC, so I will run another test with the
new flag name.
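For reference, I expect the updated invocation to look roughly like the
following (a sketch only; the build directory name and the compile step are
our harness's conventions, not something taken from the commit itself):

    meson setup build -Dc_args=-DRTE_NET_INTEL_USE_16BYTE_DESC
    ninja -C build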
Thanks,
Manit

On Thu, Jul 3, 2025 at 11:26 AM Patrick Robb wrote:

> Hi Bruce,
>
> When the NIC is E810, the test runs meson setup with the flag
> -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC
>
> I think that is what you mean? Is this setup correct?
>
> On Thu, Jul 3, 2025 at 11:22 AM Richardson, Bruce <bruce.richardson@intel.com> wrote:
>
>> Is the test you are running setting the 16B descriptor flag, and does it
>> need updating to take account of the new flag name?
>>
>> *From:* Manit Mahajan
>> *Sent:* Thursday, July 3, 2025 4:22 PM
>> *To:* Richardson, Bruce
>> *Cc:* Patrick Robb; Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>;
>> Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala
>> Wathawana Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek;
>> Mcnamara, John
>> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>>
>> Hi Bruce,
>>
>> This morning I was able to narrow down the performance issue to a
>> specific commit. I ran performance tests on the following two commits:
>>
>> - d1a350c089e0 – net/ice: rename 16-byte descriptor flag
>> - 4c4b9ce017fe – net/i40e: rename 16-byte descriptor flag
>>
>> The net/i40e commit directly precedes the net/ice commit. I observed a
>> significant drop in mpps beginning with commit d1a350c089e0, confirming
>> that this commit introduced the regression.
>>
>> Thanks,
>> Manit
>>
>> On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce <bruce.richardson@intel.com> wrote:
>>
>> Thanks Patrick, I'm planning on checking some performance numbers again
>> on our end too.
>>
>> My thoughts on the ring size: the total number of ring slots across all
>> rings should be enough to ride out an expected stall. Back in the 10G
>> days (a minimum packet arrival interval of ~67 ns), we would use ring
>> sizes of 512 entries, which gave us just short of 35 usec of buffering.
>> Even with a 4k ring, at 100G we only have 27.5 usec of buffering. Now,
>> admittedly, CPUs are faster too, so they should be less likely to stop
>> polling for that long, but they aren't 10x as fast as in the 10G days,
>> so I find a ring size of 512 a little small. For 100G, I would expect 2k
>> to be a reasonable minimum ring size to test with, if testing single
>> queue. Obviously the more queues and cores we test with, the smaller
>> each ring can be, since the arrival rate per ring should be lower.
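>>
>> To make that arithmetic concrete, here is a quick back-of-the-envelope
>> sketch (assuming 64-byte frames, which occupy 84 bytes on the wire once
>> the 8-byte preamble and 12-byte inter-frame gap are counted):
>>
>>     #include <stdio.h>
>>
>>     int main(void)
>>     {
>>         /* 64B frame + 8B preamble + 12B inter-frame gap = 84B = 672 bits */
>>         const double bits_per_pkt = 84 * 8;
>>         const struct { double gbps; unsigned ring; } cfg[] = {
>>             { 10.0, 512 },   /* the old 10G sizing */
>>             { 100.0, 4096 }, /* a 4k ring at 100G */
>>         };
>>
>>         for (unsigned i = 0; i < 2; i++) {
>>             /* bits divided by Gbit/s comes out directly in nanoseconds */
>>             double ns_per_pkt = bits_per_pkt / cfg[i].gbps;
>>             printf("%.0fG, %u slots: %.1f ns/pkt -> %.1f usec of buffering\n",
>>                    cfg[i].gbps, cfg[i].ring, ns_per_pkt,
>>                    cfg[i].ring * ns_per_pkt / 1000.0);
>>         }
>>         return 0;
>>     }
>>
>> That prints 67.2 ns/pkt and ~34.4 usec of buffering for 512 slots at
>> 10G, versus 6.7 ns/pkt and ~27.5 usec for 4096 slots at 100G, matching
>> the numbers above.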
>>
>> /Bruce
>>
>> *From:* Patrick Robb
>> *Sent:* Thursday, July 3, 2025 1:53 PM
>> *To:* Richardson, Bruce
>> *Cc:* Nagarahalli, Honnappa; Manit Mahajan; Burakov, Anatoly
>> <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage
>> <wathsala.vithanage@arm.com>; Paul Szczepanek
>> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>>
>> Hi Bruce,
>>
>> Manit can identify the specific commit this morning.
>>
>> You raise a good point about the descriptor count. It is worth assessing
>> performance with a broader set of descriptor counts and deciding which
>> test configurations will yield helpful results for developers going
>> forward. My understanding is that we want to test with descriptor counts
>> appropriate for the given traffic flow, rather than shaping the traffic
>> to fit the descriptor counts. We will gather more information this
>> morning and share it with you.
>>
>> On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <bruce.richardson@intel.com> wrote:
>>
>> Hi Manit,
>>
>> Can you identify which patch exactly within the series is causing the
>> regression? We were not expecting performance to change with the
>> patchset, but obviously something got missed. I will follow up on our
>> end to see if we see any regressions.
>>
>> I must say, though, that 512 entries is a pretty small ring size to use
>> for 100G traffic. The slightest stall would cause those rings to
>> overflow. What is perf like at other ring sizes, e.g. 1k or 2k?
>>
>> /Bruce
>>
>> > -----Original Message-----
>> > From: Honnappa Nagarahalli
>> > Sent: Thursday, July 3, 2025 8:03 AM
>> > To: Manit Mahajan
>> > Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org;
>> > Richardson, Bruce <bruce.richardson@intel.com>; Wathsala Wathawana
>> > Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek
>> > Subject: Re: Intel E810 Performance Regression - ARM Grace Server
>> >
>> > + Wathsala, Paul
>> >
>> > > On Jul 2, 2025, at 10:09 PM, Manit Mahajan wrote:
>> > >
>> > > Hi, we have an update about the single core forwarding test on the
>> > > ARM Grace server with the E810 100G Ice card. An Intel PMD series
>> > > merged a week ago had some performance failures while it was going
>> > > through the CI:
>> > > https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc872a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
>> > >
>> > > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
>> > >
>> > > As you can see, it causes roughly a 6% decrease in packets forwarded
>> > > in the single core forwarding test with 64-byte frames and 512
>> > > txd/rxd. The delta tolerance on the single core forwarding test is
>> > > 5%, so a 6% reduction in MPPS forwarded is a failure.
>> > >
>> > > This was merged into mainline 6 days ago, which is why some failures
>> > > started to come in this week for the E810 Grace test.
>> > >
>> > > To double-check this, I checked DPDK out at:
>> > >
>> > > test/event: fix event vector adapter timeouts
>> > > (2eca0f4cd5daf6cd54b8705f6f76f3003c923912), which directly precedes
>> > > the Intel PMD patch series, ran the test, and it forwarded the
>> > > pre-regression MPPS that we expected.
>> > >
>> > > Then I checked out net/intel: add common Tx mbuf recycle
>> > > (f5fd081c86ae415515ab55cbacf10c9c50536ca1), ran the test, and it had
>> > > the 6% reduction in MPPS forwarded.
>> > >
>> > > Another thing to note: regrettably, the ARM Grace E810 test did not
>> > > get run on the v7 (the final version) of this series, which meant
>> > > the failure was not displayed on that version, and that is probably
>> > > why it was merged. We will look back into our job history and see
>> > > why this test failed to report.
>> > >
>> > > Please let me know if you have any questions about the test, the
>> > > testbed environment info, or anything else.
>> > Thanks Manit for looking into this. Adding a few folks from Arm to
>> > follow up.
>> >
>> > > Thanks,
>> > > Manit Mahajan
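>> > >
>> > > For reference, the single core forwarding test described above maps
>> > > to roughly this kind of testpmd invocation (a sketch only; the core
>> > > list, memory channel count, and PCI address are illustrative
>> > > placeholders, not values taken from this thread):
>> > >
>> > >     dpdk-testpmd -l 0-1 -n 4 -a 0000:01:00.0 -- \
>> > >         --nb-cores=1 --rxq=1 --txq=1 --rxd=512 --txd=512
>> > >
>> > > The --rxd/--txd values correspond to the 512 txd/rxd configuration
>> > > the regression was reported against.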