From mboxrd@z Thu Jan 1 00:00:00 1970
From: Manit Mahajan <mmahajan@iol.unh.edu>
Date: Thu, 3 Jul 2025 11:31:22 -0400
Subject: Re: Intel E810 Performance Regression - ARM Grace Server
To: Patrick Robb <probb@iol.unh.edu>
Cc: "Richardson, Bruce" <bruce.richardson@intel.com>, "Nagarahalli, Honnappa" <Honnappa.Nagarahalli@arm.com>, "Burakov, Anatoly" <anatoly.burakov@intel.com>, ci@dpdk.org, Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>, Paul Szczepanek <Paul.Szczepanek@arm.com>, "Mcnamara, John" <john.mcnamara@intel.com>
List-Id: DPDK CI discussions <ci.dpdk.org>

Hi Bruce,

I looked at the commit and I see that it renames RTE_LIBRTE_ICE_16BYTE_RX_DESC to RTE_NET_INTEL_USE_16BYTE_DESC. The test I ran configures the meson setup with the flag -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC. I will run another test with the new flag name.

Thanks,
Manit

On Thu, Jul 3, 2025 at 11:26 AM Patrick Robb <probb@iol.unh.edu> wrote:

> Hi Bruce,
>
> When the NIC is E810, the test runs the meson setup with
> the flag -Dc_args=-DRTE_LIBRTE_ICE_16BYTE_RX_DESC.
>
> I think that is what you mean? Is this setup correct?
>
> On Thu, Jul 3, 2025 at 11:22 AM Richardson, Bruce <
> bruce.richardson@intel.com> wrote:
>
>> Is the test you are running setting the 16B descriptor flag, and does it
>> need updating to take account of the new flag name?
>>
>> *From:* Manit Mahajan
>> *Sent:* Thursday, July 3, 2025 4:22 PM
>> *To:* Richardson, Bruce
>> *Cc:* Patrick Robb <probb@iol.unh.edu>; Nagarahalli, Honnappa <
>> Honnappa.Nagarahalli@arm.com>; Burakov, Anatoly <
>> anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage <
>> wathsala.vithanage@arm.com>; Paul Szczepanek <Paul.Szczepanek@arm.com>;
>> Mcnamara, John
>> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>>
>> Hi Bruce,
>>
>> This morning, I was able to narrow down the performance issue to a
>> specific commit. I ran performance tests on the following two commits:
>>
>> - d1a350c089e0 - net/ice: rename 16-byte descriptor flag
>> - 4c4b9ce017fe - net/i40e: rename 16-byte descriptor flag
>>
>> The net/i40e commit directly precedes the net/ice commit. I observed a
>> significant drop in Mpps beginning with commit d1a350c089e0, confirming
>> that this commit introduced the regression.
>>
>> Thanks,
>> Manit
>>
>> On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce <
>> bruce.richardson@intel.com> wrote:
>>
>> Thanks Patrick, I'm planning on checking some performance numbers again
>> on our end too.
>>
>> My thoughts on the ring size: the total number of ring slots across all
>> rings should be enough to ride out an expected stall. Back in the 10G
>> days (max packet arrival rate of ~67ns), we would use ring sizes of 512
>> entries, which would give us just short of 35usec of buffering. Even
>> with a 4k ring size, at 100G we only have 27.5 usec of buffering.
>> Now, admittedly, CPUs are faster too, so they should be less likely to
>> stop polling for that amount of time, but they aren't 10x as fast as in
>> the 10G days, so I find a 512-entry ring a little small. For 100G, I
>> would expect 2k to be a reasonable minimum ring size to test with, if
>> testing a single queue. Obviously, the more queues and cores we test
>> with, the smaller each ring can be, since the arrival rate per ring
>> should be lower.
>>
>> /Bruce
>>
>> *From:* Patrick Robb
>> *Sent:* Thursday, July 3, 2025 1:53 PM
>> *To:* Richardson, Bruce
>> *Cc:* Nagarahalli, Honnappa <Honnappa.Nagarahalli@arm.com>; Manit
>> Mahajan <mmahajan@iol.unh.edu>; Burakov, Anatoly
>> <anatoly.burakov@intel.com>; ci@dpdk.org; Wathsala Wathawana Vithanage
>> <wathsala.vithanage@arm.com>; Paul Szczepanek
>> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>>
>> Hi Bruce,
>>
>> Manit can identify the specific commit this morning.
>>
>> You raise a good point about the descriptor count. It is worth us
>> assessing performance with a broader set of descriptor counts and
>> deciding what set of test configurations will yield helpful results for
>> developers going forward. As I understand it, we want to test with a
>> set of descriptor counts that is appropriate for the given traffic
>> flow, not the other way around. We will gather more info this morning
>> and share it back to you.
>>
>> On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <
>> bruce.richardson@intel.com> wrote:
>>
>> Hi Manit,
>>
>> Can you identify which patch exactly within the series is causing the
>> regression? We were not expecting performance to change with the
>> patchset, but obviously something got missed.
>> I will follow up on our end to see if we see any regressions.
>>
>> I must say, though, that 512 entries is a pretty small ring size to use
>> for 100G traffic. The slightest stall would cause those rings to
>> overflow. What is perf like at other ring sizes, e.g. 1k or 2k?
>>
>> /Bruce
>>
>> > -----Original Message-----
>> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>> > Sent: Thursday, July 3, 2025 8:03 AM
>> > To: Manit Mahajan <mmahajan@iol.unh.edu>
>> > Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; ci@dpdk.org;
>> > Richardson, Bruce <bruce.richardson@intel.com>; Wathsala Wathawana
>> > Vithanage <wathsala.vithanage@arm.com>; Paul Szczepanek
>> > <Paul.Szczepanek@arm.com>
>> > Subject: Re: Intel E810 Performance Regression - ARM Grace Server
>> >
>> > + Wathsala, Paul
>> >
>> > > On Jul 2, 2025, at 10:09 PM, Manit Mahajan <mmahajan@iol.unh.edu>
>> > > wrote:
>> > >
>> > > Hi, we have an update about the single-core forwarding test on the
>> > > ARM Grace server with the E810 100G ice card. There was an Intel
>> > > PMD series merged a week ago which had some performance failures
>> > > when it was going through the CI:
>> > > https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc872a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
>> > >
>> > > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
>> > >
>> > > As you can see, it causes roughly a 6% decrease in packets
>> > > forwarded in the single-core forwarding test with 64-byte frames
>> > > and 512 txd/rxd. The delta tolerance on the single-core forwarding
>> > > test is 5%, so a 6% reduction in Mpps forwarded is a failure.
>> > >
>> > > This was merged into mainline 6 days ago, which is why some
>> > > failures started to come in this week for the E810 Grace test.
>> > >
>> > > To double-check this, on DPDK I checked out:
>> > >
>> > > test/event: fix event vector adapter timeouts
>> > > (2eca0f4cd5daf6cd54b8705f6f76f3003c923912), which directly precedes
>> > > the Intel PMD patch series, and ran the test, and it forwarded the
>> > > pre-regression Mpps that we expected.
>> > >
>> > > Then I checked out net/intel: add common Tx mbuf recycle
>> > > (f5fd081c86ae415515ab55cbacf10c9c50536ca1)
>> > >
>> > > >> > > Another thing to note is that regrettably the ARM Grace E810 test di= d >> not get >> > run on the v7 (the final version) of this series, which meant the >> failure was not >> > displayed on that version and that's probably why it was merged. We >> will look >> > back into our job history and see why this test failed to report. >> > > >> > > Please let me know if you have any questions about the test, the >> testbed >> > environment info, or anything else. >> > Thanks Manit for looking into this. Adding few folks from Arm to follo= w >> up. >> > >> > > >> > > Thanks, >> > > Manit Mahajan >> > >> > IMPORTANT NOTICE: The contents of this email and any attachments are >> > confidential and may also be privileged. If you are not the intended >> recipient, >> > please notify the sender immediately and do not disclose the contents >> to any >> > other person, use it for any purpose, or store or copy the information >> in any >> > medium. Thank you. >> >> --0000000000008a4be10639081279 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
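The pass/fail rule described in the thread (a 5% delta tolerance, so a ~6% Mpps drop fails) can be sketched as a one-liner. The baseline and measured figures below are invented for illustration; only the 5% threshold and the approximate 6% drop come from the thread:

```shell
# CI-style check: fail when measured Mpps drops more than 5% below baseline.
baseline_mpps=20.0    # hypothetical expected Mpps for this testbed
measured_mpps=18.8    # hypothetical run showing the ~6% regression
awk -v base="$baseline_mpps" -v got="$measured_mpps" 'BEGIN {
    delta = (base - got) / base * 100         # percent drop vs. baseline
    printf "drop = %.1f%% -> %s\n", delta, (delta > 5) ? "FAIL" : "PASS"
}'
```

With these sample numbers the drop is 6.0%, which exceeds the 5% tolerance and is reported as a failure, matching the behavior that flagged the E810 Grace run.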