From: Manit Mahajan <mmahajan@iol.unh.edu>
Date: Thu, 3 Jul 2025 11:21:31 -0400
Subject: Re: Intel E810 Performance Regression - ARM Grace Server
To: "Richardson, Bruce" <bruce.richardson@intel.com>
Cc: Patrick Robb, "Nagarahalli, Honnappa", "Burakov, Anatoly", ci@dpdk.org, Wathsala Wathawana Vithanage, Paul Szczepanek, "Mcnamara, John"
List-Id: DPDK CI discussions <ci.dpdk.org>

Hi Bruce,

This morning, I was able to narrow down the performance issue to a specific commit. I ran performance tests on the following two commits:

- d1a350c089e0 – net/ice: rename 16-byte descriptor flag
- 4c4b9ce017fe – net/i40e: rename 16-byte descriptor flag

The net/i40e commit directly precedes the net/ice commit. I observed a significant drop in Mpps beginning with commit d1a350c089e0, confirming that this commit introduced the regression.
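The comparison between the two commits can be summarized in a small sketch. The function below flags the first commit whose measured throughput falls more than the CI's 5% delta tolerance below a baseline; the Mpps figures and function names are illustrative placeholders, not our lab's actual harness or numbers.

```python
# Hypothetical sketch of the pass/fail check: flag the first commit whose
# single-core forwarding rate drops more than the 5% delta tolerance below
# the baseline. Mpps values here are placeholders, not measured results.

TOLERANCE = 0.05  # 5% delta tolerance used by the single-core test


def first_regressing_commit(baseline_mpps, results):
    """results: list of (commit, mpps) tuples in commit order."""
    for commit, mpps in results:
        if mpps < baseline_mpps * (1.0 - TOLERANCE):
            return commit
    return None


measurements = [
    ("4c4b9ce017fe", 29.8),  # net/i40e: rename 16-byte descriptor flag
    ("d1a350c089e0", 28.2),  # net/ice: rename 16-byte descriptor flag
]
print(first_regressing_commit(30.0, measurements))  # -> d1a350c089e0
```

With a 30.0 Mpps baseline, a roughly 6% drop (28.2 Mpps) trips the 5% tolerance, matching how the CI flagged the net/ice commit.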
Thanks,
Manit

On Thu, Jul 3, 2025 at 9:12 AM Richardson, Bruce wrote:

> Thanks Patrick, I'm planning on checking some performance numbers again on
> our end too.
>
> My thoughts on the ring size: the total number of ring slots across all
> rings should be enough to ride out an expected stall. Back in the 10G days
> (max packet arrival rate of ~67 ns), we would use ring sizes of 512
> entries, which would give us just short of 35 usec of buffering. Even with
> a 4k ring size, at 100G we only have 27.5 usec of buffering. Now,
> admittedly, CPUs are faster too, so they should be less likely to stop
> polling for that amount of time, but they aren't 10x as fast as in the 10G
> days, so I find a ring size of 512 a little small. For 100G, I would
> expect 2k to be a reasonable minimum ring size to test with, if testing a
> single queue. Obviously the more queues and cores we test with, the
> smaller each ring can be, since the arrival rate per ring should be lower.
>
> /Bruce
>
> *From:* Patrick Robb
> *Sent:* Thursday, July 3, 2025 1:53 PM
> *To:* Richardson, Bruce
> *Cc:* Nagarahalli, Honnappa; Manit Mahajan; Burakov, Anatoly;
> ci@dpdk.org; Wathsala Wathawana Vithanage; Paul Szczepanek
> *Subject:* Re: Intel E810 Performance Regression - ARM Grace Server
>
> Hi Bruce,
>
> Manit can identify the specific commit this morning.
>
> You raise a good point about the descriptor count. It is worth us
> assessing performance with a broader set of descriptor counts and
> deciding what set of test configurations will yield helpful results for
> developers going forward. By my understanding, we want to test with a set
> of descriptor counts which are appropriate for the given traffic flow,
> not the other way around. We will gather more info this morning and share
> it back to you.
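Bruce's buffering arithmetic above can be reproduced with a short sketch. It assumes minimum-size 64-byte frames plus the 20 bytes of preamble and inter-frame gap each frame occupies on the wire; the helper names are mine, not anything from DPDK.

```python
def pkt_arrival_ns(frame_bytes=64, link_gbps=10):
    # Wire time per frame: payload plus 8B preamble + 12B inter-frame gap.
    # 1 Gbps is exactly 1 bit/ns, so bits / Gbps gives nanoseconds.
    return (frame_bytes + 20) * 8 / link_gbps


def buffering_us(ring_slots, frame_bytes=64, link_gbps=10):
    # How long a full complement of ring slots can absorb a polling stall.
    return ring_slots * pkt_arrival_ns(frame_bytes, link_gbps) / 1000.0


print(round(pkt_arrival_ns(64, 10), 1))       # ~67.2 ns arrival at 10G
print(round(buffering_us(512, 64, 10), 1))    # ~34.4 us, "just short of 35"
print(round(buffering_us(4096, 64, 100), 1))  # ~27.5 us for 4k ring at 100G
```

The same helper shows why 512 slots at 100G gives only about 3.4 us of slack, which is why even a slight stall overflows the ring.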
>
> On Thu, Jul 3, 2025 at 4:43 AM Richardson, Bruce <
> bruce.richardson@intel.com> wrote:
>
> Hi Manit,
>
> Can you identify which patch exactly within the series is causing the
> regression? We were not expecting performance to change with the
> patchset, but obviously something got missed.
> I will follow up on our end to see if we see any regressions.
>
> I must say, though, that 512 entries is a pretty small ring size to use
> for 100G traffic. The slightest stall would cause those rings to
> overflow. What is perf like at other ring sizes, e.g. 1k or 2k?
>
> /Bruce
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli
> > Sent: Thursday, July 3, 2025 8:03 AM
> > To: Manit Mahajan
> > Cc: Burakov, Anatoly; ci@dpdk.org; Richardson, Bruce; Wathsala
> > Wathawana Vithanage; Paul Szczepanek
> > Subject: Re: Intel E810 Performance Regression - ARM Grace Server
> >
> > + Wathsala, Paul
> >
> > > On Jul 2, 2025, at 10:09 PM, Manit Mahajan wrote:
> > >
> > > Hi, we have an update about the single-core forwarding test on the
> > ARM Grace server with the E810 100G ice card. There was an Intel PMD
> > series that was merged a week ago which had some performance failures
> > when it was going through the CI:
> > https://patches.dpdk.org/project/dpdk/patch/01c94afcb0b1c2795c031afc872a8faf3f0db2b5.1749229651.git.anatoly.burakov@intel.com/
> > >
> > > and: http://mails.dpdk.org/archives/test-report/2025-June/883654.html
> > >
> > > As you can see, it causes roughly a 6% decrease in packets forwarded
> > in the single-core forwarding test with 64-byte frames and 512 txd/rxd.
> > The delta tolerance on the single-core forwarding test is 5%, so a 6%
> > reduction in Mpps forwarded is a failure.
> > >
> > > This was merged into mainline 6 days ago, which is why some failures
> > started to come in this week for the E810 Grace test.
> > >
> > > To double-check this, on DPDK I checked out:
> > >
> > > test/event: fix event vector adapter timeouts
> > (2eca0f4cd5daf6cd54b8705f6f76f3003c923912), which directly precedes
> > the Intel PMD patch series, ran the test, and it forwarded the
> > pre-regression Mpps that we expected.
> > >
> > > Then I checked out net/intel: add common Tx mbuf recycle
> > (f5fd081c86ae415515ab55cbacf10c9c50536ca1),
> > >
> > > ran the test, and it had the 6% reduction in Mpps forwarded.
> > >
> > > Another thing to note is that, regrettably, the ARM Grace E810 test
> > did not get run on the v7 (the final version) of this series, which
> > meant the failure was not displayed on that version and is probably
> > why it was merged. We will look back into our job history and see why
> > this test failed to report.
> > >
> > > Please let me know if you have any questions about the test, the
> > testbed environment info, or anything else.
> > Thanks, Manit, for looking into this. Adding a few folks from Arm to
> > follow up.
> >
> > >
> > > Thanks,
> > > Manit Mahajan
> >
> > IMPORTANT NOTICE: The contents of this email and any attachments are
> > confidential and may also be privileged. If you are not the intended
> > recipient, please notify the sender immediately and do not disclose
> > the contents to any other person, use it for any purpose, or store or
> > copy the information in any medium. Thank you.