From: Dmitriy Stepanov <stepanov.dmit@gmail.com>
Date: Fri, 18 Feb 2022 16:49:21 +0300
Subject: Re: Mellanox performance degradation with more than 12 lcores
To: Asaf Penso
Cc: users@dpdk.org

I get 125 Mpps from a single port using 12 lcores:

numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-127 -n 4 -a 0000:c1:00.0 -- --stats-period 1 --nb-cores=12 --rxq=12 --txq=12 --rxd=512

With 63 lcores I get 35 Mpps:

numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-127 -n 4 -a 0000:c1:00.0 -- --stats-period 1 --nb-cores=63 --rxq=63 --txq=63 --rxd=512

I'm using this guide as a reference:
https://fast.dpdk.org/doc/perf/DPDK_20_11_Mellanox_NIC_performance_report.pdf

This report gives examples of how to get the best performance, but all of them use at most 12 lcores.
125 Mpps with 12 lcores is nearly the maximum I can get from a single 100 GbE port (148 Mpps is the theoretical maximum for 64-byte packets). I just want to understand why I get good performance with 12 lcores and bad performance with 63 lcores.

On Fri, 18 Feb 2022 at 16:30, Asaf Penso <asafp@nvidia.com> wrote:
> Hello Dmitry,
>
> Could you please paste the testpmd commands for each experiment?
>
> Also, have you looked into the dpdk.org performance report to see how to tune
> for best results?
>
> Regards,
> Asaf Penso
> ------------------------------
> *From:* Dmitriy Stepanov <stepanov.dmit@gmail.com>
> *Sent:* Friday, February 18, 2022 9:32:59 AM
> *To:* users@dpdk.org
> *Subject:* Mellanox performance degradation with more than 12 lcores
>
> Hi folks!
>
> I'm using a Mellanox ConnectX-6 Dx EN adapter card (100GbE; dual-port
> QSFP56; PCIe 4.0/3.0 x16) with DPDK 21.11 on a server with an AMD EPYC 7702
> 64-core processor (NUMA system with 2 sockets). Hyperthreading is turned
> off.
> I'm testing the maximum receive throughput I can get from a single port
> using the testpmd utility (shipped with DPDK). My generator produces random
> UDP packets with zero payload length.
>
> I get the maximum performance using 8-12 lcores (overall 120-125 Mpps on
> the receive path of a single port):
>
> numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-127 -n 4
> -a 0000:c1:00.0 -- --stats-period 1 --nb-cores=12 --rxq=12 --txq=12
> --rxd=512
>
> With more than 12 lcores, overall receive performance drops. With 16-32
> lcores I get 100-110 Mpps, and there is a significant fall at 33
> lcores: 84 Mpps. With 63 lcores I get only 35 Mpps overall receive
> performance.
>
> Are there any limitations on the total number of receive queues (total
> lcores) that can handle a single port on this NIC?
>
> Thanks,
> Dmitriy Stepanov
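As a side note, the "148 Mpps theoretical maximum" quoted above follows from simple line-rate arithmetic, sketched here as a shell one-liner (the 20 bytes of per-frame overhead are the standard Ethernet preamble/SFD plus inter-frame gap):

```shell
# Back-of-the-envelope check of the theoretical 100 GbE packet rate.
# A minimal 64-byte Ethernet frame occupies 64 + 8 (preamble/SFD) + 12 (IFG)
# = 84 bytes = 672 bits on the wire.
link_bps=100000000000                     # 100 Gbit/s
bits_per_frame=$(( (64 + 8 + 12) * 8 ))   # 672 bits
pps=$(( link_bps / bits_per_frame ))
echo "$pps"                               # ~148.8 Mpps
```

So 125 Mpps at 12 lcores is indeed within roughly 85% of line rate for 64-byte packets.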
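To make the per-core-count comparison in this thread easy to reproduce, the runs can be generated by a small script. This is only a sketch: the testpmd path, PCI address, and core list are copied from the commands quoted above and may need adjusting for another system, and the chosen sweep values are illustrative; it prints the command lines rather than executing them.

```shell
# Sketch: print testpmd invocations for a queue-count sweep, using one
# RX/TX queue per forwarding core as in the runs quoted in this thread.
gen_testpmd_cmds() {
    testpmd=/opt/dpdk-21.11/build/app/dpdk-testpmd   # path from this thread
    port=0000:c1:00.0                                # PCI address from this thread
    for n in 4 8 12 16 24 32 48 63; do
        printf 'numactl -N 1 -m 1 %s -l 64-127 -n 4 -a %s -- --stats-period 1 --nb-cores=%d --rxq=%d --txq=%d --rxd=512\n' \
            "$testpmd" "$port" "$n" "$n" "$n"
    done
}

gen_testpmd_cmds
```

Running each printed command for a fixed interval and recording the Rx-pps counter from the `--stats-period 1` output would give the throughput-vs-lcores curve discussed above.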