Subject: Re: Mellanox performance degradation with more than 12 lcores
From: Дмитрий Степанов
Date: Fri, 18 Feb 2022 19:14:08 +0300
To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Cc: users@dpdk.org
In-Reply-To: <20220218133952.3084134-1-dkozlyuk@nvidia.com>
List-Id: DPDK usage discussions

Thanks for the clarification!
I was able to get 148 Mpps with 12 lcores after some BIOS tuning.
Looks like, due to these HW limitations, I have to use a ring buffer as you suggested to support more than 32 lcores!

On Fri, 18 Feb 2022 at 16:40, Dmitry Kozlyuk <dkozlyuk@nvidia.com> wrote:
Hi,

> With more than 12 lcores, overall receive performance drops.
> With 16-32 lcores I get 100-110 Mpps,

It is more about the number of queues than the number of cores:
12 queues is the threshold at which Multi-Packet Receive Queue (MPRQ)
is automatically enabled in the mlx5 PMD.
Try increasing --rxd and check out the mprq_en device argument.
Please see the mlx5 PMD user guide for details about MPRQ.
You should be able to get the full 148 Mpps with your HW.
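
As a minimal sketch, requesting deeper Rx rings from the application side
(the API-level counterpart of testpmd's --rxd) could look like this; the
helper name, the 4096-descriptor depth, port_id, nb_rxq and mbuf_pool are
all placeholders:

#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Sketch only: ask for deeper Rx rings. Call after rte_eth_dev_configure()
 * and before rte_eth_dev_start(). */
static int
setup_deep_rx_rings(uint16_t port_id, uint16_t nb_rxq,
                    struct rte_mempool *mbuf_pool)
{
        uint16_t nb_rxd = 4096, nb_txd = 4096;
        int ret;

        /* Clamp the requested ring sizes to what the device supports. */
        ret = rte_eth_dev_adjust_nb_rx_tx_desc(port_id, &nb_rxd, &nb_txd);
        if (ret != 0)
                return ret;

        for (uint16_t q = 0; q < nb_rxq; q++) {
                ret = rte_eth_rx_queue_setup(port_id, q, nb_rxd,
                                rte_eth_dev_socket_id(port_id),
                                NULL, mbuf_pool);
                if (ret != 0)
                        return ret;
        }
        return 0;
}

MPRQ itself is enabled through the mlx5 device arguments on the EAL command
line, e.g. by appending ,mprq_en=1 to the -a <PCI address> entry for the port
(the PCI address is a placeholder).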

> and I get a significant performance drop with 33 lcores - 84 Mpps.
> With 63 cores I get as low as 35 Mpps overall receive performance.
>
> Are there any limitations on the total number of receive queues (total
> lcores) that can handle a single port on a given NIC?

This is a hardware limitation.
The limit on the number of queues you can create is very high (16M),
but performance can scale perfectly only up to 32 queues
at high packet rates (as opposed to bit rates).
Using more queues can even degrade it, just as you observe.
One way to overcome this (not specific to mlx5)
is to use a ring buffer for incoming packets,
from which any number of processing cores can take packets.
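
A rough sketch of that pattern (the ring name, size, burst size and the
drop-on-overflow policy are all placeholder choices): one Rx lcore drains
the NIC queues and feeds a multi-consumer rte_ring, and any number of
worker lcores take packets from it.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST 32

/* Single-producer (the Rx lcore), multi-consumer (the workers) ring.
 * Created elsewhere, e.g.:
 *   pkt_ring = rte_ring_create("rx_ring", 16384, rte_socket_id(),
 *                              RING_F_SP_ENQ);
 */
static struct rte_ring *pkt_ring;

static int
rx_lcore(void *arg)
{
        uint16_t port_id = *(uint16_t *)arg;
        struct rte_mbuf *pkts[BURST];

        for (;;) {
                /* One queue shown; a real loop would drain each Rx queue. */
                uint16_t nb = rte_eth_rx_burst(port_id, 0, pkts, BURST);
                if (nb == 0)
                        continue;
                unsigned int enq = rte_ring_enqueue_burst(pkt_ring,
                                (void **)pkts, nb, NULL);
                /* Drop what does not fit rather than stalling the Rx path. */
                for (unsigned int i = enq; i < nb; i++)
                        rte_pktmbuf_free(pkts[i]);
        }
        return 0;
}

static int
worker_lcore(void *arg)
{
        struct rte_mbuf *pkts[BURST];
        (void)arg;

        for (;;) {
                unsigned int nb = rte_ring_dequeue_burst(pkt_ring,
                                (void **)pkts, BURST, NULL);
                for (unsigned int i = 0; i < nb; i++) {
                        /* ... actual packet processing goes here ... */
                        rte_pktmbuf_free(pkts[i]);
                }
        }
        return 0;
}

This decouples the number of processing lcores from the number of Rx queues,
so the port can stay at a queue count that still scales well while the
remaining cores (launched with rte_eal_remote_launch()) take packets from
the ring.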