From: Dmitry Stepanov
Date: Fri, 11 Mar 2022 15:58:22 +0300
Subject: Re: Flow rules performance with ConnectX-6 Dx
To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Cc: users@dpdk.org

Hey, Dmitry!
Thanks for the reply!

I'm using a global RSS configuration (set up via rte_eth_dev_configure) which distributes incoming packets to different queues, and each queue is handled by a different lcore.
I've checked that incoming traffic is properly distributed among them: for example, with 16 queues (lcores) I see about 900 Kpps per lcore, which in total gives 15 Mpps.
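Roughly, the configuration I mean looks like the following minimal C sketch (not our exact code: the port/queue constants are made up for illustration, error handling is trimmed, and the RTE_ETH_-prefixed names are the post-21.11 spellings):

#include <rte_ethdev.h>

#define PORT_ID   0    /* hypothetical values for illustration */
#define NB_QUEUES 16
#define RING_SIZE 1024

static int setup_port_rss(struct rte_mempool *mb_pool)
{
    struct rte_eth_conf conf = {
        .rxmode = { .mq_mode = RTE_ETH_MQ_RX_RSS },
        .rx_adv_conf.rss_conf = {
            .rss_key = NULL,  /* keep the driver's default RSS key */
            .rss_hf  = RTE_ETH_RSS_IP | RTE_ETH_RSS_UDP | RTE_ETH_RSS_TCP,
        },
    };
    int socket = rte_eth_dev_socket_id(PORT_ID);
    int ret = rte_eth_dev_configure(PORT_ID, NB_QUEUES, NB_QUEUES, &conf);

    /* One RX/TX queue pair per lcore; each lcore then polls its own RX queue. */
    for (uint16_t q = 0; ret == 0 && q < NB_QUEUES; q++) {
        ret = rte_eth_rx_queue_setup(PORT_ID, q, RING_SIZE, socket,
                                     NULL, mb_pool);
        if (ret == 0)
            ret = rte_eth_tx_queue_setup(PORT_ID, q, RING_SIZE, socket, NULL);
    }
    return ret == 0 ? rte_eth_dev_start(PORT_ID) : ret;
}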

I was able to reproduce the same behavior with the testpmd utility.

My steps:

- Start the traffic generator at 50 Mpps with 2 destination IP addresses: 10.0.0.1 and 10.0.0.2

- Start testpmd in interactive mode with 16 queues/lcores:

numactl -N 1 -m 1 ./dpdk-testpmd -l 64-127 -a 0000:c1:00.0 -- --nb-cores=16 --rxq=16 --txq=16 -i

- Create the flow rule:

testpmd> flow create 0 group 0 priority 0 ingress pattern eth / ipv4 dst is 10.0.0.2 / end actions drop / end
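(For completeness, the same rule expressed directly through the rte_flow API would look roughly like this; a hypothetical sketch, not our exact application code:)

#include <rte_byteorder.h>
#include <rte_flow.h>
#include <rte_ip.h>

/* Hypothetical helper: drop everything with IPv4 dst 10.0.0.2. */
static struct rte_flow *create_drop_rule(uint16_t port_id,
                                         struct rte_flow_error *err)
{
    struct rte_flow_attr attr = { .ingress = 1 };  /* group 0, priority 0 */
    struct rte_flow_item_ipv4 ip_spec = {
        .hdr.dst_addr = rte_cpu_to_be_32(RTE_IPV4(10, 0, 0, 2)),
    };
    struct rte_flow_item_ipv4 ip_mask = {
        .hdr.dst_addr = RTE_BE32(0xffffffff),  /* match the full address */
    };
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4,
          .spec = &ip_spec, .mask = &ip_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_DROP },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    return rte_flow_create(port_id, &attr, pattern, actions, err);
}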

- Start forwarding:

testpmd> start

- Show stats (they show the same 15 Mpps instead of the expected 25 Mpps):

testpmd> show port stats 0

  ######################## NIC statistics for port 0  ########################
  RX-packets: 1127219612 RX-missed: 0          RX-bytes:  67633178722
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 1127219393 TX-errors: 0          TX-bytes:  67633171416

  Throughput (since last show)
  Rx-pps:     14759286          Rx-bps:   7084457512
  Tx-pps:     14758730          Tx-bps:   7084315448
  ############################################################################

- Ensure incoming traffic is properly distributed among queues (lcores):

testpmd> show port xstats 0

rx_q0_packets: 21841125
rx_q1_packets: 21847375
rx_q2_packets: 21833731
rx_q3_packets: 21837461
rx_q4_packets: 21842922
rx_q5_packets: 21843999
rx_q6_packets: 21838775
rx_q7_packets: 21833429
rx_q8_packets: 21838033
rx_q9_packets: 21835210
rx_q10_packets: 21833261
rx_q11_packets: 21833059
rx_q12_packets: 21849831
rx_q13_packets: 21843589
rx_q14_packets: 21842721
rx_q15_packets: 21834222
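(The same per-queue counters can also be read programmatically through the xstats API; a rough sketch, assuming the counter names shown above and with error handling mostly omitted:)

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <rte_ethdev.h>

/* Print per-queue RX packet counters, like "show port xstats" does. */
static void print_rx_queue_packets(uint16_t port_id)
{
    int n = rte_eth_xstats_get(port_id, NULL, 0);  /* query count only */
    if (n <= 0)
        return;

    struct rte_eth_xstat *vals = calloc(n, sizeof(*vals));
    struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));

    rte_eth_xstats_get(port_id, vals, n);
    rte_eth_xstats_get_names(port_id, names, n);
    for (int i = 0; i < n; i++)
        if (strstr(names[i].name, "rx_q") != NULL &&
            strstr(names[i].name, "_packets") != NULL)
            printf("%s: %" PRIu64 "\n", names[i].name, vals[i].value);

    free(vals);
    free(names);
}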

- If I use destination IP addresses that don't match the drop rule (replace 10.0.0.2 with 10.0.0.3), I get the expected 50 Mpps:

  ######################## NIC statistics for port 0  ########################
  RX-packets: 1988576249 RX-missed: 0          RX-bytes:  119314577228
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 1988576248 TX-errors: 0          TX-bytes:  119314576882

  Throughput (since last show)
  Rx-pps:     49999534          Rx-bps:  23999776424
  Tx-pps:     49999580          Tx-bps:  23999776424
  ############################################################################
On Fri, 11 Mar 2022 at 14:37, Dmitry Kozlyuk <dkozlyuk@nvidia.com> wrote:
> Hi Dmitry,
>
> Can it be that RSS, to which non-matching traffic gets by default,
> is configured in a way that steers each destination IP to one queue?
> And this 15 Mpps is in fact how much a core can read from a queue?
> In general, it is always worth trying to reproduce the issue with testpmd
> and to describe flow rules in full testpmd format ("flow create...").