* [dpdk-users] MLX ConnectX-4 Discarding packets
From: Filip Janiszewski @ 2021-09-10 13:34 UTC
  To: users
Hi,
I've switched a 100GbE Mellanox ConnectX-4 card from an Intel Xeon server to
an AMD EPYC server (75F3 CPU, 256 GiB of RAM, PCIe 4.0 lanes), and with the
same capture software we can't get above 10 Gbps: beyond that rate, regardless
of the number of queues configured, the rx_discards_phy counter starts to rise
and packets are lost in huge amounts.
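A counter like rx_discards_phy only shows that drops happen; sampling it twice and dividing by the interval gives a drop rate that is easier to compare between machines. A minimal Python sketch, assuming the two readings were already taken (for example from testpmd's `show port xstats`); the function name and the sample values are hypothetical:

```python
def counter_rate(first: int, second: int, interval_s: float) -> float:
    """Per-second increase of a monotonically rising NIC counter between two samples."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if second < first:
        raise ValueError("counter went backwards (reset or wrap?)")
    return (second - first) / interval_s


# Hypothetical rx_discards_phy readings taken one second apart.
print(counter_rate(1_000_000, 3_500_000, 1.0))  # 2500000.0
```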
On the Xeon machine, I could easily reach 50 Gbps with 4 queues.
Is there any specific DPDK configuration we should set up for these AMD
servers? The software is DPDK-based, so I wonder whether some build option is
missing somewhere.
What else should I look at to investigate this issue?
Thanks
-- 
BR, Filip
+48 666 369 823
* Re: [dpdk-users] MLX ConnectX-4 Discarding packets
From: Filip Janiszewski @ 2021-09-11  8:56 UTC
  To: users
I ran more tests.
This AMD server is a bit confusing: I can tune it to capture 28 Mpps (64-byte
frames) on a single core, so I would expect one more core to increase the
capture capability at least a little, but it doesn't. I get about 1% more
throughput, and then it drops packets regardless of how many queues are
configured. I have not observed this on the Intel server, where adding more
queues/cores scales to higher throughput.
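To relate packet rate and line rate: on Ethernet every frame also occupies 20 bytes of wire time beyond the frame itself (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap), so a 64-byte frame costs 84 bytes, or 672 bits. A small Python sketch of the conversion (helper names are illustrative); by this arithmetic 28 Mpps of 64-byte frames is roughly 18.8 Gbps, 30 Gbps is roughly 44.6 Mpps (the ~45 Mpps quoted later in the thread), and 100GbE line rate is about 148.8 Mpps:

```python
# Per-frame wire overhead on Ethernet: preamble (7) + SFD (1) + inter-frame gap (12).
WIRE_OVERHEAD_BYTES = 20


def mpps_from_gbps(gbps: float, frame_bytes: int = 64) -> float:
    """Packet rate (Mpps) that fills `gbps` of line rate with `frame_bytes` frames."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return gbps * 1e9 / bits_per_frame / 1e6


def gbps_from_mpps(mpps: float, frame_bytes: int = 64) -> float:
    """Line rate (Gbps) consumed by `mpps` million frames per second."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return mpps * 1e6 * bits_per_frame / 1e9


print(f"{gbps_from_mpps(28):.2f} Gbps")   # 18.82 Gbps
print(f"{mpps_from_gbps(30):.2f} Mpps")   # 44.64 Mpps
print(f"{mpps_from_gbps(100):.2f} Mpps")  # 148.81 Mpps
```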
This issue has now been verified with both Mellanox and Intel (810 series,
100GbE) NICs.
Has anybody encountered anything similar?
Thanks
-- 
BR, Filip
+48 666 369 823
* Re: [dpdk-users] MLX ConnectX-4 Discarding packets
From: Steffen Weise @ 2021-09-11 10:20 UTC
  To: Filip Janiszewski; +Cc: users
Hi Filip,
I have not seen the same issue.
Are you aware of this tuning guide? I applied it and had no issues with an
Intel 100G NIC.
HPC Tuning Guide for AMD EPYC Processors
http://developer.amd.com/wp-content/resources/56420.pdf
Hope it helps.
Cheers,
Steffen Weise
* Re: [dpdk-users] MLX ConnectX-4 Discarding packets
From: Filip Janiszewski @ 2021-09-11 14:19 UTC
  To: Steffen Weise; +Cc: users
Thanks,
I knew that document and we've implemented many of those settings/rules,
but perhaps there's a crucial one I've forgotten? I wonder which one.
Anyway, increasing the number of queues hurts performance. While sending 250M
packets over a 100GbE link to an Intel E810-CQDA2 NIC mounted in the EPYC
Milan server, I see:
.
1 queue,  30 Gbps, ~45 Mpps, 64B frames = imiss: 54,590,111
2 queues, 30 Gbps, ~45 Mpps, 64B frames = imiss: 79,394,138
4 queues, 30 Gbps, ~45 Mpps, 64B frames = imiss: 87,414,030
.
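Expressed as a share of the 250M frames sent, the misses above grow with the queue count (about 22%, 32% and 35%). A short Python sketch of that calculation, using only the numbers posted here (the helper name is illustrative):

```python
TOTAL_SENT = 250_000_000  # frames sent per run, as stated above

# imiss reported for each queue count in the runs above.
IMISS_BY_QUEUES = {1: 54_590_111, 2: 79_394_138, 4: 87_414_030}


def miss_ratios(imiss_by_queues: dict[int, int], total_sent: int) -> dict[int, float]:
    """Fraction of the sent frames that the NIC reported as missed, per queue count."""
    return {q: missed / total_sent for q, missed in imiss_by_queues.items()}


for queues, ratio in sorted(miss_ratios(IMISS_BY_QUEUES, TOTAL_SENT).items()):
    print(f"{queues} queue(s): {ratio:.1%} missed")
```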
This is with DPDK 21.02 on RHEL 8.4. I don't observe this when capturing on my
Intel server, where increasing the number of queues improves performance (with
this test input set I drop packets with one queue, but no longer drop any with
two).
A customer with a brand-new EPYC Milan server in their lab has observed this
scenario as well, which is a bit worrying, but again, might it be some
configuration/compilation issue we need to deal with?
By the way, the same issue can be reproduced with testpmd, using 4 queues and
the same input data set (250M 64-byte frames at 30 Gbps):
.
testpmd> stop
Telling cores to stop...
Waiting for lcores to finish...
  ------- Forward Stats for RX Port= 0/Queue= 0 -> TX Port= 0/Queue= 0 -------
  RX-packets: 41762999       TX-packets: 0              TX-dropped: 0
  ------- Forward Stats for RX Port= 0/Queue= 1 -> TX Port= 0/Queue= 1 -------
  RX-packets: 40152306       TX-packets: 0              TX-dropped: 0
  ------- Forward Stats for RX Port= 0/Queue= 2 -> TX Port= 0/Queue= 2 -------
  RX-packets: 41153402       TX-packets: 0              TX-dropped: 0
  ------- Forward Stats for RX Port= 0/Queue= 3 -> TX Port= 0/Queue= 3 -------
  RX-packets: 38341370       TX-packets: 0              TX-dropped: 0
  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 161410077      RX-dropped: 88589923      RX-total: 250000000
  TX-packets: 0              TX-dropped: 0             TX-total: 0
----------------------------------------------------------------------------
.
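The testpmd totals above are self-consistent: the four per-queue counts add up to the port's RX-packets, and RX-packets plus RX-dropped equals RX-total (250M, the number of frames sent), so every frame reached the port and about 35% of them were counted as dropped there. A minimal Python sketch of that check; `check_forward_stats` is a hypothetical helper, not a testpmd feature:

```python
def check_forward_stats(per_queue_rx: list[int], rx_packets: int,
                        rx_dropped: int, rx_total: int) -> float:
    """Check testpmd port totals against per-queue counts; return the drop ratio."""
    if sum(per_queue_rx) != rx_packets:
        raise ValueError("per-queue RX-packets do not add up to the port RX-packets")
    if rx_packets + rx_dropped != rx_total:
        raise ValueError("RX-packets + RX-dropped != RX-total")
    return rx_dropped / rx_total


# Numbers copied from the EPYC run above.
ratio = check_forward_stats(
    [41_762_999, 40_152_306, 41_153_402, 38_341_370],
    rx_packets=161_410_077,
    rx_dropped=88_589_923,
    rx_total=250_000_000,
)
print(f"dropped {ratio:.1%}")  # dropped 35.4%
```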
.
testpmd> show port xstats 0
###### NIC extended statistics for port 0
rx_good_packets: 161410081
tx_good_packets: 0
rx_good_bytes: 9684605284
tx_good_bytes: 0
rx_missed_errors: 88589923
.
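The xstats output can be scripted as well. A minimal Python sketch (the helper `parse_xstats` is hypothetical, not part of DPDK) that parses the text above; dividing rx_good_bytes by rx_good_packets gives about 60 bytes per packet, which would be consistent with 64-byte frames counted without the 4-byte FCS, since DPDK strips the CRC unless the KEEP_CRC Rx offload is enabled (whether byte counters include it can depend on the driver):

```python
def parse_xstats(text: str) -> dict[str, int]:
    """Collect 'name: value' counters from testpmd `show port xstats` output."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips banner lines such as '###### NIC ...'
            stats[name.strip()] = int(value)
    return stats


SAMPLE = """\
###### NIC extended statistics for port 0
rx_good_packets: 161410081
tx_good_packets: 0
rx_good_bytes: 9684605284
tx_good_bytes: 0
rx_missed_errors: 88589923
"""

xstats = parse_xstats(SAMPLE)
avg = xstats["rx_good_bytes"] / xstats["rx_good_packets"]
print(f"average bytes per received packet: {avg:.2f}")  # 60.00
```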
I can't figure out what's wrong here.
-- 
BR, Filip
+48 666 369 823
* Re: [dpdk-users] MLX ConnectX-4 Discarding packets
From: Filip Janiszewski @ 2021-09-11 14:34 UTC
  To: Steffen Weise; +Cc: users
I just wanted to add that when running exactly the same testpmd on the other
machine, I don't get a single miss with the same traffic pattern:
.
testpmd> stop
Telling cores to stop...
Waiting for lcores to finish...
  ------- Forward Stats for RX Port= 0/Queue= 0 -> TX Port= 0/Queue= 0 -------
  RX-packets: 61711939       TX-packets: 0              TX-dropped: 0
  ------- Forward Stats for RX Port= 0/Queue= 1 -> TX Port= 0/Queue= 1 -------
  RX-packets: 62889424       TX-packets: 0              TX-dropped: 0
  ------- Forward Stats for RX Port= 0/Queue= 2 -> TX Port= 0/Queue= 2 -------
  RX-packets: 61914199       TX-packets: 0              TX-dropped: 0
  ------- Forward Stats for RX Port= 0/Queue= 3 -> TX Port= 0/Queue= 3 -------
  RX-packets: 63484438       TX-packets: 0              TX-dropped: 0
  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 250000000      RX-dropped: 0             RX-total: 250000000
  TX-packets: 0              TX-dropped: 0             TX-total: 0
----------------------------------------------------------------------------
  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 250000000      RX-dropped: 0             RX-total: 250000000
  TX-packets: 0              TX-dropped: 0             TX-total: 0
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
.
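Comparing the per-queue counts of the two runs is a quick way to rule out uneven RSS spreading. A minimal Python sketch using the numbers posted in this thread (the helper name is illustrative):

```python
def queue_shares(per_queue_rx: list[int]) -> list[float]:
    """Fraction of received packets that RSS steered to each queue."""
    total = sum(per_queue_rx)
    return [n / total for n in per_queue_rx]


# Per-queue RX-packets copied from the two testpmd runs in this thread.
RUNS = {
    "EPYC": [41_762_999, 40_152_306, 41_153_402, 38_341_370],
    "Xeon": [61_711_939, 62_889_424, 61_914_199, 63_484_438],
}

for host, counts in RUNS.items():
    shares = queue_shares(counts)
    spread = max(shares) / min(shares)
    print(host, " ".join(f"{s:.1%}" for s in shares), f"max/min={spread:.2f}")
```

On these numbers each of the four queues receives roughly a quarter of the traffic on both machines (max/min about 1.09 on the EPYC and 1.03 on the Xeon), so an RSS imbalance does not look like the cause of the EPYC drops.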
In the lab I have the EPYC connected directly to the Xeon over a 100GbE link,
both running the same RHEL 8.4 and the same DPDK 21.02, with:
.
./dpdk-testpmd -l 21-31 -n 8 -w 81:00.1 -- -i --rxq=4 --txq=4 \
  --burst=64 --forward-mode=rxonly --rss-ip --total-num-mbufs=4194304 \
  --nb-cores=4
.
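One check that often comes up when a NIC changes servers or slots (not something reported in this thread) is whether the lcores passed with `-l` sit on the same NUMA node as the NIC. A Python sketch reading the standard Linux sysfs files `/sys/bus/pci/devices/<BDF>/numa_node` and `/sys/devices/system/node/node<N>/cpulist`; the helper names are hypothetical and the `0000` PCI domain for `81:00.1` is an assumption:

```python
from pathlib import Path


def parse_cpulist(spec: str) -> set[int]:
    """Expand a Linux cpulist string such as '0-3,8,10-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_numa_node(bdf: str) -> int:
    """NUMA node of a PCI device from sysfs (-1 means no affinity is reported)."""
    return int(Path(f"/sys/bus/pci/devices/{bdf}/numa_node").read_text())


def node_cpus(node: int) -> set[int]:
    """CPUs belonging to a NUMA node, from sysfs."""
    return parse_cpulist(Path(f"/sys/devices/system/node/node{node}/cpulist").read_text())


def lcores_off_node(bdf: str, lcores: str) -> set[int]:
    """Requested lcores that are NOT on the NIC's NUMA node."""
    node = nic_numa_node(bdf)
    if node < 0:
        return set()  # kernel reports no NUMA affinity, nothing to compare
    return parse_cpulist(lcores) - node_cpus(node)


if __name__ == "__main__":
    bdf = "0000:81:00.1"  # from -w 81:00.1 above; the 0000 domain is assumed
    if Path(f"/sys/bus/pci/devices/{bdf}").exists():
        print("lcores off the NIC's NUMA node:", lcores_off_node(bdf, "21-31"))
    else:
        print(f"{bdf} not present on this machine")
```

On a single-socket EPYC configured as one NUMA node per socket (NPS1), every core may sit on the NIC's node, so an empty result there says little on its own.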
and sending from the other end with pktgen, the EPYC loses tons of packets
(see my previous email), while the Xeon doesn't lose any.
*Confusion!*
-- 
BR, Filip
+48 666 369 823
* Re: [dpdk-users] MLX ConnectX-4 Discarding packets
From: Filip Janiszewski @ 2021-09-12  9:32 UTC
  To: Steffen Weise; +Cc: users
Alright, I nailed it down to a wrong preferred PCIe device setting in the BIOS
configuration; it had not been changed after the NIC was moved to another
PCIe slot.
Now the EPYC is performing really well, easily reaching a 100 Gbps rate.
Thanks
-- 
BR, Filip
+48 666 369 823
* Re: [dpdk-users] MLX ConnectX-4 Discarding packets
From: Thomas Monjalon @ 2021-09-29 10:43 UTC
  To: Steffen Weise; +Cc: users, Filip Janiszewski
Great, thanks for the update!