DPDK usage discussions
From: Yaron Illouz <yaroni@radcom.com>
To: Matan Azrad <matan@nvidia.com>, "users@dpdk.org" <users@dpdk.org>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-users] imissed drop with mellanox connectx5
Date: Thu, 22 Jul 2021 10:34:31 +0000
Message-ID: <AM9PR09MB499503443149770A0D5D2BBDA1E49@AM9PR09MB4995.eurprd09.prod.outlook.com>
In-Reply-To: <DM4PR12MB5389E4135E4AD5A69F9BBF57DFE49@DM4PR12MB5389.namprd12.prod.outlook.com>

Hi Matan

We use mbufs in all threads and lcores, and we pass them from one thread to another through a DPDK ring before releasing them.
We see drops of 10K to 100K pps, and we cannot live with these drops.

The drops show up in the imissed counter from rte_eth_stats_get, so I assumed they happen at the port level and not at the mempool level.
From what I see, the number of mbufs in the pool is stable (and close to the total number of mbufs originally in the pool), the rings are empty, traffic is well balanced between the threads, and all threads are busy polling the port and the rings.
The perf top profiler does not show any unexpected function taking CPU either.
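
To make the counter reading concrete, here is a minimal sketch of how a drop rate can be derived from two successive samples. It assumes the values are copied from the ipackets and imissed fields that rte_eth_stats_get() fills in. The port_sample struct and both helper names are hypothetical, and treating ipackets + imissed as the offered load is an approximation that ignores the other error counters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rte_eth_stats fields used here.
 * In a real application the values would be copied from the structure
 * filled in by rte_eth_stats_get(). */
struct port_sample {
    uint64_t ipackets; /* packets successfully received */
    uint64_t imissed;  /* packets dropped by the NIC, no RX buffer available */
    double   t_sec;    /* time the sample was taken, in seconds */
};

/* Missed packets per second between two samples. */
double missed_pps(const struct port_sample *prev, const struct port_sample *cur)
{
    assert(cur->t_sec > prev->t_sec);
    return (double)(cur->imissed - prev->imissed) / (cur->t_sec - prev->t_sec);
}

/* Fraction of the offered packets that were missed in the interval.
 * Offered load is approximated as received + missed; returns 0.0 when idle. */
double missed_ratio(const struct port_sample *prev, const struct port_sample *cur)
{
    uint64_t missed = cur->imissed - prev->imissed;
    uint64_t rx = cur->ipackets - prev->ipackets;
    uint64_t offered = missed + rx;

    return offered ? (double)missed / (double)offered : 0.0;
}
```

Sampling once per second and logging both values next to the mempool and ring occupancy makes it easier to see whether imissed grows while the pool and rings look healthy.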

So the only possible architecture would be to implement all the logic in the threads that read from the port, and to launch hundreds of threads in multi-queue mode that read from the port? I don't think this is a viable solution. (The following link, for example, shows an application that passes packets from one core/thread to another: https://doc.dpdk.org/guides-16.04/sample_app_ug/qos_scheduler.html )

Thank you for your answer

-----Original Message-----
From: Matan Azrad <matan@nvidia.com> 
Sent: Thursday, July 22, 2021 8:19 AM
To: Yaron Illouz <yaroni@radcom.com>; users@dpdk.org
Cc: dev@dpdk.org
Subject: RE: imissed drop with mellanox connectx5

Hi Yaron

Freeing mbufs from a different lcore than the one that allocated them causes a miss in the mempool cache of the original lcore on every mbuf allocation, so the PMD always gets non-hot mbufs to work with.

It can be one of the reasons for the earlier drops you see.

Matan
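
The effect described above can be illustrated with a toy model. This is not DPDK code: the names toy_pool, toy_cache, toy_get, toy_put and toy_run are hypothetical, and the refill and flush rules are only loosely modelled on the per-lcore mempool cache (refill up to the cache size on a miss, flush back down to the cache size at 1.5x). The cache size of 256 and the burst of 32 are assumptions chosen for the example. The model counts how many objects have to travel through the shared pool.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SIZE   256u
#define FLUSH_THRESH 384u /* 1.5 x CACHE_SIZE, an assumption of this model */

struct toy_cache { unsigned len; };  /* objects held in one lcore's cache */
struct toy_pool {
    unsigned common;        /* objects available in the shared pool */
    uint64_t backend_objs;  /* objects moved to or from the shared pool */
};

/* Take n objects through a per-lcore cache, refilling from the shared pool
 * when the cache does not hold enough. Returns 0 on success, -1 if empty. */
int toy_get(struct toy_pool *p, struct toy_cache *c, unsigned n)
{
    assert(n <= CACHE_SIZE);
    if (c->len < n) {
        unsigned req = n + (CACHE_SIZE - c->len);
        if (p->common < req)
            return -1;
        p->common -= req;
        c->len += req;
        p->backend_objs += req;
    }
    c->len -= n;
    return 0;
}

/* Return n objects to a per-lcore cache, flushing the excess over
 * CACHE_SIZE to the shared pool once FLUSH_THRESH is reached. */
void toy_put(struct toy_pool *p, struct toy_cache *c, unsigned n)
{
    c->len += n;
    if (c->len >= FLUSH_THRESH) {
        unsigned excess = c->len - CACHE_SIZE;
        p->common += excess;
        p->backend_objs += excess;
        c->len = CACHE_SIZE;
    }
}

/* Allocate 32 objects on the RX lcore and free them either on the same
 * lcore or on a second one, `bursts` times. Returns the number of objects
 * that went through the shared pool. */
uint64_t toy_run(int same_lcore, unsigned bursts)
{
    struct toy_pool pool = { .common = 65536, .backend_objs = 0 };
    struct toy_cache rx = { 0 }, worker = { 0 };
    struct toy_cache *freer = same_lcore ? &rx : &worker;

    for (unsigned i = 0; i < bursts; i++) {
        if (toy_get(&pool, &rx, 32) != 0)
            break;
        toy_put(&pool, freer, 32);
    }
    return pool.backend_objs;
}
```

In this model, 10000 bursts freed on the allocating lcore move only the initial 288 objects through the shared pool, while the same 10000 bursts freed on another lcore move 640000 objects through it, that is, every mbuf is taken from and returned to the shared pool.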

From: Yaron Illouz
> Hi
> 
> We are trying to read from a 100G Mellanox ConnectX-5 NIC without drops at the NIC.
> All threads use core pinning and CPU isolation.
> We use DPDK 19.11.
> I tried to apply all the configuration settings described in
> https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf
> 
> We see strange behavior: 1 thread can receive 20 Gbps / 12 Mpps and
> free the mbufs without drops, but when it passes these mbufs to another
> thread that only frees them, there are drops, even when we use more
> threads.
> 
> When running 1 thread that only reads from the port (no multi-queue) and
> frees the mbufs in the same thread, there are no drops with traffic up to
> 21 Gbps / 12.4 Mpps.
> When running 6 threads that only read from the port (with multi-queue) and
> free the mbufs in the same threads, there are no drops with traffic up to
> 21 Gbps / 12.4 Mpps.
> 
> When running 1 to 6 threads that only read from the port and pass the mbufs
> to another 6 threads that only read from the ring and free the mbufs, there
> are drops at the NIC (imissed counter) with traffic above 10 Gbps / 5.2 Mpps.
> (Here the receive threads were pinned to CPUs 1-6 and the additional threads
> to CPUs 7-12, each thread on a single CPU.) Each receive thread sends to one
> thread that frees the buffers.
> 
> Configurations:
> 
> We use rings of size 32768 between the threads. The rings are initialized
> as SP/SC, and writes are done in bursts of 512 with rte_ring_enqueue_burst.
> The port is initialized with rte_eth_rx_queue_setup, nb_rx_desc=8192.
> rte_eth_rxconf:
>     rx_conf.rx_thresh.pthresh = DPDK_NIC_RX_PTHRESH; // ring prefetch threshold
>     rx_conf.rx_thresh.hthresh = DPDK_NIC_RX_HTHRESH; // ring host threshold
>     rx_conf.rx_thresh.wthresh = DPDK_NIC_RX_WTHRESH; // ring writeback threshold
>     rx_conf.rx_free_thresh = DPDK_NIC_RX_FREE_THRESH;
> rss: ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP;
> 
> 
> We tried to work with and without hyperthreading.
> 
> ****************************************
> 
> Network devices using kernel driver
> ===================================
> 0000:37:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2f0 drv=mlx5_core unused=igb_uio
> 0000:37:00.1 'MT27800 Family [ConnectX-5] 1017' if=ens2f1 drv=mlx5_core unused=igb_uio
> 
> ****************************************
> 
> ethtool -i ens2f0
> driver: mlx5_core
> version: 5.3-1.0.0
> firmware-version: 16.30.1004 (HPE0000000009)
> expansion-rom-version:
> bus-info: 0000:37:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
> supports-priv-flags: yes
> 
> ****************************************
> 
> uname -a
> Linux localhost.localdomain 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> 
> ****************************************
> 
> lscpu | grep -e Socket -e Core -e Thread
> Thread(s) per core:    1
> Core(s) per socket:    24
> Socket(s):             2
> 
> ****************************************
> cat /sys/devices/system/node/node0/cpulist
> 0-23
> ****************************************
> From /proc/cpuinfo
> 
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 85
> model name      : Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz
> stepping        : 7
> microcode       : 0x5003003
> cpu MHz         : 2200.000
> 
> ****************************************
> 
> python /home/cpu_layout.py
> ======================================================================
> Core and Socket Information (as reported by '/sys/devices/system/cpu')
> ======================================================================
> 
> cores =  [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 25, 26, 27, 28, 29, 24]
> sockets =  [0, 1]
> 
>         Socket 0    Socket 1
>         --------    --------
> Core 0  [0]         [24]
> Core 1  [1]         [25]
> Core 2  [2]         [26]
> Core 3  [3]         [27]
> Core 4  [4]         [28]
> Core 5  [5]         [29]
> Core 6  [6]         [30]
> Core 8  [7]
> Core 9  [8]         [31]
> Core 10 [9]         [32]
> Core 11 [10]        [33]
> Core 12 [11]        [34]
> Core 13 [12]        [35]
> Core 16 [13]        [36]
> Core 17 [14]        [37]
> Core 18 [15]        [38]
> Core 19 [16]        [39]
> Core 20 [17]        [40]
> Core 21 [18]        [41]
> Core 25 [19]        [43]
> Core 26 [20]        [44]
> Core 27 [21]        [45]
> Core 28 [22]        [46]
> Core 29 [23]        [47]
> Core 24             [42]
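
For scale, the figures quoted in the report (nb_rx_desc=8192, rings of 32768 entries, 12.4 Mpps and 5.2 Mpps, a 2.20 GHz CPU) can be turned into time budgets. The sketch below is plain arithmetic under simplifying assumptions: the whole packet rate lands on a single queue or ring, the consumer is completely stalled while the queue fills, and the core runs at its nominal frequency. The function names are hypothetical.

```c
#include <assert.h>

/* Microseconds a queue of `slots` entries can absorb traffic arriving at
 * `mpps` million packets per second while its consumer is stalled. */
double absorb_time_us(unsigned slots, double mpps)
{
    assert(mpps > 0.0);
    return (double)slots / mpps;
}

/* Average time available per packet, in nanoseconds, on a single core. */
double ns_per_packet(double mpps)
{
    assert(mpps > 0.0);
    return 1000.0 / mpps;
}

/* Average CPU cycles available per packet on a single core at `ghz`. */
double cycles_per_packet(double mpps, double ghz)
{
    assert(mpps > 0.0);
    return ghz * 1000.0 / mpps;
}
```

Under these assumptions, 8192 descriptors absorb about 661 microseconds of traffic at 12.4 Mpps, a 32768-entry ring absorbs about 6.3 milliseconds at 5.2 Mpps, and a single core at 2.2 GHz has roughly 80.6 ns, or about 177 cycles, per packet at 12.4 Mpps. With several RX queues the per-queue rate is lower, so these are worst-case figures.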

Thread overview: 4+ messages
2021-07-21 15:27 Yaron Illouz
2021-07-22  5:19 ` Matan Azrad
2021-07-22 10:34   ` Yaron Illouz [this message]
2021-07-24  6:31     ` Gerry Wan
