DPDK patches and discussions
 help / color / mirror / Atom feed
From: Ferruh Yigit <ferruh.yigit@amd.com>
To: Gaoxiang Liu <gaoxiangliu0@163.com>, chas3@att.com, humin29@huawei.com
Cc: olivier.matz@6wind.com, dev@dpdk.org, liugaoxiang@huawei.com,
	"Morten Brørup" <mb@smartsharesystems.com>
Subject: Re: [PATCH v7] net/bonding: another fix to LACP mempool size
Date: Fri, 12 Apr 2024 20:04:09 +0100	[thread overview]
Message-ID: <1c487911-d7a4-44dd-be1a-4813fe1f0f36@amd.com> (raw)
In-Reply-To: <20220328151652.221-1-gaoxiangliu0@163.com>

On 3/28/2022 4:16 PM, Gaoxiang Liu wrote:
> The following log message may appear after a slave is idle(or nearly
> idle)
> for a few minutes:"PMD: Failed to allocate LACP packet from pool".
> And bond mode 4 negotiation may fail.
> 
> Problem:When bond mode 4 has been chosed and delicated queue has
> not been enable, all mbufs from a slave' private pool(used
> exclusively for transmitting LACPDUs) have been allocated in
> interrupt thread, and are still sitting in the device's tx
> descriptor ring and other cores' mempool caches in fwd thread.
> Thus the interrupt thread can not alloc LACP packet from pool.
> 

Hi Gaoxiang,

Briefly, this patch increases the mempool size, an additional amount to
compensate cache size.

Above mentions two reasons for increase,
1. mbufs in Tx ring of the device. As far as I can see mbufs allocated
from 'mbuf_pool' and enqueued to device Tx ring, and if what I
understand above is Tx queue may be disabled causing mbufs stuck in this
ring.

2. mbufs stuck in mempool per core cache.


Increasing mempool size can be an option but can we address root cause,
for 1. if the device Tx queue is disabled can we skip enqueue to the ring.

And for 2. as you said mbuf is allocated only in the interrupt thread
which doesn't have any cache, so not sure why other lcore caches filled.
Also can we disable the caches instead, as this is already used for low
traffic?


> Solution: Ensure that each slave'tx (LACPDU) mempool owns more than
> n-tx-queues * n-tx-descriptor + fwd_core_num *
> per-core-mmempool-flush-threshold mbufs.
> 
> Note that the LACP tx machine fuction is the only code that allocates
> from a slave's private pool. It runs in the context of the interrupt
> thread, and thus it has no mempool cache of its own.
> 
> Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
> 
> ---
> v2:
> * Fixed compile issues.
> 
> v3:
> * delete duplicate code.
> 
> v4;
> * Fixed some issues.
> 1. total_tx_desc should use +=
> 2. add detailed logs
> 
> v5:
> * Fixed some issues.
> 1. move CACHE_FLUSHTHRESH_MULTIPLIER to rte_eth_bond-8023ad.c
> 2. use RTE_MIN
> 
> v6:
> * add a comment of CACHE_FLUSHTHRESH_MULTIPLIER macro
> 
> v7:
> * Fixed some issues.
> 1. move CACHE_FLUSHTHRESH_MULTIPLIER to rte_mempool.h
> ---
>  drivers/net/bonding/rte_eth_bond_8023ad.c | 7 ++++---
>  lib/mempool/rte_mempool.h                 | 2 ++
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
> index ca50583d62..f7f6828126 100644
> --- a/drivers/net/bonding/rte_eth_bond_8023ad.c
> +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
> @@ -1050,6 +1050,7 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev,
>  	uint32_t total_tx_desc;
>  	struct bond_tx_queue *bd_tx_q;
>  	uint16_t q_id;
> +	uint32_t cache_size;
>  
>  	/* Given slave mus not be in active list */
>  	RTE_ASSERT(find_slave_by_id(internals->active_slaves,
> @@ -1100,11 +1101,11 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev,
>  		total_tx_desc += bd_tx_q->nb_tx_desc;
>  	}
>  
> +	cache_size = RTE_MIN(RTE_MEMPOOL_CACHE_MAX_SIZE, 32);
>

I can see it is coming from old code, but do you know why 32 selected as
cache size, is it an arbitrary value?

> +	total_tx_desc += rte_lcore_count() * cache_size * RTE_MEMPOOL_CACHE_FLUSHTHRESH_MULTIPLIER;
>  	snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_pool", slave_id);
>  	port->mbuf_pool = rte_pktmbuf_pool_create(mem_name, total_tx_desc,
> -		RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ?
> -			32 : RTE_MEMPOOL_CACHE_MAX_SIZE,
> -		0, element_size, socket_id);
> +		cache_size, 0, element_size, socket_id);
>  
>  	/* Any memory allocation failure in initialization is critical because
>  	 * resources can't be free, so reinitialization is impossible. */
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1e7a3c1527..fa15ed710f 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -56,6 +56,8 @@
>  extern "C" {
>  #endif
>  
> +#define RTE_MEMPOOL_CACHE_FLUSHTHRESH_MULTIPLIER 1.5
> +
>

I can see we delve into discussion related to this exposing this value
from library, instead of having a copy in driver (which is valid point).

Looking to the reason to usage, driver tries to compensate the mbufs may
be may be stuck in the cache till they hit the threshold. But this flush
threshold value can be driver internal, and it seems original intention
is not have something driver/application be aware of this value.

So instead of using this value, what about first try disabling cache as
mentioned above?


      parent reply	other threads:[~2024-04-12 19:04 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-04  9:56 [PATCH] " Gaoxiang Liu
2022-03-05  2:26 ` [PATCH v2] " Gaoxiang Liu
2022-03-05  7:09   ` [PATCH v3] " Gaoxiang Liu
2022-03-08  1:17     ` Min Hu (Connor)
2022-03-08 14:24     ` [PATCH v4] " Gaoxiang Liu
2022-03-09  0:32       ` Min Hu (Connor)
2022-03-09  1:25       ` Stephen Hemminger
2022-03-09  2:53         ` Min Hu (Connor)
2022-03-25 12:02           ` Gaoxiang Liu
2022-03-09  1:26       ` Stephen Hemminger
2022-03-25 12:10       ` [PATCH v5] " Gaoxiang Liu
2022-03-25 13:01         ` Gaoxiang Liu
2022-03-25 13:13           ` Morten Brørup
2022-03-26 12:57             ` Wang, Haiyue
2022-03-25 13:34           ` [PATCH v6] " Gaoxiang Liu
2022-03-25 14:04             ` Morten Brørup
2022-03-28 15:16             ` [PATCH v7] " Gaoxiang Liu
2022-04-29 14:20               ` Ferruh Yigit
2022-05-01  7:02                 ` Matan Azrad
2024-04-12 19:04               ` Ferruh Yigit [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1c487911-d7a4-44dd-be1a-4813fe1f0f36@amd.com \
    --to=ferruh.yigit@amd.com \
    --cc=chas3@att.com \
    --cc=dev@dpdk.org \
    --cc=gaoxiangliu0@163.com \
    --cc=humin29@huawei.com \
    --cc=liugaoxiang@huawei.com \
    --cc=mb@smartsharesystems.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).