DPDK patches and discussions
* [dpdk-dev] [discussion] mlx4 driver MLX4_PMD_TX_MP_CACHE default value
@ 2017-07-28  7:58 chenchanghu
  2017-07-28  8:40 ` Adrien Mazarguil
  0 siblings, 1 reply; 4+ messages in thread
From: chenchanghu @ 2017-07-28  7:58 UTC (permalink / raw)
  To: dev, adrien.mazarguil, nelio.laranjeiro
  Cc: Zhoujingbin, Zhoulei (G),
	Deng Kairong, Chenrujie, cuiyayun, Chengwei (Titus),
	Lixuan (Alex)

Hi,
         When I used the mlx4 PMD, I met a problem with the MLX4_PMD_TX_MP_CACHE value, which sizes the per-TX-queue cache used for memory pool (MP) to memory region (MR) translation. The test details are described below.
1. Test environment information:
  a. Linux distribution: CentOS
  b. DPDK version: dpdk-16.04
  c. Ethernet device: mlx4 VF
  d. PMD info: mlx4 poll-mode driver

2. Test diagram:
+-----------+    +-----------+            +-----------+
|  client1  |    |  client2  |   ......   |  clientN  |
+-----+-----+    +-----+-----+            +-----+-----+
      |                |                        |
      v                v                        v
+-----------------------------------------------------+
|                 shared memory queue                  |
+--------------------------+--------------------------+
                           |
                           v
+-----------------------------------------------------+
|                        server                        |
+--------------------------+--------------------------+
                           |
                           v
+-----------------------------------------------------+
|                dpdk rte_eth_tx_burst                 |
+--------------------------+--------------------------+
                           |
                           v
+-----------------------------------------------------+
|                   mlx4 pmd driver                    |
+-----------------------------------------------------+
  a. Every client has its own memory pool; all clients send messages to the server queue in shared memory.
  b. The server runs as a single thread, and the mlx4 PMD uses one TX queue (a minimal sketch of this TX path follows below).
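
For reference, a minimal sketch of this single-threaded TX path (assuming the mbufs have already been dequeued from the shared memory queue into an array; the names and retry policy are illustrative, not taken from the actual application):

/*
 * Minimal sketch of the single-threaded server TX path: one port, one
 * TX queue (queue 0).  Illustrative only.
 */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static void
server_tx(uint16_t port_id, struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t sent = 0;

	/* Retry until the whole burst has been accepted by the PMD. */
	while (sent < nb_pkts)
		sent += rte_eth_tx_burst(port_id, 0, &pkts[sent],
					 nb_pkts - sent);
}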

3. Test steps:
  a. We start 30 clients, so the total number of mempools reaches 30. Every client sends 20 packets/second, and each packet is 10 KB long. The server segments these large packets before passing them to rte_eth_tx_burst.
  b. With the mlx4 PMD default MLX4_PMD_TX_MP_CACHE value of 8, we found that rte_eth_tx_burst took about 40 ms, most of which was spent in ibv_reg_mr.
  c. After changing the MLX4_PMD_TX_MP_CACHE value to 32 (by setting CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE in config/common_base), rte_eth_tx_burst took less than 5 ms (see the sketch below).
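
The slowdown above is consistent with the per-TX-queue MP-to-MR cache overflowing: once more mempools are in flight on a queue than the cache can hold, registrations keep being redone on the hot path. A simplified sketch of this kind of lookup, for illustration only (this is not the actual mlx4 PMD code; the structure, function name and the addr/len parameters are hypothetical):

/*
 * Simplified illustration of a per-TX-queue MP -> MR cache in front of
 * ibv_reg_mr().  NOT the actual mlx4 PMD code.
 */
#include <infiniband/verbs.h>
#include <rte_mempool.h>
#include <string.h>

#define TX_MP_CACHE 8	/* default CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE */

struct mp2mr_entry {
	const struct rte_mempool *mp;
	struct ibv_mr *mr;
};

static struct ibv_mr *
mp2mr(struct mp2mr_entry cache[TX_MP_CACHE], struct ibv_pd *pd,
      const struct rte_mempool *mp, void *addr, size_t len)
{
	struct ibv_mr *mr;
	unsigned int i;

	/* Fast path: this mempool was seen recently. */
	for (i = 0; i != TX_MP_CACHE; ++i)
		if (cache[i].mp == mp)
			return cache[i].mr;
	/*
	 * Cache miss: register the mempool memory, a slow control-path
	 * operation.  With more than TX_MP_CACHE mempools hitting one TX
	 * queue, entries keep being evicted and ibv_reg_mr() ends up on
	 * the datapath, which matches the ~40 ms bursts observed above.
	 */
	mr = ibv_reg_mr(pd, addr, len, IBV_ACCESS_LOCAL_WRITE);
	if (mr == NULL)
		return NULL;
	/* Use a free slot if any, otherwise evict the oldest entry. */
	for (i = 0; i != TX_MP_CACHE && cache[i].mp != NULL; ++i)
		;
	if (i == TX_MP_CACHE) {
		ibv_dereg_mr(cache[0].mr);
		memmove(&cache[0], &cache[1],
			(TX_MP_CACHE - 1) * sizeof(cache[0]));
		i = TX_MP_CACHE - 1;
	}
	cache[i].mp = mp;
	cache[i].mr = mr;
	return mr;
}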

   Would the community consider changing the default CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE value to 32, to cover scenarios like the one described above and avoid the slow path that is hit
when the number of mempools used on one TX queue exceeds CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE?
   Please send your reply to chenchanghu@huawei.com<mailto:chenchanghu@huawei.com>; any suggestion is gratefully appreciated.
4. Patch:
diff --git a/config/common_base b/config/common_base
index a0580d1..af6ba47 100644
--- a/config/common_base
+++ b/config/common_base
@@ -207,7 +207,7 @@ CONFIG_RTE_LIBRTE_MLX4_PMD=y
CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
-CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
+CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=32
CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] [discussion] mlx4 driver MLX4_PMD_TX_MP_CACHE default value
  2017-07-28  7:58 [dpdk-dev] [discussion] mlx4 driver MLX4_PMD_TX_MP_CACHE default value chenchanghu
@ 2017-07-28  8:40 ` Adrien Mazarguil
  0 siblings, 0 replies; 4+ messages in thread
From: Adrien Mazarguil @ 2017-07-28  8:40 UTC (permalink / raw)
  To: chenchanghu
  Cc: dev, nelio.laranjeiro, Zhoujingbin, Zhoulei (G),
	Deng Kairong, Chenrujie, cuiyayun, Chengwei (Titus),
	Lixuan (Alex)

Hi,

On Fri, Jul 28, 2017 at 07:58:48AM +0000, chenchanghu wrote:
> Hi,
>          When I used the mlx4 PMD, I met a problem with the MLX4_PMD_TX_MP_CACHE value, which sizes the per-TX-queue cache used for memory pool (MP) to memory region (MR) translation. The test details are described below.
> 1. Test environment information:
>   a. Linux distribution: CentOS
>   b. DPDK version: dpdk-16.04
>   c. Ethernet device: mlx4 VF
>   d. PMD info: mlx4 poll-mode driver
> 
> 2. Test diagram:
> +-----------+    +-----------+            +-----------+
> |  client1  |    |  client2  |   ......   |  clientN  |
> +-----+-----+    +-----+-----+            +-----+-----+
>       |                |                        |
>       v                v                        v
> +-----------------------------------------------------+
> |                 shared memory queue                  |
> +--------------------------+--------------------------+
>                            |
>                            v
> +-----------------------------------------------------+
> |                        server                        |
> +--------------------------+--------------------------+
>                            |
>                            v
> +-----------------------------------------------------+
> |                dpdk rte_eth_tx_burst                 |
> +--------------------------+--------------------------+
>                            |
>                            v
> +-----------------------------------------------------+
> |                   mlx4 pmd driver                    |
> +-----------------------------------------------------+
>   a. Every client has its own memory pool; all clients send messages to the server queue in shared memory.
>   b. The server runs as a single thread, and the mlx4 PMD uses one TX queue.
> 
> 3. Test steps:
>   a. We start 30 clients, so the total number of mempools reaches 30. Every client sends 20 packets/second, and each packet is 10 KB long. The server segments these large packets before passing them to rte_eth_tx_burst.
>   b. With the mlx4 PMD default MLX4_PMD_TX_MP_CACHE value of 8, we found that rte_eth_tx_burst took about 40 ms, most of which was spent in ibv_reg_mr.
>   c. After changing the MLX4_PMD_TX_MP_CACHE value to 32 (by setting CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE in config/common_base), rte_eth_tx_burst took less than 5 ms.

Yep, this is an old limitation that also affects the mlx5 PMD (and recently
discussed [1]).

>    Would the community consider changing the default CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE value to 32, to cover scenarios like the one described above and avoid the slow path that is hit
> when the number of mempools used on one TX queue exceeds CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE?
> 
>    Please send your reply to chenchanghu@huawei.com<mailto:chenchanghu@huawei.com>; any suggestion is gratefully appreciated.
> 
> 4. Patch:
> diff --git a/config/common_base b/config/common_base
> index a0580d1..af6ba47 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -207,7 +207,7 @@ CONFIG_RTE_LIBRTE_MLX4_PMD=y
> CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
> CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
> CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
> -CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
> +CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=32
> CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1

In the past, using 8 as the default MP cache size was deemed a fair
trade-off since most applications only used a small number of
mempools. Also, recompiling DPDK after tweaking compilation options to suit
a specific application's needs was not much of an issue.

Today DPDK is more library-like, so setting compile-time parameters is not
always possible for applications; this is why we're in the process of
removing these options altogether as part of a major refactoring. They will
become either run-time options (through device parameters) or fully
automatic whenever possible. This should be the case for mlx4 in the 17.11
release (it's too late for 17.08).
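
As a purely hypothetical illustration of what such a run-time device parameter could look like once that refactoring lands ("tx_mp_cache" is NOT an existing mlx4 devarg and the PCI address is made up; only the EAL -w <PCI>,key=value devargs syntax itself is real):

/*
 * Hypothetical example: selecting the MP cache size per device through
 * EAL devargs instead of rebuilding DPDK.  "tx_mp_cache" does not exist
 * today; the option name and PCI address are placeholders.
 */
#include <rte_eal.h>
#include <rte_common.h>

int
main(void)
{
	char *eal_args[] = {
		"app", "-l", "0-3", "-n", "4",
		"-w", "0000:83:00.0,tx_mp_cache=32",
	};

	if (rte_eal_init((int)RTE_DIM(eal_args), eal_args) < 0)
		return 1;
	return 0;
}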

In the meantime, considering CONFIG_RTE_LIBRTE_MLX4_PMD is disabled by
default and needs to be enabled manually, users that need a larger MP cache
need to also increase CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE like you did.

Because updating the default value cannot possibly satisfy all use cases,
I think it's better to leave it as is for the time being in order to not
affect existing applications.

[1] http://dpdk.org/ml/archives/dev/2017-July/071405.html

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] [discussion] mlx4 driver MLX4_PMD_TX_MP_CACHE default value
  2017-07-28 10:52 chenchanghu
@ 2017-07-28 12:00 ` Adrien Mazarguil
  0 siblings, 0 replies; 4+ messages in thread
From: Adrien Mazarguil @ 2017-07-28 12:00 UTC (permalink / raw)
  To: chenchanghu
  Cc: dev, nelio.laranjeiro, Zhoujingbin, Zhoulei (G),
	Deng Kairong, Chenrujie, cuiyayun, Chengwei (Titus),
	Lixuan (Alex), Lilijun (Jerry)

Hi Changhu,

On Fri, Jul 28, 2017 at 10:52:45AM +0000, chenchanghu wrote:
> Hi Adrien,
> Thanks very much! The MLX4_PMD_TX_MP_CACHE question is now clear to us; we will adjust this value to suit our applications.
>   However, in tests with 2 or more clients, we found that the functions 'txq->if_qp->send_pending' and 'txq->if_qp->send_flush(txq->qp)' in 'mlx4_tx_burst' occasionally cost almost *5ms* each. The probability is about 1/50000, i.e. roughly once every 50000 packets sent.
>   Is this phenomenon normal? Or did we miss some configuration that is not documented?

5 ms for these function calls is strange and certainly not normal. Are you
sure this time is spent in send_pending()/send_flush() and not in
mlx4_tx_burst() itself?

Given the MP cache size and number of mempools involved in your setup, cache
look-up might be longer than normal, but this alone does not explain it.
Might be something else, such as:

- txq_mp2mr() fails to register the mempool of one of these packets for some
  reason (chunked mempool?). Enable CONFIG_RTE_LIBRTE_MLX4_DEBUG and look
  for "unable to get MP <-> MR association" messages.

- You've enabled TX inline mode using a large value and CPU cycles are
  wasted by the PMD doing memcpy() on large packets. Don't enable inline TX
  (set CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE to 0).

- Sent packets have too many segments (more than MLX4_PMD_SGE_WR_N). This is
  super expensive as the PMD needs to linearize extra segments. You can set
  MLX4_PMD_SGE_WR_N to the next power of two (8); however, beware that doing
  so will degrade performance (see the sketch below).
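
On the application side, a quick way to check for this last case is to look
at each mbuf's segment count before calling rte_eth_tx_burst(). A minimal
sketch (the limit of 4 mirrors the default CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4;
the helper itself is illustrative):

#include <rte_mbuf.h>

#define SGE_LIMIT 4	/* default CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N */

/*
 * Count packets whose segment chain exceeds the SGE limit; such packets
 * force the PMD onto its slow linearization path.
 */
static unsigned int
count_oversegmented(struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	unsigned int over = 0;
	uint16_t i;

	for (i = 0; i != nb_pkts; ++i)
		if (pkts[i]->nb_segs > SGE_LIMIT)
			++over;
	return over;
}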

This might also be caused by external factors that depend on the application
or the host system, if for instance DPDK memory is spread across NUMA
nodes. Make sure it's not the case.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] [discussion] mlx4 driver MLX4_PMD_TX_MP_CACHE default value
@ 2017-07-28 10:52 chenchanghu
  2017-07-28 12:00 ` Adrien Mazarguil
  0 siblings, 1 reply; 4+ messages in thread
From: chenchanghu @ 2017-07-28 10:52 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: dev, nelio.laranjeiro, Zhoujingbin, Zhoulei (G),
	Deng Kairong, Chenrujie, cuiyayun, Chengwei (Titus),
	Lixuan (Alex), Lilijun (Jerry)

Hi Adrien,
Thanks very much! The MLX4_PMD_TX_MP_CACHE question is now clear to us; we will adjust this value to suit our applications.
  However, in tests with 2 or more clients, we found that the functions 'txq->if_qp->send_pending' and 'txq->if_qp->send_flush(txq->qp)' in 'mlx4_tx_burst' occasionally cost almost *5ms* each. The probability is about 1/50000, i.e. roughly once every 50000 packets sent.
  Is this phenomenon normal? Or did we miss some configuration that is not documented?
 Please send your reply to chenchanghu@huawei.com; any suggestion is gratefully appreciated.
Thanks


-----Original Message-----
From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
Sent: July 28, 2017 16:41
To: chenchanghu <chenchanghu@huawei.com>
Cc: dev@dpdk.org; nelio.laranjeiro@6wind.com; Zhoujingbin <zhoujingbin@huawei.com>; Zhoulei (G) <stone.zhou@huawei.com>; Deng Kairong <dengkairong@huawei.com>; Chenrujie <chenrujie@huawei.com>; cuiyayun <cuiyayun@huawei.com>; Chengwei (Titus) <titus.chengwei@huawei.com>; Lixuan (Alex) <Awesome.li@huawei.com>
Subject: Re: [discussion] mlx4 driver MLX4_PMD_TX_MP_CACHE default value

Hi,

On Fri, Jul 28, 2017 at 07:58:48AM +0000, chenchanghu wrote:
> Hi,
>          When I used the mlx4 PMD, I met a problem with the MLX4_PMD_TX_MP_CACHE value, which sizes the per-TX-queue cache used for memory pool (MP) to memory region (MR) translation. The test details are described below.
> 1. Test environment information:
>   a. Linux distribution: CentOS
>   b. DPDK version: dpdk-16.04
>   c. Ethernet device: mlx4 VF
>   d. PMD info: mlx4 poll-mode driver
> 
> 2. Test diagram:
> +-----------+    +-----------+            +-----------+
> |  client1  |    |  client2  |   ......   |  clientN  |
> +-----+-----+    +-----+-----+            +-----+-----+
>       |                |                        |
>       v                v                        v
> +-----------------------------------------------------+
> |                 shared memory queue                  |
> +--------------------------+--------------------------+
>                            |
>                            v
> +-----------------------------------------------------+
> |                        server                        |
> +--------------------------+--------------------------+
>                            |
>                            v
> +-----------------------------------------------------+
> |                dpdk rte_eth_tx_burst                 |
> +--------------------------+--------------------------+
>                            |
>                            v
> +-----------------------------------------------------+
> |                   mlx4 pmd driver                    |
> +-----------------------------------------------------+
>   a. Every client has its own memory pool; all clients send messages to the server queue in shared memory.
>   b. The server runs as a single thread, and the mlx4 PMD uses one TX queue.
> 
> 3. Test steps:
>   a. We start 30 clients, so the total number of mempools reaches 30. Every client sends 20 packets/second, and each packet is 10 KB long. The server segments these large packets before passing them to rte_eth_tx_burst.
>   b. With the mlx4 PMD default MLX4_PMD_TX_MP_CACHE value of 8, we found that rte_eth_tx_burst took about 40 ms, most of which was spent in ibv_reg_mr.
>   c. After changing the MLX4_PMD_TX_MP_CACHE value to 32 (by setting CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE in config/common_base), rte_eth_tx_burst took less than 5 ms.

Yep, this is an old limitation that also affects the mlx5 PMD (and recently discussed [1]).

>    Would the community consider changing the default
> CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE value to 32, to cover scenarios like the one described above and avoid the slow path that is hit when the number of mempools used on one TX queue exceeds CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE?
> 
>    Please send your reply to chenchanghu@huawei.com<mailto:chenchanghu@huawei.com>; any suggestion is gratefully appreciated.
> 
> 4. Patch:
> diff --git a/config/common_base b/config/common_base
> index a0580d1..af6ba47 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -207,7 +207,7 @@ CONFIG_RTE_LIBRTE_MLX4_PMD=y 
> CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
> CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
> CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
> -CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
> +CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=32
> CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1

In the past, using 8 as the default MP cache size was deemed a fair trade-off since most applications only used a small number of mempools. Also, recompiling DPDK after tweaking compilation options to suit a specific application's needs was not much of an issue.

Today DPDK is more library-like, so setting compile-time parameters is not always possible for applications; this is why we're in the process of removing these options altogether as part of a major refactoring. They will become either run-time options (through device parameters) or fully automatic whenever possible. This should be the case for mlx4 in the 17.11 release (it's too late for 17.08).

In the meantime, considering CONFIG_RTE_LIBRTE_MLX4_PMD is disabled by default and needs to be enabled manually, users that need a larger MP cache need to also increase CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE like you did.

Because updating the default value cannot possibly satisfy all use cases, I think it's better to leave it as is for the time being in order to not affect existing applications.

[1] http://dpdk.org/ml/archives/dev/2017-July/071405.html

--
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-07-28 12:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-28  7:58 [dpdk-dev] [discussion] mlx4 driver MLX4_PMD_TX_MP_CACHE default value chenchanghu
2017-07-28  8:40 ` Adrien Mazarguil
2017-07-28 10:52 chenchanghu
2017-07-28 12:00 ` Adrien Mazarguil
