DPDK patches and discussions
* [dpdk-dev] mlx5 reta size is dynamic
@ 2018-03-21 18:56 Hanoch Haim (hhaim)
  2018-03-21 21:47 ` Yongseok Koh
  0 siblings, 1 reply; 11+ messages in thread
From: Hanoch Haim (hhaim) @ 2018-03-21 18:56 UTC (permalink / raw)
  To: dev; +Cc: Hanoch Haim (hhaim)

Hi mlx5 driver experts,

DPDK: 17.11
Is there any reason the mlx5 driver changes the RETA table size dynamically based on the number of Rx queues?
There is a hidden assumption that the user wants to distribute the packets evenly, which is not always correct.

/* If the requested number of RX queues is not a power of two, use the
          * maximum indirection table size for better balancing.
          * The result is always rounded to the next power of two. */
          reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
                                           priv->ind_table_max_size :
                                           rxqs_n));
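
A minimal sketch of what the quoted expression evaluates to, for readers without the driver source at hand. log2above_sketch() is a stand-in written for this example (assumed to behave like the driver's log2above(), i.e. ceil(log2(v))), and the maximum table size is passed in as a parameter instead of being read from priv:

```c
/* Stand-in for the driver's log2above(): smallest n with (1 << n) >= v. */
static unsigned int
log2above_sketch(unsigned int v)
{
	unsigned int n = 0;

	while (n < 31 && (1u << n) < v)
		++n;
	return n;
}

/*
 * Mirrors the quoted expression: a power-of-two queue count is used as
 * the table size, anything else falls back to the maximum size.
 */
static unsigned int
reta_size_for(unsigned int rxqs_n, unsigned int ind_table_max_size)
{
	unsigned int is_pow2 = (rxqs_n & (rxqs_n - 1)) == 0;

	return 1u << log2above_sketch(is_pow2 ? rxqs_n : ind_table_max_size);
}
```

With a maximum of 512, reta_size_for(4, 512) gives 4 while reta_size_for(6, 512) gives 512, which is the size change described above.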

thanks,
Hanoh


* Re: [dpdk-dev] mlx5 reta size is dynamic
  2018-03-21 18:56 [dpdk-dev] mlx5 reta size is dynamic Hanoch Haim (hhaim)
@ 2018-03-21 21:47 ` Yongseok Koh
  2018-03-22  6:52   ` Hanoch Haim (hhaim)
  0 siblings, 1 reply; 11+ messages in thread
From: Yongseok Koh @ 2018-03-21 21:47 UTC (permalink / raw)
  To: Hanoch Haim (hhaim); +Cc: dev

On Wed, Mar 21, 2018 at 06:56:33PM +0000, Hanoch Haim (hhaim) wrote:
> Hi mlx5 driver expert,
> 
> DPDK: 17.11
> Any reason mlx5 driver change the rate table size dynamically based on the rx-
> queues# ?

The device only supports indirection tables whose size is a power of two
(2^n). For example, if the number of Rx queues is 6, the device can't have a
1-to-1 mapping, but the size of the indirection table could be 8, 16, 32 and
so on. If we configure it as 8, for example, 2 of the 6 queues will each get
1/4 of the traffic while the other 4 queues each receive 1/8. We thought that
was too much disparity and preferred setting the maximum size in order to
mitigate the imbalance.
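
The arithmetic above can be checked with a short sketch. It assumes the table is filled round-robin (entry i points to queue i % nq) and that hash values are spread uniformly over the entries; both are assumptions for illustration, and the function names are made up for this example:

```c
/*
 * Number of entries pointing to queue q in a table of reta_size entries
 * filled round-robin over nq queues (entry i -> queue i % nq).
 */
static unsigned int
entries_for_queue(unsigned int reta_size, unsigned int nq, unsigned int q)
{
	return reta_size / nq + (q < reta_size % nq ? 1 : 0);
}

/* Fraction of traffic reaching queue q, assuming uniformly spread hashes. */
static double
traffic_share(unsigned int reta_size, unsigned int nq, unsigned int q)
{
	return (double)entries_for_queue(reta_size, nq, q) / reta_size;
}
```

For 6 queues, a table of 8 entries gives two queues 2/8 of the traffic and four queues 1/8. A table of 512 entries gives each queue 86/512 or 85/512 (about 16.8% versus 16.6%).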

> There is a hidden assumption that the user wants to distribute the packets
> evenly which is not always correct.

But it is mostly correct, because RSS is used for uniform distribution. The
decision wasn't based on our speculation but on many requests from multiple
customers.

> /* If the requested number of RX queues is not a power of two, use the
>           * maximum indirection table size for better balancing.
>           * The result is always rounded to the next power of two. */
>           reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
>                                            priv->ind_table_max_size :
>                                            rxqs_n));

Thanks,
Yongseok


* Re: [dpdk-dev] mlx5 reta size is dynamic
  2018-03-21 21:47 ` Yongseok Koh
@ 2018-03-22  6:52   ` Hanoch Haim (hhaim)
  2018-03-22  8:54     ` Nélio Laranjeiro
  0 siblings, 1 reply; 11+ messages in thread
From: Hanoch Haim (hhaim) @ 2018-03-22  6:52 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: dev

Hi Yongseok, 


RSS has a DPDK API: an application can ask for the RETA table size and configure the table. In your case you are assuming a specific use case and changing the size dynamically, which solves 90% of the use cases but breaks the other 10%.
Instead, you could provide the application with a consistent API, so that 100% of applications can work with no issue. This is what happens with Intel (ixgbe/i40e).
Another minor issue: the RSS hash key size (hash_key_size) is returned as zero, but internally it is 40 bytes.

Thanks,
Hanoh



* Re: [dpdk-dev] mlx5 reta size is dynamic
  2018-03-22  6:52   ` Hanoch Haim (hhaim)
@ 2018-03-22  8:54     ` Nélio Laranjeiro
  2018-03-22  9:02       ` Hanoch Haim (hhaim)
  0 siblings, 1 reply; 11+ messages in thread
From: Nélio Laranjeiro @ 2018-03-22  8:54 UTC (permalink / raw)
  To: Hanoch Haim (hhaim); +Cc: Yongseok Koh, dev

On Thu, Mar 22, 2018 at 06:52:53AM +0000, Hanoch Haim (hhaim) wrote:
> Hi Yongseok, 
> 
> 
> RSS has a DPDK API,application can ask for the reta table size and
> configure it. In your case you are assuming specific use case and
> change the size dynamically which solve 90% of the use-cases but break
> the 10% use-case. 
> Instead, you could provide the application a consistent API and with
> that 100% of the applications can work with no issue. This is what
> happen with Intel (ixgbe/i40e)
> Another minor issue the rss_key_size return as zero but internally it
> is 40 bytes

Hi Hanoch,

The legacy DPDK API has always assumed there is only a single indirection
table, a.k.a. RETA, whereas this is not true [1][2] on this device.

On MLX5 there is an indirection table per Hash Rx queue, built from the
list of queues that are part of it.
The Hash Rx queue is configured to compute the hash with the following
information:
 - Algorithm
 - Key
 - Hash field (Verbs hash field)
 - Indirection table
A Hash Rx queue cannot handle multiple RSS configurations; we have one
Hash Rx queue per protocol and thus a full configuration per protocol.

In such a situation, changing the RETA means stopping the traffic and
destroying every single flow, Hash Rx queue and indirection table, to
remake everything with the new configuration.
Until now, we have always recommended that applications restart the port
on this device after a RETA update to apply the new configuration.

Since the flow API is the new way to configure flows, applications should
move to it instead of using the old API for such behavior.
We should also remove this devop from the PMD to avoid any confusion.

Regards,


[1] https://dpdk.org/ml/archives/dev/2015-October/024668.html
[2] https://dpdk.org/ml/archives/dev/2015-October/024669.html

-- 
Nélio Laranjeiro
6WIND


* Re: [dpdk-dev] mlx5 reta size is dynamic
  2018-03-22  8:54     ` Nélio Laranjeiro
@ 2018-03-22  9:02       ` Hanoch Haim (hhaim)
  2018-03-22  9:27         ` Nélio Laranjeiro
  0 siblings, 1 reply; 11+ messages in thread
From: Hanoch Haim (hhaim) @ 2018-03-22  9:02 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Yongseok Koh, dev

Hi Nelio, 
I think you didn't understand me. I suggest keeping the RETA table size constant (the maximum, 512 in your case) and not changing it based on the number of configured Rx queues.

This will make the DPDK API consistent. As a user I need to do tricks (allocate an odd/prime number of Rx queues) to keep the RETA size constant at 512.

I'm not talking about changing the values in the RETA table, which can be done while there is traffic.

Thanks, 
Hanoh




* Re: [dpdk-dev] mlx5 reta size is dynamic
  2018-03-22  9:02       ` Hanoch Haim (hhaim)
@ 2018-03-22  9:27         ` Nélio Laranjeiro
  2018-03-22 10:00           ` Hanoch Haim (hhaim)
  0 siblings, 1 reply; 11+ messages in thread
From: Nélio Laranjeiro @ 2018-03-22  9:27 UTC (permalink / raw)
  To: Hanoch Haim (hhaim); +Cc: Yongseok Koh, dev

Hi Hanoch,

On Thu, Mar 22, 2018 at 09:02:19AM +0000, Hanoch Haim (hhaim) wrote:
> Hi Nelio, 
> I think you didn't understand me. I suggest to keep the RETA table
> size constant (maximum 512 in your case) and don't change its base on
> the number of configured Rx-queue.

It is even simpler: we can return the maximum size, or a multiple of
RTE_RETA_GROUP_SIZE according to the number of Rx queues being used, in
devop->dev_infos_get(), as that is what the
rte_eth_dev_rss_reta_update() implementation will expect.
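
One possible reading of the "multiple of RTE_RETA_GROUP_SIZE" option, as a sketch only: RETA_GROUP_SIZE below is a local constant with the same value as DPDK's RTE_RETA_GROUP_SIZE (64), and reported_reta_size() is a hypothetical helper, not an existing PMD function:

```c
#define RETA_GROUP_SIZE 64u /* same value as DPDK's RTE_RETA_GROUP_SIZE */

/*
 * Round the PMD's internal table size up to the next multiple of the
 * group size, so that the value reported to the application always
 * covers whole 64-entry groups.
 */
static unsigned int
reported_reta_size(unsigned int internal_size)
{
	return (internal_size + RETA_GROUP_SIZE - 1) /
	       RETA_GROUP_SIZE * RETA_GROUP_SIZE;
}
```

Under this reading an internal table of 4 entries would be reported as 64, and the maximum of 512 would be reported unchanged.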
 
> This will make the DPDK API consistent. As a user I need to do tricks
> (allocate an odd/prime number of rx-queues) to get the RETA size
> constant at 512

I understand this issue; what I don't fully understand is your need.

> I'm not talking about changing the values in the RETA table which can
> be done while there is traffic. 

On MLX5, changing the entries of the RETA table doesn't affect the current
traffic; it needs a port restart to take effect, and only for "default"
flows. Flows created through the public flow API are not impacted by
the RETA table.


From my understanding, you wish to have a size returned by
devop->dev_infos_get() that is directly usable by
rte_eth_dev_rss_reta_update(). Is this why you are asking for a fixed
size?  So, whether the PMD internally starts with a smaller RETA table
does not really matter, as long as the RETA API works without any trick
from the application side.  Is this correct?

Thanks,


-- 
Nélio Laranjeiro
6WIND


* Re: [dpdk-dev] mlx5 reta size is dynamic
  2018-03-22  9:27         ` Nélio Laranjeiro
@ 2018-03-22 10:00           ` Hanoch Haim (hhaim)
  2018-03-22 10:45             ` Nélio Laranjeiro
  0 siblings, 1 reply; 11+ messages in thread
From: Hanoch Haim (hhaim) @ 2018-03-22 10:00 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Yongseok Koh, dev

Hi Nelio,

Let me provide more background.
The context is TRex running in Advanced Stateful (ASTF) mode using multiple cores.
In this case the flows are distributed using RSS. New flows (c->s) need to have a tuple that will match the core that generated them. For this calculation we need to know the *RETA table size*.


Code:

       /* 1. verify that the driver can support RSS */
       rte_eth_dev_info_get(m_repid, &dev_info);
       save_reta_size = dev_info.reta_size;
       save_hash_key = dev_info.hash_key_size;
       printf("RETA_SIZE : %d \n", save_reta_size);
       printf("HASH_SIZE : %d \n", save_hash_key);

       /* 2. configure queues */
       ret = rte_eth_dev_configure(m_repid,
                                   nb_rx_queue,
                                   nb_tx_queue,
                                   eth_conf);
       ..

       /* 3. read the RETA size again */
       rte_eth_dev_info_get(m_repid, &dev_info);
       save_reta_size = dev_info.reta_size;       /* << changes on mlx5 */
       save_hash_key = dev_info.hash_key_size;
       printf("RETA_SIZE1 : %d \n", save_reta_size);


       /* 4. update the RETA table */
       rte_eth_dev_rss_reta_update(m_repid, &reta_conf[0], dev_info.reta_size);

       
Output with Intel i40e:

       RETA_SIZE : 512
       HASH_SIZE : 52

       RETA_SIZE1 : 512

Output with mlx5:

       RETA_SIZE : 512
       HASH_SIZE : 0

       RETA_SIZE1 : 4  << not a multiple of 64; depends on the number of Rx queues
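
To illustrate why the application depends on the reported size, here is a sketch of how a reta_conf array is typically sized and filled before step 4. struct reta_entry64 and GROUP_SIZE are local stand-ins mirroring DPDK's struct rte_eth_rss_reta_entry64 and RTE_RETA_GROUP_SIZE (64) so that the example compiles without DPDK; fill_reta_round_robin() is a hypothetical helper, not a DPDK function:

```c
#include <stdint.h>
#include <string.h>

#define GROUP_SIZE 64 /* same value as DPDK's RTE_RETA_GROUP_SIZE */

/* Local stand-in mirroring struct rte_eth_rss_reta_entry64. */
struct reta_entry64 {
	uint64_t mask;             /* bit i set => reta[i] is to be updated */
	uint16_t reta[GROUP_SIZE]; /* queue index for each entry of the group */
};

/*
 * Fill conf[] so that entry i maps to queue i % nb_queues.
 * Returns the number of 64-entry groups used, or -1 if conf[] is too
 * small or nb_queues is zero.
 */
static int
fill_reta_round_robin(struct reta_entry64 *conf, unsigned int nb_groups,
		      unsigned int reta_size, unsigned int nb_queues)
{
	unsigned int needed = (reta_size + GROUP_SIZE - 1) / GROUP_SIZE;
	unsigned int i;

	if (nb_queues == 0 || needed > nb_groups)
		return -1;
	memset(conf, 0, needed * sizeof(*conf));
	for (i = 0; i < reta_size; ++i) {
		conf[i / GROUP_SIZE].mask |= UINT64_C(1) << (i % GROUP_SIZE);
		conf[i / GROUP_SIZE].reta[i % GROUP_SIZE] =
			(uint16_t)(i % nb_queues);
	}
	return (int)needed;
}
```

With reta_size = 512 this fills 8 groups with every mask bit set; with reta_size = 4 only the low 4 bits of the first group's mask are set, so the same application code ends up programming a very different table.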


Hanoh



* Re: [dpdk-dev] mlx5 reta size is dynamic
  2018-03-22 10:00           ` Hanoch Haim (hhaim)
@ 2018-03-22 10:45             ` Nélio Laranjeiro
  2018-03-22 10:59               ` Hanoch Haim (hhaim)
  0 siblings, 1 reply; 11+ messages in thread
From: Nélio Laranjeiro @ 2018-03-22 10:45 UTC (permalink / raw)
  To: Hanoch Haim (hhaim); +Cc: Yongseok Koh, dev

Hi Hanoch,

On Thu, Mar 22, 2018 at 10:00:45AM +0000, Hanoch Haim (hhaim) wrote:
> Hi Nelio,
> 
> Let me provide more background. 
> The context is TRex running in Advance Stateful (ASTf) mode using multi-core.  
> In this case the flows are distributed using RSS. New flows (c->s)
> need to have a tuple that will match the generated core. For this
> calculation there is a need of to know the *RETA table size* 
> 
> 
> Code:
> 
>        /*1.  verify that driver can support RSS */
>        rte_eth_dev_info_get(m_repid,&dev_info);
>        save_reta_size = dev_info.reta_size
>        save_hash_key = dev_info.hash_key_size
>        printf("RETA_SIZE : %d \n",save_reta_size);
>        printf("HASH_SIZE : %d \n",save_hash_key);
> 
>        /*2.  configure queues  */
>        ret = rte_eth_dev_configure(m_repid,
>                                    nb_rx_queue,
>                                    nb_tx_queue,
>                                    eth_conf);
> 	..
> 
>        /* 3. reading the RETA again */
>        rte_eth_dev_info_get(m_repid,&dev_info);
>        save_reta_size = dev_info.reta_size        <<
>        save_hash_key = dev_info.hash_key_size
>        printf("RETA_SIZE1 : %d \n",save_reta_size);
> 
> 
>        /* 4. update the RETA table */
>        rte_eth_dev_rss_reta_update(m_repid, &reta_conf[0], dev_info.reta_size)
> 
>        
> 2.   /*Output in case of  Intel i40e*/
> 
>        RETA_SIZE : 512
>        HASH_SIZE : 52
> 
>        RETA_SIZE1 : 512
> 
> 3.       /*Output in case of  Mlx5 */
> 
>        RETA_SIZE : 512
>        HASH_SIZE : 0 
> 
>        RETA_SIZE1 : 4  << not round of 64 , depends on the number of rx queues

What you need is a fixed size returned by rte_eth_dev_info_get(); the
PMD can have an internal dynamic size, and it won't modify your
spreading.
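
One way to see why the internal size need not change the spreading, as a sketch under two assumptions that are not confirmed in this thread: the table is indexed by the hash modulo its size, and the table is filled round-robin (entry i points to queue i % nb_queues). queue_for_hash() is a name made up for this example:

```c
#include <stdint.h>

/*
 * Queue selected for a given hash, assuming the table index is
 * hash % reta_size and entry i holds queue i % nb_queues.
 */
static unsigned int
queue_for_hash(uint32_t hash, unsigned int reta_size, unsigned int nb_queues)
{
	unsigned int idx = hash % reta_size;

	return idx % nb_queues;
}
```

With 4 queues, a 4-entry and a 512-entry table select the same queue for every hash value, because 4 divides both sizes. The equivalence relies on the round-robin content; it does not hold for 6 queues with tables of 8 and 512 entries, for example.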

For information: you are reading the hash key size, but according to the
documentation of struct rte_eth_rss_conf, only the i40e can have a key
length different from 40 bytes; others should just ignore the field [1].

Regards,

[1] https://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.h#n380

> > > > packets evenly which is not always correct.
> > > 
> > > But it is mostly correct because RSS is used for uniform distribution. The decision wasn't made based on our speculation but by many request from multiple customers.
> > > 
> > > > /* If the requested number of RX queues is not a power of two, use the
> > > >           * maximum indirection table size for better balancing.
> > > >           * The result is always rounded to the next power of two. */
> > > >           reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
> > > >                                            priv->ind_table_max_size :
> > > >                                            rxqs_n));
> > > 
> > > Thanks,
> > > Yongseok
> > 
> > [1] https://dpdk.org/ml/archives/dev/2015-October/024668.html
> > [2] https://dpdk.org/ml/archives/dev/2015-October/024669.html
> > 
> > --
> > Nélio Laranjeiro
> > 6WIND
> 
> --
> Nélio Laranjeiro
> 6WIND

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] mlx5 reta size is dynamic
  2018-03-22 10:45             ` Nélio Laranjeiro
@ 2018-03-22 10:59               ` Hanoch Haim (hhaim)
  2018-03-22 12:29                 ` Nélio Laranjeiro
  0 siblings, 1 reply; 11+ messages in thread
From: Hanoch Haim (hhaim) @ 2018-03-22 10:59 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Yongseok Koh, dev

Hi,

1) Regarding this sentence, 
"Your need is to have a fixed size returned by the rte_eth_dev_info_get(), the PMD can have an internal dynamic size, it won't modify your spreading."

I'm fine with that as long as:

1. rte_eth_dev_info_get() always exposes the same *size*.
2. rte_eth_dev_rss_reta_update() behaves as if there are reta_size entries for *any* input (the PMD enlarges the table internally to the maximum size).
In other words, from the user's perspective the reta_size is static.

2) "In such situation, changing the RETA means stopping the traffic, destroying every single flow, hash Rx queue, indirection table to remake everything with the new configuration.
Until then, we always recommended to any application to restart the port on this device after a RETA update to apply this new configuration."

From an experiment I did, you *can* change it under traffic and it works without issue.
Drivers tested are: igbe/i40e/mlx5
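
Point 1) above relies on the claim that an internal table of a different size "won't modify your spreading". A minimal sketch of why that can hold, assuming the device indexes a power-of-two table with the low bits of the hash (an assumption, not something confirmed in this thread). The helper names reta_expand() and reta_lookup() are hypothetical and are not mlx5 functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if n is a power of two. */
static int is_pow2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Expand a small power-of-two table into a larger power-of-two table by
 * repeating it: entry i of the big table is small_tbl[i mod small_n].
 */
static void reta_expand(const uint16_t *small_tbl, size_t small_n,
			uint16_t *big_tbl, size_t big_n)
{
	size_t i;

	assert(is_pow2(small_n) && is_pow2(big_n) && small_n <= big_n);
	for (i = 0; i < big_n; i++)
		big_tbl[i] = small_tbl[i & (small_n - 1)];
}

/* Assumed lookup model: the low bits of the hash select the entry. */
static uint16_t reta_lookup(const uint16_t *tbl, size_t n, uint32_t hash)
{
	assert(is_pow2(n));
	return tbl[hash & (n - 1)];
}
```

Under that lookup model, a 4-entry table expanded to 512 entries selects the same queue for every hash value, so the application can compute against the fixed size while the PMD keeps a smaller table internally.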

Thanks,
Hanoh


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] mlx5 reta size is dynamic
  2018-03-22 10:59               ` Hanoch Haim (hhaim)
@ 2018-03-22 12:29                 ` Nélio Laranjeiro
  2018-03-22 12:33                   ` Hanoch Haim (hhaim)
  0 siblings, 1 reply; 11+ messages in thread
From: Nélio Laranjeiro @ 2018-03-22 12:29 UTC (permalink / raw)
  To: Hanoch Haim (hhaim); +Cc: Yongseok Koh, dev

Hi,

On Thu, Mar 22, 2018 at 10:59:36AM +0000, Hanoch Haim (hhaim) wrote:
> Hi,
> 
> 1) Regarding this sentence, 
> "Your need is to have a fixed size returned by the
> rte_eth_dev_info_get(), the PMD can have an internal dynamic size, it
> won't modify your spreading."
> 
> I'm fine with that as long as:
> 
> 1. rte_eth_dev_info_get() always exposes the same *size*.
> 2. rte_eth_dev_rss_reta_update() behaves as if there are reta_size
> entries for *any* input (the PMD enlarges the table internally to the
> maximum size).
> In other words, from the user's perspective the reta_size is static.

Good, the requirement is clear enough for me, i.e. a static RETA table
size on the user side, with the spreading done accordingly.
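
For reference, the imbalance argument made earlier in the thread (6 queues on an 8-entry table versus the maximum size) can be checked with a small sketch. It assumes the table is filled round-robin, which is an assumption about the fill policy; reta_entries_for_queue() is a hypothetical helper, not a driver function:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of entries pointing at queue q when a table of reta_n entries
 * is filled round-robin over nb_queues queues (entry i -> i % nb_queues).
 */
static size_t reta_entries_for_queue(size_t reta_n, size_t nb_queues, size_t q)
{
	assert(nb_queues > 0 && q < nb_queues);
	return reta_n / nb_queues + (q < reta_n % nb_queues ? 1 : 0);
}
```

With 6 queues, an 8-entry table gives two queues 2 entries each and four queues 1 entry each (1/4 versus 1/8 of the traffic), while a 512-entry table gives 86 versus 85 entries, so the shares differ by one entry out of 512.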

> 2) "In such situation, changing the RETA means stopping the traffic,
> destroying every single flow, hash Rx queue, indirection table to
> remake everything with the new configuration.
> Until then, we always recommended to any application to restart the
> port on this device after a RETA update to apply this new
> configuration."
> 
> From an experiment I did, you *can* change it under traffic and it works without issue.
> Drivers tested are: igbe/i40e/mlx5

Hmm, it is certainly calling a devop which ends up calling
mlx5_traffic_start().

Thanks,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] mlx5 reta size is dynamic
  2018-03-22 12:29                 ` Nélio Laranjeiro
@ 2018-03-22 12:33                   ` Hanoch Haim (hhaim)
  0 siblings, 0 replies; 11+ messages in thread
From: Hanoch Haim (hhaim) @ 2018-03-22 12:33 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Yongseok Koh, dev

Regarding #2:
For some reason the rte_eth_dev_rss_reta_update() API had no effect on the Intel NICs when it was called *before* start (weird, I agree).
Calling it after the start API solves the issue for all the drivers.
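
A minimal sketch of preparing the table that would be handed to rte_eth_dev_rss_reta_update() once rte_eth_dev_start() has returned. To keep it self-contained, struct reta_entry64 and GROUP_SIZE are local stand-ins for DPDK's struct rte_eth_rss_reta_entry64 and RTE_RETA_GROUP_SIZE, and the round-robin fill is only an illustrative policy:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GROUP_SIZE 64 /* stand-in for RTE_RETA_GROUP_SIZE */

/* Local stand-in mirroring the layout of struct rte_eth_rss_reta_entry64. */
struct reta_entry64 {
	uint64_t mask;             /* bit i set => reta[i] is to be updated */
	uint16_t reta[GROUP_SIZE]; /* Rx queue index for each entry */
};

/*
 * Fill reta_size entries round-robin over nb_queues queues and mark every
 * entry for update. conf must hold reta_size / GROUP_SIZE groups.
 */
static void reta_fill_round_robin(struct reta_entry64 *conf,
				  uint16_t reta_size, uint16_t nb_queues)
{
	uint16_t i;

	assert(nb_queues > 0 && reta_size % GROUP_SIZE == 0);
	memset(conf, 0, sizeof(*conf) * (reta_size / GROUP_SIZE));
	for (i = 0; i < reta_size; i++) {
		conf[i / GROUP_SIZE].mask |= UINT64_C(1) << (i % GROUP_SIZE);
		conf[i / GROUP_SIZE].reta[i % GROUP_SIZE] = i % nb_queues;
	}
}
```

In an application the order would then be: configure the port and queues, start the port, fill the groups as above using the real DPDK structure, and only then call rte_eth_dev_rss_reta_update() with the reta_size reported by rte_eth_dev_info_get().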

Thanks,
Hanoh


-----Original Message-----
From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com] 
Sent: Thursday, March 22, 2018 2:29 PM
To: Hanoch Haim (hhaim)
Cc: Yongseok Koh; dev@dpdk.org
Subject: Re: [dpdk-dev] mlx5 reta size is dynamic

Hi,

On Thu, Mar 22, 2018 at 10:59:36AM +0000, Hanoch Haim (hhaim) wrote:
> Hi,
> 
> 1) Regarding this sentence,
> "Your need is to have a fixed size returned by the 
> rte_eth_dev_info_get(), the PMD can have an internal dynamic size, it 
> won't modify your spreading."
> 
> I'm fine with that as long:
> 
> 1. rte_eth_dev_info_get  will expose the same *size* 2. 
> rte_eth_dev_rss_reta_update will behave the as there are reta_size for 
> *any* random input (will enlarge the table internally to maximum
> size)
> In other words, from the user prospective you will have static 
> reta_size.

Good, the requirement is clear enough for me i.e. user static RETA table size and spreading accordingly.

> 2) "In such situation, changing the RETA means stopping the traffic, 
> destroying every single flow, hash Rx queue, indirection table to 
> remake everything with the new configuration.
> Until then, we always recommended to any application to restart the 
> port on this device after a RETA update to apply this new 
> configuration."
> 
> From an experiment I did, you *can* change it under traffic and it works without issue.
> Drivers tested are: igbe/i40e/mlx5

hmm, it is certainly calling a devops which will end by calling mlx5_traffic_start().

Thanks,

> Thanks,
> Hanoh
> 
> 
> -----Original Message-----
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> Sent: Thursday, March 22, 2018 12:46 PM
> To: Hanoch Haim (hhaim)
> Cc: Yongseok Koh; dev@dpdk.org
> Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
> 
> Hi Hanoch,
> 
> On Thu, Mar 22, 2018 at 10:00:45AM +0000, Hanoch Haim (hhaim) wrote:
> > Hi Nelio,
> > 
> > Let me provide more background. 
> > The context is TRex running in Advance Stateful (ASTf) mode using multi-core.  
> > In this case the flows are distributed using RSS. New flows (c->s) 
> > need to have a tuple that will match the generated core. For this 
> > calculation there is a need of to know the *RETA table size*
> > 
> > 
> > Code:
> > 
> >        /*1.  verify that driver can support RSS */
> >        rte_eth_dev_info_get(m_repid,&dev_info);
> >        save_reta_size = dev_info.reta_size
> >        save_hash_key = dev_info.hash_key_size
> >        printf("RETA_SIZE : %d \n",save_reta_size);
> >        printf("HASH_SIZE : %d \n",save_hash_key);
> > 
> >        /*2.  configure queues  */
> >        ret = rte_eth_dev_configure(m_repid,
> >                                    nb_rx_queue,
> >                                    nb_tx_queue,
> >                                    eth_conf);
> > 	..
> > 
> >        /* 3. reading the RETA again */
> >        rte_eth_dev_info_get(m_repid,&dev_info);
> >        save_reta_size = dev_info.reta_size        <<
> >        save_hash_key = dev_info.hash_key_size
> >        printf("RETA_SIZE1 : %d \n",save_reta_size);
> > 
> > 
> >        /* 4. update the RETA table */
> >        rte_eth_dev_rss_reta_update(m_repid, &reta_conf[0],
> > dev_info.reta_size)
> > 
> >        
> > 2.   /*Output in case of  Intel i40e*/
> > 
> >        RETA_SIZE : 512
> >        HASH_SIZE : 52
> > 
> >        RETA_SIZE1 : 512
> > 
> > 3.       /*Output in case of  Mlx5 */
> > 
> >        RETA_SIZE : 512
> >        HASH_SIZE : 0
> > 
> >        RETA_SIZE1 : 4  << not round of 64 , depends on the number of 
> > rx queues
> 
> Your need is to have a fixed size returned by the rte_eth_dev_info_get(), the PMD can have an internal dynamic size, it won't modify your spreading.
> 
> An information, you are getting the hash key size, according to the documentation of struct rte_eth_rss_conf, only the i40e can have a key len different from 40 bytes, others should just ignore the field [1].
> 
> Regards,
> 
> [1] 
> https://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.h#n380
> 
> > Hanoh
> > 
> > -----Original Message-----
> > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > Sent: Thursday, March 22, 2018 11:28 AM
> > To: Hanoch Haim (hhaim)
> > Cc: Yongseok Koh; dev@dpdk.org
> > Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
> > 
> > Hi Hanoch,
> > 
> > On Thu, Mar 22, 2018 at 09:02:19AM +0000, Hanoch Haim (hhaim) wrote:
> > > Hi Nelio,
> > > I think you didn't understand me. I suggest to keep the RETA table 
> > > size constant (maximum 512 in your case) and don't change its base 
> > > on the number of configured Rx-queue.
> > 
> > It is even simpler, we can return the maximum size or a multiple of 
> > RTE_RETA_GROUP_SIZE according to the number of Rx queues being used, 
> > in the devop->dev_infos_get() as it is what the
> > rte_eth_dev_rss_reta_update() implementation will expect.
> >  
> > > This will make the DPDK API consistent. As a user I need to do 
> > > tricks (allocate an odd/prime number of rx-queues) to get the RETA 
> > > size constant at 512
> > 
> > I understand this issue, what I don't fully understand your needs.
> > 
> > > I'm not talking about changing the values in the RETA table which 
> > > can be done while there is traffic.
> > 
> > On MLX5 changing the entries of the RETA table don't affect the current traffic, it needs a port restart to affect it and only for "default"
> > flows, any flow created through the public flow API are not impacted by the RETA table.
> > 
> > 
> > From my understanding, you wish to have a size returned by
> > devop->dev_infos_get() usable directly by rte_eth_dev_rss_reta_update().
> > This is why you are asking for a fix size?  So, if internally the PMD starts with a smaller RETA table does not really matter, until the RETA API works without any trick from the application side.  Is this correct?
> > 
> > Thanks,
> > 
> > > Thanks,
> > > Hanoh
> > > 
> > > 
> > > -----Original Message-----
> > > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > > Sent: Thursday, March 22, 2018 10:55 AM
> > > To: Hanoch Haim (hhaim)
> > > Cc: Yongseok Koh; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
> > > 
> > > On Thu, Mar 22, 2018 at 06:52:53AM +0000, Hanoch Haim (hhaim) wrote:
> > > > Hi Yongseok,
> > > > 
> > > > 
> > > > RSS has a DPDK API,application can ask for the reta table size 
> > > > and configure it. In your case you are assuming specific use 
> > > > case and change the size dynamically which solve 90% of the 
> > > > use-cases but break the 10% use-case.
> > > > Instead, you could provide the application a consistent API and 
> > > > with that 100% of the applications can work with no issue. This 
> > > > is what happen with Intel (ixgbe/i40e) Another minor issue the 
> > > > rss_key_size return as zero but internally it is 40 bytes
> > > 
> > > Hi Hanoch,
> > > 
> > > Legacy DPDK API has always considered there is only a single indirection table aka. RETA whereas this is not true [1][2] on this device.
> > > 
> > > On MLX5 there is an indirection table per Hash Rx queue according to the list of queues making part of it.
> > > The Hash Rx queue is configured to make the hash with configured
> > > information:
> > >  - Algorithm,
> > >  - key
> > >  - hash field (Verbs hash field)
> > >  - Indirection table
> > > An Hash Rx queue cannot handle multiple RSS configuration, we have an Hash Rx queue per protocol and thus a full configuration per protocol.
> > > 
> > > In such situation, changing the RETA means stopping the traffic, destroying every single flow, hash Rx queue, indirection table to remake everything with the new configuration.
> > > Until then, we always recommended to any application to restart the port on this device after a RETA update to apply this new configuration.
> > > 
> > > Since the flow API is the new way to configure flows, application should move to this new one instead of using old API for such behavior.
> > > We should also remove such devop from the PMD to avoid any confusion.
> > > 
> > > Regards,
> > > 
> > > > Thanks,
> > > > Hanoh
> > > > 
> > > > -----Original Message-----
> > > > From: Yongseok Koh [mailto:yskoh@mellanox.com]
> > > > Sent: Wednesday, March 21, 2018 11:48 PM
> > > > To: Hanoch Haim (hhaim)
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
> > > > 
> > > > On Wed, Mar 21, 2018 at 06:56:33PM +0000, Hanoch Haim (hhaim) wrote:
> > > > > Hi mlx5 driver expert,
> > > > > 
> > > > > DPDK: 17.11
> > > > > Any reason the mlx5 driver changes the RETA size dynamically
> > > > > based on the number of Rx queues?
> > > > 
> > > > The device only supports indirection tables whose size is a power of two (2^n). For example, if the number of Rx queues is 6, the device can't have a 1-1 mapping, but the size of the indirection table could be 8, 16, 32 and so on. If we configure it as 8, for example, 2 of the 6 queues will each receive 1/4 of the traffic while the remaining 4 queues each receive 1/8. We thought this was too much disparity and preferred setting the maximum size in order to mitigate the imbalance.
> > > > 
> > > > > There is a hidden assumption that the user wants to distribute 
> > > > > the packets evenly which is not always correct.
> > > > 
> > > > But it is mostly correct because RSS is used for uniform distribution. The decision wasn't based on our speculation but on many requests from multiple customers.
> > > > 
> > > > > /* If the requested number of RX queues is not a power of two, use the
> > > > >  * maximum indirection table size for better balancing.
> > > > >  * The result is always rounded to the next power of two. */
> > > > > reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
> > > > >                              priv->ind_table_max_size :
> > > > >                              rxqs_n));
> > > > 
> > > > Thanks,
> > > > Yongseok
> > > 
> > > [1] https://dpdk.org/ml/archives/dev/2015-October/024668.html
> > > [2] https://dpdk.org/ml/archives/dev/2015-October/024669.html
> > > 
> > > --
> > > Nélio Laranjeiro
> > > 6WIND

--
Nélio Laranjeiro
6WIND
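
The sizing rule and the 6-queue example quoted in this thread can be checked with a small standalone C sketch. It is only an illustration under stated assumptions, not the driver's code: `log2above()` here is a local stand-in for the driver helper of the same name, `reta_size()` and `entries_for_queue()` are hypothetical names, the table is assumed to be filled round-robin (entry i points at queue i % rxqs_n), and the maximum table size of 512 used in the note below is a made-up value.

```c
#include <assert.h>

/* Local stand-in for the driver's log2above(): smallest n with (1 << n) >= v. */
static unsigned int log2above(unsigned int v)
{
	unsigned int n = 0;

	while (n < 31 && (1u << n) < v)
		n++;
	return n;
}

/* Hypothetical helper mirroring the quoted rule: if rxqs_n is a power of
 * two, use it as the table size; otherwise use the maximum table size.
 * The result is rounded up to a power of two. */
static unsigned int reta_size(unsigned int rxqs_n,
			      unsigned int ind_table_max_size)
{
	assert(rxqs_n > 0);
	return 1u << log2above((rxqs_n & (rxqs_n - 1)) ?
			       ind_table_max_size : rxqs_n);
}

/* Hypothetical helper: number of table entries that point at `queue`,
 * ASSUMING a round-robin fill (entry i -> queue i % rxqs_n). */
static unsigned int entries_for_queue(unsigned int reta_n,
				      unsigned int rxqs_n,
				      unsigned int queue)
{
	assert(queue < rxqs_n);
	return reta_n / rxqs_n + (queue < reta_n % rxqs_n ? 1 : 0);
}
```

With 6 queues and a table of 8 entries, `entries_for_queue(8, 6, q)` gives 2 entries for queues 0 and 1 (1/4 of the table each) and 1 entry for the other four (1/8 each), which matches the disparity described in the thread. With a hypothetical maximum of 512 entries the split becomes 86 versus 85 entries per queue, so the imbalance is much smaller.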


end of thread, other threads:[~2018-03-22 12:33 UTC | newest]

Thread overview: 11+ messages
2018-03-21 18:56 [dpdk-dev] mlx5 reta size is dynamic Hanoch Haim (hhaim)
2018-03-21 21:47 ` Yongseok Koh
2018-03-22  6:52   ` Hanoch Haim (hhaim)
2018-03-22  8:54     ` Nélio Laranjeiro
2018-03-22  9:02       ` Hanoch Haim (hhaim)
2018-03-22  9:27         ` Nélio Laranjeiro
2018-03-22 10:00           ` Hanoch Haim (hhaim)
2018-03-22 10:45             ` Nélio Laranjeiro
2018-03-22 10:59               ` Hanoch Haim (hhaim)
2018-03-22 12:29                 ` Nélio Laranjeiro
2018-03-22 12:33                   ` Hanoch Haim (hhaim)
