From: "Hanoch Haim (hhaim)" <hhaim@cisco.com>
To: "Nélio Laranjeiro" <nelio.laranjeiro@6wind.com>
Cc: Yongseok Koh <yskoh@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
Date: Thu, 22 Mar 2018 12:33:51 +0000 [thread overview]
Message-ID: <6d30b3680a6d4e12b43a8e50b29bbd90@XCH-RTP-017.cisco.com> (raw)
In-Reply-To: <20180322122927.gfevzmmdkdzq4n66@laranjeiro-vm.dev.6wind.com>
Regarding #2:
For some reason the "rte_eth_dev_rss_reta_update" API did not apply the change on Intel NICs when it was called *before* start (weird, I agree).
Moving it after the start API solves the issue for all the drivers.
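A minimal sketch of the ordering that worked for me (error handling omitted; port_id, eth_conf, dev_info and reta_conf[] are assumed to be set up by the application):

    /* Sketch only: configure, start, then update the RETA. */
    rte_eth_dev_configure(port_id, nb_rx_queue, nb_tx_queue, &eth_conf);
    /* ... Rx/Tx queue setup ... */
    rte_eth_dev_start(port_id);                    /* start first */
    rte_eth_dev_info_get(port_id, &dev_info);
    rte_eth_dev_rss_reta_update(port_id, reta_conf,
                                dev_info.reta_size); /* then update */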
Thanks,
Hanoh
-----Original Message-----
From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
Sent: Thursday, March 22, 2018 2:29 PM
To: Hanoch Haim (hhaim)
Cc: Yongseok Koh; dev@dpdk.org
Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
Hi,
On Thu, Mar 22, 2018 at 10:59:36AM +0000, Hanoch Haim (hhaim) wrote:
> Hi,
>
> 1) Regarding this sentence,
> "Your need is to have a fixed size returned by the
> rte_eth_dev_info_get(), the PMD can have an internal dynamic size, it
> won't modify your spreading."
>
> I'm fine with that as long as:
>
> 1. rte_eth_dev_info_get will expose the same *size*.
> 2. rte_eth_dev_rss_reta_update will behave as if there are reta_size
> entries for *any* input (enlarging the table internally to the
> maximum size if needed).
> In other words, from the user's perspective you will have a static
> reta_size.
Good, the requirement is clear enough for me, i.e. a static RETA table size from the user's point of view, with the spreading following it.
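As a rough illustration of that contract (a sketch assuming a fixed, user-visible 512-entry RETA; port_id and nb_rx_queue are placeholders, not PMD code), the application could then always fill the full table:

    /* Hypothetical sketch: spread nb_rx_queue queues round-robin over
     * a static 512-entry RETA; the PMD may keep a smaller table
     * internally. */
    struct rte_eth_rss_reta_entry64 reta_conf[512 / RTE_RETA_GROUP_SIZE];
    memset(reta_conf, 0, sizeof(reta_conf));
    for (uint32_t i = 0; i < 512; i++) {
            reta_conf[i / RTE_RETA_GROUP_SIZE].mask = UINT64_MAX;
            reta_conf[i / RTE_RETA_GROUP_SIZE].reta[i % RTE_RETA_GROUP_SIZE] =
                    i % nb_rx_queue;
    }
    rte_eth_dev_rss_reta_update(port_id, reta_conf, 512);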
> 2) "In such situation, changing the RETA means stopping the traffic,
> destroying every single flow, hash Rx queue, indirection table to
> remake everything with the new configuration.
> Until then, we always recommended to any application to restart the
> port on this device after a RETA update to apply this new
> configuration."
>
> From an experiment I did, you *can* change it under traffic and it works without issue.
> Drivers tested: ixgbe/i40e/mlx5.
Hmm, it is certainly calling a devop which ends up calling mlx5_traffic_start().
Thanks,
> Thanks,
> Hanoh
>
>
> -----Original Message-----
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> Sent: Thursday, March 22, 2018 12:46 PM
> To: Hanoch Haim (hhaim)
> Cc: Yongseok Koh; dev@dpdk.org
> Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
>
> Hi Hanoch,
>
> On Thu, Mar 22, 2018 at 10:00:45AM +0000, Hanoch Haim (hhaim) wrote:
> > Hi Nelio,
> >
> > Let me provide more background.
> > The context is TRex running in Advanced Stateful (ASTF) mode with
> > multiple cores. In this case the flows are distributed using RSS.
> > New flows (c->s) need to have a tuple that will match the
> > generating core. For this calculation there is a need to know the
> > *RETA table size*.
> >
> >
> > Code:
> >
> > /* 1. Verify that the driver supports RSS. */
> > rte_eth_dev_info_get(m_repid, &dev_info);
> > save_reta_size = dev_info.reta_size;
> > save_hash_key  = dev_info.hash_key_size;
> > printf("RETA_SIZE : %d \n", save_reta_size);
> > printf("HASH_SIZE : %d \n", save_hash_key);
> >
> > /* 2. Configure the queues. */
> > ret = rte_eth_dev_configure(m_repid,
> >                             nb_rx_queue,
> >                             nb_tx_queue,
> >                             eth_conf);
> > ..
> >
> > /* 3. Read the RETA size again. */
> > rte_eth_dev_info_get(m_repid, &dev_info);
> > save_reta_size = dev_info.reta_size;   /* << changes on mlx5 */
> > save_hash_key  = dev_info.hash_key_size;
> > printf("RETA_SIZE1 : %d \n", save_reta_size);
> >
> > /* 4. Update the RETA table. */
> > rte_eth_dev_rss_reta_update(m_repid, &reta_conf[0],
> >                             dev_info.reta_size);
> >
> >
> > 2. /* Output in the case of Intel i40e */
> >
> > RETA_SIZE : 512
> > HASH_SIZE : 52
> >
> > RETA_SIZE1 : 512
> >
> > 3. /* Output in the case of mlx5 */
> >
> > RETA_SIZE : 512
> > HASH_SIZE : 0
> >
> > RETA_SIZE1 : 4  << not a multiple of 64; depends on the number
> > of Rx queues
>
> Your need is to have a fixed size returned by rte_eth_dev_info_get(); the PMD can have an internal dynamic size, which won't modify your spreading.
>
> One more piece of information: you are reading the hash key size, but according to the documentation of struct rte_eth_rss_conf, only i40e can have a key length different from 40 bytes; other drivers should just ignore the field [1].
>
> Regards,
>
> [1]
> https://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.h#n380
>
> > Hanoh
> >
> > -----Original Message-----
> > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > Sent: Thursday, March 22, 2018 11:28 AM
> > To: Hanoch Haim (hhaim)
> > Cc: Yongseok Koh; dev@dpdk.org
> > Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
> >
> > Hi Hanoch,
> >
> > On Thu, Mar 22, 2018 at 09:02:19AM +0000, Hanoch Haim (hhaim) wrote:
> > > Hi Nelio,
> > > I think you didn't understand me. I suggest keeping the RETA table
> > > size constant (the maximum, 512, in your case) and not changing it
> > > based on the number of configured Rx queues.
> >
> > It is even simpler: we can return the maximum size, or a multiple of
> > RTE_RETA_GROUP_SIZE according to the number of Rx queues in use, from
> > the devop->dev_infos_get(), as that is what the
> > rte_eth_dev_rss_reta_update() implementation will expect.
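A sketch of what that rounding could look like (illustrative only; it reuses the reta_idx_n and priv->ind_table_max_size names from the mlx5 snippet quoted later in this thread):

    /* Illustrative sketch: report a RETA size that is a multiple of
     * RTE_RETA_GROUP_SIZE, capped at the device maximum, so the value
     * is directly usable by rte_eth_dev_rss_reta_update(). */
    uint16_t size = RTE_ALIGN_CEIL(reta_idx_n, RTE_RETA_GROUP_SIZE);
    dev_info->reta_size = RTE_MIN(size, priv->ind_table_max_size);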
> >
> > > This will make the DPDK API consistent. As a user I need to do
> > > tricks (allocate an odd/prime number of Rx queues) to keep the
> > > RETA size constant at 512.
> >
> > I understand this issue; what I don't fully understand is your need.
> >
> > > I'm not talking about changing the values in the RETA table, which
> > > can be done while there is traffic.
> >
> > On MLX5, changing the entries of the RETA table does not affect the current traffic; it takes a port restart to apply, and even then only for "default"
> > flows. Any flow created through the public flow API is not impacted by the RETA table.
> >
> >
> > From my understanding, you wish to have the size returned by
> > devop->dev_infos_get() be directly usable by rte_eth_dev_rss_reta_update().
> > This is why you are asking for a fixed size? So, whether the PMD internally starts with a smaller RETA table does not really matter, as long as the RETA API works without any trick on the application side. Is this correct?
> >
> > Thanks,
> >
> > > Thanks,
> > > Hanoh
> > >
> > >
> > > -----Original Message-----
> > > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > > Sent: Thursday, March 22, 2018 10:55 AM
> > > To: Hanoch Haim (hhaim)
> > > Cc: Yongseok Koh; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
> > >
> > > On Thu, Mar 22, 2018 at 06:52:53AM +0000, Hanoch Haim (hhaim) wrote:
> > > > Hi Yongseok,
> > > >
> > > >
> > > > RSS has a DPDK API; an application can ask for the RETA table
> > > > size and configure it. In your case you are assuming a specific
> > > > use case and changing the size dynamically, which solves 90% of
> > > > the use cases but breaks the other 10%.
> > > > Instead, you could provide the application a consistent API, and
> > > > with that 100% of applications can work with no issue. This is
> > > > what happens with Intel (ixgbe/i40e). Another minor issue: the
> > > > rss_key_size is returned as zero, but internally it is 40 bytes.
> > >
> > > Hi Hanoch,
> > >
> > > The legacy DPDK API has always considered that there is only a single indirection table (aka RETA), whereas this is not true [1][2] on this device.
> > >
> > > On MLX5 there is an indirection table per Hash Rx queue, built from the list of queues taking part in it.
> > > The Hash Rx queue is configured to compute the hash from the
> > > configured information:
> > > - algorithm,
> > > - key,
> > > - hash field (Verbs hash field),
> > > - indirection table.
> > > A Hash Rx queue cannot handle multiple RSS configurations; we have a Hash Rx queue per protocol and thus a full configuration per protocol.
> > >
> > > In such a situation, changing the RETA means stopping the traffic and destroying every single flow, Hash Rx queue, and indirection table to remake everything with the new configuration.
> > > Until now, we have always recommended that applications restart the port on this device after a RETA update to apply the new configuration.
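A minimal sketch of that recommended sequence (assuming reta_conf[] and reta_size come from the application):

    /* Sketch: update the RETA, then restart the port so the new
     * indirection takes effect for "default" flows on this device. */
    rte_eth_dev_rss_reta_update(port_id, reta_conf, reta_size);
    rte_eth_dev_stop(port_id);
    rte_eth_dev_start(port_id);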
> > >
> > > Since the flow API is the new way to configure flows, applications should move to it instead of using the old API for such behavior.
> > > We should also remove such a devop from the PMD to avoid any confusion.
> > >
> > > Regards,
> > >
> > > > Thanks,
> > > > Hanoh
> > > >
> > > > -----Original Message-----
> > > > From: Yongseok Koh [mailto:yskoh@mellanox.com]
> > > > Sent: Wednesday, March 21, 2018 11:48 PM
> > > > To: Hanoch Haim (hhaim)
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
> > > >
> > > > On Wed, Mar 21, 2018 at 06:56:33PM +0000, Hanoch Haim (hhaim) wrote:
> > > > > Hi mlx5 driver expert,
> > > > >
> > > > > DPDK: 17.11
> > > > > Any reason mlx5 driver change the rate table size dynamically
> > > > > based on the rx- queues# ?
> > > >
> > > > The device only supports 2^n-sized indirection tables. For example, if the number of Rx queues is 6, the device cannot have a 1-1 mapping, but the size of the indirection table could be 8, 16, 32 and so on. If we configure it as 8, for example, 2 of the 6 queues will each get 1/4 of the traffic while the other 4 queues each receive 1/8. We thought that was too much disparity and preferred setting the maximum size in order to mitigate the imbalance.
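To make that disparity concrete, a small sketch (assuming the 8 table entries are filled round-robin over the 6 queues):

    /* Count how many of the 8 indirection entries point at each queue. */
    unsigned int hits[6] = {0};
    for (unsigned int i = 0; i < 8; i++)
            hits[i % 6]++;
    /* hits == {2, 2, 1, 1, 1, 1}: queues 0-1 each get 2/8 = 1/4 of
     * the traffic, queues 2-5 each get 1/8. */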
> > > >
> > > > > There is a hidden assumption that the user wants to distribute
> > > > > the packets evenly which is not always correct.
> > > >
> > > > But it is mostly correct because RSS is used for uniform distribution. The decision wasn't made based on our speculation but on many requests from multiple customers.
> > > >
> > > > > /* If the requested number of RX queues is not a power of two, use the
> > > > > * maximum indirection table size for better balancing.
> > > > > * The result is always rounded to the next power of two. */
> > > > > reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
> > > > > priv->ind_table_max_size :
> > > > > rxqs_n));
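Reading that snippet with two sample values (my own arithmetic, assuming log2above() rounds up to the next power-of-two exponent and ind_table_max_size is 512): rxqs_n = 6 gives rxqs_n & (rxqs_n - 1) = 4, which is non-zero, so reta_idx_n = 1 << log2above(512) = 512; rxqs_n = 8 gives 0, so reta_idx_n = 1 << log2above(8) = 8.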
> > > >
> > > > Thanks,
> > > > Yongseok
> > >
> > > [1] https://dpdk.org/ml/archives/dev/2015-October/024668.html
> > > [2] https://dpdk.org/ml/archives/dev/2015-October/024669.html
> > >
> > > --
> > > Nélio Laranjeiro
> > > 6WIND
> >
> > --
> > Nélio Laranjeiro
> > 6WIND
>
> --
> Nélio Laranjeiro
> 6WIND
--
Nélio Laranjeiro
6WIND