DPDK patches and discussions
* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
@ 2020-08-03 15:18 Slava Ovsiienko
  2020-08-03 15:31 ` Andrew Rybchenko
  0 siblings, 1 reply; 24+ messages in thread
From: Slava Ovsiienko @ 2020-08-03 15:18 UTC
  To: Andrew Rybchenko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Thomas Monjalon, ferruh.yigit,
	jerinjacobk, stephen, ajit.khaparde, maxime.coquelin,
	olivier.matz, david.marchand

Hi, Andrew

Thanks for the comments; please see below.

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Monday, August 3, 2020 17:31
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
> ferruh.yigit@intel.com; jerinjacobk@gmail.com;
> stephen@networkplumber.org; ajit.khaparde@broadcom.com;
> maxime.coquelin@redhat.com; olivier.matz@6wind.com;
> david.marchand@redhat.com
> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> 
> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
> > The DPDK datapath in the transmit direction is very flexible.
> > The applications can build multisegment packets and manage almost all
> > data aspects: the memory pools where segments are allocated from, the
> > segment lengths, the memory attributes like external, registered, etc.
> >
> > In the receiving direction, the datapath is much less flexible: the
> > applications can only specify the memory pool to configure the
> > receiving queue and nothing more. In order to extend the receiving
> > datapath capabilities, it is proposed to add new fields to the
> > rte_eth_rxconf structure:
> >
> > struct rte_eth_rxconf {
> >     ...
> >     uint16_t rx_split_num; /* number of segments to split */
> >     uint16_t *rx_split_len; /* array of segment lengths */
> >     struct rte_mempool **mp; /* array of segment memory pools */
> >     ...
> > };
> >
> > A non-zero value of the rx_split_num field configures the receiving
> > queue to split ingress packets into multiple segments placed in mbufs
> > allocated from various memory pools according to the specified
> > lengths. A zero value of the rx_split_num field provides backward
> > compatibility, and the queue should be configured in the regular way
> > (with single/multiple mbufs of the same data buffer length allocated
> > from a single memory pool).
> 
> From the above description it is not 100% clear how it will coexist with:
>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
>  - DEV_RX_OFFLOAD_SCATTER

The DEV_RX_OFFLOAD_SCATTER flag is required to be reported and configured
for the new feature, to indicate that the application is prepared for
multisegment packets.

But SCATTER just tells that the ingress packet length can exceed
the mbuf data buffer length and a chain of mbufs must be built to store
the entire packet. There is a limitation, though: all mbufs are allocated
from the same memory pool, and all data buffers have the same length.
The new feature provides an opportunity to allocate mbufs from the desired
pools and to specify the length of each buffer/part.
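A minimal sketch of the fixed-length split, assuming the proposed rx_split_len semantics: each segment takes at most its configured length, in order, until the packet is consumed. The function name ex_split_packet() and the plain arrays are hypothetical stand-ins; no real PMD or mbuf API is involved.

```c
#include <stdint.h>

/*
 * Hypothetical fixed-length split: segment i receives at most split_len[i]
 * bytes, in order, until the packet is consumed. out_len[i] receives the
 * number of bytes placed in segment i (out_len must hold split_num
 * entries). Returns the number of segments used, or -1 if the packet is
 * longer than the sum of all split lengths.
 */
static int ex_split_packet(uint32_t pkt_len, const uint16_t *split_len,
                           uint16_t split_num, uint16_t *out_len)
{
    uint32_t remaining = pkt_len;
    uint16_t i;

    for (i = 0; i < split_num && remaining > 0; i++) {
        uint16_t chunk = remaining < split_len[i] ?
                         (uint16_t)remaining : split_len[i];

        out_len[i] = chunk;
        remaining -= chunk;
    }
    if (remaining > 0)
        return -1; /* the configured segments cannot hold the packet */
    return i;
}
```

With lengths {14, 20, 512}, a 100-byte packet lands as 14 + 20 + 66 bytes, while a 600-byte packet does not fit, which is why the sum of lengths is expected to cover the maximum packet size.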

>  - DEV_RX_OFFLOAD_HEADER_SPLIT
The new feature (let's name it "BUFFER_SPLIT") might be supported
in conjunction with HEADER_SPLIT (say, splitting the rest of the data after the header)
or rejected if HEADER_SPLIT is configured on the port, depending on the PMD
implementation (returning ENOTSUP if both features are requested on the same port).
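A sketch of that PMD-side behaviour, assuming the driver only needs to know whether it can combine the two split modes. The EX_RX_OFFLOAD_* macros and ex_check_split_combo() are hypothetical, and the bit values are illustrative, not DPDK's.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; they are NOT DPDK's real flag values. */
#define EX_RX_OFFLOAD_HEADER_SPLIT (UINT64_C(1) << 0)
#define EX_RX_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 2)

/*
 * Hypothetical PMD-side check: a driver that cannot combine header split
 * with buffer split rejects a request for both with -ENOTSUP; a driver
 * that can combine them (e.g. splitting the data after the header)
 * accepts it.
 */
static int ex_check_split_combo(uint64_t rx_offloads, bool pmd_can_combine)
{
    const uint64_t both = EX_RX_OFFLOAD_HEADER_SPLIT |
                          EX_RX_OFFLOAD_BUFFER_SPLIT;

    if ((rx_offloads & both) == both && !pmd_can_combine)
        return -ENOTSUP;
    return 0;
}
```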

> How will application know that the feature is supported? Limitations?
It is a subject for further discussion; I see two options:
 - introduce the DEV_RX_OFFLOAD_BUFFER_SPLIT flag
 - return ENOTSUP/EINVAL from rx_queue_setup() if the feature is requested
   (the mp parameter is supposed to be NULL in this case)
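A sketch of the first option from the application side, assuming the device advertises the proposed flag in its Rx offload capability mask (in DPDK that mask is the rx_offload_capa field filled by rte_eth_dev_info_get()). The flag value, the enum and ex_choose_rxq_mode() are hypothetical.

```c
#include <stdint.h>

/* Illustrative bit value only; NOT DPDK's real flag value. */
#define EX_RX_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 2)

enum ex_rxq_mode { EX_RXQ_SINGLE_POOL, EX_RXQ_BUFFER_SPLIT };

/*
 * Request the split only if the device advertises the (proposed)
 * BUFFER_SPLIT capability; otherwise fall back to a regular single-pool
 * queue, i.e. rx_split_num == 0.
 */
static enum ex_rxq_mode ex_choose_rxq_mode(uint64_t rx_offload_capa,
                                            uint16_t wanted_split_num)
{
    if (wanted_split_num > 0 &&
        (rx_offload_capa & EX_RX_OFFLOAD_BUFFER_SPLIT) != 0)
        return EX_RXQ_BUFFER_SPLIT;
    return EX_RXQ_SINGLE_POOL;
}
```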

> Is it always split by specified/fixed length?
Yes, it is a simple feature: it splits the data into buffers with the required
memory attributes, provided by the specified pools, according to the fixed lengths.
It should be OK for protocols like eCPRI or some tunneling protocols.

> What happens if header length is actually different?
It is a per-queue configuration; packets might be sorted between the queues with the rte_flow engine.
The intended use case is to filter out packets of a specific protocol (say, eCPRI with a fixed header length)
and split them on a specific Rx queue.


With best regards,
Slava

> 
> > The new approach would allow splitting the ingress packets into
> > multiple parts pushed to the memory with different attributes.
> > For example, the packet headers can be pushed to the embedded data
> > buffers within mbufs and the application data into the external
> > buffers attached to mbufs allocated from the different memory pools.
> > The memory attributes for the split parts may differ as well; for
> > example, the application data may be pushed into external memory
> > located on a dedicated physical device, say a GPU or NVMe. This would
> > improve the DPDK receiving datapath flexibility while preserving
> > compatibility with the existing API.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > index ea4cfa7..cd700ae 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -99,6 +99,11 @@ Deprecation Notices
> >    In 19.11 PMDs will still update the field even when the offload is not
> >    enabled.
> >
> > +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
> > +  queues to split ingress packets into multiple segments according to the
> > +  specified lengths into the buffers allocated from the specified
> > +  memory pools. The backward compatibility to existing API is preserved.
> > +
> >  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
> >    will be deprecated in 20.11 and will be removed in 21.11.
> >    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``



* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-03 15:18 [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure Slava Ovsiienko
@ 2020-08-03 15:31 ` Andrew Rybchenko
  2020-08-03 16:51   ` Slava Ovsiienko
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Rybchenko @ 2020-08-03 15:31 UTC
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Thomas Monjalon, ferruh.yigit,
	jerinjacobk, stephen, ajit.khaparde, maxime.coquelin,
	olivier.matz, david.marchand

Hi Slava,

On 8/3/20 6:18 PM, Slava Ovsiienko wrote:
> Hi, Andrew
> 
> Thanks for the comments; please see below.
> 
>>
>> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
>>> The DPDK datapath in the transmit direction is very flexible.
>>> The applications can build multisegment packets and manage almost all
>>> data aspects: the memory pools where segments are allocated from, the
>>> segment lengths, the memory attributes like external, registered, etc.
>>>
>>> In the receiving direction, the datapath is much less flexible: the
>>> applications can only specify the memory pool to configure the
>>> receiving queue and nothing more. In order to extend the receiving
>>> datapath capabilities, it is proposed to add new fields to the
>>> rte_eth_rxconf structure:
>>>
>>> struct rte_eth_rxconf {
>>>     ...
>>>     uint16_t rx_split_num; /* number of segments to split */
>>>     uint16_t *rx_split_len; /* array of segment lengths */
>>>     struct rte_mempool **mp; /* array of segment memory pools */
>>>     ...
>>> };
>>>
>>> A non-zero value of the rx_split_num field configures the receiving
>>> queue to split ingress packets into multiple segments placed in mbufs
>>> allocated from various memory pools according to the specified
>>> lengths. A zero value of the rx_split_num field provides backward
>>> compatibility, and the queue should be configured in the regular way
>>> (with single/multiple mbufs of the same data buffer length allocated
>>> from a single memory pool).
>>
>> From the above description it is not 100% clear how it will coexist with:
>>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
>>  - DEV_RX_OFFLOAD_SCATTER
> 
> The DEV_RX_OFFLOAD_SCATTER flag is required to be reported and configured
> for the new feature, to indicate that the application is prepared for
> multisegment packets.

I hope it will be mentioned in the feature documentation in the future,
but I'm not 100% sure that it is required. See below.

> 
> But SCATTER just tells that the ingress packet length can exceed
> the mbuf data buffer length and a chain of mbufs must be built to store
> the entire packet. There is a limitation, though: all mbufs are allocated
> from the same memory pool, and all data buffers have the same length.
> The new feature provides an opportunity to allocate mbufs from the desired
> pools and to specify the length of each buffer/part.

Yes, it is clear, but what happens if a packet does not fit into
the provided chain of pools? Is the last one used many times? Maybe
it is logical to use the Rx queue setup mb_pool as well for this
purpose? I.e. use the pools suggested here only once and use
mb_pool as many times as needed for the rest if SCATTER is supported,
and only once if SCATTER is not supported.
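A sketch of this alternative, under the assumption that each split entry is consumed at most once and the remainder comes from the queue's mb_pool. All names are hypothetical; the function only counts mbufs and does not touch any real mempool.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the suggestion above (not an agreed API): every
 * buffer-split entry is used at most once, then the Rx queue's main
 * mb_pool supplies the remainder, as many mbufs as needed when SCATTER
 * is on, or a single mbuf when it is off.
 * Returns the number of mbufs a packet of pkt_len bytes would occupy,
 * or -1 if it cannot be stored.
 */
static int ex_mbufs_needed(uint32_t pkt_len, const uint16_t *split_len,
                           uint16_t split_num, uint16_t mb_pool_buf_len,
                           bool scatter)
{
    uint32_t remaining = pkt_len;
    uint16_t i;

    /* Fill the split segments in order, each one at most once. */
    for (i = 0; i < split_num && remaining > 0; i++)
        remaining -= remaining < split_len[i] ? remaining : split_len[i];

    if (remaining == 0)
        return i;
    if (mb_pool_buf_len == 0)
        return -1;
    if (!scatter)
        return remaining <= mb_pool_buf_len ? i + 1 : -1;
    /* Ceiling division: mb_pool mbufs needed for the tail. */
    return i + (int)((remaining + mb_pool_buf_len - 1) / mb_pool_buf_len);
}
```

For entries {14, 20} and a 512-byte mb_pool buffer, a 1518-byte packet needs 2 + 3 mbufs with SCATTER and cannot be stored without it.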

> 
>>  - DEV_RX_OFFLOAD_HEADER_SPLIT
> The new feature (let's name it "BUFFER_SPLIT") might be supported
> in conjunction with HEADER_SPLIT (say, splitting the rest of the data after the header)
> or rejected if HEADER_SPLIT is configured on the port, depending on the PMD
> implementation (returning ENOTSUP if both features are requested on the same port).

OK, consider making SCATTER and BUFFER_SPLIT independent, as
suggested above.

> 
>> How will application know that the feature is supported? Limitations?
> It is a subject for further discussion; I see two options:
>  - introduce the DEV_RX_OFFLOAD_BUFFER_SPLIT flag

+1

>  - return ENOTSUP/EINVAL from rx_queue_setup() if the feature is requested
>    (the mp parameter is supposed to be NULL in this case)

I'd say that it should be used only for corner cases which are
hard to formalize. It could be important to know the maximum
number of buffers to split, the total length which could be split
from the remaining, and the limitations on split lengths.
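One way such limits could be reported and checked is sketched below. struct ex_split_capa and its fields are hypothetical, chosen to mirror the three limits listed above; rejecting zero-length segments is an extra assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability report; none of these fields exist in ethdev. */
struct ex_split_capa {
    uint16_t max_segs;      /* maximum number of split segments */
    uint32_t max_total_len; /* maximum sum of all split lengths */
    uint16_t max_seg_len;   /* maximum length of a single segment */
};

/*
 * Returns 0 if the requested split fits the reported limits, -EINVAL
 * otherwise. split_num == 0 means a regular single-pool queue, so there
 * is nothing to validate.
 */
static int ex_validate_split(const struct ex_split_capa *capa,
                             const uint16_t *split_len, uint16_t split_num)
{
    uint32_t total = 0;
    uint16_t i;

    if (split_num == 0)
        return 0;
    if (split_num > capa->max_segs)
        return -EINVAL;
    for (i = 0; i < split_num; i++) {
        if (split_len[i] == 0 || split_len[i] > capa->max_seg_len)
            return -EINVAL;
        total += split_len[i];
    }
    return total <= capa->max_total_len ? 0 : -EINVAL;
}
```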

> 
>> Is it always split by specified/fixed length?
> Yes, it is a simple feature: it splits the data into buffers with the required
> memory attributes, provided by the specified pools, according to the fixed lengths.
> It should be OK for protocols like eCPRI or some tunneling protocols.

I see. Thanks.

> 
>> What happens if header length is actually different?
> It is a per-queue configuration; packets might be sorted between the queues with the rte_flow engine.
> The intended use case is to filter out packets of a specific protocol (say, eCPRI with a fixed header length)
> and split them on a specific Rx queue.

Got it.

Thanks,
Andrew.

> 
> 
> With best regards,
> Slava



* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-03 15:31 ` Andrew Rybchenko
@ 2020-08-03 16:51   ` Slava Ovsiienko
  2020-08-30 12:58     ` Andrew Rybchenko
  0 siblings, 1 reply; 24+ messages in thread
From: Slava Ovsiienko @ 2020-08-03 16:51 UTC
  To: Andrew Rybchenko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Thomas Monjalon, ferruh.yigit,
	jerinjacobk, stephen, ajit.khaparde, maxime.coquelin,
	olivier.matz, david.marchand

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Monday, August 3, 2020 18:31
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
> ferruh.yigit@intel.com; jerinjacobk@gmail.com;
> stephen@networkplumber.org; ajit.khaparde@broadcom.com;
> maxime.coquelin@redhat.com; olivier.matz@6wind.com;
> david.marchand@redhat.com
> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> 
> Hi Slava,
> 
> On 8/3/20 6:18 PM, Slava Ovsiienko wrote:
> > Hi, Andrew
> >
> > Thanks for the comments; please see below.
> >
> >>
> >> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
> >>> The DPDK datapath in the transmit direction is very flexible.
> >>> The applications can build multisegment packets and manage almost
> >>> all data aspects: the memory pools where segments are allocated
> >>> from, the segment lengths, the memory attributes like external,
> >>> registered, etc.
> >>>
> >>> In the receiving direction, the datapath is much less flexible: the
> >>> applications can only specify the memory pool to configure the
> >>> receiving queue and nothing more. In order to extend the receiving
> >>> datapath capabilities, it is proposed to add new fields to the
> >>> rte_eth_rxconf structure:
> >>>
> >>> struct rte_eth_rxconf {
> >>>     ...
> >>>     uint16_t rx_split_num; /* number of segments to split */
> >>>     uint16_t *rx_split_len; /* array of segment lengths */
> >>>     struct rte_mempool **mp; /* array of segment memory pools */
> >>>     ...
> >>> };
> >>>
> >>> A non-zero value of the rx_split_num field configures the receiving
> >>> queue to split ingress packets into multiple segments placed in mbufs
> >>> allocated from various memory pools according to the specified
> >>> lengths. A zero value of the rx_split_num field provides backward
> >>> compatibility, and the queue should be configured in the regular way
> >>> (with single/multiple mbufs of the same data buffer length allocated
> >>> from a single memory pool).
> >>
> >> From the above description it is not 100% clear how it will coexist with:
> >>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
> >>  - DEV_RX_OFFLOAD_SCATTER
> >
> > The DEV_RX_OFFLOAD_SCATTER flag is required to be reported and configured
> > for the new feature, to indicate that the application is prepared for
> > multisegment packets.
> 
> I hope it will be mentioned in the feature documentation in the future, but
> I'm not 100% sure that it is required. See below.
I suppose there is a hierarchy:
- the application configures DEV_RX_OFFLOAD_SCATTER on the port and tells the driver this way:
"Hey, driver, I'm ready to handle multi-segment packets". Readiness in general.
- the application configures BUFFER_SPLIT and tells the PMD _HOW_ it wants to split, in a particular way:
"Hey, driver, please put ten bytes here, here and here, and the rest over there"

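The hierarchy above, written as a configuration check under the assumption that BUFFER_SPLIT without SCATTER is simply rejected with -EINVAL. This encodes only the position stated here (the alternative raised in the thread is to keep the two offloads independent); the flag values and ex_check_offload_hierarchy() are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative bit values only; NOT DPDK's real flag values. */
#define EX_RX_OFFLOAD_SCATTER      (UINT64_C(1) << 1)
#define EX_RX_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 2)

/*
 * SCATTER states general readiness for multi-segment packets and
 * BUFFER_SPLIT describes how to split, so BUFFER_SPLIT without SCATTER
 * is rejected; every other combination is accepted.
 */
static int ex_check_offload_hierarchy(uint64_t rx_offloads)
{
    if ((rx_offloads & EX_RX_OFFLOAD_BUFFER_SPLIT) != 0 &&
        (rx_offloads & EX_RX_OFFLOAD_SCATTER) == 0)
        return -EINVAL;
    return 0;
}
```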

> >
> > But SCATTER just tells that the ingress packet length can exceed the
> > mbuf data buffer length and a chain of mbufs must be built to store
> > the entire packet. There is a limitation, though: all mbufs are
> > allocated from the same memory pool, and all data buffers have the same
> > length.
> > The new feature provides an opportunity to allocate mbufs from the
> > desired pools and to specify the length of each buffer/part.
> 
> Yes, it is clear, but what happens if a packet does not fit into the provided
> chain of pools? Is the last one used many times? Maybe it is logical to use the
> Rx queue setup mb_pool as well for this purpose? I.e. use the pools suggested
> here only once and use mb_pool as many times as needed for the rest if SCATTER
> is supported, and only once if SCATTER is not supported.

It could easily be configured without involving the SCATTER flag: just specify the last pool
multiple times, i.e.:
pool 0 - 14B
pool 1 - 20B
...
pool N - 512B
pool N - 512B
pool N - 512B, sum of lengths >= max packet size 1518

It was supposed that the sum of lengths in the array covers the maximal packet size.
Currently there is a limitation on the packet size; for example, the mlx5 PMD
just drops packets whose length exceeds the one the queue is configured for.
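The arithmetic behind repeating the last pool is a ceiling division over the bytes not covered by the leading entries. The sketch assumes, for illustration only, that the leading entries are just the 14-byte and 20-byte ones (the list above elides the middle entries); ex_last_pool_repeats() is hypothetical.

```c
#include <stdint.h>

/*
 * Number of copies of the last pool entry (last_len bytes each) needed so
 * that head_len + copies * last_len >= max_pkt_len, where head_len is the
 * sum of the distinct leading entries. Returns 0 if the leading entries
 * already cover the packet, UINT32_MAX if last_len is 0 (cannot cover).
 */
static uint32_t ex_last_pool_repeats(uint32_t max_pkt_len, uint32_t head_len,
                                     uint16_t last_len)
{
    if (head_len >= max_pkt_len)
        return 0;
    if (last_len == 0)
        return UINT32_MAX;
    /* Ceiling division over the uncovered remainder. */
    return (max_pkt_len - head_len + last_len - 1) / last_len;
}
```

With 14 + 20 = 34 leading bytes and a 512-byte last entry, covering 1518 bytes takes 3 repeats (34 + 3 * 512 = 1570).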

> 
> >
> >>  - DEV_RX_OFFLOAD_HEADER_SPLIT
> > The new feature (let's name it "BUFFER_SPLIT") might be supported
> > in conjunction with HEADER_SPLIT (say, splitting the rest of the data
> > after the header) or rejected if HEADER_SPLIT is configured on the
> > port, depending on the PMD implementation (returning ENOTSUP if both
> > features are requested on the same port).
> 
> OK, consider making SCATTER and BUFFER_SPLIT independent, as suggested
> above.
Sorry, do you mean HEADER_SPLIT and BUFFER_SPLIT?

> 
> >
> >> How will application know that the feature is supported? Limitations?
> > It is a subject for further discussion; I see two options:
> >  - introduce the DEV_RX_OFFLOAD_BUFFER_SPLIT flag
> 
> +1
OK, got it.

> 
> >  - return ENOTSUP/EINVAL from rx_queue_setup() if the feature is requested
> >    (the mp parameter is supposed to be NULL in this case)
> 
> I'd say that it should be used only for corner cases which are hard to
> formalize. It could be important to know the maximum number of buffers to
> split, the total length which could be split from the remaining, and the
> limitations on split lengths.
Agreed, a dedicated OFFLOAD flag seems preferable.

With best regards, Slava

> 
> >
> >> Is it always split by specified/fixed length?
> > Yes, it is a simple feature: it splits the data into buffers with the
> > required memory attributes, provided by the specified pools, according
> > to the fixed lengths.
> > It should be OK for protocols like eCPRI or some tunneling protocols.
> 
> I see. Thanks.
> 
> >
> >> What happens if header length is actually different?
> > It is a per-queue configuration; packets might be sorted between the
> > queues with the rte_flow engine.
> > The intended use case is to filter out packets of a specific protocol
> > (say, eCPRI with a fixed header length) and split them on a specific Rx queue.
> 
> Got it.
> 
> Thanks,
> Andrew.
> 
> >
> >
> > With best regards,
> > Slava



* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-03 16:51   ` Slava Ovsiienko
@ 2020-08-30 12:58     ` Andrew Rybchenko
  2020-08-30 18:26       ` Stephen Hemminger
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Rybchenko @ 2020-08-30 12:58 UTC
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Thomas Monjalon, ferruh.yigit,
	jerinjacobk, stephen, ajit.khaparde, maxime.coquelin,
	olivier.matz, david.marchand

On 8/3/20 7:51 PM, Slava Ovsiienko wrote:
>>
>> Hi Slava,
>>
>> On 8/3/20 6:18 PM, Slava Ovsiienko wrote:
>>> Hi, Andrew
>>>
>>> Thanks for the comments; please see below.
>>>
>>>>
>>>> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
>>>>> The DPDK datapath in the transmit direction is very flexible.
>>>>> The applications can build multisegment packets and manage almost
>>>>> all data aspects: the memory pools where segments are allocated
>>>>> from, the segment lengths, the memory attributes like external,
>>>>> registered, etc.
>>>>>
>>>>> In the receiving direction, the datapath is much less flexible: the
>>>>> applications can only specify the memory pool to configure the
>>>>> receiving queue and nothing more. In order to extend the receiving
>>>>> datapath capabilities, it is proposed to add new fields to the
>>>>> rte_eth_rxconf structure:
>>>>>
>>>>> struct rte_eth_rxconf {
>>>>>     ...
>>>>>     uint16_t rx_split_num; /* number of segments to split */
>>>>>     uint16_t *rx_split_len; /* array of segment lengths */
>>>>>     struct rte_mempool **mp; /* array of segment memory pools */
>>>>>     ...
>>>>> };
>>>>>
>>>>> A non-zero value of the rx_split_num field configures the receiving
>>>>> queue to split ingress packets into multiple segments placed in mbufs
>>>>> allocated from various memory pools according to the specified
>>>>> lengths. A zero value of the rx_split_num field provides backward
>>>>> compatibility, and the queue should be configured in the regular way
>>>>> (with single/multiple mbufs of the same data buffer length allocated
>>>>> from a single memory pool).
>>>>
>>>> From the above description it is not 100% clear how it will coexist with:
>>>>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
>>>>  - DEV_RX_OFFLOAD_SCATTER
>>>
>>> The DEV_RX_OFFLOAD_SCATTER flag is required to be reported and configured
>>> for the new feature, to indicate that the application is prepared for
>>> multisegment packets.
>>
>> I hope it will be mentioned in the feature documentation in the future, but
>> I'm not 100% sure that it is required. See below.
> I suppose there is a hierarchy:
> - the application configures DEV_RX_OFFLOAD_SCATTER on the port and tells the driver this way:
> "Hey, driver, I'm ready to handle multi-segment packets". Readiness in general.
> - the application configures BUFFER_SPLIT and tells the PMD _HOW_ it wants to split, in a particular way:
> "Hey, driver, please put ten bytes here, here and here, and the rest over there"

My idea is to keep SCATTER and BUFFER_SPLIT independent.
SCATTER is the ability to build multi-segment packets, getting
as many mbufs from the main Rx queue mempool as required.
BUFFER_SPLIT is support for many mempools and for splitting
received packets as specified.

>>>
>>> But SCATTER just tells that the ingress packet length can exceed the
>>> mbuf data buffer length and a chain of mbufs must be built to store
>>> the entire packet. There is a limitation, though: all mbufs are
>>> allocated from the same memory pool, and all data buffers have the same
>>> length.
>>> The new feature provides an opportunity to allocate mbufs from the
>>> desired pools and to specify the length of each buffer/part.
>>
>> Yes, it is clear, but what happens if a packet does not fit into the provided
>> chain of pools? Is the last one used many times? Maybe it is logical to use the
>> Rx queue setup mb_pool as well for this purpose? I.e. use the pools suggested
>> here only once and use mb_pool as many times as needed for the rest if SCATTER
>> is supported, and only once if SCATTER is not supported.
> 
> It could easily be configured without involving the SCATTER flag: just specify the last pool
> multiple times, i.e.:
> pool 0 - 14B
> pool 1 - 20B
> ...
> pool N - 512B
> pool N - 512B
> pool N - 512B, sum of lengths >= max packet size 1518

I see, but IMHO it is excessive. Pools 0 .. N-1 could be
provided in the buffer split config, and the Nth should be the
main RxQ mempool, with SCATTER enabled.

> It was supposed that the sum of lengths in the array covers the maximal packet size.
> Currently there is a limitation on the packet size; for example, the mlx5 PMD
> just drops packets whose length exceeds the one the queue is configured for.
> 
>>
>>>
>>>>  - DEV_RX_OFFLOAD_HEADER_SPLIT
>>> The new feature (let's name it "BUFFER_SPLIT") might be supported
>>> in conjunction with HEADER_SPLIT (say, splitting the rest of the data
>>> after the header) or rejected if HEADER_SPLIT is configured on the
>>> port, depending on the PMD implementation (returning ENOTSUP if both
>>> features are requested on the same port).
>>
>> OK, consider making SCATTER and BUFFER_SPLIT independent, as suggested
>> above.
> Sorry, do you mean HEADER_SPLIT and BUFFER_SPLIT?

See above.

>>
>>>
>>>> How will application know that the feature is supported? Limitations?
>>> It is a subject for further discussion; I see two options:
>>>  - introduce the DEV_RX_OFFLOAD_BUFFER_SPLIT flag
>>
>> +1
> OK, got it.
> 
>>
>>>  - return ENOTSUP/EINVAL from rx_queue_setup() if the feature is requested
>>>    (the mp parameter is supposed to be NULL in this case)
>>
>> I'd say that it should be used only for corner cases which are hard to
>> formalize. It could be important to know the maximum number of buffers to
>> split, the total length which could be split from the remaining, and the
>> limitations on split lengths.
> Agreed, a dedicated OFFLOAD flag seems preferable.
> 
> With best regards, Slava
> 
>>
>>>
>>>> Is it always split by specified/fixed length?
>>> Yes, it is a simple feature: it splits the data into buffers with the
>>> required memory attributes, provided by the specified pools, according
>>> to the fixed lengths.
>>> It should be OK for protocols like eCPRI or some tunneling protocols.
>>
>> I see. Thanks.
>>
>>>
>>>> What happens if header length is actually different?
>>> It is a per-queue configuration; packets might be sorted between the
>>> queues with the rte_flow engine.
>>> The intended use case is to filter out packets of a specific protocol
>>> (say, eCPRI with a fixed header length) and split them on a specific Rx queue.
>>
>> Got it.
>>
>> Thanks,
>> Andrew.
>>
>>>
>>>
>>> With best regards,
>>> Slava
>>>
>>>>
>>>>> The new approach would allow splitting the ingress packets into
>>>>> multiple parts pushed to the memory with different attributes.
>>>>> For example, the packet headers can be pushed to the embedded data
>>>>> buffers within mbufs and the application data into the external
>>>>> buffers attached to mbufs allocated from the different memory pools.
>>>>> The memory attributes for the split parts may differ either - for
>>>>> example the application data may be pushed into the external memory
>>>>> located on the dedicated physical device, say GPU or NVMe. This
>>>>> would improve the DPDK receiving datapath flexibility preserving
>>>>> compatibility with existing API.
>>>>>
>>>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>>>>> ---
>>>>>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>>  1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>>>> b/doc/guides/rel_notes/deprecation.rst
>>>>> index ea4cfa7..cd700ae 100644
>>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>>> @@ -99,6 +99,11 @@ Deprecation Notices
>>>>>    In 19.11 PMDs will still update the field even when the offload is not
>>>>>    enabled.
>>>>>
>>>>> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
>>>>> +  queues to split ingress packets into multiple segments according to the
>>>>> +  specified lengths into the buffers allocated from the specified
>>>>> +  memory pools. The backward compatibility to existing API is preserved.
>>>>> +
>>>>>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>>>>>    will be deprecated in 20.11 and will be removed in 21.11.
>>>>>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
>>>
> 
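As a rough illustration of the fixed-length split discussed above, the following plain-C model shows how the bytes of one received packet could be distributed over the configured segments. It is a sketch under stated assumptions, not DPDK code: `split_packet_lengths()` is a hypothetical helper, and the rule that the last segment absorbs whatever remains of the packet is an assumption rather than part of the announced API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a fixed-length Rx buffer split (not DPDK code).
 * split_len[i] is the configured length of segment i. The last segment
 * is ASSUMED to take whatever remains of the packet, so its configured
 * length is not consulted here. The bytes placed in each segment are
 * written to out[]; the number of segments actually used is returned.
 */
static uint16_t
split_packet_lengths(uint32_t pkt_len, const uint16_t *split_len,
                     uint16_t split_num, uint32_t *out)
{
    uint16_t used = 0;

    assert(split_num > 0);
    for (uint16_t i = 0; i < split_num && pkt_len > 0; i++) {
        uint32_t take;

        if (i == split_num - 1)
            take = pkt_len;          /* last segment: the rest */
        else if (pkt_len < split_len[i])
            take = pkt_len;          /* short packet ends here */
        else
            take = split_len[i];     /* full configured length */
        out[i] = take;
        pkt_len -= take;
        used++;
    }
    return used;
}
```

For example, with lengths {14, 20, 2048} a 1000-byte packet would be placed as 14 + 20 + 966 bytes (an Ethernet header, an option-less IPv4 header, and the rest of the packet), while a 10-byte runt would use only the first segment.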


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-30 12:58     ` Andrew Rybchenko
@ 2020-08-30 18:26       ` Stephen Hemminger
  2020-08-31  6:35         ` Andrew Rybchenko
  0 siblings, 1 reply; 24+ messages in thread
From: Stephen Hemminger @ 2020-08-30 18:26 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: Slava Ovsiienko, dev, Matan Azrad, Raslan Darawsheh,
	Thomas Monjalon, ferruh.yigit, jerinjacobk, ajit.khaparde,
	maxime.coquelin, olivier.matz, david.marchand

On Sun, 30 Aug 2020 15:58:57 +0300
Andrew Rybchenko <arybchenko@solarflare.com> wrote:

> >>>>>
> >>>>> The non-zero value of rx_split_num field configures the receiving
> >>>>> queue to split ingress packets into multiple segments to the mbufs
> >>>>> allocated from various memory pools according to the specified
> >>>>> lengths. The zero value of rx_split_num field provides the backward
> >>>>> compatibility and queue should be configured in a regular way (with
> >>>>> single/multiple mbufs of the same data buffer length allocated from
> >>>>> the single memory pool).  
> >>>>
> >>>> From the above description it is not 100% clear how it will coexist with:
> >>>>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
> >>>>  - DEV_RX_OFFLOAD_SCATTER  
> >>>
> >>> DEV_RX_OFFLOAD_SCATTER flag is required to be reported and configured
> >>> for the new feature to indicate the application is prepared for the
> >>> multisegment packets.  
> >>
> >> I hope it will be mentioned in the feature documentation in the future, but
> >> I'm not 100% sure that it is required. See below.  
> > I suppose there is a hierarchy:
> > - the application configures DEV_RX_OFFLOAD_SCATTER on the port and says in this way:
> > "Hey, driver, I'm ready to handle multi-segment packets". Readiness in general.
> > - the application configures BUFFER_SPLIT and tells the PMD _HOW_ it wants to split, in a particular way:
> > "Hey, driver, please put ten bytes here, here and here, and the rest over there"
> 
> My idea is to keep SCATTER and BUFFER_SPLIT independent.
> SCATTER is the ability to build multi-segment packets, taking as many
> mbufs from the main rxq mempool as required.
> BUFFER_SPLIT is support for many mempools and for splitting
> received packets as specified.

No.
Once again, drivers should accept any configuration from the application and rely
on internal logic to choose the best path. Modern CPUs have good branch predictors,
and making the developer do that work is counterproductive.
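To make the configuration rules concrete, here is a hypothetical setup-time check that follows Slava's hierarchy quoted above together with the rules he proposed elsewhere in the thread: the new BUFFER_SPLIT flag must be set, `mb_pool` is NULL when the split arrays are used, SCATTER must be enabled, and HEADER_SPLIT is rejected. The `HYP_RX_OFFLOAD_*` bits, `struct split_conf` and `check_rx_split_conf()` are placeholders invented for this sketch, not the real `DEV_RX_OFFLOAD_*` definitions or any ethdev function, and the per-entry NULL check is an added assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder offload bits; NOT the real DEV_RX_OFFLOAD_* values. */
#define HYP_RX_OFFLOAD_SCATTER      (UINT64_C(1) << 0)
#define HYP_RX_OFFLOAD_HEADER_SPLIT (UINT64_C(1) << 1)
#define HYP_RX_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 2)

/* Illustrative stand-in for the proposed rte_eth_rxconf additions. */
struct split_conf {
    uint16_t split_num;        /* 0 selects the legacy single-pool setup */
    const uint16_t *split_len; /* split_num segment lengths */
    void *const *pools;        /* split_num pools (opaque here) */
};

/*
 * Returns 0 if the combination would be accepted, -EINVAL otherwise.
 * mb_pool models the pool argument of the queue setup call.
 */
static int
check_rx_split_conf(uint64_t offloads, const void *mb_pool,
                    const struct split_conf *sc)
{
    assert(sc != NULL);
    if (sc->split_num == 0)                 /* backward-compatible path */
        return mb_pool != NULL ? 0 : -EINVAL;
    if (!(offloads & HYP_RX_OFFLOAD_BUFFER_SPLIT))
        return -EINVAL;                     /* split arrays without the flag */
    if (!(offloads & HYP_RX_OFFLOAD_SCATTER))
        return -EINVAL;                     /* app must accept multi-seg mbufs */
    if (offloads & HYP_RX_OFFLOAD_HEADER_SPLIT)
        return -EINVAL;                     /* rejected in the proposal */
    if (mb_pool != NULL || sc->split_len == NULL || sc->pools == NULL)
        return -EINVAL;                     /* pools come from the arrays */
    for (uint16_t i = 0; i < sc->split_num; i++)
        if (sc->pools[i] == NULL)           /* assumed sanity check */
            return -EINVAL;
    return 0;
}
```

Stephen's alternative would move this decision into the driver, which would accept what it is given and pick the Rx path itself; the sketch only captures the stricter variant under discussion.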

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-30 18:26       ` Stephen Hemminger
@ 2020-08-31  6:35         ` Andrew Rybchenko
  2020-08-31 16:59           ` Stephen Hemminger
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Rybchenko @ 2020-08-31  6:35 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Slava Ovsiienko, dev, Matan Azrad, Raslan Darawsheh,
	Thomas Monjalon, ferruh.yigit, jerinjacobk, ajit.khaparde,
	maxime.coquelin, olivier.matz, david.marchand

Hi Stephen,

On 8/30/20 9:26 PM, Stephen Hemminger wrote:
> On Sun, 30 Aug 2020 15:58:57 +0300
> Andrew Rybchenko <arybchenko@solarflare.com> wrote:
> 
>>>>>>>
>>>>>>> The non-zero value of rx_split_num field configures the receiving
>>>>>>> queue to split ingress packets into multiple segments to the mbufs
>>>>>>> allocated from various memory pools according to the specified
>>>>>>> lengths. The zero value of rx_split_num field provides the backward
>>>>>>> compatibility and queue should be configured in a regular way (with
>>>>>>> single/multiple mbufs of the same data buffer length allocated from
>>>>>>> the single memory pool).  
>>>>>>
>>>>>> From the above description it is not 100% clear how it will coexist with:
>>>>>>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
>>>>>>  - DEV_RX_OFFLOAD_SCATTER  
>>>>>
>>>>> DEV_RX_OFFLOAD_SCATTER flag is required to be reported and configured
>>>>> for the new feature to indicate the application is prepared for the
>>>>> multisegment packets.  
>>>>
>>>> I hope it will be mentioned in the feature documentation in the future, but
>>>> I'm not 100% sure that it is required. See below.  
>>> I suppose there is a hierarchy:
>>> - the application configures DEV_RX_OFFLOAD_SCATTER on the port and says in this way:
>>> "Hey, driver, I'm ready to handle multi-segment packets". Readiness in general.
>>> - the application configures BUFFER_SPLIT and tells the PMD _HOW_ it wants to split, in a particular way:
>>> "Hey, driver, please put ten bytes here, here and here, and the rest over there"
>>
>> My idea is to keep SCATTER and BUFFER_SPLIT independent.
>> SCATTER is the ability to build multi-segment packets, taking as many
>> mbufs from the main rxq mempool as required.
>> BUFFER_SPLIT is support for many mempools and for splitting
>> received packets as specified.
> 
> No.
> Once again, drivers should accept any configuration from the application and rely
> on internal logic to choose the best path. Modern CPUs have good branch predictors,
> and making the developer do that work is counterproductive.

Please add a bit more detail. I simply can't see the relationship.
So right now it looks to me like just a misunderstanding.

Thanks,
Andrew.
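Under the independent model Andrew describes, SCATTER alone only means chaining as many equally sized buffers from the single queue mempool as a packet needs. A minimal sketch of that count, assuming every buffer offers the same usable data room (`buf_size` is a hypothetical stand-in for the mbuf data room minus headroom, and `scatter_segs_needed()` is not a DPDK function):

```c
#include <assert.h>
#include <stdint.h>

/*
 * SCATTER-only model: one mempool, every buffer offers buf_size bytes.
 * Returns how many chained buffers a pkt_len-byte packet occupies
 * (ceiling division written to avoid overflow). A zero-length packet
 * is assumed to still occupy one buffer.
 */
static uint32_t
scatter_segs_needed(uint32_t pkt_len, uint32_t buf_size)
{
    assert(buf_size > 0);
    if (pkt_len == 0)
        return 1;
    return 1 + (pkt_len - 1) / buf_size;
}
```

A 9000-byte jumbo frame with 2048-byte buffers needs 5 chained buffers. BUFFER_SPLIT, by contrast, would choose a pool and a length per segment, which is what keeps the two offloads separable in this view.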

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-31  6:35         ` Andrew Rybchenko
@ 2020-08-31 16:59           ` Stephen Hemminger
  0 siblings, 0 replies; 24+ messages in thread
From: Stephen Hemminger @ 2020-08-31 16:59 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: Slava Ovsiienko, dev, Matan Azrad, Raslan Darawsheh,
	Thomas Monjalon, ferruh.yigit, jerinjacobk, ajit.khaparde,
	maxime.coquelin, olivier.matz, david.marchand

On Mon, 31 Aug 2020 09:35:18 +0300
Andrew Rybchenko <arybchenko@solarflare.com> wrote:

> >>>>> multisegment packets.    
> >>>>
> >>>> I hope it will be mentioned in the feature documentation in the future, but
> >>>> I'm not 100% sure that it is required. See below.    
> >>> I suppose there is a hierarchy:
> >>> - the application configures DEV_RX_OFFLOAD_SCATTER on the port and says in this way:
> >>> "Hey, driver, I'm ready to handle multi-segment packets". Readiness in general.
> >>> - the application configures BUFFER_SPLIT and tells the PMD _HOW_ it wants to split, in a particular way:
> >>> "Hey, driver, please put ten bytes here, here and here, and the rest over there"
> >>
> >> My idea is to keep SCATTER and BUFFER_SPLIT independent.
> >> SCATTER is the ability to build multi-segment packets, taking as many
> >> mbufs from the main rxq mempool as required.
> >> BUFFER_SPLIT is support for many mempools and for splitting
> >> received packets as specified.
> > 
> > No.
> > Once again, drivers should accept any configuration from the application and rely
> > on internal logic to choose the best path. Modern CPUs have good branch predictors,
> > and making the developer do that work is counterproductive.
> 
> Please add a bit more detail. I simply can't see the relationship.
> So right now it looks to me like just a misunderstanding.
> 
> Thanks,
> Andrew.

Ok, documenting the existing behaviour is good. I was just concerned that this was
going to lead to more per-queue flags.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-06 18:10             ` Stephen Hemminger
@ 2020-08-07 11:23               ` Slava Ovsiienko
  0 siblings, 0 replies; 24+ messages in thread
From: Slava Ovsiienko @ 2020-08-07 11:23 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ferruh Yigit, Jerin Jacob, dpdk-dev, Matan Azrad,
	Raslan Darawsheh, Thomas Monjalon, Andrew Rybchenko,
	Ajit Khaparde, Maxime Coquelin, Olivier Matz, David Marchand

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, August 6, 2020 21:10
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: Ferruh Yigit <ferruh.yigit@intel.com>; Jerin Jacob
> <jerinjacobk@gmail.com>; dpdk-dev <dev@dpdk.org>; Matan Azrad
> <matan@mellanox.com>; Raslan Darawsheh <rasland@mellanox.com>;
> Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Olivier Matz <olivier.matz@6wind.com>;
> David Marchand <david.marchand@redhat.com>
> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> 
> On Thu, 6 Aug 2020 17:03:31 +0000
> Slava Ovsiienko <viacheslavo@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: Stephen Hemminger <stephen@networkplumber.org>
> > > Sent: Thursday, August 6, 2020 19:26
> > > To: Ferruh Yigit <ferruh.yigit@intel.com>
> > > Cc: Jerin Jacob <jerinjacobk@gmail.com>; Slava Ovsiienko
> > > <viacheslavo@mellanox.com>; dpdk-dev <dev@dpdk.org>; Matan Azrad
> > > <matan@mellanox.com>; Raslan Darawsheh <rasland@mellanox.com>;
> > > Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
> > > <arybchenko@solarflare.com>; Ajit Khaparde
> > > <ajit.khaparde@broadcom.com>; Maxime Coquelin
> > > <maxime.coquelin@redhat.com>; Olivier Matz <olivier.matz@6wind.com>;
> > > David Marchand <david.marchand@redhat.com>
> > > Subject: Re: [PATCH] doc: announce changes to ethdev rxconf
> > > structure
> > >
> > > On Thu, 6 Aug 2020 16:58:22 +0100
> > > Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> > >
> > > > On 8/4/2020 2:32 PM, Jerin Jacob wrote:
> > > > > On Mon, Aug 3, 2020 at 6:36 PM Slava Ovsiienko <viacheslavo@mellanox.com> wrote:
> > > > >>
> > > > >> Hi, Jerin,
> > > > >>
> > > > >> Thanks for the comment,  please, see below.
> > > > >>
> > > > >>> -----Original Message-----
> > > > >>> From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > >>> Sent: Monday, August 3, 2020 14:57
> > > > >>> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > > > >>> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
> > > > >>> Raslan Darawsheh <rasland@mellanox.com>; Thomas Monjalon
> > > > >>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > > > >>> Stephen Hemminger <stephen@networkplumber.org>; Andrew Rybchenko
> > > > >>> <arybchenko@solarflare.com>; Ajit Khaparde
> > > > >>> <ajit.khaparde@broadcom.com>; Maxime Coquelin
> > > > >>> <maxime.coquelin@redhat.com>; Olivier Matz
> > > > >>> <olivier.matz@6wind.com>; David Marchand
> > > > >>> <david.marchand@redhat.com>
> > > > >>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf
> > > > >>> structure
> > > > >>>
> > > > >>> On Mon, Aug 3, 2020 at 4:28 PM Viacheslav Ovsiienko
> > > > >>> <viacheslavo@mellanox.com> wrote:
> > > > >>>>
> > > > >>>> The DPDK datapath in the transmit direction is very flexible.
> > > > >>>> The applications can build multisegment packets and manage
> > > > >>>> almost all data aspects: the memory pools where segments are
> > > > >>>> allocated from, the segment lengths, the memory attributes
> > > > >>>> like external, registered, etc.
> > > > >>>>
> > > > >>>> In the receiving direction, the datapath is much less
> > > > >>>> flexible; the applications can only specify the memory pool
> > > > >>>> to configure the receiving queue and nothing more. In order
> > > > >>>> to extend the receiving datapath capabilities it is proposed
> > > > >>>> to add new fields to the rte_eth_rxconf structure:
> > > > >>>>
> > > > >>>> struct rte_eth_rxconf {
> > > > >>>>     ...
> > > > >>>>     uint16_t rx_split_num; /* number of segments to split */
> > > > >>>>     uint16_t *rx_split_len; /* array of segment lengths */
> > > > >>>>     struct rte_mempool **mp; /* array of segment memory pools */
> > > > >>>
> > > > >>> The pool has the packet length it's been configured for.
> > > > >>> So I think, rx_split_len can be removed.
> > > > >>
> > > > >> Yes, it is one of the supposed options: if the pointer to the array of
> > > > >> segment lengths is NULL, the queue_setup() could use the
> > > > >> lengths from the pool's properties.
> > > > >> But we are talking about packet split; in general, it should
> > > > >> not depend on pool properties. What if the application provides a
> > > > >> single pool and just wants to have the tunnel header in the
> > > > >> first dedicated mbuf?
> > > > >>
> > > > >>>
> > > > >>> This feature is also available in Marvell HW, so it is not specific
> > > > >>> to one vendor.
> > > > >>> Maybe we could just mention the use case in the
> > > > >>> deprecation notice and the tentative change in rte_eth_rxconf,
> > > > >>> and exact details can be worked out at the time of implementation.
> > > > >>>
> > > > >> So, if I understand correctly, the struct changes in the commit
> > > > >> message should be marked as just a possible implementation?
> > > > >
> > > > > Yes.
> > > > >
> > > > > We may need to have a detailed discussion on the correct
> > > > > abstraction for the various HW on which this feature is available.
> > > > >
> > > > > On Marvell HW, we can configure TWO pools for a given eth Rx queue.
> > > > > One pool can be configured as a small packet pool and the other one
> > > > > as a large packet pool.
> > > > > And there is a threshold value to decide the pool between small
> > > > > and large.
> > > > > For example:
> > > > > - The small pool is configured with 2k
> > > > > - The large pool is configured with 10k
> > > > > - And the threshold value is configured as 2k.
> > > > > Any packet of size <=2K will land in the small pool and others in the large pool.
> > > > > The use case we are targeting is to save memory space for
> > > > > jumbo frames.
> > > >
> > > > Out of curiosity, do you provide two different buffer addresses in
> > > > the descriptor so that HW automatically uses one based on the size, or
> > > > does the driver use one of the pools based on the configuration and the
> > > > largest possible packet size?
> > >
> > > I am all for allowing more configuration of the buffer pool.
> > > But I don't want that to be exposed as a hardware-specific requirement
> > > in the API for applications. The worst case would be if your API changes
> > > required:
> > >
> > >   if (strcmp(dev->driver_name, "marvell") == 0) {
> > >      // make another mempool for this driver
> > >
> > I thought about adding some other segment attributes, vendor-specific ones.
> > We could describe the segments with some descriptor structure (size,
> > pool) and add a flags field to it. Proposals from other vendors are
> > welcome.
> >
> 
> Please, no snowflake APIs ("our driver is special")...
> 
> Think of how it can fit into a general model.
> Also, just because your hardware has a special feature does not mean the
> DPDK has to support it!

Sure. The initial proposal is just about how to extend the Rx buffers description.
Currently it is only a single pool with a single fixed segment size. Not very flexible so far.

With best regards, Slava


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-06 17:03           ` Slava Ovsiienko
@ 2020-08-06 18:10             ` Stephen Hemminger
  2020-08-07 11:23               ` Slava Ovsiienko
  0 siblings, 1 reply; 24+ messages in thread
From: Stephen Hemminger @ 2020-08-06 18:10 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: Ferruh Yigit, Jerin Jacob, dpdk-dev, Matan Azrad,
	Raslan Darawsheh, Thomas Monjalon, Andrew Rybchenko,
	Ajit Khaparde, Maxime Coquelin, Olivier Matz, David Marchand

On Thu, 6 Aug 2020 17:03:31 +0000
Slava Ovsiienko <viacheslavo@mellanox.com> wrote:

> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Thursday, August 6, 2020 19:26
> > To: Ferruh Yigit <ferruh.yigit@intel.com>
> > Cc: Jerin Jacob <jerinjacobk@gmail.com>; Slava Ovsiienko
> > <viacheslavo@mellanox.com>; dpdk-dev <dev@dpdk.org>; Matan Azrad
> > <matan@mellanox.com>; Raslan Darawsheh <rasland@mellanox.com>;
> > Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
> > <arybchenko@solarflare.com>; Ajit Khaparde
> > <ajit.khaparde@broadcom.com>; Maxime Coquelin
> > <maxime.coquelin@redhat.com>; Olivier Matz <olivier.matz@6wind.com>;
> > David Marchand <david.marchand@redhat.com>
> > Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> > 
> > On Thu, 6 Aug 2020 16:58:22 +0100
> > Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >   
> > > On 8/4/2020 2:32 PM, Jerin Jacob wrote:  
> > > > On Mon, Aug 3, 2020 at 6:36 PM Slava Ovsiienko <viacheslavo@mellanox.com> wrote:
> > > >>
> > > >> Hi, Jerin,
> > > >>
> > > >> Thanks for the comment,  please, see below.
> > > >>  
> > > >>> -----Original Message-----
> > > >>> From: Jerin Jacob <jerinjacobk@gmail.com>
> > > >>> Sent: Monday, August 3, 2020 14:57
> > > >>> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > > >>> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
> > > >>> Raslan Darawsheh <rasland@mellanox.com>; Thomas Monjalon
> > > >>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > > >>> Stephen Hemminger <stephen@networkplumber.org>; Andrew Rybchenko
> > > >>> <arybchenko@solarflare.com>; Ajit Khaparde
> > > >>> <ajit.khaparde@broadcom.com>; Maxime Coquelin
> > > >>> <maxime.coquelin@redhat.com>; Olivier Matz
> > > >>> <olivier.matz@6wind.com>; David Marchand
> > > >>> <david.marchand@redhat.com>
> > > >>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf
> > > >>> structure
> > > >>>
> > > >>> On Mon, Aug 3, 2020 at 4:28 PM Viacheslav Ovsiienko
> > > >>> <viacheslavo@mellanox.com> wrote:  
> > > >>>>
> > > >>>> The DPDK datapath in the transmit direction is very flexible.
> > > >>>> The applications can build multisegment packets and manage
> > > >>>> almost all data aspects: the memory pools where segments are
> > > >>>> allocated from, the segment lengths, the memory attributes like
> > > >>>> external, registered, etc.
> > > >>>>
> > > >>>> In the receiving direction, the datapath is much less flexible;
> > > >>>> the applications can only specify the memory pool to configure
> > > >>>> the receiving queue and nothing more. In order to extend the
> > > >>>> receiving datapath capabilities it is proposed to add new
> > > >>>> fields to the rte_eth_rxconf structure:
> > > >>>>
> > > >>>> struct rte_eth_rxconf {
> > > >>>>     ...
> > > >>>>     uint16_t rx_split_num; /* number of segments to split */
> > > >>>>     uint16_t *rx_split_len; /* array of segment lengths */
> > > >>>>     struct rte_mempool **mp; /* array of segment memory pools */  
> > > >>>
> > > >>> The pool has the packet length it's been configured for.
> > > >>> So I think, rx_split_len can be removed.  
> > > >>
> > > >> Yes, it is one of the supposed options: if the pointer to the array of
> > > >> segment lengths is NULL, the queue_setup() could use the lengths from
> > > >> the pool's properties.
> > > >> But we are talking about packet split; in general, it should not
> > > >> depend on pool properties. What if the application provides a single
> > > >> pool and just wants to have the tunnel header in the first dedicated
> > > >> mbuf?
> > > >>  
> > > >>>
> > > >>> This feature is also available in Marvell HW, so it is not specific to one
> > > >>> vendor.
> > > >>> Maybe we could just mention the use case in the
> > > >>> deprecation notice and the tentative change in rte_eth_rxconf, and
> > > >>> exact details can be worked out at the time of implementation.
> > > >>>  
> > > >> So, if I understand correctly, the struct changes in the commit
> > > >> message should be marked as just a possible implementation?
> > > >
> > > > Yes.
> > > >
> > > > We may need to have a detailed discussion on the correct abstraction
> > > > for the various HW on which this feature is available.
> > > >
> > > > On Marvell HW, we can configure TWO pools for a given eth Rx queue.
> > > > One pool can be configured as a small packet pool and the other one as
> > > > a large packet pool.
> > > > And there is a threshold value to decide the pool between small and
> > > > large.
> > > > For example:
> > > > - The small pool is configured with 2k
> > > > - The large pool is configured with 10k
> > > > - And the threshold value is configured as 2k.
> > > > Any packet of size <=2K will land in the small pool and others in the large pool.
> > > > The use case we are targeting is to save memory space for jumbo
> > > > frames.
> > >
> > > Out of curiosity, do you provide two different buffer addresses in the
> > > descriptor so that HW automatically uses one based on the size, or does the
> > > driver use one of the pools based on the configuration and the largest
> > > possible packet size?
> > 
> > I am all for allowing more configuration of the buffer pool.
> > But I don't want that to be exposed as a hardware-specific requirement in the
> > API for applications. The worst case would be if your API changes required:
> > 
> >   if (strcmp(dev->driver_name, "marvell") == 0) {
> >      // make another mempool for this driver
> >   
> I thought about adding some other segment attributes, vendor-specific ones.
> We could describe the segments with some descriptor structure (size, pool)
> and add a flags field to it. Proposals from other vendors are welcome.
> 

Please, no snowflake APIs ("our driver is special")...

Think of how it can fit into a general model.
Also, just because your hardware has a special feature does not mean
the DPDK has to support it!
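One way to test whether the Marvell scheme quoted above fits a general model is to write it as a size-based choice between two pools. The sketch below is a plain-C model with hypothetical names (`struct two_pool_cfg`, `pick_pool()`); it takes 2k as 2048 and 10k as 10240 bytes, and since whether the hardware or the driver performs the choice is exactly Ferruh's open question, it makes no claim about either.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a size-threshold pool pair. */
struct two_pool_cfg {
    uint32_t small_buf;  /* data size of each small-pool buffer */
    uint32_t large_buf;  /* data size of each large-pool buffer */
    uint32_t threshold;  /* packets <= threshold use the small pool */
};

enum pool_choice { POOL_SMALL, POOL_LARGE, POOL_NONE };

static enum pool_choice
pick_pool(const struct two_pool_cfg *cfg, uint32_t pkt_len)
{
    /* A consistent config never sends more than small_buf bytes small. */
    assert(cfg->threshold <= cfg->small_buf);
    if (pkt_len <= cfg->threshold)
        return POOL_SMALL;
    if (pkt_len <= cfg->large_buf)
        return POOL_LARGE;
    return POOL_NONE;    /* fits neither buffer in this model */
}
```

With the 2k/10k/2k example, small packets never consume a 10k buffer, which is where the memory saving for jumbo-frame setups comes from.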

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-06 16:25         ` Stephen Hemminger
  2020-08-06 16:41           ` Jerin Jacob
@ 2020-08-06 17:03           ` Slava Ovsiienko
  2020-08-06 18:10             ` Stephen Hemminger
  1 sibling, 1 reply; 24+ messages in thread
From: Slava Ovsiienko @ 2020-08-06 17:03 UTC (permalink / raw)
  To: Stephen Hemminger, Ferruh Yigit
  Cc: Jerin Jacob, dpdk-dev, Matan Azrad, Raslan Darawsheh,
	Thomas Monjalon, Andrew Rybchenko, Ajit Khaparde,
	Maxime Coquelin, Olivier Matz, David Marchand

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, August 6, 2020 19:26
> To: Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: Jerin Jacob <jerinjacobk@gmail.com>; Slava Ovsiienko
> <viacheslavo@mellanox.com>; dpdk-dev <dev@dpdk.org>; Matan Azrad
> <matan@mellanox.com>; Raslan Darawsheh <rasland@mellanox.com>;
> Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Olivier Matz <olivier.matz@6wind.com>;
> David Marchand <david.marchand@redhat.com>
> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> 
> On Thu, 6 Aug 2020 16:58:22 +0100
> Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> 
> > On 8/4/2020 2:32 PM, Jerin Jacob wrote:
> > > On Mon, Aug 3, 2020 at 6:36 PM Slava Ovsiienko <viacheslavo@mellanox.com> wrote:
> > >>
> > >> Hi, Jerin,
> > >>
> > >> Thanks for the comment,  please, see below.
> > >>
> > >>> -----Original Message-----
> > >>> From: Jerin Jacob <jerinjacobk@gmail.com>
> > >>> Sent: Monday, August 3, 2020 14:57
> > >>> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > >>> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
> > >>> Raslan Darawsheh <rasland@mellanox.com>; Thomas Monjalon
> > >>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > >>> Stephen Hemminger <stephen@networkplumber.org>; Andrew Rybchenko
> > >>> <arybchenko@solarflare.com>; Ajit Khaparde
> > >>> <ajit.khaparde@broadcom.com>; Maxime Coquelin
> > >>> <maxime.coquelin@redhat.com>; Olivier Matz
> > >>> <olivier.matz@6wind.com>; David Marchand
> > >>> <david.marchand@redhat.com>
> > >>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf
> > >>> structure
> > >>>
> > >>> On Mon, Aug 3, 2020 at 4:28 PM Viacheslav Ovsiienko
> > >>> <viacheslavo@mellanox.com> wrote:
> > >>>>
> > >>>> The DPDK datapath in the transmit direction is very flexible.
> > >>>> The applications can build multisegment packets and manage
> > >>>> almost all data aspects: the memory pools where segments are
> > >>>> allocated from, the segment lengths, the memory attributes like
> > >>>> external, registered, etc.
> > >>>>
> > >>>> In the receiving direction, the datapath is much less flexible;
> > >>>> the applications can only specify the memory pool to configure
> > >>>> the receiving queue and nothing more. In order to extend the
> > >>>> receiving datapath capabilities it is proposed to add new
> > >>>> fields to the rte_eth_rxconf structure:
> > >>>>
> > >>>> struct rte_eth_rxconf {
> > >>>>     ...
> > >>>>     uint16_t rx_split_num; /* number of segments to split */
> > >>>>     uint16_t *rx_split_len; /* array of segment lengths */
> > >>>>     struct rte_mempool **mp; /* array of segment memory pools */
> > >>>
> > >>> The pool has the packet length it's been configured for.
> > >>> So I think, rx_split_len can be removed.
> > >>
> > >> Yes, it is one of the supposed options: if the pointer to the array of
> > >> segment lengths is NULL, the queue_setup() could use the lengths from
> > >> the pool's properties.
> > >> But we are talking about packet split; in general, it should not
> > >> depend on pool properties. What if the application provides a single
> > >> pool and just wants to have the tunnel header in the first dedicated
> > >> mbuf?
> > >>
> > >>>
> > >>> This feature is also available in Marvell HW, so it is not specific to one
> > >>> vendor.
> > >>> Maybe we could just mention the use case in the
> > >>> deprecation notice and the tentative change in rte_eth_rxconf, and
> > >>> exact details can be worked out at the time of implementation.
> > >>>
> > >> So, if I understand correctly, the struct changes in the commit
> > >> message should be marked as just a possible implementation?
> > >
> > > Yes.
> > >
> > > We may need to have a detailed discussion on the correct abstraction
> > > for the various HW on which this feature is available.
> > >
> > > On Marvell HW, we can configure TWO pools for a given eth Rx queue.
> > > One pool can be configured as a small packet pool and the other one as
> > > a large packet pool.
> > > And there is a threshold value to decide the pool between small and
> > > large.
> > > For example:
> > > - The small pool is configured with 2k
> > > - The large pool is configured with 10k
> > > - And the threshold value is configured as 2k.
> > > Any packet of size <=2K will land in the small pool and others in the large pool.
> > > The use case we are targeting is to save memory space for jumbo
> > > frames.
> >
> > Out of curiosity, do you provide two different buffer addresses in the
> > descriptor so that HW automatically uses one based on the size, or does the
> > driver use one of the pools based on the configuration and the largest
> > possible packet size?
> 
> I am all for allowing more configuration of the buffer pool.
> But I don't want that to be exposed as a hardware-specific requirement in the
> API for applications. The worst case would be if your API changes required:
> 
>   if (strcmp(dev->driver_name, "marvell") == 0) {
>      // make another mempool for this driver
> 
I thought about adding some other segment attributes, vendor-specific ones.
We could describe the segments with some descriptor structure (size, pool)
and add a flags field to it. Proposals from other vendors are welcome.
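A descriptor of the kind suggested above could look like the following sketch, where each entry replaces one (`rx_split_len[i]`, `mp[i]`) pair from the original proposal and carries a `flags` word for future or vendor-specific attributes. `struct rx_seg_desc` and both helpers are hypothetical, the mempool is reduced to an opaque pointer so the sketch compiles without DPDK, and bytes past the sum of the lengths are assumed to land in the last segment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-segment descriptor: one entry replaces the pair
 * (rx_split_len[i], mp[i]) and leaves room for flags.
 */
struct rx_seg_desc {
    uint16_t length;   /* bytes placed in this segment */
    void *pool;        /* mempool the segment buffer comes from */
    uint32_t flags;    /* reserved for future/vendor attributes */
};

/* Pack the proposal's parallel arrays into descriptors. */
static void
seg_descs_from_arrays(const uint16_t *len, void *const *pools,
                      uint16_t n, struct rx_seg_desc *out)
{
    for (uint16_t i = 0; i < n; i++) {
        out[i].length = len[i];
        out[i].pool = pools[i];
        out[i].flags = 0;      /* no attributes defined yet */
    }
}

/*
 * Index of the segment holding packet byte 'off'; bytes beyond the
 * sum of the lengths are assumed to go to the last segment.
 */
static uint16_t
seg_index_for_offset(const struct rx_seg_desc *d, uint16_t n, uint32_t off)
{
    uint32_t end = 0;

    assert(n > 0);
    for (uint16_t i = 0; i < n - 1; i++) {
        end += d[i].length;
        if (off < end)
            return i;
    }
    return n - 1;
}
```

Keeping length, pool and flags in one entry means a new per-segment attribute can be added without introducing yet another parallel array.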


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-06 16:43           ` Ferruh Yigit
@ 2020-08-06 16:48             ` Slava Ovsiienko
  0 siblings, 0 replies; 24+ messages in thread
From: Slava Ovsiienko @ 2020-08-06 16:48 UTC (permalink / raw)
  To: Ferruh Yigit, Andrew Rybchenko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Thomas Monjalon, jerinjacobk,
	stephen, ajit.khaparde, maxime.coquelin, olivier.matz,
	david.marchand

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Thursday, August 6, 2020 19:43
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
> jerinjacobk@gmail.com; stephen@networkplumber.org;
> ajit.khaparde@broadcom.com; maxime.coquelin@redhat.com;
> olivier.matz@6wind.com; david.marchand@redhat.com
> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> 
> On 8/6/2020 5:39 PM, Slava Ovsiienko wrote:
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Sent: Thursday, August 6, 2020 19:37
> >> To: Slava Ovsiienko <viacheslavo@mellanox.com>; Andrew Rybchenko
> >> <arybchenko@solarflare.com>; dev@dpdk.org
> >> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> >> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
> >> jerinjacobk@gmail.com; stephen@networkplumber.org;
> >> ajit.khaparde@broadcom.com; maxime.coquelin@redhat.com;
> >> olivier.matz@6wind.com; david.marchand@redhat.com
> >> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> >>
> >> On 8/6/2020 5:29 PM, Slava Ovsiienko wrote:
> >>>> -----Original Message-----
> >>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >>>> Sent: Thursday, August 6, 2020 19:16
> >>>> To: Andrew Rybchenko <arybchenko@solarflare.com>; Slava Ovsiienko
> >>>> <viacheslavo@mellanox.com>; dev@dpdk.org
> >>>> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> >>>> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
> >>>> jerinjacobk@gmail.com; stephen@networkplumber.org;
> >>>> ajit.khaparde@broadcom.com; maxime.coquelin@redhat.com;
> >>>> olivier.matz@6wind.com; david.marchand@redhat.com
> >>>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf
> >>>> structure
> >>>>
> >>>> On 8/3/2020 3:31 PM, Andrew Rybchenko wrote:
> >>>>> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
> >>>>>> The DPDK datapath in the transmit direction is very flexible.
> >>>>>> The applications can build multisegment packets and manage
> >>>>>> almost all data aspects: the memory pools where segments are
> >>>>>> allocated from, the segment lengths, the memory attributes like
> >>>>>> external, registered, etc.
> >>>>>>
> >>>>>> In the receiving direction, the datapath is much less flexible;
> >>>>>> the applications can only specify the memory pool to configure
> >>>>>> the receiving queue and nothing more. In order to extend the
> >>>>>> receiving datapath capabilities it is proposed to add new
> >>>>>> fields to the rte_eth_rxconf structure:
> >>>>>>
> >>>>>> struct rte_eth_rxconf {
> >>>>>>     ...
> >>>>>>     uint16_t rx_split_num; /* number of segments to split */
> >>>>>>     uint16_t *rx_split_len; /* array of segment lengths */
> >>>>>>     struct rte_mempool **mp; /* array of segment memory pools */
> >>>>>>     ...
> >>>>>> };
> >>>>>>
> >>>>>> The non-zero value of rx_split_num field configures the receiving
> >>>>>> queue to split ingress packets into multiple segments to the
> >>>>>> mbufs allocated from various memory pools according to the
> >>>>>> specified lengths. The zero value of rx_split_num field provides
> >>>>>> the backward compatibility and queue should be configured in a
> >>>>>> regular way (with single/multiple mbufs of the same data buffer
> >>>>>> length allocated from the single memory pool).
> >>>>>
> >>>>> From the above description it is not 100% clear how it will coexist with:
> >>>>>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
> >>>>
> >>>> +1
> >>> - It is supposed to be NULL if the arrays of lengths/pools are used.
> >>>
> >>>>
> >>>>>  - DEV_RX_OFFLOAD_SCATTER
> >>>>>  - DEV_RX_OFFLOAD_HEADER_SPLIT
> >>>>> How will application know that the feature is supported? Limitations?
> >>>>
> >>>> +1
> >>> A new flag, DEV_RX_OFFLOAD_BUFFER_SPLIT, is supposed to be introduced.
> >>> The feature requires DEV_RX_OFFLOAD_SCATTER to be set.
> >>> If DEV_RX_OFFLOAD_HEADER_SPLIT is set, an error is returned.
> >>>
> >>>>
> >>>>> Is it always split by specified/fixed length?
> >>>>> What happens if header length is actually different?
> >>>>
> >>>> As far as I understand, the intention is to filter specific packets to a
> >>>> queue first and do the split later, so the header length will be fixed...
> >>>
> >>> Not exactly. The filtering should be handled by the rte_flow engine.
> >>> The intention is to provide a more flexible way to describe Rx
> >>> buffers. Currently it is a single pool with fixed-size segments.
> >>> There is no way to split the packet into multiple segments with specified
> >>> lengths and in the specified pools. What if the packet payload should be
> >>> stored in the physical memory of another device (GPU/storage)? What if
> >>> caching is not desired for the payload (a pure forwarding
> >>> application)? We could provide a special NC (non-cacheable) pool.
> >>> What if the packet should be split into chunks with specific gaps?
> >>> In the Tx direction we can gather a packet from
> >>> various pools in any desired combination, but Rx is much less flexible.
> >>>
> >>>>>
> >>>>>> The new approach would allow splitting the ingress packets into
> >>>>>> multiple parts pushed to the memory with different attributes.
> >>>>>> For example, the packet headers can be pushed to the embedded
> >>>>>> data buffers within mbufs and the application data into the
> >>>>>> external buffers attached to mbufs allocated from the different memory pools.
> >>>>>> The memory attributes for the split parts may differ as well; for
> >>>>>> example, the application data may be pushed into the external
> >>>>>> memory located on a dedicated physical device, say GPU or NVMe.
> >>>>>> This would improve the DPDK receiving datapath flexibility while
> >>>>>> preserving compatibility with the existing API.
> >>
> >> If you don't know the packet types in advance, how can you use fixed
> >> sizes to split a packet? Won't it be like having random parts of the
> >> packet in each mempool?
> > It is a per-queue configuration. We have the rte_flow engine and can
> > steer the desired packets to the desired queue.
> 
> That is what I was trying to say above: the intention is to first filter the packets to
> a specific queue and then split them into multiple mempools. You said "not
> exactly"; what is the difference I am missing?

Sorry, my bad: I mixed this up with Marvell's queue capability to sort packets into two
pools depending on packet size. Yes, you are completely correct: first filter the specific packets
to the dedicated queue, then split them into chunks of the specified fixed sizes.
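A minimal sketch, assuming the semantics described in the proposal, of how one packet would be divided across segments once the queue splits by fixed lengths. The function name is hypothetical, and the handling of packets longer than the sum of the configured lengths (the remainder keeps using the last length, as plain scatter would) is an assumption for illustration, not part of the proposal or of any DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fill out[] with the number of packet bytes placed in
 * each segment when a queue splits packets by the fixed lengths
 * split_len[0..split_num-1].
 * Assumption: bytes beyond the configured lengths continue in further
 * segments sized like the last entry (scatter from the last pool).
 * Returns the number of segments used (at most out_cap).
 */
size_t split_packet(uint32_t pkt_len, const uint16_t *split_len,
                    uint16_t split_num, uint32_t *out, size_t out_cap)
{
    size_t used = 0;

    assert(split_len != NULL && split_num > 0);
    while (pkt_len > 0 && used < out_cap) {
        /* Past the configured array, reuse the last segment length. */
        size_t idx = used < split_num ? used : (size_t)split_num - 1;
        uint32_t cap = split_len[idx];
        uint32_t chunk = pkt_len < cap ? pkt_len : cap;

        if (chunk == 0) /* zero-length segment: no progress possible */
            break;
        out[used++] = chunk;
        pkt_len -= chunk;
    }
    return used;
}
```

For example, with lengths {50, 2048} (a 50-byte header part for the first pool, payload for the second), a 1500-byte packet yields a 50-byte header segment and a 1450-byte payload segment, while a 40-byte packet fits entirely in the first segment.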
> 
> >
> >>
> >>>>>>
> >>>>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> >>>>>> ---
> >>>>>>  doc/guides/rel_notes/deprecation.rst | 5 +++++
> >>>>>>  1 file changed, 5 insertions(+)
> >>>>>>
> >>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
> >>>>>> b/doc/guides/rel_notes/deprecation.rst
> >>>>>> index ea4cfa7..cd700ae 100644
> >>>>>> --- a/doc/guides/rel_notes/deprecation.rst
> >>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
> >>>>>> @@ -99,6 +99,11 @@ Deprecation Notices
> >>>>>>    In 19.11 PMDs will still update the field even when the offload is not
> >>>>>>    enabled.
> >>>>>>
> >>>>>> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
> >>>>>> +  queues to split ingress packets into multiple segments according to the
> >>>>>> +  specified lengths into the buffers allocated from the specified
> >>>>>> +  memory pools. The backward compatibility to existing API is preserved.
> >>>>>> +
> >>>>>>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
> >>>>>>    will be deprecated in 20.11 and will be removed in 21.11.
> >>>>>>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> >>>>>
> >>>
> >


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-06 16:39         ` Slava Ovsiienko
@ 2020-08-06 16:43           ` Ferruh Yigit
  2020-08-06 16:48             ` Slava Ovsiienko
  0 siblings, 1 reply; 24+ messages in thread
From: Ferruh Yigit @ 2020-08-06 16:43 UTC (permalink / raw)
  To: Slava Ovsiienko, Andrew Rybchenko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Thomas Monjalon, jerinjacobk,
	stephen, ajit.khaparde, maxime.coquelin, olivier.matz,
	david.marchand

On 8/6/2020 5:39 PM, Slava Ovsiienko wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Thursday, August 6, 2020 19:37
>> To: Slava Ovsiienko <viacheslavo@mellanox.com>; Andrew Rybchenko
>> <arybchenko@solarflare.com>; dev@dpdk.org
>> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
>> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
>> jerinjacobk@gmail.com; stephen@networkplumber.org;
>> ajit.khaparde@broadcom.com; maxime.coquelin@redhat.com;
>> olivier.matz@6wind.com; david.marchand@redhat.com
>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
>>
>> On 8/6/2020 5:29 PM, Slava Ovsiienko wrote:
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Sent: Thursday, August 6, 2020 19:16
>>>> To: Andrew Rybchenko <arybchenko@solarflare.com>; Slava Ovsiienko
>>>> <viacheslavo@mellanox.com>; dev@dpdk.org
>>>> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
>>>> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
>>>> jerinjacobk@gmail.com; stephen@networkplumber.org;
>>>> ajit.khaparde@broadcom.com; maxime.coquelin@redhat.com;
>>>> olivier.matz@6wind.com; david.marchand@redhat.com
>>>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
>>>>
>>>> On 8/3/2020 3:31 PM, Andrew Rybchenko wrote:
>>>>> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
>>>>>> The DPDK datapath in the transmit direction is very flexible.
>>>>>> The applications can build multisegment packets and manage almost
>>>>>> all data aspects - the memory pools where segments are allocated
>>>>>> from, the segment lengths, the memory attributes like external,
>>>>>> registered, etc.
>>>>>>
>>>>>> In the receiving direction, the datapath is much less flexible, the
>>>>>> applications can only specify the memory pool to configure the
>>>>>> receiving queue and nothing more. In order to extend the receiving
>>>>>> datapath capabilities it is proposed to add the new fields into
>>>>>> rte_eth_rxconf structure:
>>>>>>
>>>>>> struct rte_eth_rxconf {
>>>>>>     ...
>>>>>>     uint16_t rx_split_num; /* number of segments to split */
>>>>>>     uint16_t *rx_split_len; /* array of segment lengths */
>>>>>>     struct rte_mempool **mp; /* array of segment memory pools */
>>>>>>     ...
>>>>>> };
>>>>>>
>>>>>> The non-zero value of rx_split_num field configures the receiving
>>>>>> queue to split ingress packets into multiple segments to the mbufs
>>>>>> allocated from various memory pools according to the specified
>>>>>> lengths. The zero value of rx_split_num field provides the backward
>>>>>> compatibility and queue should be configured in a regular way (with
>>>>>> single/multiple mbufs of the same data buffer length allocated from
>>>>>> the single memory pool).
>>>>>
>>>>> From the above description it is not 100% clear how it will coexist
>>>>> with:
>>>>>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
>>>>
>>>> +1
>>> - It is supposed to be NULL if the arrays of lengths/pools are used.
>>>
>>>>
>>>>>  - DEV_RX_OFFLOAD_SCATTER
>>>>>  - DEV_RX_OFFLOAD_HEADER_SPLIT
>>>>> How will application know that the feature is supported? Limitations?
>>>>
>>>> +1
>>> A new flag, DEV_RX_OFFLOAD_BUFFER_SPLIT, is supposed to be introduced.
>>> The feature requires DEV_RX_OFFLOAD_SCATTER to be set.
>>> If DEV_RX_OFFLOAD_HEADER_SPLIT is set, an error is returned.
>>>
>>>>
>>>>> Is it always split by specified/fixed length?
>>>>> What happens if header length is actually different?
>>>>
>>>> As far as I understand, the intention is to filter specific packets to a
>>>> queue first and do the split later, so the header length will be fixed...
>>>
>>> Not exactly. The filtering should be handled by the rte_flow engine.
>>> The intention is to provide a more flexible way to describe Rx
>>> buffers. Currently it is a single pool with fixed-size segments. There is no
>>> way to split the packet into multiple segments with specified lengths
>>> and in the specified pools. What if the packet payload should be stored in
>>> the physical memory of another device (GPU/storage)? What if caching is
>>> not desired for the payload (a pure forwarding application)? We could provide
>>> a special NC (non-cacheable) pool.
>>> What if the packet should be split into chunks with specific gaps?
>>> In the Tx direction we can gather a packet from
>>> various pools in any desired combination, but Rx is much less flexible.
>>>
>>>>>
>>>>>> The new approach would allow splitting the ingress packets into
>>>>>> multiple parts pushed to the memory with different attributes.
>>>>>> For example, the packet headers can be pushed to the embedded data
>>>>>> buffers within mbufs and the application data into the external
>>>>>> buffers attached to mbufs allocated from the different memory pools.
>>>>>> The memory attributes for the split parts may differ as well; for
>>>>>> example, the application data may be pushed into the external memory
>>>>>> located on a dedicated physical device, say GPU or NVMe. This
>>>>>> would improve the DPDK receiving datapath flexibility while preserving
>>>>>> compatibility with the existing API.
>>
>> If you don't know the packet types in advance, how can you use fixed sizes to
>> split a packet? Won't it be like having random parts of the packet in each
>> mempool?
> It is a per-queue configuration. We have the rte_flow engine and can steer
> the desired packets to the desired queue.

That is what I was trying to say above: the intention is to first filter the packets
to a specific queue and then split them into multiple mempools. You said "not
exactly"; what is the difference I am missing?

> 
>>
>>>>>>
>>>>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>>>>>> ---
>>>>>>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>>>  1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>>>>> b/doc/guides/rel_notes/deprecation.rst
>>>>>> index ea4cfa7..cd700ae 100644
>>>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>>>> @@ -99,6 +99,11 @@ Deprecation Notices
>>>>>>    In 19.11 PMDs will still update the field even when the offload is not
>>>>>>    enabled.
>>>>>>
>>>>>> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
>>>>>> +  queues to split ingress packets into multiple segments according to the
>>>>>> +  specified lengths into the buffers allocated from the specified
>>>>>> +  memory pools. The backward compatibility to existing API is preserved.
>>>>>> +
>>>>>>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>>>>>>    will be deprecated in 20.11 and will be removed in 21.11.
>>>>>>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
>>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-06 16:25         ` Stephen Hemminger
@ 2020-08-06 16:41           ` Jerin Jacob
  2020-08-06 17:03           ` Slava Ovsiienko
  1 sibling, 0 replies; 24+ messages in thread
From: Jerin Jacob @ 2020-08-06 16:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ferruh Yigit, Slava Ovsiienko, dpdk-dev, Matan Azrad,
	Raslan Darawsheh, Thomas Monjalon, Andrew Rybchenko,
	Ajit Khaparde, Maxime Coquelin, Olivier Matz, David Marchand

On Thu, Aug 6, 2020 at 9:56 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Thu, 6 Aug 2020 16:58:22 +0100
> Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>
> > On 8/4/2020 2:32 PM, Jerin Jacob wrote:
> > > On Mon, Aug 3, 2020 at 6:36 PM Slava Ovsiienko <viacheslavo@mellanox.com> wrote:
> > >>
> > >> Hi, Jerin,
> > >>
> > >> Thanks for the comment, please see below.
> > >>
> > >>> -----Original Message-----
> > >>> From: Jerin Jacob <jerinjacobk@gmail.com>
> > >>> Sent: Monday, August 3, 2020 14:57
> > >>> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > >>> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
> > >>> Raslan Darawsheh <rasland@mellanox.com>; Thomas Monjalon
> > >>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Stephen
> > >>> Hemminger <stephen@networkplumber.org>; Andrew Rybchenko
> > >>> <arybchenko@solarflare.com>; Ajit Khaparde
> > >>> <ajit.khaparde@broadcom.com>; Maxime Coquelin
> > >>> <maxime.coquelin@redhat.com>; Olivier Matz <olivier.matz@6wind.com>;
> > >>> David Marchand <david.marchand@redhat.com>
> > >>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> > >>>
> > >>> On Mon, Aug 3, 2020 at 4:28 PM Viacheslav Ovsiienko
> > >>> <viacheslavo@mellanox.com> wrote:
> > >>>>
> > >>>> The DPDK datapath in the transmit direction is very flexible.
> > >>>> The applications can build multisegment packets and manage almost all
> > >>>> data aspects - the memory pools where segments are allocated from, the
> > >>>> segment lengths, the memory attributes like external, registered, etc.
> > >>>>
> > >>>> In the receiving direction, the datapath is much less flexible, the
> > >>>> applications can only specify the memory pool to configure the
> > >>>> receiving queue and nothing more. In order to extend the receiving
> > >>>> datapath capabilities it is proposed to add the new fields into
> > >>>> rte_eth_rxconf structure:
> > >>>>
> > >>>> struct rte_eth_rxconf {
> > >>>>     ...
> > >>>>     uint16_t rx_split_num; /* number of segments to split */
> > >>>>     uint16_t *rx_split_len; /* array of segment lengths */
> > >>>>     struct rte_mempool **mp; /* array of segment memory pools */
> > >>>
> > >>> The pool has the packet length it's been configured for.
> > >>> So I think, rx_split_len can be removed.
> > >>
> > >> Yes, it is one of the options under consideration: if the pointer to the array of segment lengths
> > >> is NULL, queue_setup() could use the lengths from the pool's properties.
> > >> But we are talking about packet split; in general, it should not depend
> > >> on pool properties. What if the application provides a single pool
> > >> and just wants to have the tunnel header in the first dedicated mbuf?
> > >>
> > >>>
> > >>> This feature is also available in Marvell HW, so it is not specific to one vendor.
> > >>> Maybe we could just mention the use case in the deprecation
> > >>> notice and the tentative change in rte_eth_rxconf, and the exact details can be
> > >>> worked out at the time of implementation.
> > >>>
> > >> So, if I understand correctly, the struct changes in the commit message
> > >> should be marked as just a possible implementation?
> > >
> > > Yes.
> > >
> > > We may need to have a detailed discussion on the correct abstraction for the various
> > > HW that has this feature.
> > >
> > > On Marvell HW, we can configure TWO pools for a given eth Rx queue.
> > > One pool can be configured as a small packet pool and the other one as a
> > > large packet pool.
> > > And there is a threshold value to choose between the small and large pool.
> > > For example:
> > > - The small pool is configured with 2k
> > > - The large pool is configured with 10k
> > > - And the threshold value is configured as 2k.
> > > Any packet of size <= 2K will land in the small pool and others in the large pool.
> > > The use case we are targeting is saving memory space for jumbo frames.
> >
> > Out of curiosity, do you provide two different buffer addresses in the descriptor
> > and the HW automatically uses one based on the size,
> > or does the driver use one of the pools based on the configuration and the possible largest
> > packet size?

The latter.

>
> I am all for allowing more configuration of the buffer pool.
> But I don't want that to be exposed as a hardware-specific requirement in the
> API for applications. The worst case would be if your API changes required:
>
>   if (strcmp(dev->driver_name, "marvell") == 0) {
>      // make another mempool for this driver
>   }

There are no HW-specific requirements here. If one pool is specified (as in
the existing situation), the HW will create a scatter-gather frame.

It is mostly useful for application use cases that need a single
contiguous block of data for processing (like crypto) and/or want to improve
Rx/Tx performance by running in single-segment mode
without wasting too much memory.
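The two-pool scheme described above can be modelled as a per-packet choice between a small and a large pool. This is a minimal sketch of the selection rule and of the buffer space it saves compared with a single jumbo-sized pool, assuming 2k = 2048 and 10k = 10240 bytes. The type and function names are hypothetical, and the sketch says nothing about how the hardware or driver actually implements the choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an Rx queue backed by two pools. */
enum rx_pool { RX_POOL_SMALL, RX_POOL_LARGE };

struct two_pool_cfg {
    uint32_t small_buf_size; /* e.g. 2048 bytes */
    uint32_t large_buf_size; /* e.g. 10240 bytes, for jumbo frames */
    uint32_t threshold;      /* packets <= threshold go to the small pool */
};

/* Pick the pool for a received packet of pkt_len bytes. */
enum rx_pool select_pool(const struct two_pool_cfg *cfg, uint32_t pkt_len)
{
    return pkt_len <= cfg->threshold ? RX_POOL_SMALL : RX_POOL_LARGE;
}

/* Total buffer space consumed by n packets with lengths lens[]. */
uint64_t buffer_bytes(const struct two_pool_cfg *cfg,
                      const uint32_t *lens, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += select_pool(cfg, lens[i]) == RX_POOL_SMALL ?
                 cfg->small_buf_size : cfg->large_buf_size;
    return total;
}
```

For packets of 64, 1500 and 9000 bytes this reserves 2048 + 2048 + 10240 = 14336 bytes of buffer space, versus 3 x 10240 = 30720 bytes if every buffer came from the 10k pool.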


>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-06 16:37       ` Ferruh Yigit
@ 2020-08-06 16:39         ` Slava Ovsiienko
  2020-08-06 16:43           ` Ferruh Yigit
  0 siblings, 1 reply; 24+ messages in thread
From: Slava Ovsiienko @ 2020-08-06 16:39 UTC (permalink / raw)
  To: Ferruh Yigit, Andrew Rybchenko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Thomas Monjalon, jerinjacobk,
	stephen, ajit.khaparde, maxime.coquelin, olivier.matz,
	david.marchand

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Thursday, August 6, 2020 19:37
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
> jerinjacobk@gmail.com; stephen@networkplumber.org;
> ajit.khaparde@broadcom.com; maxime.coquelin@redhat.com;
> olivier.matz@6wind.com; david.marchand@redhat.com
> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> 
> On 8/6/2020 5:29 PM, Slava Ovsiienko wrote:
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Sent: Thursday, August 6, 2020 19:16
> >> To: Andrew Rybchenko <arybchenko@solarflare.com>; Slava Ovsiienko
> >> <viacheslavo@mellanox.com>; dev@dpdk.org
> >> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> >> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
> >> jerinjacobk@gmail.com; stephen@networkplumber.org;
> >> ajit.khaparde@broadcom.com; maxime.coquelin@redhat.com;
> >> olivier.matz@6wind.com; david.marchand@redhat.com
> >> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> >>
> >> On 8/3/2020 3:31 PM, Andrew Rybchenko wrote:
> >>> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
> >>>> The DPDK datapath in the transmit direction is very flexible.
> >>>> The applications can build multisegment packets and manage almost
> >>>> all data aspects - the memory pools where segments are allocated
> >>>> from, the segment lengths, the memory attributes like external,
> >>>> registered, etc.
> >>>>
> >>>> In the receiving direction, the datapath is much less flexible, the
> >>>> applications can only specify the memory pool to configure the
> >>>> receiving queue and nothing more. In order to extend the receiving
> >>>> datapath capabilities it is proposed to add the new fields into
> >>>> rte_eth_rxconf structure:
> >>>>
> >>>> struct rte_eth_rxconf {
> >>>>     ...
> >>>>     uint16_t rx_split_num; /* number of segments to split */
> >>>>     uint16_t *rx_split_len; /* array of segment lengths */
> >>>>     struct rte_mempool **mp; /* array of segment memory pools */
> >>>>     ...
> >>>> };
> >>>>
> >>>> The non-zero value of rx_split_num field configures the receiving
> >>>> queue to split ingress packets into multiple segments to the mbufs
> >>>> allocated from various memory pools according to the specified
> >>>> lengths. The zero value of rx_split_num field provides the backward
> >>>> compatibility and queue should be configured in a regular way (with
> >>>> single/multiple mbufs of the same data buffer length allocated from
> >>>> the single memory pool).
> >>>
> >>> From the above description it is not 100% clear how it will coexist
> >>> with:
> >>>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
> >>
> >> +1
> > - It is supposed to be NULL if the arrays of lengths/pools are used.
> >
> >>
> >>>  - DEV_RX_OFFLOAD_SCATTER
> >>>  - DEV_RX_OFFLOAD_HEADER_SPLIT
> >>> How will application know that the feature is supported? Limitations?
> >>
> >> +1
> > A new flag, DEV_RX_OFFLOAD_BUFFER_SPLIT, is supposed to be introduced.
> > The feature requires DEV_RX_OFFLOAD_SCATTER to be set.
> > If DEV_RX_OFFLOAD_HEADER_SPLIT is set, an error is returned.
> >
> >>
> >>> Is it always split by specified/fixed length?
> >>> What happens if header length is actually different?
> >>
> >> As far as I understand, the intention is to filter specific packets to a
> >> queue first and do the split later, so the header length will be fixed...
> >
> > Not exactly. The filtering should be handled by the rte_flow engine.
> > The intention is to provide a more flexible way to describe Rx
> > buffers. Currently it is a single pool with fixed-size segments. There is no
> > way to split the packet into multiple segments with specified lengths
> > and in the specified pools. What if the packet payload should be stored in
> > the physical memory of another device (GPU/storage)? What if caching is
> > not desired for the payload (a pure forwarding application)? We could provide
> > a special NC (non-cacheable) pool.
> > What if the packet should be split into chunks with specific gaps?
> > In the Tx direction we can gather a packet from
> > various pools in any desired combination, but Rx is much less flexible.
> >
> >>>
> >>>> The new approach would allow splitting the ingress packets into
> >>>> multiple parts pushed to the memory with different attributes.
> >>>> For example, the packet headers can be pushed to the embedded data
> >>>> buffers within mbufs and the application data into the external
> >>>> buffers attached to mbufs allocated from the different memory pools.
> >>>> The memory attributes for the split parts may differ as well; for
> >>>> example, the application data may be pushed into the external memory
> >>>> located on a dedicated physical device, say GPU or NVMe. This
> >>>> would improve the DPDK receiving datapath flexibility while preserving
> >>>> compatibility with the existing API.
> 
> If you don't know the packet types in advance, how can you use fixed sizes to
> split a packet? Won't it be like having random parts of the packet in each
> mempool?
It is a per-queue configuration. We have the rte_flow engine and can steer
the desired packets to the desired queue.
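As a minimal sketch of why a fixed split can work once a flow rule has steered only one kind of traffic to the queue: for VXLAN over IPv4 without VLAN tags or IPv4 options, the outer headers have a constant size that could serve as the first split length. The macro and function names are hypothetical; only the header sizes are standard. The IHL check shows how a packet with IPv4 options would break the fixed-length assumption, which is the "header length is actually different" case raised earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard header sizes in bytes: no VLAN tag, no IPv4 options. */
#define ETH_HDR_LEN   14u
#define IPV4_HDR_LEN  20u /* IHL == 5, i.e. 5 x 32-bit words */
#define UDP_HDR_LEN    8u
#define VXLAN_HDR_LEN  8u

/* Hypothetical helper: outer header length of a VXLAN-over-IPv4 packet,
 * a candidate value for the first split length of the queue. */
uint16_t vxlan_outer_hdr_len(void)
{
    return (uint16_t)(ETH_HDR_LEN + IPV4_HDR_LEN + UDP_HDR_LEN + VXLAN_HDR_LEN);
}

/* IPv4 header length in bytes from the first header byte (version + IHL). */
uint16_t ipv4_hdr_len(uint8_t version_ihl)
{
    return (uint16_t)((version_ihl & 0x0f) * 4u);
}

/* True if the packet matches the fixed split assumption; with IPv4 options
 * present, the first segment would end inside the VXLAN header. */
bool fixed_split_fits(uint8_t version_ihl)
{
    return ipv4_hdr_len(version_ihl) == IPV4_HDR_LEN;
}
```

Here the outer headers add up to 14 + 20 + 8 + 8 = 50 bytes; a packet whose first IPv4 byte is 0x46 (one 4-byte option word) has a 24-byte IPv4 header and would not fit that split.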

> 
> >>>>
> >>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> >>>> ---
> >>>>  doc/guides/rel_notes/deprecation.rst | 5 +++++
> >>>>  1 file changed, 5 insertions(+)
> >>>>
> >>>> diff --git a/doc/guides/rel_notes/deprecation.rst
> >>>> b/doc/guides/rel_notes/deprecation.rst
> >>>> index ea4cfa7..cd700ae 100644
> >>>> --- a/doc/guides/rel_notes/deprecation.rst
> >>>> +++ b/doc/guides/rel_notes/deprecation.rst
> >>>> @@ -99,6 +99,11 @@ Deprecation Notices
> >>>>    In 19.11 PMDs will still update the field even when the offload is not
> >>>>    enabled.
> >>>>
> >>>> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
> >>>> +  queues to split ingress packets into multiple segments according to the
> >>>> +  specified lengths into the buffers allocated from the specified
> >>>> +  memory pools. The backward compatibility to existing API is preserved.
> >>>> +
> >>>>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
> >>>>    will be deprecated in 20.11 and will be removed in 21.11.
> >>>>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> >>>
> >


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-06 16:29     ` Slava Ovsiienko
@ 2020-08-06 16:37       ` Ferruh Yigit
  2020-08-06 16:39         ` Slava Ovsiienko
  0 siblings, 1 reply; 24+ messages in thread
From: Ferruh Yigit @ 2020-08-06 16:37 UTC (permalink / raw)
  To: Slava Ovsiienko, Andrew Rybchenko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Thomas Monjalon, jerinjacobk,
	stephen, ajit.khaparde, maxime.coquelin, olivier.matz,
	david.marchand

On 8/6/2020 5:29 PM, Slava Ovsiienko wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Thursday, August 6, 2020 19:16
>> To: Andrew Rybchenko <arybchenko@solarflare.com>; Slava Ovsiienko
>> <viacheslavo@mellanox.com>; dev@dpdk.org
>> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
>> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
>> jerinjacobk@gmail.com; stephen@networkplumber.org;
>> ajit.khaparde@broadcom.com; maxime.coquelin@redhat.com;
>> olivier.matz@6wind.com; david.marchand@redhat.com
>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
>>
>> On 8/3/2020 3:31 PM, Andrew Rybchenko wrote:
>>> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
>>>> The DPDK datapath in the transmit direction is very flexible.
>>>> The applications can build multisegment packets and manage almost
>>>> all data aspects - the memory pools where segments are allocated
>>>> from, the segment lengths, the memory attributes like external,
>>>> registered, etc.
>>>>
>>>> In the receiving direction, the datapath is much less flexible, the
>>>> applications can only specify the memory pool to configure the
>>>> receiving queue and nothing more. In order to extend the receiving
>>>> datapath capabilities it is proposed to add the new fields into
>>>> rte_eth_rxconf structure:
>>>>
>>>> struct rte_eth_rxconf {
>>>>     ...
>>>>     uint16_t rx_split_num; /* number of segments to split */
>>>>     uint16_t *rx_split_len; /* array of segment lengths */
>>>>     struct rte_mempool **mp; /* array of segment memory pools */
>>>>     ...
>>>> };
>>>>
>>>> The non-zero value of rx_split_num field configures the receiving
>>>> queue to split ingress packets into multiple segments to the mbufs
>>>> allocated from various memory pools according to the specified
>>>> lengths. The zero value of rx_split_num field provides the backward
>>>> compatibility and queue should be configured in a regular way (with
>>>> single/multiple mbufs of the same data buffer length allocated from
>>>> the single memory pool).
>>>
>>> From the above description it is not 100% clear how it will coexist
>>> with:
>>>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
>>
>> +1
> - It is supposed to be NULL if the arrays of lengths/pools are used.
> 
>>
>>>  - DEV_RX_OFFLOAD_SCATTER
>>>  - DEV_RX_OFFLOAD_HEADER_SPLIT
>>> How will application know that the feature is supported? Limitations?
>>
>> +1
> A new flag, DEV_RX_OFFLOAD_BUFFER_SPLIT, is supposed to be introduced.
> The feature requires DEV_RX_OFFLOAD_SCATTER to be set.
> If DEV_RX_OFFLOAD_HEADER_SPLIT is set, an error is returned.
> 
>>
>>> Is it always split by specified/fixed length?
>>> What happens if header length is actually different?
>>
>> As far as I understand, the intention is to filter specific packets to a queue first
>> and do the split later, so the header length will be fixed...
> 
> Not exactly. The filtering should be handled by the rte_flow engine.
> The intention is to provide a more flexible way to describe
> Rx buffers. Currently it is a single pool with fixed-size segments. There is no way to
> split the packet into multiple segments with specified lengths and in
> the specified pools. What if the packet payload should be stored in the physical
> memory of another device (GPU/storage)? What if caching is not desired for
> the payload (a pure forwarding application)? We could provide a special NC (non-cacheable) pool.
> What if the packet should be split into chunks with specific gaps?
> In the Tx direction we can gather a packet from various
> pools in any desired combination, but Rx is much less flexible.
>  
>>>
>>>> The new approach would allow splitting the ingress packets into
>>>> multiple parts pushed to the memory with different attributes.
>>>> For example, the packet headers can be pushed to the embedded data
>>>> buffers within mbufs and the application data into the external
>>>> buffers attached to mbufs allocated from the different memory pools.
>>>> The memory attributes for the split parts may differ as well; for
>>>> example, the application data may be pushed into the external memory
>>>> located on a dedicated physical device, say GPU or NVMe. This would
>>>> improve the DPDK receiving datapath flexibility while preserving
>>>> compatibility with the existing API.

If you don't know the packet types in advance, how can you use fixed sizes to
split a packet? Won't it be like having random parts of the packet in each mempool?

>>>>
>>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>>>> ---
>>>>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>>> b/doc/guides/rel_notes/deprecation.rst
>>>> index ea4cfa7..cd700ae 100644
>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>> @@ -99,6 +99,11 @@ Deprecation Notices
>>>>    In 19.11 PMDs will still update the field even when the offload is not
>>>>    enabled.
>>>>
>>>> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
>>>> +  queues to split ingress packets into multiple segments according to the
>>>> +  specified lengths into the buffers allocated from the specified
>>>> +  memory pools. The backward compatibility to existing API is preserved.
>>>> +
>>>>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>>>>    will be deprecated in 20.11 and will be removed in 21.11.
>>>>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
>>>
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-06 16:15   ` [dpdk-dev] " Ferruh Yigit
@ 2020-08-06 16:29     ` Slava Ovsiienko
  2020-08-06 16:37       ` Ferruh Yigit
  0 siblings, 1 reply; 24+ messages in thread
From: Slava Ovsiienko @ 2020-08-06 16:29 UTC (permalink / raw)
  To: Ferruh Yigit, Andrew Rybchenko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Thomas Monjalon, jerinjacobk,
	stephen, ajit.khaparde, maxime.coquelin, olivier.matz,
	david.marchand

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Thursday, August 6, 2020 19:16
> To: Andrew Rybchenko <arybchenko@solarflare.com>; Slava Ovsiienko
> <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>;
> jerinjacobk@gmail.com; stephen@networkplumber.org;
> ajit.khaparde@broadcom.com; maxime.coquelin@redhat.com;
> olivier.matz@6wind.com; david.marchand@redhat.com
> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> 
> On 8/3/2020 3:31 PM, Andrew Rybchenko wrote:
> > On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
> >> The DPDK datapath in the transmit direction is very flexible.
> >> The applications can build multisegment packets and manage almost
> >> all data aspects - the memory pools where segments are allocated
> >> from, the segment lengths, the memory attributes like external,
> >> registered, etc.
> >>
> >> In the receiving direction, the datapath is much less flexible, the
> >> applications can only specify the memory pool to configure the
> >> receiving queue and nothing more. In order to extend the receiving
> >> datapath capabilities it is proposed to add the new fields into
> >> rte_eth_rxconf structure:
> >>
> >> struct rte_eth_rxconf {
> >>     ...
> >>     uint16_t rx_split_num; /* number of segments to split */
> >>     uint16_t *rx_split_len; /* array of segment lengths */
> >>     struct rte_mempool **mp; /* array of segment memory pools */
> >>     ...
> >> };
> >>
> >> The non-zero value of rx_split_num field configures the receiving
> >> queue to split ingress packets into multiple segments to the mbufs
> >> allocated from various memory pools according to the specified
> >> lengths. The zero value of rx_split_num field provides the backward
> >> compatibility and queue should be configured in a regular way (with
> >> single/multiple mbufs of the same data buffer length allocated from
> >> the single memory pool).
> >
> > From the above description it is not 100% clear how it will coexist
> > with:
> >  - existing mb_pool argument of the rte_eth_rx_queue_setup()
> 
> +1
- It is supposed to be NULL if the array of lengths/pools is used.

> 
> >  - DEV_RX_OFFLOAD_SCATTER
> >  - DEV_RX_OFFLOAD_HEADER_SPLIT
> > How will application know that the feature is supported? Limitations?
> 
> +1
A new flag, DEV_RX_OFFLOAD_BUFFER_SPLIT, is supposed to be introduced.
The feature requires DEV_RX_OFFLOAD_SCATTER to be set.
If DEV_RX_OFFLOAD_HEADER_SPLIT is set, an error is returned.
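The rules listed above can be expressed as a small validation helper. This is a minimal sketch under stated assumptions, not DPDK code: the sketch_* types stand in for the proposed rte_eth_rxconf fields, the flag bit values are illustrative only (they are not the real DEV_RX_OFFLOAD_* values), and two checks are assumptions not stated in the thread: the legacy path still needs a non-NULL mb_pool, and a non-zero rx_split_num must come with the new BUFFER_SPLIT flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values only; NOT the real DPDK flag definitions. */
#define SKETCH_RX_OFFLOAD_HEADER_SPLIT (1ULL << 0)
#define SKETCH_RX_OFFLOAD_SCATTER      (1ULL << 1)
#define SKETCH_RX_OFFLOAD_BUFFER_SPLIT (1ULL << 2)

/* Hypothetical stand-in for struct rte_mempool. */
struct sketch_mempool {
	const char *name;
};

/* Hypothetical subset of the proposed rte_eth_rxconf fields. */
struct sketch_rxconf {
	uint64_t offloads;
	uint16_t rx_split_num;        /* 0 selects the legacy single-pool setup */
	const uint16_t *rx_split_len; /* rx_split_num lengths (may be NULL) */
	struct sketch_mempool **mp;   /* rx_split_num pools */
};

/* Returns 0 when the queue setup arguments follow the rules above,
 * -1 (standing in for -EINVAL) otherwise. */
static int
sketch_check_rx_split(const struct sketch_mempool *mb_pool,
		      const struct sketch_rxconf *conf)
{
	uint16_t i;

	assert(conf != NULL);
	if (conf->rx_split_num == 0)
		return mb_pool != NULL ? 0 : -1; /* legacy: one pool argument */
	if (mb_pool != NULL || conf->mp == NULL)
		return -1; /* mb_pool must be NULL when the arrays are used */
	for (i = 0; i < conf->rx_split_num; i++)
		if (conf->mp[i] == NULL)
			return -1; /* every segment needs a pool */
	if (!(conf->offloads & SKETCH_RX_OFFLOAD_BUFFER_SPLIT))
		return -1; /* assumption: the new flag must be requested */
	if (!(conf->offloads & SKETCH_RX_OFFLOAD_SCATTER))
		return -1; /* buffer split requires scatter */
	if (conf->offloads & SKETCH_RX_OFFLOAD_HEADER_SPLIT)
		return -1; /* header split together with buffer split is rejected */
	return 0;
}
```

Returning a single error code keeps the sketch short; a real queue setup routine would more likely return -EINVAL and log which rule was violated.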

> 
> > Is it always split by specified/fixed length?
> > What happens if header length is actually different?
> 
> As far as I understand, the intention is to filter specific packets to a queue first
> and do the split later, so the header length will be fixed...

Not exactly. The filtering should be handled by the rte_flow engine.
The intention is to provide a more flexible way to describe
Rx buffers. Currently it is a single pool with fixed-size segments. There is no way to
split the packet into multiple segments with specified lengths and in
the specified pools. What if the packet payload should be stored in the physical
memory of another device (GPU/storage)? What if caching is not desired for
the payload (a pure forwarding application)? We could provide a special NC (non-cacheable) pool.
What if the packet should be split into chunks with specific gaps?
For the Tx direction we have the opportunity to gather a packet from various
pools in any desired combination, but Rx is much less flexible.
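To make the "split by specified lengths into specified pools" idea concrete, here is a minimal copy-based sketch. It assumes that segment i receives up to split_len[i] bytes into a buffer from pool i, falls back to each pool's buffer size when split_len is NULL (the option discussed with Jerin), and reports a frame that does not fit the configured segments as an error. The sketch_* names are hypothetical; real hardware writes into the posted buffers directly instead of copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the data buffer of one mbuf taken from a pool. */
struct sketch_seg {
	uint8_t data[256];
	uint16_t data_len;
};

/*
 * Split one received frame of pkt_len bytes into at most split_num
 * segments. Segment i receives up to split_len[i] bytes; when split_len is
 * NULL, the per-pool buffer size buf_size[i] is used instead. Returns the
 * number of segments filled, or -1 if the frame does not fit.
 */
static int
sketch_rx_split(const uint8_t *pkt, size_t pkt_len,
		const uint16_t *split_len, const uint16_t *buf_size,
		uint16_t split_num, struct sketch_seg *segs)
{
	size_t off = 0;
	uint16_t i;

	assert(buf_size != NULL && segs != NULL);
	for (i = 0; i < split_num && off < pkt_len; i++) {
		size_t len = split_len != NULL ? split_len[i] : buf_size[i];

		if (len > buf_size[i] || len > sizeof(segs[i].data))
			return -1; /* requested length exceeds the buffer */
		if (len > pkt_len - off)
			len = pkt_len - off; /* last, partially filled segment */
		memcpy(segs[i].data, pkt + off, len);
		segs[i].data_len = (uint16_t)len;
		off += len;
	}
	return off == pkt_len ? (int)i : -1;
}
```

In this sketch, split lengths {14, 200} place the first 14 bytes of a 100-byte frame (for example an Ethernet header) in the first buffer and the remaining 86 bytes in the second. The split is purely by offset, so if the actual header length differs, the boundary does not move; that is the fixed-length behaviour questioned earlier in the thread.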
 
> >
> >> The new approach would allow splitting the ingress packets into
> >> multiple parts pushed to the memory with different attributes.
> >> For example, the packet headers can be pushed to the embedded data
> >> buffers within mbufs and the application data into the external
> >> buffers attached to mbufs allocated from the different memory pools.
> >> The memory attributes for the split parts may differ as well; for
> >> example the application data may be pushed into the external memory
> >> located on the dedicated physical device, say GPU or NVMe. This would
> >> improve the DPDK receiving datapath flexibility preserving
> >> compatibility with existing API.
> >>
> >> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> >> index ea4cfa7..cd700ae 100644
> >> --- a/doc/guides/rel_notes/deprecation.rst
> >> +++ b/doc/guides/rel_notes/deprecation.rst
> >> @@ -99,6 +99,11 @@ Deprecation Notices
> >>    In 19.11 PMDs will still update the field even when the offload is not
> >>    enabled.
> >>
> >> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
> >> +  queues to split ingress packets into multiple segments according to the
> >> +  specified lengths into the buffers allocated from the specified
> >> +  memory pools. The backward compatibility to existing API is preserved.
> >> +
> >>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
> >>    will be deprecated in 20.11 and will be removed in 21.11.
> >>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> >



* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-06 15:58       ` Ferruh Yigit
@ 2020-08-06 16:25         ` Stephen Hemminger
  2020-08-06 16:41           ` Jerin Jacob
  2020-08-06 17:03           ` Slava Ovsiienko
  0 siblings, 2 replies; 24+ messages in thread
From: Stephen Hemminger @ 2020-08-06 16:25 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Jerin Jacob, Slava Ovsiienko, dpdk-dev, Matan Azrad,
	Raslan Darawsheh, Thomas Monjalon, Andrew Rybchenko,
	Ajit Khaparde, Maxime Coquelin, Olivier Matz, David Marchand

On Thu, 6 Aug 2020 16:58:22 +0100
Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 8/4/2020 2:32 PM, Jerin Jacob wrote:
> > On Mon, Aug 3, 2020 at 6:36 PM Slava Ovsiienko <viacheslavo@mellanox.com> wrote:  
> >>
> >> Hi, Jerin,
> >>
> >> Thanks for the comment,  please, see below.
> >>  
> >>> -----Original Message-----
> >>> From: Jerin Jacob <jerinjacobk@gmail.com>
> >>> Sent: Monday, August 3, 2020 14:57
> >>> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> >>> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
> >>> Raslan Darawsheh <rasland@mellanox.com>; Thomas Monjalon
> >>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Stephen
> >>> Hemminger <stephen@networkplumber.org>; Andrew Rybchenko
> >>> <arybchenko@solarflare.com>; Ajit Khaparde
> >>> <ajit.khaparde@broadcom.com>; Maxime Coquelin
> >>> <maxime.coquelin@redhat.com>; Olivier Matz <olivier.matz@6wind.com>;
> >>> David Marchand <david.marchand@redhat.com>
> >>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> >>>
> >>> On Mon, Aug 3, 2020 at 4:28 PM Viacheslav Ovsiienko
> >>> <viacheslavo@mellanox.com> wrote:  
> >>>>
> >>>> The DPDK datapath in the transmit direction is very flexible.
> >>>> The applications can build multisegment packets and manage almost all
> >>>> data aspects - the memory pools where segments are allocated from, the
> >>>> segment lengths, the memory attributes like external, registered, etc.
> >>>>
> >>>> In the receiving direction, the datapath is much less flexible, the
> >>>> applications can only specify the memory pool to configure the
> >>>> receiving queue and nothing more. In order to extend the receiving
> >>>> datapath capabilities it is proposed to add the new fields into
> >>>> rte_eth_rxconf structure:
> >>>>
> >>>> struct rte_eth_rxconf {
> >>>>     ...
> >>>>     uint16_t rx_split_num; /* number of segments to split */
> >>>>     uint16_t *rx_split_len; /* array of segment lengths */
> >>>>     struct rte_mempool **mp; /* array of segment memory pools */  
> >>>
> >>> The pool has the packet length it's been configured for.
> >>> So I think, rx_split_len can be removed.  
> >>
> >> Yes, it is one of the supposed options - if the pointer to the array of segment lengths
> >> is NULL, the queue_setup() could use the lengths from the pool's properties.
> >> But we are talking about packet split; in general, it should not depend
> >> on pool properties. What if the application provides a single pool
> >> and just wants to have the tunnel header in the first dedicated mbuf?
> >>  
> >>>
> >>> This feature is also available in Marvell HW, so it is not specific to one vendor.
> >>> Maybe we could just mention the use case in the deprecation
> >>> notice along with the tentative change in rte_eth_rxconf, and the exact details can be
> >>> worked out at the time of implementation.
> >>>  
> >> So, if I understand correctly, the struct changes in the commit message
> >> should be marked as just a possible implementation?
> > 
> > Yes.
> > 
> > We may need to have a detailed discussion on the correct abstraction for the various
> > HW that is available with this feature.
> > 
> > On Marvell HW, we can configure TWO pools for a given eth Rx queue.
> > One pool can be configured as a small packet pool and the other one as a
> > large packet pool.
> > There is also a threshold value that decides between the small and large pool.
> > For example:
> > - The small pool is configured with 2k
> > - The large pool is configured with 10k
> > - The threshold value is configured as 2k.
> > Any packet of size <= 2K will land in the small pool and others in the large pool.
> > The use case we are targeting is to save memory space for jumbo frames.
> 
> Out of curiosity, do you provide two different buffer addresses in the descriptor
> and the HW automatically uses one based on the size,
> or does the driver use one of the pools based on the configuration and the largest
> possible packet size?

I am all for allowing more configuration of buffer pools.
But I don't want that to be exposed as a hardware-specific requirement in the
API for applications. The worst case would be if your API changes required:

  if (strcmp(dev->driver_name, "marvell") == 0) {
     // make another mempool for this driver
  }




* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-03 14:31 ` [dpdk-dev] ***Spam*** " Andrew Rybchenko
@ 2020-08-06 16:15   ` Ferruh Yigit
  2020-08-06 16:29     ` Slava Ovsiienko
  0 siblings, 1 reply; 24+ messages in thread
From: Ferruh Yigit @ 2020-08-06 16:15 UTC (permalink / raw)
  To: Andrew Rybchenko, Viacheslav Ovsiienko, dev
  Cc: matan, rasland, thomas, jerinjacobk, stephen, ajit.khaparde,
	maxime.coquelin, olivier.matz, david.marchand

On 8/3/2020 3:31 PM, Andrew Rybchenko wrote:
> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
>> The DPDK datapath in the transmit direction is very flexible.
>> The applications can build multisegment packets and manage
>> almost all data aspects - the memory pools where segments
>> are allocated from, the segment lengths, the memory attributes
>> like external, registered, etc.
>>
>> In the receiving direction, the datapath is much less flexible,
>> the applications can only specify the memory pool to configure
>> the receiving queue and nothing more. In order to extend the
>> receiving datapath capabilities it is proposed to add the new
>> fields into rte_eth_rxconf structure:
>>
>> struct rte_eth_rxconf {
>>     ...
>>     uint16_t rx_split_num; /* number of segments to split */
>>     uint16_t *rx_split_len; /* array of segment lengths */
>>     struct rte_mempool **mp; /* array of segment memory pools */
>>     ...
>> };
>>
>> The non-zero value of rx_split_num field configures the receiving
>> queue to split ingress packets into multiple segments to the mbufs
>> allocated from various memory pools according to the specified
>> lengths. The zero value of rx_split_num field provides the
>> backward compatibility and queue should be configured in a regular
>> way (with single/multiple mbufs of the same data buffer length
>> allocated from the single memory pool).
> 
> From the above description it is not 100% clear how it will
> coexist with:
>  - existing mb_pool argument of the rte_eth_rx_queue_setup()

+1

>  - DEV_RX_OFFLOAD_SCATTER
>  - DEV_RX_OFFLOAD_HEADER_SPLIT
> How will application know that the feature is supported? Limitations?

+1

> Is it always split by specified/fixed length?
> What happens if header length is actually different?

As far as I understand, the intention is to filter specific packets to a queue first
and do the split later, so the header length will be fixed...

> 
>> The new approach would allow splitting the ingress packets into
>> multiple parts pushed to the memory with different attributes.
>> For example, the packet headers can be pushed to the embedded data
>> buffers within mbufs and the application data into the external
>> buffers attached to mbufs allocated from the different memory
>> pools. The memory attributes for the split parts may differ
>> as well; for example the application data may be pushed into
>> the external memory located on the dedicated physical device,
>> say GPU or NVMe. This would improve the DPDK receiving datapath
>> flexibility preserving compatibility with existing API.
>>
>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>> index ea4cfa7..cd700ae 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -99,6 +99,11 @@ Deprecation Notices
>>    In 19.11 PMDs will still update the field even when the offload is not
>>    enabled.
>>  
>> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
>> +  queues to split ingress packets into multiple segments according to the
>> +  specified lengths into the buffers allocated from the specified
>> +  memory pools. The backward compatibility to existing API is preserved.
>> +
>>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>>    will be deprecated in 20.11 and will be removed in 21.11.
>>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> 



* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-04 13:32     ` Jerin Jacob
  2020-08-05  6:35       ` Slava Ovsiienko
@ 2020-08-06 15:58       ` Ferruh Yigit
  2020-08-06 16:25         ` Stephen Hemminger
  1 sibling, 1 reply; 24+ messages in thread
From: Ferruh Yigit @ 2020-08-06 15:58 UTC (permalink / raw)
  To: Jerin Jacob, Slava Ovsiienko
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Thomas Monjalon,
	Stephen Hemminger, Andrew Rybchenko, Ajit Khaparde,
	Maxime Coquelin, Olivier Matz, David Marchand

On 8/4/2020 2:32 PM, Jerin Jacob wrote:
> On Mon, Aug 3, 2020 at 6:36 PM Slava Ovsiienko <viacheslavo@mellanox.com> wrote:
>>
>> Hi, Jerin,
>>
>> Thanks for the comment,  please, see below.
>>
>>> -----Original Message-----
>>> From: Jerin Jacob <jerinjacobk@gmail.com>
>>> Sent: Monday, August 3, 2020 14:57
>>> To: Slava Ovsiienko <viacheslavo@mellanox.com>
>>> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
>>> Raslan Darawsheh <rasland@mellanox.com>; Thomas Monjalon
>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Stephen
>>> Hemminger <stephen@networkplumber.org>; Andrew Rybchenko
>>> <arybchenko@solarflare.com>; Ajit Khaparde
>>> <ajit.khaparde@broadcom.com>; Maxime Coquelin
>>> <maxime.coquelin@redhat.com>; Olivier Matz <olivier.matz@6wind.com>;
>>> David Marchand <david.marchand@redhat.com>
>>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
>>>
>>> On Mon, Aug 3, 2020 at 4:28 PM Viacheslav Ovsiienko
>>> <viacheslavo@mellanox.com> wrote:
>>>>
>>>> The DPDK datapath in the transmit direction is very flexible.
>>>> The applications can build multisegment packets and manage almost all
>>>> data aspects - the memory pools where segments are allocated from, the
>>>> segment lengths, the memory attributes like external, registered, etc.
>>>>
>>>> In the receiving direction, the datapath is much less flexible, the
>>>> applications can only specify the memory pool to configure the
>>>> receiving queue and nothing more. In order to extend the receiving
>>>> datapath capabilities it is proposed to add the new fields into
>>>> rte_eth_rxconf structure:
>>>>
>>>> struct rte_eth_rxconf {
>>>>     ...
>>>>     uint16_t rx_split_num; /* number of segments to split */
>>>>     uint16_t *rx_split_len; /* array of segment lengths */
>>>>     struct rte_mempool **mp; /* array of segment memory pools */
>>>
>>> The pool has the packet length it's been configured for.
>>> So I think, rx_split_len can be removed.
>>
>> Yes, it is one of the supposed options - if the pointer to the array of segment lengths
>> is NULL, the queue_setup() could use the lengths from the pool's properties.
>> But we are talking about packet split; in general, it should not depend
>> on pool properties. What if the application provides a single pool
>> and just wants to have the tunnel header in the first dedicated mbuf?
>>
>>>
>>> This feature is also available in Marvell HW, so it is not specific to one vendor.
>>> Maybe we could just mention the use case in the deprecation
>>> notice along with the tentative change in rte_eth_rxconf, and the exact details can be
>>> worked out at the time of implementation.
>>>
>> So, if I understand correctly, the struct changes in the commit message
>> should be marked as just a possible implementation?
> 
> Yes.
> 
> We may need to have a detailed discussion on the correct abstraction for the various
> HW that is available with this feature.
> 
> On Marvell HW, we can configure TWO pools for a given eth Rx queue.
> One pool can be configured as a small packet pool and the other one as a
> large packet pool.
> There is also a threshold value that decides between the small and large pool.
> For example:
> - The small pool is configured with 2k
> - The large pool is configured with 10k
> - The threshold value is configured as 2k.
> Any packet of size <= 2K will land in the small pool and others in the large pool.
> The use case we are targeting is to save memory space for jumbo frames.

Out of curiosity, do you provide two different buffer addresses in the descriptor
and the HW automatically uses one based on the size,
or does the driver use one of the pools based on the configuration and the largest
possible packet size?

> 
> If you can share the MLX HW working model, then we can find the
> correct abstraction.
> 
>>
>> With best regards,
>> Slava
>>
>>> With the above change:
>>> Acked-by: Jerin Jacob <jerinj@marvell.com>
>>>
>>>
>>>>     ...
>>>> };
>>>>
>>>> The non-zero value of rx_split_num field configures the receiving
>>>> queue to split ingress packets into multiple segments to the mbufs
>>>> allocated from various memory pools according to the specified
>>>> lengths. The zero value of rx_split_num field provides the backward
>>>> compatibility and queue should be configured in a regular way (with
>>>> single/multiple mbufs of the same data buffer length allocated from
>>>> the single memory pool).
>>>>
>>>> The new approach would allow splitting the ingress packets into
>>>> multiple parts pushed to the memory with different attributes.
>>>> For example, the packet headers can be pushed to the embedded data
>>>> buffers within mbufs and the application data into the external
>>>> buffers attached to mbufs allocated from the different memory pools.
>>>> The memory attributes for the split parts may differ as well; for
>>>> example the application data may be pushed into the external memory
>>>> located on the dedicated physical device, say GPU or NVMe. This would
>>>> improve the DPDK receiving datapath flexibility preserving
>>>> compatibility with existing API.
>>>>
>>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>>>> ---
>>>>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>>>> index ea4cfa7..cd700ae 100644
>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>> @@ -99,6 +99,11 @@ Deprecation Notices
>>>>    In 19.11 PMDs will still update the field even when the offload is not
>>>>    enabled.
>>>>
>>>> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
>>>> +  queues to split ingress packets into multiple segments according to the
>>>> +  specified lengths into the buffers allocated from the specified
>>>> +  memory pools. The backward compatibility to existing API is preserved.
>>>> +
>>>>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>>>>    will be deprecated in 20.11 and will be removed in 21.11.
>>>>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
>>>> --
>>>> 1.8.3.1
>>>>



* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-04 13:32     ` Jerin Jacob
@ 2020-08-05  6:35       ` Slava Ovsiienko
  2020-08-06 15:58       ` Ferruh Yigit
  1 sibling, 0 replies; 24+ messages in thread
From: Slava Ovsiienko @ 2020-08-05  6:35 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Thomas Monjalon,
	Ferruh Yigit, Stephen Hemminger, Andrew Rybchenko, Ajit Khaparde,
	Maxime Coquelin, Olivier Matz, David Marchand

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, August 4, 2020 16:33
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
> Raslan Darawsheh <rasland@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Stephen
> Hemminger <stephen@networkplumber.org>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Olivier Matz <olivier.matz@6wind.com>;
> David Marchand <david.marchand@redhat.com>
> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> 
> On Mon, Aug 3, 2020 at 6:36 PM Slava Ovsiienko
> <viacheslavo@mellanox.com> wrote:
> >
> > Hi, Jerin,
> >
> > Thanks for the comment,  please, see below.
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, August 3, 2020 14:57
> > > To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > > Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
> > > Raslan Darawsheh <rasland@mellanox.com>; Thomas Monjalon
> > > <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > > Stephen Hemminger <stephen@networkplumber.org>; Andrew
> Rybchenko
> > > <arybchenko@solarflare.com>; Ajit Khaparde
> > > <ajit.khaparde@broadcom.com>; Maxime Coquelin
> > > <maxime.coquelin@redhat.com>; Olivier Matz
> <olivier.matz@6wind.com>;
> > > David Marchand <david.marchand@redhat.com>
> > > Subject: Re: [PATCH] doc: announce changes to ethdev rxconf
> > > structure
> > >
> > > On Mon, Aug 3, 2020 at 4:28 PM Viacheslav Ovsiienko
> > > <viacheslavo@mellanox.com> wrote:
> > > >
> > > > The DPDK datapath in the transmit direction is very flexible.
> > > > The applications can build multisegment packets and manage almost
> > > > all data aspects - the memory pools where segments are allocated
> > > > from, the segment lengths, the memory attributes like external,
> > > > registered, etc.
> > > >
> > > > In the receiving direction, the datapath is much less flexible,
> > > > the applications can only specify the memory pool to configure the
> > > > receiving queue and nothing more. In order to extend the receiving
> > > > datapath capabilities it is proposed to add the new fields into
> > > > rte_eth_rxconf structure:
> > > >
> > > > struct rte_eth_rxconf {
> > > >     ...
> > > >     uint16_t rx_split_num; /* number of segments to split */
> > > >     uint16_t *rx_split_len; /* array of segment lengths */
> > > >     struct rte_mempool **mp; /* array of segment memory pools */
> > >
> > > The pool has the packet length it's been configured for.
> > > So I think, rx_split_len can be removed.
> >
> > Yes, it is one of the supposed options - if the pointer to the array of
> > segment lengths is NULL, the queue_setup() could use the lengths from the
> > pool's properties.
> > But we are talking about packet split; in general, it should not
> > depend on pool properties. What if the application provides a single
> > pool and just wants to have the tunnel header in the first dedicated mbuf?
> >
> > >
> > > This feature is also available in Marvell HW, so it is not specific to one
> > > vendor.
> > > Maybe we could just mention the use case in the
> > > deprecation notice along with the tentative change in rte_eth_rxconf, and
> > > the exact details can be worked out at the time of implementation.
> > >
> > So, if I understand correctly, the struct changes in the commit
> > message should be marked as just a possible implementation?
> 
> Yes.
> 
> We may need to have a detailed discussion on the correct abstraction for
> the various HW that is available with this feature.
> 
> On Marvell HW, we can configure TWO pools for a given eth Rx queue.
> One pool can be configured as a small packet pool and the other one as a large
> packet pool.
> There is also a threshold value that decides between the small and large pool.
> For example:
> - The small pool is configured with 2k
> - The large pool is configured with 10k
> - The threshold value is configured as 2k.
> Any packet of size <= 2K will land in the small pool and others in the large pool.
> The use case we are targeting is to save memory space for jumbo
> frames.

It is a little bit different from the split feature; it is rather a kind of smart packet sorting.
"Buffer split" is just about a more flexible description of Rx buffers. Currently
the Rx buffers can only be a chain of buffers of the same size,
allocated from the same memory pool. It is a simple but not versatile way,
and we could extend it.
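The current single-pool model can be illustrated with some arithmetic: every buffer in the chain has the same data room, so a frame needs ceil(pkt_len / buf_len) buffers and the tail of the last buffer stays empty. A minimal sketch with hypothetical names, assuming a non-empty frame:

```c
#include <assert.h>
#include <stdint.h>

/* Result of placing one frame into a chain of equal-sized buffers. */
struct sketch_chain {
	uint32_t nb_segs;      /* buffers taken from the single pool */
	uint32_t unused_bytes; /* data room left empty in the last buffer */
};

/* Current model: every buffer has the same data room buf_len and comes
 * from the same pool, so the chain length is a ceiling division. */
static struct sketch_chain
sketch_legacy_chain(uint32_t pkt_len, uint16_t buf_len)
{
	struct sketch_chain c;

	assert(pkt_len > 0 && buf_len > 0);
	c.nb_segs = (pkt_len + buf_len - 1) / buf_len;
	c.unused_bytes = c.nb_segs * buf_len - pkt_len;
	return c;
}
```

Assuming 2048-byte buffers, a 9000-byte frame takes 5 buffers and leaves 1240 bytes unused, while a 64-byte frame still occupies a whole buffer (1984 bytes unused). Per-segment lengths and pools, as proposed for buffer split, are one way to describe the buffers more precisely.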

Of course, there is no objection to sharing this split Rx buffer description
with other features, but the example above (2k/10k) would use it only in part,
and would require some other parameters (the threshold) not used by split. Yes, let's
discuss.

> 
> If you can share the MLX HW working model, then we can find the correct
> abstraction.
From the MLNX HW point of view, the buffer split feature does not require anything special.
The HW Rx buffer descriptors support flexible buffer formats; there is
no problem specifying a chain of mbufs with different sizes and dedicated
pointers for the hardware to receive the packet and split it into.


With best regards,
Slava
> >
> > > With the above change:
> > > Acked-by: Jerin Jacob <jerinj@marvell.com>
> > >
> > >
> > > >     ...
> > > > };
> > > >
> > > > The non-zero value of rx_split_num field configures the receiving
> > > > queue to split ingress packets into multiple segments to the mbufs
> > > > allocated from various memory pools according to the specified
> > > > lengths. The zero value of rx_split_num field provides the
> > > > backward compatibility and queue should be configured in a regular
> > > > way (with single/multiple mbufs of the same data buffer length
> > > > allocated from the single memory pool).
> > > >
> > > > The new approach would allow splitting the ingress packets into
> > > > multiple parts pushed to the memory with different attributes.
> > > > For example, the packet headers can be pushed to the embedded data
> > > > buffers within mbufs and the application data into the external
> > > > buffers attached to mbufs allocated from the different memory pools.
> > > > The memory attributes for the split parts may differ as well; for
> > > > example the application data may be pushed into the external
> > > > memory located on the dedicated physical device, say GPU or NVMe.
> > > > This would improve the DPDK receiving datapath flexibility
> > > > preserving compatibility with existing API.
> > > >
> > > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > > ---
> > > >  doc/guides/rel_notes/deprecation.rst | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > > > index ea4cfa7..cd700ae 100644
> > > > --- a/doc/guides/rel_notes/deprecation.rst
> > > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > > @@ -99,6 +99,11 @@ Deprecation Notices
> > > >    In 19.11 PMDs will still update the field even when the offload is not
> > > >    enabled.
> > > >
> > > > +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
> > > > +  queues to split ingress packets into multiple segments according to the
> > > > +  specified lengths into the buffers allocated from the specified
> > > > +  memory pools. The backward compatibility to existing API is preserved.
> > > > +
> > > >  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
> > > >    will be deprecated in 20.11 and will be removed in 21.11.
> > > >    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> > > > --
> > > > 1.8.3.1
> > > >


* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-03 13:06   ` Slava Ovsiienko
@ 2020-08-04 13:32     ` Jerin Jacob
  2020-08-05  6:35       ` Slava Ovsiienko
  2020-08-06 15:58       ` Ferruh Yigit
  0 siblings, 2 replies; 24+ messages in thread
From: Jerin Jacob @ 2020-08-04 13:32 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Thomas Monjalon,
	Ferruh Yigit, Stephen Hemminger, Andrew Rybchenko, Ajit Khaparde,
	Maxime Coquelin, Olivier Matz, David Marchand

On Mon, Aug 3, 2020 at 6:36 PM Slava Ovsiienko <viacheslavo@mellanox.com> wrote:
>
> Hi, Jerin,
>
> Thanks for the comment,  please, see below.
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, August 3, 2020 14:57
> > To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
> > Raslan Darawsheh <rasland@mellanox.com>; Thomas Monjalon
> > <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Stephen
> > Hemminger <stephen@networkplumber.org>; Andrew Rybchenko
> > <arybchenko@solarflare.com>; Ajit Khaparde
> > <ajit.khaparde@broadcom.com>; Maxime Coquelin
> > <maxime.coquelin@redhat.com>; Olivier Matz <olivier.matz@6wind.com>;
> > David Marchand <david.marchand@redhat.com>
> > Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> >
> > On Mon, Aug 3, 2020 at 4:28 PM Viacheslav Ovsiienko
> > <viacheslavo@mellanox.com> wrote:
> > >
> > > The DPDK datapath in the transmit direction is very flexible.
> > > The applications can build multisegment packets and manage almost all
> > > data aspects - the memory pools where segments are allocated from, the
> > > segment lengths, the memory attributes like external, registered, etc.
> > >
> > > In the receiving direction, the datapath is much less flexible, the
> > > applications can only specify the memory pool to configure the
> > > receiving queue and nothing more. In order to extend the receiving
> > > datapath capabilities it is proposed to add the new fields into
> > > rte_eth_rxconf structure:
> > >
> > > struct rte_eth_rxconf {
> > >     ...
> > >     uint16_t rx_split_num; /* number of segments to split */
> > >     uint16_t *rx_split_len; /* array of segment lengths */
> > >     struct rte_mempool **mp; /* array of segment memory pools */
> >
> > The pool has the packet length it's been configured for.
> > So I think, rx_split_len can be removed.
>
> Yes, it is one of the supposed options - if the pointer to the array of segment lengths
> is NULL, the queue_setup() could use the lengths from the pool's properties.
> But we are talking about packet split; in general, it should not depend
> on pool properties. What if the application provides a single pool
> and just wants to have the tunnel header in the first dedicated mbuf?
>
> >
> > This feature is also available in Marvell HW, so it is not specific to one vendor.
> > Maybe we could just mention the use case in the deprecation
> > notice along with the tentative change in rte_eth_rxconf, and the exact details can be
> > worked out at the time of implementation.
> >
> So, if I understand correctly, the struct changes in the commit message
> should be marked as just a possible implementation?

Yes.

We may need to have a detailed discussion on the correct abstraction for the
various HW that supports this feature.

On Marvell HW, we can configure two pools for a given ethdev Rx queue.
One pool can be configured as a small packet pool and the other one as a
large packet pool.
A threshold value decides whether a packet goes to the small or the large pool.
For example:
- The small pool is configured with a 2K buffer size.
- The large pool is configured with a 10K buffer size.
- The threshold value is configured as 2K.
Any packet of size <= 2K will land in the small pool and others in the large pool.
The use case we are targeting is saving memory space for jumbo frames.
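A minimal C sketch of the two-pool selection described above. It assumes that a packet of exactly the threshold size goes to the small pool, following the "<= 2K" rule. The struct, enum and function names are hypothetical and are not part of the DPDK or Marvell PMD API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an Rx queue backed by a small and a large pool. */
struct two_pool_rxq {
	uint32_t small_buf_size; /* e.g. 2K buffers for small packets */
	uint32_t large_buf_size; /* e.g. 10K buffers for jumbo frames */
	uint32_t threshold;      /* packets <= threshold use the small pool */
};

enum rx_pool_sel { RX_POOL_SMALL, RX_POOL_LARGE };

/* Pick the pool that a received packet of pkt_len bytes would land in. */
static enum rx_pool_sel
select_rx_pool(const struct two_pool_rxq *q, uint32_t pkt_len)
{
	/* Packets up to the threshold must fit into one small buffer. */
	assert(q->threshold <= q->small_buf_size);
	return pkt_len <= q->threshold ? RX_POOL_SMALL : RX_POOL_LARGE;
}

/* Buffer bytes consumed by one packet, to show the memory saving. */
static uint32_t
rx_buf_bytes(const struct two_pool_rxq *q, uint32_t pkt_len)
{
	return select_rx_pool(q, pkt_len) == RX_POOL_SMALL ?
		q->small_buf_size : q->large_buf_size;
}
```

Consider a queue with a single 10K pool. Every 64-byte packet would occupy a 10K buffer. With the threshold in place, the same packet occupies only a 2K buffer.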

If you can share the MLX HW working model, then we can find the
correct abstraction.

>
> With best regards,
> Slava
>
> > With the above change:
> > Acked-by: Jerin Jacob <jerinj@marvell.com>
> >
> >
> > >     ...
> > > };
> > >
> > > A non-zero value of the rx_split_num field configures the receiving
> > > queue to split ingress packets into multiple segments, placed in mbufs
> > > allocated from the various memory pools according to the specified
> > > lengths. A zero value of rx_split_num preserves backward
> > > compatibility: the queue is configured in the regular way (with
> > > single or multiple mbufs of the same data buffer length allocated from
> > > a single memory pool).
> > >
> > > The new approach would allow splitting ingress packets into
> > > multiple parts placed in memory with different attributes.
> > > For example, the packet headers can be pushed to the embedded data
> > > buffers within mbufs and the application data into external
> > > buffers attached to mbufs allocated from different memory pools.
> > > The memory attributes of the split parts may differ as well; for
> > > example, the application data may be pushed into external memory
> > > located on a dedicated physical device, such as a GPU or NVMe. This
> > > would improve the flexibility of the DPDK receiving datapath while
> > > preserving compatibility with the existing API.
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > ---
> > >  doc/guides/rel_notes/deprecation.rst | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > > index ea4cfa7..cd700ae 100644
> > > --- a/doc/guides/rel_notes/deprecation.rst
> > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > @@ -99,6 +99,11 @@ Deprecation Notices
> > >    In 19.11 PMDs will still update the field even when the offload is not
> > >    enabled.
> > >
> > > +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
> > > +  queues to split ingress packets into multiple segments according to the
> > > +  specified lengths into the buffers allocated from the specified
> > > +  memory pools. The backward compatibility to existing API is preserved.
> > > +
> > >  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
> > >    will be deprecated in 20.11 and will be removed in 21.11.
> > >    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> > > --
> > > 1.8.3.1
> > >

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-03 11:56 ` Jerin Jacob
@ 2020-08-03 13:06   ` Slava Ovsiienko
  2020-08-04 13:32     ` Jerin Jacob
  0 siblings, 1 reply; 24+ messages in thread
From: Slava Ovsiienko @ 2020-08-03 13:06 UTC
  To: Jerin Jacob
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Thomas Monjalon,
	Ferruh Yigit, Stephen Hemminger, Andrew Rybchenko, Ajit Khaparde,
	Maxime Coquelin, Olivier Matz, David Marchand

Hi, Jerin,

Thanks for the comment, please see below.

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, August 3, 2020 14:57
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
> Raslan Darawsheh <rasland@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Stephen
> Hemminger <stephen@networkplumber.org>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Olivier Matz <olivier.matz@6wind.com>;
> David Marchand <david.marchand@redhat.com>
> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
> 
> On Mon, Aug 3, 2020 at 4:28 PM Viacheslav Ovsiienko
> <viacheslavo@mellanox.com> wrote:
> >
> > The DPDK datapath in the transmit direction is very flexible.
> > Applications can build multi-segment packets and manage almost all
> > data aspects: the memory pools where segments are allocated from, the
> > segment lengths, the memory attributes like external, registered, etc.
> >
> > In the receiving direction, the datapath is much less flexible:
> > applications can only specify the memory pool to configure the
> > receiving queue and nothing more. To extend the receiving
> > datapath capabilities, it is proposed to add new fields to the
> > rte_eth_rxconf structure:
> >
> > struct rte_eth_rxconf {
> >     ...
> >     uint16_t rx_split_num; /* number of segments to split */
> >     uint16_t *rx_split_len; /* array of segment lengths */
> >     struct rte_mempool **mp; /* array of segment memory pools */
> 
> The pool has the packet length it's been configured for.
> So I think rx_split_len can be removed.

Yes, it is one of the considered options: if the pointer to the array of segment
lengths is NULL, queue_setup() could use the lengths from the pools' properties.
But we are talking about packet split, which in general should not depend
on pool properties. What if the application provides a single pool
and just wants to have the tunnel header in the first dedicated mbuf?
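Here is a small C sketch of the fallback discussed above. An explicit length array takes precedence. A NULL array falls back to each pool's buffer size. With a single pool, only the explicit form can place a short outer header into the first mbuf. For that, assume a VXLAN tunnel: Ethernet 14 + IPv4 20 + UDP 8 + VXLAN 8 = 50 bytes. The pool type and function are hypothetical. A real PMD would derive the buffer size from the mempool instead, e.g. rte_pktmbuf_data_room_size() minus the headroom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mempool: only the buffer size matters here. */
struct fake_pool {
	uint16_t data_room; /* usable data bytes per buffer */
};

/*
 * Effective length of segment i: an explicit rx_split_len[] entry wins;
 * if the array is NULL, fall back to the buffer size of pool mp[i].
 */
static uint16_t
rx_split_seg_len(const uint16_t *rx_split_len,
		 struct fake_pool *const *mp, uint16_t i)
{
	if (rx_split_len != NULL)
		return rx_split_len[i];
	assert(mp != NULL && mp[i] != NULL);
	return mp[i]->data_room;
}
```

Take a single 2048-byte pool used for both segments. The explicit lengths {50, 2048} give a 50-byte first segment. The NULL fallback gives 2048, so the header would not be isolated.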

> 
> This feature is also available in Marvell HW, so it is not specific to one vendor.
> Maybe we could just mention the use case and the tentative change in
> rte_eth_rxconf in the deprecation notice, and the exact details can be
> worked out at the time of implementation.
> 
So, if I understand correctly, the struct changes in the commit message
should be marked as just a possible implementation?

With best regards,
Slava

> With the above change:
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> 
> 
> >     ...
> > };
> >
> > A non-zero value of the rx_split_num field configures the receiving
> > queue to split ingress packets into multiple segments, placed in mbufs
> > allocated from the various memory pools according to the specified
> > lengths. A zero value of rx_split_num preserves backward
> > compatibility: the queue is configured in the regular way (with
> > single or multiple mbufs of the same data buffer length allocated from
> > a single memory pool).
> >
> > The new approach would allow splitting ingress packets into
> > multiple parts placed in memory with different attributes.
> > For example, the packet headers can be pushed to the embedded data
> > buffers within mbufs and the application data into external
> > buffers attached to mbufs allocated from different memory pools.
> > The memory attributes of the split parts may differ as well; for
> > example, the application data may be pushed into external memory
> > located on a dedicated physical device, such as a GPU or NVMe. This
> > would improve the flexibility of the DPDK receiving datapath while
> > preserving compatibility with the existing API.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > index ea4cfa7..cd700ae 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -99,6 +99,11 @@ Deprecation Notices
> >    In 19.11 PMDs will still update the field even when the offload is not
> >    enabled.
> >
> > +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
> > +  queues to split ingress packets into multiple segments according to the
> > +  specified lengths into the buffers allocated from the specified
> > +  memory pools. The backward compatibility to existing API is preserved.
> > +
> >  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
> >    will be deprecated in 20.11 and will be removed in 21.11.
> >    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> > --
> > 1.8.3.1
> >

* Re: [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
  2020-08-03 10:58 Viacheslav Ovsiienko
@ 2020-08-03 11:56 ` Jerin Jacob
  2020-08-03 13:06   ` Slava Ovsiienko
  2020-08-03 14:31 ` [dpdk-dev] ***Spam*** " Andrew Rybchenko
  1 sibling, 1 reply; 24+ messages in thread
From: Jerin Jacob @ 2020-08-03 11:56 UTC
  To: Viacheslav Ovsiienko
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Thomas Monjalon,
	Ferruh Yigit, Stephen Hemminger, Andrew Rybchenko, Ajit Khaparde,
	Maxime Coquelin, Olivier Matz, David Marchand

On Mon, Aug 3, 2020 at 4:28 PM Viacheslav Ovsiienko
<viacheslavo@mellanox.com> wrote:
>
> The DPDK datapath in the transmit direction is very flexible.
> Applications can build multi-segment packets and manage
> almost all data aspects: the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external, registered, etc.
>
> In the receiving direction, the datapath is much less flexible:
> applications can only specify the memory pool to configure
> the receiving queue and nothing more. To extend the
> receiving datapath capabilities, it is proposed to add new
> fields to the rte_eth_rxconf structure:
>
> struct rte_eth_rxconf {
>     ...
>     uint16_t rx_split_num; /* number of segments to split */
>     uint16_t *rx_split_len; /* array of segment lengths */
>     struct rte_mempool **mp; /* array of segment memory pools */

The pool has the packet length it's been configured for.
So I think rx_split_len can be removed.

This feature is also available in Marvell HW, so it is not specific to one vendor.
Maybe we could just mention the use case and the tentative change in
rte_eth_rxconf in the deprecation notice, and the exact details can be worked
out at the time of implementation.

With the above change:
Acked-by: Jerin Jacob <jerinj@marvell.com>


>     ...
> };
>
> A non-zero value of the rx_split_num field configures the receiving
> queue to split ingress packets into multiple segments, placed in mbufs
> allocated from the various memory pools according to the specified
> lengths. A zero value of rx_split_num preserves backward
> compatibility: the queue is configured in the regular way (with
> single or multiple mbufs of the same data buffer length allocated
> from a single memory pool).
>
> The new approach would allow splitting ingress packets into
> multiple parts placed in memory with different attributes.
> For example, the packet headers can be pushed to the embedded data
> buffers within mbufs and the application data into external
> buffers attached to mbufs allocated from different memory
> pools. The memory attributes of the split parts may differ as
> well; for example, the application data may be pushed into
> external memory located on a dedicated physical device,
> such as a GPU or NVMe. This would improve the flexibility of the
> DPDK receiving datapath while preserving compatibility with the
> existing API.
>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index ea4cfa7..cd700ae 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -99,6 +99,11 @@ Deprecation Notices
>    In 19.11 PMDs will still update the field even when the offload is not
>    enabled.
>
> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
> +  queues to split ingress packets into multiple segments according to the
> +  specified lengths into the buffers allocated from the specified
> +  memory pools. The backward compatibility to existing API is preserved.
> +
>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>    will be deprecated in 20.11 and will be removed in 21.11.
>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> --
> 1.8.3.1
>

* [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure
@ 2020-08-03 10:58 Viacheslav Ovsiienko
  2020-08-03 11:56 ` Jerin Jacob
  2020-08-03 14:31 ` [dpdk-dev] ***Spam*** " Andrew Rybchenko
  0 siblings, 2 replies; 24+ messages in thread
From: Viacheslav Ovsiienko @ 2020-08-03 10:58 UTC
  To: dev
  Cc: matan, rasland, thomas, ferruh.yigit, jerinjacobk, stephen,
	arybchenko, ajit.khaparde, maxime.coquelin, olivier.matz,
	david.marchand

The DPDK datapath in the transmit direction is very flexible.
Applications can build multi-segment packets and manage
almost all data aspects: the memory pools where segments
are allocated from, the segment lengths, the memory attributes
like external, registered, etc.

In the receiving direction, the datapath is much less flexible:
applications can only specify the memory pool to configure
the receiving queue and nothing more. To extend the
receiving datapath capabilities, it is proposed to add new
fields to the rte_eth_rxconf structure:

struct rte_eth_rxconf {
    ...
    uint16_t rx_split_num; /* number of segments to split */
    uint16_t *rx_split_len; /* array of segment lengths */
    struct rte_mempool **mp; /* array of segment memory pools */
    ...
};

A non-zero value of the rx_split_num field configures the receiving
queue to split ingress packets into multiple segments, placed in mbufs
allocated from the various memory pools according to the specified
lengths. A zero value of rx_split_num preserves backward
compatibility: the queue is configured in the regular way (with
single or multiple mbufs of the same data buffer length allocated
from a single memory pool).
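A minimal C sketch of how a PMD might lay out one packet across the configured segments under the proposed fields. The struct below is hypothetical and only mirrors rx_split_num and rx_split_len. The sketch also assumes that a packet longer than the sum of the configured lengths cannot be received into this layout. The proposal does not specify that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the proposed fields; not an existing DPDK type. */
struct rx_split_conf {
	uint16_t rx_split_num;        /* number of segments, 0 = no split */
	const uint16_t *rx_split_len; /* array of segment lengths */
};

/*
 * Compute how many bytes of a pkt_len-byte packet land in each segment.
 * Returns the number of segments used, 0 for a regular (non-split)
 * queue, or -1 if the packet exceeds the total configured length
 * (an assumption, since the proposal leaves this case open).
 */
static int
rx_split_layout(const struct rx_split_conf *conf, uint32_t pkt_len,
		uint16_t *seg_len)
{
	uint32_t remaining = pkt_len;
	uint16_t i;

	if (conf->rx_split_num == 0)
		return 0;
	assert(conf->rx_split_len != NULL && seg_len != NULL);
	for (i = 0; i < conf->rx_split_num && remaining > 0; i++) {
		uint16_t n = remaining < conf->rx_split_len[i] ?
			     (uint16_t)remaining : conf->rx_split_len[i];

		seg_len[i] = n;
		remaining -= n;
	}
	return remaining == 0 ? (int)i : -1;
}
```

For example, with lengths {64, 2048}, a 1500-byte packet would be laid out as 64 + 1436 bytes, and a 40-byte packet would use only the first segment.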

The new approach would allow splitting ingress packets into
multiple parts placed in memory with different attributes.
For example, the packet headers can be pushed to the embedded data
buffers within mbufs and the application data into external
buffers attached to mbufs allocated from different memory
pools. The memory attributes of the split parts may differ as
well; for example, the application data may be pushed into
external memory located on a dedicated physical device,
such as a GPU or NVMe. This would improve the flexibility of the
DPDK receiving datapath while preserving compatibility with the
existing API.
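Consider the header/payload case: headers go to a small pool with embedded buffers, payload to a pool backed by external memory. A queue setup routine would presumably have to check that every requested segment length fits the buffers of its pool, wherever that memory lives. The types and the validation rule below are assumptions for illustration and are not part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool descriptor; "external" marks e.g. GPU/NVMe memory. */
struct seg_pool {
	uint16_t data_room; /* usable data bytes per buffer */
	bool external;      /* buffers attached from external memory */
};

/* Return true if each of the num segments fits its pool's buffers. */
static bool
rx_split_conf_valid(uint16_t num, const uint16_t *len,
		    const struct seg_pool *const *mp)
{
	uint16_t i;

	assert(num == 0 || (len != NULL && mp != NULL));
	for (i = 0; i < num; i++) {
		if (mp[i] == NULL || len[i] == 0 || len[i] > mp[i]->data_room)
			return false;
	}
	return true;
}
```

The check is deliberately independent of the memory attribute. A 128-byte header segment from an embedded pool and a 4096-byte payload segment from an external pool pass. Asking for 256 header bytes from the same 128-byte pool does not.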

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 doc/guides/rel_notes/deprecation.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index ea4cfa7..cd700ae 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -99,6 +99,11 @@ Deprecation Notices
   In 19.11 PMDs will still update the field even when the offload is not
   enabled.
 
+* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
+  queues to split ingress packets into multiple segments according to the
+  specified lengths into the buffers allocated from the specified
+  memory pools. The backward compatibility to existing API is preserved.
+
 * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
   will be deprecated in 20.11 and will be removed in 21.11.
   Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
-- 
1.8.3.1

end of thread, other threads:[~2020-08-31 16:59 UTC | newest]

Thread overview: 24+ messages
2020-08-03 15:18 [dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure Slava Ovsiienko
2020-08-03 15:31 ` Andrew Rybchenko
2020-08-03 16:51   ` Slava Ovsiienko
2020-08-30 12:58     ` Andrew Rybchenko
2020-08-30 18:26       ` Stephen Hemminger
2020-08-31  6:35         ` Andrew Rybchenko
2020-08-31 16:59           ` Stephen Hemminger
  -- strict thread matches above, loose matches on Subject: below --
2020-08-03 10:58 Viacheslav Ovsiienko
2020-08-03 11:56 ` Jerin Jacob
2020-08-03 13:06   ` Slava Ovsiienko
2020-08-04 13:32     ` Jerin Jacob
2020-08-05  6:35       ` Slava Ovsiienko
2020-08-06 15:58       ` Ferruh Yigit
2020-08-06 16:25         ` Stephen Hemminger
2020-08-06 16:41           ` Jerin Jacob
2020-08-06 17:03           ` Slava Ovsiienko
2020-08-06 18:10             ` Stephen Hemminger
2020-08-07 11:23               ` Slava Ovsiienko
2020-08-03 14:31 ` [dpdk-dev] ***Spam*** " Andrew Rybchenko
2020-08-06 16:15   ` [dpdk-dev] " Ferruh Yigit
2020-08-06 16:29     ` Slava Ovsiienko
2020-08-06 16:37       ` Ferruh Yigit
2020-08-06 16:39         ` Slava Ovsiienko
2020-08-06 16:43           ` Ferruh Yigit
2020-08-06 16:48             ` Slava Ovsiienko
