[dpdk-dev] Reshuffling of rte_mbuf structure.

DPDK patches and discussions

* [dpdk-dev] Reshuffling of rte_mbuf structure.
From: shesha Sreenivasamurthy (shesha) @ 2015-10-31  4:44 UTC
  To: dev

At Cisco, we are using DPDK for a very high-speed packet-processing application. We don't use NIC TCP offload or RSS hashing. Putting those fields in the first cache line, while the obligatory mb->next field sits in the second cache line, causes significant LSU (load/store unit) pressure and performance degradation. If it does not affect other applications, I would like to propose reshuffling the fields so that the obligatory "next" field falls in the first cache line and the RSS hash moves to the second. If this reshuffling does hurt other applications, another idea is to make the layout compile-time configurable. Please provide feedback.
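A minimal sketch of the cache-line argument, assuming 64-byte cache lines and two hypothetical, heavily simplified mbuf-like structs (the field names echo rte_mbuf, but these are not its real fields, types, or offsets). `offsetof` shows which cache line `next` and the RSS hash fall into under each arrangement.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on common x86 parts. */
#define CACHE_LINE 64

/* Hypothetical layout in the style the post describes: the RSS hash
 * shares line 0 with other RX fields, while next starts line 1. */
struct meta_rss_first {
    void *buf_addr;
    uint64_t ol_flags;
    uint32_t pkt_len;
    uint16_t data_len;
    uint16_t nb_segs;
    uint32_t rss_hash;                                 /* line 0 */
    _Alignas(CACHE_LINE) struct meta_rss_first *next;  /* line 1 */
    void *pool;
};

/* Proposed reshuffle: next moves into line 0, the RSS hash moves out. */
struct meta_next_first {
    void *buf_addr;
    uint64_t ol_flags;
    uint32_t pkt_len;
    uint16_t data_len;
    uint16_t nb_segs;
    struct meta_next_first *next;                      /* line 0 */
    _Alignas(CACHE_LINE) uint32_t rss_hash;            /* line 1 */
    void *pool;
};

/* Index of the cache line holding the byte at offset off. */
static size_t cache_line_of(size_t off)
{
    return off / CACHE_LINE;
}
```

With the first layout, any code that follows `next` touches a second cache line per mbuf; with the second it does not, at the cost of pushing the RSS hash out. How much that matters depends on the real struct and the workload.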

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0x0000C0DE; }

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: Arnon Warshavsky @ 2015-11-01  4:45 UTC
  To: shesha Sreenivasamurthy (shesha); +Cc: dev

My 2 cents,

This was brought up at the recent user space summit, and it seems that
indeed there is no one cache-line arrangement that fits all.
OTOH, multiple compile-time options to satisfy all flavors would make it
unpleasant to read, maintain, test, and debug.
(I think there was quite a consensus in favor of reducing compile options
in general.)

Currently I manage similar deviations via our own source control, which I
admit is quite a pain.
I would prefer an option of code manipulation/generation by some script
during DPDK install,
which takes the default version of rte_mbuf.h,
along with an optional user file (json, xml, elvish, whatever) defining the
structure replacements,
creates your custom version, and places it instead of the installed copy
of rte_mbuf.h.
Maybe the only facility required from DPDK is the ability to register
calls to such user scripts at some install stage(s), providing the means,
along with the responsibility, to the user.
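The generate-at-install idea could be sketched roughly as follows. Everything here is hypothetical: the field list is hard-coded, the user's "file" is just an array of names, and no DPDK tooling is involved; a real script would parse rte_mbuf.h and a JSON/XML override and write out a header file.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 64

/* Hypothetical field descriptor; a real tool would parse these from
 * rte_mbuf.h and from the user's override file instead of hard-coding. */
struct field_desc {
    const char *type;
    const char *name;
};

/* Write "struct <tag> { ... };" into out, emitting fields in the order the
 * user listed them. Every default field must appear exactly once, so a
 * typo in the user file cannot silently drop or duplicate a field.
 * Returns the number of fields written, or -1 on error. */
static int emit_struct(char *out, size_t cap, const char *tag,
                       const struct field_desc *fields, size_t nfields,
                       const char *const *order, size_t norder)
{
    unsigned char seen[MAX_FIELDS] = {0};
    size_t used;
    int n;

    if (nfields > MAX_FIELDS || norder != nfields)
        return -1;
    n = snprintf(out, cap, "struct %s {\n", tag);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;
    for (size_t i = 0; i < norder; i++) {
        size_t j = 0;
        while (j < nfields && strcmp(fields[j].name, order[i]) != 0)
            j++;
        if (j == nfields || seen[j])
            return -1;               /* unknown or repeated field name */
        seen[j] = 1;
        n = snprintf(out + used, cap - used, "\t%s %s;\n",
                     fields[j].type, fields[j].name);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(out + used, cap - used, "};\n");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)norder;
}
```

Requiring every default field to appear exactly once keeps such a generator from silently producing an incomplete header; it does nothing about binary compatibility with code built against the stock layout.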

/Arnon


-- 

Arnon Warshavsky
Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon@qwilt.com

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: Stephen Hemminger @ 2015-11-02 16:24 UTC
  To: Arnon Warshavsky; +Cc: dev

On Sun, 1 Nov 2015 06:45:31 +0200
Arnon Warshavsky <arnon@qwilt.com> wrote:

> My 2 cents,
> 
> This was brought up in the recent user space summit, and it seems that
> indeed there is no one cache lines arrangement that fits all.
> OTOH multiple compile time options to suffice all flavors, would make it
> unpleasant to read maintain test and debug.
> (I think there was quiet a consensus in favor of reducing compile options
> in general)
> 
> Currently I manage similar deviations via our own source control which I
> admit to be quite a pain.
> I would prefer an option of code manipulation/generation by some script
> during dpdk install,
> which takes the default version of rte_mbuf.h,
> along with an optional user file (json,xml,elvish,whatever) defining the
> structure replacements,
> creating your custom version, and placing it instead of the installed copy
> of rte_mbuf.h.
> Maybe the only facility required from dpdk is just the ability to register
> calls to such user scripts at some install stage(s), providing the mean
> along with responsibility to the user.
> 
> /Arnon
> 

Having different layouts will be a disaster for distros; they have to choose one.
And I hate to introduce more configuration!

But we see the same issue. It would make sense if there were configuration options
for some common optimizations (NO_TX_OFFLOAD, NO_MULTISEG, NO_REFCOUNT) and then
the mbuf got optimized for those combinations. That seems better than config options
like LAYOUT1, LAYOUT2, ...

In this specific case, I think lots of drivers could check nb_segs == 1 and avoid
touching the next field for simple packets.
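A rough sketch of the nb_segs == 1 shortcut, using a hypothetical `struct seg` and a hypothetical `seg_release()` hook in place of DPDK's real mbuf and free routines. The single-segment path never loads `next`, so in a layout where `next` sits in the second cache line, that line is not touched for simple packets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified segment (not the real rte_mbuf). */
struct seg {
    uint16_t nb_segs;   /* segments in the chain; meaningful in the head */
    struct seg *next;   /* imagine this living in the second cache line */
};

/* Hypothetical per-segment release hook; it just counts calls here. */
static unsigned frees;
static void seg_release(struct seg *s) { (void)s; frees++; }

/* Free a packet, reading ->next only for multi-segment chains. */
static void pkt_free(struct seg *head)
{
    if (head->nb_segs == 1) {       /* common case: next is never loaded */
        seg_release(head);
        return;
    }
    while (head != NULL) {          /* chained case: walk the list */
        struct seg *nxt = head->next;
        seg_release(head);
        head = nxt;
    }
}
```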

Long term, I think this will be a losing battle. As DPDK grows more features, the current
mbuf structure will grow; there is really nothing stopping the bloat of metadata.

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: shesha Sreenivasamurthy (shesha) @ 2015-11-02 18:30 UTC
  To: Stephen Hemminger, Arnon Warshavsky; +Cc: dev

One issue I see with optimization config options such as NO_TX_OFFLOAD, NO_MULTISEG, and NO_REFCOUNT is that it is not sufficient to have those #ifdefs inside the mbuf structure; they would have to be sprinkled all over the code wherever the corresponding fields are used. This may make the code messier.
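If an option really removes a field, one common way to keep the #ifdefs from spreading is to route every access through inline accessors, so the conditional lives in one header. A minimal sketch with a hypothetical `NO_REFCOUNT` option and a hypothetical `struct pkt`:

```c
#include <stdint.h>

/* Hypothetical option: defining NO_REFCOUNT removes the field entirely. */
struct pkt {
    void *data;
#ifndef NO_REFCOUNT
    uint16_t refcnt;
#endif
};

/* Accessors keep the #ifdef in one place instead of at every use site. */
static inline uint16_t pkt_refcnt_read(const struct pkt *p)
{
#ifdef NO_REFCOUNT
    (void)p;
    return 1;           /* without refcounting, every packet has one owner */
#else
    return p->refcnt;
#endif
}

static inline void pkt_refcnt_set(struct pkt *p, uint16_t v)
{
#ifdef NO_REFCOUNT
    (void)p;
    (void)v;            /* nothing to store */
#else
    p->refcnt = v;
#endif
}
```

This only covers code that goes through the accessors; code that writes the struct's memory directly, for example with vector stores, still depends on the exact layout.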

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: Arnon Warshavsky @ 2015-11-02 18:35 UTC
  To: shesha Sreenivasamurthy (shesha); +Cc: dev

If NO_TX_OFFLOAD only changes the layout, in terms of the fields' relative
locations in cache lines, and does not eliminate the fields themselves,
why should the code using them be affected?
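A minimal sketch of the reorder-only reading of such an option, with a hypothetical `struct pkt_meta`: the `NO_TX_OFFLOAD` switch here only changes field order, so the function using the fields compiles unchanged either way.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reorder-only option: NO_TX_OFFLOAD moves next forward but
 * removes nothing, so every field name exists in both layouts. */
struct pkt_meta {
    uint32_t pkt_len;
#ifdef NO_TX_OFFLOAD
    void *next;
    uint64_t tx_offload;
#else
    uint64_t tx_offload;
    void *next;
#endif
};

/* Code using the struct: the same source builds against either layout. */
static int is_chained(const struct pkt_meta *m)
{
    return m->next != NULL;
}
```

Source compatibility is preserved, but binary compatibility is not: objects compiled with and without the option disagree on field offsets, which is the distro problem raised earlier in the thread.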

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: shesha Sreenivasamurthy (shesha) @ 2015-11-02 22:19 UTC
  To: Arnon Warshavsky; +Cc: dev

OK, so you are saying re-order the fields based on the configuration params. I took the word "NO" in the param to mean eliminate. Sure, that does not require any change in the code that uses the fields. But will it not then boil down to the same thing as having completely different layout definitions, and be even messier?

For example, rather than having:

#if defined(NO_TX_OFFLOAD)
struct mbuf_rte {
	uint64_t fieldA;
	uint64_t field1;
	uint64_t field2;
	uint64_t fieldB;
	uint64_t field4;
	uint64_t field5;
};
#elif defined(NO_MULTISEG)
struct mbuf_rte {
	uint64_t fieldA;
	uint64_t field2;
	uint64_t field1;
	uint64_t fieldB;
	uint64_t field5;
	uint64_t field4;
};
#endif

we end up having:

struct mbuf_rte {
	uint64_t fieldA;
#if defined(NO_TX_OFFLOAD)
	uint64_t field1;
	uint64_t field2;
#elif defined(NO_MULTISEG)
	uint64_t field2;
	uint64_t field1;
#endif
	uint64_t fieldB;
#if defined(NO_TX_OFFLOAD)
	uint64_t field4;
	uint64_t field5;
#elif defined(NO_MULTISEG)
	uint64_t field5;
	uint64_t field4;
#endif
};

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: Thomas Monjalon @ 2015-11-02 22:51 UTC
  To: shesha Sreenivasamurthy (shesha), Arnon Warshavsky; +Cc: dev

This discussion is about improving the performance of specific use cases
by moving the mbuf fields when needed.
We could consider how to configure it and how complicated it would be to
write applications or drivers (especially vector ones) for such a moving
structure.
But it is simpler to say that having an API depending on some options
is a "no-design" which could seriously slow down DPDK adoption.
You can have a different opinion, but I cannot imagine how strong the
arguments would have to be to make it happen.

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: Matthew Hall @ 2015-11-03  0:21 UTC
  To: Thomas Monjalon; +Cc: dev

On Mon, Nov 02, 2015 at 11:51:23PM +0100, Thomas Monjalon wrote:
> But it is simpler to say that having an API depending of some options
> is a "no-design" which could seriously slow down the DPDK adoption.

What about something similar to how Java JNI works? It needed to support
multiple Java JRE / JDK brands, implementations, etc. Upon initialization, a
function pointer array is created, and specific slots are filled with pointers
to the real implementations of some native API functions you can call from
inside your library to perform operations.

In the DPDK case, we need flexible data instead of flexible function
implementations.

To do this, there would be some pointer slots in the mbuf that are filled
with pointers to metadata for required DPDK features. The data could be placed
in the following cache lines, using some reserved tailroom between the mbuf
control block and the packet data block. Then prefetching could be set up to
fetch only the used parts of the tailroom at any given point, to prevent
unwanted slowdowns.
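One way the pointer-slot idea could look, as a hedged sketch: the feature IDs, `struct ctrl`, and the `meta_attach()`/`meta_get()`/`meta_prefetch()` helpers are all hypothetical, and a fixed 128-byte array stands in for the reserved tailroom.

```c
#include <stddef.h>

/* Hypothetical feature IDs; a real design would assign these at init. */
enum meta_feature { META_RSS, META_VLAN, META_MAX };

/* Hypothetical control block: one pointer slot per feature (NULL means
 * not present) plus a reserved area standing in for the tailroom between
 * the control block and the packet data. */
struct ctrl {
    void *slot[META_MAX];
    size_t used;                              /* bytes of reserved[] in use */
    _Alignas(8) unsigned char reserved[128];
};

/* Carve len bytes (rounded up to 8 for alignment) out of the reserved
 * area for feature f. Returns the metadata pointer, or NULL if full. */
static void *meta_attach(struct ctrl *c, enum meta_feature f, size_t len)
{
    len = (len + 7) & ~(size_t)7;
    if (len > sizeof c->reserved - c->used)
        return NULL;
    c->slot[f] = &c->reserved[c->used];
    c->used += len;
    return c->slot[f];
}

/* Fetch feature f's metadata, or NULL if nobody attached it. */
static void *meta_get(const struct ctrl *c, enum meta_feature f)
{
    return c->slot[f];
}

/* Prefetch only the 64-byte lines of the reserved area that are in use.
 * __builtin_prefetch is a GCC/Clang builtin, not standard C. */
static void meta_prefetch(const struct ctrl *c)
{
    for (size_t off = 0; off < c->used; off += 64)
        __builtin_prefetch(&c->reserved[off]);
}
```

The extra indirection is the trade-off: every feature lookup first reads a slot and then the metadata it points to, which is the kind of cost the RX path is sensitive to.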

Matthew.

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: Bruce Richardson @ 2015-11-03 10:20 UTC
  To: Matthew Hall; +Cc: dev

On Mon, Nov 02, 2015 at 07:21:17PM -0500, Matthew Hall wrote:
> On Mon, Nov 02, 2015 at 11:51:23PM +0100, Thomas Monjalon wrote:
> > But it is simpler to say that having an API depending of some options
> > is a "no-design" which could seriously slow down the DPDK adoption.
> 
> What about something similar to how Java JNI works? It needed to support 
> multiple Java JRE / JDK brands, implementations etc. Upon initialization, a 
> function pointer array is created, and specific slots are filled with pointers 
> to the real implementation of some native API functions you can call from 
> inside your library to perform operations.
> 
> In the DPDK case, we need flexible data instead of flexible function 
> implementations.
> 
> To do this there would be some pointer slots in the mbuf that are are filled 
> with pointers to metadata for required DPDK features. The data could be placed 
> in the following cachelines, using some reserved tailroom between the mbuf 
> control block and the packet data block. Then the prefetch could be set up to 
> prefetch only the used parts of the tailroom at any given point, to prevent 
> unwanted slowdowns.
> 
> Matthew.

The trouble is that a lot of the metadata comes from the receive descriptor on
the RX code path, which is extremely sensitive to cache line usage. This is why,
in the 1.8 changes to the mbuf, the data used by the RX code paths was all put
in the first cache line.
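A sketch of why descriptor-derived metadata wants to live in one cache line: a hypothetical RX write-back descriptor is copied into a hypothetical mbuf-like struct whose RX-written fields all sit in the first 64 bytes (an assumed cache line size), while a field not needed on RX starts the second line. Neither struct matches a real NIC or the real rte_mbuf.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64        /* assumed cache line size */
#define RX_FLAG_VALID 1u     /* hypothetical status/flag bit */

/* Hypothetical NIC RX write-back descriptor (not a real device format). */
struct rx_desc {
    uint32_t rss_hash;
    uint16_t pkt_len;
    uint16_t vlan_tci;
    uint32_t status;
};

/* Hypothetical mbuf-like struct: every field the RX path fills is in the
 * first cache line; pool, unused on RX, starts the second one. */
struct rx_meta {
    void *buf_addr;
    uint64_t ol_flags;
    uint32_t pkt_len;
    uint16_t data_len;
    uint16_t vlan_tci;
    uint32_t rss_hash;
    _Alignas(CACHE_LINE) void *pool;
};

/* Copy descriptor metadata into the mbuf; writes only cache line 0. */
static void rx_fill(struct rx_meta *m, const struct rx_desc *d)
{
    m->pkt_len = d->pkt_len;
    m->data_len = d->pkt_len;              /* single-segment assumption */
    m->vlan_tci = d->vlan_tci;
    m->rss_hash = d->rss_hash;
    m->ol_flags = (d->status & RX_FLAG_VALID) ? RX_FLAG_VALID : 0;
}
```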

/Bruce

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: Zoltan Kiss @ 2015-11-03 11:44 UTC
  To: Bruce Richardson, Matthew Hall; +Cc: dev

Also, there could be places in the code where we change a set of
contiguous fields in the mbuf. E.g. the ixgbe vector PMD receive function
takes advantage of 128-bit vector registers and fills out
rx_descriptor_fields1 with one instruction. But I guess there are other
places too, and they are really hard to find with code analysis. A
change in the mbuf structure would probably bring a plethora of nasty
bugs because of this.
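The hazard can be sketched with a hypothetical struct that imitates the idea behind `rx_descriptor_fields1`: a 16-byte run of fields filled as a unit. The real ixgbe vector PMD uses SSE intrinsics; here a single 16-byte `memcpy` stands in for the 128-bit store, and a `_Static_assert` turns one class of reshuffling bug into a compile error.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: a 16-byte group that a vector RX path would fill
 * with one store. Not the real rte_mbuf field set. */
struct vec_meta {
    void *buf_addr;
    /* start of the 16-byte group written as a unit */
    uint32_t packet_type;
    uint32_t pkt_len;
    uint16_t data_len;
    uint16_t vlan_tci;
    uint32_t rss_hash;
    /* end of the group */
    struct vec_meta *next;
};

/* If a reshuffle inserts or removes a field inside the group, the build
 * fails instead of silently corrupting neighbouring fields at run time. */
_Static_assert(offsetof(struct vec_meta, rss_hash) + sizeof(uint32_t)
               - offsetof(struct vec_meta, packet_type) == 16,
               "vector-written group must be exactly 16 contiguous bytes");

/* Portable stand-in for a 128-bit register store: src holds the four
 * fields already packed in struct order. */
static void fill_group(struct vec_meta *m, const unsigned char src[16])
{
    memcpy((unsigned char *)m + offsetof(struct vec_meta, packet_type),
           src, 16);
}
```

The assertion catches a field being inserted into or removed from the group, but not two same-sized fields swapping places within it, which would still compile and silently mislabel data.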

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: Matthew Hall @ 2015-11-03 14:33 UTC
  To: Zoltan Kiss; +Cc: dev

On Tue, Nov 03, 2015 at 11:44:22AM +0000, Zoltan Kiss wrote:
> Also, there could be places in the code where we change a set of
> continuous fields in the mbuf. E.g. ixgbe vector pmd receive
> function takes advantage of 128 bit vector registers and fill out
> rx_descriptor_fields1 with one instruction. But I guess there are
> other places too, and they are really hard to find with code
> analysis. A change in the mbuf structure would probably bring a
> plethora of nasty bugs due to this.

If the RX path is the cause of most of the issues, then it seems like we need
to make some diagrams and a description of how this code works, so we could
crowd-source the best proposed performance and cleanliness improvements.

Trying to solve this problem one little hack at a time isn't going to meet
the pretty demanding performance and flexibility constraints on the code.

Do we have any plans for bounties, specific wiki pages on known design
problems, Google Summer of Code, or some other kind of process for
longer-term architectural improvements?

Also, in this instance it seems like it might be wise to outsource some
black magic, like these vector instructions, to some of the pre-optimized,
cleaner alternatives like rte_memcpy.

Matthew.

* Re: [dpdk-dev] Reshuffling of rte_mbuf structure.
From: shesha Sreenivasamurthy (shesha) @ 2015-11-04 18:56 UTC
  To: dev

Is there a way to just define the fields that ought to be in the mbuf structure, but leave their position and size implementation dependent? The application could provide an "mbuf_impl.h" that contains the mbuf_rte fields in the order that seems appropriate to the application.
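A minimal sketch of the "application supplies mbuf_impl.h" idea using a computed `#include`: the `MBUF_IMPL_HEADER` and `MBUF_FIELDS` macro names are hypothetical, and the default field list is illustrative, not rte_mbuf's.

```c
#include <stdint.h>

/* Hypothetical mechanism: build with, e.g.,
 *   cc -DMBUF_IMPL_HEADER='"mbuf_impl.h"' ...
 * where mbuf_impl.h defines MBUF_FIELDS in the application's preferred
 * order. Without it, a default field list is used. */
#ifdef MBUF_IMPL_HEADER
#include MBUF_IMPL_HEADER
#else
#define MBUF_FIELDS          \
    void *buf_addr;          \
    uint32_t rss_hash;       \
    uint16_t data_len;       \
    void *next;
#endif

struct mbuf_rte {
    MBUF_FIELDS
};

/* Library code refers to fields only by name, so it is order-agnostic. */
static uint16_t mbuf_data_len(const struct mbuf_rte *m)
{
    return m->data_len;
}
```

Code that names fields keeps compiling, but every prebuilt library and driver (especially vector ones that assume fixed offsets) would have to be rebuilt against the same header, which is the concern Thomas and Zoltan raise.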

Thread overview: 12+ messages
2015-10-31  4:44 [dpdk-dev] Reshuffling of rte_mbuf structure shesha Sreenivasamurthy (shesha)
2015-11-01  4:45 ` Arnon Warshavsky
2015-11-02 16:24   ` Stephen Hemminger
2015-11-02 18:30     ` shesha Sreenivasamurthy (shesha)
2015-11-02 18:35       ` Arnon Warshavsky
2015-11-02 22:19         ` shesha Sreenivasamurthy (shesha)
2015-11-02 22:51           ` Thomas Monjalon
2015-11-03  0:21             ` Matthew Hall
2015-11-03 10:20               ` Bruce Richardson
2015-11-03 11:44                 ` Zoltan Kiss
2015-11-03 14:33                   ` Matthew Hall
2015-11-04 18:56               ` shesha Sreenivasamurthy (shesha)