* [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
@ 2014-10-07 9:33 Pattan, Reshma
2014-10-07 11:21 ` Neil Horman
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Pattan, Reshma @ 2014-10-07 9:33 UTC (permalink / raw)
To: dev
Hi All,
I am planning to implement packet reorder library. Details are as below, please go through them and provide the comments.
Requirement:
To reorder out of ordered packets that are received from different cores.
Usage:
To be used along with distributor library. Next version of distributor are planned to distribute incoming packets to all worker cores irrespective of the flow type.
In this case to ensure in order delivery of the packets at output side reorder library will used by the tx end.
Assumption:
All input packets will be marked with sequence number in seqn field of mbuf, this will be the reference for reordering at the tx end.
Sequence number will be of type uint32_t. New sequence number field seqn will be added to mbuf structure.
Design:
a)There will be reorder buffer(circular buffer) structure maintained in reorder library to store reordered packets and other details of buffer like head to drain the packet from, min sequence number and other details.
b)Library will provide insert and drain functions to reorder and fetch out the reordered packets respectively.
c)Users of library should pass the packets to insert functions for reordering.
Insertion logic:
Sequence number of current packet will be used to calculate offset in reorder buffer and write packet to the location in the reorder buffer corresponding to offset.
Offset is calculated as difference of current packet sequence number and sequence number associated with reorder buffer.
During sequence number wrapping or wrapping over of reorder buffer size, before inserting the new packet we should move offset number of packets to other buffer called overflow buffer and advance the head of reorder buffer by "offset-reorder buffer size" and insert the new packet.
Insert function:
int rte_reorder_insert(struct rte_reorder_buffer *buffer, struct rte_mbuf *mbuf);
Note: Other insert function is also under plan to insert burst of packets.
Reorder buffer:
struct rte_reorder_buffer {
unsigned int size; /* The size (number of entries) of the buffer. */
unsigned int mask; /* Mask (size - 1) of the buffer */
unsigned int head; /* Current head of buffer */
uint32_t min_seqn; /* latest sequence number associated with buffer */
struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /* buffer to hold reordered mbufs */
};
d)Users can fetch out the reordered packets by drain function provided by library. Users must pass the mbuf array , drain function should fill passed mbuff array with the reordered buffer packets.
During drain operation, overflow buffer packets will be fetched out first and then reorder buffer.
Drain function:
int rte_reorder_drain(struct rte_reorder_buffer *buffer, struct rte_mbuf **mbufs)
Thanks,
Reshma
--------------------------------------------------------------
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-07 9:33 [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library Pattan, Reshma
@ 2014-10-07 11:21 ` Neil Horman
2014-10-08 14:11 ` Pattan, Reshma
2014-10-08 22:41 ` Matthew Hall
2014-10-09 19:01 ` Jay Rolette
2 siblings, 1 reply; 20+ messages in thread
From: Neil Horman @ 2014-10-07 11:21 UTC (permalink / raw)
To: Pattan, Reshma; +Cc: dev
On Tue, Oct 07, 2014 at 09:33:06AM +0000, Pattan, Reshma wrote:
> Hi All,
>
> I am planning to implement packet reorder library. Details are as below, please go through them and provide the comments.
>
> Requirement:
> To reorder out of ordered packets that are received from different cores.
>
> Usage:
> To be used along with distributor library. Next version of distributor are planned to distribute incoming packets to all worker cores irrespective of the flow type.
> In this case to ensure in order delivery of the packets at output side reorder library will used by the tx end.
>
> Assumption:
> All input packets will be marked with sequence number in seqn field of mbuf, this will be the reference for reordering at the tx end.
> Sequence number will be of type uint32_t. New sequence number field seqn will be added to mbuf structure.
>
> Design:
> a)There will be reorder buffer(circular buffer) structure maintained in reorder library to store reordered packets and other details of buffer like head to drain the packet from, min sequence number and other details.
> b)Library will provide insert and drain functions to reorder and fetch out the reordered packets respectively.
> c)Users of library should pass the packets to insert functions for reordering.
>
> Insertion logic:
> Sequence number of current packet will be used to calculate offset in reorder buffer and write packet to the location in the reorder buffer corresponding to offset.
> Offset is calculated as difference of current packet sequence number and sequence number associated with reorder buffer.
>
> During sequence number wrapping or wrapping over of reorder buffer size, before inserting the new packet we should move offset number of packets to other buffer called overflow buffer and advance the head of reorder buffer by "offset-reorder buffer size" and insert the new packet.
>
> Insert function:
> int rte_reorder_insert(struct rte_reorder_buffer *buffer, struct rte_mbuf *mbuf);
> Note: Other insert function is also under plan to insert burst of packets.
>
> Reorder buffer:
> struct rte_reorder_buffer {
> unsigned int size; /* The size (number of entries) of the buffer. */
> unsigned int mask; /* Mask (size - 1) of the buffer */
> unsigned int head; /* Current head of buffer */
> uint32_t min_seqn; /* latest sequence number associated with buffer */
> struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /* buffer to hold reordered mbufs */
> };
>
> d)Users can fetch out the reordered packets by drain function provided by library. Users must pass the mbuf array , drain function should fill passed mbuff array with the reordered buffer packets.
> During drain operation, overflow buffer packets will be fetched out first and then reorder buffer.
>
> Drain function:
> int rte_reorder_drain(struct rte_reorder_buffer *buffer, struct rte_mbuf **mbufs)
>
> Thanks,
> Reshma
>
This seems reasonable, but why not integrate it with the distributor library
rather than create a separate library for it? It seems as though the
distributor library is a pre-requisite for this libraries use anyway, as
otherwise there will not be anything to reorder
Neil
> --------------------------------------------------------------
> Intel Shannon Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
> Registered Number: 308263
> Business address: Dromore House, East Park, Shannon, Co. Clare
>
> This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-07 11:21 ` Neil Horman
@ 2014-10-08 14:11 ` Pattan, Reshma
2014-10-08 19:15 ` Neil Horman
0 siblings, 1 reply; 20+ messages in thread
From: Pattan, Reshma @ 2014-10-08 14:11 UTC (permalink / raw)
To: Neil Horman; +Cc: dev
> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Tuesday, October 7, 2014 12:22 PM
> To: Pattan, Reshma
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
>
> On Tue, Oct 07, 2014 at 09:33:06AM +0000, Pattan, Reshma wrote:
> > Hi All,
> >
> > I am planning to implement packet reorder library. Details are as below,
> please go through them and provide the comments.
> >
> > Requirement:
> > To reorder out of ordered packets that are received from different
> cores.
> >
> > Usage:
> > To be used along with distributor library. Next version of distributor are
> planned to distribute incoming packets to all worker cores irrespective of the
> flow type.
> > In this case to ensure in order delivery of the packets at output side reorder
> library will used by the tx end.
> >
> > Assumption:
> > All input packets will be marked with sequence number in seqn field of mbuf,
> this will be the reference for reordering at the tx end.
> > Sequence number will be of type uint32_t. New sequence number field seqn
> will be added to mbuf structure.
> >
> > Design:
> > a)There will be reorder buffer(circular buffer) structure maintained in reorder
> library to store reordered packets and other details of buffer like head to drain
> the packet from, min sequence number and other details.
> > b)Library will provide insert and drain functions to reorder and fetch
> out the reordered packets respectively.
> > c)Users of library should pass the packets to insert functions for reordering.
> >
> > Insertion logic:
> > Sequence number of current packet will be used to calculate offset in reorder
> buffer and write packet to the location in the reorder buffer corresponding to
> offset.
> > Offset is calculated as difference of current packet sequence
> number and sequence number associated with reorder buffer.
> >
> > During sequence number wrapping or wrapping over of reorder buffer size,
> before inserting the new packet we should move offset number of packets to
> other buffer called overflow buffer and advance the head of reorder buffer by
> "offset-reorder buffer size" and insert the new packet.
> >
> > Insert function:
> > int rte_reorder_insert(struct rte_reorder_buffer *buffer, struct
> > rte_mbuf *mbuf);
> > Note: Other insert function is also under plan to insert burst of packets.
> >
> > Reorder buffer:
> > struct rte_reorder_buffer {
> > unsigned int size; /* The size (number of entries) of the buffer. */
> > unsigned int mask; /* Mask (size - 1) of the buffer */
> > unsigned int head; /* Current head of buffer */
> > uint32_t min_seqn; /* latest sequence number associated with buffer
> */
> > struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /* buffer
> > to hold reordered mbufs */ };
> >
> > d)Users can fetch out the reordered packets by drain function provided by
> library. Users must pass the mbuf array , drain function should fill passed mbuff
> array with the reordered buffer packets.
> > During drain operation, overflow buffer packets will be fetched out first and
> then reorder buffer.
> >
> > Drain function:
> > int rte_reorder_drain(struct rte_reorder_buffer
> > *buffer, struct rte_mbuf **mbufs)
> >
> > Thanks,
> > Reshma
> >
>
> This seems reasonable, but why not integrate it with the distributor library rather
> than create a separate library for it? It seems as though the distributor library is
> a pre-requisite for this libraries use anyway, as otherwise there will not be
> anything to reorder Neil
>
Hi Neil,
Reorder library should be standalone , as there are many ways that can cause out of ordering of packets, I just mentioned future packet distributor enhancements
as one of the example for out of ordering.
Other ways like, users can directly distribute the packets to different cores via rings and that causes packet out of ordering as well.
So, keeping reorder library standalone would be good to work with all packet distribution ways.
Thanks,
Reshma
> > --------------------------------------------------------------
> > Intel Shannon Limited
> > Registered in Ireland
> > Registered Office: Collinstown Industrial Park, Leixlip, County
> > Kildare Registered Number: 308263 Business address: Dromore House,
> > East Park, Shannon, Co. Clare
> >
> > This e-mail and any attachments may contain confidential material for the sole
> use of the intended recipient(s). Any review or distribution by others is strictly
> prohibited. If you are not the intended recipient, please contact the sender and
> delete all copies.
> >
> >
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-08 14:11 ` Pattan, Reshma
@ 2014-10-08 19:15 ` Neil Horman
2014-10-09 10:27 ` Pattan, Reshma
0 siblings, 1 reply; 20+ messages in thread
From: Neil Horman @ 2014-10-08 19:15 UTC (permalink / raw)
To: Pattan, Reshma; +Cc: dev
On Wed, Oct 08, 2014 at 02:11:34PM +0000, Pattan, Reshma wrote:
>
>
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Tuesday, October 7, 2014 12:22 PM
> > To: Pattan, Reshma
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
> >
> > On Tue, Oct 07, 2014 at 09:33:06AM +0000, Pattan, Reshma wrote:
> > > Hi All,
> > >
> > > I am planning to implement packet reorder library. Details are as below,
> > please go through them and provide the comments.
> > >
> > > Requirement:
> > > To reorder out of ordered packets that are received from different
> > cores.
> > >
> > > Usage:
> > > To be used along with distributor library. Next version of distributor are
> > planned to distribute incoming packets to all worker cores irrespective of the
> > flow type.
> > > In this case to ensure in order delivery of the packets at output side reorder
> > library will used by the tx end.
> > >
> > > Assumption:
> > > All input packets will be marked with sequence number in seqn field of mbuf,
> > this will be the reference for reordering at the tx end.
> > > Sequence number will be of type uint32_t. New sequence number field seqn
> > will be added to mbuf structure.
> > >
> > > Design:
> > > a)There will be reorder buffer(circular buffer) structure maintained in reorder
> > library to store reordered packets and other details of buffer like head to drain
> > the packet from, min sequence number and other details.
> > > b)Library will provide insert and drain functions to reorder and fetch
> > out the reordered packets respectively.
> > > c)Users of library should pass the packets to insert functions for reordering.
> > >
> > > Insertion logic:
> > > Sequence number of current packet will be used to calculate offset in reorder
> > buffer and write packet to the location in the reorder buffer corresponding to
> > offset.
> > > Offset is calculated as difference of current packet sequence
> > number and sequence number associated with reorder buffer.
> > >
> > > During sequence number wrapping or wrapping over of reorder buffer size,
> > before inserting the new packet we should move offset number of packets to
> > other buffer called overflow buffer and advance the head of reorder buffer by
> > "offset-reorder buffer size" and insert the new packet.
> > >
> > > Insert function:
> > > int rte_reorder_insert(struct rte_reorder_buffer *buffer, struct
> > > rte_mbuf *mbuf);
> > > Note: Other insert function is also under plan to insert burst of packets.
> > >
> > > Reorder buffer:
> > > struct rte_reorder_buffer {
> > > unsigned int size; /* The size (number of entries) of the buffer. */
> > > unsigned int mask; /* Mask (size - 1) of the buffer */
> > > unsigned int head; /* Current head of buffer */
> > > uint32_t min_seqn; /* latest sequence number associated with buffer
> > */
> > > struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /* buffer
> > > to hold reordered mbufs */ };
> > >
> > > d)Users can fetch out the reordered packets by drain function provided by
> > library. Users must pass the mbuf array , drain function should fill passed mbuff
> > array with the reordered buffer packets.
> > > During drain operation, overflow buffer packets will be fetched out first and
> > then reorder buffer.
> > >
> > > Drain function:
> > > int rte_reorder_drain(struct rte_reorder_buffer
> > > *buffer, struct rte_mbuf **mbufs)
> > >
> > > Thanks,
> > > Reshma
> > >
> >
> > This seems reasonable, but why not integrate it with the distributor library rather
> > than create a separate library for it? It seems as though the distributor library is
> > a pre-requisite for this libraries use anyway, as otherwise there will not be
> > anything to reorder Neil
> >
>
> Hi Neil,
>
> Reorder library should be standalone , as there are many ways that can cause out of ordering of packets, I just mentioned future packet distributor enhancements
> as one of the example for out of ordering.
> Other ways like, users can directly distribute the packets to different cores via rings and that causes packet out of ordering as well.
> So, keeping reorder library standalone would be good to work with all packet distribution ways.
>
Hmm, ok, that seems reasonable.
Just out of curiosity, where do you intend to inject the packet sequence number
for this library?
Neil
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-07 9:33 [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library Pattan, Reshma
2014-10-07 11:21 ` Neil Horman
@ 2014-10-08 22:41 ` Matthew Hall
2014-10-08 22:55 ` Neil Horman
2014-10-09 19:01 ` Jay Rolette
2 siblings, 1 reply; 20+ messages in thread
From: Matthew Hall @ 2014-10-08 22:41 UTC (permalink / raw)
To: Pattan, Reshma; +Cc: dev
On Tue, Oct 07, 2014 at 09:33:06AM +0000, Pattan, Reshma wrote:
> To be used along with distributor library. Next version of distributor are
> planned to distribute incoming packets to all worker cores irrespective of
> the flow type. In this case to ensure in order delivery of the packets at
> output side reorder library will used by the tx end.
Hello,
I am trying to understand, why is usage of the RSS options on the RX Queues
insufficient to keep the packets ordered per-flow on the TX Queues?
Thanks,
Matthew.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-08 22:41 ` Matthew Hall
@ 2014-10-08 22:55 ` Neil Horman
2014-10-08 23:07 ` Matthew Hall
0 siblings, 1 reply; 20+ messages in thread
From: Neil Horman @ 2014-10-08 22:55 UTC (permalink / raw)
To: Matthew Hall; +Cc: dev
On Wed, Oct 08, 2014 at 03:41:11PM -0700, Matthew Hall wrote:
> On Tue, Oct 07, 2014 at 09:33:06AM +0000, Pattan, Reshma wrote:
> > To be used along with distributor library. Next version of distributor are
> > planned to distribute incoming packets to all worker cores irrespective of
> > the flow type. In this case to ensure in order delivery of the packets at
> > output side reorder library will used by the tx end.
>
> Hello,
>
> I am trying to understand, why is usage of the RSS options on the RX Queues
> insufficient to keep the packets ordered per-flow on the TX Queues?
>
I think because there is a possibility that multiple workers may be used for a
single tx queue.
Neil
> Thanks,
> Matthew.
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-08 22:55 ` Neil Horman
@ 2014-10-08 23:07 ` Matthew Hall
2014-10-09 9:14 ` Bruce Richardson
0 siblings, 1 reply; 20+ messages in thread
From: Matthew Hall @ 2014-10-08 23:07 UTC (permalink / raw)
To: Neil Horman; +Cc: dev
On Wed, Oct 08, 2014 at 06:55:41PM -0400, Neil Horman wrote:
> I think because there is a possibility that multiple workers may be used for a
> single tx queue.
>
> Neil
OK, so, in my application packets are RX'ed to a predictable RX queue and core
using RSS.
Then you put them into a predictable TX queue for the same core, in the same
order they came in from the RX queue with RSS enabled.
So you've got a consistent-hashed subset of packets as input, being converted
to output in the same order.
Will it work, or not work? I'm just curious if my app is doing it wrong and I
need to fix it, or how this case should be handled in general...
Matthew.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-08 23:07 ` Matthew Hall
@ 2014-10-09 9:14 ` Bruce Richardson
2014-10-09 17:11 ` Matthew Hall
0 siblings, 1 reply; 20+ messages in thread
From: Bruce Richardson @ 2014-10-09 9:14 UTC (permalink / raw)
To: Matthew Hall; +Cc: dev
On Wed, Oct 08, 2014 at 04:07:28PM -0700, Matthew Hall wrote:
> On Wed, Oct 08, 2014 at 06:55:41PM -0400, Neil Horman wrote:
> > I think because there is a possibility that multiple workers may be used for a
> > single tx queue.
> >
> > Neil
>
> OK, so, in my application packets are RX'ed to a predictable RX queue and core
> using RSS.
>
> Then you put them into a predictable TX queue for the same core, in the same
> order they came in from the RX queue with RSS enabled.
>
> So you've got a consistent-hashed subset of packets as input, being converted
> to output in the same order.
>
> Will it work, or not work? I'm just curious if my app is doing it wrong and I
> need to fix it, or how this case should be handled in general...
>
> Matthew.
Hi Matthew,
What you are doing will indeed work, and it's the way the vast majority of
the sample apps are written. However, this will not always work for everyone
else, sadly.
First off, with RSS, there are a number of limitations. On the 1G and 10G
NICs RSS works only with IP traffic, and won't work in cases with other
protocols or where IP is encapsulated in anything other than a single VLAN.
Those cases need software load distribution. As well as this, you have very
little control over where flows get put, as the separation into queues
(which go to cores), is only done on the low seven bits. For applications
which work with a small number of flows, e.g. where multiple flows are
contained inside a single tunnel, you get a get a large flow imbalance,
where you get far more traffic coming to one queue/core than to another.
Again in this instance, software load balancing is needed.
Secondly, then, based off that, it is entirely possible when doing software
load balancing to strictly process packets for a flow in order - and indeed
this is what the existing packet distributor does. However, for certain
types of flow where processing of packets for that flow can be done in
parallel, forcing things to be done serially can slow things down. As well
as this, there can sometimes be requirements for the load balancing between
cores to be done as fairly as possible so that it is guaranteed that all
cores have approx the same load, irrespective of the number of input flows.
In these cases, having the option to blindly distribute traffic to cores and
then reorder packets on TX is the best way to ensure even load distribution.
It's not going to be for everyone, but it's good to have the option - and
there are a number of people doing things this way already.
Lastly, there is also the assumption being made that all flows are
independent, which again may not always be the case. If you need ordering
across flows and to share load between cores then reordering on transmission
is the only way to do things.
Hope this helps,
Regards,
/Bruce
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-08 19:15 ` Neil Horman
@ 2014-10-09 10:27 ` Pattan, Reshma
2014-10-09 11:36 ` Neil Horman
0 siblings, 1 reply; 20+ messages in thread
From: Pattan, Reshma @ 2014-10-09 10:27 UTC (permalink / raw)
To: Neil Horman; +Cc: dev
> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Wednesday, October 8, 2014 8:16 PM
> To: Pattan, Reshma
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
>
> On Wed, Oct 08, 2014 at 02:11:34PM +0000, Pattan, Reshma wrote:
> >
> >
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > Sent: Tuesday, October 7, 2014 12:22 PM
> > > To: Pattan, Reshma
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder
> > > library
> > >
> > > On Tue, Oct 07, 2014 at 09:33:06AM +0000, Pattan, Reshma wrote:
> > > > Hi All,
> > > >
> > > > I am planning to implement packet reorder library. Details are as
> > > > below,
> > > please go through them and provide the comments.
> > > >
> > > > Requirement:
> > > > To reorder out of ordered packets that are received
> > > > from different
> > > cores.
> > > >
> > > > Usage:
> > > > To be used along with distributor library. Next version of
> > > > distributor are
> > > planned to distribute incoming packets to all worker cores
> > > irrespective of the flow type.
> > > > In this case to ensure in order delivery of the packets at output
> > > > side reorder
> > > library will used by the tx end.
> > > >
> > > > Assumption:
> > > > All input packets will be marked with sequence number in seqn
> > > > field of mbuf,
> > > this will be the reference for reordering at the tx end.
> > > > Sequence number will be of type uint32_t. New sequence number
> > > > field seqn
> > > will be added to mbuf structure.
> > > >
> > > > Design:
> > > > a)There will be reorder buffer(circular buffer) structure
> > > > maintained in reorder
> > > library to store reordered packets and other details of buffer like
> > > head to drain the packet from, min sequence number and other details.
> > > > b)Library will provide insert and drain functions
> > > > to reorder and fetch
> > > out the reordered packets respectively.
> > > > c)Users of library should pass the packets to insert functions for reordering.
> > > >
> > > > Insertion logic:
> > > > Sequence number of current packet will be used to calculate offset
> > > > in reorder
> > > buffer and write packet to the location in the reorder buffer
> > > corresponding to offset.
> > > > Offset is calculated as difference of
> > > > current packet sequence
> > > number and sequence number associated with reorder buffer.
> > > >
> > > > During sequence number wrapping or wrapping over of reorder buffer
> > > > size,
> > > before inserting the new packet we should move offset number of
> > > packets to other buffer called overflow buffer and advance the head
> > > of reorder buffer by "offset-reorder buffer size" and insert the new packet.
> > > >
> > > > Insert function:
> > > > int rte_reorder_insert(struct rte_reorder_buffer *buffer, struct
> > > > rte_mbuf *mbuf);
> > > > Note: Other insert function is also under plan to insert burst of packets.
> > > >
> > > > Reorder buffer:
> > > > struct rte_reorder_buffer {
> > > > unsigned int size; /* The size (number of entries) of the buffer. */
> > > > unsigned int mask; /* Mask (size - 1) of the buffer */
> > > > unsigned int head; /* Current head of buffer */
> > > > uint32_t min_seqn; /* latest sequence number associated with
> buffer
> > > */
> > > > struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /*
> > > > buffer to hold reordered mbufs */ };
> > > >
> > > > d)Users can fetch out the reordered packets by drain function
> > > > provided by
> > > library. Users must pass the mbuf array , drain function should fill
> > > passed mbuff array with the reordered buffer packets.
> > > > During drain operation, overflow buffer packets will be fetched
> > > > out first and
> > > then reorder buffer.
> > > >
> > > > Drain function:
> > > > int rte_reorder_drain(struct rte_reorder_buffer
> > > > *buffer, struct rte_mbuf **mbufs)
> > > >
> > > > Thanks,
> > > > Reshma
> > > >
> > >
> > > This seems reasonable, but why not integrate it with the distributor
> > > library rather than create a separate library for it? It seems as
> > > though the distributor library is a pre-requisite for this libraries
> > > use anyway, as otherwise there will not be anything to reorder Neil
> > >
> >
> > Hi Neil,
> >
> > Reorder library should be standalone , as there are many ways that can
> > cause out of ordering of packets, I just mentioned future packet distributor
> enhancements as one of the example for out of ordering.
> > Other ways like, users can directly distribute the packets to different cores via
> rings and that causes packet out of ordering as well.
> > So, keeping reorder library standalone would be good to work with all packet
> distribution ways.
> >
>
>
> Hmm, ok, that seems reasonable.
>
> Just out of curiosity, where do you intend to inject the packet sequence number
> for this library?
>
Sequence number marking can be done in either of these places 1)PMD rx side 2) packet distributor process or 3) in application itself.
Thanks,
Reshma
> Neil
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-09 10:27 ` Pattan, Reshma
@ 2014-10-09 11:36 ` Neil Horman
2014-10-09 14:36 ` Pattan, Reshma
0 siblings, 1 reply; 20+ messages in thread
From: Neil Horman @ 2014-10-09 11:36 UTC (permalink / raw)
To: Pattan, Reshma; +Cc: dev
On Thu, Oct 09, 2014 at 10:27:55AM +0000, Pattan, Reshma wrote:
>
>
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Wednesday, October 8, 2014 8:16 PM
> > To: Pattan, Reshma
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
> >
> > On Wed, Oct 08, 2014 at 02:11:34PM +0000, Pattan, Reshma wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > Sent: Tuesday, October 7, 2014 12:22 PM
> > > > To: Pattan, Reshma
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder
> > > > library
> > > >
> > > > On Tue, Oct 07, 2014 at 09:33:06AM +0000, Pattan, Reshma wrote:
> > > > > Hi All,
> > > > >
> > > > > I am planning to implement packet reorder library. Details are as
> > > > > below,
> > > > please go through them and provide the comments.
> > > > >
> > > > > Requirement:
> > > > > To reorder out of ordered packets that are received
> > > > > from different
> > > > cores.
> > > > >
> > > > > Usage:
> > > > > To be used along with distributor library. Next version of
> > > > > distributor are
> > > > planned to distribute incoming packets to all worker cores
> > > > irrespective of the flow type.
> > > > > In this case to ensure in order delivery of the packets at output
> > > > > side reorder
> > > > library will used by the tx end.
> > > > >
> > > > > Assumption:
> > > > > All input packets will be marked with sequence number in seqn
> > > > > field of mbuf,
> > > > this will be the reference for reordering at the tx end.
> > > > > Sequence number will be of type uint32_t. New sequence number
> > > > > field seqn
> > > > will be added to mbuf structure.
> > > > >
> > > > > Design:
> > > > > a)There will be reorder buffer(circular buffer) structure
> > > > > maintained in reorder
> > > > library to store reordered packets and other details of buffer like
> > > > head to drain the packet from, min sequence number and other details.
> > > > > b)Library will provide insert and drain functions
> > > > > to reorder and fetch
> > > > out the reordered packets respectively.
> > > > > c)Users of library should pass the packets to insert functions for reordering.
> > > > >
> > > > > Insertion logic:
> > > > > Sequence number of current packet will be used to calculate offset
> > > > > in reorder
> > > > buffer and write packet to the location in the reorder buffer
> > > > corresponding to offset.
> > > > > Offset is calculated as difference of
> > > > > current packet sequence
> > > > number and sequence number associated with reorder buffer.
> > > > >
> > > > > During sequence number wrapping or wrapping over of reorder buffer
> > > > > size,
> > > > before inserting the new packet we should move offset number of
> > > > packets to other buffer called overflow buffer and advance the head
> > > > of reorder buffer by "offset-reorder buffer size" and insert the new packet.
> > > > >
> > > > > Insert function:
> > > > > int rte_reorder_insert(struct rte_reorder_buffer *buffer, struct
> > > > > rte_mbuf *mbuf);
> > > > > Note: Other insert function is also under plan to insert burst of packets.
> > > > >
> > > > > Reorder buffer:
> > > > > struct rte_reorder_buffer {
> > > > > unsigned int size; /* The size (number of entries) of the buffer. */
> > > > > unsigned int mask; /* Mask (size - 1) of the buffer */
> > > > > unsigned int head; /* Current head of buffer */
> > > > > uint32_t min_seqn; /* latest sequence number associated with
> > buffer
> > > > */
> > > > > struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /*
> > > > > buffer to hold reordered mbufs */ };
> > > > >
> > > > > d)Users can fetch out the reordered packets by drain function
> > > > > provided by
> > > > library. Users must pass the mbuf array , drain function should fill
> > > > passed mbuff array with the reordered buffer packets.
> > > > > During drain operation, overflow buffer packets will be fetched
> > > > > out first and
> > > > then reorder buffer.
> > > > >
> > > > > Drain function:
> > > > > int rte_reorder_drain(struct rte_reorder_buffer
> > > > > *buffer, struct rte_mbuf **mbufs)
> > > > >
> > > > > Thanks,
> > > > > Reshma
> > > > >
> > > >
> > > > This seems reasonable, but why not integrate it with the distributor
> > > > library rather than create a separate library for it? It seems as
> > > > though the distributor library is a pre-requisite for this libraries
> > > > use anyway, as otherwise there will not be anything to reorder Neil
> > > >
> > >
> > > Hi Neil,
> > >
> > > Reorder library should be standalone , as there are many ways that can
> > > cause out of ordering of packets, I just mentioned future packet distributor
> > enhancements as one of the example for out of ordering.
> > > Other ways like, users can directly distribute the packets to different cores via
> > rings and that causes packet out of ordering as well.
> > > So, keeping reorder library standalone would be good to work with all packet
> > distribution ways.
> > >
> >
> >
> > Hmm, ok, that seems reasonable.
> >
> > Just out of curiosity, where do you intend to inject the packet sequence number
> > for this library?
> >
> Sequence number marking can be done in either of these places 1)PMD rx side 2) packet distributor process or 3) in application itself.
>
Thanks. So, that was the part of this that was getting to me. For this to work,
you're going to have to mark frames externally to the re-order library (i.e.
You're proposed library needs to rely on the application or the pmd to do proper
marking of frames to reorder properly). Is it tennable in your mind to require
that?
Neil
> Thanks,
> Reshma
>
> > Neil
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-09 11:36 ` Neil Horman
@ 2014-10-09 14:36 ` Pattan, Reshma
2014-10-09 16:09 ` Neil Horman
0 siblings, 1 reply; 20+ messages in thread
From: Pattan, Reshma @ 2014-10-09 14:36 UTC (permalink / raw)
To: Neil Horman; +Cc: dev
> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Thursday, October 9, 2014 12:37 PM
> To: Pattan, Reshma
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
>
> On Thu, Oct 09, 2014 at 10:27:55AM +0000, Pattan, Reshma wrote:
> >
> >
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > Sent: Wednesday, October 8, 2014 8:16 PM
> > > To: Pattan, Reshma
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder
> > > library
> > >
> > > On Wed, Oct 08, 2014 at 02:11:34PM +0000, Pattan, Reshma wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > > Sent: Tuesday, October 7, 2014 12:22 PM
> > > > > To: Pattan, Reshma
> > > > > Cc: dev@dpdk.org
> > > > > Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder
> > > > > library
> > > > >
> > > > > On Tue, Oct 07, 2014 at 09:33:06AM +0000, Pattan, Reshma wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > I am planning to implement packet reorder library. Details
> > > > > > are as below,
> > > > > please go through them and provide the comments.
> > > > > >
> > > > > > Requirement:
> > > > > > To reorder out of ordered packets that are
> > > > > > received from different
> > > > > cores.
> > > > > >
> > > > > > Usage:
> > > > > > To be used along with distributor library. Next version of
> > > > > > distributor are
> > > > > planned to distribute incoming packets to all worker cores
> > > > > irrespective of the flow type.
> > > > > > In this case to ensure in order delivery of the packets at
> > > > > > output side reorder
> > > > > library will used by the tx end.
> > > > > >
> > > > > > Assumption:
> > > > > > All input packets will be marked with sequence number in seqn
> > > > > > field of mbuf,
> > > > > this will be the reference for reordering at the tx end.
> > > > > > Sequence number will be of type uint32_t. New sequence number
> > > > > > field seqn
> > > > > will be added to mbuf structure.
> > > > > >
> > > > > > Design:
> > > > > > a)There will be reorder buffer(circular buffer) structure
> > > > > > maintained in reorder
> > > > > library to store reordered packets and other details of buffer
> > > > > like head to drain the packet from, min sequence number and other
> details.
> > > > > > b)Library will provide insert and drain
> > > > > > functions to reorder and fetch
> > > > > out the reordered packets respectively.
> > > > > > c)Users of library should pass the packets to insert functions for
> reordering.
> > > > > >
> > > > > > Insertion logic:
> > > > > > Sequence number of current packet will be used to calculate
> > > > > > offset in reorder
> > > > > buffer and write packet to the location in the reorder buffer
> > > > > corresponding to offset.
> > > > > > Offset is calculated as
> > > > > > difference of current packet sequence
> > > > > number and sequence number associated with reorder buffer.
> > > > > >
> > > > > > During sequence number wrapping or wrapping over of reorder
> > > > > > buffer size,
> > > > > before inserting the new packet we should move offset number of
> > > > > packets to other buffer called overflow buffer and advance the
> > > > > head of reorder buffer by "offset-reorder buffer size" and insert the new
> packet.
> > > > > >
> > > > > > Insert function:
> > > > > > int rte_reorder_insert(struct rte_reorder_buffer *buffer,
> > > > > > struct rte_mbuf *mbuf);
> > > > > > Note: Other insert function is also under plan to insert burst of packets.
> > > > > >
> > > > > > Reorder buffer:
> > > > > > struct rte_reorder_buffer {
> > > > > > unsigned int size; /* The size (number of entries) of the buffer.
> */
> > > > > > unsigned int mask; /* Mask (size - 1) of the buffer */
> > > > > > unsigned int head; /* Current head of buffer */
> > > > > > uint32_t min_seqn; /* latest sequence number associated with
> > > buffer
> > > > > */
> > > > > > struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /*
> > > > > > buffer to hold reordered mbufs */ };
> > > > > >
> > > > > > d)Users can fetch out the reordered packets by drain function
> > > > > > provided by
> > > > > library. Users must pass the mbuf array , drain function should
> > > > > fill passed mbuff array with the reordered buffer packets.
> > > > > > During drain operation, overflow buffer packets will be
> > > > > > fetched out first and
> > > > > then reorder buffer.
> > > > > >
> > > > > > Drain function:
> > > > > > int rte_reorder_drain(struct rte_reorder_buffer
> > > > > > *buffer, struct rte_mbuf **mbufs)
> > > > > >
> > > > > > Thanks,
> > > > > > Reshma
> > > > > >
> > > > >
> > > > > This seems reasonable, but why not integrate it with the
> > > > > distributor library rather than create a separate library for
> > > > > it? It seems as though the distributor library is a
> > > > > pre-requisite for this libraries use anyway, as otherwise there
> > > > > will not be anything to reorder Neil
> > > > >
> > > >
> > > > Hi Neil,
> > > >
> > > > Reorder library should be standalone , as there are many ways that
> > > > can cause out of ordering of packets, I just mentioned future
> > > > packet distributor
> > > enhancements as one of the example for out of ordering.
> > > > Other ways like, users can directly distribute the packets to
> > > > different cores via
> > > rings and that causes packet out of ordering as well.
> > > > So, keeping reorder library standalone would be good to work with
> > > > all packet
> > > distribution ways.
> > > >
> > >
> > >
> > > Hmm, ok, that seems reasonable.
> > >
> > > Just out of curiosity, where do you intend to inject the packet
> > > sequence number for this library?
> > >
> > Sequence number marking can be done in either of these places 1)PMD rx
> side 2) packet distributor process or 3) in application itself.
> >
> Thanks. So, that was the part of this that was getting to me. For this to work,
> you're going to have to mark frames externally to the re-order library (i.e.
> You're proposed library needs to rely on the application or the pmd to do proper
> marking of frames to reorder properly). Is it tennable in your mind to require
> that?
>
For the users of distributor sequence number marking will be automatic.
We are also considering ways to make sequence number marking automatic in PMD RX but at cost of performance hit. This approach will be optional but not mandatory, i.e. user configurable.
Thanks,
Reshma
> Neil
>
> > Thanks,
> > Reshma
> >
> > > Neil
> >
> >
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-09 14:36 ` Pattan, Reshma
@ 2014-10-09 16:09 ` Neil Horman
2014-10-09 17:21 ` Matthew Hall
0 siblings, 1 reply; 20+ messages in thread
From: Neil Horman @ 2014-10-09 16:09 UTC (permalink / raw)
To: Pattan, Reshma; +Cc: dev
On Thu, Oct 09, 2014 at 02:36:06PM +0000, Pattan, Reshma wrote:
>
>
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Thursday, October 9, 2014 12:37 PM
> > To: Pattan, Reshma
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
> >
> > On Thu, Oct 09, 2014 at 10:27:55AM +0000, Pattan, Reshma wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > Sent: Wednesday, October 8, 2014 8:16 PM
> > > > To: Pattan, Reshma
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder
> > > > library
> > > >
> > > > On Wed, Oct 08, 2014 at 02:11:34PM +0000, Pattan, Reshma wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > > > Sent: Tuesday, October 7, 2014 12:22 PM
> > > > > > To: Pattan, Reshma
> > > > > > Cc: dev@dpdk.org
> > > > > > Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder
> > > > > > library
> > > > > >
> > > > > > On Tue, Oct 07, 2014 at 09:33:06AM +0000, Pattan, Reshma wrote:
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I am planning to implement packet reorder library. Details
> > > > > > > are as below,
> > > > > > please go through them and provide the comments.
> > > > > > >
> > > > > > > Requirement:
> > > > > > > To reorder out of ordered packets that are
> > > > > > > received from different
> > > > > > cores.
> > > > > > >
> > > > > > > Usage:
> > > > > > > To be used along with distributor library. Next version of
> > > > > > > distributor are
> > > > > > planned to distribute incoming packets to all worker cores
> > > > > > irrespective of the flow type.
> > > > > > > In this case to ensure in order delivery of the packets at
> > > > > > > output side reorder
> > > > > > library will used by the tx end.
> > > > > > >
> > > > > > > Assumption:
> > > > > > > All input packets will be marked with sequence number in seqn
> > > > > > > field of mbuf,
> > > > > > this will be the reference for reordering at the tx end.
> > > > > > > Sequence number will be of type uint32_t. New sequence number
> > > > > > > field seqn
> > > > > > will be added to mbuf structure.
> > > > > > >
> > > > > > > Design:
> > > > > > > a)There will be reorder buffer(circular buffer) structure
> > > > > > > maintained in reorder
> > > > > > library to store reordered packets and other details of buffer
> > > > > > like head to drain the packet from, min sequence number and other
> > details.
> > > > > > > b)Library will provide insert and drain
> > > > > > > functions to reorder and fetch
> > > > > > out the reordered packets respectively.
> > > > > > > c)Users of library should pass the packets to insert functions for
> > reordering.
> > > > > > >
> > > > > > > Insertion logic:
> > > > > > > Sequence number of current packet will be used to calculate
> > > > > > > offset in reorder
> > > > > > buffer and write packet to the location in the reorder buffer
> > > > > > corresponding to offset.
> > > > > > > Offset is calculated as
> > > > > > > difference of current packet sequence
> > > > > > number and sequence number associated with reorder buffer.
> > > > > > >
> > > > > > > During sequence number wrapping or wrapping over of reorder
> > > > > > > buffer size,
> > > > > > before inserting the new packet we should move offset number of
> > > > > > packets to other buffer called overflow buffer and advance the
> > > > > > head of reorder buffer by "offset-reorder buffer size" and insert the new
> > packet.
> > > > > > >
> > > > > > > Insert function:
> > > > > > > int rte_reorder_insert(struct rte_reorder_buffer *buffer,
> > > > > > > struct rte_mbuf *mbuf);
> > > > > > > Note: Other insert function is also under plan to insert burst of packets.
> > > > > > >
> > > > > > > Reorder buffer:
> > > > > > > struct rte_reorder_buffer {
> > > > > > > unsigned int size; /* The size (number of entries) of the buffer.
> > */
> > > > > > > unsigned int mask; /* Mask (size - 1) of the buffer */
> > > > > > > unsigned int head; /* Current head of buffer */
> > > > > > > uint32_t min_seqn; /* latest sequence number associated with
> > > > buffer
> > > > > > */
> > > > > > > struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /*
> > > > > > > buffer to hold reordered mbufs */ };
> > > > > > >
> > > > > > > d)Users can fetch out the reordered packets by drain function
> > > > > > > provided by
> > > > > > library. Users must pass the mbuf array , drain function should
> > > > > > fill passed mbuff array with the reordered buffer packets.
> > > > > > > During drain operation, overflow buffer packets will be
> > > > > > > fetched out first and
> > > > > > then reorder buffer.
> > > > > > >
> > > > > > > Drain function:
> > > > > > > int rte_reorder_drain(struct rte_reorder_buffer
> > > > > > > *buffer, struct rte_mbuf **mbufs)
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Reshma
> > > > > > >
> > > > > >
> > > > > > This seems reasonable, but why not integrate it with the
> > > > > > distributor library rather than create a separate library for
> > > > > > it? It seems as though the distributor library is a
> > > > > > pre-requisite for this libraries use anyway, as otherwise there
> > > > > > will not be anything to reorder Neil
> > > > > >
> > > > >
> > > > > Hi Neil,
> > > > >
> > > > > Reorder library should be standalone , as there are many ways that
> > > > > can cause out of ordering of packets, I just mentioned future
> > > > > packet distributor
> > > > enhancements as one of the example for out of ordering.
> > > > > Other ways like, users can directly distribute the packets to
> > > > > different cores via
> > > > rings and that causes packet out of ordering as well.
> > > > > So, keeping reorder library standalone would be good to work with
> > > > > all packet
> > > > distribution ways.
> > > > >
> > > >
> > > >
> > > > Hmm, ok, that seems reasonable.
> > > >
> > > > Just out of curiosity, where do you intend to inject the packet
> > > > sequence number for this library?
> > > >
> > > Sequence number marking can be done in either of these places 1)PMD rx
> > side 2) packet distributor process or 3) in application itself.
> > >
> > Thanks. So, that was the part of this that was getting to me. For this to work,
> > you're going to have to mark frames externally to the re-order library (i.e.
> > You're proposed library needs to rely on the application or the pmd to do proper
> > marking of frames to reorder properly). Is it tennable in your mind to require
> > that?
> >
>
> For the users of distributor sequence number marking will be automatic.
I'm not suggesting that it won't be automatic, I'm suggesting that it will
potentially be haphazzard.
>From what you've said above, sequence assignment needs to occur prior to any
order breaking event. That means you either need to do it in individual PMD's
on RX, or in the rte_eth library if you want to make it common. On the TX side
you need to ensure that the application applies sequence numbers uniquely and in
a predictable fashion in order for the library to work. that seems like a large
external requirement for this functionality to work, and I was wondering how you
were going to address that.
> We are also considering ways to make sequence number marking automatic in PMD RX but at cost of performance hit. This approach will be optional but not mandatory, i.e. user configurable.
>
> Thanks,
> Reshma
>
> > Neil
> >
> > > Thanks,
> > > Reshma
> > >
> > > > Neil
> > >
> > >
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-09 9:14 ` Bruce Richardson
@ 2014-10-09 17:11 ` Matthew Hall
2014-10-10 10:59 ` Bruce Richardson
0 siblings, 1 reply; 20+ messages in thread
From: Matthew Hall @ 2014-10-09 17:11 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev
On Thu, Oct 09, 2014 at 10:14:21AM +0100, Bruce Richardson wrote:
> Hi Matthew,
>
> What you are doing will indeed work, and it's the way the vast majority of
> the sample apps are written. However, this will not always work for everyone
> else, sadly.
>
> First off, with RSS, there are a number of limitations. On the 1G and 10G
> NICs RSS works only with IP traffic, and won't work in cases with other
> protocols or where IP is encapsulated in anything other than a single VLAN.
> Those cases need software load distribution. As well as this, you have very
> little control over where flows get put, as the separation into queues
> (which go to cores), is only done on the low seven bits. For applications
> which work with a small number of flows, e.g. where multiple flows are
> contained inside a single tunnel, you get a get a large flow imbalance,
> where you get far more traffic coming to one queue/core than to another.
> Again in this instance, software load balancing is needed.
>
> Secondly, then, based off that, it is entirely possible when doing software
> load balancing to strictly process packets for a flow in order - and indeed
> this is what the existing packet distributor does. However, for certain
> types of flow where processing of packets for that flow can be done in
> parallel, forcing things to be done serially can slow things down. As well
> as this, there can sometimes be requirements for the load balancing between
> cores to be done as fairly as possible so that it is guaranteed that all
> cores have approx the same load, irrespective of the number of input flows.
> In these cases, having the option to blindly distribute traffic to cores and
> then reorder packets on TX is the best way to ensure even load distribution.
> It's not going to be for everyone, but it's good to have the option - and
> there are a number of people doing things this way already.
>
> Lastly, there is also the assumption being made that all flows are
> independent, which again may not always be the case. If you need ordering
> across flows and to share load between cores then reordering on transmission
> is the only way to do things.
>
> Hope this helps,
>
> Regards,
> /Bruce
Bruce,
This explanation is of excellent quality.
It would be nice if it could be made into a whitepaper about the different
L2-L7 acceleration technologies available in the Intel NICs, popular VNICs
(virtio-net and vmxnet3), Intel CPUs, and DPDK code, all working together. Or
incorporated into such a document if it already exists.
Without things like this it's very hard to understand when and how to enable
the different accelerations can be used together, when they'll work, and when
they won't work.
For example, I didn't know RSS only worked on IP... I was assuming it would do
a consistent-hash of MAC's for non-IP packets at least... also, when it
doesn't know what to do, does it send them to the default queue, or a random
FIFO RX queue picks it up or what?
Matthew.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-09 16:09 ` Neil Horman
@ 2014-10-09 17:21 ` Matthew Hall
2014-10-09 17:55 ` Neil Horman
0 siblings, 1 reply; 20+ messages in thread
From: Matthew Hall @ 2014-10-09 17:21 UTC (permalink / raw)
To: Neil Horman; +Cc: dev
On Thu, Oct 09, 2014 at 12:09:42PM -0400, Neil Horman wrote:
> From what you've said above, sequence assignment needs to occur prior to any
> order breaking event. That means you either need to do it in individual PMD's
> on RX, or in the rte_eth library if you want to make it common. On the TX side
> you need to ensure that the application applies sequence numbers uniquely and in
> a predictable fashion in order for the library to work. that seems like a large
> external requirement for this functionality to work, and I was wondering how you
> were going to address that.
To me it seems like the actualy mechanism of applying the sequence number to
the packet belongs in librte_reorder or librte_mbuf.
Then, a config define or runtime setting should be used to determine if the
operation actually does something or becomes a no-op. Likely a define is
needed so the compiler can optimize the code out when it's not used, like for
RSS-based cases like mine.
If the type of sequencing needed could be applied during allocation from the
pool, that would be nice because it would be transparent, as long as the PMD's
retrieved their new mbuf's the right way.
If the sequencing to be applied based on properties from L4-L7, then it seems
like the distributor would have to be sure to do call the right functions
itself, as would other customers, because there'd be no way to magically get
them right before you've looked at L4-L7 data first.
Matthew.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-09 17:21 ` Matthew Hall
@ 2014-10-09 17:55 ` Neil Horman
0 siblings, 0 replies; 20+ messages in thread
From: Neil Horman @ 2014-10-09 17:55 UTC (permalink / raw)
To: Matthew Hall; +Cc: dev
On Thu, Oct 09, 2014 at 10:21:35AM -0700, Matthew Hall wrote:
> On Thu, Oct 09, 2014 at 12:09:42PM -0400, Neil Horman wrote:
> > From what you've said above, sequence assignment needs to occur prior to any
> > order breaking event. That means you either need to do it in individual PMD's
> > on RX, or in the rte_eth library if you want to make it common. On the TX side
> > you need to ensure that the application applies sequence numbers uniquely and in
> > a predictable fashion in order for the library to work. that seems like a large
> > external requirement for this functionality to work, and I was wondering how you
> > were going to address that.
>
> To me it seems like the actualy mechanism of applying the sequence number to
> the packet belongs in librte_reorder or librte_mbuf.
>
Perhaps, but this is why I was asking about wanting to merge this functionality
into another library, because segmentation and reordering isn't really an
isolated function, its an operation that needs hooks in a multiple points in the
stack, and as such may be better suited to integration into another library
(distributor as I mentioned before, or mbuf as you note here). Otherwise you're
left with a library that works only if some external mechanism is implemented
properly for each pmd/application that wants to support it.
> Then, a config define or runtime setting should be used to determine if the
> operation actually does something or becomes a no-op. Likely a define is
> needed so the compiler can optimize the code out when it's not used, like for
> RSS-based cases like mine.
>
I'm sure that would be the case. Thats not the hard part though. The hard part
is figuring out the right policy and way to assign sequence numbers when their
used, in such a way that minimal external code change is needed to support it,
and in such a way that it "always works"
> If the type of sequencing needed could be applied during allocation from the
> pool, that would be nice because it would be transparent, as long as the PMD's
> retrieved their new mbuf's the right way.
>
Sure, that would be a fine way to do it, but the mbuf pool currently I don't
think differentiates between flows, does it? And as such we either have to:
a) teach it about flows and let PMD's specify a flow from which to draw a
sequence number
b) Allow all nics to share a sequence pool (i.e. serialize sequence number
access and take the performance hit).
c) Segment the sequence name space in such a way that each rx/tx queue can
distribute sequence numbers independently
Not saying any of this is bad, just a question I'm curious as to the answer to.
> If the sequencing to be applied based on properties from L4-L7, then it seems
> like the distributor would have to be sure to do call the right functions
> itself, as would other customers, because there'd be no way to magically get
> them right before you've looked at L4-L7 data first.
>
True, though from a tx standpoint the order the data is marshalled gives you the
opportunity to sequence the data without explicit L4-L7 knoweldge
Neil
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-07 9:33 [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library Pattan, Reshma
2014-10-07 11:21 ` Neil Horman
2014-10-08 22:41 ` Matthew Hall
@ 2014-10-09 19:01 ` Jay Rolette
2014-10-17 9:44 ` Pattan, Reshma
2 siblings, 1 reply; 20+ messages in thread
From: Jay Rolette @ 2014-10-09 19:01 UTC (permalink / raw)
To: Pattan, Reshma; +Cc: dev
Hi Reshma,
A few comments and questions about your design...
1) How do you envision the reorder library to be used? Based on the
description, it seems like the expectation is that packet order would be
maintained at either the interface/port level or maybe at the RX queue
level. Is that right or am I reading too much between the lines?
For my purposes (and for network security products I've developed in the
past), I mostly don't care about box or port-level order. The most
important thing is to maintain packet order within a flow. Relative order
from packets in different flows doesn't matter. If there is a way I can
process packets in parallel and transmit out-of-order transmission *within
the flow*, that's very useful. Architecturally, it helps avoid hot-spotting
in my packet processing pipeline and wasting cycles when load-balancing
isn't perfect (and it never is).
2) If the reorder library is "flow aware", then give me flexibility on
deciding what a flow is. Let me define pseudo-flows even if the protocol
itself isn't connection oriented (ie., frequently useful to treat UDP
5-tuples as a flow). I may want to include tunnels/VLANs/etc. as part of my
"flow" definition. I may need to include the physical port as part of the
flow definition.
Ideally, the library includes the common cases and gives me the option to
register a callback function for doing whatever sort of "flows" I require
for my app.
3) Is there a way to apply the reorder library to some packets and not
others? I might want to use for TCP and UDP, but not care about order for
other IP traffic (for example).
4) How are you dealing with internal congestion? If I drop a packet
somewhere in my processing pipeline, how does the TX side of the reorder
queue/buffer deal with the missing sequence number? Is there some sort of
timeout mechanism so that it will only wait for X microseconds for a
missing sequence number?
Need the ability to bound how long packets are held up in the reorder
engine before they are released.
Assuming you address this, the reorder engine will also need to deal with
slow packets that show up after "later" packets were transmitted.
Regards,
Jay
On Tue, Oct 7, 2014 at 4:33 AM, Pattan, Reshma <reshma.pattan@intel.com>
wrote:
> Hi All,
>
> I am planning to implement packet reorder library. Details are as below,
> please go through them and provide the comments.
>
> Requirement:
> To reorder out of ordered packets that are received from
> different cores.
>
> Usage:
> To be used along with distributor library. Next version of distributor are
> planned to distribute incoming packets to all worker cores irrespective of
> the flow type.
> In this case to ensure in order delivery of the packets at output side
> reorder library will used by the tx end.
>
> Assumption:
> All input packets will be marked with sequence number in seqn field of
> mbuf, this will be the reference for reordering at the tx end.
> Sequence number will be of type uint32_t. New sequence number field seqn
> will be added to mbuf structure.
>
> Design:
> a)There will be reorder buffer(circular buffer) structure maintained in
> reorder library to store reordered packets and other details of buffer like
> head to drain the packet from, min sequence number and other details.
> b)Library will provide insert and drain functions to
> reorder and fetch out the reordered packets respectively.
> c)Users of library should pass the packets to insert functions for
> reordering.
>
> Insertion logic:
> Sequence number of current packet will be used to calculate offset in
> reorder buffer and write packet to the location in the reorder buffer
> corresponding to offset.
> Offset is calculated as difference of current
> packet sequence number and sequence number associated with reorder buffer.
>
> During sequence number wrapping or wrapping over of reorder buffer size,
> before inserting the new packet we should move offset number of packets to
> other buffer called overflow buffer and advance the head of reorder buffer
> by "offset-reorder buffer size" and insert the new packet.
>
> Insert function:
> int rte_reorder_insert(struct rte_reorder_buffer *buffer, struct rte_mbuf
> *mbuf);
> Note: Other insert function is also under plan to insert burst of packets.
>
> Reorder buffer:
> struct rte_reorder_buffer {
> unsigned int size; /* The size (number of entries) of the
> buffer. */
> unsigned int mask; /* Mask (size - 1) of the buffer */
> unsigned int head; /* Current head of buffer */
> uint32_t min_seqn; /* latest sequence number associated with
> buffer */
> struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /* buffer to
> hold reordered mbufs */
> };
>
> d)Users can fetch out the reordered packets by drain function provided by
> library. Users must pass the mbuf array , drain function should fill
> passed mbuff array with the reordered buffer packets.
> During drain operation, overflow buffer packets will be fetched out first
> and then reorder buffer.
>
> Drain function:
> int rte_reorder_drain(struct rte_reorder_buffer *buffer,
> struct rte_mbuf **mbufs)
>
> Thanks,
> Reshma
>
> --------------------------------------------------------------
> Intel Shannon Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
> Registered Number: 308263
> Business address: Dromore House, East Park, Shannon, Co. Clare
>
> This e-mail and any attachments may contain confidential material for the
> sole use of the intended recipient(s). Any review or distribution by others
> is strictly prohibited. If you are not the intended recipient, please
> contact the sender and delete all copies.
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-09 17:11 ` Matthew Hall
@ 2014-10-10 10:59 ` Bruce Richardson
0 siblings, 0 replies; 20+ messages in thread
From: Bruce Richardson @ 2014-10-10 10:59 UTC (permalink / raw)
To: Matthew Hall; +Cc: dev
On Thu, Oct 09, 2014 at 10:11:35AM -0700, Matthew Hall wrote:
> On Thu, Oct 09, 2014 at 10:14:21AM +0100, Bruce Richardson wrote:
> > Hi Matthew,
> >
> > What you are doing will indeed work, and it's the way the vast majority of
> > the sample apps are written. However, this will not always work for everyone
> > else, sadly.
> >
> > First off, with RSS, there are a number of limitations. On the 1G and 10G
> > NICs RSS works only with IP traffic, and won't work in cases with other
> > protocols or where IP is encapsulated in anything other than a single VLAN.
> > Those cases need software load distribution. As well as this, you have very
> > little control over where flows get put, as the separation into queues
> > (which go to cores), is only done on the low seven bits. For applications
> > which work with a small number of flows, e.g. where multiple flows are
> > contained inside a single tunnel, you get a get a large flow imbalance,
> > where you get far more traffic coming to one queue/core than to another.
> > Again in this instance, software load balancing is needed.
> >
> > Secondly, then, based off that, it is entirely possible when doing software
> > load balancing to strictly process packets for a flow in order - and indeed
> > this is what the existing packet distributor does. However, for certain
> > types of flow where processing of packets for that flow can be done in
> > parallel, forcing things to be done serially can slow things down. As well
> > as this, there can sometimes be requirements for the load balancing between
> > cores to be done as fairly as possible so that it is guaranteed that all
> > cores have approx the same load, irrespective of the number of input flows.
> > In these cases, having the option to blindly distribute traffic to cores and
> > then reorder packets on TX is the best way to ensure even load distribution.
> > It's not going to be for everyone, but it's good to have the option - and
> > there are a number of people doing things this way already.
> >
> > Lastly, there is also the assumption being made that all flows are
> > independent, which again may not always be the case. If you need ordering
> > across flows and to share load between cores then reordering on transmission
> > is the only way to do things.
> >
> > Hope this helps,
> >
> > Regards,
> > /Bruce
>
> Bruce,
>
> This explanation is of excellent quality.
>
> It would be nice if it could be made into a whitepaper about the different
> L2-L7 acceleration technologies available in the Intel NICs, popular VNICs
> (virtio-net and vmxnet3), Intel CPUs, and DPDK code, all working together. Or
> incorporated into such a document if it already exists.
>
> Without things like this it's very hard to understand when and how to enable
> the different accelerations can be used together, when they'll work, and when
> they won't work.
>
> For example, I didn't know RSS only worked on IP... I was assuming it would do
> a consistent-hash of MAC's for non-IP packets at least... also, when it
> doesn't know what to do, does it send them to the default queue, or a random
> FIFO RX queue picks it up or what?
>
When RSS gets a non-IP packet, or a packet it can't hash, that packet will
be put into queue 0. This leads to a number of little tricks we can use if
we have a mix of IP/non-IP traffic.
1. To simply separate out IP traffic from non-IP traffic, we just turn on
RSS and update the reta table to have all entries set to queue 1. This means
that all IP traffic goes to queue 1, and all other traffic to queue 0.
2. If you want to separate IPv6 from IPv4, you can do the exact same thing
as in point 1, except only turn on RSS for one of the protocols. If you only
turn on RSS for IPv4, then IPv6 traffic should be treated as non-IP and go
to queue 0.
3. If you have IP and non-IP traffic going to a set of ports and are using
multiple RSS queues to split that traffic across multiple cores, such that
each core also reads from each port [e.g. 4 ports, and 4 cores, where each
core reads one RSS queue on each port], you can "rotate" the RSS table
between ports so that you also load-balance the non-IP traffic coming in.
Taking the referenced example, instead of having core 0 read queue 0 on each
port, you have the values that hash to queue 0 on port 0 get directed to
queue 1 on port 1, queue 2 on port 2, etc. Then core 0 [and every other
core] reads a different queue number on each port - while still getting the
same flows. Furthermore, since the non-IP traffic is unaffected and always
goes to queue 0, the non-IP traffic to each port gets handled by a different
core, rather than all non-IP traffic going to core 0 as would be the case in
the default setup. [Yes, this would be the case too if you took the simple
option of just having one core per port, but doing things this way also
gives you load balancing if one port is busier than the others.]
Finally, I'd just note that RSS is documented in section 7.1.2.8 of the
datasheet for the Intel 82599 10 GbE Controller, and to read up there for
any more information.
Regards,
/Bruce
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-09 19:01 ` Jay Rolette
@ 2014-10-17 9:44 ` Pattan, Reshma
2014-10-17 16:26 ` Jay Rolette
2014-10-18 17:26 ` Matthew Hall
0 siblings, 2 replies; 20+ messages in thread
From: Pattan, Reshma @ 2014-10-17 9:44 UTC (permalink / raw)
To: Jay Rolette; +Cc: dev
Hi Jay,
Please find comments inline.
Thanks,
Reshma
From: Jay Rolette [mailto:rolette@infiniteio.com]
Sent: Thursday, October 9, 2014 8:02 PM
To: Pattan, Reshma
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
Hi Reshma,
A few comments and questions about your design...
1) How do you envision the reorder library to be used? Based on the description, it seems like the expectation is that packet order would be maintained at either the interface/port level or maybe at the RX queue level. Is that right or am I reading too much between the lines?
For my purposes (and for network security products I've developed in the past), I mostly don't care about box or port-level order. The most important thing is to maintain packet order within a flow. Relative order from packets in different flows doesn't matter. If there is a way I can process packets in parallel and transmit out-of-order transmission *within the flow*, that's very useful. Architecturally, it helps avoid hot-spotting in my packet processing pipeline and wasting cycles when load-balancing isn't perfect (and it never is).
[Reshma]: Generic parallel processing of packets is planned in phase2 version of distributor based on sequence numbers, but not flow based parallel processing.
2) If the reorder library is "flow aware", then give me flexibility on deciding what a flow is. Let me define pseudo-flows even if the protocol itself isn't connection oriented (ie., frequently useful to treat UDP 5-tuples as a flow). I may want to include tunnels/VLANs/etc. as part of my "flow" definition. I may need to include the physical port as part of the flow definition.
Ideally, the library includes the common cases and gives me the option to register a callback function for doing whatever sort of "flows" I require for my app.
[Reshma]:It is not flow aware. But to reorder packets of particular flow, you can handover particular flow to the library and library will give you back the reordered data.
3) Is there a way to apply the reorder library to some packets and not others? I might want to use for TCP and UDP, but not care about order for other IP traffic (for example).
[Reshma]:No, reorder library will not have intelligence about traffic type (i.e. flow or protocols based).
Applications can do traffic splitting into flows or protocol based and handover to library for reordering
4) How are you dealing with internal congestion? If I drop a packet somewhere in my processing pipeline, how does the TX side of the reorder queue/buffer deal with the missing sequence number? Is there some sort of timeout mechanism so that it will only wait for X microseconds for a missing sequence number?
[Reshma]: Library just takes care of packets what it has got. No waiting mechanism is used for missing packets.
Reorder processing will skip the dropped packets(i.e. will create a gap in reorder buffer) and proceed with allocation of slot to the later packets which are available.
Need the ability to bound how long packets are held up in the reorder engine before they are released.
[Reshma]: This is dependent upon how frequently packets are enqueued and dequeued from it. Packets which are in order and without gaps are dequeued at the next call to the dequeue api. If there is a gap, the time taken to skip over the gap will depend on the size of the reorder ring.
Assuming you address this, the reorder engine will also need to deal with slow packets that show up after "later" packets were transmitted.
[Reshma]: As of now, plan is to check sequence number of current packet that library has got with the min sequence number maintained in the library.
The difference between them should not cross 2*reorder_buffer_size . If so we don’t handle such packet and drop it.
But, we are open to suggestions on how to handle late packets? Should we have config option to drop them or just deque them in next immediate dequeue operation.
Regards,
Jay
On Tue, Oct 7, 2014 at 4:33 AM, Pattan, Reshma <reshma.pattan@intel.com<mailto:reshma.pattan@intel.com>> wrote:
Hi All,
I am planning to implement packet reorder library. Details are as below, please go through them and provide the comments.
Requirement:
To reorder out of ordered packets that are received from different cores.
Usage:
To be used along with distributor library. Next version of distributor are planned to distribute incoming packets to all worker cores irrespective of the flow type.
In this case to ensure in order delivery of the packets at output side reorder library will used by the tx end.
Assumption:
All input packets will be marked with sequence number in seqn field of mbuf, this will be the reference for reordering at the tx end.
Sequence number will be of type uint32_t. New sequence number field seqn will be added to mbuf structure.
Design:
a)There will be reorder buffer(circular buffer) structure maintained in reorder library to store reordered packets and other details of buffer like head to drain the packet from, min sequence number and other details.
b)Library will provide insert and drain functions to reorder and fetch out the reordered packets respectively.
c)Users of library should pass the packets to insert functions for reordering.
Insertion logic:
Sequence number of current packet will be used to calculate offset in reorder buffer and write packet to the location in the reorder buffer corresponding to offset.
Offset is calculated as difference of current packet sequence number and sequence number associated with reorder buffer.
During sequence number wrapping or wrapping over of reorder buffer size, before inserting the new packet we should move offset number of packets to other buffer called overflow buffer and advance the head of reorder buffer by "offset-reorder buffer size" and insert the new packet.
Insert function:
int rte_reorder_insert(struct rte_reorder_buffer *buffer, struct rte_mbuf *mbuf);
Note: Other insert function is also under plan to insert burst of packets.
Reorder buffer:
struct rte_reorder_buffer {
unsigned int size; /* The size (number of entries) of the buffer. */
unsigned int mask; /* Mask (size - 1) of the buffer */
unsigned int head; /* Current head of buffer */
uint32_t min_seqn; /* latest sequence number associated with buffer */
struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /* buffer to hold reordered mbufs */
};
d)Users can fetch out the reordered packets by drain function provided by library. Users must pass the mbuf array , drain function should fill passed mbuff array with the reordered buffer packets.
During drain operation, overflow buffer packets will be fetched out first and then reorder buffer.
Drain function:
int rte_reorder_drain(struct rte_reorder_buffer *buffer, struct rte_mbuf **mbufs)
Thanks,
Reshma
--------------------------------------------------------------
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-17 9:44 ` Pattan, Reshma
@ 2014-10-17 16:26 ` Jay Rolette
2014-10-18 17:26 ` Matthew Hall
1 sibling, 0 replies; 20+ messages in thread
From: Jay Rolette @ 2014-10-17 16:26 UTC (permalink / raw)
To: Pattan, Reshma; +Cc: dev
Thanks for the responses, Reshma.
Can you provide a little more context about the use case that your reorder
library is intended to help with? If I'm understanding your answers
correctly, the functionality seems pretty limited and not something I would
ever end up using, but that may be more about the types of products I build
(deep packet inspection and working at L4-L7 generally, even though running
at or near line-rate).
Please take my comments in the spirit intended... If the design makes sense
for different use cases and I'm not the target audience, that's perfectly
ok and there are probably different trade-offs being made. But if it is
intended to be useful for DPI applications, I'd hate to just be quiet and
end up with something that doesn't get used as much as it might.
I haven't looked at the distributor library, so entirely possible it makes
more sense in that context.
More detailed responses to your previous answers inline.
Regards,
Jay
On Fri, Oct 17, 2014 at 4:44 AM, Pattan, Reshma <reshma.pattan@intel.com>
wrote:
> Hi Jay,
>
>
>
> Please find comments inline.
>
>
>
> Thanks,
>
> Reshma
>
>
>
> *From:* Jay Rolette [mailto:rolette@infiniteio.com]
> *Sent:* Thursday, October 9, 2014 8:02 PM
> *To:* Pattan, Reshma
> *Cc:* dev@dpdk.org
> *Subject:* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
>
>
>
> Hi Reshma,
>
>
>
> A few comments and questions about your design...
>
>
>
> 1) How do you envision the reorder library to be used? Based on the
> description, it seems like the expectation is that packet order would be
> maintained at either the interface/port level or maybe at the RX queue
> level. Is that right or am I reading too much between the lines?
>
>
>
> For my purposes (and for network security products I've developed in the
> past), I mostly don't care about box or port-level order. The most
> important thing is to maintain packet order within a flow. Relative order
> from packets in different flows doesn't matter. If there is a way I can
> process packets in parallel and transmit out-of-order transmission *within
> the flow*, that's very useful. Architecturally, it helps avoid hot-spotting
> in my packet processing pipeline and wasting cycles when load-balancing
> isn't perfect (and it never is).
>
> [Reshma]: Generic parallel processing of packets is planned in phase2
> version of distributor based on sequence numbers, but not flow based
> parallel processing.
>
See question at the top of my email about the intended use-case. For DPI
applications, global (box-wide or per port) reordering isn't normally
required. Maintaining order within flows is the important part. Depending
on your implementation and the guarantees you make, the impact it has on
aggregate system throughput can be significant.
> 2) If the reorder library is "flow aware", then give me flexibility on
> deciding what a flow is. Let me define pseudo-flows even if the protocol
> itself isn't connection oriented (ie., frequently useful to treat UDP
> 5-tuples as a flow). I may want to include tunnels/VLANs/etc. as part of my
> "flow" definition. I may need to include the physical port as part of the
> flow definition.
>
>
>
> Ideally, the library includes the common cases and gives me the option to
> register a callback function for doing whatever sort of "flows" I require
> for my app.
>
> [Reshma]:It is not flow aware. But to reorder packets of particular flow,
> you can handover particular flow to the library and library will give you
> back the reordered data.
>
I think given how a couple of other bits are described, this doesn't end up
helping. More a bit further down.
> 3) Is there a way to apply the reorder library to some packets and not
> others? I might want to use for TCP and UDP, but not care about order for
> other IP traffic (for example).
>
> [Reshma]:No, reorder library will not have intelligence about traffic
> type (i.e. flow or protocols based).
>
> Applications can do traffic splitting into flows or protocol based and
> handover to library for reordering
>
Ditto
4) How are you dealing with internal congestion? If I drop a packet
> somewhere in my processing pipeline, how does the TX side of the reorder
> queue/buffer deal with the missing sequence number? Is there some sort of
> timeout mechanism so that it will only wait for X microseconds for a
> missing sequence number?
>
> [Reshma]: Library just takes care of packets what it has got. No waiting
> mechanism is used for missing packets.
>
> Reorder processing will skip the dropped packets(i.e. will create a gap in
> reorder buffer) and proceed with allocation of slot to the later packets
> which are available.
>
>
>
> Need the ability to bound how long packets are held up in the reorder
> engine before they are released.
>
> [Reshma]: This is dependent upon how frequently packets are enqueued and
> dequeued from it. Packets which are in order and without gaps are dequeued
> at the next call to the dequeue api. If there is a gap, the time taken to
> skip over the gap will depend on the size of the reorder ring.
>
So the window for correcting out-of-order is nothing more than whatever
queueing delays happen to be on the TX queue? That seems... not very
useful. Am I missing something about the design?
For DPI applications, processing time is somewhat variable between
different packets in a flow. I'm assuming L3 apps have similar issues with
control plane traffic. In a low-latency architecture, very few packets
should be sitting in any TX queues so you really need something with some
time/cycle-count constraints to manage that window - ie., how long should
other packets in a flow be held up in the TX queue waiting for "earlier"
packets vs. transmitting them anyway?
Assuming you address this, the reorder engine will also need to deal with
> slow packets that show up after "later" packets were transmitted.
>
> [Reshma]: As of now, plan is to check sequence number of current packet
> that library has got with the min sequence number maintained in the library.
>
> The difference between them should not cross 2*reorder_buffer_size . If so
> we don’t handle such packet and drop it.
>
> But, we are open to suggestions on how to handle late packets? Should we
> have config option to drop them or just deque them in next immediate
> dequeue operation.
>
Config option is the most flexible, but I would expect the normal case to
be to TX the packet "quickly". I'm hesitant to say deque it in the next
immediate dequeue operation because there are potential DoS attack vectors
on the system depending on implementation details.
> On Tue, Oct 7, 2014 at 4:33 AM, Pattan, Reshma <reshma.pattan@intel.com>
> wrote:
>
> Hi All,
>
> I am planning to implement packet reorder library. Details are as below,
> please go through them and provide the comments.
>
> Requirement:
> To reorder out of ordered packets that are received from
> different cores.
>
> Usage:
> To be used along with distributor library. Next version of distributor are
> planned to distribute incoming packets to all worker cores irrespective of
> the flow type.
> In this case to ensure in order delivery of the packets at output side
> reorder library will used by the tx end.
>
> Assumption:
> All input packets will be marked with sequence number in seqn field of
> mbuf, this will be the reference for reordering at the tx end.
> Sequence number will be of type uint32_t. New sequence number field seqn
> will be added to mbuf structure.
>
> Design:
> a)There will be reorder buffer(circular buffer) structure maintained in
> reorder library to store reordered packets and other details of buffer like
> head to drain the packet from, min sequence number and other details.
> b)Library will provide insert and drain functions to
> reorder and fetch out the reordered packets respectively.
> c)Users of library should pass the packets to insert functions for
> reordering.
>
> Insertion logic:
> Sequence number of current packet will be used to calculate offset in
> reorder buffer and write packet to the location in the reorder buffer
> corresponding to offset.
> Offset is calculated as difference of current
> packet sequence number and sequence number associated with reorder buffer.
>
> During sequence number wrapping or wrapping over of reorder buffer size,
> before inserting the new packet we should move offset number of packets to
> other buffer called overflow buffer and advance the head of reorder buffer
> by "offset-reorder buffer size" and insert the new packet.
>
> Insert function:
> int rte_reorder_insert(struct rte_reorder_buffer *buffer, struct rte_mbuf
> *mbuf);
> Note: Other insert function is also under plan to insert burst of packets.
>
> Reorder buffer:
> struct rte_reorder_buffer {
> unsigned int size; /* The size (number of entries) of the
> buffer. */
> unsigned int mask; /* Mask (size - 1) of the buffer */
> unsigned int head; /* Current head of buffer */
> uint32_t min_seqn; /* latest sequence number associated with
> buffer */
> struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /* buffer to
> hold reordered mbufs */
> };
>
> d)Users can fetch out the reordered packets by drain function provided by
> library. Users must pass the mbuf array , drain function should fill
> passed mbuff array with the reordered buffer packets.
> During drain operation, overflow buffer packets will be fetched out first
> and then reorder buffer.
>
> Drain function:
> int rte_reorder_drain(struct rte_reorder_buffer *buffer,
> struct rte_mbuf **mbufs)
>
> Thanks,
> Reshma
>
> --------------------------------------------------------------
> Intel Shannon Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
> Registered Number: 308263
> Business address: Dromore House, East Park, Shannon, Co. Clare
>
> This e-mail and any attachments may contain confidential material for the
> sole use of the intended recipient(s). Any review or distribution by others
> is strictly prohibited. If you are not the intended recipient, please
> contact the sender and delete all copies.
>
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
2014-10-17 9:44 ` Pattan, Reshma
2014-10-17 16:26 ` Jay Rolette
@ 2014-10-18 17:26 ` Matthew Hall
1 sibling, 0 replies; 20+ messages in thread
From: Matthew Hall @ 2014-10-18 17:26 UTC (permalink / raw)
To: Pattan, Reshma; +Cc: dev
On Fri, Oct 17, 2014 at 09:44:49AM +0000, Pattan, Reshma wrote:
> [Reshma]: Library just takes care of packets what it has got. No waiting
> mechanism is used for missing packets.
> [Reshma]: This is dependent upon how frequently packets are enqueued and
> dequeued from it. Packets which are in order and without gaps are dequeued
> at the next call to the dequeue api. If there is a gap, the time taken to
> skip over the gap will depend on the size of the reorder ring.
I am not sure this library will help much if it can't handle missing / delayed
packets in some way.
When you're using a non-flow-aware distributor and moving packets around
randomly / round robin / not using flows, you'll end up with some cases where
a packet has some HOL blocking in some core and another later packet comes
through faster on another core.
Maybe I missed something here, but I think there's got to be some logic where
you can tell it's not ready to dequeue something yet, because there's a gap it
needs to fill, or flows will break a lot due to race conditions I suspect.
Matthew.
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2014-10-18 17:19 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-07 9:33 [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library Pattan, Reshma
2014-10-07 11:21 ` Neil Horman
2014-10-08 14:11 ` Pattan, Reshma
2014-10-08 19:15 ` Neil Horman
2014-10-09 10:27 ` Pattan, Reshma
2014-10-09 11:36 ` Neil Horman
2014-10-09 14:36 ` Pattan, Reshma
2014-10-09 16:09 ` Neil Horman
2014-10-09 17:21 ` Matthew Hall
2014-10-09 17:55 ` Neil Horman
2014-10-08 22:41 ` Matthew Hall
2014-10-08 22:55 ` Neil Horman
2014-10-08 23:07 ` Matthew Hall
2014-10-09 9:14 ` Bruce Richardson
2014-10-09 17:11 ` Matthew Hall
2014-10-10 10:59 ` Bruce Richardson
2014-10-09 19:01 ` Jay Rolette
2014-10-17 9:44 ` Pattan, Reshma
2014-10-17 16:26 ` Jay Rolette
2014-10-18 17:26 ` Matthew Hall
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).