DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev]  mbuff rearm_data aligmenet issue on non x86
@ 2016-05-12  9:14 Jerin Jacob
  2016-05-12 10:07 ` Ananyev, Konstantin
  0 siblings, 1 reply; 5+ messages in thread
From: Jerin Jacob @ 2016-05-12  9:14 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, thomas.monjalon

Hi All,

I would like align mbuff rearm_data field to 8 byte aligned so that
write to mbuf->rearm_data with uint64_t* will be naturally aligned.
I am not sure about IA but some other architecture/implementation has overhead
in non-naturally aligned stores.

Proposed patch is something like this below, But open for any change to
make fit for all other architectures/platform.

Any thoughts ?

➜ [master] [dpdk-master] $ git diff
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 529debb..5a917d0 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -733,10 +733,8 @@ struct rte_mbuf {
        void *buf_addr;           /**< Virtual address of segment
buffer. */
        phys_addr_t buf_physaddr; /**< Physical address of segment
buffer. */
 
-       uint16_t buf_len;         /**< Length of segment buffer. */
-
        /* next 6 bytes are initialised on RX descriptor rearm */
-       MARKER8 rearm_data;
+       MARKER64 rearm_data;
        uint16_t data_off;
 
        /**
@@ -754,6 +752,7 @@ struct rte_mbuf {
        uint8_t nb_segs;          /**< Number of segments. */
        uint8_t port;             /**< Input port. */
 
+       uint16_t buf_len;         /**< Length of segment buffer. */
        uint64_t ol_flags;        /**< Offload features. */
 
        /* remaining bytes are set on RX when pulling packet from
 * descriptor 

/Jerin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] mbuff rearm_data aligmenet issue on non x86
  2016-05-12  9:14 [dpdk-dev] mbuff rearm_data aligmenet issue on non x86 Jerin Jacob
@ 2016-05-12 10:07 ` Ananyev, Konstantin
  2016-05-12 12:17   ` Jerin Jacob
  0 siblings, 1 reply; 5+ messages in thread
From: Ananyev, Konstantin @ 2016-05-12 10:07 UTC (permalink / raw)
  To: Jerin Jacob, dev; +Cc: Richardson, Bruce, thomas.monjalon

Hi Jerrin,

> 
> Hi All,
> 
> I would like align mbuff rearm_data field to 8 byte aligned so that
> write to mbuf->rearm_data with uint64_t* will be naturally aligned.
> I am not sure about IA but some other architecture/implementation has overhead
> in non-naturally aligned stores.
> 
> Proposed patch is something like this below, But open for any change to
> make fit for all other architectures/platform.
> 
> Any thoughts ?
> 
> ➜ [master] [dpdk-master] $ git diff
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 529debb..5a917d0 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -733,10 +733,8 @@ struct rte_mbuf {
>         void *buf_addr;           /**< Virtual address of segment
> buffer. */
>         phys_addr_t buf_physaddr; /**< Physical address of segment
> buffer. */
> 
> -       uint16_t buf_len;         /**< Length of segment buffer. */
> -


There is no need to move buf_len itself, I think.
Just move rearm_data marker prior to buf_len is enough.
Though how do you suggest to deal with the fact, that right now we blindly
update the whole 64bits pointed by rearm_data:

drivers/net/ixgbe/ixgbe_rxtx_vec.c:
	/*
                 * Flush mbuf with pkt template.
                 * Data to be rearmed is 6 bytes long.
                 * Though, RX will overwrite ol_flags that are coming next
                 * anyway. So overwrite whole 8 bytes with one load:
                 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
                 */
                p0 = (uintptr_t)&mb0->rearm_data;
                *(uint64_t *)p0 = rxq->mbuf_initializer;

?

If buf_len will be inside these 64bits, we can't do it anymore.

Are you suggesting something like:

uint64_t *p0, v0; 

p0 = &mb0->rearm_data;
v0 = *p0 & REARM_MASK;
*p0 = v0 | rxq->mbuf_initializer;
? 

If so I wonder what would be the performance impact of that change.
Konstantin


>         /* next 6 bytes are initialised on RX descriptor rearm */
> -       MARKER8 rearm_data;
> +       MARKER64 rearm_data;
>         uint16_t data_off;
> 
>         /**
> @@ -754,6 +752,7 @@ struct rte_mbuf {
>         uint8_t nb_segs;          /**< Number of segments. */
>         uint8_t port;             /**< Input port. */
> 
> +       uint16_t buf_len;         /**< Length of segment buffer. */
>         uint64_t ol_flags;        /**< Offload features. */
> 
>         /* remaining bytes are set on RX when pulling packet from
>  * descriptor
> 
> /Jerin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] mbuff rearm_data aligmenet issue on non x86
  2016-05-12 10:07 ` Ananyev, Konstantin
@ 2016-05-12 12:17   ` Jerin Jacob
  2016-05-12 13:14     ` Ananyev, Konstantin
  0 siblings, 1 reply; 5+ messages in thread
From: Jerin Jacob @ 2016-05-12 12:17 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, Richardson, Bruce, thomas.monjalon

On Thu, May 12, 2016 at 10:07:09AM +0000, Ananyev, Konstantin wrote:
> Hi Jerrin,
> 
> > 
> > Hi All,
> > 
> > I would like align mbuff rearm_data field to 8 byte aligned so that
> > write to mbuf->rearm_data with uint64_t* will be naturally aligned.
> > I am not sure about IA but some other architecture/implementation has overhead
> > in non-naturally aligned stores.
> > 
> > Proposed patch is something like this below, But open for any change to
> > make fit for all other architectures/platform.
> > 
> > Any thoughts ?
> > 
> > ➜ [master] [dpdk-master] $ git diff
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 529debb..5a917d0 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -733,10 +733,8 @@ struct rte_mbuf {
> >         void *buf_addr;           /**< Virtual address of segment
> > buffer. */
> >         phys_addr_t buf_physaddr; /**< Physical address of segment
> > buffer. */
> > 
> > -       uint16_t buf_len;         /**< Length of segment buffer. */
> > -
> 
> 
> There is no need to move buf_len itself, I think.
> Just move rearm_data marker prior to buf_len is enough.
> Though how do you suggest to deal with the fact, that right now we blindly
> update the whole 64bits pointed by rearm_data:
> 
> drivers/net/ixgbe/ixgbe_rxtx_vec.c:
> 	/*
>                  * Flush mbuf with pkt template.
>                  * Data to be rearmed is 6 bytes long.
>                  * Though, RX will overwrite ol_flags that are coming next
>                  * anyway. So overwrite whole 8 bytes with one load:
>                  * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
>                  */
>                 p0 = (uintptr_t)&mb0->rearm_data;
>                 *(uint64_t *)p0 = rxq->mbuf_initializer;
> 
> ?
> 
> If buf_len will be inside these 64bits, we can't do it anymore.
> 
> Are you suggesting something like:
> 
> uint64_t *p0, v0; 
> 
> p0 = &mb0->rearm_data;
> v0 = *p0 & REARM_MASK;
> *p0 = v0 | rxq->mbuf_initializer;
> ? 

Due to unaligned rearm_data issue, In ThunderX platform, we need to write
multiple half word of aligned stores(so masking was better us).

But I think, if we can put 16bit hole between port and ol_flags then
we may not need the masking stuff in ixgbe. Right?

OR

Even better, if we can fill in a uint16_t variable which will replaced
later in the flow like "data_len"? and move buf_len at end the first
cache line? or any other thoughts to fix unaligned rearm_data issue?

Jerin



> 
> If so I wonder what would be the performance impact of that change.
> Konstantin
> 
> 
> >         /* next 6 bytes are initialised on RX descriptor rearm */
> > -       MARKER8 rearm_data;
> > +       MARKER64 rearm_data;
> >         uint16_t data_off;
> > 
> >         /**
> > @@ -754,6 +752,7 @@ struct rte_mbuf {
> >         uint8_t nb_segs;          /**< Number of segments. */
> >         uint8_t port;             /**< Input port. */
> > 
> > +       uint16_t buf_len;         /**< Length of segment buffer. */
> >         uint64_t ol_flags;        /**< Offload features. */
> > 
> >         /* remaining bytes are set on RX when pulling packet from
> >  * descriptor
> > 
> > /Jerin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] mbuff rearm_data aligmenet issue on non x86
  2016-05-12 12:17   ` Jerin Jacob
@ 2016-05-12 13:14     ` Ananyev, Konstantin
  2016-05-12 14:50       ` Jerin Jacob
  0 siblings, 1 reply; 5+ messages in thread
From: Ananyev, Konstantin @ 2016-05-12 13:14 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, Richardson, Bruce, thomas.monjalon

> 
> On Thu, May 12, 2016 at 10:07:09AM +0000, Ananyev, Konstantin wrote:
> > Hi Jerrin,
> >
> > >
> > > Hi All,
> > >
> > > I would like align mbuff rearm_data field to 8 byte aligned so that
> > > write to mbuf->rearm_data with uint64_t* will be naturally aligned.
> > > I am not sure about IA but some other architecture/implementation has overhead
> > > in non-naturally aligned stores.
> > >
> > > Proposed patch is something like this below, But open for any change to
> > > make fit for all other architectures/platform.
> > >
> > > Any thoughts ?
> > >
> > > ➜ [master] [dpdk-master] $ git diff
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > index 529debb..5a917d0 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -733,10 +733,8 @@ struct rte_mbuf {
> > >         void *buf_addr;           /**< Virtual address of segment
> > > buffer. */
> > >         phys_addr_t buf_physaddr; /**< Physical address of segment
> > > buffer. */
> > >
> > > -       uint16_t buf_len;         /**< Length of segment buffer. */
> > > -
> >
> >
> > There is no need to move buf_len itself, I think.
> > Just move rearm_data marker prior to buf_len is enough.
> > Though how do you suggest to deal with the fact, that right now we blindly
> > update the whole 64bits pointed by rearm_data:
> >
> > drivers/net/ixgbe/ixgbe_rxtx_vec.c:
> > 	/*
> >                  * Flush mbuf with pkt template.
> >                  * Data to be rearmed is 6 bytes long.
> >                  * Though, RX will overwrite ol_flags that are coming next
> >                  * anyway. So overwrite whole 8 bytes with one load:
> >                  * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
> >                  */
> >                 p0 = (uintptr_t)&mb0->rearm_data;
> >                 *(uint64_t *)p0 = rxq->mbuf_initializer;
> >
> > ?
> >
> > If buf_len will be inside these 64bits, we can't do it anymore.
> >
> > Are you suggesting something like:
> >
> > uint64_t *p0, v0;
> >
> > p0 = &mb0->rearm_data;
> > v0 = *p0 & REARM_MASK;
> > *p0 = v0 | rxq->mbuf_initializer;
> > ?
> 
> Due to unaligned rearm_data issue, In ThunderX platform, we need to write
> multiple half word of aligned stores(so masking was better us).

Ok, so what would be the gain on ARM if you'll make that change?
Again, what would be the drop (if any) on IA?

> But I think, if we can put 16bit hole between port and ol_flags then
> we may not need the masking stuff in ixgbe. Right?

You mean move buf_len somewhere else (end of cacheline0) and 
introduce a 2B hole between port and ol_flags, right?
Yep, that probably wouldn't have any performance impact.

> 
> OR
> 
> Even better, if we can fill in a uint16_t variable which will replaced
> later in the flow like "data_len"?

data_len is grouped  with rx_descriptor_fields1 on purpose -
so we can update packet_type,pkt_len, data_len,vlan_tci,rss with one 16B write.

Konstantin

> and move buf_len at end the first  cache line? 
>or any other thoughts to fix unaligned rearm_data issue?
> 
> Jerin
> 
> 
> 
> >
> > If so I wonder what would be the performance impact of that change.
> > Konstantin
> >
> >
> > >         /* next 6 bytes are initialised on RX descriptor rearm */
> > > -       MARKER8 rearm_data;
> > > +       MARKER64 rearm_data;
> > >         uint16_t data_off;
> > >
> > >         /**
> > > @@ -754,6 +752,7 @@ struct rte_mbuf {
> > >         uint8_t nb_segs;          /**< Number of segments. */
> > >         uint8_t port;             /**< Input port. */
> > >
> > > +       uint16_t buf_len;         /**< Length of segment buffer. */
> > >         uint64_t ol_flags;        /**< Offload features. */
> > >
> > >         /* remaining bytes are set on RX when pulling packet from
> > >  * descriptor
> > >
> > > /Jerin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] mbuff rearm_data aligmenet issue on non x86
  2016-05-12 13:14     ` Ananyev, Konstantin
@ 2016-05-12 14:50       ` Jerin Jacob
  0 siblings, 0 replies; 5+ messages in thread
From: Jerin Jacob @ 2016-05-12 14:50 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, Richardson, Bruce, thomas.monjalon

On Thu, May 12, 2016 at 01:14:34PM +0000, Ananyev, Konstantin wrote:
> > 
> > On Thu, May 12, 2016 at 10:07:09AM +0000, Ananyev, Konstantin wrote:
> > > Hi Jerrin,
> > >
> > > >
> > > > Hi All,
> > > >
> > > > I would like align mbuff rearm_data field to 8 byte aligned so that
> > > > write to mbuf->rearm_data with uint64_t* will be naturally aligned.
> > > > I am not sure about IA but some other architecture/implementation has overhead
> > > > in non-naturally aligned stores.
> > > >
> > > > Proposed patch is something like this below, But open for any change to
> > > > make fit for all other architectures/platform.
> > > >
> > > > Any thoughts ?
> > > >
> > > > ➜ [master] [dpdk-master] $ git diff
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > index 529debb..5a917d0 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > @@ -733,10 +733,8 @@ struct rte_mbuf {
> > > >         void *buf_addr;           /**< Virtual address of segment
> > > > buffer. */
> > > >         phys_addr_t buf_physaddr; /**< Physical address of segment
> > > > buffer. */
> > > >
> > > > -       uint16_t buf_len;         /**< Length of segment buffer. */
> > > > -
> > >
> > >
> > > There is no need to move buf_len itself, I think.
> > > Just move rearm_data marker prior to buf_len is enough.
> > > Though how do you suggest to deal with the fact, that right now we blindly
> > > update the whole 64bits pointed by rearm_data:
> > >
> > > drivers/net/ixgbe/ixgbe_rxtx_vec.c:
> > > 	/*
> > >                  * Flush mbuf with pkt template.
> > >                  * Data to be rearmed is 6 bytes long.
> > >                  * Though, RX will overwrite ol_flags that are coming next
> > >                  * anyway. So overwrite whole 8 bytes with one load:
> > >                  * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
> > >                  */
> > >                 p0 = (uintptr_t)&mb0->rearm_data;
> > >                 *(uint64_t *)p0 = rxq->mbuf_initializer;
> > >
> > > ?
> > >
> > > If buf_len will be inside these 64bits, we can't do it anymore.
> > >
> > > Are you suggesting something like:
> > >
> > > uint64_t *p0, v0;
> > >
> > > p0 = &mb0->rearm_data;
> > > v0 = *p0 & REARM_MASK;
> > > *p0 = v0 | rxq->mbuf_initializer;
> > > ?
> > 
> > Due to unaligned rearm_data issue, In ThunderX platform, we need to write
> > multiple half word of aligned stores(so masking was better us).
> 
> Ok, so what would be the gain on ARM if you'll make that change?

~4 cpu cycles per packet.Again it may not be ARM architecture specific
as ARM architecture does not define  instruction latency so it is more of a
implementation specific data.

> Again, what would be the drop (if any) on IA?
> 
> > But I think, if we can put 16bit hole between port and ol_flags then
> > we may not need the masking stuff in ixgbe. Right?
> 
> You mean move buf_len somewhere else (end of cacheline0) and 
> introduce a 2B hole between port and ol_flags, right?

Yes

> Yep, that probably wouldn't have any performance impact.

I will try two options and send a patch which don't have any
performance impact on IA.

> 
> > 
> > OR
> > 
> > Even better, if we can fill in a uint16_t variable which will replaced
> > later in the flow like "data_len"?
> 
> data_len is grouped  with rx_descriptor_fields1 on purpose -
> so we can update packet_type,pkt_len, data_len,vlan_tci,rss with one 16B write.

OK

> 
> Konstantin
> 
> > and move buf_len at end the first  cache line? 
> >or any other thoughts to fix unaligned rearm_data issue?
> > 
> > Jerin
> > 
> > 
> > 
> > >
> > > If so I wonder what would be the performance impact of that change.
> > > Konstantin
> > >
> > >
> > > >         /* next 6 bytes are initialised on RX descriptor rearm */
> > > > -       MARKER8 rearm_data;
> > > > +       MARKER64 rearm_data;
> > > >         uint16_t data_off;
> > > >
> > > >         /**
> > > > @@ -754,6 +752,7 @@ struct rte_mbuf {
> > > >         uint8_t nb_segs;          /**< Number of segments. */
> > > >         uint8_t port;             /**< Input port. */
> > > >
> > > > +       uint16_t buf_len;         /**< Length of segment buffer. */
> > > >         uint64_t ol_flags;        /**< Offload features. */
> > > >
> > > >         /* remaining bytes are set on RX when pulling packet from
> > > >  * descriptor
> > > >
> > > > /Jerin

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-05-12 14:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-12  9:14 [dpdk-dev] mbuff rearm_data aligmenet issue on non x86 Jerin Jacob
2016-05-12 10:07 ` Ananyev, Konstantin
2016-05-12 12:17   ` Jerin Jacob
2016-05-12 13:14     ` Ananyev, Konstantin
2016-05-12 14:50       ` Jerin Jacob

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).