DPDK patches and discussions
* [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets
@ 2015-12-06 15:59 Jerin Jacob
  2015-12-06 15:59 ` [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue " Jerin Jacob
                   ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-06 15:59 UTC (permalink / raw)
  To: dev

This patchset fixes performance/cache resource issues found in the mbuf and
bitmap DPDK libraries on 128-byte cache line targets.

Currently we have two DPDK targets (ThunderX and ppc_64) which are based on
a 128-byte cache line size.

This patchset doesn't introduce any performance degradation
for 64-byte cache line targets.

Jerin Jacob (2):
  mbuf: fix performance/cache resource issue with 128-byte cache line
    targets
  bitmap: optimize for 128-bytes cache line targets

 app/test/test_mbuf.c                                         |  4 ++++
 .../linuxapp/eal/include/exec-env/rte_kni_common.h           |  4 ++++
 lib/librte_mbuf/rte_mbuf.h                                   |  2 ++
 lib/librte_sched/rte_bitmap.h                                | 12 +++++++++---
 4 files changed, 19 insertions(+), 3 deletions(-)

--
2.1.0


* [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
  2015-12-06 15:59 [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
@ 2015-12-06 15:59 ` Jerin Jacob
  2015-12-07 15:21   ` Ananyev, Konstantin
  2015-12-06 15:59 ` [dpdk-dev] [PATCH 2/2] bitmap: optimize for 128-bytes " Jerin Jacob
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 40+ messages in thread
From: Jerin Jacob @ 2015-12-06 15:59 UTC (permalink / raw)
  To: dev

No need to split the mbuf structure into two cache lines for 128-byte cache line
targets, as it can fit on a single 128-byte cache line.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 app/test/test_mbuf.c                                          | 4 ++++
 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++++
 lib/librte_mbuf/rte_mbuf.h                                    | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index b32bef6..5e21075 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -930,7 +930,11 @@ test_failing_mbuf_sanity_check(void)
 static int
 test_mbuf(void)
 {
+#if RTE_CACHE_LINE_SIZE == 64
 	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
+#elif RTE_CACHE_LINE_SIZE == 128
+	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE);
+#endif
 
 	/* create pktmbuf pool if it does not exist */
 	if (pktmbuf_pool == NULL) {
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index bd1cc09..e724af7 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -121,8 +121,12 @@ struct rte_kni_mbuf {
 	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
 	uint16_t data_len;      /**< Amount of data in segment buffer. */
 
+#if RTE_CACHE_LINE_SIZE == 64
 	/* fields on second cache line */
 	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
+#elif RTE_CACHE_LINE_SIZE == 128
+	char pad3[24];
+#endif
 	void *pool;
 	void *next;
 };
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..0bf55e0 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -813,8 +813,10 @@ struct rte_mbuf {
 
 	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
 
+#if RTE_CACHE_LINE_SIZE == 64
 	/* second cache line - fields only used in slow path or on TX */
 	MARKER cacheline1 __rte_cache_aligned;
+#endif
 
 	union {
 		void *userdata;   /**< Can be used for external metadata */
-- 
2.1.0


* [dpdk-dev] [PATCH 2/2] bitmap: optimize for 128-bytes cache line targets
  2015-12-06 15:59 [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  2015-12-06 15:59 ` [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue " Jerin Jacob
@ 2015-12-06 15:59 ` Jerin Jacob
  2015-12-06 16:30 ` [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte " Thomas Monjalon
  2015-12-10 16:36 ` [dpdk-dev] [PATCH v2 0/4] " Jerin Jacob
  3 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-06 15:59 UTC (permalink / raw)
  To: dev

The existing rte_bitmap library implementation is optimally configured for
64-byte cache lines; extend it to 128-byte cache line targets.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_sched/rte_bitmap.h | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/lib/librte_sched/rte_bitmap.h b/lib/librte_sched/rte_bitmap.h
index 3d911e4..e6cf26e 100644
--- a/lib/librte_sched/rte_bitmap.h
+++ b/lib/librte_sched/rte_bitmap.h
@@ -45,10 +45,10 @@ extern "C" {
  * The bitmap component provides a mechanism to manage large arrays of bits
  * through bit get/set/clear and bit array scan operations.
  *
- * The bitmap scan operation is optimized for 64-bit CPUs using 64-byte cache
+ * The bitmap scan operation is optimized for 64-bit CPUs using 64/128 byte cache
  * lines. The bitmap is hierarchically organized using two arrays (array1 and
  * array2), with each bit in array1 being associated with a full cache line
- * (512 bits) of bitmap bits, which are stored in array2: the bit in array1 is
+ * (512/1024 bits) of bitmap bits, which are stored in array2: the bit in array1 is
  * set only when there is at least one bit set within its associated array2
  * bits, otherwise the bit in array1 is cleared. The read and write operations
  * for array1 and array2 are always done in slabs of 64 bits.
@@ -81,11 +81,17 @@ extern "C" {
 
 /* Cache line (CL) */
 #define RTE_BITMAP_CL_BIT_SIZE                   (RTE_CACHE_LINE_SIZE * 8)
+
+#if RTE_CACHE_LINE_SIZE == 64
 #define RTE_BITMAP_CL_BIT_SIZE_LOG2              9
+#elif RTE_CACHE_LINE_SIZE == 128
+#define RTE_BITMAP_CL_BIT_SIZE_LOG2              10
+#endif
+
 #define RTE_BITMAP_CL_BIT_MASK                   (RTE_BITMAP_CL_BIT_SIZE - 1)
 
 #define RTE_BITMAP_CL_SLAB_SIZE                  (RTE_BITMAP_CL_BIT_SIZE / RTE_BITMAP_SLAB_BIT_SIZE)
-#define RTE_BITMAP_CL_SLAB_SIZE_LOG2             3
+#define RTE_BITMAP_CL_SLAB_SIZE_LOG2             (RTE_BITMAP_CL_BIT_SIZE_LOG2 - RTE_BITMAP_SLAB_BIT_SIZE_LOG2)
 #define RTE_BITMAP_CL_SLAB_MASK                  (RTE_BITMAP_CL_SLAB_SIZE - 1)
 
 /** Bitmap data structure */
-- 
2.1.0
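For reference, the constants derived by this patch work out as below (a sanity
sketch assuming 64-bit slabs, i.e. RTE_BITMAP_SLAB_BIT_SIZE_LOG2 == 6, with
rte_bitmap.h and rte_common.h included; the helper itself is hypothetical, not
part of the patch):

static inline void
bitmap_cl_consts_check(void)
{
	/*  64B CL:  512 bits/CL (LOG2 9),   8 slabs/CL (LOG2 3)
	 * 128B CL: 1024 bits/CL (LOG2 10), 16 slabs/CL (LOG2 4) */
	RTE_BUILD_BUG_ON(RTE_BITMAP_CL_BIT_SIZE !=
		(1 << RTE_BITMAP_CL_BIT_SIZE_LOG2));
	RTE_BUILD_BUG_ON(RTE_BITMAP_CL_SLAB_SIZE !=
		(1 << RTE_BITMAP_CL_SLAB_SIZE_LOG2));
}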


* Re: [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets
  2015-12-06 15:59 [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  2015-12-06 15:59 ` [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue " Jerin Jacob
  2015-12-06 15:59 ` [dpdk-dev] [PATCH 2/2] bitmap: optimize for 128-bytes " Jerin Jacob
@ 2015-12-06 16:30 ` Thomas Monjalon
  2015-12-07  7:26   ` Jerin Jacob
  2015-12-10 16:36 ` [dpdk-dev] [PATCH v2 0/4] " Jerin Jacob
  3 siblings, 1 reply; 40+ messages in thread
From: Thomas Monjalon @ 2015-12-06 16:30 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

2015-12-06 21:29, Jerin Jacob:
> This patchset fixes performance/cache resource issues with 128-byte cache line targets
> found in mbuf and bitmap DPDK libraries
> 
> Currently, we have two DPDK targets(ThunderX and ppc_64) which are based on
> 128-bytes cache line size target.

When introducing IBM Power8, we failed to clean the cache line size definition.
I promised to not forget this issue in this thread with Neil:
	http://dpdk.org/ml/archives/dev/2014-December/009439.html

It is defined in
	config/defconfig_*
	mk/machine/*/rte.vars.mk
	mk/arch/*/rte.vars.mk
	rte_memory.h
	rte_kni_common.h

It should be defined only in the config files.
When we introduce a configure script, we should be able to detect it.
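For example, a single per-target line in the config could carry it
(hypothetical defconfig entry, name not final):

	CONFIG_RTE_CACHE_LINE_SIZE=128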

Please Jerin, as ThunderX maintainer, could you help fix this old mess?
Thanks


* Re: [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets
  2015-12-06 16:30 ` [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte " Thomas Monjalon
@ 2015-12-07  7:26   ` Jerin Jacob
  2015-12-07 11:40     ` Thomas Monjalon
  0 siblings, 1 reply; 40+ messages in thread
From: Jerin Jacob @ 2015-12-07  7:26 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Sun, Dec 06, 2015 at 05:30:50PM +0100, Thomas Monjalon wrote:
> 2015-12-06 21:29, Jerin Jacob:
> > This patchset fixes performance/cache resource issues with 128-byte cache line targets
> > found in mbuf and bitmap DPDK libraries
> > 
> > Currently, we have two DPDK targets(ThunderX and ppc_64) which are based on
> > 128-bytes cache line size target.
> 
> When introducing IBM Power8, we failed to clean the cache line size definition.
> I promised to not forget this issue in this thread with Neil:
> 	http://dpdk.org/ml/archives/dev/2014-December/009439.html
> 
> It is defined in
> 	config/defconfig_*
> 	mk/machine/*/rte.vars.mk
> 	mk/arch/*/rte.vars.mk
> 	rte_memory.h
> 	rte_kni_common.h
> 
> It should be defined only in the config files.
> When we will introduce a configure script, we should be able to detect it.
> 
> Please Jerin, as ThunderX maintainer, may you help to fix this old mess?

Yes Thomas, I will take up this issue when we have a configure script.
Apart from that, the content of this patch will still be valid,
as the fix is going to generate the cache line define from the config file.

Jerin

> Thanks


* Re: [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets
  2015-12-07  7:26   ` Jerin Jacob
@ 2015-12-07 11:40     ` Thomas Monjalon
  2015-12-07 14:33       ` Jerin Jacob
  0 siblings, 1 reply; 40+ messages in thread
From: Thomas Monjalon @ 2015-12-07 11:40 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

2015-12-07 12:56, Jerin Jacob:
> On Sun, Dec 06, 2015 at 05:30:50PM +0100, Thomas Monjalon wrote:
> > 2015-12-06 21:29, Jerin Jacob:
> > > This patchset fixes performance/cache resource issues with 128-byte cache line targets
> > > found in mbuf and bitmap DPDK libraries
> > > 
> > > Currently, we have two DPDK targets(ThunderX and ppc_64) which are based on
> > > 128-bytes cache line size target.
> > 
> > When introducing IBM Power8, we failed to clean the cache line size definition.
> > I promised to not forget this issue in this thread with Neil:
> > 	http://dpdk.org/ml/archives/dev/2014-December/009439.html
> > 
> > It is defined in
> > 	config/defconfig_*
> > 	mk/machine/*/rte.vars.mk
> > 	mk/arch/*/rte.vars.mk
> > 	rte_memory.h
> > 	rte_kni_common.h
> > 
> > It should be defined only in the config files.
> > When we will introduce a configure script, we should be able to detect it.
> > 
> > Please Jerin, as ThunderX maintainer, may you help to fix this old mess?
> 
> Yes Thomas, I will takeup this issue when we will have configure script.

I thought we could start setting the value in only one place.
The detection in a configure script would be another step.

> apart from that, content of the this patch will be still valid
> as the fix going to be generating cache line define from the config file.


* Re: [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets
  2015-12-07 11:40     ` Thomas Monjalon
@ 2015-12-07 14:33       ` Jerin Jacob
  2015-12-07 14:35         ` Thomas Monjalon
  0 siblings, 1 reply; 40+ messages in thread
From: Jerin Jacob @ 2015-12-07 14:33 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Mon, Dec 07, 2015 at 03:40:13AM -0800, Thomas Monjalon wrote:
> 2015-12-07 12:56, Jerin Jacob:
> > On Sun, Dec 06, 2015 at 05:30:50PM +0100, Thomas Monjalon wrote:
> > > 2015-12-06 21:29, Jerin Jacob:
> > > > This patchset fixes performance/cache resource issues with 128-byte cache line targets
> > > > found in mbuf and bitmap DPDK libraries
> > > > 
> > > > Currently, we have two DPDK targets(ThunderX and ppc_64) which are based on
> > > > 128-bytes cache line size target.
> > > 
> > > When introducing IBM Power8, we failed to clean the cache line size definition.
> > > I promised to not forget this issue in this thread with Neil:
> > > 	http://dpdk.org/ml/archives/dev/2014-December/009439.html
> > > 
> > > It is defined in
> > > 	config/defconfig_*
> > > 	mk/machine/*/rte.vars.mk
> > > 	mk/arch/*/rte.vars.mk
> > > 	rte_memory.h
> > > 	rte_kni_common.h
> > > 
> > > It should be defined only in the config files.
> > > When we will introduce a configure script, we should be able to detect it.
> > > 
> > > Please Jerin, as ThunderX maintainer, may you help to fix this old mess?
> > 
> > Yes Thomas, I will takeup this issue when we will have configure script.
> 
> I thought we could start setting the value in only one place.
> The detection in configure script would be another step.

OK Thomas, I have sent the cleanup patch. Please review it.

Jerin

> 
> > apart from that, content of the this patch will be still valid
> > as the fix going to be generating cache line define from the config file.
> 
> 


* Re: [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets
  2015-12-07 14:33       ` Jerin Jacob
@ 2015-12-07 14:35         ` Thomas Monjalon
  0 siblings, 0 replies; 40+ messages in thread
From: Thomas Monjalon @ 2015-12-07 14:35 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

2015-12-07 20:03, Jerin Jacob:
> On Mon, Dec 07, 2015 at 03:40:13AM -0800, Thomas Monjalon wrote:
> > 2015-12-07 12:56, Jerin Jacob:
> > > On Sun, Dec 06, 2015 at 05:30:50PM +0100, Thomas Monjalon wrote:
> > > > 2015-12-06 21:29, Jerin Jacob:
> > > > > This patchset fixes performance/cache resource issues with 128-byte cache line targets
> > > > > found in mbuf and bitmap DPDK libraries
> > > > > 
> > > > > Currently, we have two DPDK targets(ThunderX and ppc_64) which are based on
> > > > > 128-bytes cache line size target.
> > > > 
> > > > When introducing IBM Power8, we failed to clean the cache line size definition.
> > > > I promised to not forget this issue in this thread with Neil:
> > > > 	http://dpdk.org/ml/archives/dev/2014-December/009439.html
> > > > 
> > > > It is defined in
> > > > 	config/defconfig_*
> > > > 	mk/machine/*/rte.vars.mk
> > > > 	mk/arch/*/rte.vars.mk
> > > > 	rte_memory.h
> > > > 	rte_kni_common.h
> > > > 
> > > > It should be defined only in the config files.
> > > > When we will introduce a configure script, we should be able to detect it.
> > > > 
> > > > Please Jerin, as ThunderX maintainer, may you help to fix this old mess?
> > > 
> > > Yes Thomas, I will takeup this issue when we will have configure script.
> > 
> > I thought we could start setting the value in only one place.
> > The detection in configure script would be another step.
> 
> OK Thomas, I have sent the cleanup patch. Please review it.

You are too fast :)
I will review it but it will be deferred to 2.3.


* Re: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
  2015-12-06 15:59 ` [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue " Jerin Jacob
@ 2015-12-07 15:21   ` Ananyev, Konstantin
  2015-12-08 12:45     ` Jerin Jacob
  0 siblings, 1 reply; 40+ messages in thread
From: Ananyev, Konstantin @ 2015-12-07 15:21 UTC (permalink / raw)
  To: Jerin Jacob, dev

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Sunday, December 06, 2015 3:59 PM
> To: dev@dpdk.org
> Cc: thomas.monjalon@6wind.com; Richardson, Bruce; olivier.matz@6wind.com; Dumitrescu, Cristian; Ananyev, Konstantin; Jerin
> Jacob
> Subject: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
> 
> No need to split mbuf structure to two cache lines for 128-byte cache line
> size targets as it can fit on a single 128-byte cache line.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  app/test/test_mbuf.c                                          | 4 ++++
>  lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++++
>  lib/librte_mbuf/rte_mbuf.h                                    | 2 ++
>  3 files changed, 10 insertions(+)
> 
> diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> index b32bef6..5e21075 100644
> --- a/app/test/test_mbuf.c
> +++ b/app/test/test_mbuf.c
> @@ -930,7 +930,11 @@ test_failing_mbuf_sanity_check(void)
>  static int
>  test_mbuf(void)
>  {
> +#if RTE_CACHE_LINE_SIZE == 64
>  	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
> +#elif RTE_CACHE_LINE_SIZE == 128
> +	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE);
> +#endif
> 
>  	/* create pktmbuf pool if it does not exist */
>  	if (pktmbuf_pool == NULL) {
> diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-
> env/rte_kni_common.h
> index bd1cc09..e724af7 100644
> --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> @@ -121,8 +121,12 @@ struct rte_kni_mbuf {
>  	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
>  	uint16_t data_len;      /**< Amount of data in segment buffer. */
> 
> +#if RTE_CACHE_LINE_SIZE == 64
>  	/* fields on second cache line */
>  	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> +#elif RTE_CACHE_LINE_SIZE == 128
> +	char pad3[24];
> +#endif
>  	void *pool;
>  	void *next;
>  };
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index f234ac9..0bf55e0 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -813,8 +813,10 @@ struct rte_mbuf {
> 
>  	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
> 
> +#if RTE_CACHE_LINE_SIZE == 64
>  	/* second cache line - fields only used in slow path or on TX */
>  	MARKER cacheline1 __rte_cache_aligned;
> +#endif

I suppose you'll need to keep the same space reserved for the first 64B even on systems with a 128B cache line.
Otherwise we can end up with a different mbuf format for systems with a 128B cache line.
Another thing - now we have __rte_cache_aligned all over the place, and I don't know whether doubling
the sizes of all these structures is a good idea.
Again, #if RTE_CACHE_LINE_SIZE == 64 ... all over the place looks a bit clumsy.
I wonder whether we can keep __rte_cache_aligned/RTE_CACHE_LINE_SIZE architecture specific,
but introduce RTE_CACHE_MIN_LINE_SIZE (==64)/__rte_cache_min_aligned and use it for mbuf
(and maybe other places).
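A minimal sketch of what that could look like (names as proposed above; these
macros don't exist in DPDK yet):

#define RTE_CACHE_MIN_LINE_SIZE 64
#define __rte_cache_min_aligned \
	__attribute__((__aligned__(RTE_CACHE_MIN_LINE_SIZE)))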
Konstantin


* Re: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
  2015-12-07 15:21   ` Ananyev, Konstantin
@ 2015-12-08 12:45     ` Jerin Jacob
  2015-12-08 16:07       ` Ananyev, Konstantin
  0 siblings, 1 reply; 40+ messages in thread
From: Jerin Jacob @ 2015-12-08 12:45 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Mon, Dec 07, 2015 at 03:21:33PM +0000, Ananyev, Konstantin wrote:

Hi Konstantin,

> Hi Jerin,
>
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Sunday, December 06, 2015 3:59 PM
> > To: dev@dpdk.org
> > Cc: thomas.monjalon@6wind.com; Richardson, Bruce; olivier.matz@6wind.com; Dumitrescu, Cristian; Ananyev, Konstantin; Jerin
> > Jacob
> > Subject: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
> >
> > No need to split mbuf structure to two cache lines for 128-byte cache line
> > size targets as it can fit on a single 128-byte cache line.
> >
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> >  app/test/test_mbuf.c                                          | 4 ++++
> >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++++
> >  lib/librte_mbuf/rte_mbuf.h                                    | 2 ++
> >  3 files changed, 10 insertions(+)
> >
> > diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> > index b32bef6..5e21075 100644
> > --- a/app/test/test_mbuf.c
> > +++ b/app/test/test_mbuf.c
> > @@ -930,7 +930,11 @@ test_failing_mbuf_sanity_check(void)
> >  static int
> >  test_mbuf(void)
> >  {
> > +#if RTE_CACHE_LINE_SIZE == 64
> >  	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
> > +#elif RTE_CACHE_LINE_SIZE == 128
> > +	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE);
> > +#endif
> >
> >  	/* create pktmbuf pool if it does not exist */
> >  	if (pktmbuf_pool == NULL) {
> > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-
> > env/rte_kni_common.h
> > index bd1cc09..e724af7 100644
> > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > @@ -121,8 +121,12 @@ struct rte_kni_mbuf {
> >  	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
> >  	uint16_t data_len;      /**< Amount of data in segment buffer. */
> >
> > +#if RTE_CACHE_LINE_SIZE == 64
> >  	/* fields on second cache line */
> >  	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> > +#elif RTE_CACHE_LINE_SIZE == 128
> > +	char pad3[24];
> > +#endif
> >  	void *pool;
> >  	void *next;
> >  };
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index f234ac9..0bf55e0 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -813,8 +813,10 @@ struct rte_mbuf {
> >
> >  	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
> >
> > +#if RTE_CACHE_LINE_SIZE == 64
> >  	/* second cache line - fields only used in slow path or on TX */
> >  	MARKER cacheline1 __rte_cache_aligned;
> > +#endif
>
> I suppose you'll need to keep same space reserved for first 64B even on systems with 128B cache-line.
> Otherwise we can endup with different mbuf format for systems with 128B cache-line.

Just to understand: is there any issue with the mbuf format being different
across systems? I think we are not sending the mbuf over the wire
or sharing it with a different system, etc., right?

Yes, I do understand the KNI dependency with mbuf.

> Another thing - now we have __rte_cache_aligned all over the places, and I don't know is to double
> sizes of all these structures is a good idea.

I thought so too. The only concern I have: what if the struct is split at 64B
and one cache line is shared between two cores/two different structs which have
different types of operation (most likely)? One does extensive writes and the
other reads; the writes keep making the line dirty and forcing evictions, and
the reading core is going to suffer. Any thoughts?

If it's a tradeoff between the amount of memory and performance, I think it makes
sense to stick with performance in the data plane. Hence the split option may not
be useful, right?
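To illustrate the concern (a made-up layout, not from the patch):

#include <stdint.h>

struct pair {				/* 64B each, only 64B-aligned */
	uint64_t ctr;			/* core A writes this constantly */
	char pad[56];
} __attribute__((__aligned__(64)));

/* On a 128B-CL machine, stats[0] and stats[1] can sit in one 128B line:
 * core A's writes to stats[0].ctr keep invalidating the copy that core B
 * reads from stats[1].ctr. */
struct pair stats[2];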


> Again,  #if RTE_CACHE_LINE_SIZE == 64 ... all over the places looks a bit clumsy.
> Wonder can we have __rte_cache_aligned/ RTE_CACHE_LINE_SIZE architecture specific,

I think it's architecture specific now.

> but introduce RTE_CACHE_MIN_LINE_SIZE(==64)/ __rte_cache_min_aligned and used it for mbuf
> (and might be other places).

Yes, it will help in this specific mbuf case, and it makes sense
if the mbuf is going to stay within 2 x 64B cache lines.

AND/OR

can we introduce something like the macro below to reduce the clutter in
other places? The macro name is not quite right; I'm just trying to share the idea.
(#if cannot appear inside a macro body, so the conditional has to select between
two whole definitions.)

#if RTE_CACHE_LINE_SIZE == 64
#define rte_cacheline_diff(for_64, for_128) for_64
#elif RTE_CACHE_LINE_SIZE == 128
#define rte_cacheline_diff(for_64, for_128) for_128
#endif
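For the KNI mbuf pad above, usage could then look like (hypothetical):

	char rte_cacheline_diff(pad3[8] __rte_cache_aligned, pad3[24]);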

OR

Typedef struct rte_mbuf to a new abstract type, and define it separately for
64-byte and 128-byte targets.

Jerin

> Konstantin
>
>


* Re: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
  2015-12-08 12:45     ` Jerin Jacob
@ 2015-12-08 16:07       ` Ananyev, Konstantin
  2015-12-08 17:49         ` Jerin Jacob
  0 siblings, 1 reply; 40+ messages in thread
From: Ananyev, Konstantin @ 2015-12-08 16:07 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

> 
> Hi Konstantin,
> 
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Sunday, December 06, 2015 3:59 PM
> > > To: dev@dpdk.org
> > > Cc: thomas.monjalon@6wind.com; Richardson, Bruce; olivier.matz@6wind.com; Dumitrescu, Cristian; Ananyev, Konstantin; Jerin
> > > Jacob
> > > Subject: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
> > >
> > > No need to split mbuf structure to two cache lines for 128-byte cache line
> > > size targets as it can fit on a single 128-byte cache line.
> > >
> > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > ---
> > >  app/test/test_mbuf.c                                          | 4 ++++
> > >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++++
> > >  lib/librte_mbuf/rte_mbuf.h                                    | 2 ++
> > >  3 files changed, 10 insertions(+)
> > >
> > > diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> > > index b32bef6..5e21075 100644
> > > --- a/app/test/test_mbuf.c
> > > +++ b/app/test/test_mbuf.c
> > > @@ -930,7 +930,11 @@ test_failing_mbuf_sanity_check(void)
> > >  static int
> > >  test_mbuf(void)
> > >  {
> > > +#if RTE_CACHE_LINE_SIZE == 64
> > >  	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
> > > +#elif RTE_CACHE_LINE_SIZE == 128
> > > +	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE);
> > > +#endif
> > >
> > >  	/* create pktmbuf pool if it does not exist */
> > >  	if (pktmbuf_pool == NULL) {
> > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-
> > > env/rte_kni_common.h
> > > index bd1cc09..e724af7 100644
> > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > @@ -121,8 +121,12 @@ struct rte_kni_mbuf {
> > >  	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
> > >  	uint16_t data_len;      /**< Amount of data in segment buffer. */
> > >
> > > +#if RTE_CACHE_LINE_SIZE == 64
> > >  	/* fields on second cache line */
> > >  	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> > > +#elif RTE_CACHE_LINE_SIZE == 128
> > > +	char pad3[24];
> > > +#endif
> > >  	void *pool;
> > >  	void *next;
> > >  };
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > index f234ac9..0bf55e0 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -813,8 +813,10 @@ struct rte_mbuf {
> > >
> > >  	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
> > >
> > > +#if RTE_CACHE_LINE_SIZE == 64
> > >  	/* second cache line - fields only used in slow path or on TX */
> > >  	MARKER cacheline1 __rte_cache_aligned;
> > > +#endif
> >
> > I suppose you'll need to keep same space reserved for first 64B even on systems with 128B cache-line.
> > Otherwise we can endup with different mbuf format for systems with 128B cache-line.
> 
> Just to understand, Is there any issue in mbuf format being different
> across the systems. I think, we are not sending the mbuf over the wire
> or sharing with different system etc. right?

No, we don't have to support that.
At least I am not aware of such cases.

> 
> Yes, I do understand the KNI dependency with mbuf.

Are you asking what will be broken (except KNI) if the mbuf layouts on IA and ARM were different?
Probably nothing right now, except vector RX/TX.
But they are not supported on ARM anyway, and if someone implements them in the future, they
might be completely different from the IA ones.
It just seems wrong to me to have a different mbuf layout for each architecture.

> 
> > Another thing - now we have __rte_cache_aligned all over the places, and I don't know is to double
> > sizes of all these structures is a good idea.
> 
> I thought so, the only concern I have, what if, the struct split to 64
> and one cache line is shared between two core/two different structs which have
> the different type of operation(most likely). One extensive write and other one
> read, The write makes the line dirty start evicting and read core is
> going to suffer. Any thoughts?
> 
> If its tradeoff between amount memory and performance, I think, it makes sense
> to stick the performance in data plane, Hence split option may be not useful?
> right?

I understand that for most cases you would like to have your data structures CL aligned -
to avoid performance penalties.
I just suggest having RTE_CACHE_MIN_LINE_SIZE (==64) for the few cases where it might be appropriate.
As an example:
struct rte_mbuf {
	...
	MARKER cacheline1 __rte_cache_min_aligned;
	...
} __rte_cache_aligned;
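This keeps the mbuf at the same 2 x 64B size everywhere; as a sketch, that
invariant could even be checked at build time:

	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != 2 * RTE_CACHE_MIN_LINE_SIZE);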

So we would have mbuf with the same size and layout, but different alignment for IA and ARM.

Another example where RTE_CACHE_MIN_LINE_SIZE could be used:
struct rte_eth_(rxq|txq)_info.
There is no real need to have them 128B aligned for ARM.
The main reason they were defined as '__rte_cache_aligned' was
just to reserve some space for future expansion.

Konstantin

> 
> 
> > Again,  #if RTE_CACHE_LINE_SIZE == 64 ... all over the places looks a bit clumsy.
> > Wonder can we have __rte_cache_aligned/ RTE_CACHE_LINE_SIZE architecture specific,
> 
> I think, its architecture specific now
> 
> > but introduce RTE_CACHE_MIN_LINE_SIZE(==64)/ __rte_cache_min_aligned and used it for mbuf
> > (and might be other places).
> 
> Yes, it will help in this specific mbuf case and it make sense
> if mbuf going to stay within 2 x 64 CL.
> 
> AND/OR
> 
> can we introduce something like below to reduce the clutter in
> other places, macro name is just not correct, trying to share the view
> 
> #define rte_cacheline_diff(for_64, for_128)\
> do {\
> #if RTE_CACHE_LINE_SIZE == 64\
> for_64;
> #elif RTE_CACHE_LINE_SIZE == 128\
> for_128;\
> #endif
> 
> OR
> 
> Typedef struct rte_mbuf to new abstract type and define for 64 bytes and
> 128 byte
> 
> Jerin
> 
> > Konstantin
> >
> >


* Re: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
  2015-12-08 16:07       ` Ananyev, Konstantin
@ 2015-12-08 17:49         ` Jerin Jacob
  2015-12-09 13:44           ` Ananyev, Konstantin
  0 siblings, 1 reply; 40+ messages in thread
From: Jerin Jacob @ 2015-12-08 17:49 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Tue, Dec 08, 2015 at 04:07:46PM +0000, Ananyev, Konstantin wrote:
> >
> > Hi Konstantin,
> >
> > > Hi Jerin,
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > Sent: Sunday, December 06, 2015 3:59 PM
> > > > To: dev@dpdk.org
> > > > Cc: thomas.monjalon@6wind.com; Richardson, Bruce; olivier.matz@6wind.com; Dumitrescu, Cristian; Ananyev, Konstantin; Jerin
> > > > Jacob
> > > > Subject: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
> > > >
> > > > No need to split mbuf structure to two cache lines for 128-byte cache line
> > > > size targets as it can fit on a single 128-byte cache line.
> > > >
> > > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > ---
> > > >  app/test/test_mbuf.c                                          | 4 ++++
> > > >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++++
> > > >  lib/librte_mbuf/rte_mbuf.h                                    | 2 ++
> > > >  3 files changed, 10 insertions(+)
> > > >
> > > > diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> > > > index b32bef6..5e21075 100644
> > > > --- a/app/test/test_mbuf.c
> > > > +++ b/app/test/test_mbuf.c
> > > > @@ -930,7 +930,11 @@ test_failing_mbuf_sanity_check(void)
> > > >  static int
> > > >  test_mbuf(void)
> > > >  {
> > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > >  	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
> > > > +#elif RTE_CACHE_LINE_SIZE == 128
> > > > +	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE);
> > > > +#endif
> > > >
> > > >  	/* create pktmbuf pool if it does not exist */
> > > >  	if (pktmbuf_pool == NULL) {
> > > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-
> > > > env/rte_kni_common.h
> > > > index bd1cc09..e724af7 100644
> > > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > > @@ -121,8 +121,12 @@ struct rte_kni_mbuf {
> > > >  	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
> > > >  	uint16_t data_len;      /**< Amount of data in segment buffer. */
> > > >
> > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > >  	/* fields on second cache line */
> > > >  	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> > > > +#elif RTE_CACHE_LINE_SIZE == 128
> > > > +	char pad3[24];
> > > > +#endif
> > > >  	void *pool;
> > > >  	void *next;
> > > >  };
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > index f234ac9..0bf55e0 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > @@ -813,8 +813,10 @@ struct rte_mbuf {
> > > >
> > > >  	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
> > > >
> > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > >  	/* second cache line - fields only used in slow path or on TX */
> > > >  	MARKER cacheline1 __rte_cache_aligned;
> > > > +#endif
> > >
> > > I suppose you'll need to keep same space reserved for first 64B even on systems with 128B cache-line.
> > > Otherwise we can endup with different mbuf format for systems with 128B cache-line.
> >
> > Just to understand, Is there any issue in mbuf format being different
> > across the systems. I think, we are not sending the mbuf over the wire
> > or sharing with different system etc. right?
>
> No, we don't have to support that.
> At least I am not aware about such cases.
>
> >
> > Yes, I do understand the KNI dependency with mbuf.
>
> Are you asking about what will be broken (except KNI) if mbuf layout IA and ARM would be different?
> Probably nothing right now, except vector RX/TX.
> But they are not supported on ARM anyway, and if someone will implement them in future, it
> might be completely different from IA one.
> It just seems wrong to me to have different mbuf layout for each architecture.

It's not architecture specific; it's machine and PMD specific.
Typical ARM machines have a 64-byte CL, but ThunderX and Power8 have a
128-byte CL.

It's PMD specific also. There are some NICs which can write the fields the
application is interested in before the packet in DDR; typically one CL worth
of space is dedicated for that.
So there is an overhead to emulate the standard mbuf layout (which the
application shouldn't care about), i.e.
- reserve the space for the generic mbuf layout
- reserve the space for the HW mbuf write
- on packet receive, copy the content from the HW mbuf space to the generic
  mbuf layout (space and additional cache misses for each packet, bad :-( )

So it's critical to abstract the mbuf to support such HW-capable NICs.
The application should be interested in the fields of the mbuf, not the
actual layout. Maybe we can take this up with the external mempool manager.

>
> >
> > > Another thing - now we have __rte_cache_aligned all over the places, and I don't know is to double
> > > sizes of all these structures is a good idea.
> >
> > I thought so, the only concern I have, what if, the struct split to 64
> > and one cache line is shared between two core/two different structs which have
> > the different type of operation(most likely). One extensive write and other one
> > read, The write makes the line dirty start evicting and read core is
> > going to suffer. Any thoughts?
> >
> > If its tradeoff between amount memory and performance, I think, it makes sense
> > to stick the performance in data plane, Hence split option may be not useful?
> > right?
>
> I understand that for most cases you would like to have your data structures CL aligned -
> to avoid performance penalties.
> I just suggest to have RTE_CACHE_MIN_LINE_SIZE(==64) for few cases when it might be plausible.
> As an example:
> struct rte_mbuf {
> 	...
> 	MARKER cacheline1 __rte_cache_min_aligned;
> 	...
> } _rte_cache_aligned;

I agree (as noted in the last email also). I will send the next revision based on this.
But for the KNI mbuf definition and the bitmap change we need to have some #define,
so I have proposed a scheme in the last email (see below) [1]. Any thoughts?

>
> So we would have mbuf with the same size and layout, but different alignment for IA and ARM.
>
> Another example, where RTE_CACHE_MIN_LINE_SIZE  could be used:
> struct rte_eth_(rxq|txq)_info.
> There is no real need to have them 128B aligned for ARM.
> The main purpose why they were defined as '__rte_cache_aligned' -
> just to reserve some space for future expansion.

makes sense

>
> Konstantin
>
> >
> >
> > > Again,  #if RTE_CACHE_LINE_SIZE == 64 ... all over the places looks a bit clumsy.
> > > Wonder can we have __rte_cache_aligned/ RTE_CACHE_LINE_SIZE architecture specific,
> >
> > I think, its architecture specific now
> >
> > > but introduce RTE_CACHE_MIN_LINE_SIZE(==64)/ __rte_cache_min_aligned and used it for mbuf
> > > (and might be other places).
> >
> > Yes, it will help in this specific mbuf case and it make sense
> > if mbuf going to stay within 2 x 64 CL.
> >
> > AND/OR
> >
> > can we introduce something like below to reduce the clutter in
> > other places, macro name is just not correct, trying to share the view
> >
> > #define rte_cacheline_diff(for_64, for_128)\
> > do {\
> > #if RTE_CACHE_LINE_SIZE == 64\
> > for_64;
> > #elif RTE_CACHE_LINE_SIZE == 128\
> > for_128;\
> > #endif
> >
> > OR
> >
> > Typedef struct rte_mbuf to new abstract type and define for 64 bytes and
> > 128 byte

[1] See the proposals listed above.

> >
> > Jerin
> >
> > > Konstantin
> > >
> > >


* Re: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
  2015-12-08 17:49         ` Jerin Jacob
@ 2015-12-09 13:44           ` Ananyev, Konstantin
  2015-12-09 14:49             ` Jerin Jacob
  0 siblings, 1 reply; 40+ messages in thread
From: Ananyev, Konstantin @ 2015-12-09 13:44 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev


Hi Jerin,

> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, December 08, 2015 5:49 PM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org; thomas.monjalon@6wind.com; Richardson, Bruce; olivier.matz@6wind.com; Dumitrescu, Cristian
> Subject: Re: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
> 
> On Tue, Dec 08, 2015 at 04:07:46PM +0000, Ananyev, Konstantin wrote:
> > >
> > > Hi Konstantin,
> > >
> > > > Hi Jerin,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > > Sent: Sunday, December 06, 2015 3:59 PM
> > > > > To: dev@dpdk.org
> > > > > Cc: thomas.monjalon@6wind.com; Richardson, Bruce; olivier.matz@6wind.com; Dumitrescu, Cristian; Ananyev, Konstantin;
> Jerin
> > > > > Jacob
> > > > > Subject: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
> > > > >
> > > > > No need to split mbuf structure to two cache lines for 128-byte cache line
> > > > > size targets as it can fit on a single 128-byte cache line.
> > > > >
> > > > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > ---
> > > > >  app/test/test_mbuf.c                                          | 4 ++++
> > > > >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++++
> > > > >  lib/librte_mbuf/rte_mbuf.h                                    | 2 ++
> > > > >  3 files changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> > > > > index b32bef6..5e21075 100644
> > > > > --- a/app/test/test_mbuf.c
> > > > > +++ b/app/test/test_mbuf.c
> > > > > @@ -930,7 +930,11 @@ test_failing_mbuf_sanity_check(void)
> > > > >  static int
> > > > >  test_mbuf(void)
> > > > >  {
> > > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > > >  	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
> > > > > +#elif RTE_CACHE_LINE_SIZE == 128
> > > > > +	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE);
> > > > > +#endif
> > > > >
> > > > >  	/* create pktmbuf pool if it does not exist */
> > > > >  	if (pktmbuf_pool == NULL) {
> > > > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-
> > > > > env/rte_kni_common.h
> > > > > index bd1cc09..e724af7 100644
> > > > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > > > @@ -121,8 +121,12 @@ struct rte_kni_mbuf {
> > > > >  	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
> > > > >  	uint16_t data_len;      /**< Amount of data in segment buffer. */
> > > > >
> > > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > > >  	/* fields on second cache line */
> > > > >  	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> > > > > +#elif RTE_CACHE_LINE_SIZE == 128
> > > > > +	char pad3[24];
> > > > > +#endif
> > > > >  	void *pool;
> > > > >  	void *next;
> > > > >  };
> > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > > index f234ac9..0bf55e0 100644
> > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > @@ -813,8 +813,10 @@ struct rte_mbuf {
> > > > >
> > > > >  	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
> > > > >
> > > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > > >  	/* second cache line - fields only used in slow path or on TX */
> > > > >  	MARKER cacheline1 __rte_cache_aligned;
> > > > > +#endif
> > > >
> > > > I suppose you'll need to keep same space reserved for first 64B even on systems with 128B cache-line.
> > > > Otherwise we can endup with different mbuf format for systems with 128B cache-line.
> > >
> > > Just to understand, Is there any issue in mbuf format being different
> > > across the systems. I think, we are not sending the mbuf over the wire
> > > or sharing with different system etc. right?
> >
> > No, we don't have to support that.
> > At least I am not aware about such cases.
> >
> > >
> > > Yes, I do understand the KNI dependency with mbuf.
> >
> > Are you asking about what will be broken (except KNI) if mbuf layout IA and ARM would be different?
> > Probably nothing right now, except vector RX/TX.
> > But they are not supported on ARM anyway, and if someone will implement them in future, it
> > might be completely different from IA one.
> > It just seems wrong to me to have different mbuf layout for each architecture.
> 
> It's not architecture specific, it's machine and PMD specific.
> Typical ARM machines are 64-bytes CL but ThunderX and Power8  have
> 128-byte CL.

Ok, didn't know that. 
Thanks for clarification.

> 
> It's PMD specific also, There are some NIC's which can write application
> interested fields before the packet in DDR, typically one CL size is dedicated
> for that.
> So there is an overhead to emulate the standard mbuf layout(Which the
> application shouldn't care about) i.e
> - reserve the space for generic mbuf layout
> - reserve the space for HW mbuf write
> - on packet receive, copy the content from HW mbuf space to generic
>   buf layout(space and additional cache misses for each packet, bad :-( )

Each NIC model has a different format of HW descriptors.
That's what each PMD has to do: read the information from the HW-specific layout,
interpret it and fill the mbuf.
I suppose that's the price all of us have to pay.
Otherwise it would mean that a DPDK app would be able to work with only one
NIC model, and you'd have to rebuild your app each time the underlying HW
is going to change.
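For example, an RX fill step does roughly this (all type and field names below
are made up for illustration; only the rte_mbuf fields are real):

static inline void
rx_fill_one(struct my_rxq *rxq, uint32_t idx)
{
	struct my_hw_desc *d = &rxq->ring[idx];	/* NIC-specific layout */
	struct rte_mbuf *m = rxq->sw_ring[idx];

	m->data_len = d->len;			/* translate into common mbuf */
	m->pkt_len = d->len;
	m->ol_flags = my_hw_status_to_ol_flags(d->status);
}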

> 
> So, It's critical to abstract the mbuf to support such HW capable NICs.
> The application should be interested in the fields of mbuf, not the
> actual layout.Maybe we can take up this with external mem pool manager.
> 
> >
> > >
> > > > Another thing - now we have __rte_cache_aligned all over the places, and I don't know is to double
> > > > sizes of all these structures is a good idea.
> > >
> > > I thought so, the only concern I have, what if, the struct split to 64
> > > and one cache line is shared between two core/two different structs which have
> > > the different type of operation(most likely). One extensive write and other one
> > > read, The write makes the line dirty start evicting and read core is
> > > going to suffer. Any thoughts?
> > >
> > > If its tradeoff between amount memory and performance, I think, it makes sense
> > > to stick the performance in data plane, Hence split option may be not useful?
> > > right?
> >
> > I understand that for most cases you would like to have your data structures CL aligned -
> > to avoid performance penalties.
> > I just suggest to have RTE_CACHE_MIN_LINE_SIZE(==64) for few cases when it might be plausible.
> > As an example:
> > struct rte_mbuf {
> > 	...
> > 	MARKER cacheline1 __rte_cache_min_aligned;
> > 	...
> > } _rte_cache_aligned;
> 
> I agree(in last email also). I will send next revision based on this,

Ok.

> But kni muf definition, bitmap change we need to have some #define,
> so I have proposed some scheme in the last email(See below)[1]. Any thoughts?

Probably I am missing something, but with RTE_CACHE_MIN_LINE_SIZE,
and keeping mbuf layout intact, why do we need:
#if RTE_CACHE_LINE_SIZE == 64\
for_64;
#elif RTE_CACHE_LINE_SIZE == 128\
for_128;\
#endif  
for rte_mbuf.h and friends at all?

Inside kni_common.h, we can change:
- char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
+ char pad3[8] __attribute__((__aligned__(RTE_CACHE_MIN_LINE_SIZE)));
To keep it in sync with rte_mbuf.h

Inside test_mbuf.c: 
- RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
+ RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_MIN_LINE_SIZE * 2);

For rte_bitmap.h and similar stuff - if we have
CONFIG_RTE_CACHE_LINE_SIZE_LOG2 defined in the config file,
and make RTE_CACHE_LINE_SIZE derived from it,
would that fix such problems?
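A sketch of that derivation (assuming the config file exports the log2 value):

/* e.g. in rte_memory.h, derived from CONFIG_RTE_CACHE_LINE_SIZE_LOG2 */
#define RTE_CACHE_LINE_SIZE (1 << RTE_CACHE_LINE_SIZE_LOG2)

/* rte_bitmap.h could then derive its constant too (bits = bytes * 8) */
#define RTE_BITMAP_CL_BIT_SIZE_LOG2 (RTE_CACHE_LINE_SIZE_LOG2 + 3)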

Konstantin


> 
> >
> > So we would have mbuf with the same size and layout, but different alignment for IA and ARM.
> >
> > Another example, where RTE_CACHE_MIN_LINE_SIZE  could be used:
> > struct rte_eth_(rxq|txq)_info.
> > There is no real need to have them 128B aligned for ARM.
> > The main purpose why they were defined as '__rte_cache_aligned' -
> > just to reserve some space for future expansion.
> 
> makes sense
> 
> >
> > Konstantin
> >
> > >
> > >
> > > > Again,  #if RTE_CACHE_LINE_SIZE == 64 ... all over the places looks a bit clumsy.
> > > > Wonder can we have __rte_cache_aligned/ RTE_CACHE_LINE_SIZE architecture specific,
> > >
> > > I think, its architecture specific now
> > >
> > > > but introduce RTE_CACHE_MIN_LINE_SIZE(==64)/ __rte_cache_min_aligned and used it for mbuf
> > > > (and might be other places).
> > >
> > > Yes, it will help in this specific mbuf case and it make sense
> > > if mbuf going to stay within 2 x 64 CL.
> > >
> > > AND/OR
> > >
> > > can we introduce something like below to reduce the clutter in
> > > other places, macro name is just not correct, trying to share the view
> > >
> > > #define rte_cacheline_diff(for_64, for_128)\
> > > do {\
> > > #if RTE_CACHE_LINE_SIZE == 64\
> > > for_64;
> > > #elif RTE_CACHE_LINE_SIZE == 128\
> > > for_128;\
> > > #endif
> > >
> > > OR
> > >
> > > Typedef struct rte_mbuf to new abstract type and define for 64 bytes and
> > > 128 byte
> 
> [1] some proposals list.
> 
> > >
> > > Jerin
> > >
> > > > Konstantin
> > > >
> > > >


* Re: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
  2015-12-09 13:44           ` Ananyev, Konstantin
@ 2015-12-09 14:49             ` Jerin Jacob
  0 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-09 14:49 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Wed, Dec 09, 2015 at 01:44:44PM +0000, Ananyev, Konstantin wrote:

Hi Konstantin,

>
> Hi Jerin,
>
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Tuesday, December 08, 2015 5:49 PM
> > To: Ananyev, Konstantin
> > Cc: dev@dpdk.org; thomas.monjalon@6wind.com; Richardson, Bruce; olivier.matz@6wind.com; Dumitrescu, Cristian
> > Subject: Re: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
> >
> > On Tue, Dec 08, 2015 at 04:07:46PM +0000, Ananyev, Konstantin wrote:
> > > >
> > > > Hi Konstantin,
> > > >
> > > > > Hi Jerin,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > > > Sent: Sunday, December 06, 2015 3:59 PM
> > > > > > To: dev@dpdk.org
> > > > > > Cc: thomas.monjalon@6wind.com; Richardson, Bruce; olivier.matz@6wind.com; Dumitrescu, Cristian; Ananyev, Konstantin;
> > Jerin
> > > > > > Jacob
> > > > > > Subject: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
> > > > > >
> > > > > > No need to split mbuf structure to two cache lines for 128-byte cache line
> > > > > > size targets as it can fit on a single 128-byte cache line.
> > > > > >
> > > > > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > > ---
> > > > > >  app/test/test_mbuf.c                                          | 4 ++++
> > > > > >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++++
> > > > > >  lib/librte_mbuf/rte_mbuf.h                                    | 2 ++
> > > > > >  3 files changed, 10 insertions(+)
> > > > > >
> > > > > > diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> > > > > > index b32bef6..5e21075 100644
> > > > > > --- a/app/test/test_mbuf.c
> > > > > > +++ b/app/test/test_mbuf.c
> > > > > > @@ -930,7 +930,11 @@ test_failing_mbuf_sanity_check(void)
> > > > > >  static int
> > > > > >  test_mbuf(void)
> > > > > >  {
> > > > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > > > >  	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
> > > > > > +#elif RTE_CACHE_LINE_SIZE == 128
> > > > > > +	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE);
> > > > > > +#endif
> > > > > >
> > > > > >  	/* create pktmbuf pool if it does not exist */
> > > > > >  	if (pktmbuf_pool == NULL) {
> > > > > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-
> > > > > > env/rte_kni_common.h
> > > > > > index bd1cc09..e724af7 100644
> > > > > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > > > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > > > > @@ -121,8 +121,12 @@ struct rte_kni_mbuf {
> > > > > >  	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
> > > > > >  	uint16_t data_len;      /**< Amount of data in segment buffer. */
> > > > > >
> > > > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > > > >  	/* fields on second cache line */
> > > > > >  	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> > > > > > +#elif RTE_CACHE_LINE_SIZE == 128
> > > > > > +	char pad3[24];
> > > > > > +#endif
> > > > > >  	void *pool;
> > > > > >  	void *next;
> > > > > >  };
> > > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > > > index f234ac9..0bf55e0 100644
> > > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > @@ -813,8 +813,10 @@ struct rte_mbuf {
> > > > > >
> > > > > >  	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
> > > > > >
> > > > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > > > >  	/* second cache line - fields only used in slow path or on TX */
> > > > > >  	MARKER cacheline1 __rte_cache_aligned;
> > > > > > +#endif
> > > > >
> > > > > I suppose you'll need to keep same space reserved for first 64B even on systems with 128B cache-line.
> > > > > Otherwise we can endup with different mbuf format for systems with 128B cache-line.
> > > >
> > > > Just to understand, Is there any issue in mbuf format being different
> > > > across the systems. I think, we are not sending the mbuf over the wire
> > > > or sharing with different system etc. right?
> > >
> > > No, we don't have to support that.
> > > At least I am not aware about such cases.
> > >
> > > >
> > > > Yes, I do understand the KNI dependency with mbuf.
> > >
> > > Are you asking about what will be broken (except KNI) if mbuf layout IA and ARM would be different?
> > > Probably nothing right now, except vector RX/TX.
> > > But they are not supported on ARM anyway, and if someone will implement them in future, it
> > > might be completely different from IA one.
> > > It just seems wrong to me to have different mbuf layout for each architecture.
> >
> > It's not architecture specific, it's machine and PMD specific.
> > Typical ARM machines are 64-bytes CL but ThunderX and Power8  have
> > 128-byte CL.
>
> Ok, didn't know that.
> Thanks for clarification.
>
> >
> > It's PMD specific also, There are some NIC's which can write application
> > interested fields before the packet in DDR, typically one CL size is dedicated
> > for that.
> > So there is an overhead to emulate the standard mbuf layout(Which the
> > application shouldn't care about) i.e
> > - reserve the space for generic mbuf layout
> > - reserve the space for HW mbuf write
> > - on packet receive, copy the content from HW mbuf space to generic
> >   buf layout(space and additional cache misses for each packet, bad :-( )
>
> Each different NIC model has different format of HW descriptors
> That's what each PMD have to do: read information from HW specific layout,
> interpret it and fill mbuf.
> I suppose that's the price all of us have to pay.
> Otherwise it would mean that DPDK app would be able to work only with one
> NIC model and if you'll have to rebuild your app each time the underlying HW
> Is going to change.

Yes, I understood your view.
But to accommodate Embedded NW hardware, or some enterprise NW version of HW
derived from the Embedded version, it is going to be VERY expensive.
I know of certain HW that can write the complete mbuf info, with chaining also
done in HW (== libmbuf/libmempool-kind libraries are not required at all in the
fast path).

So I am just thinking: to support all the PMD devices at runtime instead of at
compilation time (the issue you have correctly pointed out), why can't we "attach"
the operations of libmbuf/libmempool (like getting the flags, putting and getting
the buffers, etc.) to a PMD as function pointers?

And if a given platform/PMD doesn't have HW support, then it can use the
software-based libmbuf/libmempool attached to the PMD driver, and if it has
HW support, then the specific PMD can override those function pointers.

The application then just uses the function pointers associated with the PMD to
support multiple PMDs at runtime, without losing the HW capability for
pool and mbuf management.

I think that will improve DPDK's capability to deal with the many
varieties of NW HW (Enterprise, Embedded or Hybrid).
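A rough sketch of the kind of indirection I mean (all names below are made up
to illustrate the idea):

struct rte_buf_mgr_ops {
	/* SW default wraps librte_mbuf/librte_mempool; a HW-capable
	 * PMD overrides these with its own routines */
	struct rte_mbuf *(*buf_get)(void *pool);
	void (*buf_put)(void *pool, struct rte_mbuf *m);
	uint64_t (*get_flags)(const struct rte_mbuf *m);
};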

>
> >
> > So, it's critical to abstract the mbuf to support such HW-capable NICs.
> > The application should be interested in the fields of the mbuf, not the
> > actual layout. Maybe we can take this up with the external mempool manager.
> >
> > >
> > > >
> > > > > Another thing - now we have __rte_cache_aligned all over the place, and I don't know whether
> > > > > doubling the sizes of all these structures is a good idea.
> > > >
> > > > I thought so. The only concern I have: what if the struct is split at 64B
> > > > and one cache line is shared between two cores / two different structs which
> > > > have different types of operation (most likely) - one doing extensive writes,
> > > > the other reads. The writes make the line dirty and start evicting it, and
> > > > the reading core is going to suffer. Any thoughts?
> > > >
> > > > If it's a tradeoff between the amount of memory and performance, I think it
> > > > makes sense to stick with performance in the data plane. Hence the split
> > > > option may not be useful, right?
> > >
> > > I understand that in most cases you would like to have your data structures CL aligned -
> > > to avoid performance penalties.
> > > I just suggest having RTE_CACHE_MIN_LINE_SIZE (==64) for the few cases where it might be plausible.
> > > As an example:
> > > struct rte_mbuf {
> > > 	...
> > > 	MARKER cacheline1 __rte_cache_min_aligned;
> > > 	...
> > > } _rte_cache_aligned;
> >
> > I agree (as in the last email). I will send the next revision based on this.
>
> Ok.
>
> > But for the KNI mbuf definition and the bitmap change we need to have some #define,
> > so I have proposed a scheme in the last email (see below) [1]. Any thoughts?
>
> Probably I am missing something, but with RTE_CACHE_MIN_LINE_SIZE,
> and keeping the mbuf layout intact, why do we need:
> #if RTE_CACHE_LINE_SIZE == 64\
> for_64;\
> #elif RTE_CACHE_LINE_SIZE == 128\
> for_128;\
> #endif
> for rte_mbuf.h and friends at all?

Not required; I was trying a macro for the common code. I think we can live
without it; that's neater too. Will address it in the next version.

>
> Inside kni_common.h, we can change:
> - char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> + char pad3[8] __attribute__((__aligned__(RTE_CACHE_MIN_LINE_SIZE)));
> to keep it in sync with rte_mbuf.h.
>
> Inside test_mbuf.c:
> - RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
> + RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_MIN_LINE_SIZE * 2);
>
> For rte_bitmap.h and similar stuff - if we have
> CONFIG_RTE_CACHE_LINE_SIZE_LOG2 defined in the config file,
> and make RTE_CACHE_LINE_SIZE derived from it,
> would that fix such problems?

Makes sense. Will address it in the next version.
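
Something like this is what I have in mind (the macro/config names here are
assumed, not final):

/* hypothetical config-driven scheme: one log2 value in the config file,
 * everything else derived from it */
#define RTE_CACHE_LINE_SIZE_LOG2 CONFIG_RTE_CACHE_LINE_SIZE_LOG2
#define RTE_CACHE_LINE_SIZE      (1 << RTE_CACHE_LINE_SIZE_LOG2)

/* the rte_bitmap.h constants then stay consistent automatically, e.g.: */
#define RTE_BITMAP_CL_BIT_SIZE_LOG2 (RTE_CACHE_LINE_SIZE_LOG2 + 3)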

>
> Konstantin
>
>
> >
> > >
> > > So we would have mbuf with the same size and layout, but different alignment for IA and ARM.
> > >
> > > Another example where RTE_CACHE_MIN_LINE_SIZE could be used:
> > > struct rte_eth_(rxq|txq)_info.
> > > There is no real need to have them 128B aligned for ARM.
> > > The main purpose of defining them as '__rte_cache_aligned' was
> > > just to reserve some space for future expansion.
> >
> > makes sense
> >
> > >
> > > Konstantin
> > >
> > > >
> > > >
> > > > > Again, #if RTE_CACHE_LINE_SIZE == 64 ... all over the place looks a bit clumsy.
> > > > > I wonder, can we keep __rte_cache_aligned/RTE_CACHE_LINE_SIZE architecture specific,
> > > >
> > > > I think it's architecture specific now
> > > >
> > > > > but introduce RTE_CACHE_MIN_LINE_SIZE (==64)/__rte_cache_min_aligned and use it for the mbuf
> > > > > (and maybe other places).
> > > >
> > > > Yes, it will help in this specific mbuf case, and it makes sense
> > > > if the mbuf is going to stay within 2 x 64B CLs.
> > > >
> > > > AND/OR
> > > >
> > > > can we introduce something like the below to reduce the clutter in
> > > > other places (the macro name is not quite right; just trying to share
> > > > the idea):
> > > > #define rte_cacheline_diff(for_64, for_128)\
> > > > do {\
> > > > #if RTE_CACHE_LINE_SIZE == 64\
> > > > for_64;\
> > > > #elif RTE_CACHE_LINE_SIZE == 128\
> > > > for_128;\
> > > > #endif\
> > > > } while (0)
> > > >
> > > > OR
> > > >
> > > > typedef struct rte_mbuf to a new abstract type and define it separately
> > > > for 64-byte and 128-byte targets.
> >
> > [1] the proposals listed above.
> >
> > > >
> > > > Jerin
> > > >
> > > > > Konstantin
> > > > >
> > > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v2 0/4] fix performance/cache resource issues with 128-byte cache line targets
  2015-12-06 15:59 [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
                   ` (2 preceding siblings ...)
  2015-12-06 16:30 ` [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte " Thomas Monjalon
@ 2015-12-10 16:36 ` Jerin Jacob
  2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 1/4] eal: Introduce new cache macro definitions Jerin Jacob
                     ` (4 more replies)
  3 siblings, 5 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-10 16:36 UTC (permalink / raw)
  To: dev

This patchset fixes performance/cache resource issues with 128-byte cache line targets
found in the mbuf and bitmap DPDK libraries.

Currently, we have two DPDK targets (ThunderX and ppc_64) which are based on
a 128-byte cache line size.

This patchset doesn't introduce any performance degradation
for 64-byte cache line size targets.

v1..v2
- Introduced new cache macro definitions as suggested by Konstantin
- Reduced the cache alignment requirement for 128-byte cache targets in
slow-path data structures to save memory
- Verified that x86 (a 64-byte cache line target) is not impacted by these
changes, by comparing the md5sum of the app/test, app/testpmd, and app/testacl
binaries with and without this patch set

Jerin Jacob (4):
  eal: Introduce new cache macro definitions
  mbuf: fix performance/cache resource issue with 128-byte cache line
    targets
  bitmap: optimize for 128-bytes cache line targets
  cache/slow path: reduce cache align requirement for 128-byte cache
    targets

 app/test/test_mbuf.c                                     |  2 +-
 lib/librte_eal/common/include/rte_memory.h               | 16 ++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_kni_common.h       |  4 +++-
 lib/librte_ether/rte_ethdev.h                            |  4 ++--
 lib/librte_mbuf/rte_mbuf.h                               |  2 +-
 lib/librte_mempool/rte_mempool.h                         |  2 +-
 lib/librte_ring/rte_ring.h                               |  2 +-
 lib/librte_sched/rte_bitmap.h                            | 10 +++++-----
 8 files changed, 30 insertions(+), 12 deletions(-)

--
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v2 1/4] eal: Introduce new cache macro definitions
  2015-12-10 16:36 ` [dpdk-dev] [PATCH v2 0/4] " Jerin Jacob
@ 2015-12-10 16:36   ` Jerin Jacob
  2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets Jerin Jacob
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-10 16:36 UTC (permalink / raw)
  To: dev

- RTE_CACHE_MIN_LINE_SIZE(Supported minimum cache line size)
- __rte_cache_min_aligned(Force minimum cache line alignment)
- RTE_CACHE_LINE_SIZE_LOG2(Express cache line size in terms of log2)

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_eal/common/include/rte_memory.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 9c9e40f..b67a76f 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -77,11 +77,27 @@ enum rte_page_sizes {
 	(RTE_CACHE_LINE_SIZE * ((size + RTE_CACHE_LINE_SIZE - 1) / RTE_CACHE_LINE_SIZE))
 /**< Return the first cache-aligned value greater or equal to size. */
 
+/**< Cache line size in terms of log2 */
+#if RTE_CACHE_LINE_SIZE == 64
+#define RTE_CACHE_LINE_SIZE_LOG2 6
+#elif RTE_CACHE_LINE_SIZE == 128
+#define RTE_CACHE_LINE_SIZE_LOG2 7
+#else
+#error "Unsupported cache line size"
+#endif
+
+#define RTE_CACHE_MIN_LINE_SIZE 64	/**< Minimum Cache line size. */
+
 /**
  * Force alignment to cache line.
  */
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
 
+/**
+ * Force minimum cache line alignment.
+ */
+#define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_MIN_LINE_SIZE)
+
 typedef uint64_t phys_addr_t; /**< Physical address definition. */
 #define RTE_BAD_PHYS_ADDR ((phys_addr_t)-1)
 
-- 
2.1.0
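
As a quick illustration of how the two alignment macros combine (the struct
below is a made-up example, not part of the patch):

#include <stdint.h>
#include <rte_memory.h> /* __rte_cache_aligned, __rte_cache_min_aligned */

struct example {
	uint64_t hot;   /* fast-path fields live in the first 64 bytes */
	/* starts at offset 64 regardless of RTE_CACHE_LINE_SIZE: */
	uint8_t cold[8] __rte_cache_min_aligned;
} __rte_cache_aligned;

/* sizeof(struct example) == 128 on both classes of target: two cache lines
 * on a 64B-CL machine, exactly one on a 128B-CL machine. Had 'cold' been
 * __rte_cache_aligned instead, a 128B-CL target would get a 256-byte
 * struct. */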

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v2 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets
  2015-12-10 16:36 ` [dpdk-dev] [PATCH v2 0/4] " Jerin Jacob
  2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 1/4] eal: Introduce new cache macro definitions Jerin Jacob
@ 2015-12-10 16:36   ` Jerin Jacob
  2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 3/4] bitmap: optimize for 128-bytes " Jerin Jacob
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-10 16:36 UTC (permalink / raw)
  To: dev

No need to split the mbuf structure into two cache lines for 128-byte cache line
size targets, as it can fit on a single 128-byte cache line.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 app/test/test_mbuf.c                                          | 2 +-
 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 +++-
 lib/librte_mbuf/rte_mbuf.h                                    | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index b32bef6..c99b82d 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -930,7 +930,7 @@ test_failing_mbuf_sanity_check(void)
 static int
 test_mbuf(void)
 {
-	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
+	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_MIN_LINE_SIZE * 2);
 
 	/* create pktmbuf pool if it does not exist */
 	if (pktmbuf_pool == NULL) {
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index bd1cc09..be51916 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -72,6 +72,8 @@
 #define RTE_CACHE_LINE_SIZE 64       /**< Cache line size. */
 #endif
 
+#define RTE_CACHE_MIN_LINE_SIZE 64   /**< Minimum Cache line size. */
+
 /*
  * Request id.
  */
@@ -122,7 +124,7 @@ struct rte_kni_mbuf {
 	uint16_t data_len;      /**< Amount of data in segment buffer. */
 
 	/* fields on second cache line */
-	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
+	char pad3[8]  __attribute__((__aligned__(RTE_CACHE_MIN_LINE_SIZE)));
 	void *pool;
 	void *next;
 };
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..c973e9b 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -814,7 +814,7 @@ struct rte_mbuf {
 	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
 
 	/* second cache line - fields only used in slow path or on TX */
-	MARKER cacheline1 __rte_cache_aligned;
+	MARKER cacheline1 __rte_cache_min_aligned;
 
 	union {
 		void *userdata;   /**< Can be used for external metadata */
-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v2 3/4] bitmap: optimize for 128-bytes cache line targets
  2015-12-10 16:36 ` [dpdk-dev] [PATCH v2 0/4] " Jerin Jacob
  2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 1/4] eal: Introduce new cache macro definitions Jerin Jacob
  2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets Jerin Jacob
@ 2015-12-10 16:36   ` Jerin Jacob
  2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets Jerin Jacob
  2015-12-14  4:32   ` [dpdk-dev] [PATCH v3 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  4 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-10 16:36 UTC (permalink / raw)
  To: dev

The existing rte_bitmap library implementation is optimally configured to run on
64-byte cache lines; extend it to 128-byte cache line targets.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_sched/rte_bitmap.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_sched/rte_bitmap.h b/lib/librte_sched/rte_bitmap.h
index 3d911e4..03f332f 100644
--- a/lib/librte_sched/rte_bitmap.h
+++ b/lib/librte_sched/rte_bitmap.h
@@ -45,11 +45,11 @@ extern "C" {
  * The bitmap component provides a mechanism to manage large arrays of bits
  * through bit get/set/clear and bit array scan operations.
  *
- * The bitmap scan operation is optimized for 64-bit CPUs using 64-byte cache
+ * The bitmap scan operation is optimized for 64-bit CPUs using 64/128 byte cache
  * lines. The bitmap is hierarchically organized using two arrays (array1 and
  * array2), with each bit in array1 being associated with a full cache line
- * (512 bits) of bitmap bits, which are stored in array2: the bit in array1 is
- * set only when there is at least one bit set within its associated array2
+ * (512/1024 bits) of bitmap bits, which are stored in array2: the bit in array1
+ * is set only when there is at least one bit set within its associated array2
  * bits, otherwise the bit in array1 is cleared. The read and write operations
  * for array1 and array2 are always done in slabs of 64 bits.
  *
@@ -81,11 +81,11 @@ extern "C" {
 
 /* Cache line (CL) */
 #define RTE_BITMAP_CL_BIT_SIZE                   (RTE_CACHE_LINE_SIZE * 8)
-#define RTE_BITMAP_CL_BIT_SIZE_LOG2              9
+#define RTE_BITMAP_CL_BIT_SIZE_LOG2              (RTE_CACHE_LINE_SIZE_LOG2 + 3)
 #define RTE_BITMAP_CL_BIT_MASK                   (RTE_BITMAP_CL_BIT_SIZE - 1)
 
 #define RTE_BITMAP_CL_SLAB_SIZE                  (RTE_BITMAP_CL_BIT_SIZE / RTE_BITMAP_SLAB_BIT_SIZE)
-#define RTE_BITMAP_CL_SLAB_SIZE_LOG2             3
+#define RTE_BITMAP_CL_SLAB_SIZE_LOG2             (RTE_BITMAP_CL_BIT_SIZE_LOG2 - RTE_BITMAP_SLAB_BIT_SIZE_LOG2)
 #define RTE_BITMAP_CL_SLAB_MASK                  (RTE_BITMAP_CL_SLAB_SIZE - 1)
 
 /** Bitmap data structure */
-- 
2.1.0
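
A quick sanity check of the derived constants (arithmetic only, assuming the
usual RTE_BITMAP_SLAB_BIT_SIZE_LOG2 == 6, i.e. 64-bit slabs):

 64B CL:  CL_BIT_SIZE = 512,  CL_BIT_SIZE_LOG2 = 6 + 3 = 9,
          CL_SLAB_SIZE = 8,   CL_SLAB_SIZE_LOG2 = 9 - 6 = 3
          (identical to the old hard-coded 9 and 3)
128B CL:  CL_BIT_SIZE = 1024, CL_BIT_SIZE_LOG2 = 7 + 3 = 10,
          CL_SLAB_SIZE = 16,  CL_SLAB_SIZE_LOG2 = 10 - 6 = 4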

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets
  2015-12-10 16:36 ` [dpdk-dev] [PATCH v2 0/4] " Jerin Jacob
                     ` (2 preceding siblings ...)
  2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 3/4] bitmap: optimize for 128-bytes " Jerin Jacob
@ 2015-12-10 16:36   ` Jerin Jacob
  2015-12-11 12:55     ` Ananyev, Konstantin
  2015-12-14  4:32   ` [dpdk-dev] [PATCH v3 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  4 siblings, 1 reply; 40+ messages in thread
From: Jerin Jacob @ 2015-12-10 16:36 UTC (permalink / raw)
  To: dev

Slow-path data structures need not be 128-byte cache aligned.
Reduce the alignment to 64 bytes to save memory.

No behavior change for 64-byte cache aligned systems, as the minimum
cache line size is 64.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_ether/rte_ethdev.h    | 4 ++--
 lib/librte_mempool/rte_mempool.h | 2 +-
 lib/librte_ring/rte_ring.h       | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bada8ad..4dbf73b 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -863,7 +863,7 @@ struct rte_eth_rxq_info {
 	struct rte_eth_rxconf conf; /**< queue config parameters. */
 	uint8_t scattered_rx;       /**< scattered packets RX supported. */
 	uint16_t nb_desc;           /**< configured number of RXDs. */
-} __rte_cache_aligned;
+} __rte_cache_min_aligned;
 
 /**
  * Ethernet device TX queue information structure.
@@ -872,7 +872,7 @@ struct rte_eth_rxq_info {
 struct rte_eth_txq_info {
 	struct rte_eth_txconf conf; /**< queue config parameters. */
 	uint16_t nb_desc;           /**< configured number of TXDs. */
-} __rte_cache_aligned;
+} __rte_cache_min_aligned;
 
 /** Maximum name length for extended statistics counters */
 #define RTE_ETH_XSTATS_NAME_SIZE 64
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 6e2390a..8e5d10c 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -92,7 +92,7 @@ struct rte_mempool_debug_stats {
 	uint64_t get_success_objs; /**< Objects successfully allocated. */
 	uint64_t get_fail_bulk;    /**< Failed allocation number. */
 	uint64_t get_fail_objs;    /**< Objects that failed to be allocated. */
-} __rte_cache_aligned;
+} __rte_cache_min_aligned;
 #endif
 
 #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index de036ce..33166aa 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -123,7 +123,7 @@ struct rte_ring_debug_stats {
 	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
 	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
 	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
-} __rte_cache_aligned;
+} __rte_cache_min_aligned;
 #endif
 
 #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets
  2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets Jerin Jacob
@ 2015-12-11 12:55     ` Ananyev, Konstantin
  2015-12-11 13:07       ` Thomas Monjalon
  2015-12-11 13:56       ` Jerin Jacob
  0 siblings, 2 replies; 40+ messages in thread
From: Ananyev, Konstantin @ 2015-12-11 12:55 UTC (permalink / raw)
  To: Jerin Jacob, dev

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Thursday, December 10, 2015 4:36 PM
> To: dev@dpdk.org
> Cc: thomas.monjalon@6wind.com; Ananyev, Konstantin; viktorin@rehivetech.com; jianbo.liu@linaro.org; Jerin Jacob
> Subject: [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets
> 
> Slow-path data structures need not be 128-byte cache aligned.
> Reduce the alignment to 64 bytes to save memory.
>
> No behavior change for 64-byte cache aligned systems, as the minimum
> cache line size is 64.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_ether/rte_ethdev.h    | 4 ++--
>  lib/librte_mempool/rte_mempool.h | 2 +-
>  lib/librte_ring/rte_ring.h       | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index bada8ad..4dbf73b 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -863,7 +863,7 @@ struct rte_eth_rxq_info {
>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>  	uint16_t nb_desc;           /**< configured number of RXDs. */
> -} __rte_cache_aligned;
> +} __rte_cache_min_aligned;
> 
>  /**
>   * Ethernet device TX queue information structure.
> @@ -872,7 +872,7 @@ struct rte_eth_rxq_info {
>  struct rte_eth_txq_info {
>  	struct rte_eth_txconf conf; /**< queue config parameters. */
>  	uint16_t nb_desc;           /**< configured number of TXDs. */
> -} __rte_cache_aligned;
> +} __rte_cache_min_aligned;
> 
>  /** Maximum name length for extended statistics counters */
>  #define RTE_ETH_XSTATS_NAME_SIZE 64
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 6e2390a..8e5d10c 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -92,7 +92,7 @@ struct rte_mempool_debug_stats {
>  	uint64_t get_success_objs; /**< Objects successfully allocated. */
>  	uint64_t get_fail_bulk;    /**< Failed allocation number. */
>  	uint64_t get_fail_objs;    /**< Objects that failed to be allocated. */
> -} __rte_cache_aligned;
> +} __rte_cache_min_aligned;
>  #endif
> 
>  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index de036ce..33166aa 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -123,7 +123,7 @@ struct rte_ring_debug_stats {
>  	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
>  	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
>  	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
> -} __rte_cache_aligned;
> +} __rte_cache_min_aligned;
>  #endif

I think we had better keep both struct rte_ring_debug_stats and rte_mempool_debug_stats
as __rte_cache_aligned.
Both are on a per-core basis and can be used at run-time
(when RTE_LIBRTE_RING_DEBUG/RTE_LIBRTE_MEMPOOL_DEBUG=y),
and are not supposed to be shared by different cores.
All other things in the series look good to me.
BTW, for some reason I can't find this series in patchwork.
Konstantin
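
The concern above is classic false sharing; a minimal sketch of why per-core
counters want full target-cache-line alignment (stand-in names and a fixed
128B line are used here so the snippet is self-contained):

#include <stdint.h>

#define MAX_LCORE 128                                /* stand-in for RTE_MAX_LCORE */
#define CACHE_ALIGNED __attribute__((aligned(128)))  /* 128B-CL target */

struct dbg_stats {
	uint64_t success;
	uint64_t fail;
} CACHE_ALIGNED;                    /* each core's entry owns a full line */

struct dbg_stats stats[MAX_LCORE];

/* With only 64B alignment on a 128B-CL target, stats[n] and stats[n+1]
 * could share one 128B line, and every counter update on one core would
 * invalidate that line in the other core's cache. */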


> 
>  #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
> --
> 2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets
  2015-12-11 12:55     ` Ananyev, Konstantin
@ 2015-12-11 13:07       ` Thomas Monjalon
  2015-12-11 13:56       ` Jerin Jacob
  1 sibling, 0 replies; 40+ messages in thread
From: Thomas Monjalon @ 2015-12-11 13:07 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

2015-12-11 12:55, Ananyev, Konstantin:
> BTW, for some reason I can't find this series in patchwork.

Because it is deferred:
	http://dpdk.org/dev/patchwork/project/dpdk/list/?state=10

It's too late to do this change in 2.2.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets
  2015-12-11 12:55     ` Ananyev, Konstantin
  2015-12-11 13:07       ` Thomas Monjalon
@ 2015-12-11 13:56       ` Jerin Jacob
  2015-12-11 14:42         ` Ananyev, Konstantin
  1 sibling, 1 reply; 40+ messages in thread
From: Jerin Jacob @ 2015-12-11 13:56 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Fri, Dec 11, 2015 at 12:55:57PM +0000, Ananyev, Konstantin wrote:
> Hi Jerin,
>
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Thursday, December 10, 2015 4:36 PM
> > To: dev@dpdk.org
> > Cc: thomas.monjalon@6wind.com; Ananyev, Konstantin; viktorin@rehivetech.com; jianbo.liu@linaro.org; Jerin Jacob
> > Subject: [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets
> >
> > Slow-path data structures need not be 128-byte cache aligned.
> > Reduce the alignment to 64 bytes to save memory.
> >
> > No behavior change for 64-byte cache aligned systems, as the minimum
> > cache line size is 64.
> >
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> >  lib/librte_ether/rte_ethdev.h    | 4 ++--
> >  lib/librte_mempool/rte_mempool.h | 2 +-
> >  lib/librte_ring/rte_ring.h       | 2 +-
> >  3 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> > index bada8ad..4dbf73b 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -863,7 +863,7 @@ struct rte_eth_rxq_info {
> >  	struct rte_eth_rxconf conf; /**< queue config parameters. */
> >  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
> >  	uint16_t nb_desc;           /**< configured number of RXDs. */
> > -} __rte_cache_aligned;
> > +} __rte_cache_min_aligned;
> >
> >  /**
> >   * Ethernet device TX queue information structure.
> > @@ -872,7 +872,7 @@ struct rte_eth_rxq_info {
> >  struct rte_eth_txq_info {
> >  	struct rte_eth_txconf conf; /**< queue config parameters. */
> >  	uint16_t nb_desc;           /**< configured number of TXDs. */
> > -} __rte_cache_aligned;
> > +} __rte_cache_min_aligned;
> >
> >  /** Maximum name length for extended statistics counters */
> >  #define RTE_ETH_XSTATS_NAME_SIZE 64
> > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> > index 6e2390a..8e5d10c 100644
> > --- a/lib/librte_mempool/rte_mempool.h
> > +++ b/lib/librte_mempool/rte_mempool.h
> > @@ -92,7 +92,7 @@ struct rte_mempool_debug_stats {
> >  	uint64_t get_success_objs; /**< Objects successfully allocated. */
> >  	uint64_t get_fail_bulk;    /**< Failed allocation number. */
> >  	uint64_t get_fail_objs;    /**< Objects that failed to be allocated. */
> > -} __rte_cache_aligned;
> > +} __rte_cache_min_aligned;
> >  #endif
> >
> >  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index de036ce..33166aa 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -123,7 +123,7 @@ struct rte_ring_debug_stats {
> >  	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
> >  	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
> >  	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
> > -} __rte_cache_aligned;
> > +} __rte_cache_min_aligned;
> >  #endif
>
> I think we had better keep both struct rte_ring_debug_stats and rte_mempool_debug_stats
> as __rte_cache_aligned.
> Both are on a per-core basis and can be used at run-time
> (when RTE_LIBRTE_RING_DEBUG/RTE_LIBRTE_MEMPOOL_DEBUG=y),
> and are not supposed to be shared by different cores.
> All other things in the series look good to me.

OK, I will fix the alignment of rte_ring_debug_stats and
rte_mempool_debug_stats and send the next revision with your ACK.

Jerin


> BTW, for some reason I can't find this series in patchwork.
> Konstantin
>
>
> >
> >  #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
> > --
> > 2.1.0
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets
  2015-12-11 13:56       ` Jerin Jacob
@ 2015-12-11 14:42         ` Ananyev, Konstantin
  0 siblings, 0 replies; 40+ messages in thread
From: Ananyev, Konstantin @ 2015-12-11 14:42 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev


> >
> > I think we had better keep both struct rte_ring_debug_stats and rte_mempool_debug_stats
> > as __rte_cache_aligned.
> > Both are on a per-core basis and can be used at run-time
> > (when RTE_LIBRTE_RING_DEBUG/RTE_LIBRTE_MEMPOOL_DEBUG=y),
> > and are not supposed to be shared by different cores.
> > All other things in the series look good to me.
> 
> OK, I will fix the alignment of rte_ring_debug_stats and
> rte_mempool_debug_stats and send the next revision with your ACK.

Works for me.
Konstantin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v3 0/4] fix performance/cache resource issues with 128-byte cache line targets
  2015-12-10 16:36 ` [dpdk-dev] [PATCH v2 0/4] " Jerin Jacob
                     ` (3 preceding siblings ...)
  2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets Jerin Jacob
@ 2015-12-14  4:32   ` Jerin Jacob
  2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 1/4] eal: Introduce new cache macro definitions Jerin Jacob
                       ` (4 more replies)
  4 siblings, 5 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-14  4:32 UTC (permalink / raw)
  To: dev

This patchset fixes performance/cache resource issues with 128-byte cache line targets
found in the mbuf and bitmap DPDK libraries.

Currently, we have two DPDK targets (ThunderX and ppc_64) which are based on
a 128-byte cache line size.

This patchset doesn't introduce any performance degradation
for 64-byte cache line size targets.

v1..v2
- Introduced new cache macro definitions as suggested by Konstantin
- Reduced the cache alignment requirement for 128-byte cache targets in
slow-path data structures to save memory
- Verified that x86 (a 64-byte cache line target) is not impacted by these
changes, by comparing the md5sum of the app/test, app/testpmd, and app/testacl
binaries with and without this patch set

v2..v3

- Reverted the cache alignment of the rte_ring_debug_stats and
rte_mempool_debug_stats structures

For the series,
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Jerin Jacob (4):
  eal: Introduce new cache macro definitions
  mbuf: fix performance/cache resource issue with 128-byte cache line
    targets
  bitmap: optimize for 128-bytes cache line targets
  cache/slow-path: reduce cache align requirement for 128-byte cache
    targets

 app/test/test_mbuf.c                                     |  2 +-
 lib/librte_eal/common/include/rte_memory.h               | 16 ++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_kni_common.h       |  4 +++-
 lib/librte_ether/rte_ethdev.h                            |  4 ++--
 lib/librte_mbuf/rte_mbuf.h                               |  2 +-
 lib/librte_sched/rte_bitmap.h                            | 10 +++++-----
 6 files changed, 28 insertions(+), 10 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v3 1/4] eal: Introduce new cache macro definitions
  2015-12-14  4:32   ` [dpdk-dev] [PATCH v3 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
@ 2015-12-14  4:32     ` Jerin Jacob
  2016-01-04 13:15       ` Olivier MATZ
  2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets Jerin Jacob
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 40+ messages in thread
From: Jerin Jacob @ 2015-12-14  4:32 UTC (permalink / raw)
  To: dev

- RTE_CACHE_MIN_LINE_SIZE(Supported minimum cache line size)
- __rte_cache_min_aligned(Force minimum cache line alignment)
- RTE_CACHE_LINE_SIZE_LOG2(Express cache line size in terms of log2)

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_eal/common/include/rte_memory.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 9c9e40f..b67a76f 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -77,11 +77,27 @@ enum rte_page_sizes {
 	(RTE_CACHE_LINE_SIZE * ((size + RTE_CACHE_LINE_SIZE - 1) / RTE_CACHE_LINE_SIZE))
 /**< Return the first cache-aligned value greater or equal to size. */
 
+/**< Cache line size in terms of log2 */
+#if RTE_CACHE_LINE_SIZE == 64
+#define RTE_CACHE_LINE_SIZE_LOG2 6
+#elif RTE_CACHE_LINE_SIZE == 128
+#define RTE_CACHE_LINE_SIZE_LOG2 7
+#else
+#error "Unsupported cache line size"
+#endif
+
+#define RTE_CACHE_MIN_LINE_SIZE 64	/**< Minimum Cache line size. */
+
 /**
  * Force alignment to cache line.
  */
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
 
+/**
+ * Force minimum cache line alignment.
+ */
+#define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_MIN_LINE_SIZE)
+
 typedef uint64_t phys_addr_t; /**< Physical address definition. */
 #define RTE_BAD_PHYS_ADDR ((phys_addr_t)-1)
 
-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v3 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets
  2015-12-14  4:32   ` [dpdk-dev] [PATCH v3 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 1/4] eal: Introduce new cache macro definitions Jerin Jacob
@ 2015-12-14  4:32     ` Jerin Jacob
  2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 3/4] bitmap: optimize for 128-bytes " Jerin Jacob
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-14  4:32 UTC (permalink / raw)
  To: dev

No need to split the mbuf structure into two cache lines for 128-byte cache line
size targets, as it can fit on a single 128-byte cache line.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 app/test/test_mbuf.c                                          | 2 +-
 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 +++-
 lib/librte_mbuf/rte_mbuf.h                                    | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index b32bef6..c99b82d 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -930,7 +930,7 @@ test_failing_mbuf_sanity_check(void)
 static int
 test_mbuf(void)
 {
-	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
+	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_MIN_LINE_SIZE * 2);
 
 	/* create pktmbuf pool if it does not exist */
 	if (pktmbuf_pool == NULL) {
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index bd1cc09..be51916 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -72,6 +72,8 @@
 #define RTE_CACHE_LINE_SIZE 64       /**< Cache line size. */
 #endif
 
+#define RTE_CACHE_MIN_LINE_SIZE 64   /**< Minimum Cache line size. */
+
 /*
  * Request id.
  */
@@ -122,7 +124,7 @@ struct rte_kni_mbuf {
 	uint16_t data_len;      /**< Amount of data in segment buffer. */
 
 	/* fields on second cache line */
-	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
+	char pad3[8]  __attribute__((__aligned__(RTE_CACHE_MIN_LINE_SIZE)));
 	void *pool;
 	void *next;
 };
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..c973e9b 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -814,7 +814,7 @@ struct rte_mbuf {
 	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
 
 	/* second cache line - fields only used in slow path or on TX */
-	MARKER cacheline1 __rte_cache_aligned;
+	MARKER cacheline1 __rte_cache_min_aligned;
 
 	union {
 		void *userdata;   /**< Can be used for external metadata */
-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v3 3/4] bitmap: optimize for 128-bytes cache line targets
  2015-12-14  4:32   ` [dpdk-dev] [PATCH v3 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 1/4] eal: Introduce new cache macro definitions Jerin Jacob
  2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets Jerin Jacob
@ 2015-12-14  4:32     ` Jerin Jacob
  2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets Jerin Jacob
  2016-01-29  7:45     ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  4 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-14  4:32 UTC (permalink / raw)
  To: dev

The existing rte_bitmap library implementation is optimally configured to run on
64-byte cache lines; extend it to 128-byte cache line targets.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_sched/rte_bitmap.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_sched/rte_bitmap.h b/lib/librte_sched/rte_bitmap.h
index 3d911e4..03f332f 100644
--- a/lib/librte_sched/rte_bitmap.h
+++ b/lib/librte_sched/rte_bitmap.h
@@ -45,11 +45,11 @@ extern "C" {
  * The bitmap component provides a mechanism to manage large arrays of bits
  * through bit get/set/clear and bit array scan operations.
  *
- * The bitmap scan operation is optimized for 64-bit CPUs using 64-byte cache
+ * The bitmap scan operation is optimized for 64-bit CPUs using 64/128 byte cache
  * lines. The bitmap is hierarchically organized using two arrays (array1 and
  * array2), with each bit in array1 being associated with a full cache line
- * (512 bits) of bitmap bits, which are stored in array2: the bit in array1 is
- * set only when there is at least one bit set within its associated array2
+ * (512/1024 bits) of bitmap bits, which are stored in array2: the bit in array1
+ * is set only when there is at least one bit set within its associated array2
  * bits, otherwise the bit in array1 is cleared. The read and write operations
  * for array1 and array2 are always done in slabs of 64 bits.
  *
@@ -81,11 +81,11 @@ extern "C" {
 
 /* Cache line (CL) */
 #define RTE_BITMAP_CL_BIT_SIZE                   (RTE_CACHE_LINE_SIZE * 8)
-#define RTE_BITMAP_CL_BIT_SIZE_LOG2              9
+#define RTE_BITMAP_CL_BIT_SIZE_LOG2              (RTE_CACHE_LINE_SIZE_LOG2 + 3)
 #define RTE_BITMAP_CL_BIT_MASK                   (RTE_BITMAP_CL_BIT_SIZE - 1)
 
 #define RTE_BITMAP_CL_SLAB_SIZE                  (RTE_BITMAP_CL_BIT_SIZE / RTE_BITMAP_SLAB_BIT_SIZE)
-#define RTE_BITMAP_CL_SLAB_SIZE_LOG2             3
+#define RTE_BITMAP_CL_SLAB_SIZE_LOG2             (RTE_BITMAP_CL_BIT_SIZE_LOG2 - RTE_BITMAP_SLAB_BIT_SIZE_LOG2)
 #define RTE_BITMAP_CL_SLAB_MASK                  (RTE_BITMAP_CL_SLAB_SIZE - 1)
 
 /** Bitmap data structure */
-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v3 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets
  2015-12-14  4:32   ` [dpdk-dev] [PATCH v3 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
                       ` (2 preceding siblings ...)
  2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 3/4] bitmap: optimize for 128-bytes " Jerin Jacob
@ 2015-12-14  4:32     ` Jerin Jacob
  2016-01-29  7:45     ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  4 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2015-12-14  4:32 UTC (permalink / raw)
  To: dev

Slow-path data structures need not be 128-byte cache aligned.
Reduce the alignment to 64 bytes to save memory.

No behavior change for 64-byte cache aligned systems, as the minimum
cache line size is 64.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bada8ad..4dbf73b 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -863,7 +863,7 @@ struct rte_eth_rxq_info {
 	struct rte_eth_rxconf conf; /**< queue config parameters. */
 	uint8_t scattered_rx;       /**< scattered packets RX supported. */
 	uint16_t nb_desc;           /**< configured number of RXDs. */
-} __rte_cache_aligned;
+} __rte_cache_min_aligned;
 
 /**
  * Ethernet device TX queue information structure.
@@ -872,7 +872,7 @@ struct rte_eth_rxq_info {
 struct rte_eth_txq_info {
 	struct rte_eth_txconf conf; /**< queue config parameters. */
 	uint16_t nb_desc;           /**< configured number of TXDs. */
-} __rte_cache_aligned;
+} __rte_cache_min_aligned;
 
 /** Maximum name length for extended statistics counters */
 #define RTE_ETH_XSTATS_NAME_SIZE 64
-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/4] eal: Introduce new cache macro definitions
  2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 1/4] eal: Introduce new cache macro definitions Jerin Jacob
@ 2016-01-04 13:15       ` Olivier MATZ
  2016-01-06 15:10         ` Jerin Jacob
  0 siblings, 1 reply; 40+ messages in thread
From: Olivier MATZ @ 2016-01-04 13:15 UTC (permalink / raw)
  To: Jerin Jacob, dev

Hi Jerin,

Please see some comments below.

On 12/14/2015 05:32 AM, Jerin Jacob wrote:
> - RTE_CACHE_MIN_LINE_SIZE(Supported minimum cache line size)
> - __rte_cache_min_aligned(Force minimum cache line alignment)
> - RTE_CACHE_LINE_SIZE_LOG2(Express cache line size in terms of log2)
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_eal/common/include/rte_memory.h | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> index 9c9e40f..b67a76f 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -77,11 +77,27 @@ enum rte_page_sizes {
>  	(RTE_CACHE_LINE_SIZE * ((size + RTE_CACHE_LINE_SIZE - 1) / RTE_CACHE_LINE_SIZE))
>  /**< Return the first cache-aligned value greater or equal to size. */
>  
> +/**< Cache line size in terms of log2 */
> +#if RTE_CACHE_LINE_SIZE == 64
> +#define RTE_CACHE_LINE_SIZE_LOG2 6
> +#elif RTE_CACHE_LINE_SIZE == 128
> +#define RTE_CACHE_LINE_SIZE_LOG2 7
> +#else
> +#error "Unsupported cache line size"
> +#endif
> +
> +#define RTE_CACHE_MIN_LINE_SIZE 64	/**< Minimum Cache line size. */
> +

I think RTE_CACHE_LINE_MIN_SIZE or RTE_MIN_CACHE_LINE_SIZE would
be clearer than RTE_CACHE_MIN_LINE_SIZE.

>  /**
>   * Force alignment to cache line.
>   */
>  #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
>  
> +/**
> + * Force minimum cache line alignment.
> + */
> +#define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_MIN_LINE_SIZE)

I'm not really convinced that it is straightforward for someone reading the
code that __rte_cache_min_aligned means "aligned to the minimum cache line
size supported by the DPDK".

In the two cases you are using this macro (mbuf structure and queue
info), I'm wondering if using __attribute__((aligned(64))) wouldn't be
clearer?
- for mbuf, it could be a local define, like MBUF_ALIGN_SIZE
- for queue info, using 64 makes sense as it's used to reserve space
  for future use

What do you think?


Regards,
Olivier

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/4] eal: Introduce new cache macro definitions
  2016-01-04 13:15       ` Olivier MATZ
@ 2016-01-06 15:10         ` Jerin Jacob
  0 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2016-01-06 15:10 UTC (permalink / raw)
  To: Olivier MATZ; +Cc: dev

On Mon, Jan 04, 2016 at 02:15:53PM +0100, Olivier MATZ wrote:
> Hi Jerin,

Hi Olivier,

> 
> Please see some comments below.
> 
> On 12/14/2015 05:32 AM, Jerin Jacob wrote:
> > - RTE_CACHE_MIN_LINE_SIZE(Supported minimum cache line size)
> > - __rte_cache_min_aligned(Force minimum cache line alignment)
> > - RTE_CACHE_LINE_SIZE_LOG2(Express cache line size in terms of log2)
> > 
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  lib/librte_eal/common/include/rte_memory.h | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> > index 9c9e40f..b67a76f 100644
> > --- a/lib/librte_eal/common/include/rte_memory.h
> > +++ b/lib/librte_eal/common/include/rte_memory.h
> > @@ -77,11 +77,27 @@ enum rte_page_sizes {
> >  	(RTE_CACHE_LINE_SIZE * ((size + RTE_CACHE_LINE_SIZE - 1) / RTE_CACHE_LINE_SIZE))
> >  /**< Return the first cache-aligned value greater or equal to size. */
> >  
> > +/**< Cache line size in terms of log2 */
> > +#if RTE_CACHE_LINE_SIZE == 64
> > +#define RTE_CACHE_LINE_SIZE_LOG2 6
> > +#elif RTE_CACHE_LINE_SIZE == 128
> > +#define RTE_CACHE_LINE_SIZE_LOG2 7
> > +#else
> > +#error "Unsupported cache line size"
> > +#endif
> > +
> > +#define RTE_CACHE_MIN_LINE_SIZE 64	/**< Minimum Cache line size. */
> > +
> 
> I think RTE_CACHE_LINE_MIN_SIZE or RTE_MIN_CACHE_LINE_SIZE would
> be clearer than RTE_CACHE_MIN_LINE_SIZE.

OK. I will resend the next version with RTE_CACHE_LINE_MIN_SIZE

> 
> >  /**
> >   * Force alignment to cache line.
> >   */
> >  #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
> >  
> > +/**
> > + * Force minimum cache line alignment.
> > + */
> > +#define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_MIN_LINE_SIZE)
> 
> I'm not really convinced that it is straightforward for someone reading the
> code that __rte_cache_min_aligned means "aligned to the minimum cache line
> size supported by the DPDK".
> 
> In the two cases you are using this macro (mbuf structure and queue
> info), I'm wondering if using __attribute__((aligned(64))) wouldn't be
> clearer?
> - for mbuf, it could be a local define, like MBUF_ALIGN_SIZE
> - for queue info, using 64 makes sense as it's used to reserve space
>   for future use
> 
> What do you think?

IMO, it makes sense to keep "__rte_cache_min_aligned", as it will be useful
for forcing the minimum alignment requirement on future DPDK
structures.

Thoughts?

> 
> 
> Regards,
> Olivier

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets
  2015-12-14  4:32   ` [dpdk-dev] [PATCH v3 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
                       ` (3 preceding siblings ...)
  2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets Jerin Jacob
@ 2016-01-29  7:45     ` Jerin Jacob
  2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 1/4] eal: Introduce new cache line macro definitions Jerin Jacob
                         ` (5 more replies)
  4 siblings, 6 replies; 40+ messages in thread
From: Jerin Jacob @ 2016-01-29  7:45 UTC (permalink / raw)
  To: dev; +Cc: viktorin

This patchset fixes performance/cache resource issues with 128-byte cache line targets
found in the mbuf and bitmap DPDK libraries.

Currently, we have two DPDK targets (ThunderX and ppc_64) which are based on
a 128-byte cache line size.

This patchset doesn't introduce any performance degradation
for 64-byte cache line size targets.

v1..v2
- Introduced new cache macro definitions as suggested by Konstantin
- Reduced the cache alignment requirement for 128-byte cache targets in
slow-path data structures to save memory
- Verified that x86 (a 64-byte cache line target) is not impacted by these
changes, by comparing the md5sum of the app/test, app/testpmd, and app/testacl
binaries with and without this patch set

v2..v3

- Reverted the cache alignment of the rte_ring_debug_stats and
rte_mempool_debug_stats structures

v3..v4
replaced RTE_CACHE_MIN_LINE_SIZE with RTE_CACHE_LINE_MIN_SIZE as suggested by
olivier.matz@6wind.com

For a clean git am, the "config: cleanup existing RTE_CACHE_LINE_SIZE selection scheme"
patch needs to be applied first

Jerin Jacob (4):
  eal: Introduce new cache line macro definitions
  mbuf: fix performance/cache resource issue with 128-byte cache line
    targets
  bitmap: optimize for 128-bytes cache line targets
  cache/slow-path: reduce cache align requirement for 128-byte cache
    targets

 app/test/test_mbuf.c                                     |  2 +-
 lib/librte_eal/common/include/rte_memory.h               | 16 ++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_kni_common.h       |  4 +++-
 lib/librte_ether/rte_ethdev.h                            |  4 ++--
 lib/librte_mbuf/rte_mbuf.h                               |  2 +-
 lib/librte_sched/rte_bitmap.h                            | 10 +++++-----
 6 files changed, 28 insertions(+), 10 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v4 1/4] eal: Introduce new cache line macro definitions
  2016-01-29  7:45     ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
@ 2016-01-29  7:45       ` Jerin Jacob
  2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets Jerin Jacob
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2016-01-29  7:45 UTC (permalink / raw)
  To: dev; +Cc: viktorin

- RTE_CACHE_LINE_MIN_SIZE(Supported minimum cache line size)
- __rte_cache_min_aligned(Force minimum cache line alignment)
- RTE_CACHE_LINE_SIZE_LOG2(Express cache line size in terms of log2)

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_eal/common/include/rte_memory.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 2200d58..f8dbece 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -74,11 +74,27 @@ enum rte_page_sizes {
 	(RTE_CACHE_LINE_SIZE * ((size + RTE_CACHE_LINE_SIZE - 1) / RTE_CACHE_LINE_SIZE))
 /**< Return the first cache-aligned value greater or equal to size. */
 
+/**< Cache line size in terms of log2 */
+#if RTE_CACHE_LINE_SIZE == 64
+#define RTE_CACHE_LINE_SIZE_LOG2 6
+#elif RTE_CACHE_LINE_SIZE == 128
+#define RTE_CACHE_LINE_SIZE_LOG2 7
+#else
+#error "Unsupported cache line size"
+#endif
+
+#define RTE_CACHE_LINE_MIN_SIZE 64	/**< Minimum Cache line size. */
+
 /**
  * Force alignment to cache line.
  */
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
 
+/**
+ * Force minimum cache line alignment.
+ */
+#define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_LINE_MIN_SIZE)
+
 typedef uint64_t phys_addr_t; /**< Physical address definition. */
 #define RTE_BAD_PHYS_ADDR ((phys_addr_t)-1)
 
-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v4 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets
  2016-01-29  7:45     ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 1/4] eal: Introduce new cache line macro definitions Jerin Jacob
@ 2016-01-29  7:45       ` Jerin Jacob
  2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 3/4] bitmap: optimize for 128-bytes " Jerin Jacob
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2016-01-29  7:45 UTC (permalink / raw)
  To: dev; +Cc: viktorin

No need to split the mbuf structure into two cache lines for 128-byte cache
line size targets, as it can fit on a single 128-byte cache line.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 app/test/test_mbuf.c                                          | 2 +-
 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 +++-
 lib/librte_mbuf/rte_mbuf.h                                    | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index b32bef6..c24cbe0 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -930,7 +930,7 @@ test_failing_mbuf_sanity_check(void)
 static int
 test_mbuf(void)
 {
-	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
+	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_MIN_SIZE * 2);
 
 	/* create pktmbuf pool if it does not exist */
 	if (pktmbuf_pool == NULL) {
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 815abd6..fef914f 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -68,6 +68,8 @@
  */
 #define RTE_KNI_NAMESIZE 32
 
+#define RTE_CACHE_LINE_MIN_SIZE 64
+
 /*
  * Request id.
  */
@@ -118,7 +120,7 @@ struct rte_kni_mbuf {
 	uint16_t data_len;      /**< Amount of data in segment buffer. */
 
 	/* fields on second cache line */
-	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
+	char pad3[8]  __attribute__((__aligned__(RTE_CACHE_LINE_MIN_SIZE)));
 	void *pool;
 	void *next;
 };
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..c973e9b 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -814,7 +814,7 @@ struct rte_mbuf {
 	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
 
 	/* second cache line - fields only used in slow path or on TX */
-	MARKER cacheline1 __rte_cache_aligned;
+	MARKER cacheline1 __rte_cache_min_aligned;
 
 	union {
 		void *userdata;   /**< Can be used for external metadata */
-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v4 3/4] bitmap: optimize for 128-bytes cache line targets
  2016-01-29  7:45     ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 1/4] eal: Introduce new cache line macro definitions Jerin Jacob
  2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets Jerin Jacob
@ 2016-01-29  7:45       ` Jerin Jacob
  2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets Jerin Jacob
                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2016-01-29  7:45 UTC (permalink / raw)
  To: dev; +Cc: viktorin

The existing rte_bitmap library implementation is optimally configured to run on
64-byte cache lines; extend it to 128-byte cache line targets.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_sched/rte_bitmap.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_sched/rte_bitmap.h b/lib/librte_sched/rte_bitmap.h
index 3d911e4..03f332f 100644
--- a/lib/librte_sched/rte_bitmap.h
+++ b/lib/librte_sched/rte_bitmap.h
@@ -45,11 +45,11 @@ extern "C" {
  * The bitmap component provides a mechanism to manage large arrays of bits
  * through bit get/set/clear and bit array scan operations.
  *
- * The bitmap scan operation is optimized for 64-bit CPUs using 64-byte cache
+ * The bitmap scan operation is optimized for 64-bit CPUs using 64/128 byte cache
  * lines. The bitmap is hierarchically organized using two arrays (array1 and
  * array2), with each bit in array1 being associated with a full cache line
- * (512 bits) of bitmap bits, which are stored in array2: the bit in array1 is
- * set only when there is at least one bit set within its associated array2
+ * (512/1024 bits) of bitmap bits, which are stored in array2: the bit in array1
+ * is set only when there is at least one bit set within its associated array2
  * bits, otherwise the bit in array1 is cleared. The read and write operations
  * for array1 and array2 are always done in slabs of 64 bits.
  *
@@ -81,11 +81,11 @@ extern "C" {
 
 /* Cache line (CL) */
 #define RTE_BITMAP_CL_BIT_SIZE                   (RTE_CACHE_LINE_SIZE * 8)
-#define RTE_BITMAP_CL_BIT_SIZE_LOG2              9
+#define RTE_BITMAP_CL_BIT_SIZE_LOG2              (RTE_CACHE_LINE_SIZE_LOG2 + 3)
 #define RTE_BITMAP_CL_BIT_MASK                   (RTE_BITMAP_CL_BIT_SIZE - 1)
 
 #define RTE_BITMAP_CL_SLAB_SIZE                  (RTE_BITMAP_CL_BIT_SIZE / RTE_BITMAP_SLAB_BIT_SIZE)
-#define RTE_BITMAP_CL_SLAB_SIZE_LOG2             3
+#define RTE_BITMAP_CL_SLAB_SIZE_LOG2             (RTE_BITMAP_CL_BIT_SIZE_LOG2 - RTE_BITMAP_SLAB_BIT_SIZE_LOG2)
 #define RTE_BITMAP_CL_SLAB_MASK                  (RTE_BITMAP_CL_SLAB_SIZE - 1)
 
 /** Bitmap data structure */
-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [dpdk-dev] [PATCH v4 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets
  2016-01-29  7:45     ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
                         ` (2 preceding siblings ...)
  2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 3/4] bitmap: optimize for 128-bytes " Jerin Jacob
@ 2016-01-29  7:45       ` Jerin Jacob
  2016-02-08  9:31       ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
  2016-02-11 11:53       ` Thomas Monjalon
  5 siblings, 0 replies; 40+ messages in thread
From: Jerin Jacob @ 2016-01-29  7:45 UTC (permalink / raw)
  To: dev; +Cc: viktorin

Slow-path data structures need not be 128-byte cache aligned. Reduce
the alignment to 64 bytes to save memory.

No behavior change for 64-byte cache line systems, since the minimum
cache line size is 64 bytes.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8710dd7..16da821 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -863,7 +863,7 @@ struct rte_eth_rxq_info {
 	struct rte_eth_rxconf conf; /**< queue config parameters. */
 	uint8_t scattered_rx;       /**< scattered packets RX supported. */
 	uint16_t nb_desc;           /**< configured number of RXDs. */
-} __rte_cache_aligned;
+} __rte_cache_min_aligned;
 
 /**
  * Ethernet device TX queue information structure.
@@ -872,7 +872,7 @@ struct rte_eth_rxq_info {
 struct rte_eth_txq_info {
 	struct rte_eth_txconf conf; /**< queue config parameters. */
 	uint16_t nb_desc;           /**< configured number of TXDs. */
-} __rte_cache_aligned;
+} __rte_cache_min_aligned;
 
 /** Maximum name length for extended statistics counters */
 #define RTE_ETH_XSTATS_NAME_SIZE 64
-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread
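
For context on __rte_cache_min_aligned: a hedged sketch of the alignment
macros this patch relies on, introduced by patch 1/4 of this series (the
exact EAL definitions may differ in detail):

    /* Assumed shapes of the EAL macros. RTE_CACHE_LINE_SIZE is set per
     * target at build time; RTE_CACHE_LINE_MIN_SIZE is the smallest line
     * size among supported DPDK targets. */
    #ifndef RTE_CACHE_LINE_SIZE
    #define RTE_CACHE_LINE_SIZE 128          /* e.g. ThunderX / ppc_64 */
    #endif
    #define RTE_CACHE_LINE_MIN_SIZE 64

    #define __rte_cache_aligned \
            __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)))
    #define __rte_cache_min_aligned \
            __attribute__((__aligned__(RTE_CACHE_LINE_MIN_SIZE)))

    /* Effect: on a 128-byte cache line target, a small struct such as
     * rte_eth_txq_info is padded to 128 bytes under __rte_cache_aligned
     * but only to 64 bytes under __rte_cache_min_aligned. On 64-byte
     * targets both attributes expand identically, hence no change there. */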

* Re: [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets
  2016-01-29  7:45     ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
                         ` (3 preceding siblings ...)
  2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets Jerin Jacob
@ 2016-02-08  9:31       ` Jerin Jacob
  2016-02-08 10:24         ` Jan Viktorin
  2016-02-11 11:53       ` Thomas Monjalon
  5 siblings, 1 reply; 40+ messages in thread
From: Jerin Jacob @ 2016-02-08  9:31 UTC (permalink / raw)
  To: dev; +Cc: viktorin

On Fri, Jan 29, 2016 at 01:15:51PM +0530, Jerin Jacob wrote:
> This patchset fixes performance/cache resource issues with 128-byte cache line targets
> found in the mbuf and bitmap DPDK libraries.
> 
> Currently, we have two DPDK targets (ThunderX and ppc_64) which are based on
> a 128-byte cache line size.
> 
> This patchset doesn't introduce any performance degradation
> for 64-byte cache line size targets.
> 
> v1..v2
> - Introduced new cache macro definitions as suggested by Konstantin
> - Reduced the cache alignment requirement for 128-byte cache targets in
> slow-path data structures to save memory
> - Verified that x86 (a 64-byte cache line target) is not impacted by these
> changes by comparing the md5sum of the app/test, app/testpmd, and app/testacl
> binaries with and without this patch set
> 
> v2..v3
> 
> reverted the cache alignment of the rte_ring_debug_stats and
> rte_mempool_debug_stats structures
> 
> v3..v4
> replaced RTE_CACHE_MIN_LINE_SIZE with RTE_CACHE_LINE_MIN_SIZE as suggested by
> olivier.matz@6wind.com
> 
> For a clean git am, the "config: cleanup existing RTE_CACHE_LINE_SIZE selection scheme"
> patch needs to be applied first
> 
> Jerin Jacob (4):
>   eal: Introduce new cache line macro definitions
>   mbuf: fix performance/cache resource issue with 128-byte cache line
>     targets
>   bitmap: optimize for 128-bytes cache line targets
>   cache/slow-path: reduce cache align requirement for 128-byte cache
>     targets
> 

ping for review/merge

>  app/test/test_mbuf.c                                     |  2 +-
>  lib/librte_eal/common/include/rte_memory.h               | 16 ++++++++++++++++
>  .../linuxapp/eal/include/exec-env/rte_kni_common.h       |  4 +++-
>  lib/librte_ether/rte_ethdev.h                            |  4 ++--
>  lib/librte_mbuf/rte_mbuf.h                               |  2 +-
>  lib/librte_sched/rte_bitmap.h                            | 10 +++++-----
>  6 files changed, 28 insertions(+), 10 deletions(-)
> 
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets
  2016-02-08  9:31       ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
@ 2016-02-08 10:24         ` Jan Viktorin
  2016-02-08 10:52           ` Jerin Jacob
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Viktorin @ 2016-02-08 10:24 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

Hello Jerin,

I failed to apply the v4 series. I've tried multiple bases; v3 applied
successfully on top of v2.2.0-rc4. Could you point me to the proper
working base?

Applying patch #10232 using 'git am'
Description: [dpdk-dev,v4,2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets
Applying: mbuf: fix performance/cache resource issue with 128-byte cache line targets
error: patch failed: lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h:68
error: lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h: patch does not apply
Patch failed at 0001 mbuf: fix performance/cache resource issue with 128-byte cache line targets
The copy of the patch that failed is found in: /home/jviki/upstream/dpdk/.git/worktrees/dpdk-pw/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
'git am' failed with exit status 128
Try to fix git-am manually or exit...

Regards
Jan

On Mon, 8 Feb 2016 15:01:58 +0530
Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

> On Fri, Jan 29, 2016 at 01:15:51PM +0530, Jerin Jacob wrote:
> > [snip]
> 
> ping for review/merge
> 
> > [snip]



-- 
  Jan Viktorin                E-mail: Viktorin@RehiveTech.com
  System Architect            Web:    www.RehiveTech.com
  RehiveTech
  Brno, Czech Republic

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets
  2016-02-08 10:24         ` Jan Viktorin
@ 2016-02-08 10:52           ` Jerin Jacob
  2016-02-08 12:56             ` Jan Viktorin
  0 siblings, 1 reply; 40+ messages in thread
From: Jerin Jacob @ 2016-02-08 10:52 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev

On Mon, Feb 08, 2016 at 11:24:30AM +0100, Jan Viktorin wrote:
> Hello Jerin,
>
> I failed to apply the v4 series. I've tried multiple bases. The v3
> applied successfully to v2.2.0-rc4. Could you give the proper working
> base?

Hello Jan,

Please apply the dependency patch "config: cleanup existing
RTE_CACHE_LINE_SIZE selection scheme" first, as mentioned in the cover
letter.

http://dpdk.org/dev/patchwork/patch/9388/

>
> [snip]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets
  2016-02-08 10:52           ` Jerin Jacob
@ 2016-02-08 12:56             ` Jan Viktorin
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Viktorin @ 2016-02-08 12:56 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

On Mon, 8 Feb 2016 16:22:17 +0530
Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

> On Mon, Feb 08, 2016 at 11:24:30AM +0100, Jan Viktorin wrote:
> > Hello Jerin,
> >
> > I failed to apply the v4 series. I've tried multiple bases. The v3
> > applied successfully to v2.2.0-rc4. Could you give the proper working
> > base?  
> 
> Hello Jan,
> 
> Please apply the depended "config: cleanup existing RTE_CACHE_LINE_SIZE
> selection scheme" patch first as mentioned in the cover letter.
> 
> http://dpdk.org/dev/patchwork/patch/9388/

Sorry, I missed that note. Somebody should standardize dependency
specifications... It applies and builds fine now.

Regards
Jan

> [snip]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets
  2016-01-29  7:45     ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
                         ` (4 preceding siblings ...)
  2016-02-08  9:31       ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
@ 2016-02-11 11:53       ` Thomas Monjalon
  5 siblings, 0 replies; 40+ messages in thread
From: Thomas Monjalon @ 2016-02-11 11:53 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, viktorin

2016-01-29 13:15, Jerin Jacob:
> This patchset fixes performance/cache resource issues with 128-byte cache line targets
> found in the mbuf and bitmap DPDK libraries.
> 
> Currently, we have two DPDK targets (ThunderX and ppc_64) which are based on
> a 128-byte cache line size.
> 
> This patchset doesn't introduce any performance degradation
> for 64-byte cache line size targets.

Series applied, thanks

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2016-02-11 11:54 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-06 15:59 [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
2015-12-06 15:59 ` [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue " Jerin Jacob
2015-12-07 15:21   ` Ananyev, Konstantin
2015-12-08 12:45     ` Jerin Jacob
2015-12-08 16:07       ` Ananyev, Konstantin
2015-12-08 17:49         ` Jerin Jacob
2015-12-09 13:44           ` Ananyev, Konstantin
2015-12-09 14:49             ` Jerin Jacob
2015-12-06 15:59 ` [dpdk-dev] [PATCH 2/2] bitmap: optimize for 128-bytes " Jerin Jacob
2015-12-06 16:30 ` [dpdk-dev] [PATCH 0/2] fix performance/cache resource issues with 128-byte " Thomas Monjalon
2015-12-07  7:26   ` Jerin Jacob
2015-12-07 11:40     ` Thomas Monjalon
2015-12-07 14:33       ` Jerin Jacob
2015-12-07 14:35         ` Thomas Monjalon
2015-12-10 16:36 ` [dpdk-dev] [PATCH v2 0/4] " Jerin Jacob
2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 1/4] eal: Introduce new cache macro definitions Jerin Jacob
2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets Jerin Jacob
2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 3/4] bitmap: optimize for 128-bytes " Jerin Jacob
2015-12-10 16:36   ` [dpdk-dev] [PATCH v2 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets Jerin Jacob
2015-12-11 12:55     ` Ananyev, Konstantin
2015-12-11 13:07       ` Thomas Monjalon
2015-12-11 13:56       ` Jerin Jacob
2015-12-11 14:42         ` Ananyev, Konstantin
2015-12-14  4:32   ` [dpdk-dev] [PATCH v3 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 1/4] eal: Introduce new cache macro definitions Jerin Jacob
2016-01-04 13:15       ` Olivier MATZ
2016-01-06 15:10         ` Jerin Jacob
2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets Jerin Jacob
2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 3/4] bitmap: optimize for 128-bytes " Jerin Jacob
2015-12-14  4:32     ` [dpdk-dev] [PATCH v3 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets Jerin Jacob
2016-01-29  7:45     ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 1/4] eal: Introduce new cache line macro definitions Jerin Jacob
2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 2/4] mbuf: fix performance/cache resource issue with 128-byte cache line targets Jerin Jacob
2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 3/4] bitmap: optimize for 128-bytes " Jerin Jacob
2016-01-29  7:45       ` [dpdk-dev] [PATCH v4 4/4] cache/slow-path: reduce cache align requirement for 128-byte cache targets Jerin Jacob
2016-02-08  9:31       ` [dpdk-dev] [PATCH v4 0/4] fix performance/cache resource issues with 128-byte cache line targets Jerin Jacob
2016-02-08 10:24         ` Jan Viktorin
2016-02-08 10:52           ` Jerin Jacob
2016-02-08 12:56             ` Jan Viktorin
2016-02-11 11:53       ` Thomas Monjalon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).