DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev]  [PATCH] ethdev: make struct rte_eth_dev cache aligned
@ 2016-05-02  8:07 Jerin Jacob
  2016-05-03  9:40 ` Bruce Richardson
  2016-05-03 12:42 ` [dpdk-dev] [PATCH v2] " Jerin Jacob
  0 siblings, 2 replies; 10+ messages in thread
From: Jerin Jacob @ 2016-05-02  8:07 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, Jerin Jacob

Elements of struct rte_eth_dev used in the fast path.
Make struct rte_eth_dev cache aligned to avoid the cases where
rte_eth_dev elements share the same cache line with other structures.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_ether/rte_ethdev.c | 2 +-
 lib/librte_ether/rte_ethdev.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a31018e..04f492d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -70,7 +70,7 @@
 #include "rte_ethdev.h"
 
 static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
-struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
+struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] __rte_cache_aligned;
 static struct rte_eth_dev_data *rte_eth_dev_data;
 static uint8_t nb_ports;
 
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8519ff6..e359dda 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1630,7 +1630,7 @@ struct rte_eth_dev {
 	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 	uint8_t attached; /**< Flag indicating the port is attached */
 	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
-};
+} __rte_cache_aligned;
 
 struct rte_eth_dev_sriov {
 	uint8_t active;               /**< SRIOV is active with 16, 32 or 64 pools */
-- 
2.1.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH] ethdev: make struct rte_eth_dev cache aligned
  2016-05-02  8:07 [dpdk-dev] [PATCH] ethdev: make struct rte_eth_dev cache aligned Jerin Jacob
@ 2016-05-03  9:40 ` Bruce Richardson
  2016-05-03 12:10   ` Jerin Jacob
  2016-05-03 12:42 ` [dpdk-dev] [PATCH v2] " Jerin Jacob
  1 sibling, 1 reply; 10+ messages in thread
From: Bruce Richardson @ 2016-05-03  9:40 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, thomas.monjalon

On Mon, May 02, 2016 at 01:37:45PM +0530, Jerin Jacob wrote:
> Elements of struct rte_eth_dev used in the fast path.
> Make struct rte_eth_dev cache aligned to avoid the cases where
> rte_eth_dev elements share the same cache line with other structures.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_ether/rte_ethdev.c | 2 +-
>  lib/librte_ether/rte_ethdev.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index a31018e..04f492d 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -70,7 +70,7 @@
>  #include "rte_ethdev.h"
>  
>  static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
> -struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
> +struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] __rte_cache_aligned;
>  static struct rte_eth_dev_data *rte_eth_dev_data;
>  static uint8_t nb_ports;
>  
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 8519ff6..e359dda 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1630,7 +1630,7 @@ struct rte_eth_dev {
>  	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
>  	uint8_t attached; /**< Flag indicating the port is attached */
>  	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
> -};
> +} __rte_cache_aligned;
>  
>  struct rte_eth_dev_sriov {
>  	uint8_t active;               /**< SRIOV is active with 16, 32 or 64 pools */
> -- 
Adding cache aligned in two places is overkill, I think. If the structure is
marked as being cache aligned, there is no need to have the rte_eth_devices
marked that way too. I suggest therefore just keeping the structure alignment.

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH] ethdev: make struct rte_eth_dev cache aligned
  2016-05-03  9:40 ` Bruce Richardson
@ 2016-05-03 12:10   ` Jerin Jacob
  0 siblings, 0 replies; 10+ messages in thread
From: Jerin Jacob @ 2016-05-03 12:10 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, thomas.monjalon

On Tue, May 03, 2016 at 10:40:53AM +0100, Bruce Richardson wrote:
> On Mon, May 02, 2016 at 01:37:45PM +0530, Jerin Jacob wrote:
> > Elements of struct rte_eth_dev used in the fast path.
> > Make struct rte_eth_dev cache aligned to avoid the cases where
> > rte_eth_dev elements share the same cache line with other structures.
> > 
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> >  lib/librte_ether/rte_ethdev.c | 2 +-
> >  lib/librte_ether/rte_ethdev.h | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > index a31018e..04f492d 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -70,7 +70,7 @@
> >  #include "rte_ethdev.h"
> >  
> >  static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
> > -struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
> > +struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] __rte_cache_aligned;
> >  static struct rte_eth_dev_data *rte_eth_dev_data;
> >  static uint8_t nb_ports;
> >  
> > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> > index 8519ff6..e359dda 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -1630,7 +1630,7 @@ struct rte_eth_dev {
> >  	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> >  	uint8_t attached; /**< Flag indicating the port is attached */
> >  	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
> > -};
> > +} __rte_cache_aligned;
> >  
> >  struct rte_eth_dev_sriov {
> >  	uint8_t active;               /**< SRIOV is active with 16, 32 or 64 pools */
> > -- 
> Adding cache aligned in two places is overkill, I think. If the structure is
> marked as being cache aligned, there is no need to have the rte_eth_devices
> marked that way too. I suggest therefore just keeping the structure alignment.

OK. Will fix it in v2

> 
> Regards,
> /Bruce

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned
  2016-05-02  8:07 [dpdk-dev] [PATCH] ethdev: make struct rte_eth_dev cache aligned Jerin Jacob
  2016-05-03  9:40 ` Bruce Richardson
@ 2016-05-03 12:42 ` Jerin Jacob
  2016-05-04 11:09   ` Bruce Richardson
  2016-06-22 21:20   ` Thomas Monjalon
  1 sibling, 2 replies; 10+ messages in thread
From: Jerin Jacob @ 2016-05-03 12:42 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, thomas.monjalon, Jerin Jacob

Elements of struct rte_eth_dev used in the fast path.
Make struct rte_eth_dev cache aligned to avoid the cases where
rte_eth_dev elements share the same cache line with other structures.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v2:
Remove __rte_cache_aligned from rte_eth_devices and keep
it only at struct rte_eth_dev definition as suggested by Bruce
http://dpdk.org/dev/patchwork/patch/12328/
---
 lib/librte_ether/rte_ethdev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..48f14d5 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1615,7 +1615,7 @@ struct rte_eth_dev {
 	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 	uint8_t attached; /**< Flag indicating the port is attached */
 	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
-};
+} __rte_cache_aligned;
 
 struct rte_eth_dev_sriov {
 	uint8_t active;               /**< SRIOV is active with 16, 32 or 64 pools */
-- 
2.1.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned
  2016-05-03 12:42 ` [dpdk-dev] [PATCH v2] " Jerin Jacob
@ 2016-05-04 11:09   ` Bruce Richardson
  2016-05-04 13:42     ` Jerin Jacob
  2016-06-22 21:20   ` Thomas Monjalon
  1 sibling, 1 reply; 10+ messages in thread
From: Bruce Richardson @ 2016-05-04 11:09 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, thomas.monjalon

On Tue, May 03, 2016 at 06:12:07PM +0530, Jerin Jacob wrote:
> Elements of struct rte_eth_dev used in the fast path.
> Make struct rte_eth_dev cache aligned to avoid the cases where
> rte_eth_dev elements share the same cache line with other structures.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
> v2:
> Remove __rte_cache_aligned from rte_eth_devices and keep
> it only at struct rte_eth_dev definition as suggested by Bruce
> http://dpdk.org/dev/patchwork/patch/12328/
> ---
>  lib/librte_ether/rte_ethdev.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 2757510..48f14d5 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1615,7 +1615,7 @@ struct rte_eth_dev {
>  	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
>  	uint8_t attached; /**< Flag indicating the port is attached */
>  	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
> -};
> +} __rte_cache_aligned;
>  
>  struct rte_eth_dev_sriov {
>  	uint8_t active;               /**< SRIOV is active with 16, 32 or 64 pools */
> -- 

Hi Jerin,

have you seen a performance degradation due to ethdev elements sharing a cache
line? I ask because, surprisingly for me, I actually see a performance regression
when I apply the above patch. It's not a big change - perf reduction of <1% - but
still noticable across multiple runs using testpmd. I'm using two 1x40G NICs
using i40e driver, and I see ~100kpps less traffic per port after applying the
patch. [CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz]

Testpmd cmd and output shown below.

Regards,
/Bruce

$ sudo ./x86_64-native-linuxapp-gcc/app/testpmd -c C0000 -n 4 -- --rxd=512 --txd=512 --numa
EAL: Detected 36 lcore(s)
EAL: Probing VFIO support...
PMD: nfp_net_pmd_init(): librte_pmd_nfp_net version 0.1

EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL: PCI device 0000:05:00.0 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device 0000:05:00.1 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device 0000:08:00.0 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device 0000:08:00.1 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1584 rte_i40e_pmd
PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
EAL: PCI device 0000:88:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1584 rte_i40e_pmd
PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
Configuring Port 0 (socket 1)
Port 0: 68:05:CA:27:D4:4E
Configuring Port 1 (socket 1)
Port 1: 68:05:CA:27:D2:0A
Checking link statuses...
Port 0 Link Up - speed 40000 Mbps - full-duplex
Port 1 Link Up - speed 40000 Mbps - full-duplex
Done
No commandline core given, start packet forwarding
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  RX queues=1 - RX desc=512 - RX free threshold=32
  RX threshold registers: pthresh=8 hthresh=8 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=32
  TX threshold registers: pthresh=32 hthresh=0 wthresh=0
  TX RS bit threshold=32 - TXQ flags=0xf01
Press enter to exit

Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 1940564672     RX-dropped: 1456035742    RX-total: 3396600414
  TX-packets: 1940564736     TX-dropped: 0             TX-total: 1940564736
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 1940564671     RX-dropped: 1456036082    RX-total: 3396600753
  TX-packets: 1940564736     TX-dropped: 0             TX-total: 1940564736
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 3881129343     RX-dropped: 2912071824    RX-total: 6793201167
  TX-packets: 3881129472     TX-dropped: 0             TX-total: 3881129472
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.

Shutting down port 0...
Stopping ports...
Done
Closing ports...
Done

Shutting down port 1...
Stopping ports...
Done
Closing ports...
Done

Bye...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned
  2016-05-04 11:09   ` Bruce Richardson
@ 2016-05-04 13:42     ` Jerin Jacob
  2016-05-04 13:53       ` Richardson, Bruce
  0 siblings, 1 reply; 10+ messages in thread
From: Jerin Jacob @ 2016-05-04 13:42 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, thomas.monjalon

On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:
> On Tue, May 03, 2016 at 06:12:07PM +0530, Jerin Jacob wrote:
> > Elements of struct rte_eth_dev used in the fast path.
> > Make struct rte_eth_dev cache aligned to avoid the cases where
> > rte_eth_dev elements share the same cache line with other structures.
> > 
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> > v2:
> > Remove __rte_cache_aligned from rte_eth_devices and keep
> > it only at struct rte_eth_dev definition as suggested by Bruce
> > http://dpdk.org/dev/patchwork/patch/12328/
> > ---
> >  lib/librte_ether/rte_ethdev.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> > index 2757510..48f14d5 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -1615,7 +1615,7 @@ struct rte_eth_dev {
> >  	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> >  	uint8_t attached; /**< Flag indicating the port is attached */
> >  	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
> > -};
> > +} __rte_cache_aligned;
> >  
> >  struct rte_eth_dev_sriov {
> >  	uint8_t active;               /**< SRIOV is active with 16, 32 or 64 pools */
> > -- 
> 
> Hi Jerin,

Hi Bruce,

> 
> have you seen a performance degradation due to ethdev elements sharing a cache

No. Not because of sharing the cache line.

> line? I ask because, surprisingly for me, I actually see a performance regression

I see performance degradation in PMD in my setup where independent
changes are causing the performance issue in PMD(~<100k). That's the reason
I thought making aligned cache line stuff where ever it makes sense so that
independent change shouldn't impact the PMD performance and this patch
was an initiative for the same.

> when I apply the above patch. It's not a big change - perf reduction of <1% - but
> still noticable across multiple runs using testpmd. I'm using two 1x40G NICs
> using i40e driver, and I see ~100kpps less traffic per port after applying the
> patch. [CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz]

This particular patch does not have any performance degradation in my
setup.
CPU: ThunderX

> 
> Testpmd cmd and output shown below.
> 
> Regards,
> /Bruce
> 
> $ sudo ./x86_64-native-linuxapp-gcc/app/testpmd -c C0000 -n 4 -- --rxd=512 --txd=512 --numa
> EAL: Detected 36 lcore(s)
> EAL: Probing VFIO support...
> PMD: nfp_net_pmd_init(): librte_pmd_nfp_net version 0.1
> 
> EAL: PCI device 0000:01:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL: PCI device 0000:01:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL: PCI device 0000:05:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device 0000:05:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device 0000:08:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device 0000:08:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device 0000:81:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:1584 rte_i40e_pmd
> PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
> EAL: PCI device 0000:88:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:1584 rte_i40e_pmd
> PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
> Configuring Port 0 (socket 1)
> Port 0: 68:05:CA:27:D4:4E
> Configuring Port 1 (socket 1)
> Port 1: 68:05:CA:27:D2:0A
> Checking link statuses...
> Port 0 Link Up - speed 40000 Mbps - full-duplex
> Port 1 Link Up - speed 40000 Mbps - full-duplex
> Done
> No commandline core given, start packet forwarding
>   io packet forwarding - CRC stripping disabled - packets/burst=32
>   nb forwarding cores=1 - nb forwarding ports=2
>   RX queues=1 - RX desc=512 - RX free threshold=32
>   RX threshold registers: pthresh=8 hthresh=8 wthresh=0
>   TX queues=1 - TX desc=512 - TX free threshold=32
>   TX threshold registers: pthresh=32 hthresh=0 wthresh=0
>   TX RS bit threshold=32 - TXQ flags=0xf01
> Press enter to exit
> 
> Telling cores to stop...
> Waiting for lcores to finish...
> 
>   ---------------------- Forward statistics for port 0  ----------------------
>   RX-packets: 1940564672     RX-dropped: 1456035742    RX-total: 3396600414
>   TX-packets: 1940564736     TX-dropped: 0             TX-total: 1940564736
>   ----------------------------------------------------------------------------
> 
>   ---------------------- Forward statistics for port 1  ----------------------
>   RX-packets: 1940564671     RX-dropped: 1456036082    RX-total: 3396600753
>   TX-packets: 1940564736     TX-dropped: 0             TX-total: 1940564736
>   ----------------------------------------------------------------------------
> 
>   +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
>   RX-packets: 3881129343     RX-dropped: 2912071824    RX-total: 6793201167
>   TX-packets: 3881129472     TX-dropped: 0             TX-total: 3881129472
>   ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> Done.
> 
> Shutting down port 0...
> Stopping ports...
> Done
> Closing ports...
> Done
> 
> Shutting down port 1...
> Stopping ports...
> Done
> Closing ports...
> Done
> 
> Bye...
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned
  2016-05-04 13:42     ` Jerin Jacob
@ 2016-05-04 13:53       ` Richardson, Bruce
  2016-05-04 15:19         ` Jerin Jacob
  0 siblings, 1 reply; 10+ messages in thread
From: Richardson, Bruce @ 2016-05-04 13:53 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, thomas.monjalon



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, May 4, 2016 2:43 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: dev@dpdk.org; thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache
> aligned
> 
> On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:
> > On Tue, May 03, 2016 at 06:12:07PM +0530, Jerin Jacob wrote:
> > > Elements of struct rte_eth_dev used in the fast path.
> > > Make struct rte_eth_dev cache aligned to avoid the cases where
> > > rte_eth_dev elements share the same cache line with other structures.
> > >
> > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > ---
> > > v2:
> > > Remove __rte_cache_aligned from rte_eth_devices and keep it only at
> > > struct rte_eth_dev definition as suggested by Bruce
> > > http://dpdk.org/dev/patchwork/patch/12328/
> > > ---
> > >  lib/librte_ether/rte_ethdev.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > b/lib/librte_ether/rte_ethdev.h index 2757510..48f14d5 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -1615,7 +1615,7 @@ struct rte_eth_dev {
> > >  	struct rte_eth_rxtx_callback
> *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> > >  	uint8_t attached; /**< Flag indicating the port is attached */
> > >  	enum rte_eth_dev_type dev_type; /**< Flag indicating the device
> > > type */ -};
> > > +} __rte_cache_aligned;
> > >
> > >  struct rte_eth_dev_sriov {
> > >  	uint8_t active;               /**< SRIOV is active with 16, 32 or 64
> pools */
> > > --
> >
> > Hi Jerin,
> 
> Hi Bruce,
> 
> >
> > have you seen a performance degradation due to ethdev elements sharing
> > a cache
> 
> No. Not because of sharing the cache line.
> 
> > line? I ask because, surprisingly for me, I actually see a performance
> > regression
> 
> I see performance degradation in PMD in my setup where independent changes
> are causing the performance issue in PMD(~<100k). That's the reason I
> thought making aligned cache line stuff where ever it makes sense so that
> independent change shouldn't impact the PMD performance and this patch was
> an initiative for the same.
> 
> > when I apply the above patch. It's not a big change - perf reduction
> > of <1% - but still noticable across multiple runs using testpmd. I'm
> > using two 1x40G NICs using i40e driver, and I see ~100kpps less
> > traffic per port after applying the patch. [CPU: Intel(R) Xeon(R) CPU
> > E5-2699 v3 @ 2.30GHz]
> 
> This particular patch does not have any performance degradation in my
> setup.
> CPU: ThunderX

Ok, so I take it that this patch is performance neutral on your setup, then?
If that's the case, can we hold off on merging it on the basis that it's not needed and does cause a slight regression.

Thanks,
/Bruce

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned
  2016-05-04 13:53       ` Richardson, Bruce
@ 2016-05-04 15:19         ` Jerin Jacob
  2016-05-04 15:48           ` Bruce Richardson
  0 siblings, 1 reply; 10+ messages in thread
From: Jerin Jacob @ 2016-05-04 15:19 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev, thomas.monjalon

On Wed, May 04, 2016 at 01:53:39PM +0000, Richardson, Bruce wrote:
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Wednesday, May 4, 2016 2:43 PM
> > To: Richardson, Bruce <bruce.richardson@intel.com>
> > Cc: dev@dpdk.org; thomas.monjalon@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache
> > aligned
> > 
> > On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:

Snip

> > >
> > > Hi Jerin,
> > 
> > Hi Bruce,
> > 
> > >
> > > have you seen a performance degradation due to ethdev elements sharing
> > > a cache
> > 
> > No. Not because of sharing the cache line.
> > 
> > > line? I ask because, surprisingly for me, I actually see a performance
> > > regression
> > 
> > I see performance degradation in PMD in my setup where independent changes
> > are causing the performance issue in PMD(~<100k). That's the reason I
> > thought making aligned cache line stuff where ever it makes sense so that
> > independent change shouldn't impact the PMD performance and this patch was
> > an initiative for the same.
> > 
> > > when I apply the above patch. It's not a big change - perf reduction
> > > of <1% - but still noticable across multiple runs using testpmd. I'm
> > > using two 1x40G NICs using i40e driver, and I see ~100kpps less
> > > traffic per port after applying the patch. [CPU: Intel(R) Xeon(R) CPU
> > > E5-2699 v3 @ 2.30GHz]
> > 
> > This particular patch does not have any performance degradation in my
> > setup.
> > CPU: ThunderX
> 
> Ok, so I take it that this patch is performance neutral on your setup, then?
> If that's the case, can we hold off on merging it on the basis that it's not needed and does cause a slight regression.

OK

Can you share the base dpdk.org upstream change set where this patch
creates the slight regression? I will test it the same on IA and
ThunderX platform.

In my testpmd build, rte_eth_devices(0x0000000000751ef8) share the cache
line with inactive "notify_ops" that the reason for its not creating any benefit.
I guess the case would have been different if its active write location.

 COMMON         0x0000000000751ef0        0x8
/home/jerin/dpdk-thunderx/build/lib/librte_vhost.a(virtio-net.o)
                0x0000000000751ef0                notify_ops
 COMMON         0x0000000000751ef8    0x80900
/home/jerin/dpdk-thunderx/build/lib/libethdev.a(rte_ethdev.o)
                0x0000000000751ef8                rte_eth_devices

Thanks,
Jerin

> 
> Thanks,
> /Bruce

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned
  2016-05-04 15:19         ` Jerin Jacob
@ 2016-05-04 15:48           ` Bruce Richardson
  0 siblings, 0 replies; 10+ messages in thread
From: Bruce Richardson @ 2016-05-04 15:48 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, thomas.monjalon

On Wed, May 04, 2016 at 08:49:38PM +0530, Jerin Jacob wrote:
> On Wed, May 04, 2016 at 01:53:39PM +0000, Richardson, Bruce wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Wednesday, May 4, 2016 2:43 PM
> > > To: Richardson, Bruce <bruce.richardson@intel.com>
> > > Cc: dev@dpdk.org; thomas.monjalon@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache
> > > aligned
> > > 
> > > On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:
> 
> Snip
> 
> > > >
> > > > Hi Jerin,
> > > 
> > > Hi Bruce,
> > > 
> > > >
> > > > have you seen a performance degradation due to ethdev elements sharing
> > > > a cache
> > > 
> > > No. Not because of sharing the cache line.
> > > 
> > > > line? I ask because, surprisingly for me, I actually see a performance
> > > > regression
> > > 
> > > I see performance degradation in PMD in my setup where independent changes
> > > are causing the performance issue in PMD(~<100k). That's the reason I
> > > thought making aligned cache line stuff where ever it makes sense so that
> > > independent change shouldn't impact the PMD performance and this patch was
> > > an initiative for the same.
> > > 
> > > > when I apply the above patch. It's not a big change - perf reduction
> > > > of <1% - but still noticable across multiple runs using testpmd. I'm
> > > > using two 1x40G NICs using i40e driver, and I see ~100kpps less
> > > > traffic per port after applying the patch. [CPU: Intel(R) Xeon(R) CPU
> > > > E5-2699 v3 @ 2.30GHz]
> > > 
> > > This particular patch does not have any performance degradation in my
> > > setup.
> > > CPU: ThunderX
> > 
> > Ok, so I take it that this patch is performance neutral on your setup, then?
> > If that's the case, can we hold off on merging it on the basis that it's not needed and does cause a slight regression.
> 
> OK
> 
> Can you share the base dpdk.org upstream change set where this patch
> creates the slight regression? I will test it the same on IA and
> ThunderX platform.
> 
> In my testpmd build, rte_eth_devices(0x0000000000751ef8) share the cache
> line with inactive "notify_ops" that the reason for its not creating any benefit.
> I guess the case would have been different if its active write location.
> 
>  COMMON         0x0000000000751ef0        0x8
> /home/jerin/dpdk-thunderx/build/lib/librte_vhost.a(virtio-net.o)
>                 0x0000000000751ef0                notify_ops
>  COMMON         0x0000000000751ef8    0x80900
> /home/jerin/dpdk-thunderx/build/lib/libethdev.a(rte_ethdev.o)
>                 0x0000000000751ef8                rte_eth_devices
> 
> Thanks,
> Jerin
> 
> > 
Hi Jerin,

I see no change with the latest code on dpdk.org mainline. I do see a change though,
with the latest code on the dpdk-next-net/rel_16_07 branch, which has additional
i40e driver improvements on it.
However, on re-running the tests, I realise I may have got my measurement results
backwards and that this patch may actually cause a slight increase instead. :-(

Therefore, I withdraw my current objections to this patch. But I would like to
see someone else verify what performance difference, if any, this causes on
different platforms, so your planned tests on ThunderX and IA would still be
great to see! :-)

Thanks,
/Bruce

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned
  2016-05-03 12:42 ` [dpdk-dev] [PATCH v2] " Jerin Jacob
  2016-05-04 11:09   ` Bruce Richardson
@ 2016-06-22 21:20   ` Thomas Monjalon
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Monjalon @ 2016-06-22 21:20 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, bruce.richardson

2016-05-03 18:12, Jerin Jacob:
> Elements of struct rte_eth_dev used in the fast path.
> Make struct rte_eth_dev cache aligned to avoid the cases where
> rte_eth_dev elements share the same cache line with other structures.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Let's try it in real tests.
Applied, thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-06-22 21:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-02  8:07 [dpdk-dev] [PATCH] ethdev: make struct rte_eth_dev cache aligned Jerin Jacob
2016-05-03  9:40 ` Bruce Richardson
2016-05-03 12:10   ` Jerin Jacob
2016-05-03 12:42 ` [dpdk-dev] [PATCH v2] " Jerin Jacob
2016-05-04 11:09   ` Bruce Richardson
2016-05-04 13:42     ` Jerin Jacob
2016-05-04 13:53       ` Richardson, Bruce
2016-05-04 15:19         ` Jerin Jacob
2016-05-04 15:48           ` Bruce Richardson
2016-06-22 21:20   ` Thomas Monjalon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).