Data prefetch instruction can preload data into cpu’s hierarchical
cache before data access. Virtualized data paths like virtio utilized
this feature for acceleration. Since most modern cpus have support
prefetch function, we can enable packet data prefetch as default.

Signed-off-by: Marvin Liu <yong.liu@intel.com>

diff --git a/config/meson.build b/config/meson.build
index 69f2aeb605..a0c828a437 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -109,6 +109,9 @@ if not is_windows
 	add_project_link_arguments('-Wl,--no-as-needed', language: 'c')
 endif
 
+# do prefetch of packet data
+dpdk_conf.set('RTE_PMD_PACKET_PREFETCH', 1)
+
 # use pthreads if available for the platform
 if not is_windows
 	add_project_link_arguments('-pthread', language: 'c')
-- 
2.17.1
On Tue, 22 Sep 2020 16:21:35 +0800
Marvin Liu <yong.liu@intel.com> wrote:
> Data prefetch instruction can preload data into cpu’s hierarchical
> cache before data access. Virtualized data paths like virtio utilized
> this feature for acceleration. Since most modern cpus have support
> prefetch function, we can enable packet data prefetch as default.
>
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
>
> diff --git a/config/meson.build b/config/meson.build
> index 69f2aeb605..a0c828a437 100644
> --- a/config/meson.build
> +++ b/config/meson.build
> @@ -109,6 +109,9 @@ if not is_windows
> add_project_link_arguments('-Wl,--no-as-needed', language: 'c')
> endif
>
> +# do prefetch of packet data
> +dpdk_conf.set('RTE_PMD_PACKET_PREFETCH', 1)
> +
> # use pthreads if available for the platform
> if not is_windows
> add_project_link_arguments('-pthread', language: 'c')
With meson, the project has been using rte_config.h for this.
Data prefetch instruction can preload data into cpu’s hierarchical
cache before data access. Virtualized data paths like virtio utilized
this feature for acceleration. Since most modern cpus have support
prefetch function, we can enable packet data prefetch as default.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
---
v2: move define from meson.build to rte_config.h
---
 config/rte_config.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config/rte_config.h b/config/rte_config.h
index 0bae630fd9..8b007c4c31 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -101,6 +101,7 @@
 #define RTE_LIBRTE_GRAPH_STATS 1
 
 /****** driver defines ********/
+#define RTE_PMD_PACKET_PREFETCH 1
 
 /* QuickAssist device */
 /* Max. number of QuickAssist devices which can be attached */
-- 
2.17.1
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, September 22, 2020 10:12 PM
> To: Liu, Yong <yong.liu@intel.com>
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] build: enable packet data prefetch
>
> On Tue, 22 Sep 2020 16:21:35 +0800
> Marvin Liu <yong.liu@intel.com> wrote:
>
> > Data prefetch instruction can preload data into cpu’s hierarchical
> > cache before data access. Virtualized data paths like virtio utilized
> > this feature for acceleration. Since most modern cpus have support
> > prefetch function, we can enable packet data prefetch as default.
> >
> > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> >
>
> With meson, the project has been using rte_config.h for this.
Thanks a lot, will send v2 for the change.
Regards,
Marvin
23/09/2020 03:51, Marvin Liu:
> Data prefetch instruction can preload data into cpu’s hierarchical
> cache before data access. Virtualized data paths like virtio utilized
> this feature for acceleration. Since most modern cpus have support
> prefetch function, we can enable packet data prefetch as default.
>
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
> ---
> +#define RTE_PMD_PACKET_PREFETCH 1
We could also remove the related #ifdefs.
What would be the drawback of always enabling those prefetches?
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, October 15, 2020 6:03 AM
> To: Liu, Yong <yong.liu@intel.com>
> Cc: Richardson, Bruce <bruce.richardson@intel.com>;
> stephen@networkplumber.org; dev@dpdk.org;
> david.marchand@redhat.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
> maxime.coquelin@redhat.com; honnappa.nagarahalli@arm.com; David
> Christensen <drc@linux.vnet.ibm.com>; ruifeng.wang@arm.com
> Subject: Re: [dpdk-dev] [PATCH v2] config: enable packet data prefetch
>
> 23/09/2020 03:51, Marvin Liu:
> > Data prefetch instruction can preload data into cpu’s hierarchical
> > cache before data access. Virtualized data paths like virtio utilized
> > this feature for acceleration. Since most modern cpus have support
> > prefetch function, we can enable packet data prefetch as default.
> >
> > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> > ---
> > +#define RTE_PMD_PACKET_PREFETCH 1
>
> We could also remove the related #ifdefs.
>
> What can be the drawback of always enable those prefetches?
>
Hi Thomas,
I think the potential drawback is that the current prefetch locations cannot guarantee the best performance across different platforms.
Each developer tuned performance by adding prefetch instructions and verified the result on their own platform.
So each prefetch location is tuned for a particular platform, and it is hard for developers to compare results across platforms.
Thanks,
Marvin
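
The platform dependence described above can be illustrated with a minimal sketch. It is not taken from any DPDK driver: `struct sketch_pkt`, `sum_first_bytes` and the `distance` parameter are hypothetical names, and the GCC/Clang builtin `__builtin_prefetch` stands in for DPDK's `rte_prefetch1()`. The result is the same for every `distance`; only the timing can differ, which is why a value tuned on one platform may not be the best on another.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet buffer; real drivers work on struct rte_mbuf. */
struct sketch_pkt {
	uint8_t data[64];
};

/*
 * Sum the first byte of each packet in a burst, hinting the packet
 * that sits `distance` slots ahead of the one being read.
 * `distance` is the platform-dependent tuning knob; 0 disables the hint.
 */
static uint32_t
sum_first_bytes(struct sketch_pkt *const *pkts, size_t n, size_t distance)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		/* Assumption: a GCC/Clang builtin used as a stand-in for rte_prefetch1(). */
		if (distance != 0 && i + distance < n)
			__builtin_prefetch(pkts[i + distance]->data, 0, 2);
		sum += pkts[i]->data[0];
	}
	return sum;
}
```

A prefetch is only a hint, so a poorly chosen distance costs performance rather than correctness.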
<snip>

> >
> > 23/09/2020 03:51, Marvin Liu:
> > > Data prefetch instruction can preload data into cpu’s hierarchical
> > > cache before data access. Virtualized data paths like virtio
> > > utilized this feature for acceleration. Since most modern cpus have
> > > support prefetch function, we can enable packet data prefetch as default.
> > >
> > > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> > > ---
> > > +#define RTE_PMD_PACKET_PREFETCH 1
> >
> > We could also remove the related #ifdefs.
> >
> > What can be the drawback of always enable those prefetches?
> >
> Hi Thomas,
> I think the potential drawback is that current prefetch location cannot
> guarantee the best performance across different platforms.
Then, does it make sense to enable this by default?

> Each developer has tuned the performance by adding prefetch instruction
> and verified the result on himself platform.
> So prefetch location is based on certain platform, also it will be hard for
> developer to compare the results across platforms.
>
> Thanks,
> Marvin
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Thursday, October 15, 2020 12:10 PM
> To: Liu, Yong <yong.liu@intel.com>; thomas@monjalon.net
> Cc: Richardson, Bruce <bruce.richardson@intel.com>;
> stephen@networkplumber.org; dev@dpdk.org;
> david.marchand@redhat.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
> maxime.coquelin@redhat.com; David Christensen
> <drc@linux.vnet.ibm.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>; nd
> <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2] config: enable packet data prefetch
>
> <snip>
>
> > >
> > > 23/09/2020 03:51, Marvin Liu:
> > > > Data prefetch instruction can preload data into cpu’s hierarchical
> > > > cache before data access. Virtualized data paths like virtio
> > > > utilized this feature for acceleration. Since most modern cpus have
> > > > support prefetch function, we can enable packet data prefetch as default.
> > > >
> > > > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> > > > ---
> > > > +#define RTE_PMD_PACKET_PREFETCH 1
> > >
> > > We could also remove the related #ifdefs.
> > >
> > > What can be the drawback of always enable those prefetches?
> > >
> > Hi Thomas,
> > I think the potential drawback is that current prefetch location cannot
> > guarantee the best performance across different platforms.
> Then, does it make sense to enable this by default?
>

Currently, most prefetch calls are placed right after the data pointer becomes valid. I think this approach can benefit all platforms.
It is hard to say that it is the best choice for every platform, but I have no better solution in mind.
At the least, we need to allow users to enable packet data prefetch.

Regards,
Marvin

> > Each developer has tuned the performance by adding prefetch instruction
> > and verified the result on himself platform.
> > So prefetch location is based on certain platform, also it will be hard for
> > developer to compare the results across platforms.
> >
> > Thanks,
> > Marvin
15/10/2020 10:23, Liu, Yong:
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > 23/09/2020 03:51, Marvin Liu:
> > > > > Data prefetch instruction can preload data into cpu’s hierarchical
> > > > > cache before data access. Virtualized data paths like virtio
> > > > > utilized this feature for acceleration. Since most modern cpus have
> > > > > support prefetch function, we can enable packet data prefetch as default.
> > > > >
> > > > > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> > > > > ---
> > > > > +#define RTE_PMD_PACKET_PREFETCH 1
> > > >
> > > > We could also remove the related #ifdefs.
> > > >
> > > > What can be the drawback of always enable those prefetches?
> > > >
> > > >
> > > Hi Thomas,
> > > I think the potential drawback is that current prefetch location cannot
> > > guarantee the best performance across different platforms.
> > Then, does it make sense to enable this by default?
> >
> Now most of prefetch actions are placed after pointer of data is valid. I think this methodology can benefit all platforms.
> It's hard to say that it’s the best choice for all. But no more better solution in my mind.
> At least, we need to allow user to enable packet data prefetch.

In my opinion, it can be tested and measured.

> > > Each developer has tuned the performance by adding prefetch instruction
> > > and verified the result on himself platform.
> > > So prefetch location is based on certain platform, also it will be hard for
> > > developer to compare the results across platforms.

If it shows benefit on an architecture, then it should be enabled
with #ifdef RTE_ARCH_XX

I am for removing the option RTE_PMD_PACKET_PREFETCH.
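
The suggestion that the effect "can be tested and measured" could look roughly like the sketch below. It is only an illustration under stated assumptions: `struct sketch_result` and `timed_sum` are hypothetical names, `__builtin_prefetch` (GCC/Clang) stands in for `rte_prefetch1()`, and the standard `clock()` timer is far coarser than what a real driver benchmark on the target platform would need.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical result pair: the computed sum and the CPU time spent. */
struct sketch_result {
	uint64_t sum;
	double seconds;
};

/*
 * Walk `buf` in steps of `stride` bytes (stride must be non-zero),
 * optionally hinting the next element before reading the current one.
 * The sum is identical with and without the hint; only the time differs.
 */
static struct sketch_result
timed_sum(const uint8_t *buf, size_t len, size_t stride, int use_prefetch)
{
	struct sketch_result r = { 0, 0.0 };
	clock_t start = clock();

	for (size_t i = 0; i < len; i += stride) {
		/* Assumption: a GCC/Clang builtin used as a stand-in for rte_prefetch1(). */
		if (use_prefetch && i + stride < len)
			__builtin_prefetch(&buf[i + stride], 0, 2);
		r.sum += buf[i];
	}
	r.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
	return r;
}
```

Running both variants over a buffer much larger than the last-level cache, and repeating the run several times, would give a first indication of whether the hint helps on a given architecture.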
<snip>

> 15/10/2020 10:23, Liu, Yong:
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > > 23/09/2020 03:51, Marvin Liu:
> > > > > > Data prefetch instruction can preload data into cpu’s
> > > > > > hierarchical cache before data access. Virtualized data paths
> > > > > > like virtio utilized this feature for acceleration. Since most
> > > > > > modern cpus have support prefetch function, we can enable
> > > > > > packet data prefetch as default.
> > > > > >
> > > > > > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> > > > > > ---
> > > > > > +#define RTE_PMD_PACKET_PREFETCH 1
> > > > >
> > > > > We could also remove the related #ifdefs.
> > > > >
> > > > > What can be the drawback of always enable those prefetches?
> > > > >
> > > > Hi Thomas,
> > > > I think the potential drawback is that current prefetch location
> > > > cannot guarantee the best performance across different platforms.
> > > Then, does it make sense to enable this by default?
> > >
> > Now most of prefetch actions are placed after pointer of data is valid. I think this methodology can benefit all platforms.
> > It's hard to say that it’s the best choice for all. But no more better solution in my mind.
> > At least, we need to allow user to enable packet data prefetch.
>
> In my opinion, it can be tested and measured.
+ Joyce, to test this for VirtIO on Arm

> > > > Each developer has tuned the performance by adding prefetch
> > > > instruction and verified the result on himself platform.
> > > > So prefetch location is based on certain platform, also it will be
> > > > hard for developer to compare the results across platforms.
>
> If it shows benefit on an architecture, then it should be enabled with #ifdef RTE_ARCH_XX
>
> I am for removing the option RTE_PMD_PACKET_PREFETCH.
Data prefetch instruction can preload data into cpu’s hierarchical
cache before data access. Data paths like e1000 and virtio utilized
this feature for packet data access acceleration. Enabled packet data
prefetch on x86 architecture, as prefetch instruction has been supported
from very early generation.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
---
v3: replace build config with pre-defined architecture macro
v2: move define from meson.build to rte_config.h
---
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 19e3bffd46..5a9cd04c9c 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -185,7 +185,7 @@ struct em_tx_queue {
 #define rte_em_prefetch(p) do {} while(0)
 #endif
 
-#ifdef RTE_PMD_PACKET_PREFETCH
+#if defined(RTE_ARCH_X86)
 #define rte_packet_prefetch(p) rte_prefetch1(p)
 #else
 #define rte_packet_prefetch(p) do {} while(0)
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dd520cd82c..45d4ee0b92 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -198,7 +198,7 @@ struct igb_tx_queue {
 #define rte_igb_prefetch(p) do {} while(0)
 #endif
 
-#ifdef RTE_PMD_PACKET_PREFETCH
+#if defined(RTE_ARCH_X86)
 #define rte_packet_prefetch(p) rte_prefetch1(p)
 #else
 #define rte_packet_prefetch(p) do {} while(0)
diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index 6a8718c086..671c32038b 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -25,7 +25,7 @@
 #define rte_enic_prefetch(p) do {} while (0)
 #endif
 
-#ifdef RTE_PMD_PACKET_PREFETCH
+#if defined(RTE_ARCH_X86)
 #define rte_packet_prefetch(p) rte_prefetch1(p)
 #else
 #define rte_packet_prefetch(p) do {} while (0)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 4accaa2cd6..4bc3afadd1 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -10,7 +10,7 @@
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
-#ifdef RTE_PMD_PACKET_PREFETCH
+#if defined(RTE_ARCH_X86)
 #define rte_packet_prefetch(p) rte_prefetch1(p)
 #else
 #define rte_packet_prefetch(p) do {} while (0)
diff --git a/drivers/net/igc/igc_txrx.c b/drivers/net/igc/igc_txrx.c
index 4654ec41f0..b8b504738e 100644
--- a/drivers/net/igc/igc_txrx.c
+++ b/drivers/net/igc/igc_txrx.c
@@ -16,7 +16,7 @@
 #define rte_igc_prefetch(p) do {} while (0)
 #endif
 
-#ifdef RTE_PMD_PACKET_PREFETCH
+#if defined(RTE_ARCH_X86)
 #define rte_packet_prefetch(p) rte_prefetch1(p)
 #else
 #define rte_packet_prefetch(p) do {} while (0)
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 6d2f7c9da3..4d39de9531 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -41,7 +41,7 @@
 #define RX_RING_SZ ((IXGBE_MAX_RING_DESC + RTE_PMD_IXGBE_RX_MAX_BURST) * \
 		    sizeof(union ixgbe_adv_rx_desc))
 
-#ifdef RTE_PMD_PACKET_PREFETCH
+#if defined(RTE_ARCH_X86)
 #define rte_packet_prefetch(p) rte_prefetch1(p)
 #else
 #define rte_packet_prefetch(p) do {} while(0)
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 42c4c9882f..0196290a5d 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -106,7 +106,7 @@ virtqueue_store_flags_packed(struct vring_packed_desc *dp,
 		dp->flags = flags;
 	}
 }
-#ifdef RTE_PMD_PACKET_PREFETCH
+#if defined(RTE_ARCH_X86)
 #define rte_packet_prefetch(p) rte_prefetch1(p)
 #else
 #define rte_packet_prefetch(p) do {} while(0)
-- 
2.17.1
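
The architecture gate used throughout the v3 patch can be tried outside the DPDK tree with a small stand-alone sketch. Several things here are assumptions rather than part of the patch: `RTE_ARCH_X86` is normally supplied by the DPDK build and is derived from compiler macros below, `__builtin_prefetch` (GCC/Clang) stands in for `rte_prefetch1()`, and `SKETCH_PREFETCH_COMPILED_IN` and `peek_header` are hypothetical names.

```c
#include <stdint.h>

/*
 * Assumption: inside DPDK the build system provides RTE_ARCH_X86.
 * Outside that build, derive it from compiler-predefined macros.
 */
#if !defined(RTE_ARCH_X86) && (defined(__x86_64__) || defined(__i386__))
#define RTE_ARCH_X86 1
#endif

#if defined(RTE_ARCH_X86)
/* Stand-in for rte_prefetch1(p); a GCC/Clang builtin, not the DPDK call. */
#define rte_packet_prefetch(p) __builtin_prefetch((p), 0, 2)
#define SKETCH_PREFETCH_COMPILED_IN 1
#else
/* On other architectures the macro compiles to nothing. */
#define rte_packet_prefetch(p) do {} while (0)
#define SKETCH_PREFETCH_COMPILED_IN 0
#endif

/* Hypothetical consumer: hint the header, then read its first byte. */
static uint8_t
peek_header(const uint8_t *hdr)
{
	rte_packet_prefetch(hdr);
	return hdr[0];
}
```

Callers see the same behavior on every architecture; the gate only decides whether the hint is emitted.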