From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id D266D5F21 for ; Tue, 2 Oct 2018 18:09:12 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 09:09:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="267844332" Received: from fyigit-mobl.ger.corp.intel.com (HELO [10.237.221.49]) ([10.237.221.49]) by fmsmga005.fm.intel.com with ESMTP; 02 Oct 2018 09:08:22 -0700 To: John Daley Cc: dev@dpdk.org, Hyong Youb Kim References: <20180928192007.29519-1-johndale@cisco.com> <20180928192544.6833-1-johndale@cisco.com> <20180928192544.6833-2-johndale@cisco.com> From: Ferruh Yigit Openpgp: preference=signencrypt Message-ID: <6bb761a4-080e-b626-d5bc-f46e0596e0a5@intel.com> Date: Tue, 2 Oct 2018 17:08:21 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20180928192544.6833-2-johndale@cisco.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v2 2/2] net/enic: add AVX2 based vectorized Rx handler X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Oct 2018 16:09:13 -0000 On 9/28/2018 8:25 PM, John Daley wrote: > From: Hyong Youb Kim > > Add the vectorized version of the no-scatter Rx handler. It aims to > process 8 descriptors per loop using AVX2 SIMD instructions. This > handler is in its own file enic_rxtx_vec_avx2.c, and makefile and > meson.build are modified to compile it when the compiler supports > AVX2. Under ideal conditions, the vectorized handler reduces > cycles/packet by more than 30%, when compared against the no-scatter > Rx handler. Most implementation ideas come from i40e's AVX2 based > handler, so credit goes to its authors. > > At this point, the new handler is meant for field trials, and is not > selected by default. So add a new devarg enable-avx2-rx to allow the > user to request the use of the new handler. When enable-avx2-rx=1, the > driver will consider using the new handler. > > Also update the guide doc and introduce the vectorized handler. > > Signed-off-by: Hyong Youb Kim > Reviewed-by: John Daley <...> > +Vectorized Rx Handler > +--------------------- > + > +ENIC PMD includes a version of the receive handler that is vectorized using > +AVX2 SIMD instructions. It is meant for bulk, throughput oriented workloads > +where reducing cycles/packet in PMD is a priority. In order to use the > +vectorized handler, take the following steps. > + > +- Use a recent version of gcc, icc, or clang and build 64-bit DPDK. If > + the compiler is known to support AVX2, DPDK build system > + automatically compiles the vectorized handler. Otherwise, the > + handler is not available. > + > +- Set ``devargs`` parameter ``enable-avx2-rx=1`` to explicitly request that > + PMD consider the vectorized handler when selecting the receive handler. Can be good to show a usage sample, as done in disable-overlay=1, like -w 12:00.0,enable-avx2-rx=1 > + > + As the current implementation is intended for field trials, by default, the > + vectorized handler is not considerd (``enable-avx2-rx=0``). > + > +- Run on a UCS M4 or later server with CPUs that support AVX2. > + > +PMD selects the vectorized handler when the handler is compiled into > +the driver, the user requests its use via ``enable-avx2-rx=1``, CPU > +supports AVX2, and scatter Rx is not used. To verify that the Code doesn't check if user requested DEV_RX_OFFLOAD_SCATTER, if so perhaps shouldn't enable vector path. > +vectorized handler is selected, enable debug logging > +(``--log-level=pmd,debug``) and check the following message. > + > +.. code-block:: console > + > + enic_use_vector_rx_handler use the non-scatter avx2 Rx handler > + > .. _enic_limitations: > > Limitations > diff --git a/drivers/net/enic/Makefile b/drivers/net/enic/Makefile > index 7c6c29cc0..3ec6f9159 100644 > --- a/drivers/net/enic/Makefile > +++ b/drivers/net/enic/Makefile > @@ -39,4 +39,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += base/vnic_intr.c > SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += base/vnic_rq.c > SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += base/vnic_rss.c > > +ifeq ($(findstring RTE_MACHINE_CPUFLAG_AVX2,$(CFLAGS)),RTE_MACHINE_CPUFLAG_AVX2) > +# The current implementation assumes 64-bit pointers > +ifeq ($(CONFIG_RTE_ARCH_X86_64),y) > + SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_rxtx_vec_avx2.c > +endif > +endif This is not exactly enough. CFLAGS has MACHINE_CPUFLAG based on the parameters pass the compiler. This flag is what compiler supports. For example if DPDK build for "default" architecture AVX2 won't be supported and your vector path won't be build at all. Please check i40e Makefile. > + > include $(RTE_SDK)/mk/rte.lib.mk > diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h > index 775cd5d55..665f5668a 100644 > --- a/drivers/net/enic/enic.h > +++ b/drivers/net/enic/enic.h > @@ -106,6 +106,7 @@ struct enic { > struct vnic_dev_bar bar0; > struct vnic_dev *vdev; > > + uint64_t mbuf_initializer; A comment can be good. <...> > @@ -535,6 +552,21 @@ int enic_enable(struct enic *enic) > int err; > struct rte_eth_dev *eth_dev = enic->rte_dev; > uint64_t simple_tx_offloads; > + uintptr_t p; > + struct rte_mbuf mb_def = { .buf_addr = 0 }; > + > + /* > + * mbuf_initializer contains const-after-init fields of > + * receive mbufs (i.e. 64 bits of fields from rearm_data). > + * It is currently used by the vectorized handler. > + */ > + mb_def.nb_segs = 1; > + mb_def.data_off = RTE_PKTMBUF_HEADROOM; > + mb_def.port = enic->port_id; > + rte_mbuf_refcnt_set(&mb_def, 1); > + rte_compiler_barrier(); > + p = (uintptr_t)&mb_def.rearm_data; > + enic->mbuf_initializer = *(uint64_t *)p; What do you think wrapping this block with "enic->enable_avx2_rx" check, mostly to show the usage intention? <...> > + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2)) { > + PMD_INIT_LOG(DEBUG, " use the non-scatter avx2" > + " Rx handler"); No need to split the log line.