From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 25 May 2016 13:53:33 +0100
From: Bruce Richardson
To: Jerin Jacob
Cc: Jianbo Liu, dev@dpdk.org, helin.zhang@intel.com, konstantin.ananyev@intel.com
Message-ID: <20160525125332.GA8612@bricha3-MOBL3>
References: <1461159902-16680-1-git-send-email-jianbo.liu@linaro.org>
 <1462515948-23906-1-git-send-email-jianbo.liu@linaro.org>
 <1462515948-23906-3-git-send-email-jianbo.liu@linaro.org>
 <20160525122935.GA30670@localhost.localdomain>
In-Reply-To: <20160525122935.GA30670@localhost.localdomain>
Organization: Intel Shannon Ltd.
Subject: Re: [dpdk-dev] [PATCH v3 2/4] ixgbe: implement vector PMD for arm architecture

On Wed, May 25, 2016 at 05:59:38PM +0530, Jerin Jacob wrote:
> On Fri, May 06, 2016 at 11:55:46AM +0530, Jianbo Liu wrote:
> > use ARM NEON intrinsic to implement ixgbe vPMD
> >
> > Signed-off-by: Jianbo Liu
> > ---
> >  drivers/net/ixgbe/Makefile              |   4 +
> >  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 561 ++++++++++++++++++++++++++++++++
> >  2 files changed, 565 insertions(+)
> >  create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> >
> > +	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
> > +			pos += RTE_IXGBE_DESCS_PER_LOOP,
> > +			rxdp += RTE_IXGBE_DESCS_PER_LOOP) {
> > +		uint64x2_t descs[RTE_IXGBE_DESCS_PER_LOOP];
> > +		uint8x16_t pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
> > +		uint8x16x2_t sterr_tmp1, sterr_tmp2;
> > +		uint64x2_t mbp1, mbp2;
> > +		uint8x16_t staterr;
> > +		uint16x8_t tmp;
> > +		uint32_t stat;
> > +
> > +		/* B.1 load 1 mbuf point */
> > +		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
> > +
> > +		/* Read desc statuses backwards to avoid race condition */
> > +		/* A.1 load 4 pkts desc */
> > +		descs[3] = vld1q_u64((uint64_t *)(rxdp + 3));
> > +		rte_rmb();
>
> Any specific reason to add rte_rmb() here? If there is no performance
> drop, then it makes sense to add it just before descs[3] is used, i.e.
> at the rte_compiler_barrier() place in the x86 code.
> > +
> > +		/* B.2 copy 2 mbuf point into rx_pkts */
> > +		vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
> > +
> > +		/* B.1 load 1 mbuf point */
> > +		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
> > +
> > +		descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
> > +		/* B.1 load 2 mbuf point */
> > +		descs[1] = vld1q_u64((uint64_t *)(rxdp + 1));
> > +		descs[0] = vld1q_u64((uint64_t *)(rxdp));
> > +
> > +		/* B.2 copy 2 mbuf point into rx_pkts */
> > +		vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2);
> > +
> > +		if (split_packet) {
> > +			rte_prefetch_non_temporal(&rx_pkts[pos]->cacheline1);
> > +			rte_prefetch_non_temporal(&rx_pkts[pos + 1]->cacheline1);
> > +			rte_prefetch_non_temporal(&rx_pkts[pos + 2]->cacheline1);
> > +			rte_prefetch_non_temporal(&rx_pkts[pos + 3]->cacheline1);
>
> Replace with rte_mbuf_prefetch_part2 or an equivalent helper.
>

Hi Jerin, Jianbo,

Since this patch has already been applied and these are not critical
issues with it, can a new patch please be submitted proposing these
additional changes on top of what is on next-net now?

Thanks,
/Bruce