From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb0-f170.google.com (mail-yb0-f170.google.com [209.85.213.170]) by dpdk.org (Postfix) with ESMTP id 950A0591E for ; Thu, 9 Feb 2017 04:49:58 +0100 (CET) Received: by mail-yb0-f170.google.com with SMTP id o65so52301287ybo.2 for ; Wed, 08 Feb 2017 19:49:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=vzUnWeeHm8TMWR+FLfCJ7mOH0r1Kqfhqun6bCpQsnZY=; b=e1icHeJ/IXWTtzrDBcFv5N4BQkNrjljYpIwqmyGjexn9B9LnQ/nnnaB/0aSeWi9iuZ 6PgQL2YL29rTNtGab+Ip0x8z7Ro2vgOwcbQmb27EQIIzCb2Uu8cRTH8UAX4ATaGwkPwC uuPmYHJ10aZY/4PbP4x7MawiFcbBKB+sClnqw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=vzUnWeeHm8TMWR+FLfCJ7mOH0r1Kqfhqun6bCpQsnZY=; b=I9fXbhIkug5gdogEVgoTht7KDpWwWF9xtTd0VPAmfxx3oFxslb0UBREbzgxICX0jYg I8/fnOO00q+vsl9hF21VrPtTuU+jQgcUKuUKCkEw39BUAk3wsBIccaP1j5aOixfldNXb qB8d4/N3ppzclgO0mVaF51VrOo6kXG+wmdzCpGhrAGidCDo+UxMZjG4iUnJliuoVdVpS qand6+8KztDthJ7yaC1JzNOYyls5n0S8QTG4BIsf9yGTIgZc+r8HyZF4QJhGU0b2++qA tOqdoMdwVt7Qfseo9ZfOEy6mhpJ/il3oz819h32ml8z18G8FjBVGIMJMv182Fs6y/0vz SHPg== X-Gm-Message-State: AMke39mKG/Hih2uWq6e2WBeEh+xaLGpOmcJpSwFGZFdqhV8RFc6OjnM9jxwgI0H3a5jIKRY0tD1e0tQCkLW0E218 X-Received: by 10.37.172.165 with SMTP id x37mr480036ybi.54.1486612197737; Wed, 08 Feb 2017 19:49:57 -0800 (PST) MIME-Version: 1.0 Received: by 10.37.200.4 with HTTP; Wed, 8 Feb 2017 19:49:57 -0800 (PST) In-Reply-To: <2601191342CEEE43887BDE71AB9772583F114FBC@irsmsx105.ger.corp.intel.com> References: <1482127758-4904-1-git-send-email-jianbo.liu@linaro.org> <1486201024-32656-1-git-send-email-jianbo.liu@linaro.org> <2601191342CEEE43887BDE71AB9772583F110DC7@irsmsx105.ger.corp.intel.com> <1de7e1b9-cdf9-d53e-132b-30be97883af8@intel.com> <2601191342CEEE43887BDE71AB9772583F114EDA@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772583F114FBC@irsmsx105.ger.corp.intel.com> From: Jianbo Liu Date: Thu, 9 Feb 2017 11:49:57 +0800 Message-ID: To: "Ananyev, Konstantin" Cc: "Yigit, Ferruh" , "dev@dpdk.org" , "Zhang, Helin" , "jerin.jacob@caviumnetworks.com" Content-Type: text/plain; charset=UTF-8 Subject: Re: [dpdk-dev] [PATCH v2 1/2] net/ixgbe: calculate the correct number of received packets in bulk alloc function X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Feb 2017 03:49:58 -0000 On 9 February 2017 at 03:53, Ananyev, Konstantin wrote: > > >> -----Original Message----- >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin >> Sent: Wednesday, February 8, 2017 6:54 PM >> To: Yigit, Ferruh ; Jianbo Liu ; dev@dpdk.org; Zhang, Helin ; >> jerin.jacob@caviumnetworks.com >> Subject: Re: [dpdk-dev] [PATCH v2 1/2] net/ixgbe: calculate the correct number of received packets in bulk alloc function >> >> Hi Ferruh, >> >> > >> > On 2/4/2017 1:26 PM, Ananyev, Konstantin wrote: >> > >> >> > >> To get better performance, Rx bulk alloc recv function will scan 8 descs >> > >> in one time, but the statuses are not consistent on ARM platform because >> > >> the memory allocated for Rx descriptors is cacheable hugepages. >> > >> This patch is to calculate the number of received packets by scan DD bit >> > >> sequentially, and stops when meeting the first packet with DD bit unset. >> > >> >> > >> Signed-off-by: Jianbo Liu >> > >> --- >> > >> drivers/net/ixgbe/ixgbe_rxtx.c | 16 +++++++++------- >> > >> 1 file changed, 9 insertions(+), 7 deletions(-) >> > >> >> > >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c >> > >> index 36f1c02..613890e 100644 >> > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >> > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >> > >> @@ -1460,17 +1460,19 @@ static inline int __attribute__((always_inline)) >> > >> for (i = 0; i < RTE_PMD_IXGBE_RX_MAX_BURST; >> > >> i += LOOK_AHEAD, rxdp += LOOK_AHEAD, rxep += LOOK_AHEAD) { >> > >> /* Read desc statuses backwards to avoid race condition */ >> > >> - for (j = LOOK_AHEAD-1; j >= 0; --j) >> > >> + for (j = 0; j < LOOK_AHEAD; j++) >> > >> s[j] = rte_le_to_cpu_32(rxdp[j].wb.upper.status_error); >> > >> >> > >> - for (j = LOOK_AHEAD - 1; j >= 0; --j) >> > >> - pkt_info[j] = rte_le_to_cpu_32(rxdp[j].wb.lower. >> > >> - lo_dword.data); >> > >> + rte_smp_rmb(); >> > >> >> > >> /* Compute how many status bits were set */ >> > >> - nb_dd = 0; >> > >> - for (j = 0; j < LOOK_AHEAD; ++j) >> > >> - nb_dd += s[j] & IXGBE_RXDADV_STAT_DD; >> > >> + for (nb_dd = 0; nb_dd < LOOK_AHEAD && >> > >> + (s[nb_dd] & IXGBE_RXDADV_STAT_DD); nb_dd++) >> > >> + ; >> > >> + >> > >> + for (j = 0; j < nb_dd; j++) >> > >> + pkt_info[j] = rte_le_to_cpu_32(rxdp[j].wb.lower. >> > >> + lo_dword.data); >> > >> >> > >> nb_rx += nb_dd; >> > >> >> > >> -- >> > > >> > > Acked-by: Konstantin Ananyev >> > >> > Hi Konstantin, >> > >> > Is the ack valid for v3 and both patches? >> >> No, I didn't look into the second one in details. >> It is ARM specific, and I left it for people who are more familiar with ARM then me :) >> Konstantin > > Actually, I had a quick look after your mail. > > + /* A.1 load 1 pkts desc */ > + descs[0] = vld1q_u64((uint64_t *)(rxdp)); > + rte_smp_rmb(); > > /* B.2 copy 2 mbuf point into rx_pkts */ > vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1); > @@ -271,10 +270,11 @@ > /* B.1 load 1 mbuf point */ > mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); > > - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); > - /* B.1 load 2 mbuf point */ > descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); > - descs[0] = vld1q_u64((uint64_t *)(rxdp)); > + > + /* A.1 load 2 pkts descs */ > + descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); > + descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); > > Assuming that on all ARM-NEON platforms 16B reads are atomic, > I think there is no need for smp_rmb() after the desc[0] read. > What looks more appropriate to me: With checking DDs in sequence, it doesn't matter much where the rmb is. But there is a little performance improvement (0.02%) in my testing with your suggestion. So I'll send a new version. Thanks! > > descs[0] = vld1q_u64((uint64_t *)(rxdp)); > descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); > descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); > descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); > > rte_smp_rmb(); > > ... > > But, as I said would be good if some ARM guys have a look here. > Konstantin > > >> >> > >> > Thanks, >> > ferruh >> > >> > > >> > >> 1.8.3.1 >> > > >