From: "Ananyev, Konstantin"
To: WangDong, "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH 2/2] ixgbe:replace compiler memory barrier and rte_wmb with rte_dma_rmb and rte_dma_wmb.
Date: Thu, 2 Jul 2015 16:19:59 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836A21B8F@irsmsx105.ger.corp.intel.com>
References: <1435504998-15566-1-git-send-email-dong.wang.pro@hotmail.com>
List-Id: patches and discussions about DPDK

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
> Sent: Sunday, June 28, 2015 4:23 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 2/2] ixgbe:replace compiler memory barrier and rte_wmb with rte_dma_rmb and rte_dma_wmb.
>
> ---
>  drivers/net/ixgbe/ixgbe_rxtx.c     | 30 +++++++++---------------------
>  drivers/net/ixgbe/ixgbe_rxtx_vec.c |  3 +++
>  2 files changed, 12 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index 3ace8a8..3316488 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -130,6 +130,7 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
>
>  	/* check DD bit on threshold descriptor */
>  	status = txq->tx_ring[txq->tx_next_dd].wb.status;
> +	rte_dma_rmb();
>  	if (! (status & IXGBE_ADVTXD_STAT_DD))
>  		return 0;

Could you explain why we need an rmb here for a weak ordering model?
We don't read the rest of the TXD later, so nothing could be reordered here.
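For illustration, here is a minimal, self-contained sketch of the two patterns in question. The descriptor layout, names and the compiler-builtin fence are toy placeholders, not the real ixgbe descriptor or the proposed rte_dma_rmb():

#include <stdint.h>

/* Toy descriptor used only for illustration. */
struct toy_desc {
	volatile uint32_t status;	/* DD bit written last by the device */
	volatile uint32_t length;	/* written by the device before DD */
};

#define TOY_DD 0x1

/* Stand-in for whatever DMA read barrier the series ends up defining. */
#define toy_dma_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)

/* Case A (tx_free_bufs-like): only the status word is consumed, so a
 * read barrier has nothing to order against. */
static int toy_desc_done(const struct toy_desc *d)
{
	return (d->status & TOY_DD) != 0;
}

/* Case B (RX-like): other descriptor fields are read after the DD
 * check, so on a weakly ordered CPU a barrier is needed to keep the
 * length load from being performed before the status load. */
static int toy_desc_receive(const struct toy_desc *d, uint32_t *len)
{
	if (!(d->status & TOY_DD))
		return 0;
	toy_dma_rmb();
	*len = d->length;
	return 1;
}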
>
> @@ -320,7 +321,7 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		txq->tx_tail = 0;
>
>  	/* update tail pointer */
> -	rte_wmb();
> +	rte_dma_wmb();
>  	IXGBE_PCI_REG_WRITE(txq->tdt_reg_addr, txq->tx_tail);
>
>  	return nb_pkts;
> @@ -841,7 +842,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		txd->read.cmd_type_len |= rte_cpu_to_le_32(cmd_type_len);
>  	}
>  end_of_tx:
> -	rte_wmb();
>
>  	/*
>  	 * Set the Transmit Descriptor Tail (TDT)
> @@ -849,6 +849,7 @@ end_of_tx:
>  	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
>  		   (unsigned) txq->port_id, (unsigned) txq->queue_id,
>  		   (unsigned) tx_id, (unsigned) nb_tx);
> +	rte_dma_wmb();
>  	IXGBE_PCI_REG_WRITE(txq->tdt_reg_addr, tx_id);
>  	txq->tx_tail = tx_id;
>
> @@ -975,6 +976,7 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
>
>  		/* Compute how many status bits were set */
>  		nb_dd = 0;
> +		rte_dma_rmb();

I think that's a bit too late for an rmb() here.
We need to preserve the order of reading all 8 statuses, so I am afraid we need to:

 /* Read desc statuses backwards to avoid race condition */
-for (j = LOOK_AHEAD-1; j >= 0; --j)
+for (j = LOOK_AHEAD-1; j >= 0; --j) {
+	rte_dma_rmb();
 	s[j] = rxdp[j].wb.upper.status_error;
+}
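To make the race concrete, here is a compilable toy version of that scan (placeholder names and a compiler-builtin fence, not the driver code), assuming the device sets DD on descriptor j-1 before descriptor j and its writes become visible in that order. Without the per-iteration barrier, a weakly ordered CPU may perform the lower-index load first, observe DD only at the higher index, and the DD-counting loop can then hand an unfinished descriptor to the application:

#include <stdint.h>

#define TOY_LOOK_AHEAD 8
#define TOY_DD 0x1

/* Stand-in for the proposed read barrier. */
#define toy_dma_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)

static int
toy_scan_statuses(volatile const uint32_t *status, uint32_t s[TOY_LOOK_AHEAD])
{
	int j, nb_dd = 0;

	/* Read statuses backwards, with a fence before each load so the
	 * lower-index load cannot be reordered ahead of the higher one. */
	for (j = TOY_LOOK_AHEAD - 1; j >= 0; --j) {
		toy_dma_rmb();
		s[j] = status[j];
	}

	/* Given the ordering above (and a device that completes
	 * descriptors in ascending order), the DD bits seen here form a
	 * contiguous prefix, so their sum is the number of completed
	 * descriptors. */
	for (j = 0; j < TOY_LOOK_AHEAD; ++j)
		nb_dd += s[j] & TOY_DD;

	return nb_dd;
}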
>  		for (j = 0; j < LOOK_AHEAD; ++j)
>  			nb_dd += s[j] & IXGBE_RXDADV_STAT_DD;
>
> @@ -1138,7 +1140,7 @@ rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>  	}
>
>  	/* update tail pointer */
> -	rte_wmb();
> +	rte_dma_wmb();
>  	IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, cur_free_trigger);
>  }
>
> @@ -1229,13 +1231,10 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>  		/*
>  		 * The order of operations here is important as the DD status
>  		 * bit must not be read after any other descriptor fields.
> -		 * rx_ring and rxdp are pointing to volatile data so the order
> -		 * of accesses cannot be reordered by the compiler. If they were
> -		 * not volatile, they could be reordered which could lead to
> -		 * using invalid descriptor fields when read from rxd.
>  		 */
>  		rxdp = &rx_ring[rx_id];
>  		staterr = rxdp->wb.upper.status_error;
> +		rte_dma_rmb();
>  		if (! (staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
>  			break;
>  		rxd = *rxdp;
> @@ -1373,6 +1372,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>  			   (unsigned) nb_rx);
>  		rx_id = (uint16_t) ((rx_id == 0) ?
>  			(rxq->nb_rx_desc - 1) : (rx_id - 1));
> +		rte_dma_wmb();
>  		IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rx_id);
>  		nb_hold = 0;
>  	}
> @@ -1494,17 +1494,6 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
>
>  next_desc:
>  		/*
> -		 * The code in this whole file uses the volatile pointer to
> -		 * ensure the read ordering of the status and the rest of the
> -		 * descriptor fields (on the compiler level only!!!). This is so
> -		 * UGLY - why not to just use the compiler barrier instead? DPDK
> -		 * even has the rte_compiler_barrier() for that.
> -		 *
> -		 * But most importantly this is just wrong because this doesn't
> -		 * ensure memory ordering in a general case at all. For
> -		 * instance, DPDK is supposed to work on Power CPUs where
> -		 * compiler barrier may just not be enough!
> -		 *
>  		 * I tried to write only this function properly to have a
>  		 * starting point (as a part of an LRO/RSC series) but the
>  		 * compiler cursed at me when I tried to cast away the
> @@ -1519,12 +1508,11 @@ next_desc:
>  		 * TODO:
>  		 *    - Get rid of "volatile" crap and let the compiler do its
>  		 *      job.
> -		 *    - Use the proper memory barrier (rte_rmb()) to ensure the
> -		 *      memory ordering below.
>  		 */
>  		rxdp = &rx_ring[rx_id];
>  		staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
>
> +		rte_dma_rmb();
>  		if (!(staterr & IXGBE_RXDADV_STAT_DD))
>  			break;
>
> @@ -1704,7 +1692,7 @@ next_desc:
>  			   "nb_hold=%u nb_rx=%u",
>  			   rxq->port_id, rxq->queue_id, rx_id, nb_hold, nb_rx);
>
> -		rte_wmb();
> +		rte_dma_wmb();
>  		IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, prev_id);
>  		nb_hold = 0;
>  	}

I think you missed one more wmb() in that function:

ixgbe_recv_pkts_lro(...)
{
	...
	} else if (nb_hold > rxq->rx_free_thresh) {
		uint16_t next_rdt = rxq->rx_free_trigger;

		if (!ixgbe_rx_alloc_bufs(rxq, false)) {
			rte_wmb();
			IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, next_rdt);
			nb_hold -= rxq->rx_free_thresh;
		} else {

> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> index abd10f6..af4d779 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c

In fact, I think there is not much point in modifying that one.
The vector routines use IA-specific intrinsics, so that code wouldn't work on any other architecture anyway.

> @@ -123,6 +123,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
>  			(rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
>
>  	/* Update the tail pointer on the NIC */
> +	rte_dma_wmb();
>  	IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rx_id);
>  }
>
> @@ -528,6 +529,7 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
>
>  	/* check DD bit on threshold descriptor */
>  	status = txq->tx_ring[txq->tx_next_dd].wb.status;
> +	rte_dma_rmb();
>  	if (!(status & IXGBE_ADVTXD_STAT_DD))
>  		return 0;

Again, as with its scalar counterpart, I don't think we need an rmb here.
We read only the status from one TXD, that's it.
But as I said above, there is probably no need to touch that file at all.

Konstantin

>
> @@ -645,6 +647,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>
>  	txq->tx_tail = tx_id;
>
> +	rte_dma_wmb();
>  	IXGBE_PCI_REG_WRITE(txq->tdt_reg_addr, txq->tx_tail);
>
>  	return nb_pkts;
> --
> 2.1.0
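As a closing illustration of the tail-update pattern that all of the rte_wmb()/rte_dma_wmb() hunks above are about, including the refill path flagged in ixgbe_recv_pkts_lro(), here is a toy sketch with placeholder names, a compiler-builtin fence standing in for the proposed write barrier, and a plain store in place of the real MMIO register-write helper:

#include <stdint.h>

/* Stand-in for the proposed DMA write barrier; real MMIO may also need
 * the platform's I/O accessor rather than a plain store. */
#define toy_dma_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)

/* All descriptor writes must be visible to the device before the tail
 * (doorbell) register update that tells it new descriptors exist, so
 * the barrier sits between the last descriptor store and the tail
 * store. */
static inline void
toy_ring_doorbell(volatile uint64_t *last_desc, uint64_t desc_val,
		  volatile uint32_t *tail_reg, uint32_t tail)
{
	*last_desc = desc_val;	/* descriptor write(s) */
	toy_dma_wmb();		/* order them before the doorbell */
	*tail_reg = tail;	/* tail/doorbell register write */
}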