From: "Dumitrescu, Cristian"
To: Robert Sanford, "dev@dpdk.org"
CC: "Liang, Cunming"
Date: Thu, 31 Mar 2016 13:22:47 +0000
Message-ID: <3EB4FA525960D640B5BDFFD6A3D8912647974F2E@IRSMSX108.ger.corp.intel.com>
References: <1459198297-49854-1-git-send-email-rsanford@akamai.com> <1459198297-49854-5-git-send-email-rsanford@akamai.com>
In-Reply-To: <1459198297-49854-5-git-send-email-rsanford@akamai.com>
Subject: Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
List-Id: patches and discussions about DPDK

> -----Original Message-----
> From: Robert Sanford [mailto:rsanford2@gmail.com]
> Sent: Monday, March 28, 2016 9:52 PM
> To: dev@dpdk.org; Dumitrescu, Cristian
> Subject: [PATCH 4/4] port: fix ethdev writer burst too big
>
> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
> send bursts larger than tx_burst_sz to the underlying ethdev.
> Some PMDs (e.g., ixgbe) may truncate this request to their maximum
> burst size, resulting in unnecessary enqueuing failures or ethdev
> writer retries.

Sending bursts larger than tx_burst_sz is actually intentional. The
assumption is that NIC performance benefits from larger burst size. So
tx_burst_sz is used as a minimal burst size requirement, not as a
maximal or fixed burst size requirement.

I agree with you that a while ago the vector version of the IXGBE driver
used to work the way you describe it, but I don't think this is the case
anymore. As an example, if TX burst size is set to 32 and 48 packets are
transmitted, then the PMD will TX all the 48 packets (internally it can
work in batches of 4, 8, 32, etc, should not matter) rather than TXing
just 32 packets out of 48 and the user having to either discard or retry
with the remaining 16 packets. I am CC-ing Steve Liang for confirming
this.

Is there any PMD that people can name that currently behaves the
opposite, i.e. given a burst of 48 pkts for TX, accept 32 pkts and
discard the other 16?
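To make the 32/48 example concrete, here is a minimal sketch of tx_burst_sz used as a minimal burst size. It is an illustration under stated assumptions, not the librte_port code: the struct and every sketch_* name are hypothetical, packets are plain ints, and the mock TX function assumes the driver accepts the whole burst, which is exactly the behaviour being asked about.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BUF_MAX 128

/* Hypothetical stand-in for an ethdev writer port; not the DPDK struct. */
struct sketch_writer {
	uint32_t tx_burst_sz;  /* minimal burst size that triggers a flush */
	uint32_t tx_buf_count; /* packets currently buffered */
	uint32_t last_burst;   /* size of the most recent burst sent */
	uint32_t n_bursts;     /* number of bursts sent so far */
	int tx_buf[SKETCH_BUF_MAX];
};

/* Mock TX. Assumption: the driver accepts the entire burst of n packets. */
static uint32_t sketch_tx_burst(struct sketch_writer *p, uint32_t n)
{
	p->last_burst = n;
	p->n_bursts++;
	return n;
}

/* Hand everything buffered so far to the mock TX and empty the buffer. */
static void sketch_flush(struct sketch_writer *p)
{
	sketch_tx_burst(p, p->tx_buf_count);
	p->tx_buf_count = 0;
}

/* Buffer n packets, then test the threshold once for the whole bulk. */
static void sketch_tx_bulk(struct sketch_writer *p, const int *pkts, uint32_t n)
{
	uint32_t i;

	/* The sketch requires the caller to stay within the buffer. */
	assert(p->tx_buf_count + n <= SKETCH_BUF_MAX);

	for (i = 0; i < n; i++)
		p->tx_buf[p->tx_buf_count++] = pkts[i];

	/* tx_burst_sz is a lower bound: the burst may be larger. */
	if (p->tx_buf_count >= p->tx_burst_sz)
		sketch_flush(p);
}
```

With tx_burst_sz set to 32, a single sketch_tx_bulk() call carrying 48 packets results in one burst of 48 being handed to the mock TX, with nothing left in the buffer.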
>
> We propose to fix this by moving the tx buffer flushing logic from
> *after* the loop that puts all packets into the tx buffer, to *inside*
> the loop, testing for a full burst when adding each packet.
>

The issue I have with this approach is the introduction of a branch that
has to be tested for each iteration of the loop rather than once for the
entire loop.

The code branch where you add this is actually the slow(er) code path
(where local variable expr != 0), which is used for non-contiguous bursts
or bursts smaller than tx_burst_sz. Is there a particular reason you are
interested in enabling this strategy (of using tx_burst_sz as a fixed
burst size requirement) only on this code path? The reason I am asking is
that the other fast(er) code path (where expr == 0) also uses tx_burst_sz
as a minimal requirement and therefore it can send burst sizes bigger
than tx_burst_sz.

> Signed-off-by: Robert Sanford
> ---
>  lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
>  1 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/lib/librte_port/rte_port_ethdev.c
> b/lib/librte_port/rte_port_ethdev.c
> index 3fb4947..1283338 100644
> --- a/lib/librte_port/rte_port_ethdev.c
> +++ b/lib/librte_port/rte_port_ethdev.c
> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void
> *port,
>  struct rte_port_ethdev_writer {
>  	struct rte_port_out_stats stats;
>
> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>  	uint32_t tx_burst_sz;
>  	uint16_t tx_buf_count;
>  	uint64_t bsz_mask;
> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>  			p->tx_buf[tx_buf_count++] = pkt;
>
> RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
>  			pkts_mask &= ~pkt_mask;
> -		}
>
> -		p->tx_buf_count = tx_buf_count;
> -		if (tx_buf_count >= p->tx_burst_sz)
> -			send_burst(p);
> +			p->tx_buf_count = tx_buf_count;
> +			if (tx_buf_count >= p->tx_burst_sz)
> +				send_burst(p);
> +		}
>  	}

One observation here: if we enable this proposal (which I have an issue
with due to executing the branch per loop iteration rather than once per
entire loop), it also eliminates the buffer overflow issue flagged by you
in the other email :), so no need to e.g. double the size of the port
internal buffer (tx_buf).

>
>  	return 0;
> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void
> *port,
>  struct rte_port_ethdev_writer_nodrop {
>  	struct rte_port_out_stats stats;
>
> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>  	uint32_t tx_burst_sz;
>  	uint16_t tx_buf_count;
>  	uint64_t bsz_mask;
> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
> *port,
>  			p->tx_buf[tx_buf_count++] = pkt;
>
> RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
>  			pkts_mask &= ~pkt_mask;
> -		}
>
> -		p->tx_buf_count = tx_buf_count;
> -		if (tx_buf_count >= p->tx_burst_sz)
> -			send_burst_nodrop(p);
> +			p->tx_buf_count = tx_buf_count;
> +			if (tx_buf_count >= p->tx_burst_sz)
> +				send_burst_nodrop(p);
> +		}
>  	}
>
>  	return 0;
> --
> 1.7.1
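
The trade-off discussed in this thread (one threshold test per bulk call versus one per packet, and the resulting burst and buffer sizes) can be sketched in isolation. This is a simplified model, not the rte_port_ethdev.c code: struct flush_stats, mock_send() and both flush_* functions are hypothetical names, the buffer is reduced to a counter, it is assumed to start empty, and the send is assumed to always accept the whole burst.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the comparison; not part of librte_port. */
struct flush_stats {
	uint32_t threshold_tests; /* times the flush condition was evaluated */
	uint32_t max_burst;       /* largest burst handed to the mock send */
	uint32_t buffered;        /* packets still buffered on return */
};

/* Mock send: records the burst size and empties the buffer. */
static void mock_send(struct flush_stats *s, uint32_t *count)
{
	if (*count > s->max_burst)
		s->max_burst = *count;
	*count = 0;
}

/* Threshold tested once, after every packet in the mask is buffered. */
static struct flush_stats flush_after_loop(uint64_t pkts_mask,
					   uint32_t tx_burst_sz)
{
	struct flush_stats s = {0, 0, 0};
	uint32_t count = 0;
	uint32_t i;

	assert(tx_burst_sz > 0);
	for (i = 0; i < 64; i++)
		if (pkts_mask & (1ULL << i))
			count++; /* stands in for tx_buf[tx_buf_count++] = pkt */

	s.threshold_tests++;
	if (count >= tx_burst_sz)
		mock_send(&s, &count);

	s.buffered = count;
	return s;
}

/* Threshold tested inside the loop, after each packet is buffered. */
static struct flush_stats flush_inside_loop(uint64_t pkts_mask,
					    uint32_t tx_burst_sz)
{
	struct flush_stats s = {0, 0, 0};
	uint32_t count = 0;
	uint32_t i;

	assert(tx_burst_sz > 0);
	for (i = 0; i < 64; i++) {
		if (!(pkts_mask & (1ULL << i)))
			continue;
		count++;
		s.threshold_tests++; /* the extra per-packet branch */
		if (count >= tx_burst_sz)
			mock_send(&s, &count);
	}

	s.buffered = count;
	return s;
}
```

For a mask with 48 packets set and tx_burst_sz of 32, flush_after_loop() evaluates the threshold once and sends a single burst of 48, while flush_inside_loop() evaluates it 48 times, sends a burst of exactly 32 and leaves 16 packets buffered. In the second variant the buffer never holds more than tx_burst_sz packets, which is why the proposal would make the doubled tx_buf unnecessary.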