From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 10AFFB62 for ; Mon, 11 Apr 2016 21:21:28 +0200 (CEST) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga103.fm.intel.com with ESMTP; 11 Apr 2016 12:21:28 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,469,1455004800"; d="scan'208";a="684548818" Received: from irsmsx110.ger.corp.intel.com ([163.33.3.25]) by FMSMGA003.fm.intel.com with ESMTP; 11 Apr 2016 12:21:26 -0700 Received: from irsmsx108.ger.corp.intel.com ([169.254.11.238]) by irsmsx110.ger.corp.intel.com ([169.254.15.10]) with mapi id 14.03.0248.002; Mon, 11 Apr 2016 20:21:25 +0100 From: "Dumitrescu, Cristian" To: "Sanford, Robert" , "dev@dpdk.org" CC: "Liang, Cunming" , "Venkatesan, Venky" , "Richardson, Bruce" Thread-Topic: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big Thread-Index: AQHRjDKoyc2GDY2FaUGj9fRrdME9ip+FL/cQ Date: Mon, 11 Apr 2016 19:21:24 +0000 Message-ID: <3EB4FA525960D640B5BDFFD6A3D89126479A0A65@IRSMSX108.ger.corp.intel.com> References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiZWU0YjQzZjItNzY0Yi00NjQwLWI4ODgtNGUzMWNkZjk5ODllIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6IklONkU4bmJFRTJDNDFcL3U5YkJ0T3BaMDE2XC91ellLZnR5MU1NZmlIcGIxST0ifQ== x-ctpclassification: CTP_IC x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Apr 2016 19:21:29 -0000 Hi Robert, I am doing a quick summary below on the changes proposed by these patches: 1. [PRIORITY 1] Bug fixing: a) Fix buffer overflow issue in rte_port_ring.c (writer, writer_nodrop): do= uble the tx_buf buffer size (applicable for current code approach) b) Fix issue with handling burst sizes bigger than 32: replace all declarat= ions of local variable bsz_size from uint32_t to uint64_t 2. [PRIORITY 2] Treat burst size as a fixed/exact value for the TX burst (A= pproach 2) instead of minimal value (current code, Approach 1) for ethdev p= orts. Rationale is that some PMDs (like vector IXGBE) _might_ drop the exce= ss packets in the burst. Additional input: 1. Bruce and I looked together at the code, it looks that vector IXGBE is n= ot doing this (anymore). Internally it handles packets in batches on 32 (as= your code snippets suggest), but there is no drop of excess packets taking= place. 2. Venky also suggested to keep a larger burst as a single burst (Approach = 1) rather than break the larger burst into a fixed/constant size burst whil= e buffering the excess packets until complete burst is met in the future. Given this input and also the timing of the release, we think the best opti= on is: - urgently send a quick patch to handle the bug fixes now - keep the existing code (burst size used as minimal burst size requirement= , not constant) as is, at least for now, and if you feel it is not the best= choice, we can continue to debate it for 16.7 release. What do you think? Jasvinder just send the bug fixing patches, hopefully they will make it int= o the 16.4 release: http://www.dpdk.org/ml/archives/dev/2016-April/037392.html http://www.dpdk.org/ml/archives/dev/2016-April/037393.html Many thanks for your work on this, Robert! Regards, Cristian > -----Original Message----- > From: Sanford, Robert [mailto:rsanford@akamai.com] > Sent: Friday, April 1, 2016 5:22 PM > To: Dumitrescu, Cristian ; dev@dpdk.org > Cc: Liang, Cunming > Subject: Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big >=20 > Hi Cristian, >=20 > Please see my comments inline. >=20 > > > > > >> -----Original Message----- > >> From: Robert Sanford [mailto:rsanford2@gmail.com] > >> Sent: Monday, March 28, 2016 9:52 PM > >> To: dev@dpdk.org; Dumitrescu, Cristian > >> Subject: [PATCH 4/4] port: fix ethdev writer burst too big > >> > >> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally > >> send bursts larger than tx_burst_sz to the underlying ethdev. > >> Some PMDs (e.g., ixgbe) may truncate this request to their maximum > >> burst size, resulting in unnecessary enqueuing failures or ethdev > >> writer retries. > > > >Sending bursts larger than tx_burst_sz is actually intentional. The > >assumption is that NIC performance benefits from larger burst size. So > >the tx_burst_sz is used as a minimal burst size requirement, not as a > >maximal or fixed burst size requirement. > > > >I agree with you that a while ago the vector version of IXGBE driver use= d > >to work the way you describe it, but I don't think this is the case > >anymore. As an example, if TX burst size is set to 32 and 48 packets are > >transmitted, than the PMD will TX all the 48 packets (internally it can > >work in batches of 4, 8, 32, etc, should not matter) rather than TXing > >just 32 packets out of 48 and user having to either discard or retry wit= h > >the remaining 16 packets. I am CC-ing Steve Liang for confirming this. > > > >Is there any PMD that people can name that currently behaves the > >opposite, i.e. given a burst of 48 pkts for TX, accept 32 pkts and > >discard the other 16? > > > >> >=20 > Yes, I believe that IXGBE *still* truncates. What am I missing? :) My > interpretation of the latest vector TX burst function is that it truncate= s > bursts longer than txq->tx_rs_thresh. Here are relevant code snippets tha= t > show it lowering the number of packets (nb_pkts) to enqueue (apologies in > advance for the email client mangling the indentation): >=20 > --- >=20 > #define IXGBE_DEFAULT_TX_RSBIT_THRESH 32 >=20 > static void > ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info > *dev_info) > { > ... > dev_info->default_txconf =3D (struct rte_eth_txconf) { > ... > .tx_rs_thresh =3D IXGBE_DEFAULT_TX_RSBIT_THRESH, > ... > }; > ... > } >=20 >=20 > uint16_t > ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts, > uint16_t nb_pkts) > { > ... > /* cross rx_thresh boundary is not allowed */ > nb_pkts =3D RTE_MIN(nb_pkts, txq->tx_rs_thresh); >=20 > if (txq->nb_tx_free < txq->tx_free_thresh) > ixgbe_tx_free_bufs(txq); >=20 >=20 > nb_commit =3D nb_pkts =3D (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts); > if (unlikely(nb_pkts =3D=3D 0)) > return 0; > ... > return nb_pkts; > } >=20 > --- >=20 >=20 >=20 > >> We propose to fix this by moving the tx buffer flushing logic from > >> *after* the loop that puts all packets into the tx buffer, to *inside* > >> the loop, testing for a full burst when adding each packet. > >> > > > >The issue I have with this approach is the introduction of a branch that > >has to be tested for each iteration of the loop rather than once for the > >entire loop. > > > >The code branch where you add this is actually the slow(er) code path > >(where local variable expr !=3D 0), which is used for non-contiguous or > >bursts smaller than tx_burst_sz. Is there a particular reason you are > >only interested of enabling this strategy (of using tx_burst_sz as a > >fixed burst size requirement) only on this code path? The reason I am > >asking is the other fast(er) code path (where expr =3D=3D 0) also uses > >tx_burst_sz as a minimal requirement and therefore it can send burst > >sizes bigger than tx_burst_sz. >=20 > The reason we limit the burst size only in the "else" path is that we als= o > proposed to limit the ethdev tx burst in the "if (expr=3D=3D0)" path, in = patch > 3/4. >=20 > > > > > >> Signed-off-by: Robert Sanford > >> --- > >> lib/librte_port/rte_port_ethdev.c | 20 ++++++++++---------- > >> 1 files changed, 10 insertions(+), 10 deletions(-) > >> > >> diff --git a/lib/librte_port/rte_port_ethdev.c > >> b/lib/librte_port/rte_port_ethdev.c > >> index 3fb4947..1283338 100644 > >> --- a/lib/librte_port/rte_port_ethdev.c > >> +++ b/lib/librte_port/rte_port_ethdev.c > >> @@ -151,7 +151,7 @@ static int > rte_port_ethdev_reader_stats_read(void > >> *port, > >> struct rte_port_ethdev_writer { > >> struct rte_port_out_stats stats; > >> > >> - struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX]; > >> + struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX]; > >> uint32_t tx_burst_sz; > >> uint16_t tx_buf_count; > >> uint64_t bsz_mask; > >> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port, > >> p->tx_buf[tx_buf_count++] =3D pkt; > >> > >> RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1); > >> pkts_mask &=3D ~pkt_mask; > >> - } > >> > >> - p->tx_buf_count =3D tx_buf_count; > >> - if (tx_buf_count >=3D p->tx_burst_sz) > >> - send_burst(p); > >> + p->tx_buf_count =3D tx_buf_count; > >> + if (tx_buf_count >=3D p->tx_burst_sz) > >> + send_burst(p); > >> + } > >> } > > > >One observation here: if we enable this proposal (which I have an issue > >with due to the executing the branch per loop iteration rather than once > >per entire loop), it also eliminates the buffer overflow issue flagged b= y > >you in the other email :), so no need to e.g. doble the size of the port > >internal buffer (tx_buf). > > > >> >=20 >=20 > Not exactly correct: We suggested doubling tx_buf[] for *ring* writers. > Here (the hunks above) we suggest the opposite: *reduce* the size of the > *ethdev* tx_buf[], because we never expect to buffer more than a full > burst. >=20 > You are correct about the additional branch per loop iteration. On the > other hand, the proposed change is simpler than something like this: > compute how many more packets we need to complete a full burst, copy > them > to tx_buf[], send_burst(), and then copy the rest to tx_buf[]. Either way > is acceptable to me. >=20 >=20 > >> return 0; > >> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void > >> *port, > >> struct rte_port_ethdev_writer_nodrop { > >> struct rte_port_out_stats stats; > >> > >> - struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX]; > >> + struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX]; > >> uint32_t tx_burst_sz; > >> uint16_t tx_buf_count; > >> uint64_t bsz_mask; > >> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void > >> *port, > >> p->tx_buf[tx_buf_count++] =3D pkt; > >> > >> RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1); > >> pkts_mask &=3D ~pkt_mask; > >> - } > >> > >> - p->tx_buf_count =3D tx_buf_count; > >> - if (tx_buf_count >=3D p->tx_burst_sz) > >> - send_burst_nodrop(p); > >> + p->tx_buf_count =3D tx_buf_count; > >> + if (tx_buf_count >=3D p->tx_burst_sz) > >> + send_burst_nodrop(p); > >> + } > >> } > >> > >> return 0; > >> -- > >> 1.7.1 > > >=20 > -- > Regards, > Robert