From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from prod-mail-xrelay08.akamai.com (prod-mail-xrelay08.akamai.com [96.6.114.112]) by dpdk.org (Postfix) with ESMTP id DC8E32BA1 for ; Fri, 1 Apr 2016 18:22:18 +0200 (CEST) Received: from prod-mail-xrelay08.akamai.com (localhost.localdomain [127.0.0.1]) by postfix.imss70 (Postfix) with ESMTP id EF0F7200051; Fri, 1 Apr 2016 16:22:17 +0000 (GMT) Received: from prod-mail-relay09.akamai.com (prod-mail-relay09.akamai.com [172.27.22.68]) by prod-mail-xrelay08.akamai.com (Postfix) with ESMTP id D957420004C; Fri, 1 Apr 2016 16:22:17 +0000 (GMT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=akamai.com; s=a1; t=1459527737; bh=XgEKx/5IyuZdn9gh+Ds7dsWDnKntfD4bQjGoqAzdLkY=; l=6786; h=From:To:CC:Date:From; b=nOCIvJEh7Ll4r+EPcRCpCTpSV6eHUfaog9S4/DcnqCH0+7w4HqWOG0CxqVzxhebk+ Ob6G9mSkRnEv5yiRSDIkFTLPSld1gmk8aXQiGGuYKWusZCP7yvMup6gZ1wrIhOEdTE i9CYl+OpUxIYww0Lj3KTKk4QH52wyzmNfm3KSx04= Received: from email.msg.corp.akamai.com (um-cas.msg.corp.akamai.com [172.27.25.30]) by prod-mail-relay09.akamai.com (Postfix) with ESMTP id D69CA1E07C; Fri, 1 Apr 2016 16:22:17 +0000 (GMT) Received: from ustx2ex-dag1mb6.msg.corp.akamai.com (172.27.27.107) by ustx2ex-dag1mb3.msg.corp.akamai.com (172.27.27.103) with Microsoft SMTP Server (TLS) id 15.0.1130.7; Fri, 1 Apr 2016 11:22:17 -0500 Received: from ustx2ex-dag1mb6.msg.corp.akamai.com ([172.27.27.107]) by ustx2ex-dag1mb6.msg.corp.akamai.com ([172.27.27.107]) with mapi id 15.00.1130.005; Fri, 1 Apr 2016 09:22:17 -0700 From: "Sanford, Robert" To: "Dumitrescu, Cristian" , "dev@dpdk.org" CC: "Liang, Cunming" Thread-Topic: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big Thread-Index: AQHRjDKoyc2GDY2FaUGj9fRrdME9ig== Date: Fri, 1 Apr 2016 16:22:17 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Microsoft-MacOutlook/14.4.3.140616 x-ms-exchange-messagesentrepresentingtype: 1 x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [172.19.132.73] Content-Type: text/plain; charset="us-ascii" Content-ID: <456A8B9E9D16E24F8773BDEB28668077@akamai.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Apr 2016 16:22:19 -0000 Hi Cristian, Please see my comments inline. > > >> -----Original Message----- >> From: Robert Sanford [mailto:rsanford2@gmail.com] >> Sent: Monday, March 28, 2016 9:52 PM >> To: dev@dpdk.org; Dumitrescu, Cristian >> Subject: [PATCH 4/4] port: fix ethdev writer burst too big >>=20 >> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally >> send bursts larger than tx_burst_sz to the underlying ethdev. >> Some PMDs (e.g., ixgbe) may truncate this request to their maximum >> burst size, resulting in unnecessary enqueuing failures or ethdev >> writer retries. > >Sending bursts larger than tx_burst_sz is actually intentional. The >assumption is that NIC performance benefits from larger burst size. So >the tx_burst_sz is used as a minimal burst size requirement, not as a >maximal or fixed burst size requirement. > >I agree with you that a while ago the vector version of IXGBE driver used >to work the way you describe it, but I don't think this is the case >anymore. As an example, if TX burst size is set to 32 and 48 packets are >transmitted, than the PMD will TX all the 48 packets (internally it can >work in batches of 4, 8, 32, etc, should not matter) rather than TXing >just 32 packets out of 48 and user having to either discard or retry with >the remaining 16 packets. I am CC-ing Steve Liang for confirming this. > >Is there any PMD that people can name that currently behaves the >opposite, i.e. given a burst of 48 pkts for TX, accept 32 pkts and >discard the other 16? > >>=20 Yes, I believe that IXGBE *still* truncates. What am I missing? :) My interpretation of the latest vector TX burst function is that it truncates bursts longer than txq->tx_rs_thresh. Here are relevant code snippets that show it lowering the number of packets (nb_pkts) to enqueue (apologies in advance for the email client mangling the indentation): --- #define IXGBE_DEFAULT_TX_RSBIT_THRESH 32 static void ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) { ... dev_info->default_txconf =3D (struct rte_eth_txconf) { ... .tx_rs_thresh =3D IXGBE_DEFAULT_TX_RSBIT_THRESH, ... }; ... } uint16_t ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { ... /* cross rx_thresh boundary is not allowed */ nb_pkts =3D RTE_MIN(nb_pkts, txq->tx_rs_thresh); if (txq->nb_tx_free < txq->tx_free_thresh) ixgbe_tx_free_bufs(txq); =20 nb_commit =3D nb_pkts =3D (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts); if (unlikely(nb_pkts =3D=3D 0)) return 0; ... return nb_pkts; } --- >> We propose to fix this by moving the tx buffer flushing logic from >> *after* the loop that puts all packets into the tx buffer, to *inside* >> the loop, testing for a full burst when adding each packet. >>=20 > >The issue I have with this approach is the introduction of a branch that >has to be tested for each iteration of the loop rather than once for the >entire loop. > >The code branch where you add this is actually the slow(er) code path >(where local variable expr !=3D 0), which is used for non-contiguous or >bursts smaller than tx_burst_sz. Is there a particular reason you are >only interested of enabling this strategy (of using tx_burst_sz as a >fixed burst size requirement) only on this code path? The reason I am >asking is the other fast(er) code path (where expr =3D=3D 0) also uses >tx_burst_sz as a minimal requirement and therefore it can send burst >sizes bigger than tx_burst_sz. The reason we limit the burst size only in the "else" path is that we also proposed to limit the ethdev tx burst in the "if (expr=3D=3D0)" path, in pa= tch 3/4. > > >> Signed-off-by: Robert Sanford >> --- >> lib/librte_port/rte_port_ethdev.c | 20 ++++++++++---------- >> 1 files changed, 10 insertions(+), 10 deletions(-) >>=20 >> diff --git a/lib/librte_port/rte_port_ethdev.c >> b/lib/librte_port/rte_port_ethdev.c >> index 3fb4947..1283338 100644 >> --- a/lib/librte_port/rte_port_ethdev.c >> +++ b/lib/librte_port/rte_port_ethdev.c >> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void >> *port, >> struct rte_port_ethdev_writer { >> struct rte_port_out_stats stats; >>=20 >> - struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX]; >> + struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX]; >> uint32_t tx_burst_sz; >> uint16_t tx_buf_count; >> uint64_t bsz_mask; >> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port, >> p->tx_buf[tx_buf_count++] =3D pkt; >>=20 >> RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1); >> pkts_mask &=3D ~pkt_mask; >> - } >>=20 >> - p->tx_buf_count =3D tx_buf_count; >> - if (tx_buf_count >=3D p->tx_burst_sz) >> - send_burst(p); >> + p->tx_buf_count =3D tx_buf_count; >> + if (tx_buf_count >=3D p->tx_burst_sz) >> + send_burst(p); >> + } >> } > >One observation here: if we enable this proposal (which I have an issue >with due to the executing the branch per loop iteration rather than once >per entire loop), it also eliminates the buffer overflow issue flagged by >you in the other email :), so no need to e.g. doble the size of the port >internal buffer (tx_buf). > >>=20 Not exactly correct: We suggested doubling tx_buf[] for *ring* writers. Here (the hunks above) we suggest the opposite: *reduce* the size of the *ethdev* tx_buf[], because we never expect to buffer more than a full burst. You are correct about the additional branch per loop iteration. On the other hand, the proposed change is simpler than something like this: compute how many more packets we need to complete a full burst, copy them to tx_buf[], send_burst(), and then copy the rest to tx_buf[]. Either way is acceptable to me. >> return 0; >> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void >> *port, >> struct rte_port_ethdev_writer_nodrop { >> struct rte_port_out_stats stats; >>=20 >> - struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX]; >> + struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX]; >> uint32_t tx_burst_sz; >> uint16_t tx_buf_count; >> uint64_t bsz_mask; >> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void >> *port, >> p->tx_buf[tx_buf_count++] =3D pkt; >>=20 >> RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1); >> pkts_mask &=3D ~pkt_mask; >> - } >>=20 >> - p->tx_buf_count =3D tx_buf_count; >> - if (tx_buf_count >=3D p->tx_burst_sz) >> - send_burst_nodrop(p); >> + p->tx_buf_count =3D tx_buf_count; >> + if (tx_buf_count >=3D p->tx_burst_sz) >> + send_burst_nodrop(p); >> + } >> } >>=20 >> return 0; >> -- >> 1.7.1 > -- Regards, Robert