From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <cristian.dumitrescu@intel.com>
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
 by dpdk.org (Postfix) with ESMTP id B26307F70
 for <dev@dpdk.org>; Thu, 31 Mar 2016 15:22:50 +0200 (CEST)
Received: from orsmga001.jf.intel.com ([10.7.209.18])
 by orsmga102.jf.intel.com with ESMTP; 31 Mar 2016 06:22:50 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.24,422,1455004800"; d="scan'208";a="922592694"
Received: from irsmsx151.ger.corp.intel.com ([163.33.192.59])
 by orsmga001.jf.intel.com with ESMTP; 31 Mar 2016 06:22:49 -0700
Received: from irsmsx111.ger.corp.intel.com (10.108.20.4) by
 IRSMSX151.ger.corp.intel.com (163.33.192.59) with Microsoft SMTP Server (TLS)
 id 14.3.248.2; Thu, 31 Mar 2016 14:22:47 +0100
Received: from irsmsx108.ger.corp.intel.com ([169.254.11.13]) by
 irsmsx111.ger.corp.intel.com ([169.254.2.127]) with mapi id 14.03.0248.002;
 Thu, 31 Mar 2016 14:22:47 +0100
From: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>
To: Robert Sanford <rsanford2@gmail.com>, "dev@dpdk.org" <dev@dpdk.org>
CC: "Liang, Cunming" <cunming.liang@intel.com>
Thread-Topic: [PATCH 4/4] port: fix ethdev writer burst too big
Thread-Index: AQHRiTO8m/qLB+HoCkiZczPANCBPkJ9ziSRA
Date: Thu, 31 Mar 2016 13:22:47 +0000
Message-ID: <3EB4FA525960D640B5BDFFD6A3D8912647974F2E@IRSMSX108.ger.corp.intel.com>
References: <1459198297-49854-1-git-send-email-rsanford@akamai.com>
 <1459198297-49854-5-git-send-email-rsanford@akamai.com>
In-Reply-To: <1459198297-49854-5-git-send-email-rsanford@akamai.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMTYyMDkyMGMtZTM0Yy00Yjg5LTg2NTctMmJkMzc0MTQzYTE1IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6InZiMzFqSzhWSmJpeVhuWFplenErR252dFNoK3NoTlwvazNId0hXSklkWmJFPSJ9
x-ctpclassification: CTP_IC
x-originating-ip: [163.33.239.182]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 31 Mar 2016 13:22:51 -0000



> -----Original Message-----
> From: Robert Sanford [mailto:rsanford2@gmail.com]
> Sent: Monday, March 28, 2016 9:52 PM
> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Subject: [PATCH 4/4] port: fix ethdev writer burst too big
>
> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
> send bursts larger than tx_burst_sz to the underlying ethdev.
> Some PMDs (e.g., ixgbe) may truncate this request to their maximum
> burst size, resulting in unnecessary enqueuing failures or ethdev
> writer retries.

Sending bursts larger than tx_burst_sz is actually intentional. The
assumption is that NIC performance benefits from larger burst size. So the
tx_burst_sz is used as a minimal burst size requirement, not as a maximal
or fixed burst size requirement.

I agree with you that a while ago the vector version of the IXGBE driver
used to work the way you describe, but I don't think this is the case
anymore. As an example, if the TX burst size is set to 32 and 48 packets
are transmitted, then the PMD will TX all 48 packets (internally it can
work in batches of 4, 8, 32, etc.; this should not matter) rather than
TXing just 32 packets out of 48, with the user having to either discard or
retry with the remaining 16 packets. I am CC-ing Steve Liang to confirm
this.

Is there any PMD that people can name that currently behaves the opposite
way, i.e. given a burst of 48 pkts for TX, accepts 32 pkts and discards the
other 16?
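
To make the question concrete, here is a minimal, self-contained sketch of
the two behaviours. It does not use any DPDK API: mock_tx_burst() and its
max_burst cap are hypothetical stand-ins for a PMD TX function and are not
meant to describe how any real driver works.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a PMD TX function (not a DPDK API).
 * max_burst == 0 models a PMD that accepts the whole request;
 * a non-zero max_burst models a PMD that truncates each call.
 * Returns the number of packets accepted for transmission.
 */
static uint16_t
mock_tx_burst(uint16_t n_pkts, uint16_t max_burst)
{
	if (max_burst != 0 && n_pkts > max_burst)
		return max_burst;
	return n_pkts;
}

/*
 * Packets left over after a single TX call. The caller has to
 * either drop them or retry with them.
 */
static uint16_t
leftover_after_one_call(uint16_t n_pkts, uint16_t max_burst)
{
	return (uint16_t)(n_pkts - mock_tx_burst(n_pkts, max_burst));
}
```

With 48 packets, leftover_after_one_call(48, 0) is 0, while
leftover_after_one_call(48, 32) is 16. The second case is the one I am
asking about.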

>
> We propose to fix this by moving the tx buffer flushing logic from
> *after* the loop that puts all packets into the tx buffer, to *inside*
> the loop, testing for a full burst when adding each packet.
>

The issue I have with this approach is the introduction of a branch that
has to be tested for each iteration of the loop rather than once for the
entire loop.

The code branch where you add this is actually the slow(er) code path
(where local variable expr != 0), which is used for non-contiguous bursts
or bursts smaller than tx_burst_sz. Is there a particular reason you are
interested in enabling this strategy (of using tx_burst_sz as a fixed burst
size requirement) only on this code path? The reason I am asking is that
the other fast(er) code path (where expr == 0) also uses tx_burst_sz as a
minimal requirement and therefore it can send burst sizes bigger than
tx_burst_sz.
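
For illustration only, the trade-off on the slow(er) path can be sketched
with a simplified model. struct writer_model and the functions below are
hypothetical, not the code in rte_port_ethdev.c. The model only counts
packets, records the largest burst handed to the NIC, and counts how many
times the threshold branch is evaluated.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the writer port (not the DPDK struct). */
struct writer_model {
	uint32_t tx_burst_sz;   /* burst size threshold */
	uint32_t tx_buf_count;  /* packets currently buffered */
	uint32_t max_burst;     /* largest burst handed to the "NIC" so far */
	uint32_t n_tests;       /* times the threshold branch was evaluated */
};

/* Model of a flush: record the burst size, then empty the buffer. */
static void
model_send_burst(struct writer_model *p)
{
	if (p->tx_buf_count > p->max_burst)
		p->max_burst = p->tx_buf_count;
	p->tx_buf_count = 0;
}

/* Current behaviour: buffer every packet in the mask, test once after the loop. */
static void
tx_bulk_flush_after_loop(struct writer_model *p, uint64_t pkts_mask)
{
	/* Each iteration clears the lowest set bit, i.e. consumes one packet. */
	for ( ; pkts_mask; pkts_mask &= pkts_mask - 1)
		p->tx_buf_count++;

	p->n_tests++;
	if (p->tx_buf_count >= p->tx_burst_sz)
		model_send_burst(p);
}

/* Proposed behaviour: test for a full burst after each buffered packet. */
static void
tx_bulk_flush_inside_loop(struct writer_model *p, uint64_t pkts_mask)
{
	for ( ; pkts_mask; pkts_mask &= pkts_mask - 1) {
		p->tx_buf_count++;
		p->n_tests++;
		if (p->tx_buf_count >= p->tx_burst_sz)
			model_send_burst(p);
	}
}
```

With tx_burst_sz = 32, an empty buffer and a mask with 48 bits set, the
first variant evaluates the branch once and sends a single burst of 48. The
second evaluates it 48 times, sends a burst of 32 and leaves 16 packets
buffered.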


> Signed-off-by: Robert Sanford <rsanford@akamai.com>
> ---
>  lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
>  1 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/lib/librte_port/rte_port_ethdev.c
> b/lib/librte_port/rte_port_ethdev.c
> index 3fb4947..1283338 100644
> --- a/lib/librte_port/rte_port_ethdev.c
> +++ b/lib/librte_port/rte_port_ethdev.c
> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void
> *port,
>  struct rte_port_ethdev_writer {
>  	struct rte_port_out_stats stats;
>
> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>  	uint32_t tx_burst_sz;
>  	uint16_t tx_buf_count;
>  	uint64_t bsz_mask;
> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>  			p->tx_buf[tx_buf_count++] = pkt;
>
> 	RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
>  			pkts_mask &= ~pkt_mask;
> -		}
>
> -		p->tx_buf_count = tx_buf_count;
> -		if (tx_buf_count >= p->tx_burst_sz)
> -			send_burst(p);
> +			p->tx_buf_count = tx_buf_count;
> +			if (tx_buf_count >= p->tx_burst_sz)
> +				send_burst(p);
> +		}
>  	}

One observation here: if we enable this proposal (which I have an issue
with due to executing the branch per loop iteration rather than once per
entire loop), it also eliminates the buffer overflow issue flagged by you
in the other email :), so there is no need to e.g. double the size of the
port internal buffer (tx_buf).
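
A rough back-of-the-envelope sketch of the buffer sizing argument for this
loop, under stated assumptions: IN_BURST_SIZE_MAX = 64 is an assumed value
used for illustration, the buffer holds fewer than tx_burst_sz packets on
entry, tx_burst_sz >= 1, and a flush always empties the buffer. The helper
names are hypothetical.

```c
#include <stdint.h>

/* Assumed value for illustration; the real constant is defined by the port library. */
#define IN_BURST_SIZE_MAX 64u

/*
 * Threshold tested once after the loop: up to (tx_burst_sz - 1)
 * packets already buffered plus a full input burst added before
 * the test.
 */
static uint32_t
peak_occupancy_flush_after_loop(uint32_t tx_burst_sz)
{
	return (tx_burst_sz - 1) + IN_BURST_SIZE_MAX;
}

/*
 * Threshold tested per packet: the buffer is emptied as soon as
 * it reaches tx_burst_sz, so it never holds more than that.
 */
static uint32_t
peak_occupancy_flush_inside_loop(uint32_t tx_burst_sz)
{
	return tx_burst_sz;
}
```

Under these assumptions, with tx_burst_sz = 64 the peak is 127 entries when
testing after the loop and 64 entries when testing per packet, which is why
a buffer of the smaller size would be enough with the proposed change.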

>
>  	return 0;
> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void
> *port,
>  struct rte_port_ethdev_writer_nodrop {
>  	struct rte_port_out_stats stats;
>
> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>  	uint32_t tx_burst_sz;
>  	uint16_t tx_buf_count;
>  	uint64_t bsz_mask;
> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
> *port,
>  			p->tx_buf[tx_buf_count++] = pkt;
>
> 	RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
>  			pkts_mask &= ~pkt_mask;
> -		}
>
> -		p->tx_buf_count = tx_buf_count;
> -		if (tx_buf_count >= p->tx_burst_sz)
> -			send_burst_nodrop(p);
> +			p->tx_buf_count = tx_buf_count;
> +			if (tx_buf_count >= p->tx_burst_sz)
> +				send_burst_nodrop(p);
> +		}
>  	}
>
>  	return 0;
> --
> 1.7.1