From: "Wiles, Keith"
To: Peter Keereweer
CC: "users@dpdk.org"
Date: Sat, 28 Jan 2017 22:43:49 +0000
Subject: Re: [dpdk-users] What to do after rte_eth_tx_burst: free or send again remaining packets?

> On Jan 28, 2017, at 1:57 PM, Peter Keereweer wrote:
>
> Hi!
>
> Currently I'm running some tests with the Load Balancer Sample Application, driving it with packets sent from pktgen.
> My setup is 2 servers, each containing an Intel 10GbE 82599 NIC, connected to each other. I have configured the Load Balancer application to use 1 RX core, 1 worker core and 1 TX core. The TX core sends all packets back to the pktgen application.
>
> With pktgen I send 1024 UDP packets to the Load Balancer. Every packet processed by the worker core is printed to the screen (code I added myself). If I send 1024 UDP packets, 1008 (= 7 x 144) packets are printed to the screen. This is correct, because the RX core reads packets with a burst size of 144. So if I send 1024 packets, I expect 1008 packets back in the pktgen application. But surprisingly I only receive 224 packets instead of 1008. After some research I found that 224 is not just a random number: it is 7 x 32. So if the RX core reads 7 x 144 packets, I get back 7 x 32 packets. After digging into the code of the Load Balancer application I found this code in the 'app_lcore_io_tx' function in 'runtime.c':
>
>     n_pkts = rte_eth_tx_burst(
>             port,
>             0,
>             lp->tx.mbuf_out[port].array,
>             (uint16_t) n_mbufs);
>
>     ...
>
>     if (unlikely(n_pkts < n_mbufs)) {
>             uint32_t k;
>             for (k = n_pkts; k < n_mbufs; k ++) {
>                     struct rte_mbuf *pkt_to_free = lp->tx.mbuf_out[port].array[k];
>                     rte_pktmbuf_free(pkt_to_free);
>             }
>     }
>
> What I understand from this code is that n_mbufs packets are sent with the 'rte_eth_tx_burst' function.
> This function returns n_pkts, the number of packets that were actually sent. If the actual number of packets sent is smaller than n_mbufs (the number of packets handed to rte_eth_tx_burst), then all remaining, unsent packets are freed. In the Load Balancer application, n_mbufs is equal to 144. But in my case 'rte_eth_tx_burst' returns the value 32, not 144. So 32 packets are actually sent and the remaining packets (144 - 32 = 112) are freed. This is the reason why I get 224 (7 x 32) packets back instead of 1008 (= 7 x 144).
>
> But the question is: why are the remaining packets freed instead of trying to send them again? If I look into 'pktgen.c', there is a function '_send_burst_fast' where all remaining packets are retried (in a while loop until they are all sent) instead of being freed (see code below):
>
>     static __inline__ void
>     _send_burst_fast(port_info_t *info, uint16_t qid)
>     {
>             struct mbuf_table *mtab = &info->q[qid].tx_mbufs;
>             struct rte_mbuf **pkts;
>             uint32_t ret, cnt;
>
>             cnt = mtab->len;
>             mtab->len = 0;
>
>             pkts = mtab->m_table;
>
>             if (rte_atomic32_read(&info->port_flags) & PROCESS_TX_TAP_PKTS) {
>                     while (cnt > 0) {
>                             ret = rte_eth_tx_burst(info->pid, qid, pkts, cnt);
>
>                             pktgen_do_tx_tap(info, pkts, ret);
>
>                             pkts += ret;
>                             cnt -= ret;
>                     }
>             } else {
>                     while (cnt > 0) {
>                             ret = rte_eth_tx_burst(info->pid, qid, pkts, cnt);
>
>                             pkts += ret;
>                             cnt -= ret;
>                     }
>             }
>     }
>
> Why is this while loop (sending packets until they have all been sent) not implemented in the 'app_lcore_io_tx' function in the Load Balancer application? That would make sense, right? It looks like the Load Balancer application assumes that if not all packets have been sent, the remaining packets failed during the sending process and should be freed.

The size of the TX ring on the hardware is limited, but you can adjust that size. In pktgen I attempt to send all packets requested to be sent, but in the load balancer the developer decided to just drop the packets that are not sent when the TX hardware ring, or even a SW ring, is full. This normally means the core is sending packets faster than the HW ring on the NIC can drain them.

It was just a choice of the developer to drop the packets instead of retrying until the packet array is empty. One possible way to fix this is to make the TX ring 2-4 times larger than the RX ring. This still does not truly solve the problem; it just moves it to the RX ring. If the NIC does not have a valid RX descriptor and a place to DMA the packet into memory, the packet gets dropped at the wire. BTW, increasing the TX ring size also means these packets will not be returned to the free pool and you can exhaust the packet pool. The packets are stuck on the TX ring as done because the threshold to reclaim the done packets is too high.

Say you have a ring size of 1024 and the high watermark for flushing the done packets off the ring is 900. If the packet pool is only 512 packets, then when you send 512 packets they will all sit on the TX done queue, and now you are in a deadlock, unable to send a packet because they are all on the TX done ring. This normally does not happen, as the ring sizes are normally much smaller than the number of TX or RX packets in the pool.
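If you do want the load balancer to behave more like pktgen, a bounded retry is a small change. Something along these lines (an untested sketch, not the actual sample code; the helper name tx_burst_with_retry and the BURST_TX_RETRIES cap are made up here, only rte_eth_tx_burst and rte_pktmbuf_free are real DPDK calls) would retry the burst a few times before falling back to freeing whatever still cannot be queued:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Hypothetical cap on retries so the core cannot spin forever if the
     * link is down or the TX ring never drains. */
    #define BURST_TX_RETRIES 4

    static inline void
    tx_burst_with_retry(uint8_t port, struct rte_mbuf **pkts, uint16_t n_mbufs)
    {
            uint16_t sent = 0;
            unsigned int retry = 0;

            /* Keep offering the unsent tail of the array to the PMD. */
            while (sent < n_mbufs && retry++ < BURST_TX_RETRIES)
                    sent += rte_eth_tx_burst(port, 0, pkts + sent,
                                             (uint16_t)(n_mbufs - sent));

            /* Anything still unsent is dropped, as the sample does today. */
            for (; sent < n_mbufs; sent++)
                    rte_pktmbuf_free(pkts[sent]);
    }

As for the tuning above: the TX ring size is the nb_tx_desc argument to rte_eth_tx_queue_setup(), and the reclaim threshold corresponds to the tx_free_thresh field of struct rte_eth_txconf; the exact reclaim behavior is PMD specific.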
In pktgen I attempt to send all of the packets requested, as it does not make sense for the user to ask to send 10000 packets and have pktgen send some smaller number just because the sending core can overrun the TX queue at some point.

I hope that helps.

> I hope someone can help me with these questions. Thank you in advance!!
>
> Peter

Regards,
Keith