From: "Wiles, Keith"
To: Peter Keereweer
CC: "users@dpdk.org"
Date: Sat, 28 Jan 2017 22:43:49 +0000
Subject: Re: [dpdk-users] What to do after rte_eth_tx_burst: free or send again remaining packets?

> On Jan 28, 2017, at 1:57 PM, Peter Keereweer wrote:
>
> Hi!
>
> Currently I'm running some tests with the Load Balancer Sample Application, driving it with packets sent from pktgen.
> My setup is 2 servers, each containing an Intel 10GbE 82599 NIC, connected to each other. I have configured the Load Balancer application to use 1 RX core, 1 worker core and 1 TX core. The TX core sends all packets back to the pktgen application.
>
> With pktgen I send 1024 UDP packets to the Load Balancer. Every packet processed by the worker core is printed to the screen (code I added myself). If I send 1024 UDP packets, 1008 (= 7 x 144) packets are printed to the screen. This is correct, because the RX core reads packets with a burst size of 144. So if I send 1024 packets, I expect 1008 packets back in the pktgen application. But surprisingly I only receive 224 packets instead of 1008. After some research I found that 224 is not just a random number: it is 7 x 32. So if the RX core reads 7 x 144 packets, I get back 7 x 32 packets. After digging into the code of the Load Balancer application I found this code in the 'app_lcore_io_tx' function in 'runtime.c':
>
>     n_pkts = rte_eth_tx_burst(
>             port,
>             0,
>             lp->tx.mbuf_out[port].array,
>             (uint16_t) n_mbufs);
>
>     ...
>
>     if (unlikely(n_pkts < n_mbufs)) {
>             uint32_t k;
>             for (k = n_pkts; k < n_mbufs; k ++) {
>                     struct rte_mbuf *pkt_to_free = lp->tx.mbuf_out[port].array[k];
>                     rte_pktmbuf_free(pkt_to_free);
>             }
>     }
>
> What I understand from this code is that n_mbufs packets are sent with the 'rte_eth_tx_burst' function.
> This function returns n_pkts, the number of packets that were actually sent. If the actual number of packets sent is smaller than n_mbufs (the number of packets handed to rte_eth_tx_burst), then all remaining, unsent packets are freed. In the Load Balancer application, n_mbufs is equal to 144. But in my case 'rte_eth_tx_burst' returns the value 32, not 144. So 32 packets are actually sent and the remaining packets (144 - 32 = 112) are freed. This is the reason why I get 224 (7 x 32) packets back instead of 1008 (= 7 x 144).
>
> But the question is: why are the remaining packets freed instead of trying to send them again? If I look into 'pktgen.c', there is a function '_send_burst_fast' where all remaining packets are retried (in a while loop until they are all sent) instead of being freed (see code below):
>
>     static __inline__ void
>     _send_burst_fast(port_info_t *info, uint16_t qid)
>     {
>             struct mbuf_table *mtab = &info->q[qid].tx_mbufs;
>             struct rte_mbuf **pkts;
>             uint32_t ret, cnt;
>
>             cnt = mtab->len;
>             mtab->len = 0;
>
>             pkts = mtab->m_table;
>
>             if (rte_atomic32_read(&info->port_flags) & PROCESS_TX_TAP_PKTS) {
>                     while (cnt > 0) {
>                             ret = rte_eth_tx_burst(info->pid, qid, pkts, cnt);
>
>                             pktgen_do_tx_tap(info, pkts, ret);
>
>                             pkts += ret;
>                             cnt -= ret;
>                     }
>             } else {
>                     while (cnt > 0) {
>                             ret = rte_eth_tx_burst(info->pid, qid, pkts, cnt);
>
>                             pkts += ret;
>                             cnt -= ret;
>                     }
>             }
>     }
>
> Why is this while loop (sending packets until they have all been sent) not implemented in the 'app_lcore_io_tx' function in the Load Balancer application? That would make sense, right? It looks like the Load Balancer application assumes that if not all packets have been sent, the remaining packets failed during the sending process and should be freed.

The size of the TX ring on the hardware is limited, but you can adjust that size. In pktgen I attempt to send all packets requested to be sent, but in the load balancer the developer decided to just drop the packets that are not sent when the TX hardware ring, or even a SW ring, is full. This normally means the core is sending packets faster than the HW ring on the NIC can drain them.

It was just a choice of the developer to drop the packets instead of retrying until the packet array is empty. One possible way to fix this is to make the TX ring 2-4 times larger than the RX ring. This still does not truly solve the problem; it just moves it to the RX ring. If the NIC does not have a valid RX descriptor and a place to DMA the packet into memory, the packet gets dropped at the wire. BTW, increasing the TX ring size also means these packets will not be returned to the free pool and you can exhaust the packet pool. The packets are stuck on the TX ring as done because the threshold to reclaim the done packets is too high.

Say you have a ring size of 1024 and the high watermark for flushing the done packets off the ring is 900. If the packet pool is only 512 packets, then when you send 512 packets they will all sit on the TX done queue, and now you are in a deadlock, unable to send a packet because they are all on the TX done ring. This normally does not happen, as the ring sizes are normally much smaller than the number of TX or RX packets in the pool.
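If you do want the load balancer to behave more like pktgen, a bounded retry is a small change. Something along these lines (an untested sketch, not the actual sample code; the helper name tx_burst_with_retry and the BURST_TX_RETRIES cap are made up here, only rte_eth_tx_burst and rte_pktmbuf_free are real DPDK calls) would retry the burst a few times before falling back to freeing whatever still cannot be queued:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Hypothetical cap on retries so the core cannot spin forever if the
     * link is down or the TX ring never drains. */
    #define BURST_TX_RETRIES 4

    static inline void
    tx_burst_with_retry(uint8_t port, struct rte_mbuf **pkts, uint16_t n_mbufs)
    {
            uint16_t sent = 0;
            unsigned int retry = 0;

            /* Keep offering the unsent tail of the array to the PMD. */
            while (sent < n_mbufs && retry++ < BURST_TX_RETRIES)
                    sent += rte_eth_tx_burst(port, 0, pkts + sent,
                                             (uint16_t)(n_mbufs - sent));

            /* Anything still unsent is dropped, as the sample does today. */
            for (; sent < n_mbufs; sent++)
                    rte_pktmbuf_free(pkts[sent]);
    }

As for the tuning above: the TX ring size is the nb_tx_desc argument to rte_eth_tx_queue_setup(), and the reclaim threshold corresponds to the tx_free_thresh field of struct rte_eth_txconf; the exact reclaim behavior is PMD specific.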
In pktgen I attempt to send all of the packets requested, as it does not make sense for the user to ask to send 10000 packets and have pktgen send some smaller number just because the sending core can overrun the TX queue at some point.

I hope that helps.

> I hope someone can help me with these questions. Thank you in advance!!
>
> Peter

Regards,
Keith