From: "Ananyev, Konstantin"
To: Andrew Rybchenko, "Yang, Zhiyong", "dev@dpdk.org"
Cc: "thomas.monjalon@6wind.com", "Richardson, Bruce"
Date: Fri, 20 Jan 2017 11:24:40 +0000
Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior

>
> From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> Sent: Friday, January 20, 2017 10:26 AM
> To: Yang, Zhiyong; dev@dpdk.org
> Cc: thomas.monjalon@6wind.com; Richardson, Bruce; Ananyev, Konstantin
> Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
>
> On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> The rte_eth_tx_burst() function in the file rte_ethdev.h is invoked to
> transmit output packets on the output queue of a DPDK application, as
> follows:
>
> static inline uint16_t
> rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
>
> Note: the fourth parameter, nb_pkts, is the number of packets to transmit.
> The rte_eth_tx_burst() function returns the number of packets it actually
> sent. A return value equal to *nb_pkts* means that all packets have been
> sent, which likely signifies that further output packets could be
> transmitted immediately. Applications that implement a "send as many
> packets as possible" policy can check for this specific case and keep
> invoking the rte_eth_tx_burst() function until a value less than
> *nb_pkts* is returned.
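>
> For illustration only, such a "send until done" policy at the application
> level is usually a small loop like the following sketch (whether to retry,
> buffer or drop the leftover packets when the queue fills up is an
> application decision, not something the API mandates):
>
> uint16_t sent = 0;
>
> while (sent < nb_pkts) {
>         uint16_t n = rte_eth_tx_burst(port_id, queue_id,
>                                       &tx_pkts[sent], nb_pkts - sent);
>         if (n == 0)     /* TX queue full: retry, buffer or drop the rest */
>                 break;
>         sent += n;
> }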
> When you call TX only once via rte_eth_tx_burst(), you may get different
> behaviors from different PMDs. One problem every DPDK user has to face is
> that this policy must be considered at the application level for each
> specific PMD, whether or not it is actually necessary there. This adds
> usage complexity and easily confuses DPDK users, who have to learn the TX
> burst limits of each specific PMD and interpret the return value (the
> number of packets transmitted successfully) differently for each of them.
> Some PMD TX functions can send at most 32 packets per invocation, some at
> most 64, while others send as many packets as requested, and so on. This
> easily leads to incorrect usage.
>
> This patch proposes to implement the above policy in the DPDK library in
> order to simplify application code and avoid incorrect invocations. DPDK
> users then no longer need to consider the implementation policy or write
> the same duplicated code at the application level when sending packets.
> They also do not need to know the TX differences between specific PMDs:
> they can pass an arbitrary number of packets to rte_eth_tx_burst() and
> check the return value for the number of packets actually sent.
>
> How can the policy be implemented in the DPDK library? Two solutions are
> proposed below.
>
> Solution 1:
> Implement wrapper functions that remove such limits for each specific
> PMD, as i40e_xmit_pkts_simple() and ixgbe_xmit_pkts_simple() already do.
>
> > IMHO, this solution is a bit better since it:
> > 1. Does not affect other PMDs at all
> > 2. Could be a bit faster for the PMDs which require it, since there is
> >    no indirect function call on each iteration
> > 3. Requires no ABI change

I also would prefer solution number 1, for the reasons outlined by Andrew
above.
Also, IMO the current limits on the number of packets to TX in some Intel
PMD TX routines are somewhat artificial:
- they are not caused by any real HW limitations
- avoiding them at the PMD level shouldn't cause any performance or
  functional degradation.
So I don't see any good reason why, instead of fixing these limitations in
our own PMDs, we are trying to push them to the upper (rte_ethdev) layer.

Konstantin

>
> Solution 2:
> Implement the policy in the rte_eth_tx_burst() function at the ethdev
> layer, in a more consistent batching way: make a best effort to send
> *nb_pkts* packets in bursts of no more than 32 by default, since many DPDK
> TX PMDs use this max TX burst size (32). In addition, a data member
> defining the max TX burst size, such as "uint16_t max_tx_burst_pkts;",
> will be added to rte_eth_dev_data, which drivers can override if they work
> with bursts of 64 or some other size (thanks to Bruce for the suggestion).
> This keeps the performance impact as low as possible.
>
> > I see no noticeable difference in performance, so I don't mind if this
> > is finally chosen. Just be sure that you update all PMDs to set
> > reasonable default values, or, maybe even better, set UINT16_MAX in a
> > generic place; 0 is a bad default here. (I lost a few seconds wondering
> > why nothing was sent and could not stop it.)
>
> I prefer the latter of the two solutions because it makes the DPDK code
> more consistent and simpler, and avoids writing much duplicate logic in
> the DPDK source code. In addition, I think solution 2 brings no, or only
> a small, performance drop. It does, however, introduce an ABI change.
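>
> As a rough sketch only (not the actual patch), the solution 2 loop in
> rte_ethdev.h could look something like this, assuming the proposed
> max_tx_burst_pkts field has been added to rte_eth_dev_data:
>
> static inline uint16_t
> rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> {
>         struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>         uint16_t nb_tx = 0;
>
>         while (nb_pkts) {
>                 /* never ask the driver for more than its burst limit */
>                 uint16_t n = RTE_MIN(nb_pkts,
>                                      dev->data->max_tx_burst_pkts);
>                 uint16_t ret = (*dev->tx_pkt_burst)(
>                         dev->data->tx_queues[queue_id],
>                         &tx_pkts[nb_tx], n);
>
>                 nb_tx += ret;
>                 nb_pkts -= ret;
>                 if (ret < n)    /* the queue is full: stop early */
>                         break;
>         }
>         return nb_tx;
> }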
>
> In fact, the current rte_eth_rx_burst() function uses a similar mechanism
> and faces the same problem as rte_eth_tx_burst():
>
> static inline uint16_t
> rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
>                  struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);
>
> Applications are responsible for implementing the "retrieve as many
> received packets as possible" policy: they check for this specific case
> and keep invoking the rte_eth_rx_burst() function until a value less than
> *nb_pkts* is returned (see the RX sketch at the end of this mail).
>
> The patch proposes to apply the above method to rte_eth_rx_burst() as
> well.
>
> In summary, the purpose of this RFC is to make the job easier and simpler
> for driver writers and to avoid writing much duplicate code at the
> application level.
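>
> For completeness, the RX-side policy mentioned above is typically a loop
> of the following shape; BURST_SZ and the rx_pkts array are hypothetical
> application-level names used only for this sketch:
>
> uint16_t nb_rx;
>
> do {
>         nb_rx = rte_eth_rx_burst(port_id, queue_id, rx_pkts, BURST_SZ);
>         /* ... process the nb_rx retrieved packets here ... */
> } while (nb_rx == BURST_SZ);    /* a full burst: more may be waiting */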