From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ananyev, Konstantin"
To: Jerin Jacob
CC: Thomas Monjalon, "dev@dpdk.org"
Date: Thu, 28 Jul 2016 13:01:16 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836B8AB57@irsmsx105.ger.corp.intel.com>
In-Reply-To: <20160728113853.GA14755@localhost.localdomain>
References: <1469024691-58750-1-git-send-email-tomaszx.kulasek@intel.com>
 <1469114659-66063-1-git-send-email-tomaszx.kulasek@intel.com>
 <2601191342CEEE43887BDE71AB97725836B80AD8@irsmsx105.ger.corp.intel.com>
 <2146153.nVzdynOqdk@xps13>
 <20160727171043.GA22116@localhost.localdomain>
 <2601191342CEEE43887BDE71AB97725836B8884E@irsmsx105.ger.corp.intel.com>
 <20160727174133.GA22895@localhost.localdomain>
 <2601191342CEEE43887BDE71AB97725836B88894@irsmsx105.ger.corp.intel.com>
 <20160728021345.GA3617@localhost.localdomain>
 <2601191342CEEE43887BDE71AB97725836B8AA48@irsmsx105.ger.corp.intel.com>
 <20160728113853.GA14755@localhost.localdomain>
Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
List-Id: patches and discussions about DPDK

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Thursday, July 28, 2016 12:39 PM
> To: Ananyev, Konstantin
> Cc: Thomas Monjalon; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
>
> On Thu, Jul 28, 2016 at 10:36:07AM +0000, Ananyev, Konstantin wrote:
> > > If it does not cope up then it can skip tx'ing in the actual tx
> > > burst itself and move the "skipped" tx packets to end of the list in
> > > the tx burst so that application can take the action on "skipped"
> > > packet after the tx burst
> >
> > Sorry, that's too cryptic for me.
> > Can you reword it somehow?
>
> OK.
> 1) let's say application requests 32 packets to send it using tx_burst.
> 2) packets are from p0 to p31
> 3) in driver due to some reason, it is not able to send the packets due to some constraints in the driver (say except p2 and p16 everything else sent successfully by the driver)
> 4) driver can move p2 and p16 at pkt[0] and pkt[1] on tx_burst and return 30
> 5) application can take action on p2 and p16 based on the return value of 30 (i.e. 32-30 = 2 packets need to be handled at pkt[0] and pkt[1])

That would introduce packet reordering and unnecessarily complicate the PMD TX functions.
Again, it would require changes in *all* existing PMD tx functions.
So we don't plan to do things that way.

>
>
> > > >
> > > >
> > > > Instead it just sets up the ol_flags, fills tx_offload fields and calls tx_prep().
> > > > Please read the original Tomasz's patch, I think he explained
> > > > possible use-cases with lot of details.
> > >
> > > Sorry, it is not very clear in terms of use cases.
> >
> > Ok, what I meant to say:
> > Right now, if user wants to use HW TX cksum/TSO offloads he might have to:
> > - set up ipv4 header cksum field.
> > - calculate the pseudo header checksum
> > - set up tcp/udp cksum field.
> >
> > Rules how these calculations need to be done and which fields need to
> > be updated, may vary depending on HW underneath and requested offloads.
> > tx_prep() - supposed to hide all these nuances from user and allow him
> > to use TX HW offloads in a transparent way.
>
> Not sure I understand it completely. Bit contradicting with below statement
> |We would document what tx_prep() supposed to do, and in what cases user
> |don't need it.

How does that contradict?
Right now, to make HW TX offloads work, the user is required to do particular actions:
1. set mbuf.ol_flags properly.
2. set up mbuf.tx_offload fields properly.
3. update L3/L4 header fields in a particular way.

We move #3 into tx_prep(), to hide that complexity from the user and simplify things for him.
Though if he still prefers to do #3 by himself - that's ok too.

>
> How about introducing a new ethdev generic eal command-line mode OR new ethdev_configure hint that PMD driver is in "tx_prep->tx_burst" mode instead of just tx_burst? That way no fast-path performance degradation for the PMD that does not need it

The user might want to send different packets over different devices,
or even over different queues of the same device.
Or he might even want to call tx_prep() for one group of packets,
but skip it for a different group of packets on the same TX queue.
So I think we should allow the user to decide when/where to call it.

>
>
> >
> > Another main purpose of tx_prep(): for multi-segment packets is to
> > check that number of segments doesn't exceed HW limit.
> > Again right now users have to do that on their own.
> >
> > >
> > > In HW perspective, it tries to avoid the illegal state.
> > > But not sure calling "back to back" tx prepare and then tx burst how does it
> > > improve the situation as the check illegal state check introduce in
> > > actual tx burst itself.
> > >
> > > In SW perspective, it tries to avoid sending malformed packets. In my
> > > view the same can be achieved with existing tx burst itself as PMD is
> > > the one finally send the packets on the wire.
> >
> > Ok, so your question is: why not to put that functionality into
> > tx_burst() itself, right?
> > For few reasons:
> > 1. putting that functionality into tx_burst() would introduce unnecessary
> >    slowdown for cases when that functionality is not needed
> >    (one segment per packet, no HW offloads).
>
> These parameters can be configured on init time

Not always, see above.

>
> > 2. User might not want to use tx_prep() - he/she might have his/her
> >    own analog, which he/she believes is faster/smarter, etc.
>
> That's the current mode. Right?

Yes.

> > 3. Having it as a separate function would allow user to control when/where
> >    to call it, let's say only for some packets, or probably call tx_prep()
> >    on one core, and do actual tx_burst() for these packets on the other.
> Why to process it under tx_prep() as application can always process the packet in one core

Because not every application wants to do it on the same core.
Some apps would like to do it on the same core, some apps would like to do it on a different core.
With the proposed API both models are possible.

>
> > >
> > > proposal quote:
> > >
> > > 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
> > >    packet burst to be safely transmitted on device for desired HW
> > >    offloads (set/reset checksum field according to the hardware
> > >    requirements) and check HW constraints (number of segments per
> > >    packet, etc).
> > >
> > >    While the limitations and requirements may differ for devices, it
> > >    requires to extend rte_eth_dev structure with new function pointer
> > >    "tx_pkt_prep" which can be implemented in the driver to prepare and
> > >    verify packets, in devices specific way, before burst, what should
> > >    prevent application to send malformed packets.
> > >
> > >
> > > >
> > > > > and what if the PMD does not implement that callback then it is of waste cycles. Right?
> > > >
> > > > If you refer as lost cycles here something like:
> > > > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP); then
> > > > yes.
> > > > Though comparing to actual work need to be done for most HW TX
> > > > offloads, I think it is negligible.
> > >
> > > Not sure.
> > >
> > > > Again, as I said before, it is totally voluntary for the application.
> > >
> > > Not according to proposal. It can't be too as application has no
> > > idea what PMD driver does with "prep" what is the implication on a
> > > HW if application does not
> >
> > Why application writer wouldn't have an idea?
> > We would document what tx_prep() supposed to do, and in what cases user don't need it.
>
> But how he/she detect that on that run-time ?

By the application logic, for example.
If, let's say, the application is doing l2fwd for that group of packets, it would know
that it doesn't need to do tx_prep().

To be honest, I don't understand what your concern is here.
The proposed change doesn't break any existing functionality,
doesn't introduce any new requirements to the existing API,
and wouldn't introduce any performance regression for existing apps.
It is an extension, and the user is free not to use it, if it doesn't fit his needs.
On the other side, there are users who are interested in that functionality,
and they do have use-cases for it.
So what worries you?
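
To make the "application decides" point a bit more concrete, here is a minimal
sketch. It does not use the real DPDK API: sketch_pkt, sketch_tx_prep(),
sketch_tx_burst(), sketch_send() and the limit of 8 segments are all made up
for illustration, and the choice to stop at the first packet that fails the
check (so that packet order is preserved) is an assumption, not something the
proposal states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are real DPDK definitions. */
#define SKETCH_TX_OFFLOAD_CKSUM (1u << 0) /* packet requests a HW checksum */
#define SKETCH_MAX_SEGS 8                  /* assumed HW segment limit */

struct sketch_pkt {
	uint32_t ol_flags; /* requested offloads */
	uint16_t nb_segs;  /* number of segments in the packet */
	int prepared;      /* set once the L3/L4 fields were fixed up */
};

/*
 * Stand-in for the proposed tx_prep(): checks the segment limit and does
 * the per-device header fix-up. Returns the number of leading packets that
 * passed; it stops at the first offending packet, so nothing is reordered.
 */
static uint16_t sketch_tx_prep(struct sketch_pkt *pkts, uint16_t n)
{
	uint16_t i;

	assert(pkts != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (pkts[i].nb_segs == 0 || pkts[i].nb_segs > SKETCH_MAX_SEGS)
			break;
		if (pkts[i].ol_flags & SKETCH_TX_OFFLOAD_CKSUM)
			pkts[i].prepared = 1;
	}
	return i;
}

/* Stand-in for tx_burst(): pretends every packet handed to it is sent. */
static uint16_t sketch_tx_burst(struct sketch_pkt *pkts, uint16_t n)
{
	(void)pkts;
	return n;
}

/*
 * The application knows what kind of traffic the burst carries, so it is
 * the one that decides whether the prep step is needed for this burst.
 */
static uint16_t sketch_send(struct sketch_pkt *pkts, uint16_t n, int needs_prep)
{
	uint16_t ok = n;

	if (needs_prep)
		ok = sketch_tx_prep(pkts, n);
	return sketch_tx_burst(pkts, ok);
}
```

An l2fwd-style path would call sketch_send(pkts, n, 0) and pay nothing extra,
while a path that requests offloads would pass 1 and handle the packets beyond
the returned count itself.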
Konstantin

>
> > Then it would be up to the user:
> > - not to use it at all (one segment per packet, no HW TX offloads)
>
> We already have TX flags for that
>
> > - not to use tx_prep(), and make necessary preparations himself,
> >   that what people have to do now.
> > - use tx_prep()
> >
> > Konstantin
> >
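
For the "make necessary preparations himself" option quoted above, one of the
steps named in the thread is calculating the pseudo header checksum. The
sketch below shows that step for IPv4 as a plain 16-bit one's-complement sum.
The function names are hypothetical (this is not the DPDK helper), all values
are taken in host byte order for simplicity, and whether a given device wants
this value as-is, or computed differently (for example when TSO is requested),
is exactly the kind of HW-specific rule the discussion is about.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t sketch_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * One's-complement sum (not inverted) over the IPv4 pseudo header:
 * source address, destination address, zero byte + protocol, L4 length.
 * All arguments and the result are in host byte order in this sketch.
 */
static uint16_t sketch_ipv4_phdr_sum(uint32_t src, uint32_t dst,
				     uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffffu;
	sum += dst >> 16;
	sum += dst & 0xffffu;
	sum += proto;  /* the zero byte contributes nothing */
	sum += l4_len;
	return sketch_fold16(sum);
}

/* Worked example: 192.168.0.1 -> 192.168.0.2, TCP (6), 20-byte L4 length. */
static void sketch_phdr_selfcheck(void)
{
	assert(sketch_ipv4_phdr_sum(0xC0A80001u, 0xC0A80002u, 6, 20) == 0x816E);
}
```

With something like tx_prep() in place, an application that does not want to
carry this kind of per-device knowledge could leave the step to the driver.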