From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 474105582 for ; Wed, 30 Nov 2016 11:54:55 +0100 (CET) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP; 30 Nov 2016 02:54:55 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.31,573,1473145200"; d="scan'208";a="1066095576" Received: from irsmsx153.ger.corp.intel.com ([163.33.192.75]) by orsmga001.jf.intel.com with ESMTP; 30 Nov 2016 02:54:51 -0800 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.43]) by IRSMSX153.ger.corp.intel.com ([169.254.9.203]) with mapi id 14.03.0248.002; Wed, 30 Nov 2016 10:54:51 +0000 From: "Ananyev, Konstantin" To: Adrien Mazarguil , Thomas Monjalon CC: "dev@dpdk.org" , Rahul Lakkireddy , Stephen Hurd , "Jan Medala" , Jakub Palider , John Daley , Alejandro Lucero , Harish Patil , Rasesh Mody , Jerin Jacob , Yuanhan Liu , Yong Wang , "Kulasek, TomaszX" , "olivier.matz@6wind.com" Thread-Topic: [dpdk-dev] [PATCH v12 0/6] add Tx preparation Thread-Index: AQHSRbBDKacvBhx2eUCvrHI7LRX1L6DuQp0AgALr7oCAACwIoA== Date: Wed, 30 Nov 2016 10:54:50 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772583F0E20C8@irsmsx105.ger.corp.intel.com> References: <1477486575-25148-1-git-send-email-tomaszx.kulasek@intel.com> <1479922585-8640-1-git-send-email-tomaszx.kulasek@intel.com> <8317180.L80Qf11uiu@xps13> <20161130074003.GD10340@6wind.com> In-Reply-To: <20161130074003.GD10340@6wind.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 30 Nov 2016 10:54:56 -0000 Hi Adrien, >=20 > On Mon, Nov 28, 2016 at 12:03:06PM +0100, Thomas Monjalon wrote: > > We need attention of every PMD developers on this thread. >=20 > I've been following this thread from the beginning while working on rte_f= low > and wanted to see where it was headed before replying. (I know, v11 was > submitted about 1 month ago but still.) >=20 > > Reminder of what Konstantin suggested: > > " > > - if the PMD supports TX offloads AND > > - if to be able use any of these offloads the upper layer SW would have= to: > > * modify the contents of the packet OR > > * obey HW specific restrictions > > then it is a PMD developer responsibility to provide tx_prep() that wou= ld implement > > expected modifications of the packet contents and restriction checks. > > Otherwise, tx_prep() implementation is not required and can be safely s= et to NULL. > > " > > > > I copy/paste also my previous conclusion: > > > > Before txprep, there is only one API: the application must prepare the > > packets checksum itself (get_psd_sum in testpmd). > > With txprep, the application have 2 choices: keep doing the job itself > > or call txprep which calls a PMD-specific function. >=20 > Something is definitely needed here, and only PMDs can provide it. I thin= k > applications should not have to clear checksum fields or initialize them = to > some magic value, same goes for any other offload or hardware limitation > that needs to be worked around. >=20 > tx_prep() is one possible answer to this issue, however as mentioned in t= he > original patch it can be very expensive if exposed by the PMD. >=20 > Another issue I'm more concerned about is the way limitations are managed > (struct rte_eth_desc_lim). While not officially tied to tx_prep(), this > structure contains new fields that are only relevant to a few devices, an= d I > fear it will keep growing with each new hardware quirk to manage, breakin= g > ABIs in the process. Well, if some new HW capability/limitation would arise and we'd like to sup= port it in DPDK, then yes we probably would need to think how to incorporate it = here. Do you have anything particular in mind here? >=20 > What are applications supposed to do, check each of them regardless befor= e > attempting to send a burst? >=20 > I understand tx_prep() automates this process, however I'm wondering why > isn't the TX burst function doing that itself. Using nb_mtu_seg_max as an > example, tx_prep() has an extra check in case of TSO that the TX burst > function does not perform. This ends up being much more expensive to > applications due to the additional loop doing redundant testing on each > mbuf. >=20 > If, say as a performance improvement, we decided to leave the validation > part to the TX burst function; what remains in tx_prep() is basically hea= vy > "preparation" requiring mbuf changes (i.e. erasing checksums, for now). >=20 > Following the same logic, why can't such a thing be made part of the TX > burst function as well (through a direct call to rte_phdr_cksum_fix() > whenever necessary). From an application standpoint, what are the advanta= ges > of having to: >=20 > if (tx_prep()) // iterate and update mbufs as needed > tx_burst(); // iterate and send >=20 > Compared to: >=20 > tx_burst(); // iterate, update as needed and send I think that was discussed extensively quite a lot previously here: As Thomas already replied - main motivation is to allow user to execute them on different stages of packet TX pipeline, and probably on different cores. I think that provides better flexibility to the user to when/where do these preparations and hopefully would lead to better performance. Though, if you or any other PMD developer/maintainer would prefer for particular PMD to combine both functionalities into tx_burst() and keep tx_prep() as NOP - this is still possible too. =20 >=20 > Note that PMDs could still provide different TX callbacks depending on th= e > set of enabled offloads so performance is not unnecessarily impacted. >=20 > In my opinion the second approach is both faster to applications and more > friendly from a usability perspective, am I missing something obvious? >=20 > > The question is: does non-Intel drivers need a checksum preparation for= TSO? > > Will it behave well if txprep does nothing in these drivers? > > > > When looking at the code, most of drivers handle the TSO flags. > > But it is hard to know whether they rely on the pseudo checksum or not. > > > > git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' driver= s/net/ > > > > drivers/net/bnxt/bnxt_txr.c > > drivers/net/cxgbe/sge.c > > drivers/net/e1000/em_rxtx.c > > drivers/net/e1000/igb_rxtx.c > > drivers/net/ena/ena_ethdev.c > > drivers/net/enic/enic_rxtx.c > > drivers/net/fm10k/fm10k_rxtx.c > > drivers/net/i40e/i40e_rxtx.c > > drivers/net/ixgbe/ixgbe_rxtx.c > > drivers/net/mlx4/mlx4.c > > drivers/net/mlx5/mlx5_rxtx.c > > drivers/net/nfp/nfp_net.c > > drivers/net/qede/qede_rxtx.c > > drivers/net/thunderx/nicvf_rxtx.c > > drivers/net/virtio/virtio_rxtx.c > > drivers/net/vmxnet3/vmxnet3_rxtx.c > > > > Please, we need a comment for each driver saying > > "it is OK, we do not need any checksum preparation for TSO" > > or > > "yes we have to implement tx_prepare or TSO will not work in this mode" >=20 > For both mlx4 and mlx5 then, > "it is OK, we do not need any checksum preparation for TSO". >=20 > Actually I do not think we'll ever need tx_prep() unless we add our own > quirks to struct rte_eth_desc_lim (and friends) which are currently quiet= ly > handled by TX burst functions. Ok, so MLX PMD is not affected by these changes and tx_prep for MLX can be = safely set to NULL, correct? Thanks Konstantin >=20 > -- > Adrien Mazarguil > 6WIND