From: "Ananyev, Konstantin"
To: Stephen Hemminger, "Kulasek, TomaszX"
CC: "dev@dpdk.org"
Date: Wed, 31 Aug 2016 12:34:56 +0000
Subject: Re: [dpdk-dev] [PATCH 0/6] add Tx preparation

>
> On Fri, 26 Aug 2016 18:22:52 +0200
> Tomasz Kulasek wrote:
>
> > As discussed in that thread:
> >
> > http://dpdk.org/ml/archives/dev/2015-September/023603.html
> >
> > Different NIC models depending on HW offload requested might impose
> > different requirements
> > on packets to be TX-ed in terms of:
> >
> > - Max number of fragments per packet allowed
> > - Max number of fragments per TSO segments
> > - The way pseudo-header checksum should be pre-calculated
> > - L3/L4 header fields filling
> > - etc.
> >
> >
> > MOTIVATION:
> > -----------
> >
> > 1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
> >    However, this work is sometimes required, and now, it's an
> >    application issue.
>
> Why not? You are adding an additional API burden on every application.
>
> >
> > 2) Different hardware may have different requirements for TX offloads,
> >    other subset can be supported and so on.
>
> These need to be reported by API so that application can handle it.

If you read the patch description, you'll see that we do both:
- provide tx_prep()
- "2) Also new fields will be introduced in rte_eth_desc_lim:
  nb_seg_max and nb_mtu_seg_max, providing an information about max
  segments in TSO and non-TSO packets acceptable by device.
  This information is useful for application to not create/limit
  malicious packet."

> Doing these transformations in tx_prep seems late in the process.

Why is that?
It is totally up to the application to decide at what stage it wants to
call tx_prep() for each packet: just after it has formed an mbuf to be
TX-ed, or just before calling tx_burst() for it, or somewhere in between.

>
> >
> > 3) Some parameters (e.g. number of segments in ixgbe driver) may hung
> >    device. These parameters may be vary for different devices.
> >
> >    For example i40e HW allows 8 fragments per packet, but that is after
> >    TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.
>
> Seems better to handle these limits as exceptions in i40e_tx_burst etc;
> rather than a pre-step. Look at how Linux driver API works, several
> drivers have to have an exception linearize path.

Hmm, doesn't that contradict your statement above:
'Doing these transformations in tx_prep seems late in the process.'?
:)
I suppose we all know that the Linux kernel driver and DPDK PMD usage
models are quite different.
As a rule of thumb we try to avoid modifying packet data inside
tx_burst() itself.
Having this functionality in a separate function gives the upper layer a
choice of when it is better to modify packet contents, and hopefully
hide/minimize memory access latencies.

>
> >
> > 4) Fields in packet may require different initialization (like e.g. will
> >    require pseudo-header checksum precalculation, sometimes in a
> >    different way depending on packet type, and so on). Now application
> >    needs to care about it.
>
> Once again, the driver should do this in Tx.

Once again, I really doubt it should.

>
>
> >
> > 5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
> >    prepare packet burst in acceptable form for specific device.
> >
> > 6) Some additional checks may be done in debug mode keeping tx_burst
> >    implementation clean.
>
> Most of this could be done by refactoring existing tx_burst in drivers.
> Much of the code seems to be written as the "let's write a 2000 line
> function because that is most efficient" rather than "let's write small
> steps and let the compiler optimize it"

I don't see how that could be easily done inside tx_burst() without
significant performance loss.
Especially if we have a pipeline model, where one or several lcores
produce mbufs to be TX-ed, and one or several lcores do the actual TX
for these packets.

Konstantin
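
The split argued for above (a preparation step that may touch packet
data, kept separate from a transmit step that does not) can be
illustrated with a minimal, self-contained sketch. This is not DPDK
code and not the patch's implementation: struct pkt, DEV_NB_SEG_MAX,
sketch_tx_prep() and sketch_tx_burst() are hypothetical stand-ins, and
the limit of 8 segments is only borrowed from the i40e example quoted
in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf: only the fields this sketch needs. */
struct pkt {
    uint16_t nb_segs;     /* number of buffer segments in the packet */
    int      cksum_ready; /* set once header/checksum fix-ups are done */
};

/* Hypothetical device limit, playing the role of nb_seg_max. */
#define DEV_NB_SEG_MAX 8

/*
 * Preparation stage (the role tx_prep() plays in the discussion):
 * check each packet against the device limit and do the fix-ups that
 * modify packet data. Returns the number of leading packets that are
 * ready; stops at the first packet the device could not accept.
 */
uint16_t sketch_tx_prep(struct pkt **pkts, uint16_t nb_pkts)
{
    uint16_t i;

    assert(pkts != NULL);
    for (i = 0; i < nb_pkts; i++) {
        if (pkts[i]->nb_segs == 0 || pkts[i]->nb_segs > DEV_NB_SEG_MAX)
            break;
        pkts[i]->cksum_ready = 1; /* placeholder for the real fix-up */
    }
    return i;
}

/*
 * Transmit stage: does not modify packet data. It only counts the
 * leading packets that were already prepared.
 */
uint16_t sketch_tx_burst(struct pkt **pkts, uint16_t nb_pkts)
{
    uint16_t i, sent = 0;

    assert(pkts != NULL);
    for (i = 0; i < nb_pkts; i++) {
        if (!pkts[i]->cksum_ready)
            break;
        sent++;
    }
    return sent;
}
```

In a pipeline model, the lcore that builds the packets could call the
preparation function right after forming them and hand only the
accepted prefix to the lcore that calls the transmit function; the
rejected packet is left for the application to drop or rebuild.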