From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id 1FA712C60 for ; Fri, 9 Dec 2016 18:19:34 +0100 (CET) Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP; 09 Dec 2016 09:19:20 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,324,1477983600"; d="scan'208";a="1079788608" Received: from irsmsx110.ger.corp.intel.com ([163.33.3.25]) by fmsmga001.fm.intel.com with ESMTP; 09 Dec 2016 09:19:19 -0800 Received: from irsmsx102.ger.corp.intel.com ([169.254.2.79]) by irsmsx110.ger.corp.intel.com ([163.33.3.25]) with mapi id 14.03.0248.002; Fri, 9 Dec 2016 17:19:18 +0000 From: "Kulasek, TomaszX" To: Olivier Matz , "Ananyev, Konstantin" CC: Thomas Monjalon , "dev@dpdk.org" Thread-Topic: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation Thread-Index: AQHSRbBkXBK52xh5bEGv2Yak9bo9p6DzVKAAgACQoACAAHpagIAAhEeAgAmAjICAAWYaMA== Date: Fri, 9 Dec 2016 17:19:18 +0000 Message-ID: <3042915272161B4EB253DA4D77EB373A14F5A409@IRSMSX102.ger.corp.intel.com> References: <1477486575-25148-1-git-send-email-tomaszx.kulasek@intel.com> <1479922585-8640-1-git-send-email-tomaszx.kulasek@intel.com> <1479922585-8640-2-git-send-email-tomaszx.kulasek@intel.com> <7834627.cBDVu3uoNi@xps13> <2601191342CEEE43887BDE71AB9772583F0E2AB0@irsmsx105.ger.corp.intel.com> <20161202092425.13752e2f@platinum> <2601191342CEEE43887BDE71AB9772583F0E3405@irsmsx105.ger.corp.intel.com> <20161208182417.32ec3933@platinum> In-Reply-To: <20161208182417.32ec3933@platinum> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Dec 2016 17:19:35 -0000 Hi Oliver, My 5 cents below: > -----Original Message----- > From: Olivier Matz [mailto:olivier.matz@6wind.com] > Sent: Thursday, December 8, 2016 18:24 > To: Ananyev, Konstantin > Cc: Thomas Monjalon ; Kulasek, TomaszX > ; dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation >=20 > Hi Konstantin, >=20 > On Fri, 2 Dec 2016 16:17:51 +0000, "Ananyev, Konstantin" > wrote: > > Hi Olivier, > > > > > -----Original Message----- > > > From: Olivier Matz [mailto:olivier.matz@6wind.com] > > > Sent: Friday, December 2, 2016 8:24 AM > > > To: Ananyev, Konstantin > > > Cc: Thomas Monjalon ; Kulasek, TomaszX > > > ; dev@dpdk.org Subject: Re: [dpdk-dev] > > > [PATCH v12 1/6] ethdev: add Tx preparation > > > > > > Hi Konstantin, > > > > > > On Fri, 2 Dec 2016 01:06:30 +0000, "Ananyev, Konstantin" > > > wrote: > > > > > > > > > > 2016-11-23 18:36, Tomasz Kulasek: > > > > > > +/** > > > > > > + * Process a burst of output packets on a transmit queue of > > > > > > an Ethernet device. > > > > > > + * > > > > > > + * The rte_eth_tx_prepare() function is invoked to prepare > > > > > > output packets to be > > > > > > + * transmitted on the output queue *queue_id* of the Ethernet > > > > > > device designated > > > > > > + * by its *port_id*. > > > > > > + * The *nb_pkts* parameter is the number of packets to be > > > > > > prepared which are > > > > > > + * supplied in the *tx_pkts* array of *rte_mbuf* structures, > > > > > > each of them > > > > > > + * allocated from a pool created with > > > > > > rte_pktmbuf_pool_create(). > > > > > > + * For each packet to send, the rte_eth_tx_prepare() function > > > > > > performs > > > > > > + * the following operations: > > > > > > + * > > > > > > + * - Check if packet meets devices requirements for tx > > > > > > offloads. > > > > > > + * > > > > > > + * - Check limitations about number of segments. > > > > > > + * > > > > > > + * - Check additional requirements when debug is enabled. > > > > > > + * > > > > > > + * - Update and/or reset required checksums when tx offload > > > > > > is set for packet. > > > > > > + * > > > > > > + * Since this function can modify packet data, provided mbufs > > > > > > must be safely > > > > > > + * writable (e.g. modified data cannot be in shared > > > > > > segment). > > > > > > > > > > I think we will have to remove this limitation in next releases. > > > > > As we don't know how it could affect the API, I suggest to > > > > > declare this API EXPERIMENTAL. > > > > > > > > While I don't really mind to mart it as experimental, I don't > > > > really understand the reasoning: Why " this function can modify > > > > packet data, provided mbufs must be safely writable" suddenly > > > > becomes a problem? That seems like and obvious limitation to me > > > > and let say tx_burst() has the same one. Second, I don't see how > > > > you are going to remove it without introducing a heavy performance > > > > impact. Konstantin > > > > > > About tx_burst(), I don't think we should force the user to provide > > > a writable mbuf. There are many use cases where passing a clone > > > already works as of today and it avoids duplicating the mbuf data. > > > For instance: traffic generator, multicast, bridging/tap, etc... > > > > > > Moreover, this requirement would be inconsistent with the model you > > > are proposing in case of pipeline: > > > - tx_prepare() on core X, may update the data > > > - tx_burst() on core Y, should not touch the data to avoid cache > > > misses > > > > Probably I wasn't very clear in my previous mail. > > I am not saying that we should force the user to pass a writable mbuf. > > What I am saying that for tx_burst() current expectation is that after > > mbuf is handled to tx_burst() user shouldn't try to modify its buffer > > contents till TX engine is done with the buffer (mbuf_free() is called > > by TX func for it). For tx_prep(), I think, it is the same though > > restrictions are a bit more strict: user should not try to read/write > > to the mbuf while tx_prep() is not finished with it. What puzzles me > > is that why that should be the reason to mark tx_prep() as > > experimental. Konstantin >=20 > To be sure we're on the same page, let me reword: >=20 > - mbufs passed to tx_prepare() by the application must have their > headers (l2_len + l3_len + l4_len) writable because the phdr checksum > can be replaced. It could be precised in the api comment. >=20 > - mbufs passed to tx_burst() must not be modified by the driver/hw, nor > by the application. >=20 >=20 > About the API itself, I have one more question. I know you've already > discussed this a bit with Adrien, I don't want to spawn a new big thread > from here ;) >=20 > The API provides tx_prepare() to check the packets have the proper format= , > and possibly modify them (ex: csum offload / tso) to match hw > requirements. So it does both checks (number of segments) and fixes > (csum/tso). What determines things that should be checked and things that > should be fixed? >=20 1) Performance -- we may assume that packets are created with the common ru= les (e.g. doesn't tries to count IP checksum for IPv6 packet, sets all requ= ired fields, etc.). For now, additional checks are done only in DEBUG mode. 2) Uncommon requirements, such a number of segments for TSO/non-TSO, where = invalid values can cause unexpected results (or even hang device on burst),= so it is critical (at least for ixgbe and i40e) and must be checked. 3) Checksum field must be initialized in a hardware specific way to count i= t properly. 4) If packet is invalid its content isn't modified nor restored. > The application gets few information from tx_prepare() about what should > be done to make the packet accepted by the hw, and the actions will > probably be different depending on hardware. So could we imagine that in > the future the function also tries to fix the packet? I've seen your > comment saying that it has to be an application decision, so what about > having a parameter saying "fix the packet" or "don't fix it"? >=20 The main question here is how invasive in packet data tx_prepare should be. If I understand you correctly, you're talking about the situation when tx_p= repare will try to restore packet internally, e.g. linearize data for i40e = to meet number of segments requirements, split mbufs, when are too large, m= aybe someone will wanted to have full software checksum computation fallbac= k for devices which doesn't support it, and so on? And then this parameter = will indicate "do the required work for me", or "just check, initialize wha= t should be initialized, if fails, let me decide"? Implemented model is quite simple and clear: 1) Validate uncommon requirements which may cause a problem and let applica= tion to deal with it. 2) If packet is valid, initialize required fields according to the hardware= requirements. In fact, it doesn't fix anything for now, but initializes checksums, and th= e main reason why it is done here, is the fact that it cannot be done in a = trivial way in the application itself. IMHO, we should to keep this API as simple as it possible and not let it gr= ow in unexpected way (also for performance reason). > About rte_eth_desc_lim->nb_seg_max and > rte_eth_desc_lim->nb_mtu_seg_max, I'm still quite reserved, especially fo= r > the 2nd one, because I don't see how it can be used by the application. > Well, it does not hurt to have them, but for me it looks a bit useless. >=20 * "nb_seg_max": Maximum number of segments per whole packet. * "nb_mtu_seg_max": Maximum number of segments per one MTU. Application can use provided values in a fallowed way: * For non-TSO packet, a single transmit packet may span up to "nb_mtu_seg_m= ax" buffers. * For TSO packet the total number of data descriptors is "nb_seg_max", and = each segment within the TSO may span up to "nb_mtu_seg_max". > Last thing, I think this API should become the default in the future. > For instance, it would prevent the application to calculate a phdr csum > that will not be used by the hw. Not calling tx_prepare() would require > the user/application to exactly knows the underlying hw and the kind of > packet that are generated. So for me it means we'll need to also update > other examples (other testpmd engines, l2fwd, ...). Do you agree? >=20 Most of sample applications doesn't use TX offloads at all and doesn't coun= t checksums. So the question is "should tx_prepare be used even if not requ= ired?" >=20 > Regards, > Olivier >=20 Tomasz