From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f47.google.com (mail-wm0-f47.google.com [74.125.82.47]) by dpdk.org (Postfix) with ESMTP id CDEFF2B8C for ; Thu, 8 Dec 2016 18:24:19 +0100 (CET) Received: by mail-wm0-f47.google.com with SMTP id a197so227001949wmd.0 for ; Thu, 08 Dec 2016 09:24:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=st6JvRbEJo2QU56KBew29Z2BF872NwWQqN3Qd9OhOOk=; b=TOgXK0wV/xD3A5p7zjBRCAt0pajgytd+krGI63DaAzf2frBmvkyBiZQO4zzr+0zSEO 7/sbIC/ymRe6Nb6JyJ3BPVurii0VY8oplUgNDhaTAAzEQhNfg7J8eeF2LjJ0CjOTIxPy ADhXTtf8npihI37WdNdQMnTTfodGxortw7+80zW39ZZv+lVpwpsFsF0PvysLkn2GCU2u l4GWaV9T5GxOX4nB+idup2orr0ZTD1s3xSJ9rP9V+481KBbOPMH17S9BDGEewAslOjOu NJ8bEAyzNB9CM3ztCmIJPe1JTIHqN7kOzdoqPIfzo3CxhUp09HMwGWYu0tjdrmpX1qZJ UPYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=st6JvRbEJo2QU56KBew29Z2BF872NwWQqN3Qd9OhOOk=; b=b2MnYt7zDK1EWabFKoPwmxY1DD24VHf8W3r8LKjinwKw5n7NZiOteGIEbolZ5U7Q1Z k+K8XzpZNjBpFreX6Hg2zrIlTXpd9I8VqredIRLtiQuQW5c1ER5Ng2XcYpxzX5DCHtoe zSr5iyzhOXUq0Ho6A+N4JfuqflutmsjINfxcuFS38nStznS9gh2UmgeMYdNx9FEipI5X E4uLJs1qmlAA+JjDlbO097o2AoT7shK+NyOrXdkWp+0OLTxbwNmrrfAJKQ7qgJcWDOBe n3gMSTD66Yvif/2NTdPG+zzLFw+P16DJxwrwSPxmi4nqhqheqS9pHiq4Pq/vULFBy1ji 97Hw== X-Gm-Message-State: AKaTC00LO/wFE/ikxfWn+4zVNAUEeOtD9lBN5GmUUPTke3EawqlALWQkFYOGlxeaJp75M+dX X-Received: by 10.28.55.203 with SMTP id e194mr3101273wma.6.1481217859328; Thu, 08 Dec 2016 09:24:19 -0800 (PST) Received: from platinum (2a01cb0c03c651000226b0fffeed02fc.ipv6.abo.wanadoo.fr. [2a01:cb0c:3c6:5100:226:b0ff:feed:2fc]) by smtp.gmail.com with ESMTPSA id i15sm38050347wjs.16.2016.12.08.09.24.18 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 08 Dec 2016 09:24:19 -0800 (PST) Date: Thu, 8 Dec 2016 18:24:17 +0100 From: Olivier Matz To: "Ananyev, Konstantin" Cc: Thomas Monjalon , "Kulasek, TomaszX" , "dev@dpdk.org" Message-ID: <20161208182417.32ec3933@platinum> In-Reply-To: <2601191342CEEE43887BDE71AB9772583F0E3405@irsmsx105.ger.corp.intel.com> References: <1477486575-25148-1-git-send-email-tomaszx.kulasek@intel.com> <1479922585-8640-1-git-send-email-tomaszx.kulasek@intel.com> <1479922585-8640-2-git-send-email-tomaszx.kulasek@intel.com> <7834627.cBDVu3uoNi@xps13> <2601191342CEEE43887BDE71AB9772583F0E2AB0@irsmsx105.ger.corp.intel.com> <20161202092425.13752e2f@platinum> <2601191342CEEE43887BDE71AB9772583F0E3405@irsmsx105.ger.corp.intel.com> X-Mailer: Claws Mail 3.14.1 (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Dec 2016 17:24:20 -0000 Hi Konstantin, On Fri, 2 Dec 2016 16:17:51 +0000, "Ananyev, Konstantin" wrote: > Hi Olivier, > > > -----Original Message----- > > From: Olivier Matz [mailto:olivier.matz@6wind.com] > > Sent: Friday, December 2, 2016 8:24 AM > > To: Ananyev, Konstantin > > Cc: Thomas Monjalon ; Kulasek, TomaszX > > ; dev@dpdk.org Subject: Re: [dpdk-dev] > > [PATCH v12 1/6] ethdev: add Tx preparation > > > > Hi Konstantin, > > > > On Fri, 2 Dec 2016 01:06:30 +0000, "Ananyev, Konstantin" > > wrote: > > > > > > > > 2016-11-23 18:36, Tomasz Kulasek: > > > > > +/** > > > > > + * Process a burst of output packets on a transmit queue of > > > > > an Ethernet device. > > > > > + * > > > > > + * The rte_eth_tx_prepare() function is invoked to prepare > > > > > output packets to be > > > > > + * transmitted on the output queue *queue_id* of the Ethernet > > > > > device designated > > > > > + * by its *port_id*. > > > > > + * The *nb_pkts* parameter is the number of packets to be > > > > > prepared which are > > > > > + * supplied in the *tx_pkts* array of *rte_mbuf* structures, > > > > > each of them > > > > > + * allocated from a pool created with > > > > > rte_pktmbuf_pool_create(). > > > > > + * For each packet to send, the rte_eth_tx_prepare() function > > > > > performs > > > > > + * the following operations: > > > > > + * > > > > > + * - Check if packet meets devices requirements for tx > > > > > offloads. > > > > > + * > > > > > + * - Check limitations about number of segments. > > > > > + * > > > > > + * - Check additional requirements when debug is enabled. > > > > > + * > > > > > + * - Update and/or reset required checksums when tx offload > > > > > is set for packet. > > > > > + * > > > > > + * Since this function can modify packet data, provided mbufs > > > > > must be safely > > > > > + * writable (e.g. modified data cannot be in shared > > > > > segment). > > > > > > > > I think we will have to remove this limitation in next releases. > > > > As we don't know how it could affect the API, I suggest to > > > > declare this API EXPERIMENTAL. > > > > > > While I don't really mind to mart it as experimental, I don't > > > really understand the reasoning: Why " this function can modify > > > packet data, provided mbufs must be safely writable" suddenly > > > becomes a problem? That seems like and obvious limitation to me > > > and let say tx_burst() has the same one. Second, I don't see how > > > you are going to remove it without introducing a heavy > > > performance impact. Konstantin > > > > About tx_burst(), I don't think we should force the user to provide > > a writable mbuf. There are many use cases where passing a clone > > already works as of today and it avoids duplicating the mbuf data. > > For instance: traffic generator, multicast, bridging/tap, etc... > > > > Moreover, this requirement would be inconsistent with the model you > > are proposing in case of pipeline: > > - tx_prepare() on core X, may update the data > > - tx_burst() on core Y, should not touch the data to avoid cache > > misses > > Probably I wasn't very clear in my previous mail. > I am not saying that we should force the user to pass a writable mbuf. > What I am saying that for tx_burst() current expectation is that > after mbuf is handled to tx_burst() user shouldn't try to modify its > buffer contents till TX engine is done with the buffer (mbuf_free() > is called by TX func for it). For tx_prep(), I think, it is the same > though restrictions are a bit more strict: user should not try to > read/write to the mbuf while tx_prep() is not finished with it. What > puzzles me is that why that should be the reason to mark tx_prep() as > experimental. Konstantin To be sure we're on the same page, let me reword: - mbufs passed to tx_prepare() by the application must have their headers (l2_len + l3_len + l4_len) writable because the phdr checksum can be replaced. It could be precised in the api comment. - mbufs passed to tx_burst() must not be modified by the driver/hw, nor by the application. About the API itself, I have one more question. I know you've already discussed this a bit with Adrien, I don't want to spawn a new big thread from here ;) The API provides tx_prepare() to check the packets have the proper format, and possibly modify them (ex: csum offload / tso) to match hw requirements. So it does both checks (number of segments) and fixes (csum/tso). What determines things that should be checked and things that should be fixed? The application gets few information from tx_prepare() about what should be done to make the packet accepted by the hw, and the actions will probably be different depending on hardware. So could we imagine that in the future the function also tries to fix the packet? I've seen your comment saying that it has to be an application decision, so what about having a parameter saying "fix the packet" or "don't fix it"? About rte_eth_desc_lim->nb_seg_max and rte_eth_desc_lim->nb_mtu_seg_max, I'm still quite reserved, especially for the 2nd one, because I don't see how it can be used by the application. Well, it does not hurt to have them, but for me it looks a bit useless. Last thing, I think this API should become the default in the future. For instance, it would prevent the application to calculate a phdr csum that will not be used by the hw. Not calling tx_prepare() would require the user/application to exactly knows the underlying hw and the kind of packet that are generated. So for me it means we'll need to also update other examples (other testpmd engines, l2fwd, ...). Do you agree? Regards, Olivier