From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wj0-f176.google.com (mail-wj0-f176.google.com [209.85.210.176]) by dpdk.org (Postfix) with ESMTP id 5C4362C6D for ; Mon, 5 Dec 2016 16:03:37 +0100 (CET) Received: by mail-wj0-f176.google.com with SMTP id xy5so293023534wjc.0 for ; Mon, 05 Dec 2016 07:03:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=tGDIjVeQWEIx9jPwrFijp4nGDXBaY6fOa8bJHU9hk5g=; b=fGy7MkZq2CvUv7C12bnEBKaYFcOTlZKUoBg/vjHez5xSd1u+XbHrNxNT6ANL2AS2sX ukU9ZWXltvTYGQc43+PQKe3cT0PLlywwkDwAdiOepNW+sHcfU1Q2OEAIT/FyyfxuM2uR LiqdoKf9bg4xpFFfzv+fFSfhZ68S35NTZB05mx1qqL36gGblP2vhhy9IGbzt5HrO90hn 4PJ9ouKNKCUV9bVFzMVXd6SEvNd9RZrY2S8uibOwLIauX5ZriKt5PVh+m8oMPe/BnytD LBPUuBh0DPkc2mdWLw5QGIFrBar6iHrH/6L+yEl5gk3Zzi5s+0KT9YQP2U0bXyU5Ncgr GYug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=tGDIjVeQWEIx9jPwrFijp4nGDXBaY6fOa8bJHU9hk5g=; b=PnfvIAsmW5F1OfsdB/1ADweOUofAGc3uastkcKR6AOUvaM5MAdGyc6RqoDVeIVpYVI VmBU7h1zIRP4Q5kajIVY2rJJlnPJRf2yit+W7og9YQQAHTUXEwi+DA/ooRLzusS4yfLM AgOqS9QzZJx0hqVpYFuUY2JnzvoZ2n4fr/3OBghqQLjRWeemH4usz3kIuyf6HjNGfpeF A7sqefJ5kw1KP1Eo6M9/ZZkNqSC6MfHzFpzlKcHOhmqwT/w59v6MMmHtag6w+gr+nXUn K6wTSkG9acVmgwROzFPBsvIMv0ALGeYQcdmCscN0XWk4FeB1fjFnv+zPNnllJj7IBsP5 7XGQ== X-Gm-Message-State: AKaTC02Tcl0KHzdspnwuHwRGthgkssgid95PXrWsmebx9kCfafuBYaOftibYdwFdX1UcukMZ X-Received: by 10.194.54.99 with SMTP id i3mr50466214wjp.86.1480950216671; Mon, 05 Dec 2016 07:03:36 -0800 (PST) Received: from 6wind.com (guy78-3-82-239-227-177.fbx.proxad.net. [82.239.227.177]) by smtp.gmail.com with ESMTPSA id 204sm562306wmj.7.2016.12.05.07.03.34 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 05 Dec 2016 07:03:35 -0800 (PST) Date: Mon, 5 Dec 2016 16:03:27 +0100 From: Adrien Mazarguil To: "Ananyev, Konstantin" Cc: Thomas Monjalon , "dev@dpdk.org" , Rahul Lakkireddy , Stephen Hurd , Jan Medala , Jakub Palider , John Daley , Alejandro Lucero , Harish Patil , Rasesh Mody , Jerin Jacob , Yuanhan Liu , Yong Wang , "Kulasek, TomaszX" , "olivier.matz@6wind.com" Message-ID: <20161205150327.GP10340@6wind.com> References: <1477486575-25148-1-git-send-email-tomaszx.kulasek@intel.com> <1479922585-8640-1-git-send-email-tomaszx.kulasek@intel.com> <8317180.L80Qf11uiu@xps13> <20161130074003.GD10340@6wind.com> <2601191342CEEE43887BDE71AB9772583F0E20C8@irsmsx105.ger.corp.intel.com> <20161201071518.GG10340@6wind.com> <2601191342CEEE43887BDE71AB9772583F0E2A9A@irsmsx105.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB9772583F0E2A9A@irsmsx105.ger.corp.intel.com> Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Dec 2016 15:03:37 -0000 Hi Konstantin, On Fri, Dec 02, 2016 at 01:00:55AM +0000, Ananyev, Konstantin wrote: [...] > > On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote: > > [...] > > > > Something is definitely needed here, and only PMDs can provide it. I think > > > > applications should not have to clear checksum fields or initialize them to > > > > some magic value, same goes for any other offload or hardware limitation > > > > that needs to be worked around. > > > > > > > > tx_prep() is one possible answer to this issue, however as mentioned in the > > > > original patch it can be very expensive if exposed by the PMD. > > > > > > > > Another issue I'm more concerned about is the way limitations are managed > > > > (struct rte_eth_desc_lim). While not officially tied to tx_prep(), this > > > > structure contains new fields that are only relevant to a few devices, and I > > > > fear it will keep growing with each new hardware quirk to manage, breaking > > > > ABIs in the process. > > > > > > Well, if some new HW capability/limitation would arise and we'd like to support > > > it in DPDK, then yes we probably would need to think how to incorporate it here. > > > Do you have anything particular in mind here? > > > > Nothing in particular, so for the sake of the argument, let's suppose that I > > would like to add a field to expose some limitation that only applies to my > > PMD during TX but looks generic enough to make sense, e.g. maximum packet > > size when VLAN tagging is requested. > > Hmm, I didn't hear about such limitations so far, but if it is real case - > sure, feel free to submit the patch. I won't, that was hypothetical. > > PMDs are free to set that field to some > > special value (say, 0) if they do not care. > > > > Since that field exists however, conscious applications should check its > > value for each packet that needs to be transmitted. This extra code causes a > > slowdown just by sitting in the data path. Since it is not the only field in > > that structure, the performance impact can be significant. > > > > Even though this code is inside applications, it remains unfair to PMDs for > > which these tests are irrelevant. This problem is identified and addressed > > by tx_prepare(). > > I suppose the question is why do we need: > uint16_t nb_seg_max; > uint16_t nb_mtu_seg_max; > as we now have tx_prepare(), right? > > For two reasons: > 1. Some people might feel that tx_prepare() is not good (smart/fast) enough > for them and would prefer to do necessary preparations for TX offloads themselves. > > 2. Even if people do use tx_prepare() they still should take this information into accout. > As an example ixgbe can't TX packets with then 40 segments. > Obviously ixbge_tx_prep() performs that check and returns an error. Problem is that tx_prepare() also provides safeties which are not part of tx_burst(), such as not going over nb_mtu_seg_max. Because of this and the fact struct rte_eth_desc_lim can grow new fields anytime, application developers will be tempted to just call tx_prepare() and focus on more useful things. Put another way, from a user's point of view, tx_prepare() is an opaque function that greatly increases tx_burst()'s ability to send mbufs as requested, with extra error checking on top; applications not written to run on a specific PMD/device (all of them ideally) will thus call tx_prepare() at some point. > But it wouldn't try to merge/reallocate mbufs for you. > User still has to do it himself, or just prevent creating such long chains somehow. Yes, that's another debate. PMDs could still implement a software fallback for unlikely slow events like these. The number of PMDs is not going to decrease, each device having its own set of weird limitations in specific cases, PMDs should do their best to process mbufs even if that means slowly due to the lack of preparation. tx_prepare() has its uses but should really be optional, in the sense that if that function is not called, tx_burst() should deal with it somehow. > > Thanks to tx_prepare(), these checks are moved back into PMDs where they > > belong. PMDs that do not need them do not have to provide support for > > tx_prepare() and do not suffer any performance impact as result; > > applications only have to make sure tx_prepare() is always called at some > > point before tx_burst(). > > > > Once you reach this stage, you've effectively made tx_prepare() mandatory > > before tx_burst(). If some bug occurs, then perhaps you forgot to call > > tx_prepare(), you just need to add it. The total cost for doing TX is > > therefore tx_prepare() + tx_burst(). > > > > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will > > remain optional for long. Sure, PMDs that do not implement it do not care, > > I'm focusing on applications, for which the performance impact of calling > > tx_prepare() followed by tx_burst() is higher than a single tx_burst() > > performing all the necessary preparation at once. > > > > [...] > > > > Following the same logic, why can't such a thing be made part of the TX > > > > burst function as well (through a direct call to rte_phdr_cksum_fix() > > > > whenever necessary). From an application standpoint, what are the advantages > > > > of having to: > > > > > > > > if (tx_prep()) // iterate and update mbufs as needed > > > > tx_burst(); // iterate and send > > > > > > > > Compared to: > > > > > > > > tx_burst(); // iterate, update as needed and send > > > > > > I think that was discussed extensively quite a lot previously here: > > > As Thomas already replied - main motivation is to allow user > > > to execute them on different stages of packet TX pipeline, > > > and probably on different cores. > > > I think that provides better flexibility to the user to when/where > > > do these preparations and hopefully would lead to better performance. > > > > And I agree, I think this use case is valid but does not warrant such a high > > penalty when your application does not need that much flexibility. Simple > > (yet conscious) applications need the highest performance. Complex ones as > > you described already suffer quite a bit from IPCs and won't mind a couple > > of extra CPU cycles right? > > It would mean an extra cache-miss for every packet, so I think performance hit > would be quite significant. A performance hit has to occur somewhere regardless, because something has to be done in order to send packets that need it. Whether this cost is in application code or in a PMD function, it remains part of TX. > About the 'simple' case when tx_prep() and tx_burst() are called on the same core, > Why do you believe that: > tx_prep(); tx_burst(); would be much slower than tx_burst() {tx_prep(), ...}? I mean instead of two function calls with their own loops: tx_prepare() { foreach (pkt) { check(); extra_check(); ... } } tx_burst() { foreach (pkt) { check(); stuff(); ... } } You end up with one: tx_burst() { foreach (pkt) { check(); extra_check(); stuff(); ... } } Which usually is more efficient. > tx_prep() itself is quite expensive, let say for Intel HW it includes: > - read mbuf fileds (2 cache-lines), > - read packet header (1/2 cache-lines) > - calculate pseudo-header csum > - update packet header > Comparing to that price of extra function call seems neglectable > (if we TX packets in bursts of course). We agree its performance is a critical issue then, sharing half the read steps with tx_burst() would make sense to me. > > Yes they will, therefore we need a method that satisfies both cases. > > > > As a possible solution, a special mbuf flag could be added to each mbuf > > having gone through tx_prepare(). That way, tx_burst() could skip some > > checks and things it would otherwise have done. > > That's an interesting idea, but it has one drawback: > As I understand, it means that from now on if user doing preparations on his own, > he had to setup this flag, otherwise tx_burst() would do extra unnecessary work. > So any existing applications that using TX offloads and do preparation by themselves > would have to be modified to avoid performance loss. In my opinion, users should not do preparation on their own. If we provide a generic method, it has to be fast enough to replace theirs. Perhaps not as fast since it would work with all PMDs (usual trade-off), but acceptably so. > > Another possibility, telling the PMD first that you always intend to use > > tx_prepare() and getting a simpler/faster tx_burst() callback as a result. > > That what we have right now (at least for Intel HW): > it is a user responsibility to do the necessary preparations/checks before calling tx_burst(). > With tx_prepare() we just remove from user the headache to implement tx_prepare() on his own. > Now he can use a 'proper' PMD provided function. > > My vote still would be for that model. OK, then in a nutshell: 1. Users are not expected to perform preparation/checks themselves anymore, if they do, it's their problem. 2. If configured through an API to be defined, tx_burst() can be split in two and applications must call tx_prepare() at some point before tx_burst(). 3. Otherwise tx_burst() should perform the necessary preparation and checks on its own by default (when tx_prepare() is not expected). 4. We probably still need some mbuf flag to mark mbufs that cannot be modified, the refcount could also serve as a hint. Anything else? -- Adrien Mazarguil 6WIND