Date: Fri, 15 Jul 2016 17:04:02 +0200
From: Adrien Mazarguil
To: "Chandran, Sugesh"
Cc: "dev@dpdk.org", Thomas Monjalon, "Zhang, Helin", "Wu, Jingjing",
 Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, "Lu, Wenzhuo", Jan Medala,
 John Daley, "Chen, Jing D", "Ananyev, Konstantin", Matej Vido,
 Alejandro Lucero, Sony Chacko, Jerin Jacob, "De Lara Guarch, Pablo",
 Olga Shern, "Chilikin, Andrey"
Message-ID: <20160715150402.GE7621@6wind.com>
In-Reply-To: <2EF2F5C0CC56984AA024D0B180335FCB13DEE55F@IRSMSX102.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API

On Fri, Jul 15, 2016 at 09:23:26AM +0000, Chandran, Sugesh wrote:
> Thank you Adrien,
> Please find below some more comments/inputs.
>
> Let me know your thoughts on this.

Thanks, stripping non-relevant parts again.

[...]

> > > > > [Sugesh] Is it a limitation to use only a 32 bit ID? Is it
> > > > > possible to have a 64 bit ID, so that the application can use the
> > > > > control plane flow pointer itself as an ID? Does it make sense?
> > > >
> > > > I've specified a 32 bit ID for now because this is what FDIR
> > > > supports and also what existing devices can report today AFAIK
> > > > (i40e and mlx5).
> > > >
> > > > We could use 64 bit for future-proofing in a separate action like
> > > > "ID64" when at least one device supports it.
> > > >
> > > > To PMD maintainers: please comment if you know devices that support
> > > > tagging matching packets with more than 32 bits of user-provided
> > > > data!
> > > [Sugesh] I guess the flow director ID is 64 bit; the XL710 datasheet
> > > says so. And in the 'rte_mbuf' structure the 64 bit FDIR ID is shared
> > > with the RSS hash. This can be a software driver limitation that
> > > exposes only 32 bits, possibly because of cache alignment issues.
> > > Since the hardware can support 64 bit, I feel it makes sense to
> > > support 64 bit as well.
> >
> > I agree we need 64 bit support, but then we also need a solution for
> > devices that support only 32 bit. Possible methods I can think of:
> >
> > - A separate "ID64" action (or an "ID32" one, perhaps with a better
> >   name).
> >
> > - A single ID action with an unlimited number of bytes to return with
> >   packets (would actually be a string).
> >   PMDs can then refuse to create flow rules requesting an unsupported
> >   number of bytes. Devices supporting fewer than 32 bits are also
> >   included this way without the need for yet another action.
> >
> > Thoughts?
> [Sugesh] I feel the single ID approach is much better, but I would say a
> fixed-size ID is easier to handle at the upper layers. Say the PMD
> returns a 64 bit ID in which the MSBs are masked out, based on how many
> bits the hardware can support. The PMD can refuse an unsupported number
> of bytes when requested. So the size of the ID is going to be a parameter
> used to program the flow.
> What do you think?

What you suggest, if I am not mistaken, is:

 struct rte_flow_action_id {
         uint64_t id;
         uint64_t mask; /* either a bit-mask or a prefix/suffix length? */
 };

I think in this case a mask is more versatile than a prefix/suffix length,
as the value itself comes in an unknown endianness (from the PMD's point of
view). It also allows specific bits to be taken into account: when HW only
supports 32 bits, with some black magic the full original 64 bit value can
be restored as long as the application only cares about at most 32 bits
anywhere.

However I do not think many applications "won't care" about specific bits
in a given value, and having to provide a properly crafted mask will be a
hassle; they will just fill it with ones and hope for the best. As a result
they won't take advantage of this feature, or will stick to 32 bits all the
time, or whatever happens to be the least common denominator.

My previous suggestion was:

 struct rte_flow_action_id {
         uint8_t size; /* number of bytes in id[] */
         uint8_t id[];
 };

It does not solve the issue if an application requests more bytes than
supported; however, as a string, there is no endianness ambiguity and these
bytes are copied as-is to the related mbuf field as if done through
memcpy(), possibly with some padding to fill the entire 64 bit field
(copied bytes thus starting from the MSB on big-endian machines and from
the LSB on little-endian ones). The value itself remains opaque to the PMD.

One issue is that the flexible array approach makes static initialization
more complicated (a short sketch of what I mean is included below). Maybe
it is not worth the trouble since, according to Andrey, even the X710
reports at most 32 bits of user data.

So what should we do? A fixed 32 bit ID for now to keep things simple, then
another action for 64 bits later when necessary?

> > [...]
> > > > > [Sugesh] Another concern is the cost and time of installing these
> > > > > rules in the hardware. Can we make these APIs time-bound (or at
> > > > > least provide an option to set a time limit to execute them), so
> > > > > that the application does not have to wait so long when installing
> > > > > and deleting flows with slow hardware/NICs? What do you think?
> > > > > Most of the datapath flow installations are dynamic and triggered
> > > > > only when there is ingress traffic. Delays in flow
> > > > > insertion/deletion have unpredictable consequences.
> > > >
> > > > This API is (currently) aimed at the control path only, and must
> > > > indeed be assumed to be slow. Creating millions of rules may take
> > > > quite long as it may involve syscalls and other time-consuming
> > > > synchronization things on the PMD side.
> > > >
> > > > So currently there is no plan to have rules added from the data path
> > > > with time constraints. I think it would be implemented through a
> > > > different set of functions anyway.
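Coming back to the ID action for a moment, here is a minimal sketch of the
static initialization issue mentioned above (purely illustrative; the
_fixed/_flex suffixes only exist to show both candidate layouts side by
side and would obviously not appear in the actual API):

 #include <stdint.h>
 #include <stdlib.h>
 #include <string.h>

 /* Candidate A: fixed layout, static initialization is trivial. Filling
  * the mask with ones is the "hope for the best" case mentioned above. */
 struct rte_flow_action_id_fixed {
         uint64_t id;
         uint64_t mask;
 };

 static const struct rte_flow_action_id_fixed id_conf_fixed = {
         .id = 42,
         .mask = UINT64_MAX,
 };

 /* Candidate B: byte string with a flexible array member. ISO C does not
  * allow initializing a flexible array member, so the configuration has
  * to be assembled at run time (or through compiler-specific tricks). */
 struct rte_flow_action_id_flex {
         uint8_t size; /* number of bytes in id[] */
         uint8_t id[];
 };

 static struct rte_flow_action_id_flex *
 make_id_conf(const uint8_t *bytes, uint8_t size)
 {
         struct rte_flow_action_id_flex *conf;

         conf = malloc(sizeof(*conf) + size);
         if (conf == NULL)
                 return NULL;
         conf->size = size;
         memcpy(conf->id, bytes, size);
         return conf;
 }

Not a showstopper, but it is one more thing applications would have to get
right compared to the fixed layout.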
> > > >
> > > > I do not think adding time limits is practical; even specifying in
> > > > the API that creating a single flow rule must take less than a
> > > > maximum number of seconds in order to be effective is too much of a
> > > > constraint (applications that create all flows during init may not
> > > > care after all).
> > > >
> > > > You should consider in any case that modifying flow rules will
> > > > always be slower than receiving packets, there is no way around
> > > > that. Applications have to live with it and provide a software
> > > > fallback for incoming packets while managing flow rules.
> > > >
> > > > Moreover, think about what happens when you hit the maximum number
> > > > of flow rules and cannot create any more. Applications need to
> > > > implement some kind of fallback in their data path.
> > > >
> > > > Offloading flows in HW is also only useful if they live much longer
> > > > than the time taken to create and delete them. Perhaps applications
> > > > may choose to do so after detecting long-lived flows such as TCP
> > > > sessions.
> > > >
> > > > You may have one separate control thread dedicated to managing
> > > > flows and keep your normal control thread unaffected by delays.
> > > > Several threads can even be dedicated, one per device.
> > > [Sugesh] I agree that flow insertion cannot be as fast as the packet
> > > receiving rate. From the application's point of view the problem
> > > arises when hardware flow insertion takes longer than software flow
> > > insertion. At the very least the application has to know the cost of
> > > inserting/deleting a rule in hardware beforehand, otherwise how can
> > > the application choose the right flow candidates for hardware? My
> > > point here is that the application expects deterministic behavior
> > > from a classifier while inserting and deleting rules.
> >
> > Understood, however it will be difficult to estimate, particularly if a
> > PMD must rearrange flow rules to make room for a new one due to a
> > priority level collision or some other HW-related reason. I mean, the
> > time spent cannot be assumed to be constant; even PMDs cannot know in
> > advance because it also depends on the performance of the host CPU.
> >
> > Such applications may find it easier to measure the elapsed time for
> > the rules they create, make statistics and extrapolate from this
> > information for future rules. I do not think the PMD can help much
> > here.
> [Sugesh] From an application point of view this can be an issue. There is
> even a security concern when we program a short-lived flow. Let's
> consider this case:
>
> 1) The control plane programs the hardware with a queue termination flow.
> 2) The software dataplane is programmed to treat the packets from that
>    specific queue accordingly.
> 3) The flow is removed from the hardware (let's consider this a long wait
>    process), or the hardware takes more time to report the status than to
>    physically remove it. Now the packets in the queue are no longer
>    considered matched/flow hits, because the software dataplane update
>    has yet to happen.
>
> We need a way to sync between the software datapath and the classifier
> APIs even though they are both programmed from a different control
> thread.
>
> Are we saying these APIs are only meant for user-defined static flows??

No, that is definitely not the intent. These are good points. With the
specified API, applications may have to adapt their logic and take extra
precautions in order to remain on the safe side at all times.
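Roughly what I have in mind, as an application-side sketch (everything
below is hypothetical application code; hw_flow_create() and
hw_flow_destroy() merely stand in for whatever entry points the final API
provides, only the ordering matters):

 #include <stdint.h>

 enum app_rule_state {
         APP_RULE_SW_ONLY,      /* handled entirely by the SW fallback */
         APP_RULE_HW_OFFLOADED, /* HW rule confirmed, SW fallback kept */
 };

 struct app_rule {
         enum app_rule_state state;
         void *hw_handle; /* opaque handle returned by the PMD */
         /* application-specific match/action data */
 };

 /* Placeholders for the (slow, synchronous) control path calls. */
 extern void *hw_flow_create(uint8_t port_id, const struct app_rule *rule);
 extern void hw_flow_destroy(uint8_t port_id, void *handle);

 static int
 app_rule_offload(uint8_t port_id, struct app_rule *rule)
 {
         /* The SW fallback stays in place while this call blocks. */
         rule->hw_handle = hw_flow_create(port_id, rule);
         if (rule->hw_handle == NULL)
                 return -1; /* keep handling this flow in SW */
         rule->state = APP_RULE_HW_OFFLOADED;
         return 0;
 }

 static void
 app_rule_remove(uint8_t port_id, struct app_rule *rule)
 {
         /* Fall back to SW first: by the time the rule is gone from HW,
          * packets must already have a valid SW path to land on. */
         rule->state = APP_RULE_SW_ONLY;
         if (rule->hw_handle != NULL) {
                 hw_flow_destroy(port_id, rule->hw_handle);
                 rule->hw_handle = NULL;
         }
 }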
For your above example, the application cannot assume a rule is
added/deleted as long as the PMD has not completed the related operation,
which means keeping the SW rule/fallback in place in the meantime. This
should address the security concern as long as, after removing a rule,
packets end up in a default queue entirely processed by SW. Obviously this
may worsen response time.

The ID action can help with this. By knowing which rule a received packet
is associated with, processing can be temporarily offloaded to another
thread without much complexity.

I think applications have to implement SW fallbacks all the time, as even
some sort of guarantee on the flow rule processing time may not be enough
to avoid misdirected packets and the related security issues.

Let's wait for applications to start using this API and then consider an
extra set of asynchronous / real-time functions when the need arises. It
should not impact the way rules are specified.

> > > > > [Sugesh] Another query is on the synchronization part. What if
> > > > > the same rules are handled from different threads? Is the
> > > > > application responsible for handling the concurrent hardware
> > > > > programming?
> > > >
> > > > Like most (if not all) DPDK APIs, applications are responsible for
> > > > managing locking issues as described in 4.3 (Behavior). Since this
> > > > is a control path API and applications usually have a single
> > > > control thread, locking should not be necessary in most cases.
> > > >
> > > > Regarding my above comment about using several control threads to
> > > > manage different devices, section 4.3 says:
> > > >
> > > > "There is no provision for reentrancy/multi-thread safety, although
> > > > nothing should prevent different devices from being configured at
> > > > the same time. PMDs may protect their control path functions
> > > > accordingly."
> > > >
> > > > I'd like to emphasize it is not "per port" but "per device", since
> > > > in a few cases a configurable resource is shared by several ports.
> > > > It may be difficult for applications to determine which ports are
> > > > shared by a given device but this falls outside the scope of this
> > > > API.
> > > >
> > > > Do you think adding the guarantee that it is always safe to
> > > > configure two different ports simultaneously without locking from
> > > > the application side is necessary? In which case the PMD would be
> > > > responsible for locking shared resources.
> > > [Sugesh] This would be a little bit complicated when some of the
> > > ports are not under DPDK itself (what if one port is managed by the
> > > kernel?) or ports are tied to different applications. Locking in the
> > > PMD helps when the ports are accessed by multiple DPDK applications.
> > > However, what if the port itself is not under DPDK?
> >
> > Well, either we do not care about what happens outside of the DPDK
> > context, or PMDs must find a way to satisfy everyone. I'm not a fan of
> > locking either, but it would be nice if flow rule configuration could
> > be attempted on different ports simultaneously without the risk of
> > wrecking anything, so that applications do not need to care.
> >
> > Possible cases for a dual port device with global flow rule settings
> > affecting both ports:
> >
> > 1) Ports 1 & 2 are managed by DPDK: this is the easy case, a rule that
> >    needs to alter a global setting necessary for an existing rule on
> >    any port is not allowed (EEXIST). The PMD must maintain a device
> >    context common to both ports in order for this to work.
> >    This context is either under lock, or the first port on which a
> >    flow rule is created owns all future flow rules.
> >
> > 2) Port 1 is managed by DPDK, port 2 by something else, the PMD is
> >    aware of it and knows that port 2 may modify the global context: no
> >    flow rules can be created from the DPDK application due to safety
> >    issues (EBUSY?).
> >
> > 3) Port 1 is managed by DPDK, port 2 by something else, the PMD is
> >    aware of it and knows that port 2 will not modify flow rules: the
> >    PMD should not care, no lock necessary.
> >
> > 4) Port 1 is managed by DPDK, port 2 by something else and the PMD is
> >    not aware of it: either flow rules can never be created at all, or
> >    we say it is the user's responsibility to make sure this does not
> >    happen.
> >
> > Considering that most control operations performed by DPDK affect the
> > device regardless of other applications, I think 1) is the only case
> > that should be defined, otherwise 4), defined as the user's
> > responsibility.

No more comments on this part? What do you suggest?

-- 
Adrien Mazarguil
6WIND