From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58]) by dpdk.org (Postfix) with ESMTP id 508BB5A0A for ; Wed, 20 May 2015 20:47:25 +0200 (CEST) Received: from hmsreliant.think-freely.org ([2001:470:8:a08:7aac:c0ff:fec2:933b] helo=localhost) by smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63) (envelope-from ) id 1Yv91R-00057d-1Q; Wed, 20 May 2015 14:47:23 -0400 Date: Wed, 20 May 2015 14:47:16 -0400 From: Neil Horman To: Marc Sune Message-ID: <20150520184716.GB1775@hmsreliant.think-freely.org> References: <1431361781-12621-1-git-send-email-bruce.richardson@intel.com> <20150519113112.GA10700@bricha3-MOBL3> <2340603.EB75GY9ZBE@xps13> <555C5C4C.80308@bisdn.de> <20150520102816.GA1775@hmsreliant.think-freely.org> <555CBDCE.5010004@bisdn.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <555CBDCE.5010004@bisdn.de> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: -2.9 (--) X-Spam-Status: No Cc: dev@dpdk.org Subject: Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 May 2015 18:47:25 -0000 On Wed, May 20, 2015 at 07:01:02PM +0200, Marc Sune wrote: > > > On 20/05/15 12:28, Neil Horman wrote: > >On Wed, May 20, 2015 at 12:05:00PM +0200, Marc Sune wrote: > >> > >>On 20/05/15 10:31, Thomas Monjalon wrote: > >>>2015-05-19 12:31, Bruce Richardson: > >>>>On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote: > >>>>>Hi all, > >>>>> > >>>>>after a small amount of offline discussion with Marc Sune, here is an > >>>>>alternative proposal for a higher-level interface - aka pktdev - to allow a > >>>>>common Rx/Tx API across device types handling mbufs [for now, ethdev, ring > >>>>>and KNI]. The key code is in the first patch fo the set - the second is an > >>>>>example of a trivial usecase. > >>>>> > >>>>>What is different about this to previously: > >>>>>* wrapper class, so no changes to any existing ring, ethdev implementations > >>>>>* use of function pointers for RX/TX with an API that maps to ethdev > >>>>> - this means there is little/no additional overhead for ethdev calls > >>>>> - inline special case for rings, to accelerate that. Since we are at a > >>>>> higher level, we can special case process some things if appropriate. This > >>>>> means the impact to ring ops is one (predictable) branch per burst > >>>>>* elimination of the queue abstraction. For the ring and KNI, there is no > >>>>> concept of queues, so we just wrap the functions directly (no need even for > >>>>> wrapper functions, the api's match so we can call directly). This also > >>>>> means: > >>>>> - adding in features per-queue, is far easier as we don't need to worry about > >>>>> having arrays of multiple queues. For example: > >>>>> - adding in buffering on TX (or RX) is easier since again we only have a > >>>>> single queue. > >>>>>* thread safety is made easier using a wrapper. For a MP ring, we can create > >>>>> multiple pktdevs around it, and each thread will then be able to use their > >>>>> own copy, with their own buffering etc. > >>>>> > >>>>>However, at this point, I'm just looking for general feedback on this as an > >>>>>approach. I think it's quite flexible - even more so than the earlier proposal > >>>>>we had. It's less proscriptive and doesn't make any demands on any other libs. > >>>>> > >>>>>Comments/thoughts welcome. > >>>>Any comments on this RFC before I see about investing further time in it to clean > >>>>it up a bit and submit as a non-RFC patchset for merge in 2.1? > >>>I would say there are 2 possible approaches for KNI and ring handling: > >>>1/ You Bruce, Marc and Keith are advocating for a layer on top of ethdev, > >>>ring, KNI and possibly other devices, which uses mbuf. The set of functions > >>>is simpler than ethdev but the data structure is mbuf which is related to > >>>ethdev layer. > >>>2/ Konstantin and Neil talked about keeping mbuf for ethdev layer and related > >>>libs only. Ring and KNI could have an ethdev API with a reduced set of > >>>implemented functions. Crypto devices could adopt a specific crypto API and > >>>an ethdev API at the same time. > >>I don't fully understand which APIs you meant by non-ethdev. This pktdev > >>wrapper proposal abstracts RX and TX functions only, and all of these are > >>using mbufs as the packet buffer abstraction right now anyway (ethdev). > >> > >He's referring to future device classes (like crypto devices), which ostensibly > >would make use of the pktdev API. My argument (and I think Thomas') is that if > >a bit of hardware can be made to operate as a packet sending/receiving device, > >then its just as reasonable to use the existing ethdev api rather than some > >other restricted version of it (pktdev) > > > >>This approach does not preclude that different libraries expose other API > >>calls. In fact they will have to; setup the port/device ... It is just a > >>higher level API, so that you don't have to check the type of port in your > >>DPDK application I/O loop, minimizing user's code. > >> > >No argument there. But if thats the case (and I agree that it is), an > >application will implicitly have to know what what type of device it is, because > >it (the application) will need to understand the specific API it is writing to. > > > >>Or were you in 2) thinking about creating a different "packet buffer" > >>abstraction, independent from the ethdev, and then map the different port > >>specifics (e.g. mbuf) to this new abstraction? > >> > >My argument was to just leave the ethdev api alone. If a device class can be > >made to look like a packet forwarding device, then use the existing ethdev api > >to implement it. > > > >>>I feel it's cleaner, more generic and more maintainable to have drivers > >>>implementing one or several stable APIs instead of having some restricted > >>>wrappers to update. > >>This would be a separate library _on top_ of the existing APIs, and it has > >>the advantage to simplify the DPDK user's application code when an > >>application needs to deal with several types of port, as shown in the > >>example that Bruce provided in PATCH #2. > >> > >But thats already the purpose of the ethdev api. Different types of > >hardware/software can be made to look like the same thing (an ethdev) from an > >application standpoint. Adding this pktdev layer does nothing but that, add a > >layer. If you want restricted functionality of an interface, thats ok, ethdev > >offers that ability. unimplemented methods in a pmd cause the ethdev api to > >return EOPNOTSUP to the calling application, so the application knows when a > >given ethdev can't do some aspect of what an ethdev is. > > Hi Neil, > > Thanks for the clarifications. Now I understand the concern Thomas > expressed. Using ethdev API (port-ids) was actually my first suggestion > here: > > http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545 > > And to be honest, what I was expecting when I was reading for the first time > DPDK's APIs. It is indeed an option. However, if we take a look at the API: > > http://www.dpdk.org/doc/api/rte__ethdev_8h.html > > none of the API calls, except the burst RX/TX and, perhaps, the callbacks, > would be used by devices other than NICs. It seems going a bit too far using > it, but ofc possible. > So, I'll make 3 counter-arguments here: 1) To your point about the ethdev api being much larger than what a non-ethernet device could use, I'll tacitly agree, but indicate that its not relevant. If you want a bit of hardware that isn't a network interface to behave like a network interface, then there are going to be alot of aspects of a network interface that it just can't do. Thats true regardless of how you implement that. In the pktdev model, you prevent those operations from being an option at all, while in the current ethdev model, you simply get a return code of EOPNOTSUP, and the application does the right thing (which is to say, it understands that this hardware doesn't need that aspect of network card mangement and goes on with its day). I assert that, because we already have the ethdev api, its a lower time investment to simply reuse it 2) To the implication that we aren't working with NICs here, you're correct. As you note in your previous message, the pktdev interface is in no way the end all and be all of device model design. You will need to add other api calls to manage the device. If thats the case, then don't shoehorn any one particular aspect of the API to fit a device model that the device doesn't conform to. Design the API so that it best reflects the hardware behavior. 3) An addendum to the point about hardware not being a NIC (and you didn't make this point directly above, but I think you may have mentioned it previously), sometimes you want a device to behave like another device for the purposes of using generic code to talk to several device types. While this is true, this is a case for device translation and use, not for carving out parts of an api to make something more generic. The use case I cited previously was an ipsec tunnel. An ipsec tunnel uses cryptography, and crypto device apis to encrypt decrypt packet data. The common way to implement this is to design a crypto api that accepts a block of data in a way most condusive to the hardware, and then implement a network driver (that uses whatever ethernet api, in this case the ethdev api), to integrate with the network datapath. With this model, the ipsec tunnel uses the full range of the ethdev api (or a good deal more of it), and the crypto api is optimized to work with crypo acceleration hardware. > In essence, rte_ether(rte_ethdev.h) right now has: i) NIC setup; general > configuration, queue config, fdir, offloads, hw stuff like leds... ii) RX/TX > routines and callbacks iii) Stats and queue stats iv) other utils for > ethernet stuff (rte_ether.h) > The key that I'm taking away here is 'right now'. Its already written, so theres no work involved in implementing it for new devices. > i) is clearly HW specific, and does only apply to NICs/ASICs (e.g. FM10k) Ok, so it only applies to NIC's, thats fine. If you want to write a driver that leaves those methods for the pmd set to NULL, the ethdev library will correctly return EOPNOTSUPP to the calling applications. > while ii) and iii) are things that could be abstracted beyond NICs, like > KNI, rte_ring, crypto... (iv could be moved into some utils/protocol parsing > libraries). > Right again, so let those device types implement the appropriate portions of the pmd driver structure that match to what they support. EVerything else is handled by the ethdev library automatically. > Perhaps these two groups could be split into two different libraries and > then ii) and iii) together would be something like ~ rte_pktdev (stats are > missing on the proposed patch), while i) would be rte_ether, or rte_nic if > we think it is a better name. > The point I'm trying to get to is, why split at all? Theres just no need that I can see. The example I would set here is the dummy driver in linux. Its a net device that only serves to act as a sink for network packets. It still uses the network driver interface, but of the 65-ish methods that the netdevice model in linux offers, it implements 8 (or approximately 12%). The other unused method are just that, unused, and thats ok. Applications that try to do things like set flow director options, or speed/duplex options gets a return code that effectively says "This device can't do that", and thats ok. Thats what we need to be doing here. Instead of finding a way to codify the subset of functionality that other devices might be able to implement, for those cases where we want other hardware to act like a netdevice, lets just let those devices pick and choose what to implement, and the interface we already have will communicate with applications appropriately. Regards Neil > In any case, and I think we all agree here, I just think that one way or > another this should be abstracted so that it simplifies (and reduces) a bit > the code of DPDK applications. > > Marc > > > > >>I don't see why this could limit us or make it less maintainable. Of course > >>this is an RFC patch; appropriate tests are missing (Bruce I can help you on > >>that) > >> > >It doesn't limit us, its just not a useful abstraction, because we already have > >the abilities it provides. > > > >Neil > >>Marc > >> > >>>Comments are welcome. > >> > >