Date: Mon, 23 Sep 2019 18:16:12 +0200
From: Olivier Matz
To: "Wiles, Keith"
Cc: dev, Thomas Monjalon, "Wang, Haiyue", Stephen Hemminger,
 Andrew Rybchenko, Jerin Jacob Kollanukkaran
Message-ID: <20190923161612.6dwnp54ai7fnnpm7@platinum>
References: <20190710092907.5565-1-olivier.matz@6wind.com>
 <20190918165448.22409-1-olivier.matz@6wind.com>
 <37115768-EDA5-4089-8E86-3EFB26194A00@intel.com>
 <20190923091301.hquyxbbcbai43e4p@platinum>
 <12FB7B92-A7FE-40E1-A6AA-DD1C92AF0A59@intel.com>
In-Reply-To: <12FB7B92-A7FE-40E1-A6AA-DD1C92AF0A59@intel.com>
Subject: Re: [dpdk-dev] [PATCH] mbuf: support dynamic fields and flags

Hi,

(reformatted the quotes)

On Mon, Sep 23, 2019 at 03:14:51PM +0000, Wiles, Keith wrote:
>
> > On Sep 23, 2019, at 4:13 AM, Olivier Matz wrote:
> >
> > Hi Keith,
> >
> > On Sat, Sep 21, 2019 at 08:28:32AM +0000, Wiles, Keith wrote:
> > >
> > > > On Sep 18, 2019, at 6:54 PM, Olivier Matz wrote:
> > > >
> > > > Many features require to store data inside the mbuf. As the room in mbuf
> > > > structure is limited, it is not possible to have a field for each
> > > > feature.
> > > > Also, changing fields in the mbuf structure can break the API
> > > > or ABI.
> > > >
> > > > This commit addresses these issues, by enabling the dynamic registration
> > > > of fields or flags:
> > > >
> > > > - a dynamic field is a named area in the rte_mbuf structure, with a
> > > >   given size (>= 1 byte) and alignment constraint.
> > > > - a dynamic flag is a named bit in the rte_mbuf structure.
> > > >
> > > > The typical use case is a PMD that registers space for an offload
> > > > feature, when the application requests to enable this feature. As
> > > > the space in mbuf is limited, the space should only be reserved if it
> > > > is going to be used (i.e when the application explicitly asks for it).
> > > >
> > > > The registration can be done at any moment, but it is not possible
> > > > to unregister fields or flags for now.
> > > >
> > > > Signed-off-by: Olivier Matz
> > > > Acked-by: Thomas Monjalon
> > > > ---
> > >
> > > The idea of registration for space in the mbuf I am not a big fan. I did like
> > > Konstantin’s suggestion of having the compiler help with optimizing the code,
> > > but with a slight difference. Maybe I misunderstand, but now with this design
> > > you have to pass the offsets to different parts of the application or place in
> > > global memory or have each section request the offsets. It seems great if the
> > > application is one big application or an appliance model application having
> > > control of the whole design not so good for service chains like designs where
> > > different parts of the whole application is design by different teams.
> >
> > If the global variable storing the offset is defined in the mbuf layer, what
> > would be the problem?
>
> Are you assuming the values are shared between primary/secondary model or
> between processes using shared memory? If moving the packet data via shared
> memory to a different application written by a different company you still
> have to move that metadata.
The dynamic mbuf proposal works with secondary processes. What does that
change if the application is written by a different company? If you need
to store a timestamp, you register the timestamp and the offset will be
the same in primary and secondary.

> If the type was carried with the mbuf we can easily convey a small
> type value or we would need to tell the other side we have all of this
> registration information to send. I would suggest the number of mbuf
> types will be small over time and I believe a 4 bit or 8 bit type is
> reasonable. In many protocols using a type value is used to convey
> this type of information. We can even tightly control the number of
> types DPDK controls and then leave some for user defined if we like.

8 bits means 256 different mbuf layouts. You did not reply to my
previous questions:

- what happens if you need a field from layout1 and another from
  layout2? (ex: timestamp + ipsec, timestamp + seqn, seqn + ipsec, ...)
- how do you implement the rx/tx driver functions if you have to
  support several layouts, where a field may be at a different offset?

> > > The only things you would have to do is:
> > >
> > > 1/ ensure the offset is registered
> > >    rte_mbuf_dyn_timestamp_register()
> > >
> > > 2/ use helpers
> > >    rte_mbuf_dyn_timestamp_get(), rte_mbuf_dyn_timestamp_set(), ...
> >
> > Konstantin’s suggest if I understand it was to use structures to allow the
> > compiler to optimize the access to the mbuf and I like that idea, but with one
> > change we add a field in the mbuf to define the mbuf structure type.
> >
> > Say 0 is the standard rte_mbuf type then type 1 could be the IPSec offset type
> > mbuf, type 2 could be something else, … The type 0 looks just like the mbuf we
> > have today with maybe the optional fields set to reserved or some type of
> > filler variables to reserve the holes in the structure.
> > Then type 1 is the
> > IPSec mbuf and in the reserved sections of the mbuf contain the IPSec related
> > data with the standard mbuf fields still matching the type 0 version.
>
> This looks very much like the "selective layout" in our presentation [1],
> page 14.
>
> Your example talks about IPsec, but someone else will want to use a
> sequence number, another one a timestamp, and another one will want to
> use this space for its own application. There are a lot of use cases,
> and it does not scale to have a layout for each of them. Worse, if
> someone wants IPsec + a sequence number, how can it work?
>
> One of the problems to solve is to avoid mutually exclusive features (i.e.
> union of fields that cannot be used together in the mbuf).
>
> This allows the mbuf to be used by the developer and the compiler now knows
> exactly where the fields are located in the structure and does not have to
> deal with any of the macros and offsets and registration suggested here. Just
> cast the mbuf pointer into the new type mbuf structure. We just have to make
> sure the code that needs to use a given mbuf type has access to the structure
> definitions.
>
> With the current proposal, we can imagine an API to ask to register a
> field at a specific offset. It can then be used in the application, so
> that accesses are done at no cost compared to a static field, because
> the offset would be const.
>
> In the driver, the same logic could be used, but dynamically:
>
>   if (offset == PREFERRED_OFFSET) {
>     /* code with static offset */
>   } else {
>     /* generic code */
>   }
>
> But I'm not sure it would scale a lot if there are several features
> using dynamic fields.
>
> > If the mbufs is going to be translated from one type mbuf to another mbuf
> > type, we just have to define that type and then cast the mbuf pointer to that
> > structure.
> > When an mbuf is received from IPSec PMD then the application needs
> > to forward that mbuf to the next stage it can reset the type to 0 or to
> > another type filling in the reserved fields to be used by the next stage in
> > the pipeline.
>
> What you describe is one use case.
>
> What could be done with the API mentioned above (but I think it is
> dangerous), is to allow a user to register 2 different fields at the
> same offset, using a specific flag. This could work if the user knows
> that these 2 fields are never used at the same time.
>
> The mbuf now contains the type and every point in the application can look at
> the type to determine how that mbuf is defined. I am sure there are some holes
> here, but I think it is a better solution than using all of these macros,
> offset values and registration APIs.
>
> I'm not convinced having selective layouts is doable. The layouts cannot
> fit all possible use cases, and managing the different layouts in the
> driver looks difficult to me. Additionally, it does not solve the
> problem of mutually exclusive features.
>
> I too at one time wanted some type of allocation or registration for
> private mbuf space and applying to these limited fields in the mbuf
> header may have been reasonable. The problem is using registration and
> moving that information between processes is going to be hard to get
> right. For a single Appliance model application it would work great
> and not in a non-appliance model applications.

I didn't get why it wouldn't work in a non-appliance model (are you
talking about primary/secondary processes?). Can you elaborate on what
the problem would be?

> The type/structure
> method can help and it could have problems too, but using a
> type/struct design seems to be one of the BKMs (Best Known Methods) in
> the industry.

Sorry, but this is not a valid argument.
> To be honest maybe we just take the hit in performance and add a
> third cache line as I am sure trying to squeeze metadata into these
> very limited fields will be a challenge IMO. I am not suggesting we add
> a cache line to every mbuf only to the pools that require the extra
> metadata by using the private space if that is reasonable. The
> applications needing a lot of metadata will just have to take the hit
> in performance anyway.

If an application wants to attach more data in the mbuf, we already have
the application private area. This zone is transparent from DPDK's point
of view: it does not impact drivers or libs.

> Having to grab a metadata value via a set of macros and inline
> functions seems like it will consume more cycles than just a
> type/structure method as the compiler will help optimize the code
> without having to call any macros or inline functions.

Yes, I know that. This is the price to pay for solving the problems
(wasted size, exclusive features, avoiding ABI breakage). I answered in a
previous mail that the extra cost can be removed at application level if
we add an API to reserve a known offset. The ability to locate some
offload fields in the Rx part may also help to gain some cycles compared
to static fields.

Regards,
Olivier
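
To make the register-then-access idea discussed in this thread concrete,
here is a minimal, self-contained sketch. It is not the DPDK
implementation and does not use the DPDK API: struct toy_mbuf,
toy_field_register(), the accessors and TOY_TS_PREFERRED_OFFSET are all
hypothetical names invented for illustration. It assumes a small free
byte area inside the buffer header, a by-name registry that hands out
aligned offsets, and an optional "register at this exact offset" request
so that a caller can take a constant-offset fast path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_DYN_SIZE   16 /* bytes available for dynamic fields (assumed) */
#define TOY_MAX_FIELDS 8
#define TOY_NAME_LEN   32

/* Toy stand-in for an mbuf header; NOT the real rte_mbuf layout. */
struct toy_mbuf {
	void *buf_addr;
	uint64_t ol_flags;
	uint8_t dyn[TOY_DYN_SIZE]; /* room handed out by registration */
};

struct toy_field {
	char name[TOY_NAME_LEN];
	size_t size;
	size_t offset;
};

static struct toy_field toy_fields[TOY_MAX_FIELDS];
static size_t toy_nb_fields;
static uint8_t toy_used[TOY_DYN_SIZE]; /* 1 if the byte is reserved */

/*
 * Register a named field and return its offset inside dyn[], or -1.
 * req_offset < 0 means "any offset"; otherwise only that offset is tried.
 * Registering an existing name again returns the same offset.
 */
static int toy_field_register(const char *name, size_t size, size_t align,
			      int req_offset)
{
	size_t off, i;

	for (i = 0; i < toy_nb_fields; i++) {
		if (strcmp(toy_fields[i].name, name) != 0)
			continue;
		if (toy_fields[i].size != size)
			return -1;
		if (req_offset >= 0 && (size_t)req_offset != toy_fields[i].offset)
			return -1;
		return (int)toy_fields[i].offset;
	}
	if (size == 0 || align == 0 || (align & (align - 1)) != 0 ||
	    toy_nb_fields == TOY_MAX_FIELDS || strlen(name) >= TOY_NAME_LEN)
		return -1;
	for (off = 0; off + size <= TOY_DYN_SIZE; off++) {
		if (req_offset >= 0 && off != (size_t)req_offset)
			continue;
		if (off % align != 0)
			continue;
		for (i = 0; i < size && !toy_used[off + i]; i++)
			;
		if (i < size)
			continue; /* overlaps an already registered field */
		memset(&toy_used[off], 1, size);
		strcpy(toy_fields[toy_nb_fields].name, name);
		toy_fields[toy_nb_fields].size = size;
		toy_fields[toy_nb_fields].offset = off;
		toy_nb_fields++;
		return (int)off;
	}
	return -1;
}

/* Accessors use memcpy so they are valid for any offset. */
static inline void toy_field_set_u64(struct toy_mbuf *m, int offset, uint64_t v)
{
	memcpy(&m->dyn[offset], &v, sizeof(v));
}

static inline uint64_t toy_field_get_u64(const struct toy_mbuf *m, int offset)
{
	uint64_t v;

	memcpy(&v, &m->dyn[offset], sizeof(v));
	return v;
}

#define TOY_TS_PREFERRED_OFFSET 0
static int toy_ts_offset = -1; /* set once at registration time */

/* Same shape as the "offset == PREFERRED_OFFSET" branch quoted above. */
static inline uint64_t toy_timestamp_get(const struct toy_mbuf *m)
{
	if (toy_ts_offset == TOY_TS_PREFERRED_OFFSET)
		return toy_field_get_u64(m, TOY_TS_PREFERRED_OFFSET); /* constant offset */
	return toy_field_get_u64(m, toy_ts_offset); /* generic path */
}
```

In this sketch a caller would register "timestamp" once with
TOY_TS_PREFERRED_OFFSET, store the result in toy_ts_offset, and then
only go through toy_timestamp_get(). Two features can coexist (for
example a timestamp and a sequence number) because each one gets its own
non-overlapping offset, which is the property the fixed-layout approach
struggles with.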