Date: Fri, 12 Jul 2019 11:06:43 +0200
From: Olivier Matz
To: "Wiles, Keith"
Cc: dpdk dev community, Stephen Hemminger
Subject: Re: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
Message-ID: <20190712090643.z3mpl6e4smara3eo@platinum>
References: <20190710092907.5565-1-olivier.matz@6wind.com>
 <20190710104917.73bc9201@hermes.lan>
 <3117C805-492A-4DA9-BE8F-E119E057C80D@intel.com>
 <20190711075320.woswl6fhl6szgdqb@glumotte.dev.6wind.com>

Hi,

On Thu, Jul 11, 2019 at 02:37:23PM +0000, Wiles, Keith wrote:
> 
> 
> > On Jul 11, 2019, at 2:53 AM, Olivier Matz wrote:
> > 
> > Hi Keith,
> > 
> > On Wed, Jul 10, 2019 at 06:12:16PM +0000, Wiles, Keith wrote:
> >> 
> >> 
> >>> On Jul 10, 2019, at 12:49 PM, Stephen Hemminger wrote:
> >>> 
> >>> On Wed, 10 Jul 2019 11:29:07 +0200
> >>> Olivier Matz wrote:
> >>> 
> >>>>  /**
> >>>>   * Indicate that the metadata field in the mbuf is in use.
> >>>> @@ -738,6 +741,8 @@ struct rte_mbuf {
> >>>>  	 */
> >>>>  	struct rte_mbuf_ext_shared_info *shinfo;
> >>>> 
> >>>> +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
> >>>> +	uint64_t dynfield2; /**< Reserved for dynamic fields. */
> >>>>  } __rte_cache_aligned;
> >>> 
> >>> Growing mbuf is a fundamental ABI break and this needs
> >>> higher level approval. Why not one pointer?
> >>> 
> >>> It looks like you are creating something like FreeBSD m_tag.
> >>> Why not use that?
> >> 
> >> Changing the mbuf structure causes a big problem for a number of
> >> reasons, as Stephen states.
> > 
> > Can you elaborate?
> > 
> > This is indeed an ABI break, but I think this is only due to the
> > addition of rte_mbuf_dynfield_copy() in rte_pktmbuf_attach(). The size
> > of the mbuf does not change and the fields are not initialized when
> > creating a new mbuf. So I think there is no ABI change for code that is
> > not using rte_pktmbuf_attach().
> > 
> > I don't think it's a problem to have one ABI change, if it avoids many
> > others in the future.
> > 
> >> If we leave the mbuf structure alone, we can add this feature to the
> >> headroom space between the mbuf structure and the packet. When setting
> >> up the mempool/mbuf pool, we define a headroom to hold the extra data
> >> when the mbuf pool is created, or we just use the current headroom
> >> space. Using this method we can eliminate the mbuf structure change
> >> and add the data to the packet buffer.
> >> We can do away with dynfield1 and 2, as we know where headroom space
> >> begins and ends. Just a thought.
> > 
> > The size of the mbuf metadata (between the mbuf structure and the
> > buffer) is configured per pool, so it can be different across
> > mbufs. So, the access to the dynamic field would be slower:
> >   *(mbuf + dynfield_offset + metadata_size(mbuf))
> 
> We can force that space to be a minimum size when the mempool is
> created in the case of a cloned mbuf. The cloned mbuf is a small use
> case, but an important one, and increasing the size for those special
> mbufs by a cache line should not be a huge problem.
> 
> I think most allocations do not change the size from the default value
> of the headroom (128). The mbuf + buffer are normally rounded to 2K or
> a bit bigger, which gives a bit more space in the cases of a packet
> size of 1518-1522. Jumbo frames are the same. The headroom size an
> application uses has to be defined and set up for the maximum the
> application needs anyway, so normally all mbuf creations should use the
> same size to account for mbuf movement within the system.

If we want more room for dynamic fields, we can do something like this.
But I don't think this is something that will happen soon: we already
have 16 bytes available, and I'm sure we can get another 16 bytes very
easily by just converting fields like the timestamp or sequence numbers.

To attach larger amounts of data to mbufs, the metadata feature still
exists. We could imagine extending the dynamic fields feature to use
more space after the mbuf structure (in the metadata?), but that is more
complex.

I don't think that using headroom or tailroom is a good idea. It's true
that mbufs are usually a bit more than 2K, and some space is lost when
the MTU is 1500. But smaller mbufs are perfectly legal too, except that
some drivers do not support them. Anyway, headroom and tailroom must be
used for what they were designed for: reserving room to prepend or
append data.

If we want more space for dynamic fields, let's add a specific location
for it when it is needed.
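
For reference, here is a minimal sketch of how a field reserved in
dynfield1/dynfield2 would be registered and then accessed at a constant
offset from the mbuf, following the API proposed in the RFC. The header
name, the rte_mbuf_dynfield structure and rte_mbuf_dynfield_register()
come from the RFC patch and may still change; the "example_seqn" field
and the seqn() accessor below are only illustrative.

/* Sketch under the above assumptions, not the final API. */
#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

/* Offset of the field inside struct rte_mbuf, set at registration. */
static int seqn_offset = -1;

static int
seqn_init(void)
{
	static const struct rte_mbuf_dynfield desc = {
		.name = "example_seqn",
		.size = sizeof(uint32_t),
		.align = __alignof__(uint32_t),
	};

	/* Reserve 4 bytes inside dynfield1/dynfield2; the return value is
	 * the offset from the start of the mbuf, or -1 on failure. */
	seqn_offset = rte_mbuf_dynfield_register(&desc);
	return seqn_offset < 0 ? -1 : 0;
}

static inline uint32_t *
seqn(struct rte_mbuf *m)
{
	/* Constant offset from the mbuf: no per-pool metadata size
	 * lookup is needed, unlike the headroom approach above. */
	return (uint32_t *)((char *)m + seqn_offset);
}

After seqn_init() has run, any code sharing the registered name can read
or write *seqn(m) on any mbuf.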