From: "Dumitrescu, Cristian"
To: Ivan Boule, "dev@dpdk.org"
Date: Thu, 5 Jun 2014 13:22:56 +0000
Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf

Hi,

In order to minimize the number of iterations, Ivan and I had an offline discussion on this.

Ivan is concerned with the way the mbuf cloning/scatter-gather feature is currently implemented in DPDK, and not with this particular patch.

We agreed to take the discussion on the cloning/scatter-gather implementation during the DPDK 1.8 time-frame, as this belongs to the mbuf refresh discussion. The mbuf library is not just the format of the mbuf data structure; it also includes all the features associated with the mbuf, such as accessing mbuf fields through get/set methods, buffer chaining, cloning/scatter-gather, enabling HW offloads through mbuf flags and fields, etc.

Regards,
Cristian

-----Original Message-----
From: Ivan Boule [mailto:ivan.boule@6wind.com]
Sent: Monday, June 2, 2014 1:24 PM
To: Dumitrescu, Cristian; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf

Hi Cristian,

I agree with you, the natural way to store application metadata into mbufs consists in putting it right after the rte_mbuf data structure. This can be simply implemented this way:

struct metadata {
	...
};

struct app_mbuf {
	struct rte_mbuf mbuf;
	struct metadata mtdt;
};

With such a representation, the application initializes the "buf_addr" field of each mbuf pointed to by a (struct app_mbuf *)amb pointer as:

	amb->mbuf.buf_addr = ((char *)amb + sizeof(struct app_mbuf));

But such a natural programming approach breaks the assumptions of the macros RTE_MBUF_FROM_BADDR, RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and RTE_MBUF_DIRECT.

For instance, in the function __rte_pktmbuf_prefree_seg(m), after the line

	struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr);

"md" is always different from "m", and thus systematically (and most of the time wrongly) considered as an indirect mbuf.

In the same way, the operations done by the function rte_pktmbuf_detach(m)

	void *buf = RTE_MBUF_TO_BADDR(m);
	m->buf_addr = buf;

do not set buf_addr to its [default] correct value.

To summarize, adding metadata after the rte_mbuf data structure is incompatible with the packet cloning feature behind the [wrongly named] RTE_MBUF_SCATTER_GATHER configuration option.

By the way, you suggest to use the headroom to also store packet metadata. But, then, how can an application store both types of information in a given mbuf at the same time, when the actual length of network headers in a mbuf is variable, as it depends on the protocol path followed by the packet in a networking stack (tunnels, etc.)?

Regards,
Ivan

On 05/30/2014 12:28 AM, Dumitrescu, Cristian wrote:
> Hi Ivan,
>
> I hate to disagree with you :), but no, we do not break the scatter-gather feature. We actually implemented the Packet Framework IPv4 fragmentation and reassembly ports to validate exactly this.
>
> Maybe the confusion comes from the RTE_MBUF_TO_BADDR macro. This macro only works (and it is only used) for direct mbufs, so probably the correct name for this macro should be RTE_MBUF_DIRECT_TO_BADDR, as it cannot work (and it is not used) for indirect mbufs.
>
> I am describing the rationale behind the meta-data design below; sorry for the long description that looks like a design document ...
>
> Quick recap:
> - mbuf->buf_addr points to the first address where a byte of data for the current segment _can_ be placed
> - direct mbuf: the packet descriptor (mbuf) sits in the same mempool buffer with the packet itself, so mbuf->buf_addr = mbuf + 1;
> - indirect mbuf: the packet descriptor is located in a different mempool buffer than the packet itself, so mbuf->buf_addr != mbuf + 1;
> - Regardless of the mbuf type, it is not necessarily true that the first byte of data is located at mbuf->buf_addr, as we save a headroom (configurable, by default CONFIG_RTE_PKTMBUF_HEADROOM = 128 bytes) at the start of the data buffer (mbuf->buf_addr) for prepending packet headers (... or, why not, meta-data!). The location of the first real data byte is mbuf->pkt.data, which is variable, as opposed to mbuf->buf_addr, which does not change.
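
To visualize the direct-mbuf case from this recap, here is a small layout sketch. It is an illustration only: it relies on the mbuf->buf_addr and mbuf->pkt.data fields discussed in this thread, and the helper function name is made up.

#include <stdio.h>
#include <rte_mbuf.h>

/*
 * Direct mbuf: the descriptor and the packet share one mempool buffer.
 *
 *   [ struct rte_mbuf ][ headroom (128B by default) | packet bytes ... ]
 *   ^ mbuf              ^ mbuf->buf_addr == mbuf + 1  ^ mbuf->pkt.data
 *
 * An indirect mbuf lives in a different buffer, so there buf_addr != mbuf + 1.
 */
static void
print_direct_mbuf_layout(const struct rte_mbuf *m)	/* hypothetical helper */
{
	const char *buf_addr = (const char *)(m + 1);		/* direct mbuf only */
	const char *first_byte = (const char *)m->pkt.data;	/* first real data byte */

	printf("descriptor=%p buf_addr=%p headroom in use=%td bytes\n",
	       (const void *)m, (const void *)buf_addr, first_byte - buf_addr);
}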

>
> Packet meta-data rationale:
> - I think we both agree that we need to be able to store packet meta-data in the packet buffer. The packet buffer regions where meta-data could be stored can only be in the headroom or in the tailroom, which are both managed by the application and both go up and down during the packet lifetime.
>
> - To me, packet meta-data is part of the packet descriptor: mbuf is the mandatory & standard part of the packet descriptor, while meta-data is the optional & non-standard (per application) part of the packet descriptor. Therefore, it makes sense to put the meta-data immediately after the mbuf, but this is not mandatory.
>
> - The zero-size field called metadata is really just an offset: it points to the first buffer location where meta-data _can_ be placed. The reason for having this field is to provide a standard way to place meta-data in the packet buffer rather than hide it in the Packet Framework libraries and potentially conflict with other mbuf changes in the future. It flags that meta-data should be taken into consideration when any mbuf change is done in the future.
>
> - For direct mbufs, the same buffer space (the headroom) is shared between packet data (header prepending) and meta-data, similar to how the stack and the heap manage the same memory. Of course, since it is managed by the application, it is the responsibility of the application to make sure the packet bytes and the meta-data do not step on one another, but this problem is not at all new, nor introduced by the meta-data field: even currently, the application has to make sure the headroom is dimensioned correctly, so that even in the worst-case scenario (application-specific), the packet bytes do not run into the mbuf fields, right?
>
> - For indirect mbufs, the packet meta-data is associated with the indirect mbuf rather than the direct mbuf (the one the indirect mbuf attaches to, as the direct mbuf contains a different packet that has different meta-data), so it makes sense that the meta-data of the indirect mbuf is stored in the same buffer as the indirect mbuf (rather than the buffer of the attached direct mbuf). This means that the buffer used to store the indirect mbuf is sized appropriately (mbuf size, currently 64 bytes, plus the max size for additional meta-data). This is illustrated in the code of the IPv4 fragmentation port, where for every child packet (output fragment) we copy the parent meta-data into the child buffer (which stores an indirect buffer attached to the direct buffer of the input jumbo, plus now additional meta-data).
>
> - We are also considering having a user_data field in the mbuf itself to point to meta-data in the same buffer or any other buffer. The Packet Framework functionally works with both approaches, but there is a performance problem associated with the mbuf->user_data approach that we are not addressing for this 1.7 release timeframe. The issue is the data dependency that is created: in order to find out the location of the meta-data, we need to first read the mbuf and then read the meta-data from mbuf->user_data. The cost of the additional memory access is high, due to the data dependency preventing efficient prefetch, therefore (at least for the time being) we need to use a compile-time known meta-data offset. The application can fill the meta-data on packet RX, and then all additional CPU cores processing the packet can live off the meta-data for a long time with no need to access the mbuf or the packet (efficiency). For the time being, if somebody needs yet more meta-data in yet another buffer, they can add (for their application) a user_data pointer as part of their application meta-data (instead of the standard mbuf).
>
> - As said, mbuf->metadata points to the first location where meta-data _can_ be placed; if somebody needs a different offset, they can add it on top of mbuf->metadata (e.g. by having a reserved field in their struct app_pkt_metadata). We have demonstrated the use of meta-data in the examples/ip_pipeline sample app (see struct app_pkt_metadata in "main.h").
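
As a concrete sketch of this usage, assuming the metadata[] union and access macros added by the patch quoted at the end of this mail: an application-specific structure placed at offset 0 can be reached directly from the mbuf. The structure and its fields below are invented for illustration; the real example is struct app_pkt_metadata in examples/ip_pipeline/main.h.

#include <stdint.h>
#include <rte_mbuf.h>	/* assumes the patched rte_mbuf.h with the metadata macros */

/* Hypothetical application meta-data, stored in the buffer right after the mbuf. */
struct app_pkt_metadata {
	uint32_t signature;	/* invented field */
	uint32_t port_out;	/* invented field */
};

static inline struct app_pkt_metadata *
pkt_metadata(struct rte_mbuf *m)
{
	/* Offset 0 is the first location where meta-data can be placed (mbuf->metadata). */
	return (struct app_pkt_metadata *)RTE_MBUF_METADATA_UINT8_PTR(m, 0);
}

/* Typical pattern: fill the meta-data once on RX, then later pipeline stages read it
 * without having to touch the packet itself. */
static inline void
classify(struct rte_mbuf *m, uint32_t port)
{
	pkt_metadata(m)->port_out = port;
}

Because the offset is known at compile time, a consumer core can locate and prefetch the meta-data without first dereferencing a pointer stored in the mbuf, which is the data-dependency argument made above.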

>
> Please let me know if this makes sense?
>
> Regards,
> Cristian
>
> PS: This is where a white board would help a lot ...
>
>
> -----Original Message-----
> From: Ivan Boule [mailto:ivan.boule@6wind.com]
> Sent: Wednesday, May 28, 2014 1:03 PM
> To: Dumitrescu, Cristian; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf
>
> Hi Cristian,
>
> Currently, the DPDK framework does not make any assumption on the actual layout of a mbuf. More precisely, the DPDK does not impose any constraint on the actual location of additional metadata, if any, or on the actual location and size of its associated payload data buffer. This is coherent with the fact that mbuf pools are not created by the DPDK itself, but by network applications that are free to choose whatever packet mbuf layout fits their particular needs.
>
> There is one exception to this basic DPDK rule: the mbuf cloning feature available through the RTE_MBUF_SCATTER_GATHER configuration option assumes that the payload data buffer of the mbuf immediately follows the rte_mbuf data structure (see the macros RTE_MBUF_FROM_BADDR, RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and RTE_MBUF_DIRECT in the file lib/librte_mbuf/rte_mbuf.h).
>
> The cloning feature prevents building packet mbufs with their metadata located immediately after the rte_mbuf data structure, which is exactly what your patch introduces.
>
> At least, a comment that clearly warns the user of this incompatibility might be worth adding into both the code and your patch log.
>
> Regards,
> Ivan
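
For context, the cloning helpers referenced above have roughly the following shape in the pre-1.8 lib/librte_mbuf/rte_mbuf.h. This is reproduced from memory as an approximation, not a verbatim copy of the header.

/* The data buffer is assumed to start immediately after the rte_mbuf structure. */
#define RTE_MBUF_FROM_BADDR(ba)     (((struct rte_mbuf *)(ba)) - 1)
#define RTE_MBUF_TO_BADDR(mb)       (((char *)(mb)) + sizeof(struct rte_mbuf))

/* Direct vs. indirect is decided purely by this pointer arithmetic ... */
#define RTE_MBUF_DIRECT(mb)         (RTE_MBUF_FROM_BADDR((mb)->buf_addr) == (mb))
#define RTE_MBUF_INDIRECT(mb)       (RTE_MBUF_FROM_BADDR((mb)->buf_addr) != (mb))

/*
 * ... so an mbuf whose buf_addr is set to (char *)mbuf + sizeof(struct app_mbuf),
 * with application meta-data in between, is always classified as indirect, which
 * is the incompatibility discussed in this thread.
 */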

>
> On 05/27/2014 07:09 PM, Cristian Dumitrescu wrote:
>> Added zero-size field (offset in data structure) to specify the beginning of packet meta-data in the packet buffer just after the mbuf.
>>
>> The size of the packet meta-data is application specific and the packet meta-data is managed by the application.
>>
>> The packet meta-data should always be accessed through the provided macros.
>>
>> This is used by the Packet Framework libraries (port, table, pipeline).
>>
>> There is absolutely no performance impact due to this mbuf field, as it does not take any space in the mbuf structure (zero-size field).
>>
>> Signed-off-by: Cristian Dumitrescu
>> ---
>>   lib/librte_mbuf/rte_mbuf.h |   17 +++++++++++++++++
>>   1 files changed, 17 insertions(+), 0 deletions(-)
>>
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index 4a9ab41..bf09618 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -201,8 +201,25 @@ struct rte_mbuf {
>>  		struct rte_ctrlmbuf ctrl;
>>  		struct rte_pktmbuf pkt;
>>  	};
>> +
>> +	union {
>> +		uint8_t metadata[0];
>> +		uint16_t metadata16[0];
>> +		uint32_t metadata32[0];
>> +		uint64_t metadata64[0];
>> +	};
>>  } __rte_cache_aligned;
>>
>> +#define RTE_MBUF_METADATA_UINT8(mbuf, offset)       (mbuf->metadata[offset])
>> +#define RTE_MBUF_METADATA_UINT16(mbuf, offset)      (mbuf->metadata16[offset/sizeof(uint16_t)])
>> +#define RTE_MBUF_METADATA_UINT32(mbuf, offset)      (mbuf->metadata32[offset/sizeof(uint32_t)])
>> +#define RTE_MBUF_METADATA_UINT64(mbuf, offset)      (mbuf->metadata64[offset/sizeof(uint64_t)])
>> +
>> +#define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)   (&mbuf->metadata[offset])
>> +#define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset)  (&mbuf->metadata16[offset/sizeof(uint16_t)])
>> +#define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset)  (&mbuf->metadata32[offset/sizeof(uint32_t)])
>> +#define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset)  (&mbuf->metadata64[offset/sizeof(uint64_t)])
>> +
>>  /**
>>   * Given the buf_addr returns the pointer to corresponding mbuf.
>>   */
>>

-- 
Ivan Boule
6WIND Development Engineer

--------------------------------------------------------------
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.