From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id E6FA75A8C for ; Tue, 31 Mar 2015 01:17:26 +0200 (CEST) Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga102.jf.intel.com with ESMTP; 30 Mar 2015 16:17:27 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.11,497,1422950400"; d="scan'208";a="548628810" Received: from irsmsx151.ger.corp.intel.com ([163.33.192.59]) by orsmga003.jf.intel.com with ESMTP; 30 Mar 2015 16:17:24 -0700 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.2]) by IRSMSX151.ger.corp.intel.com ([169.254.4.52]) with mapi id 14.03.0224.002; Tue, 31 Mar 2015 00:17:24 +0100 From: "Ananyev, Konstantin" To: Olivier MATZ Thread-Topic: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data Thread-Index: AQHQaG2kxHNOjCcpFECqfV5gntl+K50wWsQAgAACpICAABPugIAEdP1wgAB/GQCAAEV+oA== Date: Mon, 30 Mar 2015 23:17:23 +0000 Message-ID: <2601191342CEEE43887BDE71AB97725821412EE9@irsmsx105.ger.corp.intel.com> References: <1427302838-8285-1-git-send-email-olivier.matz@6wind.com> <1427385595-15011-1-git-send-email-olivier.matz@6wind.com> <1427385595-15011-2-git-send-email-olivier.matz@6wind.com> <2601191342CEEE43887BDE71AB97725821407D4D@irsmsx105.ger.corp.intel.com> <55151DDE.8040301@6wind.com> <55156188.6040101@6wind.com> <2601191342CEEE43887BDE71AB9772582140814C@irsmsx105.ger.corp.intel.com> <55157477.7090207@6wind.com> <2601191342CEEE43887BDE71AB9772582140FB21@irsmsx105.ger.corp.intel.com> <5519AA46.1090409@6wind.com> In-Reply-To: <5519AA46.1090409@6wind.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Mar 2015 23:17:27 -0000 Hi Olivier, > -----Original Message----- > From: Olivier MATZ [mailto:olivier.matz@6wind.com] > Sent: Monday, March 30, 2015 8:56 PM > To: Ananyev, Konstantin > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when appli= cation uses private mbuf data >=20 > Hi Konstantin, >=20 > On 03/30/2015 02:34 PM, Ananyev, Konstantin wrote: > > Hi Olivier, > > > >> -----Original Message----- > >> From: Olivier MATZ [mailto:olivier.matz@6wind.com] > >> Sent: Friday, March 27, 2015 3:17 PM > >> To: Ananyev, Konstantin; dev@dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when ap= plication uses private mbuf data > >> > >> Hi Konstantin, > >> > >> On 03/27/2015 03:25 PM, Ananyev, Konstantin wrote: > >>> Hi Olivier, > >>> > >>>> -----Original Message----- > >>>> From: Olivier MATZ [mailto:olivier.matz@6wind.com] > >>>> Sent: Friday, March 27, 2015 1:56 PM > >>>> To: Ananyev, Konstantin; dev@dpdk.org > >>>> Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when = application uses private mbuf data > >>>> > >>>> Hi Konstantin, > >>>> > >>>> On 03/27/2015 10:07 AM, Olivier MATZ wrote: > >>>>>> I think that to support ability to setup priv_size on a mempool ba= sis, > >>>>>> and reserve private space between struct rte_mbuf and rte_mbuf. bu= f_addr, > >>>>>> we need to: > >>>>>> > >>>>>> 1. Store priv_size both inside the mempool and inside the mbuf. > >>>>>> > >>>>>> 2. rte_pktmbuf_attach() should change the value of priv_size to th= e priv_size of the direct mbuf we are going to attach to: > >>>>>> rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md) {... = mi->priv_size =3D md->priv_size; ...} > >>>>>> > >>>>>> 3. rte_pktmbuf_detach() should restore original value of mbuf's pr= iv_size: > >>>>>> rte_pktmbuf_detach(struct rte_mbuf *m) > >>>>>> { > >>>>>> ... > >>>>>> m->priv_size =3D rte_mempool_get_privsize(m->pool); > >>>>>> m->buf _addr=3D rte_mbuf_to_baddr(m); > >>>>>> ... > >>>>>> } > >>>>>> > >>>>>> Also I think we need to provide a way to specify priv_size for all= mbufs of the mepool at init time: > >>>>>> - either force people to specify it at rte_mempool_create() time (= probably use init_arg for that), > >>>>>> - or provide separate function that could be called straight after= rte_mempool_create() , that > >>>>>> would setup priv_size for the pool and for all its mbufs. > >>>>>> - or some sort of combination of these 2 approaches - introduce a = wrapper function > >>>>>> (rte_mbuf_pool_create() or something) that would take priv_size as= parameter, > >>>>>> create a new mempool and then setup priv_size. > >>>> > >>>> I though a bit more about this solution, and I realized that doing > >>>> mi->priv_size =3D md->priv_size in rte_pktmbuf_attach() is probably = not > >>>> a good idea, as there is no garantee that the size of mi is large en= ough > >>>> to store the priv of md. > >>>> > >>>> Having the same priv_size for mi and md is maybe a good constraint. > >>>> I can add this in the API comments and assertions in the code to > >>>> check this condition, what do you think? > >>> > >>> Probably we have a different concepts of what is mbuf's private spac= e in mind. > >>> From my point, even indirect buffer should use it's own private space= and > >>> leave contents of direct mbuf it attached to unmodified. > >>> After attach() operation, only contents of the buffer are shared betw= een mbufs, > >>> but not the mbuf's metadata. > >> > >> Sorry if it was not clear in my previous messages, but I agree > >> with your description. When attaching a mbuf, only data, not > >> metadata should be shared. > >> > >> In the solution you are suggesting (quoted above), you say we need > >> to set mi->priv_size to md->priv_size in rte_pktmbuf_attach(). I felt > >> this was not possible, but it depends on the meaning we give to > >> priv_size: > >> > >> 1. If the meaning is "the size of the private data embedded in this > >> mbuf", which is the most logical meaning, we cannot do this > >> affectation > >> > >> 2. If the meaning is "the size of the private data embedded in the > >> mbuf the buf_addr is pointing to" (which is harder to get), the > >> affectation makes sense. > >> > >> From what I understand, you feel we should use 2/ as priv_size > >> definition. Is it correct? > > > > Yes, I meant 2. > > From my point priv_size inside mbuf is more like 'buf_ofs'. > > It's main purpose is for internal use - to help our mbuf manipulation r= outinies > > (attach/detach/free) to work correctly. > > If the user wants to query size of private space for the mbuf, he proba= bly should > > use the value from mempool. >=20 > Agree. >=20 >=20 > >> In my previous message, the definition of m->priv_size was 1/, > >> so that's why I felt assigning mi->priv_size to md->priv_size was > >> not possible. > >> > >> I agree 2/ is probably a good choice, as it would allow to attach > >> to a mbuf with a different priv_size. It may require some additional > >> comments above the field in the structure to explain that. > >> > >> > >>> Otherwise on detach(), you'll have to copy contents of private space = back, from direct to indirect mbuf? > >>> Again how to deal with the case, when 2 or more mbufs will attach to = the same direct one? > >>> > >>> So let say, if we'll have a macro: > >>> > >>> #define RTE_MBUF_PRIV_PTR(mb) ((void *)((struct rte_mbuf *)(mb)) + 1)= ) > >>> > >>> No matter is mb a direct or indirect mbuf. > >>> Do you have something else in mind here? > >> > >> I completely agree with this macro. We should consider the private dat= a > >> as an extension of the mbuf structure. > >> > >> > >>>>> Introducing rte_mbuf_pool_create() seems a good idea to me, it > >>>>> would hide 'rte_pktmbuf_pool_private' from the user and force > >>>>> to initialize all the required fields (mbuf_data_room_size only > >>>>> today, and maybe mbuf_priv_size). > >>>>> > >>>>> The API would be: > >>>>> > >>>>> struct rte_mempool * > >>>>> rte_mbuf_pool_create(const char *name, unsigned n, unsigned elt_siz= e, > >>>>> unsigned cache_size, size_t mbuf_priv_size, > >>>>> rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg, > >>>>> int socket_id, unsigned flags) > >>>>> > >>>>> I can give it a try and send a patch for this. > >>>> > >>>> About this, it is not required anymore for this patch series if we > >>>> agree with my comment above. > >>> > >>> I still think we need some way to setup priv_size on a per-mempool ba= sis. > >>> Doing that in rte_mbuf_pool_create() seems like a good thing to me. > >>> Not sure, why you decided to drop it? > >> > >> I think we can already do it without changing the API by providing > >> our own rte_pktmbuf_init and rte_pktmbuf_pool_init. > >> > >> rte_pktmbuf_init() has to set: > >> m->buf_len =3D mp->elt_size - sizeof(struct mbuf); > >> m->priv_size =3D sizeof(struct mbuf) - sizeof(struct rte_mbuf); > >> > >> rte_pktmbuf_pool_init() has to set: > >> /* we can use the default function */ > >> mbp_priv->mbuf_data_room_size =3D MBUF_RXDATA_SIZE + > >> RTE_PKTMBUF_HEADROOM; > > > > Yeh, when arg=3D=3DNULL for rte_pktmbuf_pool_init() we always set up > > mbuf_data_room_size to the hardcoded value. > > Which always looked strange to me. > > I think it should be set to: > > mp->elt_size - sizeof(struct rte_mbuf) - RTE_PKTMBUF_HEADROOM; > > for that case. >=20 > Yes, that would make more sense instead of the hardcoded value. > But I'm not sure it should be part of this series as the clone > patches won't change this behavior. I would prefer to have it in > another series that reworks mbuf pool initialization. I can also > work on it. >=20 > On the other hand if you really feel this patch is needed in > this series, it's not a problem as it's a one-liner. >=20 > > > >> > >> In this case, it is possible to calculate the mbuf_priv_size only > >> from the pool object: > >> > >> mbuf_priv_size =3D pool->elt_size - RTE_PKTMBUF_HEADROOM - > >> pool_private->mbuf_data_room_size - > >> sizeof(rte_mbuf) > >> > > > > So if I understand your idea correctly: > > If second parameter for rte_pktmbuf_pool_init() is NULL, then > > we setup > > > > mbp_priv->mbuf_data_room_size =3D mp->elt_size - sizeof(struct rte_mbuf= ) - RTE_PKTMBUF_HEADROOM; > > > > Which means that priv_size =3D=3D0 for all mbufs in the pool > > Otherwise we setup: > > > > mbp_priv->mbuf_data_room_size =3D opaque_arg; > > > > As we are doing now, and priv_size for all mbufs in the pool will be: > > pool->elt_size - pool_private->mbuf_data_room_size - sizeof(rte_mbuf); > > > > And in that case, all users have to do to specify priv_size for mempool= is to pass a proper argument > > for rte_pktmbuf_pool_init() at mempool_create(). > > Correct? >=20 > Correct. >=20 >=20 >=20 > > > >> > >> I agree it's not ideal, but I think the mbuf pool initialization > >> is another problem. That's why I suggested to change this in a > >> separate series that will add rte_mbuf_pool_create() with the > >> API described above. Thoughts? > >> > > > > I also put answers to another mail below. > > Just to keep all discussion in one place. > > > >>> Though, I still think that the better approach is to reserve private = space before struct rte_mbuf, not after. > >>> In that case, user got his private space, and we keep buf_addr straig= ht after rte_mbuf, without any whole. > >>> So we don't need steps 2 and 3, above, > >>> plus we don't need rte_mbuf_to_baddr() and rte_mbuf_from_indirect() - > >>> RTE_MBUF_TO_BADDR/ RTE_MBUF_FROM_BADDR would keep working correctly. > >>> In fact, with this scheme - we don't even need priv_size for mbuf man= agement (attach/detach/free). > >>> > >>> Wonder did you try that approach? > >> > >> Yes, I though about this approach too. But I think it would require > >> more changes. When an application or a driver allocate a mbuf with > >> mempool_get(), it expects that the returned pointer is the rte_mbuf *. > > > > Yep, sure it will still get the pointer to the rte_mbuf *. > > Though later, if upper layer would like to convert from rte_mbuf* to a= pp_specific_mbuf *, > > it would have to use a macro: > > > > #define RTE_MBUF_TO_PRIV(m) ((void *)((uintptr_t)(m) - (m)->priv_size)) > > > >> > >> With this solution, there are 2 options: > >> - no mempool modification, so each application/driver has to add > >> priv_size bytes to the object to get the mbuf pointer. This does not > >> look feasible. > >> - change the mempool to support private area before each object. I > >> think mempool API is already quite complex, and I think it's not > >> the role of the mempool library to provide such features. > > > > > > In fact, I think the changes would be minimal. > > All we have to do, is to make changes in rte_pktmbuf_init(): > > > > void > > rte_pktmbuf_init(struct rte_mempool *mp, > > __attribute__((unused)) void *opaque_arg, > > void *_m, > > __attribute__((unused)) unsigned i) > > { > > > > //extract priv_size from mempool (discussed above). > > uint16_t priv_size =3D rte_mbufpool_get_priv_size(mp); > > > > struct rte_mbuf *m =3D _m + priv_size; > > uint32_t buf_len =3D mp->elt_size - sizeof(struct rte_mbuf) - pri= v_size; > > > > .... > > > > > > With that way priv_size inside mbuf would always be the size its own pr= ivate space, > > for both direct and indirect mbus., so we don't require to set priv_siz= e at attach/detach. > > Again RTE_MBUF_TO_BADDR() and RTE_MBUF_FROM_BADDR() would just work wit= hout any modifications. >=20 > I'm not sure I'm getting it. The argument '_m' of your > rte_pktmbuf_init() is the pointer to the priv data, right? > So it means that the mbuf is located priv_size bytes after. >=20 > The rte_pktmbuf_init() function is called by mempool_create(), > and the _m parameter is a pointer to a mempool object. So > in my understanding, mempool_get() would not return a rte_mbuf > but a pointer to the application private data. Ah my bad, forgot that mempool's obj_init() returns void now :( To make this approach work also need to change it, so it return a pointer t= o the new object.=20 So, rte_pktmbuf_init() would return m and then in mempool_add_elem(): if (obj_init) obj =3D obj_init(mp, obj_init_arg, obj, obj_idx); rte_ring_sp_enqueue(mp->ring, obj);=20 Konstantin >=20 >=20 > Regards, > Olivier >=20 >=20 >=20 > > > > Konstantin > > > >