From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id D7B21567C for ; Mon, 30 Mar 2015 14:34:56 +0200 (CEST) Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga103.jf.intel.com with ESMTP; 30 Mar 2015 05:34:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.11,493,1422950400"; d="scan'208";a="548323851" Received: from irsmsx110.ger.corp.intel.com ([163.33.3.25]) by orsmga003.jf.intel.com with ESMTP; 30 Mar 2015 05:34:54 -0700 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.2]) by irsmsx110.ger.corp.intel.com ([169.254.15.225]) with mapi id 14.03.0224.002; Mon, 30 Mar 2015 13:34:53 +0100 From: "Ananyev, Konstantin" To: Olivier MATZ Thread-Topic: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data Thread-Index: AQHQaG2kxHNOjCcpFECqfV5gntl+K50wWsQAgAACpICAABPugIAEdP1w Date: Mon, 30 Mar 2015 12:34:53 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772582140FB21@irsmsx105.ger.corp.intel.com> References: <1427302838-8285-1-git-send-email-olivier.matz@6wind.com> <1427385595-15011-1-git-send-email-olivier.matz@6wind.com> <1427385595-15011-2-git-send-email-olivier.matz@6wind.com> <2601191342CEEE43887BDE71AB97725821407D4D@irsmsx105.ger.corp.intel.com> <55151DDE.8040301@6wind.com> <55156188.6040101@6wind.com> <2601191342CEEE43887BDE71AB9772582140814C@irsmsx105.ger.corp.intel.com> <55157477.7090207@6wind.com> In-Reply-To: <55157477.7090207@6wind.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.181] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Mar 2015 12:34:57 -0000 Hi Olivier, > -----Original Message----- > From: Olivier MATZ [mailto:olivier.matz@6wind.com] > Sent: Friday, March 27, 2015 3:17 PM > To: Ananyev, Konstantin; dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when appli= cation uses private mbuf data >=20 > Hi Konstantin, >=20 > On 03/27/2015 03:25 PM, Ananyev, Konstantin wrote: > > Hi Olivier, > > > >> -----Original Message----- > >> From: Olivier MATZ [mailto:olivier.matz@6wind.com] > >> Sent: Friday, March 27, 2015 1:56 PM > >> To: Ananyev, Konstantin; dev@dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when ap= plication uses private mbuf data > >> > >> Hi Konstantin, > >> > >> On 03/27/2015 10:07 AM, Olivier MATZ wrote: > >>>> I think that to support ability to setup priv_size on a mempool basi= s, > >>>> and reserve private space between struct rte_mbuf and rte_mbuf. buf_= addr, > >>>> we need to: > >>>> > >>>> 1. Store priv_size both inside the mempool and inside the mbuf. > >>>> > >>>> 2. rte_pktmbuf_attach() should change the value of priv_size to the = priv_size of the direct mbuf we are going to attach to: > >>>> rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md) {... mi= ->priv_size =3D md->priv_size; ...} > >>>> > >>>> 3. rte_pktmbuf_detach() should restore original value of mbuf's priv= _size: > >>>> rte_pktmbuf_detach(struct rte_mbuf *m) > >>>> { > >>>> ... > >>>> m->priv_size =3D rte_mempool_get_privsize(m->pool); > >>>> m->buf _addr=3D rte_mbuf_to_baddr(m); > >>>> ... > >>>> } > >>>> > >>>> Also I think we need to provide a way to specify priv_size for all m= bufs of the mepool at init time: > >>>> - either force people to specify it at rte_mempool_create() time (pr= obably use init_arg for that), > >>>> - or provide separate function that could be called straight after r= te_mempool_create() , that > >>>> would setup priv_size for the pool and for all its mbufs. > >>>> - or some sort of combination of these 2 approaches - introduce a wr= apper function > >>>> (rte_mbuf_pool_create() or something) that would take priv_size as p= arameter, > >>>> create a new mempool and then setup priv_size. > >> > >> I though a bit more about this solution, and I realized that doing > >> mi->priv_size =3D md->priv_size in rte_pktmbuf_attach() is probably no= t > >> a good idea, as there is no garantee that the size of mi is large enou= gh > >> to store the priv of md. > >> > >> Having the same priv_size for mi and md is maybe a good constraint. > >> I can add this in the API comments and assertions in the code to > >> check this condition, what do you think? > > > > Probably we have a different concepts of what is mbuf's private space = in mind. > > From my point, even indirect buffer should use it's own private space a= nd > > leave contents of direct mbuf it attached to unmodified. > > After attach() operation, only contents of the buffer are shared betwee= n mbufs, > > but not the mbuf's metadata. >=20 > Sorry if it was not clear in my previous messages, but I agree > with your description. When attaching a mbuf, only data, not > metadata should be shared. >=20 > In the solution you are suggesting (quoted above), you say we need > to set mi->priv_size to md->priv_size in rte_pktmbuf_attach(). I felt > this was not possible, but it depends on the meaning we give to > priv_size: >=20 > 1. If the meaning is "the size of the private data embedded in this > mbuf", which is the most logical meaning, we cannot do this > affectation >=20 > 2. If the meaning is "the size of the private data embedded in the > mbuf the buf_addr is pointing to" (which is harder to get), the > affectation makes sense. >=20 > From what I understand, you feel we should use 2/ as priv_size > definition. Is it correct? Yes, I meant 2. >>From my point priv_size inside mbuf is more like 'buf_ofs'. It's main purpose is for internal use - to help our mbuf manipulation routi= nies (attach/detach/free) to work correctly. If the user wants to query size of private space for the mbuf, he probably = should use the value from mempool. >=20 > In my previous message, the definition of m->priv_size was 1/, > so that's why I felt assigning mi->priv_size to md->priv_size was > not possible. >=20 > I agree 2/ is probably a good choice, as it would allow to attach > to a mbuf with a different priv_size. It may require some additional > comments above the field in the structure to explain that. >=20 >=20 > > Otherwise on detach(), you'll have to copy contents of private space ba= ck, from direct to indirect mbuf? > > Again how to deal with the case, when 2 or more mbufs will attach to th= e same direct one? > > > > So let say, if we'll have a macro: > > > > #define RTE_MBUF_PRIV_PTR(mb) ((void *)((struct rte_mbuf *)(mb)) + 1)) > > > > No matter is mb a direct or indirect mbuf. > > Do you have something else in mind here? >=20 > I completely agree with this macro. We should consider the private data > as an extension of the mbuf structure. >=20 >=20 > >>> Introducing rte_mbuf_pool_create() seems a good idea to me, it > >>> would hide 'rte_pktmbuf_pool_private' from the user and force > >>> to initialize all the required fields (mbuf_data_room_size only > >>> today, and maybe mbuf_priv_size). > >>> > >>> The API would be: > >>> > >>> struct rte_mempool * > >>> rte_mbuf_pool_create(const char *name, unsigned n, unsigned elt_size, > >>> unsigned cache_size, size_t mbuf_priv_size, > >>> rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg, > >>> int socket_id, unsigned flags) > >>> > >>> I can give it a try and send a patch for this. > >> > >> About this, it is not required anymore for this patch series if we > >> agree with my comment above. > > > > I still think we need some way to setup priv_size on a per-mempool basi= s. > > Doing that in rte_mbuf_pool_create() seems like a good thing to me. > > Not sure, why you decided to drop it? >=20 > I think we can already do it without changing the API by providing > our own rte_pktmbuf_init and rte_pktmbuf_pool_init. >=20 > rte_pktmbuf_init() has to set: > m->buf_len =3D mp->elt_size - sizeof(struct mbuf); > m->priv_size =3D sizeof(struct mbuf) - sizeof(struct rte_mbuf); >=20 > rte_pktmbuf_pool_init() has to set: > /* we can use the default function */ > mbp_priv->mbuf_data_room_size =3D MBUF_RXDATA_SIZE + > RTE_PKTMBUF_HEADROOM; Yeh, when arg=3D=3DNULL for rte_pktmbuf_pool_init() we always set up mbuf_data_room_size to the hardcoded value. Which always looked strange to me. I think it should be set to: mp->elt_size - sizeof(struct rte_mbuf) - RTE_PKTMBUF_HEADROOM; for that case. >=20 > In this case, it is possible to calculate the mbuf_priv_size only > from the pool object: >=20 > mbuf_priv_size =3D pool->elt_size - RTE_PKTMBUF_HEADROOM - > pool_private->mbuf_data_room_size - > sizeof(rte_mbuf) >=20 So if I understand your idea correctly: If second parameter for rte_pktmbuf_pool_init() is NULL, then=20 we setup mbp_priv->mbuf_data_room_size =3D mp->elt_size - sizeof(struct rte_mbuf) - = RTE_PKTMBUF_HEADROOM; Which means that priv_size =3D=3D0 for all mbufs in the pool=20 Otherwise we setup: mbp_priv->mbuf_data_room_size =3D opaque_arg; As we are doing now, and priv_size for all mbufs in the pool will be: pool->elt_size - pool_private->mbuf_data_room_size - sizeof(rte_mbuf); And in that case, all users have to do to specify priv_size for mempool is = to pass a proper argument for rte_pktmbuf_pool_init() at mempool_create(). Correct?=20 >=20 > I agree it's not ideal, but I think the mbuf pool initialization > is another problem. That's why I suggested to change this in a > separate series that will add rte_mbuf_pool_create() with the > API described above. Thoughts? >=20 I also put answers to another mail below. Just to keep all discussion in one place. > > Though, I still think that the better approach is to reserve private sp= ace before struct rte_mbuf, not after. > > In that case, user got his private space, and we keep buf_addr straight= after rte_mbuf, without any whole. > > So we don't need steps 2 and 3, above, > > plus we don't need rte_mbuf_to_baddr() and rte_mbuf_from_indirect() - > > RTE_MBUF_TO_BADDR/ RTE_MBUF_FROM_BADDR would keep working correctly. > > In fact, with this scheme - we don't even need priv_size for mbuf manag= ement (attach/detach/free). > > > > Wonder did you try that approach? >=20 > Yes, I though about this approach too. But I think it would require > more changes. When an application or a driver allocate a mbuf with > mempool_get(), it expects that the returned pointer is the rte_mbuf *. Yep, sure it will still get the pointer to the rte_mbuf *. Though later, if upper layer would like to convert from rte_mbuf* to app_s= pecific_mbuf *, it would have to use a macro: #define RTE_MBUF_TO_PRIV(m) ((void *)((uintptr_t)(m) - (m)->priv_size))=20 >=20 > With this solution, there are 2 options: > - no mempool modification, so each application/driver has to add > priv_size bytes to the object to get the mbuf pointer. This does not > look feasible. > - change the mempool to support private area before each object. I > think mempool API is already quite complex, and I think it's not > the role of the mempool library to provide such features. In fact, I think the changes would be minimal. All we have to do, is to make changes in rte_pktmbuf_init(): void rte_pktmbuf_init(struct rte_mempool *mp, __attribute__((unused)) void *opaque_arg, void *_m, __attribute__((unused)) unsigned i) { //extract priv_size from mempool (discussed above). uint16_t priv_size =3D rte_mbufpool_get_priv_size(mp);=20 struct rte_mbuf *m =3D _m + priv_size; uint32_t buf_len =3D mp->elt_size - sizeof(struct rte_mbuf) - priv_si= ze; .... With that way priv_size inside mbuf would always be the size its own privat= e space, for both direct and indirect mbus., so we don't require to set priv_size at= attach/detach. Again RTE_MBUF_TO_BADDR() and RTE_MBUF_FROM_BADDR() would just work without= any modifications. Konstantin =20