From: Olivier MATZ
Date: Tue, 07 Apr 2015 17:45:31 +0200
To: "Ananyev, Konstantin", "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH v3 1/5] mbuf: fix clone support when application uses private mbuf data
Message-ID: <5523FB9B.2060508@6wind.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725821414310@irsmsx105.ger.corp.intel.com>
References: <1427385595-15011-1-git-send-email-olivier.matz@6wind.com>
 <1427829784-12323-1-git-send-email-zer0@droids-corp.org>
 <1427829784-12323-2-git-send-email-zer0@droids-corp.org>
 <2601191342CEEE43887BDE71AB97725821413A2D@irsmsx105.ger.corp.intel.com>
 <5522FF6B.1030503@6wind.com>
 <2601191342CEEE43887BDE71AB97725821414310@irsmsx105.ger.corp.intel.com>
List-Id: patches and discussions about DPDK

Hi Konstantin,

On 04/07/2015 02:40 PM, Ananyev, Konstantin wrote:
> Hi Olivier,
>
>> -----Original Message-----
>> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
>> Sent: Monday, April 06, 2015 10:50 PM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Cc: zoltan.kiss@linaro.org; Richardson, Bruce
>> Subject: Re: [PATCH v3 1/5] mbuf: fix clone support when application uses private mbuf data
>>
>> Hi Konstantin,
>>
>> Thanks for your comments.
>>
>> On 04/02/2015 07:21 PM, Ananyev, Konstantin wrote:
>>> Hi Olivier,
>>>
>>>> -----Original Message-----
>>>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>>>> Sent: Tuesday, March 31, 2015 8:23 PM
>>>> To: dev@dpdk.org
>>>> Cc: Ananyev, Konstantin; zoltan.kiss@linaro.org; Richardson, Bruce; Olivier Matz
>>>> Subject: [PATCH v3 1/5] mbuf: fix clone support when application uses private mbuf data
>>>>
>>>> From: Olivier Matz
>>>>
>>>> Add a new private_size field in mbuf structure that should
>>>> be initialized at mbuf pool creation. This field contains the
>>>> size of the application private data in mbufs.
>>>>
>>>> Introduce new static inline functions rte_mbuf_from_indirect()
>>>> and rte_mbuf_to_baddr() to replace the existing macros, which
>>>> take the private size in account when attaching and detaching
>>>> mbufs.
>>>>
>>>> Signed-off-by: Olivier Matz
>>>> ---
>>>>  app/test-pmd/testpmd.c     |  1 +
>>>>  examples/vhost/main.c      |  4 +--
>>>>  lib/librte_mbuf/rte_mbuf.c |  1 +
>>>>  lib/librte_mbuf/rte_mbuf.h | 77 +++++++++++++++++++++++++++++++++++-----------
>>>>  4 files changed, 63 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
>>>> index 3057791..c5a195a 100644
>>>> --- a/app/test-pmd/testpmd.c
>>>> +++ b/app/test-pmd/testpmd.c
>>>> @@ -425,6 +425,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
>>>>  	mb->tx_offload = 0;
>>>>  	mb->vlan_tci = 0;
>>>>  	mb->hash.rss = 0;
>>>> +	mb->priv_size = 0;
>>>>  }
>>>>
>>>>  static void
>>>> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
>>>> index c3fcb80..e44e82f 100644
>>>> --- a/examples/vhost/main.c
>>>> +++ b/examples/vhost/main.c
>>>> @@ -139,7 +139,7 @@
>>>>  /* Number of descriptors per cacheline. */
>>>>  #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
>>>>
>>>> -#define MBUF_EXT_MEM(mb) (RTE_MBUF_FROM_BADDR((mb)->buf_addr) != (mb))
>>>> +#define MBUF_EXT_MEM(mb) (rte_mbuf_from_indirect(mb) != (mb))
>>>>
>>>>  /* mask of enabled ports */
>>>>  static uint32_t enabled_port_mask = 0;
>>>> @@ -1550,7 +1550,7 @@ attach_rxmbuf_zcp(struct virtio_net *dev)
>>>>  static inline void pktmbuf_detach_zcp(struct rte_mbuf *m)
>>>>  {
>>>>  	const struct rte_mempool *mp = m->pool;
>>>> -	void *buf = RTE_MBUF_TO_BADDR(m);
>>>> +	void *buf = rte_mbuf_to_baddr(m);
>>>>  	uint32_t buf_ofs;
>>>>  	uint32_t buf_len = mp->elt_size - sizeof(*m);
>>>>  	m->buf_physaddr = rte_mempool_virt2phy(mp, m) + sizeof(*m);
>>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>>>> index 526b18d..e095999 100644
>>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>>> @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
>>>>  	m->pool = mp;
>>>>  	m->nb_segs = 1;
>>>>  	m->port = 0xff;
>>>> +	m->priv_size = 0;
>>>
>>> Why is it 0?
>>> Shouldn't it be the same calculations as in detach() below:
>>> m->priv_size = /*get private size from mempool private*/;
>>> m->buf_addr = (char *)m + sizeof(struct rte_mbuf) + m->priv_size;
>>> m->buf_len = mp->elt_size - sizeof(struct rte_mbuf) - m->priv_size;
>>> ?
>>
>> It's 0 because we also have in the function (not visible in the
>> patch):
>>
>> m->buf_addr = (char *)m + sizeof(struct rte_mbuf);
>
> Yep, that's why, as I wrote above, I think we need to set up all 3 fields here:
> priv_size, buf_addr, buf_len exactly in the same way as in detach().
>
>>
>> It means that an application that wants to use a private area has
>> to provide another init function derived from this default function.
>
> After your changes, attach/free and other functions from public mbuf API
> rely on priv_size being set properly.
> So I suppose 'official' pktmbuf_init() should also set it in a proper manner.
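
For readers following the thread, the three assignments discussed above can be sketched outside DPDK as follows. This is only an illustration under stated assumptions: struct toy_mbuf and toy_pktmbuf_init() are hypothetical stand-ins that model just the three fields, and the element layout (header, then private area, then data buffer) is inferred from the arithmetic quoted above. It is not the real rte_pktmbuf_init().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only the fields discussed here. */
struct toy_mbuf {
	void *buf_addr;     /* start of the data buffer */
	uint16_t buf_len;   /* size of the data buffer in bytes */
	uint16_t priv_size; /* size of the application private area */
};

/*
 * Assumed element layout: [ toy_mbuf header | private area | data buffer ].
 * elt_size is the size of the whole pool element.
 */
static void
toy_pktmbuf_init(struct toy_mbuf *m, size_t elt_size, uint16_t priv_size)
{
	/* header and private area must fit, and buf_len must not overflow */
	assert(elt_size >= sizeof(*m) + priv_size);
	assert(elt_size - sizeof(*m) - priv_size <= UINT16_MAX);

	/* set the three fields together so they always stay consistent */
	m->priv_size = priv_size;
	m->buf_addr = (char *)m + sizeof(*m) + m->priv_size;
	m->buf_len = (uint16_t)(elt_size - sizeof(*m) - m->priv_size);
}
```

With priv_size equal to 0 this degenerates to the existing behaviour, where the buffer starts right after the header.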
>
>> This was already the case before the patch series.
>
> Before this patch series, we didn't have priv_size, so we had nothing to set up.
>
>>
>> As we discussed in previous mail, I plan to propose a rework of
>> mbuf pool initialization in another series, and my initial idea was to
>> change this at the same time. But on the other hand it does not hurt
>> to do this change now. I'll include it in next version.
>
> Ok.

Just to be sure we're on the same page:

- before the patch series
  - a private area worked as long as clones were not used. To use a
    private area, the user had to provide another function derived
    from pktmbuf_init() to change m->buf_addr and m->buf_len.
  - using both a private area and clones was broken

- after the patch series
  - a private area works with or without clones. But to use it, the
    user still has to provide another function to change
    m->buf_addr, m->buf_len *and m->priv_size*.

The series just fixes the fact that "clones + priv" was not working. It
does not address the problem that providing a new pktmbuf_init()
function is required to use a private area. To fix this, I think it
could require an API evolution that should be part of another series.

I'll send a v4 addressing the comments soon, thanks.

Regards,
Olivier


>
>>
>>
>>> BTW, I don't see changes in rte_pktmbuf_pool_init() to set up
>>> mbp_priv->mbuf_data_room_size properly.
>>> Without those changes, how can people start using that feature?
>>> It seems that the only way now is to set up priv_size and buf_len for each mbuf manually.
>>
>> It's the same reason as above. To use a private area, the user has
>> to provide its own function that sets up data_room_size, derived from
>> this pool_init default function. This was also the case before the
>> patch series.
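
To make the before/after summary concrete, here is a minimal standalone sketch of why clones and a private area did not mix. It assumes a simplified, hypothetical struct toy_mbuf; toy_from_baddr_old() mirrors the arithmetic of the RTE_MBUF_FROM_BADDR() macro and toy_from_indirect() mirrors rte_mbuf_from_indirect() from the patch, but neither is the DPDK code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only the fields needed here. */
struct toy_mbuf {
	void *buf_addr;     /* start of the data buffer */
	uint16_t priv_size; /* private area size of the buffer's owner */
};

/* Old arithmetic: assumes the buffer starts right after the header. */
static struct toy_mbuf *
toy_from_baddr_old(void *ba)
{
	return (struct toy_mbuf *)ba - 1;
}

/* New arithmetic: also steps back over the private area. */
static struct toy_mbuf *
toy_from_indirect(struct toy_mbuf *mi)
{
	assert(mi->buf_addr != NULL);
	return (struct toy_mbuf *)((char *)mi->buf_addr - sizeof(*mi) -
			mi->priv_size);
}

/* Minimal attach: the indirect mbuf borrows the direct mbuf's buffer and
 * records the direct mbuf's private size, as the patch does. */
static void
toy_attach(struct toy_mbuf *mi, const struct toy_mbuf *md)
{
	mi->buf_addr = md->buf_addr;
	mi->priv_size = md->priv_size;
}
```

With a non-zero private size the old arithmetic lands inside the private area instead of on the direct mbuf, which is the "clones + priv" breakage; with a private size of 0 both functions agree.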
>>
>>
>>>
>>>>  }
>>>>
>>>>  /* do some sanity checks on a mbuf: panic if it fails */
>>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>>>> index 17ba791..932fe58 100644
>>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>>> @@ -317,18 +317,51 @@ struct rte_mbuf {
>>>>  			/* uint64_t unused:8; */
>>>>  		};
>>>>  	};
>>>> +
>>>> +	/** Size of the application private data. In case of an indirect
>>>> +	 * mbuf, it stores the direct mbuf private data size. */
>>>> +	uint16_t priv_size;
>>>>  } __rte_cache_aligned;
>>>>
>>>>  /**
>>>> - * Given the buf_addr returns the pointer to corresponding mbuf.
>>>> + * Return the mbuf owning the data buffer address of an indirect mbuf.
>>>> + *
>>>> + * @param mi
>>>> + *   The pointer to the indirect mbuf.
>>>> + * @return
>>>> + *   The address of the direct mbuf corresponding to buffer_addr.
>>>>   */
>>>> -#define RTE_MBUF_FROM_BADDR(ba) (((struct rte_mbuf *)(ba)) - 1)
>>>> +static inline struct rte_mbuf *
>>>> +rte_mbuf_from_indirect(struct rte_mbuf *mi)
>>>> +{
>>>> +	struct rte_mbuf *md;
>>>> +
>>>> +	/* mi->buf_addr and mi->priv_size correspond to buffer and
>>>> +	 * private size of the direct mbuf */
>>>> +	md = (struct rte_mbuf *)((char *)mi->buf_addr - sizeof(*mi) -
>>>> +			mi->priv_size);
>>>
>>> (uintptr_t)mi->buf_addr?
>>
>> Any clue why (uintptr_t) would be better than (char *) ?
>
> No big difference really, just looks a bit better to me :)
>
>> By the way, I added this cast because it would not compile with
>> g++ (and probably with icc too).
>>
>>>
>>>> +	return md;
>>>> +}
>>>>
>>>>  /**
>>>> - * Given the pointer to mbuf returns an address where it's buf_addr
>>>> - * should point to.
>>>> + * Return the buffer address embedded in the given mbuf.
>>>> + *
>>>> + * The user must ensure that m->priv_size corresponds to the
>>>> + * private size of this mbuf, which is not the case for indirect
>>>> + * mbufs.
>>>> + *
>>>> + * @param md
>>>> + *   The pointer to the mbuf.
>>>> + * @return
>>>> + *   The address of the data buffer owned by the mbuf.
>>>>   */
>>>> -#define RTE_MBUF_TO_BADDR(mb) (((struct rte_mbuf *)(mb)) + 1)
>>>> +static inline char *
>>>
>>> Might be better to return 'void *' here.
>>
>> Ok, as m->buf_addr is a (void *).
>>
>>>
>>>> +rte_mbuf_to_baddr(struct rte_mbuf *md)
>>>> +{
>>>> +	char *buffer_addr;
>>>
>>> uintptr_t buffer_addr?
>>
>> Same question as above, I don't really see why it's better than
>> (char *).
>>
>>>
>>>> +	buffer_addr = (char *)md + sizeof(*md) + md->priv_size;
>>>> +	return buffer_addr;
>>>> +}
>>>>
>>>>  /**
>>>>   * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
>>>> @@ -688,6 +721,7 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
>>>>
>>>>  /**
>>>>   * Attach packet mbuf to another packet mbuf.
>>>> + *
>>>>   * After attachment we refer the mbuf we attached as 'indirect',
>>>>   * while mbuf we attached to as 'direct'.
>>>>   * Right now, not supported:
>>>> @@ -701,7 +735,6 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
>>>>   * @param md
>>>>   *   The direct packet mbuf.
>>>>   */
>>>> -
>>>>  static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>>>>  {
>>>>  	RTE_MBUF_ASSERT(RTE_MBUF_DIRECT(md) &&
>>>> @@ -712,6 +745,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>>>>  	mi->buf_physaddr = md->buf_physaddr;
>>>>  	mi->buf_addr = md->buf_addr;
>>>>  	mi->buf_len = md->buf_len;
>>>> +	mi->priv_size = md->priv_size;
>>>>
>>>>  	mi->next = md->next;
>>>>  	mi->data_off = md->data_off;
>>>> @@ -732,7 +766,8 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>>>>  }
>>>>
>>>>  /**
>>>> - * Detach an indirect packet mbuf -
>>>> + * Detach an indirect packet mbuf.
>>>> + *
>>>>   * - restore original mbuf address and length values.
>>>>   * - reset pktmbuf data and data_len to their default values.
>>>>   * All other fields of the given packet mbuf will be left intact.
>>>> @@ -740,22 +775,28 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>>>>   * @param m
>>>>   *   The indirect attached packet mbuf.
>>>>   */
>>>> -
>>>>  static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
>>>>  {
>>>> -	const struct rte_mempool *mp = m->pool;
>>>> -	void *buf = RTE_MBUF_TO_BADDR(m);
>>>> -	uint32_t buf_len = mp->elt_size - sizeof(*m);
>>>> -	m->buf_physaddr = rte_mempool_virt2phy(mp, m) + sizeof (*m);
>>>> -
>>>> +	struct rte_pktmbuf_pool_private *mbp_priv;
>>>> +	struct rte_mempool *mp = m->pool;
>>>> +	void *buf;
>>>> +	unsigned mhdr_size;
>>>> +
>>>> +	/* first, restore the priv_size, this is needed before calling
>>>> +	 * rte_mbuf_to_baddr() */
>>>> +	mbp_priv = rte_mempool_get_priv(mp);
>>>> +	m->priv_size = mp->elt_size - RTE_PKTMBUF_HEADROOM -
>>>> +		mbp_priv->mbuf_data_room_size -
>>>> +		sizeof(struct rte_mbuf);
>>>
>>> I think it is better to put this priv_size calculation above into a separate function -
>>> rte_mbuf_get_priv_size(m) or something.
>>> We need it in a few places, and users would probably need it anyway.
>>
>> yep, good idea
>>
>>>
>>>> +
>>>> +	buf = rte_mbuf_to_baddr(m);
>>>> +	mhdr_size = (char *)buf - (char *)m;
>>>
>>> Why do you need to recalculate mhdr_size here?
>>> As I understand, it is m->priv_size, and you just retrieved it, 2 lines above.
>>>
>>
>> It's not m->priv_size but (sizeof(struct rte_mbuf) + m->priv_size).
>
> Ah yes, sorry for confusion.
>
>> In both cases, it requires an operation, but maybe
>> mhdr_size = (sizeof(struct rte_mbuf) + m->priv_size)
>> is clearer than
>> mhdr_size = (char *)buf - (char *)m
>>
>>
>>>> +	m->buf_physaddr = rte_mempool_virt2phy(mp, m) + mhdr_size;
>>>
>>> Actually I think it could just be:
>>> m->buf_physaddr = rte_mempool_virt2phy(mp, buf);
>>
>> Even if it would work, the API of rte_mempool_virt2phy()
>> says that the second argument should be "A pointer (virtual address)
>> to the element of the pool."
>> I think we should keep the initial code.
>
> Ok.
> Konstantin
>
>>
>> Regards,
>> Olivier
>>
>
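
As a possible shape for the separate priv_size helper suggested in the review of rte_pktmbuf_detach(), here is a standalone sketch. All names (struct toy_pool, struct toy_mbuf, TOY_HEADROOM, toy_mbuf_get_priv_size(), toy_pktmbuf_detach()) are hypothetical, the formula simply mirrors the quoted v3 hunk (including its assumption that the data room size is counted separately from the headroom), and the physical address handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HEADROOM 128 /* hypothetical stand-in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical stand-in for the mempool plus its pktmbuf private data. */
struct toy_pool {
	size_t elt_size;         /* size of one pool element */
	uint16_t data_room_size; /* data room, headroom excluded (assumption) */
};

/* Hypothetical stand-in for struct rte_mbuf. */
struct toy_mbuf {
	struct toy_pool *pool;
	void *buf_addr;
	uint16_t buf_len;
	uint16_t priv_size;
};

/* Private area size derived from the pool geometry, as in the v3 hunk. */
static uint16_t
toy_mbuf_get_priv_size(const struct toy_pool *mp)
{
	size_t fixed = sizeof(struct toy_mbuf) + TOY_HEADROOM +
		mp->data_room_size;

	assert(mp->elt_size >= fixed);
	return (uint16_t)(mp->elt_size - fixed);
}

/* Detach: restore priv_size first, then the buffer address and length. */
static void
toy_pktmbuf_detach(struct toy_mbuf *m)
{
	size_t mhdr_size;

	m->priv_size = toy_mbuf_get_priv_size(m->pool);
	mhdr_size = sizeof(*m) + m->priv_size; /* header + private area */
	m->buf_addr = (char *)m + mhdr_size;
	m->buf_len = (uint16_t)(m->pool->elt_size - mhdr_size);
}
```

Computing mhdr_size directly from sizeof and priv_size is the clearer of the two forms discussed in the thread; the same helper could also be reused by an init function.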