From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <olivier.matz@6wind.com>
Received: from mail-wi0-f179.google.com (mail-wi0-f179.google.com
 [209.85.212.179]) by dpdk.org (Postfix) with ESMTP id 40F5637A8
 for <dev@dpdk.org>; Tue,  7 Apr 2015 17:45:31 +0200 (CEST)
Received: by wiaa2 with SMTP id a2so24157241wia.0
 for <dev@dpdk.org>; Tue, 07 Apr 2015 08:45:31 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to
 :cc:subject:references:in-reply-to:content-type
 :content-transfer-encoding;
 bh=Qbty4la2vqMNn2vq8fUUuTqlTyB/MP/Eoy/Vl5U7YTw=;
 b=aMkRHPYws7jWdPeXBKTvmMEfvmQ4kNIMMZ3fWsTn7x4TDOaqFzOQWKHfn3TQcJc4a/
 IMDohRVbk5vTgJrJ9EHeejkTv1AJ0bhRDUEzdVCptQuxRKGRfyL8+b3njnkyM70evWVy
 7wCz8+HEGNLN9Np70Tu6IqXUWPXSXrsqDrlHlanVEQ5/TjZxD6+YBjD5TmVtd3n32Uk8
 dImWoOgxFZYuBVow2YVClvx/O3GiuZdThs6BtC4aH4GRfTGUbMKejpuhbzd2m9zyPmff
 A920sABefAs9SOhluYJ6v2KGIsi8N3bZvcpX+KVizX8m7Wv2hrSgc6lH08P1kSKgBqHX
 hfdw==
X-Gm-Message-State: ALoCoQm+VX1R1OQljWhpGMZ/NsogjvrxJwiJ/0SbSDTKq5Kp3WrLABLXBtTac019gfhiAwjRGXAT
X-Received: by 10.194.171.1 with SMTP id aq1mr11670881wjc.38.1428421531088;
 Tue, 07 Apr 2015 08:45:31 -0700 (PDT)
Received: from [10.16.0.195] (6wind.net2.nerim.net. [213.41.180.237])
 by mx.google.com with ESMTPSA id mc20sm7016799wic.15.2015.04.07.08.45.29
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 07 Apr 2015 08:45:30 -0700 (PDT)
Message-ID: <5523FB9B.2060508@6wind.com>
Date: Tue, 07 Apr 2015 17:45:31 +0200
From: Olivier MATZ <olivier.matz@6wind.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Icedove/31.3.0
MIME-Version: 1.0
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, 
 "dev@dpdk.org" <dev@dpdk.org>
References: <1427385595-15011-1-git-send-email-olivier.matz@6wind.com>
 <1427829784-12323-1-git-send-email-zer0@droids-corp.org>
 <1427829784-12323-2-git-send-email-zer0@droids-corp.org>
 <2601191342CEEE43887BDE71AB97725821413A2D@irsmsx105.ger.corp.intel.com>
 <5522FF6B.1030503@6wind.com>
 <2601191342CEEE43887BDE71AB97725821414310@irsmsx105.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725821414310@irsmsx105.ger.corp.intel.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH v3 1/5] mbuf: fix clone support when
 application uses private mbuf data
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Apr 2015 15:45:31 -0000

Hi Konstantin,

On 04/07/2015 02:40 PM, Ananyev, Konstantin wrote:
> Hi Olivier,
>
>> -----Original Message-----
>> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
>> Sent: Monday, April 06, 2015 10:50 PM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Cc: zoltan.kiss@linaro.org; Richardson, Bruce
>> Subject: Re: [PATCH v3 1/5] mbuf: fix clone support when application uses private mbuf data
>>
>> Hi Konstantin,
>>
>> Thanks for your comments.
>>
>> On 04/02/2015 07:21 PM, Ananyev, Konstantin wrote:
>>> Hi Olivier,
>>>
>>>> -----Original Message-----
>>>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>>>> Sent: Tuesday, March 31, 2015 8:23 PM
>>>> To: dev@dpdk.org
>>>> Cc: Ananyev, Konstantin; zoltan.kiss@linaro.org; Richardson, Bruce; Olivier Matz
>>>> Subject: [PATCH v3 1/5] mbuf: fix clone support when application uses private mbuf data
>>>>
>>>> From: Olivier Matz <olivier.matz@6wind.com>
>>>>
>>>> Add a new private_size field in mbuf structure that should
>>>> be initialized at mbuf pool creation. This field contains the
>>>> size of the application private data in mbufs.
>>>>
>>>> Introduce new static inline functions rte_mbuf_from_indirect()
>>>> and rte_mbuf_to_baddr() to replace the existing macros, which
>>>> take the private size in account when attaching and detaching
>>>> mbufs.
>>>>
>>>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>>>> ---
>>>>   app/test-pmd/testpmd.c     |  1 +
>>>>   examples/vhost/main.c      |  4 +--
>>>>   lib/librte_mbuf/rte_mbuf.c |  1 +
>>>>   lib/librte_mbuf/rte_mbuf.h | 77 +++++++++++++++++++++++++++++++++++-----------
>>>>   4 files changed, 63 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
>>>> index 3057791..c5a195a 100644
>>>> --- a/app/test-pmd/testpmd.c
>>>> +++ b/app/test-pmd/testpmd.c
>>>> @@ -425,6 +425,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
>>>>   	mb->tx_offload   = 0;
>>>>   	mb->vlan_tci     = 0;
>>>>   	mb->hash.rss     = 0;
>>>> +	mb->priv_size    = 0;
>>>>   }
>>>>
>>>>   static void
>>>> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
>>>> index c3fcb80..e44e82f 100644
>>>> --- a/examples/vhost/main.c
>>>> +++ b/examples/vhost/main.c
>>>> @@ -139,7 +139,7 @@
>>>>   /* Number of descriptors per cacheline. */
>>>>   #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
>>>>
>>>> -#define MBUF_EXT_MEM(mb)   (RTE_MBUF_FROM_BADDR((mb)->buf_addr) != (mb))
>>>> +#define MBUF_EXT_MEM(mb)   (rte_mbuf_from_indirect(mb) != (mb))
>>>>
>>>>   /* mask of enabled ports */
>>>>   static uint32_t enabled_port_mask = 0;
>>>> @@ -1550,7 +1550,7 @@ attach_rxmbuf_zcp(struct virtio_net *dev)
>>>>   static inline void pktmbuf_detach_zcp(struct rte_mbuf *m)
>>>>   {
>>>>   	const struct rte_mempool *mp = m->pool;
>>>> -	void *buf = RTE_MBUF_TO_BADDR(m);
>>>> +	void *buf = rte_mbuf_to_baddr(m);
>>>>   	uint32_t buf_ofs;
>>>>   	uint32_t buf_len = mp->elt_size - sizeof(*m);
>>>>   	m->buf_physaddr = rte_mempool_virt2phy(mp, m) + sizeof(*m);
>>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>>>> index 526b18d..e095999 100644
>>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>>> @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
>>>>   	m->pool = mp;
>>>>   	m->nb_segs = 1;
>>>>   	m->port = 0xff;
>>>> +	m->priv_size = 0;
>>>
>>> Why it is 0?
>>> Shouldn't it be the same calculations as in detach() below:
>>> m->priv_size = /*get private size from mempool private*/;
>>> m->buf_addr = (char *)m + sizeof(struct rte_mbuf) + m->priv_size;
>>> m->buf_len = mp->elt_size - sizeof(struct rte_mbuf) - m->priv_size;
>>> ?
>>
>> It's 0 because we also have in the function (not visible in the
>> patch):
>>
>>    m->buf_addr = (char *)m + sizeof(struct rte_mbuf);
>
> Yep, that's why as I wrote above, I think we need to setup here all 3 fields:
> priv_size, buf_addr, buf_len exactly in the same way as in detach().
>
>>
>> It means that an application that wants to use a private area has
>> to provide another init function derived from this default function.
>
> After your changes, attach/free and other functions from public mbuf API
> rely on priv_size being set properly.
> So I suppose 'official' pktmbuf_init() should also set it in a proper manner.
>
>> This was already the case before the patch series.
>
> Before this patch series, we didn't have priv_size, so we had nothing to set up.
>
>>
>> As we discussed in previous mail, I plan to propose a rework of
>> mbuf pool initialization in another series, and my initial idea was to
>> change this at the same time. But on the other hand it does not hurt
>> to do this change now. I'll include it in next version.
>
> Ok.

Just to be sure we're on the same page:

- before the patch series

   - private area was working if clones were not used. To use a
     private area, the user had to provide another function derived
     from pktmbuf_init() to change m->buf_addr and m->buf_len.
   - using both private area + clones was broken

- after the patch series

   - private area is working with or without clones. But to use it,
     the user still has to provide another function to change
     m->buf_addr, m->buf_len *and m->priv_size*.
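
To illustrate why "clones + priv" was broken before, here is a minimal
sketch. It uses a simplified stand-in struct: fake_mbuf and all fake_*
names are made up for illustration and do not match the real rte_mbuf
layout. It only mirrors the pointer arithmetic of the new
rte_mbuf_to_baddr() / rte_mbuf_from_indirect() helpers and of the old
RTE_MBUF_FROM_BADDR() macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct rte_mbuf (hypothetical, illustration only). */
struct fake_mbuf {
	void *buf_addr;     /* start of the data buffer */
	uint16_t buf_len;   /* length of the data buffer */
	uint16_t priv_size; /* size of the application private area */
};

/* Same arithmetic as rte_mbuf_to_baddr() in the patch. */
static inline void *
fake_mbuf_to_baddr(struct fake_mbuf *md)
{
	return (char *)md + sizeof(*md) + md->priv_size;
}

/* Same arithmetic as rte_mbuf_from_indirect() in the patch. */
static inline struct fake_mbuf *
fake_mbuf_from_indirect(struct fake_mbuf *mi)
{
	return (struct fake_mbuf *)((char *)mi->buf_addr - sizeof(*mi) -
		mi->priv_size);
}

/* Old macro behaviour: assumes the buffer directly follows the header. */
static inline struct fake_mbuf *
fake_mbuf_from_baddr_old(void *ba)
{
	return ((struct fake_mbuf *)ba) - 1;
}

/*
 * Return 1 if the new helper finds the direct mbuf while the old macro
 * does not, 0 otherwise, -1 on allocation failure.
 */
static int
fake_clone_roundtrip(uint16_t priv_size, uint16_t data_size)
{
	struct fake_mbuf *md, mi;
	int ok;

	md = calloc(1, sizeof(*md) + priv_size + data_size);
	if (md == NULL)
		return -1;

	/* what a derived init function has to set on the direct mbuf */
	md->priv_size = priv_size;
	md->buf_addr = fake_mbuf_to_baddr(md);
	md->buf_len = data_size;

	/* what attach copies into the indirect mbuf */
	mi.buf_addr = md->buf_addr;
	mi.buf_len = md->buf_len;
	mi.priv_size = md->priv_size;

	ok = (fake_mbuf_from_indirect(&mi) == md) &&
		(fake_mbuf_from_baddr_old(mi.buf_addr) != md);
	free(md);
	return ok;
}
```

With a non-zero private size the old arithmetic lands inside the
private area instead of on the direct mbuf. With priv_size == 0 both
give the same pointer, which is why the problem only showed up when
clones and a private area were combined.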

The series just fixes the fact that "clones + priv" was not working.
It does not address the problem that providing a new pktmbuf_init()
function is required to use a private area. To fix this, I think it
would require an API evolution that should be part of another series.
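
For reference, a derived init function would look roughly like the
sketch below. This is only an illustration on a simplified stand-in
struct: fake_mbuf, fake_pktmbuf_init_priv() and fake_init_check() are
hypothetical names, and the element size is passed as a plain
argument instead of being read from the mempool. The three
assignments follow the calculations suggested earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct rte_mbuf (hypothetical, illustration only). */
struct fake_mbuf {
	void *buf_addr;     /* start of the data buffer */
	uint16_t buf_len;   /* length of the data buffer */
	uint16_t priv_size; /* size of the application private area */
};

/*
 * Derived init: the element is laid out as
 *   [ header | private area (priv_size bytes) | data buffer ]
 * so all three fields have to be set consistently.
 */
static void
fake_pktmbuf_init_priv(struct fake_mbuf *m, size_t elt_size,
	uint16_t priv_size)
{
	assert(elt_size >= sizeof(*m) + priv_size);

	m->priv_size = priv_size;
	m->buf_addr = (char *)m + sizeof(*m) + priv_size;
	m->buf_len = (uint16_t)(elt_size - sizeof(*m) - priv_size);
}

/* The application private area sits right after the header. */
static inline void *
fake_mbuf_priv(struct fake_mbuf *m)
{
	return (char *)m + sizeof(*m);
}

/*
 * Allocate one element, initialize it and return 1 if the resulting
 * layout is consistent, 0 if not, -1 on allocation failure.
 */
static int
fake_init_check(uint16_t priv_size, uint16_t data_size)
{
	size_t elt_size = sizeof(struct fake_mbuf) + priv_size + data_size;
	struct fake_mbuf *m = calloc(1, elt_size);
	int ok;

	if (m == NULL)
		return -1;

	fake_pktmbuf_init_priv(m, elt_size, priv_size);
	ok = (m->priv_size == priv_size) &&
		(m->buf_len == data_size) &&
		((char *)m->buf_addr == (char *)fake_mbuf_priv(m) + priv_size);
	free(m);
	return ok;
}
```

In a real application the private size would have to come from the
pool configuration, which is the part that needs the API change.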

I'll send a v4 addressing the comments soon, thanks.

Regards,
Olivier


>
>>
>>
>>> BTW, don't see changes in rte_pktmbuf_pool_init() to setup
>>> mbp_priv->mbuf_data_room_size properly.
>>> Without that changes, how can people start using that feature?
>>> It seems that the only way now - setup priv_size and buf_len for each mbuf manually.
>>
>> It's the same reason as above. To use a private area, the user has
>> to provide its own function that sets up data_room_size, derived from
>> this pool_init default function. This was also the case before the
>> patch series.
>>
>>
>>>
>>>>   }
>>>>
>>>>   /* do some sanity checks on a mbuf: panic if it fails */
>>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>>>> index 17ba791..932fe58 100644
>>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>>> @@ -317,18 +317,51 @@ struct rte_mbuf {
>>>>   			/* uint64_t unused:8; */
>>>>   		};
>>>>   	};
>>>> +
>>>> +	/** Size of the application private data. In case of an indirect
>>>> +	 * mbuf, it stores the direct mbuf private data size. */
>>>> +	uint16_t priv_size;
>>>>   } __rte_cache_aligned;
>>>>
>>>>   /**
>>>> - * Given the buf_addr returns the pointer to corresponding mbuf.
>>>> + * Return the mbuf owning the data buffer address of an indirect mbuf.
>>>> + *
>>>> + * @param mi
>>>> + *   The pointer to the indirect mbuf.
>>>> + * @return
>>>> + *   The address of the direct mbuf corresponding to buffer_addr.
>>>>    */
>>>> -#define RTE_MBUF_FROM_BADDR(ba)     (((struct rte_mbuf *)(ba)) - 1)
>>>> +static inline struct rte_mbuf *
>>>> +rte_mbuf_from_indirect(struct rte_mbuf *mi)
>>>> +{
>>>> +       struct rte_mbuf *md;
>>>> +
>>>> +       /* mi->buf_addr and mi->priv_size correspond to buffer and
>>>> +	* private size of the direct mbuf */
>>>> +       md = (struct rte_mbuf *)((char *)mi->buf_addr - sizeof(*mi) -
>>>> +	       mi->priv_size);
>>>
>>> (uintptr_t)mi->buf_addr?
>>
>> Any clue why (uintptr_t) would be better than (char *) ?
>
> No big difference really, just looks a bit better to me :)
>
>> By the way, I added this cast because it would not compile with
>> g++ (and probably with icc too).
>>
>>>
>>>> +       return md;
>>>> +}
>>>>
>>>>   /**
>>>> - * Given the pointer to mbuf returns an address where it's  buf_addr
>>>> - * should point to.
>>>> + * Return the buffer address embedded in the given mbuf.
>>>> + *
>>>> + * The user must ensure that m->priv_size corresponds to the
>>>> + * private size of this mbuf, which is not the case for indirect
>>>> + * mbufs.
>>>> + *
>>>> + * @param md
>>>> + *   The pointer to the mbuf.
>>>> + * @return
>>>> + *   The address of the data buffer owned by the mbuf.
>>>>    */
>>>> -#define RTE_MBUF_TO_BADDR(mb)       (((struct rte_mbuf *)(mb)) + 1)
>>>> +static inline char *
>>>
>>> Might be better to return 'void *' here.
>>
>> Ok, as m->buf_addr is a (void *).
>>
>>>
>>>> +rte_mbuf_to_baddr(struct rte_mbuf *md)
>>>> +{
>>>> +       char *buffer_addr;
>>>
>>> uintptr_t buffer_addr?
>>
>> Same question as above, I don't really see why it's better than
>> (char *).
>>
>>>
>>>> +       buffer_addr = (char *)md + sizeof(*md) + md->priv_size;
>>>> +       return buffer_addr;
>>>> +}
>>>>
>>>>   /**
>>>>    * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
>>>> @@ -688,6 +721,7 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
>>>>
>>>>   /**
>>>>    * Attach packet mbuf to another packet mbuf.
>>>> + *
>>>>    * After attachment we refer the mbuf we attached as 'indirect',
>>>>    * while mbuf we attached to as 'direct'.
>>>>    * Right now, not supported:
>>>> @@ -701,7 +735,6 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
>>>>    * @param md
>>>>    *   The direct packet mbuf.
>>>>    */
>>>> -
>>>>   static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>>>>   {
>>>>   	RTE_MBUF_ASSERT(RTE_MBUF_DIRECT(md) &&
>>>> @@ -712,6 +745,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>>>>   	mi->buf_physaddr = md->buf_physaddr;
>>>>   	mi->buf_addr = md->buf_addr;
>>>>   	mi->buf_len = md->buf_len;
>>>> +	mi->priv_size = md->priv_size;
>>>>
>>>>   	mi->next = md->next;
>>>>   	mi->data_off = md->data_off;
>>>> @@ -732,7 +766,8 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>>>>   }
>>>>
>>>>   /**
>>>> - * Detach an indirect packet mbuf -
>>>> + * Detach an indirect packet mbuf.
>>>> + *
>>>>    *  - restore original mbuf address and length values.
>>>>    *  - reset pktmbuf data and data_len to their default values.
>>>>    *  All other fields of the given packet mbuf will be left intact.
>>>> @@ -740,22 +775,28 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>>>>    * @param m
>>>>    *   The indirect attached packet mbuf.
>>>>    */
>>>> -
>>>>   static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
>>>>   {
>>>> -	const struct rte_mempool *mp = m->pool;
>>>> -	void *buf = RTE_MBUF_TO_BADDR(m);
>>>> -	uint32_t buf_len = mp->elt_size - sizeof(*m);
>>>> -	m->buf_physaddr = rte_mempool_virt2phy(mp, m) + sizeof (*m);
>>>> -
>>>> +	struct rte_pktmbuf_pool_private *mbp_priv;
>>>> +	struct rte_mempool *mp = m->pool;
>>>> +	void *buf;
>>>> +	unsigned mhdr_size;
>>>> +
>>>> +	/* first, restore the priv_size, this is needed before calling
>>>> +	 * rte_mbuf_to_baddr() */
>>>> +	mbp_priv = rte_mempool_get_priv(mp);
>>>> +	m->priv_size = mp->elt_size - RTE_PKTMBUF_HEADROOM -
>>>> +		mbp_priv->mbuf_data_room_size -
>>>> +		sizeof(struct rte_mbuf);
>>>
>>> I think it is better to put this priv_size calculation above into the separate function -
>>> rte_mbuf_get_priv_size(m) or something.
>>> We need it in few places, and users would probably need it anyway.
>>
>> yep, good idea
>>
>>>
>>>> +
>>>> +	buf = rte_mbuf_to_baddr(m);
>>>> +	mhdr_size = (char *)buf - (char *)m;
>>>
>>> Why do you need to recalculate mhdr_size here?
>>> As I understand it is a m->priv_size, and you just retrieved it, 2 lines above.
>>>
>>
>> It's not m->priv_size but (sizeof(rte_mbuf) + m->priv_size).
>
> Ah yes, sorry for confusion.
>
>> In both cases, it requires an operation, but maybe
>>    mhdr_size = (sizeof(rte_mbuf) + m->priv_size)
>> is clearer than
>>    mhdr_size = (char *)buf - (char *)m
>>
>>
>>>> +	m->buf_physaddr = rte_mempool_virt2phy(mp, m) + mhdr_size;
>>>
>>> Actually I think it could just be:
>>> m->buf_physaddr = rte_mempool_virt2phy(mp, buf);
>>
>> Even if it would work, the API of rte_mempool_virt2phy()
>> says that the second argument should be "A pointer (virtual address)
>> to the element of the pool."
>> I think we should keep the initial code.
>
> Ok.
> Konstantin
>
>>
>> Regards,
>> Olivier
>>
>