From: Olivier Matz <olivier.matz@6wind.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>,
Adrien Mazarguil <adrien.mazarguil@6wind.com>
Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements
Date: Thu, 9 Jun 2016 09:46:50 +0200 [thread overview]
Message-ID: <57591EEA.9040807@6wind.com> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB97725836B6D17A@irsmsx105.ger.corp.intel.com>
Hi Konstantin,
>>>>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
>>>>>> Wright now, it is a user responsibility to make sure refcnt==0 before pushing
>>>>>> mbuf back to the pool.
>>>>>> Not sure why do you consider that wrong?
>>>>>
>>>>> I do not consider this wrong and I'm all for using assert() to catch
>>>>> programming errors, however in this specific case, I think they are
>>>>> inconsistent and misleading.
>>>>
>>>> Honestly, I don't understand why.
>>>> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero.
>>
>> What is the purpose of this? Is there some code relying on this?
>
> The whole current implementation of mbuf_free code path relies on that.
> Straight here:
> if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
> m->next = NULL;
> __rte_mbuf_raw_free(m);
> }
>
> If we'll exclude indirect mbuf logic, all it does:
> if (rte_mbuf_refcnt_update(m, -1) == 0) {
> m->next = NULL;
> __rte_mbuf_raw_free(m);
> }
>
> I.E.:
> decrement mbuf->refcnt.
> If new value of refcnt is zero, then put it back into the pool.
>
> So having ASERT(mbuf->refcnt==0) inside
> __rte_mbuf_raw_free()/__rte_mbuf_raw_alloc()
> looks absolutely valid to me.
> I *has* to be zero at that point with current implementation,
> And if it is not then we probably have (or will have) a silent memory corruption.
This explains how the refcount is used, and why it is set
to zero before returning the mbuf to the pool with the mbuf
free functions.
It does not explain which code relies on the refcnt beeing 0
while the mbuf is in the pool.
>> But since [1], it is allowed to call rte_mbuf_raw_alloc() in PMDs (all
>> PMDs were calling an internal function before).
>
> Yes, raw_alloc is public, NP with that.
>
>> We could argue that
>> rte_mbuf_raw_free() should also be made public for PMDs.
>
> We could, but right now it is not.
> Again, as I said, user could use it on his own but he obviously has to
> obey the rules and do manually what __rte_pktmbuf_prefree_seg() does.
> At least:
>
> rte_mbuf_refcnt_set(m, 0);
> __rte_mbuf_raw_free(m);
>
>>
>> As you said below, no-one promised that the free() reverts the malloc(),
>> but given the function names, one can legitimately expect that the
>> following code is valid:
>>
>> m = __rte_mbuf_raw_alloc();
>> /* do nothing here */
>> __rte_mbuf_raw_free(m);
>
> Surely people could (and would) expect various things...
> But the reality right now is: __rte_mbuf_raw_free() is an internal
> function and not counterpart of __rte_mbuf_raw_alloc().
> If the people don't bother to read API docs or/and the actual code,
> I can't see how we can help them :)
Yes, of course, people should read the doc.
This does not prevent to have a nice API that behaves in a
natural way :)
By the way, the fact that today the mbuf refcnt should be 0 while
in a pool is not in the api doc, but in the code.
>> If no code relies on having the refcnt set to 0 when a mbuf is in
>> the pool, I suggest to relax this constraint as Adrien proposed.
>
> Why not just rename it to __rte_mbuf_raw_free_dont_use_after_raw_alloc()?
> To avoid any further confusion :)
> Seriously speaking I would prefer to leave it as it is.
> If you feel we have to introduce a counterpart of rte_mbuf_raw_alloc(),
> we can make a new public one:
>
> rte_mbuf_raw_free(stuct rte_mbuf *m)
> {
> if (rte_mbuf_refcnt_update(m, -1) == 0)
> __rte_mbuf_raw_free(m);
> }
This is an option, but I think it's not efficient to touch
the mbuf structure when allocating/freeing. See below.
>> Then, my opinion is that the refcount should be set to 1 in
>> rte_pktmbuf_reset().
>
> I don't think we need to update rte_pktmbuf_reset(),
> it doesn't touch refcnt at all and probably better to keep it that way.
Why would it be better?
All mbuf struct initializations are done in that function, why would
it be different for the refcnt?
> To achieve what you suggesting, we probably need to:
> 1) update _rte_pktmbuf_prefree_seg and rte_pktmbuf_detach() to
> set refcnt back to 1, i.e:
>
> static inline struct rte_mbuf* __attribute__((always_inline))
> __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> {
> __rte_mbuf_sanity_check(m, 0);
>
> if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
> /* if this is an indirect mbuf, it is detached. */
> if (RTE_MBUF_INDIRECT(m))
> rte_pktmbuf_detach(m);
> + rte_mbuf_refcnt_set(m, 1);
> return m;
> }
> return NULL;
> }
>
> 2) either:
> a) update mbuf constructor function, so it sets refcnt=1.
> I suppose that is easy for rte_pktmbuf_init(), but it means that all custom
> constructors should do the same.
> Which means possible changes in existing user code and all ABI change related hassle.
> b) keep rte_mbuf_raw_alloc() setting mbuf->refcnt=1.
> But then I don't see how will get any performance gain here.
>
> So not sure is it really worth it.
>
>> And rte_pktmbuf_free() should not be allowed on
>> an uninitialized mbuf (yes, this would require some changes in PMDs).
>
> Not sure I understand you here...
> free() wouldn not be allowed on mbuf whose recnf==0?
> But it is not allowed right now anyway.
Today:
/* allowed */
m = rte_pktmbuf_alloc();
rte_pktmbuf_free(m);
/* not allowed */
m = rte_mbuf_raw_alloc();
__rte_mbuf_raw_free(m);
/* we should do instead (strange): */
m = rte_mbuf_raw_alloc();
rte_pktmbuf_free(m);
What I suggest to have:
/* allowed, no change */
m = rte_pktmbuf_alloc();
rte_pktmbuf_free(m);
/* allowed, these functions would be symetrical */
m = rte_mbuf_raw_alloc();
rte_mbuf_raw_free(m);
/* not allowed, m->refcnt is uninitialized */
m = rte_mbuf_raw_alloc();
rte_pktmbuf_free(m);
>
>> This would open the door for bulk allocation/free in the mbuf api.
>
> Hmm and what stops us having one right now?
> As I know we do have rte_pktmbuf_alloc_bulk(), I don't see
> why we can't have rte_pktmbuf_free_bulk() right now.
I think, not touching the refcnt on raw allocation allows a PMD to
do the following:
rte_mbuf_allow_raw_bulk(pool, &mbufs, n)
for (i = 0; i < n; i++) {
prefetch(m[i+1]);
rte_pktmbuf_reset(); /* including refcnt = 1 */
do_stuff(m);
}
/* if all mbufs are not used, free the remaining */
rte_mbuf_free_raw_bulk(&mbufs[i], n-i);
In that case, we can prefetch mbufs before using them, and
we can free the unused mbufs without touching the structure.
It looks to be a good advantage.
Yes, we can do that with mempool functions. But I feel having a mbuf
API with type checking is better.
>> This could be done in several steps:
>>
>> 1/ remove these assert(), add introduce a public rte_mbuf_raw_free()
>> 2/ announce that rte_pktmbuf_free() won't work on uninitialized mbufs
>> in a future version, and ensure that all PMD are inline with this
>> requirement
>> 3/ later, move refcount initialization in rte_pktmbuf_reset()
>>
>> What do you think?
>
> I still think that assert() is on a right place :)
> *If* we'll change mbuf free code in a way that mbufs inside the pool
> should have refcnt==1, then I think we'll change it to:
> RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
> But as I stated above, that change might cause some backward compatibility hassle: 2.a)
> Or might not give us any performance gain: 2.b).
I suggest instead to have no constraint on the value of the refcnt
in the pool.
>> Another option is to remove the rte_mbuf_raw_alloc()/rte_mbuf_raw_free()
>> and use mempool calls.
>
> Don't see why we have to remove them...
> Basically we have a bug in PMD, but instead of fixing it,
> you guys suggest to change mbuf code.
> Sounds a bit strange to me.
> Why not just make for these PMDs to call rte_mempool_put() directly, if you are sure it is safe here?
Yes, there are some bugs in the PMDs regarding the refcnt value when
compiled with MBUF_DEBUG=y. By the way, looking at the i40e code, we
have:
i40e_alloc_rx_queue_mbufs()
{
for (...) {
struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
...
rte_mbuf_refcnt_set(mbuf, 1); /* not needed */
...
}
i40e_tx_free_bufs()
{
...
if (txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) {
for (...) {
/* returning a mbuf with refcnt=1 */
rte_mempool_put(txep->mbuf->pool, txep->mbuf);
...
The fact that we can find many bugs related to that in PMDs is a
sign that the API is not understandable enough.
The point is not to fix the bugs in PMDs. I think the point is
to enhance the mbuf API.
Hope I have convinced you ;)
>
>> But having a mbuf wrapper does not seem a bad
>> thing to me.
>
> We can add some extra wrapper then, something like:
> #define __RTE_MBUF_PUT_FREED(m) (rte_mempool_put((m)->pool, m))
> ?
I think a wrapper should do the type checking (struct mbuf).
Thanks for this exchange.
Regards,
Olivier
next prev parent reply other threads:[~2016-06-09 7:46 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-08 8:31 Adrien Mazarguil
2016-06-08 10:34 ` Ananyev, Konstantin
2016-06-08 12:27 ` Adrien Mazarguil
2016-06-08 13:09 ` Ananyev, Konstantin
2016-06-08 13:57 ` Adrien Mazarguil
2016-06-08 14:11 ` Olivier Matz
2016-06-08 16:07 ` Ananyev, Konstantin
2016-06-09 7:46 ` Olivier Matz [this message]
2016-06-09 13:21 ` Ananyev, Konstantin
2016-06-09 14:19 ` Bruce Richardson
2016-06-09 15:27 ` Thomas Monjalon
2016-06-09 15:45 ` Ananyev, Konstantin
2016-06-09 18:42 ` Don Provan
2016-06-20 13:49 ` Adrien Mazarguil
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57591EEA.9040807@6wind.com \
--to=olivier.matz@6wind.com \
--cc=adrien.mazarguil@6wind.com \
--cc=dev@dpdk.org \
--cc=konstantin.ananyev@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).