From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
To: "Morten Brørup" <mb@smartsharesystems.com>,
"Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
"Olivier Matz" <olivier.matz@6wind.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
Date: Sun, 8 Nov 2020 17:16:20 +0300 [thread overview]
Message-ID: <c2b4adb2-0a0a-7aa2-d89b-fc51b438b446@oktetlabs.ru> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C61400@smartserver.smartshare.dk>
On 11/6/20 3:23 PM, Morten Brørup wrote:
>> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
>> Sent: Friday, November 6, 2020 12:54 PM
>>
>>>>>>>>>>>>>>>>>> Hi Olivier,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> m->nb_seg must be reset on mbuf free
>>>> whatever
>>>>>> the
>>>>>>>> value
>>>>>>>>>> of m->next,
>>>>>>>>>>>>>>>>>>> because it can happen that m->nb_seg is
>> !=
>>>> 1.
>>>>>> For
>>>>>>>>>> instance in this
>>>>>>>>>>>>>>>>>>> case:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> m1 = rte_pktmbuf_alloc(mp);
>>>>>>>>>>>>>>>>>>> rte_pktmbuf_append(m1, 500);
>>>>>>>>>>>>>>>>>>> m2 = rte_pktmbuf_alloc(mp);
>>>>>>>>>>>>>>>>>>> rte_pktmbuf_append(m2, 500);
>>>>>>>>>>>>>>>>>>> rte_pktmbuf_chain(m1, m2);
>>>>>>>>>>>>>>>>>>> m0 = rte_pktmbuf_alloc(mp);
>>>>>>>>>>>>>>>>>>> rte_pktmbuf_append(m0, 500);
>>>>>>>>>>>>>>>>>>> rte_pktmbuf_chain(m0, m1);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As rte_pktmbuf_chain() does not reset
>>>> nb_seg in
>>>>>> the
>>>>>>>>>> initial m1
>>>>>>>>>>>>>>>>>>> segment (this is not required), after
>> this
>>>> code
>>>>>> the
>>>>>>>>>> mbuf chain
>>>>>>>>>>>>>>>>>>> have 3 segments:
>>>>>>>>>>>>>>>>>>> - m0: next=m1, nb_seg=3
>>>>>>>>>>>>>>>>>>> - m1: next=m2, nb_seg=2
>>>>>>>>>>>>>>>>>>> - m2: next=NULL, nb_seg=1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Freeing this mbuf chain will not
>> restore
>>>>>> nb_seg=1
>>>>>>>> in
>>>>>>>>>> the second
>>>>>>>>>>>>>>>>>>> segment.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hmm, not sure why is that?
>>>>>>>>>>>>>>>>>> You are talking about freeing m1, right?
>>>>>>>>>>>>>>>>>> rte_pktmbuf_prefree_seg(struct rte_mbuf
>> *m)
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>> if (m->next != NULL) {
>>>>>>>>>>>>>>>>>> m->next = NULL;
>>>>>>>>>>>>>>>>>> m->nb_segs = 1;
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> m1->next != NULL, so it will enter the
>> if()
>>>>>> block,
>>>>>>>>>>>>>>>>>> and will reset both next and nb_segs.
>>>>>>>>>>>>>>>>>> What I am missing here?
>>>>>>>>>>>>>>>>>> Thinking in more generic way, that
>> change:
>>>>>>>>>>>>>>>>>> - if (m->next != NULL) {
>>>>>>>>>>>>>>>>>> - m->next = NULL;
>>>>>>>>>>>>>>>>>> - m->nb_segs = 1;
>>>>>>>>>>>>>>>>>> - }
>>>>>>>>>>>>>>>>>> + m->next = NULL;
>>>>>>>>>>>>>>>>>> + m->nb_segs = 1;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ah, sorry. I oversimplified the example
>> and
>>>> now
>>>>>> it
>>>>>>>> does
>>>>>>>>>> not
>>>>>>>>>>>>>>>>> show the issue...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The full example also adds a split() to
>> break
>>>> the
>>>>>>>> mbuf
>>>>>>>>>> chain
>>>>>>>>>>>>>>>>> between m1 and m2. The kind of thing that
>>>> would
>>>>>> be
>>>>>>>> done
>>>>>>>>>> for
>>>>>>>>>>>>>>>>> software TCP segmentation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If so, may be the right solution is to care
>>>> about
>>>>>>>> nb_segs
>>>>>>>>>>>>>>>> when next is set to NULL on split? Any
>> place
>>>> when
>>>>>> next
>>>>>>>> is
>>>>>>>>>> set
>>>>>>>>>>>>>>>> to NULL. Just to keep the optimization in a
>>>> more
>>>>>>>> generic
>>>>>>>>>> place.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem with that approach is that there
>> are
>>>>>> already
>>>>>>>>>> several
>>>>>>>>>>>>>>> existing split() or trim() implementations in
>>>>>> different
>>>>>>>> DPDK-
>>>>>>>>>> based
>>>>>>>>>>>>>>> applications. For instance, we have some in
>>>>>> 6WINDGate. If
>>>>>>>> we
>>>>>>>>>> force
>>>>>>>>>>>>>>> applications to set nb_seg to 1 when
>> resetting
>>>> next,
>>>>>> it
>>>>>>>> has
>>>>>>>>>> to be
>>>>>>>>>>>>>>> documented because it is not straightforward.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think it is better to go that way.
>>>>>>>>>>>>>> From my perspective it seems natural to reset
>>>> nb_seg at
>>>>>>>> same
>>>>>>>>>> time
>>>>>>>>>>>>>> we reset next, otherwise inconsistency will
>> occur.
>>>>>>>>>>>>>
>>>>>>>>>>>>> While it is not explicitly stated for nb_segs, to
>> me
>>>> it
>>>>>> was
>>>>>>>> clear
>>>>>>>>>> that
>>>>>>>>>>>>> nb_segs is only valid in the first segment, like
>> for
>>>> many
>>>>>>>> fields
>>>>>>>>>> (port,
>>>>>>>>>>>>> ol_flags, vlan, rss, ...).
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we say that nb_segs has to be valid in any
>>>> segments,
>>>>>> it
>>>>>>>> means
>>>>>>>>>> that
>>>>>>>>>>>>> chain() or split() will have to update it in all
>>>>>> segments,
>>>>>>>> which
>>>>>>>>>> is not
>>>>>>>>>>>>> efficient.
>>>>>>>>>>>>
>>>>>>>>>>>> Why in all?
>>>>>>>>>>>> We can state that nb_segs on non-first segment
>> should
>>>>>> always
>>>>>>>> equal
>>>>>>>>>> 1.
>>>>>>>>>>>> As I understand in that case, both split() and
>> chain()
>>>> have
>>>>>> to
>>>>>>>>>> update nb_segs
>>>>>>>>>>>> only for head mbufs, rest ones will remain
>> untouched.
>>>>>>>>>>>
>>>>>>>>>>> Well, anyway, I think it's strange to have a
>> constraint
>>>> on m-
>>>>>>>>> nb_segs
>>>>>>>>>> for
>>>>>>>>>>> non-first segment. We don't have that kind of
>> constraints
>>>> for
>>>>>>>> other
>>>>>>>>>> fields.
>>>>>>>>>>
>>>>>>>>>> True, we don't. But this is one of the fields we
>> consider
>>>>>> critical
>>>>>>>>>> for proper work of mbuf alloc/free mechanism.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am not sure that requiring m->nb_segs == 1 on non-first
>>>>>> segments
>>>>>>>> will provide any benefits.
>>>>>>>>
>>>>>>>> It would make this patch unneeded.
>>>>>>>> So, for direct, non-segmented mbufs pktmbuf_free() will
>> remain
>>>>>> write-
>>>>>>>> free.
>>>>>>>
>>>>>>> I see. Then I agree with Konstantin that alternative
>> solutions
>>>> should
>>>>>> be considered.
>>>>>>>
>>>>>>> The benefit regarding free()'ing non-segmented mbufs - which
>> is a
>>>>>> very common operation - certainly outweighs the cost of
>> requiring
>>>>>> split()/chain() operations to set the new head mbuf's nb_segs =
>> 1.
>>>>>>>
>>>>>>> Nonetheless, the bug needs to be fixed somehow.
>>>>>>>
>>>>>>> If we can't come up with a better solution that doesn't break
>> the
>>>>>> ABI, we are forced to accept the patch.
>>>>>>>
>>>>>>> Unless the techboard accepts to break the ABI in order to
>> avoid
>>>> the
>>>>>> performance cost of this patch.
>>>>>>
>>>>>> Did someone notice a performance drop with this patch?
>>>>>> On my side, I don't see any regression on a L3 use case.
>>>>>
>>>>> I am afraid that the DPDK performance regression tests are based
>> on
>>>> TX immediately following RX, so cache misses in TX may go by
>> unnoticed
>>>> because RX warmed up the cache for TX already. And similarly for RX
>>>> reusing mbufs that have been warmed up by the preceding free() at
>> TX.
>>>>>
>>>>> Please consider testing the performance difference with the mbuf
>>>> being completely cold at TX, and going completely cold again before
>>>> being reused for RX.
>>>>>
>>>>>>
>>>>>> Let's sumarize: splitting a mbuf chain and freeing it causes
>>>> subsequent
>>>>>> mbuf
>>>>>> allocation to return a mbuf which is not correctly initialized.
>>>> There
>>>>>> are 2
>>>>>> options to fix it:
>>>>>>
>>>>>> 1/ change the mbuf free function (this patch)
>>>>>>
>>>>>> - m->nb_segs would behave like many other field: valid in
>> the
>>>> first
>>>>>> segment, ignored in other segments
>>>>>> - may impact performance (suspected)
>>>>>>
>>>>>> 2/ change all places where a mbuf chain is split, or trimmed
>>>>>>
>>>>>> - m->nb_segs would have a specific behavior: count the
>> number of
>>>>>> segments in the first mbuf, should be 1 in the last
>> segment,
>>>>>> ignored in other ones.
>>>>>> - no code change in mbuf library, so no performance impact
>>>>>> - need to patch all places where we do a mbuf split or trim.
>>>> From
>>>>>> afar,
>>>>>> I see at least mbuf_cut_seg_ofs() in DPDK. Some external
>>>>>> applications
>>>>>> may have to be patched (for instance, I already found 3
>> places
>>>> in
>>>>>> 6WIND code base without a deep search).
>>>>>>
>>>>>> In my opinion, 1/ is better, except we notice a significant
>>>>>> performance,
>>>>>> because the (implicit) behavior is unchanged.
>>>>>>
>>>>>> Whatever the solution, some documentation has to be added.
>>>>>>
>>>>>> Olivier
>>>>>>
>>>>>
>>>>> Unfortunately, I don't think that anything but the first option
>> will
>>>> go into 20.11 and stable releases of older versions, so I stand by
>> my
>>>> acknowledgment of the patch.
>>>>
>>>> If we are affraid about 20.11 performance (it is legitimate, few
>> days
>>>> before the release), we can target 21.02. After all, everybody
>> lives
>>>> with this bug since 2017, so there is no urgency. If accepted and
>> well
>>>> tested, it can be backported in stable branches.
>>>
>>> +1
>>>
>>> Good thinking, Olivier!
>>
>> Looking at the changes once again, it probably can be reworked a bit:
>>
>> - if (m->next != NULL) {
>> - m->next = NULL;
>> - m->nb_segs = 1;
>> - }
>>
>> + if (m->next != NULL)
>> + m->next = NULL;
>> + if (m->nb_segs != 1)
>> + m->nb_segs = 1;
>>
>> That way we add one more condition checking, but I suppose it
>> shouldn't be that perf critical.
>> That way for direct,non-segmented mbuf it still should be write-free.
>> Except cases as you described above: chain(), then split().
>>
>> Of-course we still need to do perf testing for that approach too.
>> So if your preference it to postpone it till 21.02 - that's ok for me.
>> Konstantin
>
> With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.
+1
next prev parent reply other threads:[~2020-11-08 14:16 UTC|newest]
Thread overview: 74+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-04 17:00 Olivier Matz
2020-11-05 0:15 ` Ananyev, Konstantin
2020-11-05 7:46 ` Olivier Matz
2020-11-05 8:26 ` Andrew Rybchenko
2020-11-05 9:10 ` Olivier Matz
2020-11-05 11:34 ` Ananyev, Konstantin
2020-11-05 12:31 ` Olivier Matz
2020-11-05 13:14 ` Ananyev, Konstantin
2020-11-05 13:24 ` Olivier Matz
2020-11-05 13:55 ` Ananyev, Konstantin
2020-11-05 16:30 ` Morten Brørup
2020-11-05 23:55 ` Ananyev, Konstantin
2020-11-06 7:52 ` Morten Brørup
2020-11-06 8:20 ` Olivier Matz
2020-11-06 8:50 ` Morten Brørup
2020-11-06 10:04 ` Olivier Matz
2020-11-06 10:07 ` Morten Brørup
2020-11-06 11:53 ` Ananyev, Konstantin
2020-11-06 12:23 ` Morten Brørup
2020-11-08 14:16 ` Andrew Rybchenko [this message]
2020-11-08 14:19 ` Ananyev, Konstantin
2020-11-10 16:26 ` Olivier Matz
2020-11-05 8:33 ` Morten Brørup
2020-11-05 9:03 ` Olivier Matz
2020-11-05 9:09 ` Andrew Rybchenko
2020-11-08 7:25 ` Ali Alnubani
2020-12-18 12:52 ` [dpdk-dev] [PATCH v2] " Olivier Matz
2020-12-18 13:18 ` Morten Brørup
2020-12-18 23:33 ` Ajit Khaparde
2021-01-06 13:33 ` [dpdk-dev] [PATCH v3] " Olivier Matz
2021-01-10 9:28 ` Ali Alnubani
2021-01-11 13:14 ` Ananyev, Konstantin
2021-01-13 13:27 ` [dpdk-dev] [PATCH v4] " Olivier Matz
2021-01-15 13:59 ` [dpdk-dev] [dpdk-stable] " David Marchand
2021-01-15 18:39 ` Ali Alnubani
2021-01-18 17:52 ` Ali Alnubani
2021-01-19 8:32 ` Olivier Matz
2021-01-19 8:53 ` Morten Brørup
2021-01-19 12:00 ` Ferruh Yigit
2021-01-19 12:27 ` Morten Brørup
2021-01-19 14:03 ` Ferruh Yigit
2021-01-19 14:21 ` Morten Brørup
2021-01-21 9:15 ` Ferruh Yigit
2021-01-19 14:04 ` Slava Ovsiienko
2021-07-24 8:47 ` Thomas Monjalon
2021-07-30 12:36 ` Olivier Matz
2021-07-30 14:35 ` Morten Brørup
2021-07-30 14:54 ` Thomas Monjalon
2021-07-30 15:14 ` Olivier Matz
2021-07-30 15:23 ` Morten Brørup
2021-08-04 13:29 ` [dpdk-dev] [PATCH] doc: add known issue with mbuf segment Thomas Monjalon
2021-08-04 14:25 ` Ajit Khaparde
2021-08-05 6:08 ` Morten Brørup
2021-08-06 14:21 ` Thomas Monjalon
2021-08-06 14:24 ` Morten Brørup
2021-09-28 8:28 ` [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free Thomas Monjalon
2021-09-28 9:00 ` Slava Ovsiienko
2021-09-28 9:25 ` Ananyev, Konstantin
2021-09-28 9:39 ` Morten Brørup
2021-09-29 8:03 ` Ali Alnubani
2021-09-29 21:39 ` Olivier Matz
2021-09-30 13:29 ` Ali Alnubani
2021-10-21 8:26 ` Thomas Monjalon
2021-01-21 9:19 ` Ferruh Yigit
2021-01-21 9:29 ` Morten Brørup
2021-01-21 16:35 ` [dpdk-dev] [dpdklab] " Lincoln Lavoie
2021-01-23 8:57 ` Morten Brørup
2021-01-25 17:00 ` Brandon Lo
2021-01-25 18:42 ` Ferruh Yigit
2021-06-15 13:56 ` [dpdk-dev] " Morten Brørup
2021-09-29 21:37 ` [dpdk-dev] [PATCH v5] " Olivier Matz
2021-09-30 13:27 ` Ali Alnubani
2021-10-21 9:18 ` David Marchand
2022-07-28 14:06 ` CI performance test results might be misleading Morten Brørup
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c2b4adb2-0a0a-7aa2-d89b-fc51b438b446@oktetlabs.ru \
--to=andrew.rybchenko@oktetlabs.ru \
--cc=dev@dpdk.org \
--cc=konstantin.ananyev@intel.com \
--cc=mb@smartsharesystems.com \
--cc=olivier.matz@6wind.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).