DPDK patches and discussions
From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: jigsaw <jigsaw@gmail.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Proposal of mbuf Race Condition Detection
Date: Thu, 13 Apr 2017 16:59:10 +0200	[thread overview]
Message-ID: <20170413145910.GG3790@6wind.com>
In-Reply-To: <CAHVfvh6rPgy9X=pSjOCSo4J5rf-aGAkYjs7qh6H5dKGTT1rE8g@mail.gmail.com>

Hi Qinglai,

On Thu, Apr 13, 2017 at 04:38:19PM +0300, jigsaw wrote:
> Hi,
> 
> I have a proposal for mbuf race condition detection and I would like to
> get your opinions before committing any patch.
> 
> A race condition is the worst kind of bug I can think of: it causes a
> crash long after the first crime scene, and the consequences are
> unpredictable and difficult to understand.
> 
> Besides, race conditions are very difficult to reproduce. Usually we get
> a coredump from a live network, but the coredump itself delivers nearly
> zero information. We just know that the mbuf is somehow broken and was
> perhaps overwritten by another core at some point.
> 
> There are tools such as Valgrind and ThreadSanitizer that can capture
> this kind of fault, but the overhead they bring is too high to deploy in
> performance testing, not to mention in a live network. Yet race
> conditions rarely happen under low pressure.
> 
> Even worse, these tools are not capable of capturing the scenario even
> in theory, because they require explicit calls to locking primitives
> such as pthread mutexes; without explicit lock/unlock, the semantics are
> not understood by the tools.
> 
> Given these known restrictions, I have a proposal roughly as below.
> 
> The idea is to ask the coder to explicitly lock/unlock each mbuf that
> matters. The pseudo code is as below:
> 
>     RTE_MBUF_LOCK(m);
>     /* use mbuf as usual */
>     ...
>     RTE_MBUF_UNLOCK(m);
> 
> During the protected critical section, only the core which holds the
> lock can access the mbuf; other cores, if they dare to use the mbuf,
> will be punished by a segfault.
> 
> Since the proposal must be feasible at least in performance testing, the
> overhead of locking and punishment must be small, and access to the mbuf
> must remain as usual.
> 
> Hence, RTE_MBUF_LOCK actually does the following (pseudo code):
> 
> RTE_MBUF_LOCK(m)
> {
>     store_per_core_cache(m, m->buf_addr);
>     m->buf_addr = NULL;
>     mb(); // memory barrier
> }
> 
> And RTE_MBUF_UNLOCK is simply the reverse:
> 
> RTE_MBUF_UNLOCK(m)
> {
>     m->buf_addr = load_per_core_cache(m);
>     purge_per_core_cache(m);
>     mb();
> }
> 
> As can be seen, the idea is to overwrite m->buf_addr with NULL and store
> the original value in a per-core cache. Other cores trying to access the
> mbuf during the critical section will certainly be punished.
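> 
> To make this more concrete, here is a minimal C sketch of what the pair
> could look like (the per-core cache helpers are hypothetical; one
> possible implementation is sketched further below):
> 
>     #include <rte_atomic.h>
>     #include <rte_mbuf.h>
> 
>     /* Hypothetical per-core cache helpers, sketched further below. */
>     static inline void store_per_core_cache(struct rte_mbuf *m,
>                                             void *buf_addr);
>     static inline void *load_per_core_cache(const struct rte_mbuf *m);
>     static inline void purge_per_core_cache(const struct rte_mbuf *m);
> 
>     static inline void rte_mbuf_lock(struct rte_mbuf *m)
>     {
>         /* Stash the real buffer address in this core's cache... */
>         store_per_core_cache(m, m->buf_addr);
>         /* ...then poison the mbuf: any other core dereferencing data
>          * through buf_addr now hits NULL and gets a segfault. */
>         m->buf_addr = NULL;
>         rte_smp_mb(); /* DPDK's full memory barrier */
>     }
> 
>     static inline void rte_mbuf_unlock(struct rte_mbuf *m)
>     {
>         /* Restore the real buffer address, then drop the entry. */
>         m->buf_addr = load_per_core_cache(m);
>         purge_per_core_cache(m);
>         rte_smp_mb();
>     }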
> 
> Then of course we need to keep the owner core working. This is done by
> changing the mtod macros, as below:
> 
> #define rte_pktmbuf_mtod_offset(m, t, o) \
>     ((t)((char *)(m)->buf_addr + (uintptr_t)load_per_core_cache(m) + \
>         (m)->data_off + (o)))
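> 
> (For reference, the current macro in rte_mbuf.h reads:
> 
>     #define rte_pktmbuf_mtod_offset(m, t, o) \
>         ((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
> 
> so the only change is the added load_per_core_cache(m) term, which must
> evaluate to 0 (NULL) for any mbuf this core has not locked.)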
> 
> The per-core cache of mbufs works like a key-value store with
> direct-mapped eviction. If an eviction happens (a newly arriving mbuf
> must kick out an old one), we restore the buf_addr of the evicted mbuf;
> RTE_MBUF_UNLOCK must then cope with this situation.
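> 
> A minimal sketch of one possible per-core direct-mapped cache, assuming
> load_per_core_cache() returns NULL on a miss so the mtod arithmetic
> above still works for unlocked mbufs (all names and sizes here are
> hypothetical):
> 
>     #include <stdint.h>
>     #include <rte_lcore.h>
>     #include <rte_mbuf.h>
> 
>     #define MBUF_CACHE_SIZE 256 /* entries per core, power of two */
> 
>     struct mbuf_cache_entry {
>         struct rte_mbuf *m; /* key: locked mbuf, NULL when free */
>         void *buf_addr;     /* value: saved buffer address */
>     };
> 
>     /* One table per lcore, hence no synchronization is needed. */
>     static struct mbuf_cache_entry
>         mbuf_cache[RTE_MAX_LCORE][MBUF_CACHE_SIZE];
> 
>     /* Direct mapping: hash the pointer down to a slot index; mbufs
>      * are at least cache-line aligned, hence the >> 6. */
>     static inline unsigned int cache_slot(const struct rte_mbuf *m)
>     {
>         return ((uintptr_t)m >> 6) & (MBUF_CACHE_SIZE - 1);
>     }
> 
>     static inline void
>     store_per_core_cache(struct rte_mbuf *m, void *buf_addr)
>     {
>         struct mbuf_cache_entry *e =
>             &mbuf_cache[rte_lcore_id()][cache_slot(m)];
> 
>         /* Eviction: restore the previous occupant's buf_addr. */
>         if (e->m != NULL && e->m != m)
>             e->m->buf_addr = e->buf_addr;
>         e->m = m;
>         e->buf_addr = buf_addr;
>     }
> 
>     static inline void *load_per_core_cache(const struct rte_mbuf *m)
>     {
>         struct mbuf_cache_entry *e =
>             &mbuf_cache[rte_lcore_id()][cache_slot(m)];
> 
>         /* Miss: the mbuf is unlocked (or was evicted), so its
>          * buf_addr is intact and we contribute nothing. */
>         return (e->m == m) ? e->buf_addr : NULL;
>     }
> 
>     static inline void purge_per_core_cache(const struct rte_mbuf *m)
>     {
>         struct mbuf_cache_entry *e =
>             &mbuf_cache[rte_lcore_id()][cache_slot(m)];
> 
>         if (e->m == m)
>             e->m = NULL;
>     }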
> 
> Note that we expect LOCK/UNLOCK to keep working across mbuf free and
> allocation, since we do not trust the refcnt. A double-free is actually
> very rare, but a data race can occur without one. Therefore we need to
> survive mbuf allocation, that is rte_pktmbuf_init(), which resets
> buf_addr.
> 
> Further, other dereferences of buf_addr in rte_mbuf.h need to be
> revised, but that is not a big deal (I hope).
> 
> The runtime overhead of this implementation is very light, and it could
> probably be turned on even in a live network. It is not intrusive at
> all: no change is needed in struct rte_mbuf; we just need O(N) space
> (where N is the number of cores) and O(1) runtime complexity.
> 
> Alright, that is basically what is on my mind. Before any patch is
> provided or any perf analysis is made, I would like to know what the
> risks are from a design point of view. Please leave your feedback.

Your proposal makes sense but I'm not sure where developers should call
RTE_MBUF_LOCK() and RTE_MBUF_UNLOCK(). Is it targeted at user applications
only? Is it to be used internally by DPDK as well? Does it include PMDs?

I think any overhead outside of debugging mode, as minimal as it is, is too
much of a "punishment" for well-behaved applications. The cost of a memory
barrier can be extremely high, and this is one of the reasons DPDK processes
mbufs in bulk every time it gets the chance. RTE_MBUF_LOCK() and
RTE_MBUF_UNLOCK() should thus exist under some compilation option disabled
by default, and be thought of as an additional debugging tool.
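
As a sketch of what I mean (the option and macro names are hypothetical,
by analogy with the existing CONFIG_RTE_LIBRTE_MBUF_DEBUG option):

    #ifdef RTE_LIBRTE_MBUF_RACE_DEBUG
    #define RTE_MBUF_LOCK(m) rte_mbuf_lock(m)
    #define RTE_MBUF_UNLOCK(m) rte_mbuf_unlock(m)
    #else
    /* Compiled out: strictly zero overhead in normal builds. */
    #define RTE_MBUF_LOCK(m) do { (void)(m); } while (0)
    #define RTE_MBUF_UNLOCK(m) do { (void)(m); } while (0)
    #endif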

Also, the implementation would probably be more expensive than your
suggestion implies; in my opinion the reference count must be taken into
account and/or RTE_MBUF_LOCK() must be recursive, because this is how
mbufs work. Freeing mbufs multiple times is perfectly valid in many cases.

How can one lock several mbufs at once by the way?

Since it affects performance, this can only make sense as a debugging tool
developers can use when they encounter some corruption issue they cannot
identify, somewhat like poisoning buffers before they are freed. Not sure
it's worth the trouble.

Regards,

-- 
Adrien Mazarguil
6WIND


Thread overview: 6+ messages
2017-04-13 13:38 jigsaw
2017-04-13 14:59 ` Adrien Mazarguil [this message]
2017-04-13 20:58   ` jigsaw
2017-04-13 23:19     ` Ananyev, Konstantin
2017-04-14  0:03       ` Stephen Hemminger
2017-04-14  8:51         ` jigsaw
