DPDK patches and discussions
From: jigsaw <jigsaw@gmail.com>
To: "dev@dpdk.org" <dev@dpdk.org>
Subject: [dpdk-dev] Proposal of mbuf Race Condition Detection
Date: Thu, 13 Apr 2017 16:38:19 +0300	[thread overview]
Message-ID: <CAHVfvh6rPgy9X=pSjOCSo4J5rf-aGAkYjs7qh6H5dKGTT1rE8g@mail.gmail.com> (raw)

Hi,

I have a proposal for mbuf race condition detection, and I would like to get
your opinions before committing any patch.

A race condition is the worst kind of bug I can think of: it causes a crash
long after the original crime scene, and the consequences are unpredictable
and difficult to understand.

Besides, race conditions are very difficult to reproduce. Usually we get a
coredump from a live network, but the coredump itself delivers nearly zero
information. We only know that the mbuf is somehow broken, and that it was
perhaps overwritten by another core at some point.

There are tools such as Valgrind and ThreadSanitizer that can capture this
kind of fault, but the overhead they bring is too high to deploy in
performance testing, not to mention in a live network. Yet race conditions
rarely happen under low pressure.

Even worse, even in theory, the tools mentioned are not capable of capturing
this scenario, because they require explicit calls to locking primitives such
as pthread mutexes; without explicit lock/unlock calls, the synchronization
semantics are not understood by the tools.

Given these known restrictions, I have a proposal, roughly as below.

The idea is to ask the coder to do explicit lock/unlock on each mbuf that
matters. The pseudo code is as below:

    RTE_MBUF_LOCK(m);
    /* use mbuf as usual */
    ...
    RTE_MBUF_UNLOCK(m);

During the protected critical section, only the core which holds the lock
can access the mbuf; any other core that dares to touch the mbuf will be
punished with a segfault.

Since the proposal must be feasible at least in performance testing, the
overhead of both the locking and the punishment must be small, and access to
the mbuf must remain as usual.

Hence, the RTE_MBUF_LOCK is actually doing the following (pseudo code):

RTE_MBUF_LOCK(m)
{
    store_per_core_cache(m, m->buf_addr);
    m->buf_addr = NULL;
    mb(); // memory barrier
}

And RTE_MBUF_UNLOCK is simply the reverse:

RTE_MBUF_UNLOCK(m)
{
    purge_per_core_cache(m);
    m->buf_addr = load_per_core_cache(m);
    mb();
}
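
The two primitives above could be sketched as follows. This is only an
illustration under simplifying assumptions: the mbuf struct is reduced to the
one field the scheme touches, the per-core cache is a single entry (a real
version would keep one table per lcore, e.g. via RTE_PER_LCORE, with several
slots), and a C11 fence stands in for mb(); none of the names below are
existing DPDK API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct rte_mbuf: only the one
 * field this scheme manipulates. */
struct mbuf {
    void *buf_addr;
};

/* Minimal single-entry cache; one copy per lcore in a real build. */
static struct mbuf *cache_key;   /* which mbuf this core has locked */
static void *cache_val;          /* its saved buf_addr */

static void mbuf_lock(struct mbuf *m)
{
    /* store_per_core_cache(m, m->buf_addr) */
    cache_key = m;
    cache_val = m->buf_addr;
    m->buf_addr = NULL;          /* any other core touching m now faults */
    atomic_thread_fence(memory_order_seq_cst);   /* stands in for mb() */
}

static void mbuf_unlock(struct mbuf *m)
{
    if (cache_key == m) {
        m->buf_addr = cache_val; /* load_per_core_cache(m) */
        cache_key = NULL;        /* purge_per_core_cache(m) */
    }
    atomic_thread_fence(memory_order_seq_cst);
}
```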

As we can see, the idea is to overwrite m->buf_addr with NULL and store the
original value in a per-core cache. Other cores trying to access the mbuf
during the critical section will then certainly be punished.

Then of course we need to keep the owner core working. This is done by
making changes to
mtod, as below:

#define rte_pktmbuf_mtod_offset(m, t, o) \
    ((t)((char *)(m)->buf_addr + load_per_core_cache(m) + \
         (m)->data_off + (o)))
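
The trick in that macro is that load_per_core_cache() returns the saved
address on a cache hit and 0 on a miss, so the sum buf_addr + cached value
resolves to the real buffer address on the owner core whether the mbuf is
locked or not. A sketch of this, with a hypothetical single-entry cache and
hypothetical names (not existing DPDK API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mbuf with just the fields mtod needs. */
struct mbuf {
    void *buf_addr;
    uint16_t data_off;
};

/* Toy single-entry per-core cache for illustration. */
static struct mbuf *cache_key;
static uintptr_t cache_val;

/* Returns the saved buf_addr if this core holds the lock on m,
 * and 0 otherwise. */
static uintptr_t load_per_core_cache(const struct mbuf *m)
{
    return (cache_key == m) ? cache_val : 0;
}

/* Revised mtod:
 *   locked on this core -> buf_addr is NULL (0), cache hit supplies it
 *   unlocked            -> buf_addr is real, cache miss supplies 0
 * Either way the sum is the real buffer address. */
static void *mtod_offset(struct mbuf *m, size_t o)
{
    return (char *)((uintptr_t)m->buf_addr + load_per_core_cache(m))
           + m->data_off + o;
}
```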

The per-core cache of mbufs works like a key-value store with direct-mapped
cache semantics. If an eviction happens (a newly arriving mbuf must kick out
an old one), we restore the buf_addr of the evicted mbuf; RTE_MBUF_UNLOCK
must of course cope with this situation (it will simply see a cache miss).
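
The eviction path on the store side might look like the sketch below. The
slot function, table size, and the assumption of 64-byte mbuf granularity
are all illustrative choices, not part of the proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rte_mbuf. */
struct mbuf {
    void *buf_addr;
};

#define CACHE_SLOTS 8   /* arbitrary size for illustration */

struct cache_entry {
    struct mbuf *key;   /* occupant, NULL if the slot is empty */
    void *saved;        /* its original buf_addr */
};

/* One such table per core in a real build. */
static struct cache_entry cache[CACHE_SLOTS];

/* Direct mapping: slot derived from the mbuf address, assuming
 * cache-line (64-byte) granularity. */
static size_t slot_of(const struct mbuf *m)
{
    return ((uintptr_t)m / 64) % CACHE_SLOTS;
}

/* Store on LOCK: if the slot is occupied by a different mbuf, the old
 * occupant is evicted and its buf_addr restored, as if it had been
 * unlocked early; its later UNLOCK then sees a miss and does nothing. */
static void store_per_core_cache(struct mbuf *m)
{
    struct cache_entry *e = &cache[slot_of(m)];
    if (e->key != NULL && e->key != m)
        e->key->buf_addr = e->saved;   /* restore the evicted mbuf */
    e->key = m;
    e->saved = m->buf_addr;
}
```

Note the trade-off: an evicted mbuf silently loses its protection, so the
cache size bounds how many mbufs a core can hold locked at once.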

Note that we expect LOCK/UNLOCK to work even across an mbuf being freed and
re-allocated, since we don't trust the refcnt. A double-free is actually
very rare, but a data race can occur without a double-free. Therefore, we
need to survive the allocation of the mbuf, that is, rte_pktmbuf_init, which
resets the buf_addr.

Furthermore, other dereferences of buf_addr in rte_mbuf.h need to be
revised, but that is not a big deal (I hope).

The runtime overhead of this implementation is very light, and it can
probably be turned on even in a live network. And it is not intrusive at
all: no change is needed in struct rte_mbuf; we just need O(N) space
(where N is the number of cores) and O(1) runtime complexity.

Alright, that is basically what is in my mind. Before any patch is provided
or any perf analysis is made, I would like to know the risks from a design
point of view. Please leave your feedback.

Thanks &
Regards,

Qinglai

Thread overview: 6+ messages
2017-04-13 13:38 jigsaw [this message]
2017-04-13 14:59 ` Adrien Mazarguil
2017-04-13 20:58   ` jigsaw
2017-04-13 23:19     ` Ananyev, Konstantin
2017-04-14  0:03       ` Stephen Hemminger
2017-04-14  8:51         ` jigsaw
