DPDK patches and discussions
 help / color / mirror / Atom feed
From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Morten Brørup" <mb@smartsharesystems.com>,
	"Olivier Matz" <olivier.matz@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>
Subject: Re: [dpdk-dev] [RFC] ring: count and empty optimizations
Date: Thu, 30 Apr 2020 21:38:25 +0000	[thread overview]
Message-ID: <DBBPR08MB4646847454AB68B70B369DE598AA0@DBBPR08MB4646.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <BYAPR11MB3301B33E9BA72B7F9A19D3E19AAA0@BYAPR11MB3301.namprd11.prod.outlook.com>

> > > <snip>
> > >
> > > >
> > > > Hi Morten,
> > > >
> > > > On Tue, Apr 28, 2020 at 03:53:15PM +0200, Morten Brørup wrote:
> > > > > Olivier (maintainer of the Ring),
> > > >
> > > > I'm not anymore, CC'ing Konstantin and Honnappa.
> > > >
> > > > > I would like to suggest a couple of minor optimizations to the
> > > > > ring
> > > library.
> > > > >
> > > > >
> > > > > 1. Testing if the ring is empty is as simple as comparing the
> > > producer and
> > > > consumer pointers:
> > > > >
> > > > > static inline int
> > > > > rte_ring_empty(const struct rte_ring *r) {
> > > > > -	return rte_ring_count(r) == 0;
> > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > +	uint32_t cons_tail = r->cons.tail;
> > > > > +	return cons_tail == prod_tail;
> > > > > }
> > > > >
> > > > > In theory, this optimization reduces the number of potential
> > > > > cache
> > > misses
> > > > from 3 to 2 by not having to read r->mask in rte_ring_count().
> > > >
> > > > This one looks correct to me.
> > > >
> > > >
> > > > > 2. It is not possible to enqueue more elements than the capacity
> > > > > of
> > > a ring,
> > > > so the count function does not need to test if the capacity is
> > > exceeded:
> > > > >
> > > > > static inline unsigned
> > > > > rte_ring_count(const struct rte_ring *r) {
> > > > > 	uint32_t prod_tail = r->prod.tail;
> > > > > 	uint32_t cons_tail = r->cons.tail;
> > > > > 	uint32_t count = (prod_tail - cons_tail) & r->mask;
> > > > > -	return (count > r->capacity) ? r->capacity : count;
> > > > > + 	return count;
> > > > > }
> > > > >
> > > > > I cannot even come up with a race condition in this function
> > > > > where
> > > the
> > > > count would exceed the capacity. Maybe I missed something?
> > > >
> > > > Since there is no memory barrier, the order in which prod_tail and
> > > cons_tail
> > > > are fetched is not guaranteed. Or the thread could be interrupted
> > > > by
> > > the
> > > > kernel in between.
> > > The '__rte_ring_move_prod_head' function ensures that the distance
> > > between prod.head and cons.tail is always within the capacity
> > > irrespective of whether the consumers/producers are sleeping.
> >
> > Yes, this is exactly what I was thinking.
> >
> > The tails are the pointers after any updates, which is shown very nicely in
> the documentation.
> > And certainly prod.tail will not move further ahead than prod.head.
> >
> > So it makes sense using the tails outside the functions that move the
> pointers.
> >
> > Olivier found the race I couldn't find:
> > 1. The thread calls rte_ring_count(), and since there is no memory barrier it
> reads cons.tail, and has not yet read prod.tail.
> > 2. Other threads use the ring simultaneously. A consumer thread moves
> cons.tail ahead and a producer thread then moves prod.tail ahead.
> > Note: Without first moving cons.tail ahead, prod.tail cannot move too far
> ahead.
> > 3. The thread proceeds to read prod.tail. Now the count is completely
> incorrect, and may even exceed the capacity.
> >
> > Olivier pointed out that this could happen if the rte_ring_count
> > thread is interrupted by the kernel, and I agree. However, intuitively
> > I don't think that it can happen in a non-EAL thread, because the consumer
> thread needs to finish moving cons.tail before the producer thread can move
> prod.tail too far ahead. And by then the rte_ring_count thread has had plenty
> of time to read prod.tail. But it could happen in theory.
The issues associated with using rte_ring on non-EAL threads are documented at [1]. Knowing these issues, I think these functions should not pretend to keep the values correct which will save some cycles.

[1] https://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html#known-issues

> >
> > > > This function may probably return invalid results in MC/MP cases.
> > > > We just ensure that the result is between 0 and r->capacity.
> >
> > So should we update the documentation to say that it might return an
> > incorrect count (if there is a race), or should we fix the function to always
> provide a correct value?
I cannot think of a way to fix these functions for non-EAL observer threads (threads from where these functions are called).
For EAL observer threads, the data is correct at the point of observation in the code (may be not at the point of use of this data)

> 
> As long as you invoke rte_ring_count() while there are other active
> producers/consumers for that ring - it's return value can always be outdated,
> and not reflect current ring state.
> So I think just updating the doc is enough.
> 
> >
> > Furthermore, the same race condition probably affects rte_ring_empty()
> similarly, even in my improved version.
Yes

> >
> > And do these functions need to support non-EAL threads? I don't think so.
> What do you think?
> 
> Not sure why you differ EAL and non EAL threads here.
> The only difference between them from scheduler point of view - EAL threads
> have cpu affinity set by rte_eal_init().
> But nothing prevents user from setting/updating cpu affinity for any thread in
> his process in a way he likes.

      reply	other threads:[~2020-04-30 21:38 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-28 13:53 Morten Brørup
2020-04-29 13:38 ` Olivier Matz
2020-04-30  1:12   ` Honnappa Nagarahalli
2020-04-30  9:19     ` Morten Brørup
2020-04-30 15:36       ` Ananyev, Konstantin
2020-04-30 21:38         ` Honnappa Nagarahalli [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DBBPR08MB4646847454AB68B70B369DE598AA0@DBBPR08MB4646.eurprd08.prod.outlook.com \
    --to=honnappa.nagarahalli@arm.com \
    --cc=dev@dpdk.org \
    --cc=konstantin.ananyev@intel.com \
    --cc=mb@smartsharesystems.com \
    --cc=nd@arm.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).