DPDK patches and discussions
From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: Bruce Richardson <bruce.richardson@intel.com>,
	"Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "Nicolau, Radu" <radu.nicolau@intel.com>,
	"Van Haaren, Harry" <harry.van.haaren@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"jerinj@marvell.com" <jerinj@marvell.com>, nd <nd@arm.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements
Date: Mon, 28 Sep 2020 16:02:48 +0000	[thread overview]
Message-ID: <AM8PR08MB581035DD2604028F6A74337198350@AM8PR08MB5810.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <20200925102805.GD923@bricha3-MOBL.ger.corp.intel.com>

<snip>
> > Add minimum burst throughout the scheduler pipeline and a flush counter.
> > Replace ring API calls with local single threaded implementation where
> > possible.
> >
> > Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
> >
> > Thanks for the patch, a few comments inline.
> >
> > ---
> >  drivers/event/sw/sw_evdev.h           | 11 +++-
> >  drivers/event/sw/sw_evdev_scheduler.c | 83 +++++++++++++++++++++++----
> >  2 files changed, 81 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h
> > index 7c77b2495..95e51065f 100644
> > --- a/drivers/event/sw/sw_evdev.h
> > +++ b/drivers/event/sw/sw_evdev.h
> > @@ -29,7 +29,13 @@
> >  /* report dequeue burst sizes in buckets */
> >  #define SW_DEQ_STAT_BUCKET_SHIFT 2
> >  /* how many packets pulled from port by sched */
> > -#define SCHED_DEQUEUE_BURST_SIZE 32
> > +#define SCHED_DEQUEUE_BURST_SIZE 64
> > +
> > +#define SCHED_MIN_BURST_SIZE 8
> > +#define SCHED_NO_ENQ_CYCLE_FLUSH 256
> > +/* set SCHED_DEQUEUE_BURST_SIZE to 64 or 128 when setting this to 1*/
> > +#define SCHED_REFILL_ONCE_PER_CALL 1
> >
> > Is it possible to make the above #define a runtime option?
> > Eg, --vdev event_sw,refill_iter=1
> >
> > That would allow packaged versions of DPDK to be usable in both modes.
> >
> > +
> >
> >  #define SW_PORT_HIST_LIST (MAX_SW_PROD_Q_DEPTH) /* size of our history list */
> >  #define NUM_SAMPLES 64 /* how many data points use for average stats */
> > @@ -214,6 +220,9 @@ struct sw_evdev {
> >  	uint32_t xstats_count_mode_port;
> >  	uint32_t xstats_count_mode_queue;
> >
> > +	uint16_t sched_flush_count;
> > +	uint16_t sched_min_burst;
> > +
> >  	/* Contains all ports - load balanced and directed */
> >  	struct sw_port ports[SW_PORTS_MAX] __rte_cache_aligned;
> >
> > diff --git a/drivers/event/sw/sw_evdev_scheduler.c b/drivers/event/sw/sw_evdev_scheduler.c
> > index cff747da8..ca6d1caff 100644
> > --- a/drivers/event/sw/sw_evdev_scheduler.c
> > +++ b/drivers/event/sw/sw_evdev_scheduler.c
> > @@ -26,6 +26,29 @@
> >  /* use cheap bit mixing, we only need to lose a few bits */
> >  #define SW_HASH_FLOWID(f) (((f) ^ (f >> 10)) & FLOWID_MASK)
> >
> > +
> > +/* single object enq and deq for non MT ring */
> > +static __rte_always_inline void
> > +sw_nonmt_ring_dequeue(struct rte_ring *r, void **obj)
> > +{
> > +	if ((r->prod.tail - r->cons.tail) < 1)
> > +		return;
> > +	void **ring = (void **)&r[1];
> > +	*obj = ring[r->cons.tail & r->mask];
> > +	r->cons.tail++;
> > +}
> > +
> > +static __rte_always_inline int
> > +sw_nonmt_ring_enqueue(struct rte_ring *r, void *obj)
> > +{
> > +	if ((r->capacity + r->cons.tail - r->prod.tail) < 1)
> > +		return 0;
> > +	void **ring = (void **)&r[1];
> > +	ring[r->prod.tail & r->mask] = obj;
> > +	r->prod.tail++;
> > +	return 1;
> > +}
> > Why not make these APIs part of the rte_ring library? You could further
> > optimize them by keeping the indices on the same cache line.
> > I'm not sure there is any need for non thread-safe rings outside this
> > particular case.
> > [Honnappa] I think if we add the APIs, we will find the use cases.
> > But, more than that, I understand that the rte_ring structure is exposed
> > to the application; the reason for doing that is the inline functions that
> > rte_ring provides. IMO, we should still maintain modularity and should not
> > use the internals of the rte_ring structure outside of the library.
> >
> > +1 to that.
> >
> > BTW, is there any real perf benefit from such micro-optimisation?
> 
> I'd tend to view these as use-case specific, and I'm not sure we should
> clutter up the ring library with yet more functions, especially since they
> can't be mixed with the existing enqueue/dequeue functions, because they
> don't use the head pointers.
IMO, the ring library is pretty well organized with the recent addition of the HTS/RTS modes. This could be one more mode, and it should allow us to reuse the existing functions (though some additional functions would be required as well).
The other concern I have is that this implementation can be further optimized by keeping the pointers on a single cache line; it currently uses 2 cache lines only because of the layout of the rte_ring structure.
There was a question earlier about the performance improvement from this patch. Are there any % performance numbers that can be shared?
It should also be possible to change the above functions to use the head/tail pointers from the producer or the consumer cache line alone, to check for perf differences.
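To make the cache-line point concrete, a dedicated single-threaded ring could keep both indices in one small header instead of the two cache-aligned prod/cons blocks of struct rte_ring. A rough sketch only (layout and names are mine, not a proposed rte_ring mode):

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative single-threaded ring: head and tail share one cache
 * line, unlike struct rte_ring where producer and consumer live on
 * separate cache-aligned lines. Size must be a power of two so the
 * "& mask" indexing works. Not thread safe, by design.
 */
struct st_ring {
	uint32_t prod_tail;	/* next free slot */
	uint32_t cons_tail;	/* next filled slot */
	uint32_t mask;		/* size - 1 */
	uint32_t capacity;
	void *slots[];		/* storage follows the header */
};

static inline int
st_ring_enqueue(struct st_ring *r, void *obj)
{
	/* full when capacity entries are in flight */
	if ((uint32_t)(r->prod_tail - r->cons_tail) >= r->capacity)
		return 0;
	r->slots[r->prod_tail & r->mask] = obj;
	r->prod_tail++;
	return 1;
}

static inline int
st_ring_dequeue(struct st_ring *r, void **obj)
{
	if (r->prod_tail == r->cons_tail)	/* empty */
		return 0;
	*obj = r->slots[r->cons_tail & r->mask];
	r->cons_tail++;
	return 1;
}

static inline struct st_ring *
st_ring_create(uint32_t size)	/* size must be a power of two */
{
	struct st_ring *r = calloc(1, sizeof(*r) + size * sizeof(void *));

	if (r != NULL) {
		r->mask = size - 1;
		r->capacity = size;
	}
	return r;
}
```

As with the quoted sw_nonmt_ring_* helpers, the unsigned subtraction in the full check stays correct across index wrap-around; the difference is only that both indices are adjacent, so a single-threaded enqueue/dequeue touches one header cache line instead of two.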

Thread overview: 22+ messages
2020-09-08 10:52 Radu Nicolau
2020-09-23 11:13 ` Van Haaren, Harry
2020-09-23 23:10   ` Honnappa Nagarahalli
2020-09-24 15:27     ` Nicolau, Radu
2020-09-24 17:38       ` Honnappa Nagarahalli
2020-09-24 18:02         ` Ananyev, Konstantin
2020-09-25 10:28           ` Bruce Richardson
2020-09-28 16:02             ` Honnappa Nagarahalli [this message]
2020-09-29  9:02               ` Nicolau, Radu
2020-10-05 16:35                 ` Jerin Jacob
2020-10-06  7:59                   ` Van Haaren, Harry
2020-10-06 10:13                     ` Ananyev, Konstantin
2020-10-07 10:44                       ` Nicolau, Radu
2020-10-07 10:52                         ` Ananyev, Konstantin
2020-10-13 19:11                           ` Jerin Jacob
2020-10-14  8:32                             ` Nicolau, Radu
2020-10-14 10:09                               ` Jerin Jacob
2020-10-14 10:21                                 ` Ananyev, Konstantin
2020-10-14 18:27                                   ` Jerin Jacob
2020-09-28  8:28 ` [dpdk-dev] [PATCH v2] " Radu Nicolau
2020-09-28 13:47   ` Van Haaren, Harry
2020-10-07 13:51 ` [dpdk-dev] [PATCH v3] " Radu Nicolau
