DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bruce Richardson <bruce.richardson@intel.com>
To: Olivier Matz <olivier.matz@6wind.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Wiles, Keith" <keith.wiles@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] rte_ring features in use (or not)
Date: Tue, 31 Jan 2017 13:46:52 +0000	[thread overview]
Message-ID: <20170131134651.GB133984@bricha3-MOBL3.ger.corp.intel.com> (raw)
In-Reply-To: <20170131142718.25e6c876@platinum>

On Tue, Jan 31, 2017 at 02:27:18PM +0100, Olivier Matz wrote:
> On Tue, 31 Jan 2017 12:10:50 +0000, Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> > On Tue, Jan 31, 2017 at 11:41:42AM +0000, Bruce Richardson wrote:
> > > On Tue, Jan 31, 2017 at 11:53:49AM +0100, Olivier Matz wrote:  
> > > > On Wed, 25 Jan 2017 17:29:18 +0000, "Ananyev, Konstantin"
> > > > <konstantin.ananyev@intel.com> wrote:  
> > > > > > > > Bonus question:
> > > > > > > > * Do we know how widely used the enq_bulk/deq_bulk
> > > > > > > > functions are? They are useful for unit tests, so they do
> > > > > > > > have uses, but I think it would be good if we harmonized
> > > > > > > > the return values between bulk and burst functions. Right
> > > > > > > > now: enq_bulk  - only enqueues all elements or none.
> > > > > > > > Returns 0 for all, or negative error for none.
> > > > > > > >    enq_burst - enqueues as many elements as possible.
> > > > > > > > Returns the number enqueued.    
> > > > > > >
> > > > > > > I do use the apis in pktgen and the difference in return
> > > > > > > values has got me once. Making them common would be great,
> > > > > > > but the problem is    
> > > > > > backward compat to old versions I would need to have an ifdef
> > > > > > in pktgen now. So it seems like we moved the problem to the
> > > > > > application.    
> > > > > > >    
> > > > > > 
> > > > > > Yes, an ifdef would be needed, but how many versions of DPDK
> > > > > > back do you support? Could the ifdef be removed again after
> > > > > > say, 6 months?   
> > > > > > > I would like to see the old API kept and a new API with the
> > > > > > > new behavior. I know it adds another API but one of the API
> > > > > > > would be nothing    
> > > > > > more than wrapper function if not a macro.    
> > > > > > >
> > > > > > > Would that be more reasonable then changing the ABI?    
> > > > > > 
> > > > > > Technically, this would be an API rather than ABI change,
> > > > > > since the functions are inlined in the code. However, it's
> > > > > > not the only API change I'm looking to make here - I'd like
> > > > > > to have all the functions start returning details of the
> > > > > > state of the ring, rather than have the watermarks facility.
> > > > > > If we add all new functions for this and keep the old ones
> > > > > > around, we are just increasing our maintenance burden.
> > > > > > 
> > > > > > I'd like other opinions here. Do we see increasing the API
> > > > > > surface as the best solution, or are we ok to change the APIs
> > > > > > of a key library like the rings one?    
> > > > > 
> > > > > I am ok with changing API to make both _bulk and _burst return
> > > > > the same thing. Konstantin   
> > > > 
> > > > I agree that the _bulk() functions returning 0 or -err can be
> > > > confusing. But it has at least one advantage: it explicitly shows
> > > > that if user ask for N enqueues/dequeues, it will either get N or
> > > > 0, not something between.
> > > > 
> > > > Changing the API of the existing _bulk() functions looks a bit
> > > > dangerous to me. There's probably a lot of code relying on the
> > > > ring API, and changing its behavior may break it.
> > > > 
> > > > I'd prefer to deprecate the old _bulk and _burst functions, and
> > > > introduce a new api, maybe something like:
> > > > 
> > > >   rte_ring_generic_dequeue(ring, objs, n, behavior, flags)  
> > > >   -> return nb_objs or -err  
> > > >   
> > > Don't like the -err, since it's not a valid value that can be used
> > > e.g. in simple loops in the case that the user doesn't care about
> > > the exact reason for error. I prefer having zero returned on error,
> > > with rte_errno set appropriately, since then it is trivial for apps
> > > to ignore error values they don't care about.
> > > It also makes the APIs in a ring library consistent in that all
> > > will set rte_errno on error, rather than returning the error code.
> > > It's not right for rte_ring_create and rte_ring_lookup to return an
> > > error code since they return pointers, not integer values.
> 
> My assumption was that functions returning an int should return an
> error instead of rte_errno. By the way, it's actually the same debate
> than http://dpdk.org/ml/archives/dev/2017-January/056546.html
> 
> In that particular case, I'm not convinced that this code:
> 
> 	ret = ring_dequeue(r, objs, n);
> 	if (ret == 0) {
> 		/* handle error in rte_errno */
> 		return;
> 	}
> 	do_stuff_with_elements(objs, ret);
> 
> Is better/faster/clearer than this one:
> 
> 	ret = ring_dequeue(r, objs, n);
> 	if (ret <= 0) {
> 		/* handle error in ret */
> 		return;
> 	}
> 	do_stuff_with_elements(objs, ret);
> 
> 
> In the first case, you could argue that the "if (ret)" part could be
> stripped if the app does not care about errors, but I think it's not
> efficient to call the next function with 0 object. Also, this if() does
> not necessarily adds a test since ring_dequeue() is inline.
> 
> In the first case, ring_dequeue needs to write rte_errno in memory on
> error (because it's a global variable), even if the caller does not
> look at it. In the second case, it can stay in a register.
> 

I agree in many cases there is not a lot to choose between the two
methods.

However, I prefer the errno approach for 3 reasons:

1. Firstly, and primarily, it works in all cases, including for use with
  functions that return pointers. That allows a library like rte_ring to
  use it across all functions, rather than having some functions use an
  errno variable, or extra return value, and other functions return the
  error code directly.
2. It's how unix system calls work, so everyone is familiar with the
  scheme.
3. It allows the return value to be always in the valid domain of return
  values for the type. You can have dequeue functions that always return
  unsigned values, you can have functions that return enums etc. This
  means you can track stats and chain function calls without having to
  do error checking if you like: for example, moving packets between
  rings:
  	rte_ring_enqueue_burst(r2, objs, rte_ring_dequeue_burst(r1, objs, sizeof(objs));
  This is for me the least compelling of the reasons, but is still worth
  considering, and I do admit to liking the functional style of
  programming it allows.

> 
> > > 
> > > As for deprecating the functions - I'm not sure about that. I think
> > > the names of the existing functions are ok, and should be kept.
> > > I've a new patchset of cleanups for rte_rings in the works. Let me
> > > try and finish that and send it out as an RFC and we'll see what
> > > you think then. 
> > Sorry, I realised on re-reading this reply seemed overly negative,
> > sorry.
> 
> haha, no problem :)
> 
> 
> > I can actually see the case for deprecating both sets of
> > functions to allow us to "start afresh". If we do so, are we as well
> > to just replace the whole library with a new one, e.g. rte_fifo, which
> > would allow us the freedom to keep e.g. functions with "burst" in the
> > name if we so wish? If might also allow an easier transition.
> 
> Yes, that's also an option.
> 
> My fear is about changing the API of such widely used functions,
> without triggering any compilation error because the prototypes stays
> the same.
>
Don't worry, I also plan to change the prototypes! I want to add in an
extra parameter to each call to optionally return the amount of space in
the ring. This is can lead to large cycle savings - by avoiding extra
calls to rte_ring_count() or rte_ring_free_count() - in cases where it
is useful. We found that this lead to significant performance
improvements in the SW eventdev, as we had less ping-ponging of
cachelines between cores. Since we already know the amount of free space
in an enqueue call we can return that for free, while calling a separate
free_space API can lead to huge stalls if the cachelines have been
adjusted by another core in the meantime. [Yes, this does mean that the
value returned from an enqueue call would be inaccurate, but it can
still, in the case of SP rings, provide a guarantee that any number of
objects up to that number can be enqueued without error, since any
changes will only increase the number that can be enqueued]

The new APIs would therefore be:

	rte_ring_enqueue_bulk(r, objs, n, &free_space);

	rte_ring_dequeue_bulk(r, objs, n, &avail_objs);

This also would allow us to remove the watermarks feature, as the app
can itself check the free_space value against as many watermark
thresholds as it likes.
Hopefully I'll get an RFC with this change ready in the next few days
and we can base the discussion more on working code.

Regards,
/Bruce

  reply	other threads:[~2017-01-31 13:46 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-25 12:14 Bruce Richardson
2017-01-25 12:16 ` Bruce Richardson
2017-01-25 13:20 ` Olivier MATZ
2017-01-25 13:54   ` Bruce Richardson
2017-01-25 14:48     ` Bruce Richardson
2017-01-25 15:59       ` Wiles, Keith
2017-01-25 16:57         ` Bruce Richardson
2017-01-25 17:29           ` Ananyev, Konstantin
2017-01-31 10:53             ` Olivier Matz
2017-01-31 11:41               ` Bruce Richardson
2017-01-31 12:10                 ` Bruce Richardson
2017-01-31 13:27                   ` Olivier Matz
2017-01-31 13:46                     ` Bruce Richardson [this message]
2017-01-25 22:27           ` Wiles, Keith
2017-01-25 16:39   ` Stephen Hemminger
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization Bruce Richardson
2017-02-14  8:32   ` Olivier Matz
2017-02-14  9:39     ` Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 01/19] app/pdump: fix duplicate macro definition Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 02/19] ring: remove split cacheline build setting Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 03/19] ring: create common structure for prod and cons metadata Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 04/19] ring: add a function to return the ring size Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 05/19] crypto/null: use ring size function Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 06/19] ring: eliminate duplication of size and mask fields Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 07/19] ring: remove debug setting Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 08/19] ring: remove the yield when waiting for tail update Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 09/19] ring: remove watermark support Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 10/19] ring: make bulk and burst fn return vals consistent Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 11/19] ring: allow enq fns to return free space value Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 12/19] examples/quota_watermark: use ring space for watermarks Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 13/19] ring: allow dequeue fns to return remaining entry count Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 14/19] ring: reduce scope of local variables Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 15/19] ring: separate out head index manipulation for enq/deq Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 16/19] ring: create common function for updating tail idx Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 17/19] ring: allow macros to work with any type of object Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 18/19] ring: add object size parameter to memory size calculation Bruce Richardson
2017-02-07 14:12 ` [dpdk-dev] [PATCH RFCv3 19/19] ring: add event ring implementation Bruce Richardson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170131134651.GB133984@bricha3-MOBL3.ger.corp.intel.com \
    --to=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=keith.wiles@intel.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).