From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id A74C4374 for ; Fri, 30 Jun 2017 15:59:07 +0200 (CEST) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 30 Jun 2017 06:59:06 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,286,1496127600"; d="scan'208";a="105661208" Received: from bricha3-mobl3.ger.corp.intel.com ([10.237.221.28]) by orsmga002.jf.intel.com with SMTP; 30 Jun 2017 06:59:04 -0700 Received: by (sSMTP sendmail emulation); Fri, 30 Jun 2017 14:59:03 +0100 Date: Fri, 30 Jun 2017 14:59:03 +0100 From: Bruce Richardson To: Olivier Matz Cc: dev@dpdk.org, jerin.jacob@caviumnetworks.com Message-ID: <20170630135903.GA18460@bricha3-MOBL3.ger.corp.intel.com> References: <20170607133620.275801-1-bruce.richardson@intel.com> <20170607133620.275801-2-bruce.richardson@intel.com> <20170630114046.5c6eb8bb@platinum> <20170630113227.GG14776@bricha3-MOBL3.ger.corp.intel.com> <20170630142438.293c8e06@platinum> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170630142438.293c8e06@platinum> Organization: Intel Research and =?iso-8859-1?Q?De=ACvel?= =?iso-8859-1?Q?opment?= Ireland Ltd. User-Agent: Mutt/1.8.1 (2017-04-11) Subject: Re: [dpdk-dev] [PATCH 1/5] ring: allow rings with non power-of-2 sizes X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jun 2017 13:59:10 -0000 On Fri, Jun 30, 2017 at 02:24:38PM +0200, Olivier Matz wrote: > On Fri, 30 Jun 2017 12:32:27 +0100, Bruce Richardson > wrote: > > On Fri, Jun 30, 2017 at 11:40:46AM +0200, Olivier Matz wrote: > > > Hi Bruce, > > > > > > Few comments inline. > > > > > > On Wed, 7 Jun 2017 14:36:16 +0100, Bruce Richardson wrote: > > > > The rte_rings traditionally have only supported having ring sizes as powers > > > > of 2, with the actual usable space being the size - 1. In some cases, for > > > > example, with an eventdev where we want to precisely control queue depths > > > > for latency, we need to allow ring sizes which are not powers of two so we > > > > add in an additional ring capacity value to allow that. For existing rings, > > > > this value will be size-1, i.e. the same as the mask, but if the new > > > > EXACT_SZ flag is passed on ring creation, the ring will have exactly the > > > > usable space requested, although the underlying memory size may be bigger. > > > > > > > > Signed-off-by: Bruce Richardson > > > > > > Specifying the exact wanted size looks to be a better API than the current > > > one, which is to give the power of two above the wanted value, knowing there > > > will be only size-1 elements available. > > > > > > What would you think about deprecating ring init/creation without this > > > flag in a few versions? Then, later, we could remove this flag and > > > the new behavior would become the default. > > > > > > > I'm not really sure it's necessary. For the vast majority of cases what > > we have now is fine, and it ensures that we maximise the ring space. I'd > > tend to keep the new behaviour for those cases where we really need it. > > > > It's a trade off between what we hide and expose: > > * current scheme hides the fact that you get one entry less than you ask > > for, but the memory space is as expected. > > * new scheme gives you exactly the entries you ask for, but hides the > > fact that you could be using up to double the memory space for the > > ring. > > Yes, having 2 different behavior is precisely what I don't really like. > Giving the number of entries the user asks for is straightforward, which > is a good thing for an API. And hiding the extra consumed memory looks > acceptable, this can be documented. > > That said, if we decide to deprecate it, there's no need to do it right > now. > > > > > > @@ -845,69 +857,63 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p) > > > > } > > > > > > > > /** > > > > - * Test if a ring is full. > > > > + * Return the number of entries in a ring. > > > > * > > > > * @param r > > > > * A pointer to the ring structure. > > > > * @return > > > > - * - 1: The ring is full. > > > > - * - 0: The ring is not full. > > > > + * The number of entries in the ring. > > > > */ > > > > -static inline int > > > > -rte_ring_full(const struct rte_ring *r) > > > > +static inline unsigned > > > > +rte_ring_count(const struct rte_ring *r) > > > > { > > > > uint32_t prod_tail = r->prod.tail; > > > > uint32_t cons_tail = r->cons.tail; > > > > - return ((cons_tail - prod_tail - 1) & r->mask) == 0; > > > > + return (prod_tail - cons_tail) & r->mask; > > > > } > > > > > > When used on a mc/mp ring, this function can now return a value > > > which is higher than the ring capacity. Even if this function is > > > not really accurate when used while the ring is use, I think we > > > should maximize the result with r->capacity. > > > > > > This will also avoid rte_ring_free_count() to return a overflowed > > > value. > > > > > > > How does it return an overflowing value? I think in the MP/MC case the > > tests made each time around the loop for cmpset should ensure we never > > overflow. > > Here we are reading r->prod.tail and r->cons.tail without > synchronization, nor memory barriers. So basically, there is no > guarantee that the values are consistent. > > Before, the returned value could be wrong, but always in the > interval [0, mask]. > > Now, rte_ring_count() is still in the interval [0, mask], but the > capacity of the ring is lower than mask. If rte_ring_count() returns > mask, it would cause rte_ring_free_count() to return a value lower than > 0 (overflowed since the result is unsigned). > > I'm still not convinced that this can actually happen, but just in case, I'll add in the extra check in V2. /Bruce