From: Bruce Richardson
To: Jerin Jacob
Cc: Thomas Monjalon, dev@dpdk.org, harry.van.haaren@intel.com, hemant.agrawal@nxp.com, gage.eads@intel.com
Date: Tue, 29 Nov 2016 10:00:43 +0000
Message-ID: <20161129100043.GA197024@bricha3-MOBL3.ger.corp.intel.com>
In-Reply-To: <20161129040141.GA11674@svelivela-lt.caveonetworks.com>
Organization: Intel Research and Development Ireland Ltd.
Subject: Re: [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

On Tue, Nov 29, 2016 at 09:31:42AM +0530, Jerin Jacob wrote:
> On Mon, Nov 28, 2016 at 09:16:10AM +0000, Bruce Richardson wrote:
> > On Sat, Nov 26, 2016 at 08:24:55AM +0530, Jerin Jacob wrote:
> > > On Fri, Nov 25, 2016 at 11:00:53AM +0000, Bruce Richardson wrote:
> > > > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > > > +Eventdev API - EXPERIMENTAL
> > > > > > > > > +M: Jerin Jacob
> > > > > > > > > +F: lib/librte_eventdev/
> > > > > > > >
> > > > >
> > > > > I don't think there is any portability issue here, I can explain.
> > > > >
> > > > > At the application level, we have two more use cases for the
> > > > > non-burst variant:
> > > > >
> > > > > - latency-critical work
> > > > > - on dequeue, if the application wants to deal with only one flow (i.e. to
> > > > >   avoid processing two different application flows, to avoid cache thrashing)
> > > > >
> > > > > Selection of the burst variants will be based on
> > > > > rte_event_dev_info_get() and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> > > > > max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth).
> > > > > So I don't think there is a portability issue here, and I don't want to waste my
> > > > > CPU cycles on the for loop if the application is known to be working with the
> > > > > non-burst variant, like below
> > > > >
> > > >
> > > > If the application is known to be working on non-burst variants, then
> > > > it always requests a burst size of 1 and skips the loop completely.
> > > > There is no extra performance hit in that case in either the app or the
> > > > driver (since the non-burst driver always returns 1, irrespective of the
> > > > number requested).
> > > >
> > > Hmm. I am afraid there is.
> > > On the app side, the const "1" cannot be optimized by the compiler, as
> > > underneath it is a function-pointer-based driver interface.
> > > On the driver side, the implementation would be for-loop based instead
> > > of plain access
> > > (the compiler can never see the const "1" in the driver interface).
> > >
> > > We are planning to implement burst mode as a kind of emulation mode and
> > > have a different scheme for burst and non-burst. We took a similar approach
> > > in introducing rte_event_schedule() and splitting the responsibility, so
> > > that the SW driver can work without additional performance overhead and with a neat
> > > driver interface.
> > >
> > > If you are concerned about the usability part and regression on the SW
> > > driver, that is not the case: the application will use the non-burst variant only if
> > > dequeue_depth == 1 and/or in the explicit case where latency matters.
> > >
> > > On the portability side, we support both cases, and an application written based
> > > on dequeue_depth will perform well on both implementations. IMO, there is
> > > no other shortcut for a performance-optimized application running on a different
> > > set of models. I think it is not an issue because, in the event model, each core is
> > > identical and the main loop can be changed based on dequeue_depth
> > > if it needs performance (anyway, the main loop will be function-pointer based).
> > >
> >
> > Ok, I think I see your point now. Here is an alternative suggestion.
> >
> > 1. Keep the single user API.
> > 2. Have both single and burst function pointers in the driver.
> > 3. Call appropriately in the eventdev layer based on parameters. For
> >    example:
> >
> > rte_event_dequeue_burst(..., int num)
> > {
> > 	if (num == 1 && single_dequeue_fn != NULL)
> > 		return single_dequeue_fn(...);
> > 	return burst_dequeue_fn(...);
> > }
> >
> > This way drivers can optionally special-case the single dequeue case
> > (the function pointer check will definitely be predictable in HW, making
> > that a near-zero-cost check) while not forcing all drivers to do so.
> > It also reduces the public API surface, and gives us a single enqueue
> > and dequeue function.
>
> The alternative suggestion looks good to me. Yes, it makes sense to reduce the
> public API interface if possible.
>
> Regarding the implementation, I thought of a slightly different approach like below,
> to reduce the cost of the additional AND operation (with const "1", the compiler
> can choose the correct one without any overhead).
>
> rte_event_dequeue_burst(..., int num)
> {
> 	if (num == 1)
> 		return single_dequeue_fn(...);
> 	return burst_dequeue_fn(...);
> }
>
> "single_dequeue_fn" is populated from the driver layer.
> If the driver layer does not populate "single_dequeue_fn",
> the common code can create it using the driver-provided
> "burst_dequeue_fn",
>
> something like
> generic_single_dequeue_fn(dev)
> {
> 	return dev->burst_dequeue_fn(..., 1);
> }
>
> Any concerns?
>
No, works ok for me

/Bruce
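
For reference, the scheme agreed above can be sketched as a small self-contained C example. It is only an illustration under stated assumptions, not the eventdev implementation: struct evdev, struct event, evdev_register(), event_dequeue_burst() and toy_burst_dequeue() are hypothetical names invented here, and only single_dequeue_fn, burst_dequeue_fn and generic_single_dequeue_fn come from the thread. The sketch assumes the common code installs the fallback once at registration time, so the dequeue fast path only tests num == 1 and never checks for NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real rte_eventdev types. */
struct event { uint64_t data; };
struct evdev;

typedef uint16_t (*dequeue_single_t)(struct evdev *dev, struct event *ev);
typedef uint16_t (*dequeue_burst_t)(struct evdev *dev, struct event *ev,
				    uint16_t num);

struct evdev {
	dequeue_single_t single_dequeue_fn; /* optional, may be NULL before registration */
	dequeue_burst_t burst_dequeue_fn;   /* mandatory, supplied by the driver */
	uint64_t next;                      /* toy event source used by the demo driver */
};

/* Common-code fallback: build the single path from the driver's burst path. */
static uint16_t generic_single_dequeue_fn(struct evdev *dev, struct event *ev)
{
	return dev->burst_dequeue_fn(dev, ev, 1);
}

/* Fill in the single path once, so the fast path needs no NULL check. */
static void evdev_register(struct evdev *dev)
{
	assert(dev->burst_dequeue_fn != NULL);
	if (dev->single_dequeue_fn == NULL)
		dev->single_dequeue_fn = generic_single_dequeue_fn;
}

/*
 * Single public entry point. When the caller passes a constant 1 and this
 * function is inlined, the compiler can drop the branch entirely.
 */
static inline uint16_t event_dequeue_burst(struct evdev *dev, struct event *ev,
					   uint16_t num)
{
	if (num == 1)
		return dev->single_dequeue_fn(dev, ev);
	return dev->burst_dequeue_fn(dev, ev, num);
}

/* Toy burst-only driver: hands out consecutive numbers as events. */
static uint16_t toy_burst_dequeue(struct evdev *dev, struct event *ev,
				  uint16_t num)
{
	for (uint16_t i = 0; i < num; i++)
		ev[i].data = dev->next++;
	return num;
}
```

A driver that has a cheaper single-event path would set single_dequeue_fn itself before registration; a burst-only driver such as the toy one here leaves it NULL and gets the generic wrapper.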