From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by dpdk.org (Postfix) with ESMTP id 8B3D52A5D
 for ; Mon, 28 Nov 2016 10:16:15 +0100 (CET)
Received: from orsmga004.jf.intel.com ([10.7.209.38])
 by fmsmga104.fm.intel.com with ESMTP; 28 Nov 2016 01:16:14 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.31,563,1473145200";
 d="scan'208";a="35056242"
Received: from bricha3-mobl3.ger.corp.intel.com ([10.252.23.61])
 by orsmga004.jf.intel.com with SMTP; 28 Nov 2016 01:16:11 -0800
Received: by (sSMTP sendmail emulation); Mon, 28 Nov 2016 09:16:11 +0000
Date: Mon, 28 Nov 2016 09:16:10 +0000
From: Bruce Richardson
To: Jerin Jacob
Cc: Thomas Monjalon, dev@dpdk.org, harry.van.haaren@intel.com,
 hemant.agrawal@nxp.com, gage.eads@intel.com
Message-ID: <20161128091610.GB168972@bricha3-MOBL3.ger.corp.intel.com>
References: <1479447902-3700-1-git-send-email-jerin.jacob@caviumnetworks.com>
 <3691745.y1f1NvKTEv@xps13>
 <20161124015912.GA13508@svelivela-lt.caveonetworks.com>
 <1883454.103LptOkIX@xps13>
 <20161125002334.GA21048@svelivela-lt.caveonetworks.com>
 <20161125110053.GA149796@bricha3-MOBL3.ger.corp.intel.com>
 <20161126025454.GA13886@svelivela-lt.caveonetworks.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161126025454.GA13886@svelivela-lt.caveonetworks.com>
Organization: Intel Research and Development Ireland Ltd.
User-Agent: Mutt/1.7.1 (2016-10-04)
Subject: Re: [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven
 programming model
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
X-List-Received-Date: Mon, 28 Nov 2016 09:16:16 -0000

On Sat, Nov 26, 2016 at 08:24:55AM +0530, Jerin Jacob wrote:
> On Fri, Nov 25, 2016 at 11:00:53AM +0000, Bruce Richardson wrote:
> > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > +Eventdev API - EXPERIMENTAL
> > > > > > > +M: Jerin Jacob
> > > > > > > +F: lib/librte_eventdev/
> > > > > > >
> > > > >
> > > >
> > > I don't think there is any portability issue here, I can explain.
> > >
> > > The application level, we have two more use case to deal with non burst
> > > variant
> > >
> > > - latency critical work
> > > - on dequeue, if application wants to deal with only one flow(i.e to
> > > avoid processing two different application flows to avoid cache trashing)
> > >
> > > Selection of the burst variants will be based on
> > > rte_event_dev_info_get() and rte_event_dev_configure()(see, max_event_port_dequeue_depth,
> > > max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth )
> > > So I don't think their is portability issue here and I don't want to waste my
> > > CPU cycles on the for loop if application known to be working with non
> > > bursts variant like below
> > >
> >
> > If the application is known to be working on non-burst varients, then
> > they always request a burst-size of 1, and skip the loop completely.
> > There is no extra performance hit in that case in either the app or the
> > driver (since the non-burst driver always returns 1, irrespective of the
> > number requested).
>
> Hmm. I am afraid, There is.
> On the app side, the const "1" can not be optimized by the compiler as
> on downside it is function pointer based driver interface
> On the driver side, the implementation would be for loop based instead
> of plain access.
> (compiler never can see the const "1" in driver interface)
>
> We are planning to implement burst mode as kind of emulation mode and
> have a different scheme for burst and nonburst. The similar approach we have
> taken in introducing rte_event_schedule() and split the responsibility so
> that SW driver can work without additional performance overhead and neat
> driver interface.
>
> If you are concerned about the usability part and regression on the SW
> driver, then it's not the case, application will use nonburst variant only if
> dequeue_depth == 1 and/or explicit case where latency matters.
>
> On the portability side, we support both case and application if written based
> on dequeue_depth it will perform well in both implementations.IMO, There is
> no another shortcut for performance optimized application running on different
> set of model.I think it is not an issue as, in event model as each cores
> identical and main loop can be changed based on dequeue_depth
> if needs performance(anyway mainloop will be function pointer based).
>

Ok, I think I see your point now. Here is an alternative suggestion.

1. Keep the single user API.
2. Have both single and burst function pointers in the driver.
3. Call appropriately in the eventdev layer based on parameters.

For example:

rte_event_dequeue_burst(..., int num)
{
	if (num == 1 && single_dequeue_fn != NULL)
		return single_dequeue_fn(...);
	return burst_dequeue_fn(...);
}

This way drivers can optionally special-case the single dequeue case -
the function pointer check will definitely be predictable in HW making
that a near-zero-cost check - while not forcing all drivers to do so.
It also reduces the public API surface, and gives us a single enqueue
and dequeue function.

/Bruce
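
For anyone who wants to experiment with the dispatch idea above, here is a
minimal compilable sketch. Everything named demo_* or toy_* is hypothetical
and invented for illustration; none of it is the real eventdev structure
layout or driver interface. It assumes the burst handler is mandatory and the
single-event handler is optional, as in the suggestion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event and driver-ops types, for illustration only. */
struct demo_event {
	uint32_t flow_id;
};

typedef uint16_t (*demo_dequeue_single_t)(void *port, struct demo_event *ev);
typedef uint16_t (*demo_dequeue_burst_t)(void *port, struct demo_event *ev,
					 uint16_t num);

struct demo_dev_ops {
	demo_dequeue_single_t dequeue_single; /* optional, may be NULL */
	demo_dequeue_burst_t dequeue_burst;   /* assumed mandatory */
};

/* Single user-facing call: use the driver's single-event handler only
 * when exactly one event is requested and the driver provides one. */
static inline uint16_t
demo_event_dequeue_burst(const struct demo_dev_ops *ops, void *port,
			 struct demo_event *ev, uint16_t num)
{
	assert(ops->dequeue_burst != NULL);
	if (num == 1 && ops->dequeue_single != NULL)
		return ops->dequeue_single(port, ev);
	return ops->dequeue_burst(port, ev, num);
}

/* Toy driver: the "port" is just a counter of pending events. */
static uint16_t
toy_dequeue_single(void *port, struct demo_event *ev)
{
	uint32_t *pending = port;

	if (*pending == 0)
		return 0;
	ev->flow_id = 1000; /* marker so the single path is visible */
	(*pending)--;
	return 1;
}

static uint16_t
toy_dequeue_burst(void *port, struct demo_event *ev, uint16_t num)
{
	uint32_t *pending = port;
	uint16_t i;

	/* Loop-based path: fill up to num events while any are pending. */
	for (i = 0; i < num && *pending > 0; i++, (*pending)--)
		ev[i].flow_id = i;
	return i;
}

/* One driver that special-cases single dequeue, one that does not. */
static const struct demo_dev_ops toy_both_ops = {
	toy_dequeue_single, toy_dequeue_burst
};
static const struct demo_dev_ops toy_burst_only_ops = {
	NULL, toy_dequeue_burst
};
```

With toy_both_ops, a request for one event goes through toy_dequeue_single()
and never enters the loop; with toy_burst_only_ops the same call falls back to
the burst handler, so the application code is identical in both cases.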