From: "Van Haaren, Harry"
To: Jerin Jacob, Thomas Monjalon
CC: "dev@dpdk.org", "Richardson, Bruce", "hemant.agrawal@nxp.com", "Eads, Gage"
Date: Fri, 25 Nov 2016 11:59:29 +0000
Subject: Re: [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

Hi All,

> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Friday, November 25, 2016 12:24 AM
> To: Thomas Monjalon
> Cc: dev@dpdk.org; Richardson, Bruce; Van Haaren, Harry; hemant.agrawal@nxp.com; Eads, Gage
> Subject: Re: [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model
>
> On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > 2016-11-24 07:29, Jerin Jacob:
> > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > +Eventdev API - EXPERIMENTAL
> > > > > +M: Jerin Jacob
> > > > > +F: lib/librte_eventdev/
> > > >
> > > > OK to mark it experimental.
> > > > What is the plan to remove the experimental word?
> > >
> > > IMO, EXPERIMENTAL status can be changed when
> > > - At least two event drivers are available (Intel and Cavium are working on
> > >   SW and HW event drivers)
> > > - Functional test applications are fine with at least two drivers
> > > - Portable example application to showcase the features of the library
> > > - eventdev integration with another dpdk subsystem such as ethdev
> > >
> > > Thoughts? I am not sure what criteria were used in the cryptodev case.
> >
> > Sounds good.
> > We will be more confident when drivers and tests are implemented.
> >
> > I think the roadmap for the SW driver targets the release 17.05.
> > Do you still plan 17.02 for this API and the Cavium driver?
>
> No. 17.02 is too short for up-streaming the Cavium driver. However, I think the API and
> skeleton event driver can go in 17.02 if there are no objections.
>
> > > > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > > > +/**< Skeleton event device PMD name */
> > > >
> > > > I do not understand this #define.
> > >
> > > Applications can explicitly request a specific driver through the driver
> > > name. This will go as argument to rte_event_dev_get_dev_id(const char *name).
> > > The reason for keeping this #define in rte_eventdev.h is that the
> > > application needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.
> >
> > So each driver must register its name in the API?
> > Is it really needed?
>
> Otherwise how does the application know the name of the driver?
> A similar scheme is used in cryptodev.
> http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
> No strong opinion here. Open for suggestions.
>
> > > > > +struct rte_event_dev_config {
> > > > > +	uint32_t dequeue_wait_ns;
> > > > > +	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
> > > >
> > > > Please explain exactly when the wait occurs and why.
> > >
> > > Here is the explanation from the rte_event_dequeue() API definition:
> > > -
> > > @param wait
> > > 0 - no-wait, returns immediately if there is no event.
> > > >0 - wait for the event, if the device is configured with
> > > RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
> > > the event available or *wait* time.
> > > if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> > > then this function will wait until the event available or *dequeue_wait_ns*
> > >                                                           ^^^^^^^^^^^^^^^^^
> > > ns which was previously supplied to rte_event_dev_configure()
> > > -
> > > This gives the application control over how long the
> > > implementation should wait if an event is not available.
> > >
> > > Let me know what exact changes are required if the details are not enough in
> > > the rte_event_dequeue() API definition.
> >
> > Maybe timeout would be a better name.
> > It waits only if there is nothing in the queue.
> > It can be interesting to highlight in this comment that this parameter
> > makes the dequeue function a blocking call.
>
> OK. I will change it to timeout then.
>
> > > > > +/** Event port configuration structure */
> > > > > +struct rte_event_port_conf {
> > > > > +	int32_t new_event_threshold;
> > > > > +	/**< A backpressure threshold for new event enqueues on this port.
> > > > > +	 * Use for *closed system* event dev where event capacity is limited,
> > > > > +	 * and cannot exceed the capacity of the event dev.
> > > > > +	 * Configuring ports with different thresholds can make higher priority
> > > > > +	 * traffic less likely to be backpressured.
> > > > > +	 * For example, a port used to inject NIC Rx packets into the event dev
> > > > > +	 * can have a lower threshold so as not to overwhelm the device,
> > > > > +	 * while ports used for worker pools can have a higher threshold.
> > > > > +	 * This value cannot exceed the *nb_events_limit*
> > > > > +	 * which was previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +	uint8_t dequeue_depth;
> > > > > +	/**< Configure number of bulk dequeues for this event port.
> > > > > +	 * This value cannot exceed the *nb_event_port_dequeue_depth*
> > > > > +	 * which was previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +	uint8_t enqueue_depth;
> > > > > +	/**< Configure number of bulk enqueues for this event port.
> > > > > +	 * This value cannot exceed the *nb_event_port_enqueue_depth*
> > > > > +	 * which was previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +};
> > > >
> > > > The depth configuration is not clear to me.
> > >
> > > Basically the maximum number of events that can be enqueued/dequeued at a time
> > > from a given event port. A depth of one == non-burst mode.
> >
> > OK so depth is the queue size. Please could you reword?
>
> OK
>
> > > > > +/* Event types to classify the event source */
> > > >
> > > > Why is this classification needed?
> > >
> > > This is for application pipelining and for cases where the application wants to know which
> > > subsystem generated the event.
> > >
> > > example packet forwarding loop on the worker cores:
> > > while(1) {
> > >         ev = dequeue()
> > >         // event from ethdev subsystem
> > >         if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
> > >                 - swap the mac address
> > >                 - push to atomic queue for ingress flow order maintenance
> > >                   by CORE
> > >         /* events from core */
> > >         } else if (ev.event_type == RTE_EVENT_TYPE_CORE) {
> > >
> > >         }
> > >         enqueue(ev);
> > > }
> >
> > I don't know why but I feel this classification is weak.
> > You need to track the source of the event. Does it make sense to go beyond
> > and identify the source device?
>
> No, dequeue has a dev_id argument, so the event comes only from that device.
>
> > > > > +#define RTE_EVENT_TYPE_ETHDEV		0x0
> > > > > +/**< The event generated from ethdev subsystem */
> > > > > +#define RTE_EVENT_TYPE_CRYPTODEV	0x1
> > > > > +/**< The event generated from cryptodev subsystem */
> > > > > +#define RTE_EVENT_TYPE_TIMERDEV		0x2
> > > > > +/**< The event generated from timerdev subsystem */
> > > > > +#define RTE_EVENT_TYPE_CORE		0x3
> > > > > +/**< The event generated from core.
> > > >
> > > > What is core?
> > >
> > > The events are generated by an lcore for pipelining. Any suggestion for a
> > > better name? lcore?
> >
> > What about CPU or SW?
>
> No strong opinion here. I will go with CPU then.

+1 for CPU (as SW is the software PMD name).

> > > > > +	/**< Opaque event pointer */
> > > > > +	struct rte_mbuf *mbuf;
> > > > > +	/**< mbuf pointer if dequeued event is associated with mbuf */
> > > >
> > > > How do we know that an event is associated with mbuf?
> > >
> > > By looking at the event source/type RTE_EVENT_TYPE_*
> > >
> > > > Does it mean that such events are always converted into mbuf even if the
> > > > application does not need it?
> > >
> > > Hardware has a dependency on getting the physical address of the event, so any
> > > struct that has "phys_addr_t buf_physaddr" works.
> >
> > I do not understand.
>
> In HW based implementations, the event pointer will be submitted to HW.
> As you know, since HW can't understand the virtual address and it needs
> to be converted to the physical address, any DPDK object that provides phys_addr_t,
> such as mbuf, can be used with libeventdev.
>
> > I thought that decoding the event would be the responsibility of the app
> > by calling a function like
> > rte_eventdev_convert_to_mbuf(struct rte_event *, struct rte_mbuf *).
>
> It can be. But it is costly, i.e. yet another function pointer based
> driver interface on the fast path. Instead, the driver itself can
> convert to mbuf (in case of an ETHDEV device) and tag the source/event type
> as RTE_EVENT_TYPE_ETHDEV.
> IMO the proposed scheme helps in SW based implementations as there is no real
> mbuf conversion. Something we can revisit in the ethdev integration if
> required.
>
> > > > > +struct rte_eventdev_driver;
> > > > > +struct rte_eventdev_ops;
> > > >
> > > > I think it is better to split API and driver interface in two files.
> > > > (we should do this split in ethdev)
> > >
> > > I thought so, but then the "static inline" versions of the northbound
> > > API (like rte_event_enqueue) will go to another file (due to the fact that the
> > > implementation needs to dereference "dev->data->ports[port_id]"). Do you want it that way?
> > > I would like to keep all northbound API in rte_eventdev.h and not any of them
> > > in rte_eventdev_pmd.h.
> >
> > My comment was confusing.
> > You are doing 2 files, one for the API (what you call northbound I think)
> > and the other one for the driver interface (what you call southbound I think),
> > it's very fine.
> >
> > > > > +/**
> > > > > + * Enqueue the event object supplied in the *rte_event* structure on an
> > > > > + * event device designated by its *dev_id* through the event port specified by
> > > > > + * *port_id*. The event object specifies the event queue on which this
> > > > > + * event will be enqueued.
> > > > > + *
> > > > > + * @param dev_id
> > > > > + *   Event device identifier.
> > > > > + * @param port_id
> > > > > + *   The identifier of the event port.
> > > > > + * @param ev
> > > > > + *   Pointer to struct rte_event
> > > > > + *
> > > > > + * @return
> > > > > + *  - 0 on success
> > > > > + *  - <0 on failure. Failure can occur if the event port's output queue is
> > > > > + *    backpressured, for instance.
> > > > > + */
> > > > > +static inline int
> > > > > +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
> > > >
> > > > Is it really needed to have a non-burst variant of enqueue/dequeue?
> > >
> > > Yes. Certain HW can work only with non-burst variants.
> >
> > Same comment as Bruce, we must keep only the burst variant.
> > We cannot have different API for different HW.
>
> I don't think there is any portability issue here, I can explain.
>
> At the application level, we have two more use cases that need the non-burst
> variant
>
> - latency critical work
> - on dequeue, if the application wants to deal with only one flow (i.e. to
>   avoid processing two different application flows and so avoid cache thrashing)
>
> Selection of the burst variants will be based on
> rte_event_dev_info_get() and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth)
> So I don't think there is a portability issue here, and I don't want to waste my
> CPU cycles on the for loop if the application is known to be working with the non-burst
> variant like below
>
> nb_events = rte_event_dequeue_burst();
> for (i = 0; i < nb_events; i++) {
>         process ev[i]
> }
>
> And most importantly the NPU can get almost the same throughput
> without the burst variant, so why not?

Perhaps I'm mis-understanding, but can you not just dequeue 1 from the burst() function?

struct rte_event ev;
rte_event_dequeue_burst(dev, port, &ev, 1, wait);
process( &ev );

I mean, if an application *demands* to not use bursts, the above allows it. Of course it
won't scale to other implementations that would benefit from burst - but that's the
application author's choice?

> > > > > +/**
> > > > > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > > > > + *
> > > > > + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > > > > + * application can use this function to convert wait value in nanoseconds to
> > > > > + * implementation specific wait value supplied in rte_event_dequeue()
> > > >
> > > > Why is it implementation-specific?
> > > > Why is this conversion not internal in the driver?
> > >
> > > This is for performance optimization, otherwise drivers
> > > need to convert ns to ticks in the "fast path"
> >
> > So why not define the unit of this timeout as CPU cycles like the ones
> > returned by rte_get_timer_cycles()?
>
> Because the HW co-processor can run in a different clock domain. It need not be at
> the CPU frequency.
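For what it's worth, the usage pattern I have in mind for that conversion looks roughly
like the sketch below: do the ns conversion once on the slow path, then reuse the
device-specific wait value on every dequeue. This is only a sketch against the RFC as
quoted above - the conversion helper's name and signature (rte_event_dequeue_wait_time()),
the burst return type and the process() hook are my assumptions, not settled API.

#include <rte_eventdev.h>

/* Hypothetical worker loop; rte_event_dequeue_wait_time() and the other
 * calls follow the RFC naming in this thread and may well change. */
static void
worker_loop(uint8_t dev_id, uint8_t port_id, volatile int *done)
{
        /* Slow path: convert the nanosecond budget once, so the driver does
         * no ns-to-ticks (or ns-to-whatever) math per dequeue call. */
        uint64_t wait = rte_event_dequeue_wait_time(dev_id, 10 * 1000 /* ns */);
        struct rte_event ev;

        while (!*done) {
                /* A count of 1 gives single-event behaviour via the burst API. */
                uint16_t nb = rte_event_dequeue_burst(dev_id, port_id, &ev, 1, wait);
                if (nb == 0)
                        continue; /* timed out, nothing to process */

                process(&ev);                            /* application work */
                rte_event_enqueue(dev_id, port_id, &ev); /* next pipeline stage */
        }
}

That keeps the per-dequeue cost the same whether the wait unit ends up being
co-processor ticks, CPU cycles or anything else device-specific.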