From: "Carrillo, Erik G"
To: Jerin Jacob, dev@dpdk.org
Cc: thomas@monjalon.net, "Richardson, Bruce", "Van Haaren, Harry", hemant.agrawal@nxp.com, "Eads, Gage", nipun.gupta@nxp.com, "Vangati, Narender", "Rao, Nikhil", pbhagavatula@caviumnetworks.com, jianbo.liu@linaro.org, rsanford@akamai.com
Date: Wed, 23 Aug 2017 22:57:08 +0000
Subject: Re: [dpdk-dev] [RFC PATCH 0/1] eventtimer: introduce event timer wheel

Hi Jerin,

Thanks for sharing your proposal.  We have implemented something quite
similar locally.

In applications that utilize the eventdev framework, entities we call
"bridge drivers" are configured, and they are analogous to service cores.
One such bridge driver, the Timer Bridge Driver, runs on an lcore specified
during application startup; once started, it manages a set of event timers
and enqueues timer events into an event device upon their expiry.

To use event timers, the application allocates them and sets a payload
pointer and queue id in much the same way you've shown.  Then we call
rte_event_timer_reset() to arm the timer, which installs it in one of the
rte_timer library's skiplists.  Concurrently, the Timer Bridge Driver
executes a run() function in a loop, which repeatedly calls
rte_timer_manage().  For any timers that have expired, a callback defined
in the bridge driver executes, and this callback enqueues a new event of
type TIMER.  As workers dequeue events, they encounter the timer event and
can use the "payload pointer" to get back to a specified object.
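As a rough sketch of the shape of this (the timer_bridge_run() and
timer_expiry_cb() names and the timer_ctx layout are illustrative, not our
actual symbols, and I'm reusing the RTE_EVENT_TYPE_TIMER value from your
proposal):

#include <stdbool.h>
#include <rte_timer.h>
#include <rte_eventdev.h>

/* Hypothetical per-timer context carried as the rte_timer callback arg. */
struct timer_ctx {
	uint8_t dev_id;    /* event device to enqueue into */
	uint8_t port_id;   /* event port owned by the bridge lcore */
	uint8_t queue_id;  /* destination event queue */
	void *payload;     /* the application's "payload pointer" */
};

static volatile bool run = true;

/* Fires from rte_timer_manage() on the bridge lcore; converts the timer
 * expiry into an event of type TIMER. */
static void
timer_expiry_cb(struct rte_timer *tim __rte_unused, void *arg)
{
	struct timer_ctx *ctx = arg;
	struct rte_event ev = {
		.queue_id = ctx->queue_id,
		.event_type = RTE_EVENT_TYPE_TIMER,
		.op = RTE_EVENT_OP_NEW,
		.sched_type = RTE_SCHED_TYPE_ATOMIC,
		.event_ptr = ctx->payload,
	};

	/* A production version would retry or count drops on failure. */
	rte_event_enqueue_burst(ctx->dev_id, ctx->port_id, &ev, 1);
}

/* The bridge driver's run() loop, launched on its dedicated lcore, e.g.
 * via rte_eal_remote_launch(). */
static int
timer_bridge_run(void *arg __rte_unused)
{
	while (run)
		rte_timer_manage();
	return 0;
}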
Some differences include:

- our API doesn't currently support burst-style arming
- our API supports periodic timers (since they come for free from the
  rte_timer lib)

Regarding the implementation you describe in the "Implementation thoughts"
section of your email, our Timer Bridge Driver doesn't have a ring in which
it enqueues timer events.  Instead, rte_event_timer_reset() is mostly a
wrapper for rte_timer_reset().
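In rough terms (the embedded tim and ctx fields, the ticks_to_cycles()
helper, and timer_bridge_lcore_id are illustrative placeholders, and our
actual signature differs in details), the arm path looks like:

int
rte_event_timer_reset(struct rte_event_timer *evtim, uint64_t ticks)
{
	/* Convert the tick count to TSC cycles and install the timer in
	 * the rte_timer skiplist; the bridge driver's expiry callback
	 * later turns the expiry into a TIMER event. */
	return rte_timer_reset(&evtim->tim, ticks_to_cycles(ticks),
			       SINGLE, timer_bridge_lcore_id,
			       timer_expiry_cb, &evtim->ctx);
}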
Because various worker lcores could be modifying timers concurrently, and
they were all headed for the same skiplist, we encountered performance
issues with lock contention for that skiplist.  I modified the timer
library itself in various ways, including adding a multiple-producer
single-consumer ring for requests to modify the skiplists.  I had the best
results, however, when I created per-installer skiplists for each target
lcore.  I submitted the patch to the ML today [1].  I personally saw
contention on the CAS operation to update the head pointer of the ring when
multiple lcores installed timers repeatedly and simultaneously, but perhaps
burst enqueuing can avoid that.

On a separate note, it also looks like attributes of the timer wheel
pertaining to resolution or the max number of timers will have no effect in
the software implementation.  Instead, timer resolution would be a function
of the frequency with which a service core can invoke rte_timer_manage().
Also, it would seem that no limit on the number of timers would be
necessary.  Does that sound right?

In summary, it looks like our solutions align fairly well, and I propose
that we take on the software implementation if there are no objections.

[1] http://dpdk.org/ml/archives/dev/2017-August/073317.html

Thanks,
Gabriel

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> Sent: Thursday, August 17, 2017 11:11 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; Richardson, Bruce; Van Haaren, Harry;
> hemant.agrawal@nxp.com; Eads, Gage; nipun.gupta@nxp.com; Vangati,
> Narender; Rao, Nikhil; pbhagavatula@caviumnetworks.com;
> jianbo.liu@linaro.org; rsanford@akamai.com; Jerin Jacob
> Subject: [dpdk-dev] [RFC PATCH 0/1] eventtimer: introduce event timer
> wheel
>
> Some NPU-class networking hardware has timer hardware that lets the user
> arm and cancel event timers.  On expiry of the timeout, the hardware posts
> the notification as an event to the eventdev HW instead of invoking a
> callback as in CPU-based timer schemes.  This enables high-resolution
> (1us or so) timer management using internal or external clock domains, and
> offloads the timer housekeeping work from the worker lcores.
>
> This RFC attempts to abstract such NPU-class timer hardware and introduce
> an event timer wheel subsystem inside the eventdev, as the two are tightly
> coupled.
>
> This RFC introduces the functionality to create an event timer wheel.
> This allows an application to arm event timers, which shall enqueue an
> event to a specified event queue on expiry of a given interval.
>
> The event timer wheel uses an ops table to which the various event devices
> (e.g. Cavium OCTEONTX, NXP dpaa2 and SW) register timer subsystem
> implementation-specific ops.
>
> The RFC extends the DPDK event-based programming model so that an event
> can be of type timer, and the expiry event will be notified to the CPU
> over eventdev ports.
>
> Some of the use cases of the event timer wheel are beacon timers, generic
> SW timeouts, wireless MAC scheduling, 3G frame protocols, packet
> scheduling, protocol retransmission timers, and supervision timers.
> All these use cases require high resolution and low time drift.
>
> The abstract working model of an event timer wheel is as follows:
> =================================================================
>
>                       timer_tick_ns
>                          +
>              +-------+   |
>              |       |   |
>          +---+ bkt 0 +---v----+
>          |   |       |        |
>          |   +-------+        |
>      +---+---+            +---+---+     +---+---+---+---+
>      |       |            |       |     |   |   |   |   |
>      | bkt n |            | bkt 1 |<--->| t0| t1| t2| tn|
>      |       |            |       |     |   |   |   |   |
>      +---+---+            +---+---+     +---+---+---+---+
>          |     Timer wheel    |
>      +---+---+            +---+---+
>      |       |            |       |
>      | bkt 4 |            | bkt 2 |<--- Current bucket
>      |       |            |       |
>      +---+---+            +---+---+
>          |      +-------+     |
>          |      |       |     |
>          +------+ bkt 3 +-----+
>                 |       |
>                 +-------+
>
> - It has a virtual monotonically increasing 64-bit timer wheel clock based
>   on an *enum rte_event_timer_wheel_clk_src* clock source.  The clock
>   source could be a CPU clock or a platform-dependent external clock.
>
> - The application creates a timer wheel instance with a given clock
>   source, the total number of event timers, and the resolution (expressed
>   in ns) used to traverse between the buckets.
>
> - Each timer wheel may have 0 to n buckets based on the configured max
>   timeout (max_tmo_ns) and resolution (timer_tick_ns).  On timer wheel
>   start, the timer starts ticking at *timer_tick_ns* resolution.
>
> - The application arms an event timer to expire after a given number of
>   *timer_tick_ns* ticks from now.
>
> - The application can cancel an armed timer if required.
>
> - If the timer is not canceled by the application and it expires, the
>   library injects the timer expiry event into the designated event queue.
>
> - The timer expiry event is received through *rte_event_dequeue_burst*.
>
> - The application frees the created timer wheel instance.
>
> A more detailed description of the event timer wheel is contained in the
> header's comments.
>
> Implementation thoughts
> =======================
> The event devices have to provide a driver-level function that is used to
> get the event timer subsystem capability and the respective event timer
> wheel ops.  If the event device is not capable, a software implementation
> of the event timer wheel ops will be selected.
>
> The software implementation of the timer wheel will make use of the
> existing rte_timer [1] and rte_ring libraries and EAL service cores [2] to
> achieve event generation.  The worker cores call the event timer arm
> function, which enqueues the event timer to an rte_ring.  The registered
> service core then dequeues event timers from the rte_ring and uses the
> rte_timer library to register a timer.  The service core then invokes
> rte_timer_manage() to retrieve expired timers and generate the associated
> events.
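>
> A minimal sketch of that service core function, assuming the rte_ring,
> rte_timer and service core APIs (arm_ring, the impl_tim field, the
> to_cycles() helper, and the sw_expiry_cb() callback are placeholder
> names, not a settled design):
>
> static struct rte_ring *arm_ring; /* filled in at wheel create time */
>
> static int32_t
> sw_wheel_service(void *args __rte_unused)
> {
> 	struct rte_event_timer *evtim;
>
> 	/* Drain arm requests that worker cores enqueued, registering each
> 	 * one with the rte_timer library on this (service) lcore. */
> 	while (rte_ring_sc_dequeue(arm_ring, (void **)&evtim) == 0)
> 		rte_timer_reset(&evtim->impl_tim,
> 				to_cycles(evtim->timeout_ticks), SINGLE,
> 				rte_lcore_id(), sw_expiry_cb, evtim);
>
> 	/* Run callbacks for expired timers; sw_expiry_cb() enqueues the
> 	 * corresponding TIMER events to the event device. */
> 	rte_timer_manage();
> 	return 0;
> }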
>
> The implementation of the event timer wheel subsystem for both hardware
> (Cavium OCTEONTX) and software (if there are no volunteers) will be
> undertaken by Cavium.
>
> [1] http://dpdk.org/doc/guides/prog_guide/timer_lib.html
> [2] http://dpdk.org/ml/archives/dev/2017-May/065207.html
>
> An example code snippet to show the proposed API usage
> ======================================================
> Example: TCP retransmission in abstract form.
>
> uint8_t
> configure_event_dev(...)
> {
> 	/* Create the event device. */
> 	const struct rte_event_dev_config config = {
> 		.nb_event_queues = 1,
> 		/* Event device related configuration. */
> 		...
> 	};
>
> 	rte_event_dev_configure(event_dev_id, &config);
> 	/* Event queue and port configuration. */
> 	...
> 	/* Start the event device. */
> 	rte_event_dev_start(event_dev_id);
> }
>
> #define NSECPERSEC 1E9 // Number of ns in 1 sec
> uint8_t
> configure_event_timer_wheel(...)
> {
> 	/* Create an event timer wheel for reliable connections. */
> 	const struct rte_event_timer_wheel_config wheel_config = {
> 		.event_dev_id = event_dev_id,
> 		.timer_wheel_id = 0,
> 		.clk_src = RTE_EVENT_TIMER_WHEEL_CPU_CLK,
> 		.timer_tick_ns = NSECPERSEC / 10, // 100 milliseconds
> 		.max_tmo_nsec = 180 * NSECPERSEC, // 3 minutes
> 		.nb_timers = 40000, // Number of timers the wheel can hold.
> 		.timer_wheel_flags = 0,
> 	};
> 	struct rte_event_timer_wheel *wheel = NULL;
>
> 	wheel = rte_event_timer_wheel_create(&wheel_config);
> 	if (wheel == NULL) {
> 		/* Failed to create event timer wheel. */
> 		...
> 		return false;
> 	}
> 	/* Start the event timer wheel. */
> 	rte_event_timer_wheel_start(wheel);
>
> 	/* Create a mempool of event timers. */
> 	struct rte_mempool *event_timer_pool = NULL;
>
> 	event_timer_pool = rte_mempool_create("event_timer_mempool", SIZE,
> 			sizeof(struct rte_event_timer), ...);
> 	if (event_timer_pool == NULL) {
> 		/* Failed to create event timer mempool. */
> 		...
> 		return false;
> 	}
> }
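>
> (With this configuration, the bucket count works out to
> max_tmo_nsec / timer_tick_ns = 180 s / 100 ms = 1800 buckets, and the
> 30-tick timeout armed below corresponds to 30 * 100 ms = 3 seconds.)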
> uint8_t
> process_tcp_data_packet(...)
> {
> 	/* Classify based on type. */
> 	switch (...) {
> 	case ...:
> 		/* Setting up a new connection (protocol dependent). */
> 		...
> 		/* Setting up a new event timer. */
> 		conn->timer = NULL;
> 		rte_mempool_get(event_timer_pool, (void **)&conn->timer);
> 		if (conn->timer == NULL) {
> 			/* Failed to get an event timer instance. */
> 			/* Tear down the connection. */
> 			return false;
> 		}
>
> 		/* Set up the timer event. */
> 		conn->timer->ev.event_ptr = conn;
> 		conn->timer->ev.queue_id = event_queue_id;
> 		...
> 		/* All necessary resources successfully allocated. */
> 		/* Compute the timer timeout in ticks. */
> 		conn->timer->timeout_ticks = 30; // 3 sec per RFC 1122 (TCP retransmission)
> 		/* Arm the timer with our timeout. */
> 		ret = rte_event_timer_arm_burst(wheel, &conn->timer, 1);
> 		if (ret != 1) {
> 			/* Check return value for too early or too late
> 			 * expiration tick. */
> 			...
> 			return false;
> 		}
> 		return true;
> 	case ...:
> 		/* An ack for the previous TCP data packet has been
> 		 * received; cancel the retransmission timer. */
> 		rte_event_timer_cancel_burst(wheel, &conn->timer, 1);
> 		break;
> 	}
> }
>
> uint8_t
> process_timer_event(...)
> {
> 	/* A retransmission timeout for the connection has been received. */
> 	conn = ev.event_ptr;
> 	/* Retransmit the last packet (e.g. TCP segment). */
> 	...
> 	/* Re-arm the timer using the original values. */
> 	rte_event_timer_arm_burst(wheel, &conn->timer, 1);
> }
>
> void
> events_processing_loop(...)
> {
> 	while (...) {
> 		/* Receive events from the configured event port. */
> 		rte_event_dequeue_burst(event_dev_id, event_port, &ev, 1, 0);
> 		...
> 		/* Classify events based on event_type. */
> 		switch (ev.event_type) {
> 		case RTE_EVENT_TYPE_ETHDEV:
> 			...
> 			process_packets(...);
> 			break;
> 		case RTE_EVENT_TYPE_TIMER:
> 			process_timer_event(ev);
> 			...
> 			break;
> 		}
> 	}
> }
>
> int main()
> {
> 	configure_event_dev();
> 	configure_event_timer_wheel();
> 	on_each_worker_lcores(events_processing_loop())
> }
>
> Jerin Jacob (1):
>   eventtimer: introduce event timer wheel
>
>  doc/api/doxy-api-index.md                   |   3 +-
>  lib/librte_eventdev/Makefile                |   1 +
>  lib/librte_eventdev/rte_event_timer_wheel.h | 493 ++++++++++++++++++++++++++++
>  lib/librte_eventdev/rte_eventdev.h          |   4 +-
>  4 files changed, 498 insertions(+), 3 deletions(-)
>  create mode 100644 lib/librte_eventdev/rte_event_timer_wheel.h
>
> --
> 2.14.1