From: "Pathak, Pravin"
To: "Trahe, Fiona", Changchun Zhang, "users@dpdk.org"
CC: "Trahe, Fiona"
Date: Fri, 18 Jan 2019 14:29:04 +0000
Subject: Re: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK
Message-ID: <168A68C163D584429EF02A476D5274424DEA9B7C@FMSMSX108.amr.corp.intel.com>

Hi Alex -

-----Original Message-----
From: users [mailto:users-bounces@dpdk.org] On Behalf Of Trahe, Fiona
Sent: Friday, January 18, 2019 8:14 AM
To: Changchun Zhang; users@dpdk.org
Cc: Trahe, Fiona
Subject: Re: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK

Hi Alex,

> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Changchun Zhang
> Sent: Thursday, January 17, 2019 11:01 PM
> To: users@dpdk.org
> Subject: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK
>
> Hi,
>
> I have a user question on using the QAT device in DPDK.
>
> In a real design, after calling enqueue_burst() on the specified queue pair on one of the lcores, which of the following is usually done?
>
> 1. Should we do run-to-completion, calling dequeue_burst() and waiting for the device to finish the crypto operation,
>
> 2. or should we do pipe-line, in which we return right after enqueue_burst() and release the CPU.
> And call dequeue_burst() from another thread function?
>
> Option 1 is more synchronous and can be seen in all the DPDK crypto examples, while option 2 is asynchronous, which I have never seen in any reference design, unless I missed something.

[Fiona]
Option 2 is not possible with QAT - the dequeue must be called in the same thread as the enqueue. This is optimised without atomics for best performance - if this is a problem let us know.
However, best performance does not come from option 1 used as a synchronous blocking method either.
If you enqueue and then go straight to dequeue, you're not getting the best advantage from the cycles freed up by offloading.
I.e. it is best to enqueue a burst, then go do some other work, like collecting more requests for the next enqueue or other processing, then dequeue. Take and process whatever ops are dequeued - this will not necessarily match the number you've enqueued - it depends on how quickly you call the dequeue.
Don't wait until all the enqueued ops are dequeued before enqueuing the next batch.
So it's asynchronous, but in the same thread.
You'll get the best throughput when you keep the input filled up, so the device has operations to work on, and regularly dequeue a burst. Dequeuing too often wastes cycles on the overhead of calling the API; dequeuing too slowly causes the device to back up. Ideally, tune for your application to find the sweet spot between these 2 extremes.

[Pravin]
I faced the exact same issue while moving from software crypto to HW. I implemented the option Fiona suggested.
The thread enqueues to the crypto engine and goes back to other work. It periodically polls the crypto device to see if work is finished.
As we have a single thread running, it keeps enqueuing as work arrives and dequeuing as results are ready, while doing other work in between.
To keep track of packets, I put an ID into the crypto operation's private data.
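
For illustration, here is a minimal sketch of the single-thread pattern described above. It is not taken from any DPDK example. In real code the calls would be rte_cryptodev_enqueue_burst() and rte_cryptodev_dequeue_burst(); here they are replaced by hypothetical mock_* functions over a small in-memory ring so the loop can be compiled and run without a QAT device. The names mock_op, mock_qp, mock_device_tick, poll_loop, QP_DEPTH and BURST are all assumptions made for this sketch, and the "id" field only stands in for the ID stored in the crypto op's private data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QP_DEPTH 8u   /* mock queue-pair depth (assumption) */
#define BURST    4u   /* ops per enqueue/dequeue call (assumption) */

/* Hypothetical stand-in for a crypto op; `id` plays the role of the
 * tracking ID kept in the op's private data. */
struct mock_op { uint32_t id; };

/* Hypothetical mock queue pair: a FIFO plus a count of finished ops. */
struct mock_qp {
    struct mock_op *ring[QP_DEPTH];
    uint32_t head;   /* next op to dequeue */
    uint32_t tail;   /* next free slot to enqueue into */
    uint32_t ready;  /* ops at the head that the "device" has finished */
};

/* Accepts as many ops as fit; like the real API, may take fewer than nb. */
static uint16_t mock_enqueue_burst(struct mock_qp *qp, struct mock_op **ops,
                                   uint16_t nb)
{
    uint16_t i = 0;
    while (i < nb && qp->tail - qp->head < QP_DEPTH) {
        qp->ring[qp->tail % QP_DEPTH] = ops[i++];
        qp->tail++;
    }
    return i;
}

/* Simulates the device finishing up to n more in-flight ops. */
static void mock_device_tick(struct mock_qp *qp, uint32_t n)
{
    uint32_t inflight = qp->tail - qp->head;
    qp->ready = (qp->ready + n > inflight) ? inflight : qp->ready + n;
}

/* Returns only the ops that are already finished, possibly zero. */
static uint16_t mock_dequeue_burst(struct mock_qp *qp, struct mock_op **ops,
                                   uint16_t nb)
{
    uint16_t i = 0;
    while (i < nb && qp->ready > 0) {
        ops[i++] = qp->ring[qp->head % QP_DEPTH];
        qp->head++;
        qp->ready--;
    }
    return i;
}

/* One thread: enqueue a burst, do other work, dequeue whatever is ready.
 * seen[id] is incremented once per completed op. Returns ops completed. */
static uint32_t poll_loop(struct mock_qp *qp, struct mock_op *ops,
                          uint32_t total, uint8_t *seen)
{
    uint32_t submitted = 0, finished = 0;

    for (uint32_t i = 0; i < total; i++)
        ops[i].id = i;

    while (finished < total) {
        struct mock_op *tx[BURST], *rx[BURST];
        uint32_t want = total - submitted;
        if (want > BURST)
            want = BURST;
        for (uint32_t i = 0; i < want; i++)
            tx[i] = &ops[submitted + i];

        /* Only advance by what was accepted; the rest is retried later. */
        submitted += mock_enqueue_burst(qp, tx, (uint16_t)want);

        /* "Other work" would go here; the device progresses meanwhile. */
        mock_device_tick(qp, 3);

        /* Take whatever finished; this need not match what was enqueued. */
        uint16_t n = mock_dequeue_burst(qp, rx, BURST);
        for (uint16_t i = 0; i < n; i++) {
            assert(rx[i]->id < total);
            seen[rx[i]->id]++;
        }
        finished += n;
    }
    return finished;
}
```

Calling poll_loop() with a zero-initialised mock_qp, an array of ops and a zeroed seen[] array runs the loop until every op has come back. The loop never blocks on a particular op, and the enqueue for the next burst is issued before earlier ops have all been dequeued. How much "other work" to do between the two calls is the tuning knob mentioned above.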