From: "Trahe, Fiona"
To: Changchun Zhang, "users@dpdk.org"
CC: "Trahe, Fiona"
Date: Fri, 18 Jan 2019 13:13:43 +0000
Subject: Re: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK
List-Id: DPDK usage discussions

Hi Alex,

> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Changchun Zhang
> Sent: Thursday, January 17, 2019 11:01 PM
> To: users@dpdk.org
> Subject: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK
>
> Hi,
>
> I have a user question on using the QAT device in DPDK.
>
> In a real design, after calling enqueue_burst() on the specified queue pair from one of the lcores,
> which of the following is usually done?
>
> 1. Run-to-completion: call dequeue_burst() and wait for the device to finish the crypto operation,
>
> 2. or pipelining: return right after enqueue_burst() and release the CPU, then call dequeue_burst()
>    from another thread?
>
> Option 1 is more like synchronous and can be seen in all the DPDK crypto examples, while option 2 is
> asynchronous, which I have never seen in any reference design, unless I missed something.
[Fiona]
Option 2 is not possible with QAT - the dequeue must be called in the same thread as the enqueue.
This is optimised without atomics for best performance - if this is a problem let us know.
However, best performance is not quite option 1 either, and not a synchronous blocking method.
If you enqueue and then go straight to dequeue, you're not getting the best advantage from the
cycles freed up by offloading. I.e. it's best to enqueue a burst, then go do some other work, like
maybe collecting more requests for the next enqueue or other processing, then dequeue. Take and
process whatever ops are dequeued - this will not necessarily match the number you've enqueued;
it depends on how quickly you call the dequeue. Don't wait until all the enqueued ops are dequeued
before enqueuing the next batch. So it's asynchronous, but in the same thread.
You'll get the best throughput when you keep the input filled up so the device has operations to
work on, and regularly dequeue a burst. Dequeuing too often will waste cycles in the overhead of
calling the API; dequeuing too slowly will cause the device to back up. Ideally, tune your
application to find the sweet spot between these two extremes.
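
To make this concrete, below is a minimal sketch (in C, against the standard rte_cryptodev API)
of the single-threaded asynchronous loop described above. The keep_running(), build_requests(),
requeue_or_drop(), do_other_work() and process_completed() helpers, plus BURST_SIZE, are
hypothetical application-side placeholders, not DPDK functions; only
rte_cryptodev_enqueue_burst() and rte_cryptodev_dequeue_burst() are real API calls.

    #include <rte_crypto.h>
    #include <rte_cryptodev.h>

    #define BURST_SIZE 32  /* hypothetical burst size - tune for your workload */

    /* Application-specific helpers (hypothetical placeholders). */
    extern int keep_running(void);
    extern uint16_t build_requests(struct rte_crypto_op **ops, uint16_t n);
    extern void requeue_or_drop(struct rte_crypto_op **ops, uint16_t n);
    extern void do_other_work(void);
    extern void process_completed(struct rte_crypto_op **ops, uint16_t n);

    /* Enqueue and dequeue on the SAME queue pair from the SAME thread,
     * without blocking between the two calls. */
    static void
    crypto_lcore_loop(uint8_t dev_id, uint16_t qp_id)
    {
            struct rte_crypto_op *enq_ops[BURST_SIZE];
            struct rte_crypto_op *deq_ops[BURST_SIZE];

            while (keep_running()) {
                    /* Gather the next batch of crypto requests. */
                    uint16_t nb_new = build_requests(enq_ops, BURST_SIZE);

                    /* Hand the burst to the PMD; it may accept fewer ops
                     * than offered if the hardware ring is full. */
                    uint16_t nb_enq = rte_cryptodev_enqueue_burst(dev_id,
                                    qp_id, enq_ops, nb_new);
                    if (nb_enq < nb_new)
                            requeue_or_drop(&enq_ops[nb_enq], nb_new - nb_enq);

                    /* Use the cycles freed up by the offload rather than
                     * spinning on the dequeue. */
                    do_other_work();

                    /* Poll for completions: the count returned is whatever
                     * is ready now and need not match what was enqueued. */
                    uint16_t nb_deq = rte_cryptodev_dequeue_burst(dev_id,
                                    qp_id, deq_ops, BURST_SIZE);
                    process_completed(deq_ops, nb_deq);
            }
    }

Because the enqueue and dequeue happen on the same queue pair from the same lcore, the queue pair
needs no locking or atomics, which is exactly the constraint the QAT PMD relies on for performance.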