From: "Doherty, Declan"
To: Jerin Jacob, "Coyle, David"
Cc: dpdk-dev, "Trahe, Fiona"
Date: Thu, 13 Feb 2020 11:31:02 +0000
Subject: Re: [dpdk-dev] [RFC] Accelerator API to chain packet processing functions
Message-ID: <3912d015-8bf8-f3c7-15cf-6e68c6c7515e@intel.com>
References: <1580827512-178449-1-git-send-email-david.coyle@intel.com>

On 06/02/2020 10:54 AM, Jerin Jacob wrote:
> On Thu, Feb 6, 2020 at 3:35 PM Coyle, David wrote:
>>
>> Hi Jerin,
>
> Hi David,
>
>> Thanks for the comments. Please see replies below.
>>
>> Kind Regards,
>> David
>>
>>> On Tue, Feb 4, 2020 at 8:15 PM David Coyle wrote:
>>>>
>>>> Introduction
>>>> ============
>>>>
>>>> This RFC introduces a new DPDK library, rte_accelerator.
>>>>
>>>> The main aim of this library is to provide a flexible and extensible
>>>> way of combining one or more packet-processing functions into a single
>>>> operation, thereby allowing these to be performed in parallel in
>>>> optimized software libraries or in a hardware accelerator. These
>>>> functions can include cryptography, compression and CRC/checksum
>>>> calculation, while others can potentially be added in the future.
>>>> Performing these functions in parallel as a single operation can
>>>> enable a significant performance improvement.
>>>>
>>>>
>>>> Background
>>>> ==========
>>>>
>>>> There are a number of byte-wise operations which are present and
>>>> common across many access network data-plane pipelines, such as
>>>> Cipher, Authentication, CRC, Bit-Interleaved-Parity (BIP), other
>>>> checksums etc. Some prototyping has been done at Intel in relation to
>>>> the 01.org access-network-dataplanes project to prove that a
>>>> significant performance improvement is possible when such byte-wise
>>>> operations are combined into a single pass of packet data processing.
>>>> This performance boost has been prototyped for both XGS-PON MAC
>>>> data-plane and DOCSIS MAC data-plane pipelines.
>>>
>>>
>>> Could you share the relative performance numbers to show the gain?
>>
>> [DC] As mentioned above, the main performance gains are when the packet
>> processing operations can be combined into a single pass of the packet.
>> Both Crypto-CRC-BIP (for XGS-PON MAC) and Crypto-CRC (for DOCSIS MAC)
>> have been implemented in the AESNI MB library as single pass operation
>> chains.
>>
>> We have modified the dpdk-crypto-perf-tester as part of our prototyping
>> to test the cases where:
>> 1) each packet processing function is done as an independent stage (e.g.
>>    calling rte_net_crc for CRC, AESNI MB through rte_cryptodev for
>>    cipher, and a C function to calculate the BIP)
>> 2) all packet processing functions done as a single-pass operation in
>>    AESNI MB through rte_cryptodev
>>
>> We see the following results for 1024 byte input frames from
>> dpdk-crypto-perf-tester:
>> - XGS-PON MAC (Crypto-CRC-BIP):
>>     - 3 independent stages: 1429 cycles/buf (13.75Gbps)
>>     - 1 single-pass stage: 896 cycles/buf (21.9Gbps)
>>     37% cycle reduction
>>
>> - DOCSIS MAC (Crypto-CRC):
>>     - 2 independent stages: 1421 cycles/buf (13.84Gbps)
>>     - 1 single-pass stage: 1133 cycles/buf (17.34Gbps)
>>     20% cycle reduction
>>
>> Adding the accelerator API will allow vendors to gain the benefits of
>> these cycle savings
>
> Numbers make sense. I have seen a similar performance improvement
> doing it in one pass with CPU instructions.
>
>
>>>> - XGS-PON MAC: Crypto-CRC-BIP
>>>>     - Order:
>>>>         - Downstream: CRC, Encrypt, BIP
>>>
>>> I understand that if the chain has two operations then it may be
>>> possible to have handcrafted SW code to do both operations in one pass.
>>> I understand the spec is agnostic about the number of passes required
>>> to enable the xform, but to understand the SW/HW capability: in the
>>> above case, "CRC, Encrypt, BIP", is it done in one pass in SW, three
>>> passes in SW, or one pass using HW?
>>
>> [DC] The CRC, Encrypt, BIP is also currently done as 1 pass in AESNI MB
>> library SW.
>> However, this could also be performed as a single pass in a HW
>> accelerator
>
> As a specification, cascading the xform chains makes sense.
> Do we have any HW that supports chaining more than "two" xforms in one
> pass?
> i.e. a real chaining function where two HW blocks work hand in hand
> for chaining.
> If none, it may be better to abstract it as a synchronous API (no
> dequeue, no enqueue) for the CPU use case.
>

Were you thinking along the lines of a synchronous API option like that
just introduced to cryptodev?
i.e. something like

uint16_t rte_accelerator_process(struct rte_accelerator_ctx *ctx,
                                 struct rte_accelerator_op ops[],
                                 uint16_t nb_ops);
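
To make the "no enqueue, no dequeue" idea concrete, here is a minimal sketch of how such a call might behave. Only the function signature comes from the suggestion above. The struct fields, the XOR "cipher" and the additive "checksum" are hypothetical stand-ins invented for illustration; they are not part of DPDK or of the RFC, and they are not real crypto or CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts: the thread does not define these structs. */
struct rte_accelerator_ctx {
    uint8_t xor_key;    /* toy stand-in for a cipher key */
};

struct rte_accelerator_op {
    uint8_t *data;      /* buffer transformed in place */
    size_t len;         /* buffer length in bytes */
    uint32_t checksum;  /* output: toy stand-in for a CRC */
    int status;         /* output: 0 on success, -1 on error */
};

/*
 * Synchronous processing: every op is complete when the call returns,
 * so there is no enqueue/dequeue pair. Returns the number of ops
 * processed successfully, stopping at the first failure.
 */
uint16_t
rte_accelerator_process(struct rte_accelerator_ctx *ctx,
                        struct rte_accelerator_op ops[], uint16_t nb_ops)
{
    uint16_t i;

    if (ctx == NULL || ops == NULL)
        return 0;

    for (i = 0; i < nb_ops; i++) {
        struct rte_accelerator_op *op = &ops[i];
        uint32_t sum = 0;
        size_t j;

        if (op->data == NULL) {
            op->status = -1;
            break;
        }
        /* Single pass: each byte is checksummed, then "encrypted". */
        for (j = 0; j < op->len; j++) {
            sum += op->data[j];
            op->data[j] ^= ctx->xor_key;
        }
        op->checksum = sum;
        op->status = 0;
    }
    return i;
}

/* Example caller: two buffers handled by one blocking call. */
static inline uint16_t
example_usage(void)
{
    uint8_t a[4] = {1, 2, 3, 4};
    uint8_t b[2] = {0xff, 0x00};
    struct rte_accelerator_ctx ctx = { .xor_key = 0x5a };
    struct rte_accelerator_op ops[2] = {
        { .data = a, .len = sizeof(a) },
        { .data = b, .len = sizeof(b) },
    };
    uint16_t n = rte_accelerator_process(&ctx, ops, 2);

    /* Results are readable immediately; nothing to poll for. */
    assert(n == 2 && ops[0].status == 0 && ops[1].status == 0);
    return n;
}
```

The point of the sketch is the calling convention: the caller reads each op's status and output as soon as the function returns, which suits a CPU implementation that does the whole chain in one pass over the data. A HW-backed implementation would more likely still want an asynchronous enqueue/dequeue path.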