From: Jerin Jacob
Date: Thu, 6 Feb 2020 16:24:54 +0530
To: "Coyle, David"
Cc: dpdk-dev, "Doherty, Declan", "Trahe, Fiona"
Subject: Re: [dpdk-dev] [RFC] Accelerator API to chain packet processing functions
References: <1580827512-178449-1-git-send-email-david.coyle@intel.com>

On Thu, Feb 6, 2020 at 3:35 PM Coyle, David wrote:
>
> Hi Jerin,

Hi David,

> Thanks for the comments. Please see replies below.
>
> Kind Regards,
> David
>
> > On Tue, Feb 4, 2020 at 8:15 PM David Coyle wrote:
> > >
> > > Introduction
> > > ============
> > >
> > > This RFC introduces a new DPDK library, rte_accelerator.
> > >
> > > The main aim of this library is to provide a flexible and extensible way of
> > > combining one or more packet-processing functions into a single operation,
> > > thereby allowing these to be performed in parallel in optimized software
> > > libraries or in a hardware accelerator. These functions can include
> > > cryptography, compression and CRC/checksum calculation, while others can
> > > potentially be added in the future. Performing these functions in parallel as a
> > > single operation can enable a significant performance improvement.
> > >
> > >
> > > Background
> > > ==========
> > >
> > > There are a number of byte-wise operations which are present and
> > > common across many access network data-plane pipelines, such as Cipher,
> > > Authentication, CRC, Bit-Interleaved-Parity (BIP), other checksums etc.
> > > Some prototyping has been done at Intel in relation to the 01.org
> > > access-network-dataplanes project to prove that a significant performance
> > > improvement is possible when such byte-wise operations are combined into a
> > > single pass of packet data processing. This performance boost has been
> > > prototyped for both XGS-PON MAC data-plane and DOCSIS MAC data-plane pipelines.
> > >
> >
> > Could you share the relative performance numbers to show the gain?
>
> [DC] As mentioned above, the main performance gains are when the packet processing operations can be combined into a single pass of the packet.
> Both Crypto-CRC-BIP (for XGS-PON MAC) and Crypto-CRC (for DOCSIS MAC) have been implemented in the AESNI MB library as single pass operation chains.
>
> We have modified the dpdk-crypto-perf-tester as part of our prototyping to test the cases where:
> 1) each packet processing function is done as an independent stage (e.g. calling rte_net_crc for CRC, AESNI MB through rte_cryptodev for cipher, and a C function to calculate the BIP)
> 2) all packet processing functions are done as a single-pass operation in AESNI MB through rte_cryptodev
>
> We see the following results for 1024 byte input frames from dpdk-crypto-perf-tester:
> - XGS-PON MAC (Crypto-CRC-BIP):
>     - 3 independent stages: 1429 cycles/buf (13.75Gbps)
>     - 1 single-pass stage: 896 cycles/buf (21.9Gbps)
>     37% cycle reduction
>
> - DOCSIS MAC (Crypto-CRC):
>     - 2 independent stages: 1421 cycles/buf (13.84Gbps)
>     - 1 single-pass stage: 1133 cycles/buf (17.34Gbps)
>     20% cycle reduction
>
> Adding the accelerator API will allow vendors to gain the benefits of these cycle savings

Numbers make sense. I have seen a similar performance improvement doing it in one pass with CPU instructions.

> > > - XGS-PON MAC: Crypto-CRC-BIP
> > >     - Order:
> > >         - Downstream: CRC, Encrypt, BIP
> >
> > I understand if the chain has two operations then it may be possible to have
> > handcrafted SW code to do both operations in one pass.
> > I understand the spec is agnostic on the number of passes it requires to
> > enable the xform, but to understand the SW/HW capability: in the above
> > case, "CRC, Encrypt, BIP", is it done in one pass in SW, three passes in SW,
> > or one pass using HW?
>
> [DC] The CRC, Encrypt, BIP is also currently done as 1 pass in AESNI MB library SW.
> However, this could also be performed as a single pass in a HW accelerator

As a specification, cascading the xform chains makes sense. Do we have any
HW that supports chaining more than "two" xforms in one pass? i.e. real
chaining, where two blocks of HW work hand in hand.
If none, it may be better to abstract it as a synchronous API (no dequeue, no enqueue)
for the CPU use case.