From: Jerin Jacob
Date: Fri, 22 Oct 2021 19:09:52 +0530
Subject: Re: [dpdk-dev] [RFC PATCH 0/1] Dataplane Workload Accelerator library
To: Elena Agostini
Cc: Thomas Monjalon, dpdk-dev <dev@dpdk.org>, et al.
Linville" , "Wiles, Keith" , Kiran Kumar K , Lijun Ou , Liron Himi , NBU-Contact-longli , Marcin Wojtas , Martin Spinler , Matan Azrad , Matt Peters , Maxime Coquelin , Michal Krawczyk , "Min Hu (Connor" , Pradeep Kumar Nalla , Nithin Dabilpuram , Qiming Yang , Qi Zhang , Radha Mohan Chintakuntla , Rahul Lakkireddy , Rasesh Mody , Rosen Xu , Sachin Saxena , Satha Koteswara Rao Kottidi , Shahed Shaikh , Shai Brandes , Shepard Siegel , Somalapuram Amaranath , Somnath Kotur , Stephen Hemminger , Steven Webster , Sunil Kumar Kori , Tetsuya Mukawa , Veerasenareddy Burru , Slava Ovsiienko , Xiao Wang , Xiaoyun Wang , Yisen Zhuang , Yong Wang , Ziyang Xuan , Prasun Kapoor , "nadavh@marvell.com" , Satananda Burla , Narayana Prasad , Akhil Goyal , Ray Kinsella , Dmitry Kozlyuk , Anatoly Burakov , Cristian Dumitrescu , Honnappa Nagarahalli , =?UTF-8?Q?Mattias_R=C3=B6nnblom?= , "Ruifeng Wang (Arm Technology China)" , David Christensen , "Ananyev, Konstantin" , Olivier Matz , "Jayatheerthan, Jay" , Ashwin Sekhar Thalakalath Kottilveetil , Pavan Nikhilesh , David Marchand , "tom@herbertland.com" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Subject: Re: [dpdk-dev] [RFC PATCH 0/1] Dataplane Workload Accelerator library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Fri, Oct 22, 2021 at 5:30 PM Elena Agostini wrote= : > > On Tue, Oct 19, 2021 at 21:36 Jerin Jacob wrote: > > > > > On Wed, Oct 20, 2021 at 12:38 AM Thomas Monjalon = wrote: > > > > > > > > 19/10/2021 20:14, jerinj@marvell.com: > > > > > Definition of Dataplane Workload Accelerator > > > > > -------------------------------------------- > > > > > Dataplane Workload Accelerator(DWA) typically contains a set of CPU= s, > > > > > Network controllers and programmable data acceleration engines for > > > > > packet processing, cryptography, regex engines, baseband processing= , etc. > > > > > This allows DWA to offload compute/packet processing/baseband/ > > > > > cryptography-related workload from the host CPU to save the cost an= d power. > > > > > Also to enable scaling the workload by adding DWAs to the Host CPU = as needed. > > > > > > > > > > Unlike other devices in DPDK, the DWA device is not fixed-function > > > > > due to the fact that it has CPUs and programmable HW accelerators. > > > > > This enables DWA personality/workload to be completely programmable= . > > > > > Typical examples of DWA offloads are Flow/Session management, > > > > > Virtual switch, TLS offload, IPsec offload, l3fwd offload, etc. > > > > > > > > If I understand well, the idea is to abstract the offload > > > > of some stack layers in the hardware. > > > > > > Yes. It may not just HW, For expressing the complicated workloads > > > may need CPU and/or other HW accelerators. > > > > > > > I am not sure we should give an API for such stack layers in DPDK. > > > > > > Why not? > > > > > > > It looks to be the role of the dataplane application to finely manage > > > > how to use the hardware for a specific dataplane. > > > > > > It is possible with this scheme. > > > > > > > I believe the API for such layer would be either too big, or too limi= ted, > > > > or not optimized for specific needs. > > > > > > It will be optimized for specific needs as applications ask for what to= do? > > > not how to do? 
> > >
> > > If we really want to automate or abstract the HW/SW co-design,
> > > I think we should better look at compiler work like P4 or PANDA.
> >
> > The compiler stuff is very static in nature. It can address packet
> > transformation workloads, but not ones like IPsec or baseband offload.
> > Another way to look at it: the GPU RFC started just because you are
> > not able to express all the workloads in P4.
>
> That's not the purpose of the GPU RFC.
> The gpudev library's goal is to enhance the dialog between GPU, CPU and
> NIC, offering the possibility to:
>
> - Make DPDK aware of non-CPU memory like device memory (e.g. similarly
>   to what happened with MPI)
> - Hide some GPU-library-specific memory management implementation details
> - Reduce the gap between network activity and device activity
>   (e.g. receive/send packets directly using device memory)
> - Reduce the gap between CPU activity and application-defined GPU workloads
> - Be open to the capability to interact with the GPU device, not manage it

Agree. I am not advocating P4 as the replacement for gpudev or DWA. If
someone thinks that is possible, it would be great to see how to express a
complex workload like TLS offload or an ORAN 7.2 split high-PHY baseband
offload in it.

Could you give more details on "interact with the GPU device, not manage
it"? What do you mean by managing it, and what is this RFC doing to
manage it?

> The gpudev library can easily be embedded in any GPU-specific
> application with relatively small effort.
> The application can allocate, communicate and manage memory with the
> device transparently through DPDK.

See below.

> What you are providing here is different and out of the scope of the
> gpudev library: controlling and managing the workload submission of
> possibly any accelerator device, hiding a lot of implementation details
> within DPDK.

No, it has both a control plane and a user plane, which also allows an
implementation to allocate, communicate and manage memory with the device
transparently through DPDK via user actions. TLV messages can be at any
level: we can define profiles at a lower or higher level based on what
feature we need to offload, or chain multiple small profiles to create
complex workloads (see the sketch below).

> A wrapper for accelerator-device-specific libraries, and I think that
> it's too far to be realistic.
> As a GPU user, I don't want to delegate my tasks to DWA because it
> can't be fully optimized, updated to the latest GPU-specific feature,
> etc.

The DWA is the GPU. Tasks are expressed in a generic representation, so
they can be optimized for a GPU/DPU/IPU based on accelerator specifics.

> Additionally, a generic DWA won't work for a GPU:
> - Memory copies from DWA to CPU / CPU to DWA are latency-expensive.
>   Packets can be received directly in device memory.

No copy is involved. The host port is just an abstract model; you can use
plain shared memory underneath. Also, if you look at the RFC, we can add
new host ports that are specific to the transport category (Ethernet,
PCIe, shared memory).

> - When launching multiple processing blocks, efficiency may be
>   compromised.

How are you avoiding that with gpudev? The same logic can be moved to the
driver implementation, right?
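To make the TLV and profile-chaining point above concrete, here is the
promised sketch. The message layout and every name in it (struct dwa_tlv,
HYP_TLV_*, dwa_tlv_enqueue) are made up for illustration; the RFC's real
format may differ:

  /* Sketch only: one TLV format carrying both control-plane requests
   * (program a route) and user-plane data (packets). */
  #include <stdint.h>
  #include <string.h>

  #define HYP_TLV_ROUTE_ADD 1   /* control plane: add a route */
  #define HYP_TLV_PKT       2   /* user plane: packet to/from host */

  struct dwa_tlv {
          uint16_t type;        /* HYP_TLV_* */
          uint16_t len;         /* length of payload[] in bytes */
          uint8_t payload[];
  };

  /* hypothetical queue op shared by both planes */
  int dwa_tlv_enqueue(uint16_t dev, const struct dwa_tlv *msg);

  static int add_route(uint16_t dev, uint32_t prefix, uint8_t depth)
  {
          union {
                  struct dwa_tlv hdr;
                  uint8_t raw[sizeof(struct dwa_tlv) + 5];
          } m;

          m.hdr.type = HYP_TLV_ROUTE_ADD;
          m.hdr.len = 5;
          memcpy(m.hdr.payload, &prefix, sizeof(prefix));
          m.hdr.payload[4] = depth;
          /* The same enqueue path would carry HYP_TLV_PKT user-plane
           * traffic, so no accelerator-specific API is exposed. */
          return dwa_tlv_enqueue(dev, &m.hdr);
  }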
> I don't actually see a real comparison between gpudev and DWA.
> If in the future we'll expose some GPU workloads through the gpudev
> library, it will be for some network-specific and well-defined problems.

How do you want to represent a "network-specific" and "well-defined"
problem from the application's PoV? The problem I am trying to address is
that if every vendor expresses the workload in an accelerator-specific
fashion, then we need N libraries and N versions of the application code
to solve a single problem. I have provided an example for L3FWD (a rough
host-side sketch is at the end of this mail); it would be good to know
how it cannot map to a GPU. That level of detailed discussion will give
more ideas than staying at the abstract level. Or you can take up a
workload that can NOT be expressed with the DWA RFC; that would help to
understand the gap.

I think the technical board/DPDK community needs to decide the direction
on the following questions:

1) Agree/disagree on the need for workload offload accelerator support
in DPDK.

2) Do we need to expose accelerator-specific workload libraries (i.e.
separate libraries for GPU, DPU, etc.) and let the _DPDK_ application
deal with an accelerator-specific API for the workload? If the majority
thinks yes, we can have a dpudev library in addition to gpudev;
basically, that removes the profile concept from this RFC.

3) Allow both accelerator-specific libraries and the DWA kind of model,
and let applications pick the model they want.
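For reference, a rough sketch of the host-side view of that L3FWD
profile, which stays the same whether a GPU, DPU or IPU sits underneath.
It reuses the hypothetical struct dwa_tlv and HYP_TLV_PKT from the sketch
above; none of these names are the RFC's actual API:

  #include <stdint.h>
  #include <stdio.h>

  struct dwa_tlv {
          uint16_t type;
          uint16_t len;
          uint8_t payload[];
  };

  /* hypothetical host-port receive: drain TLV messages from the DWA */
  uint16_t dwa_rx_burst(uint16_t dev, struct dwa_tlv **msgs, uint16_t n);

  static void host_loop(uint16_t dev)
  {
          struct dwa_tlv *msgs[32];

          for (;;) {
                  /* Only the exception path reaches the host: packets the
                   * offloaded L3FWD profile could not forward (route miss,
                   * TTL expired, ...). The fast path never leaves the DWA,
                   * so the loop is identical for a GPU or DPU backend. */
                  uint16_t n = dwa_rx_burst(dev, msgs, 32);
                  for (uint16_t i = 0; i < n; i++)
                          printf("exception pkt: tlv type %u, len %u\n",
                                 msgs[i]->type, msgs[i]->len);
          }
  }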