From: Jerin Jacob
Date: Thu, 16 Sep 2021 09:46:30 +0530
To: "Xueming(Steven) Li"
Cc: NBU-Contact-Thomas Monjalon, "andrew.rybchenko@oktetlabs.ru", "dev@dpdk.org", "ferruh.yigit@intel.com"
References: <20210727034204.20649-1-xuemingl@nvidia.com> <20210811140418.393264-1-xuemingl@nvidia.com> <820d6cbe0ddba612ee9bfd6999feb9a3e8312beb.camel@nvidia.com>
In-Reply-To: <820d6cbe0ddba612ee9bfd6999feb9a3e8312beb.camel@nvidia.com>
Subject: Re: [dpdk-dev] [PATCH v2 01/15] ethdev: introduce shared Rx queue

On Wed, Sep 15, 2021 at 8:15 PM Xueming(Steven) Li wrote:
>
> Hi Jerin,
>
> On Mon, 2021-08-30 at 15:01 +0530, Jerin Jacob wrote:
> > On Sat, Aug 28, 2021 at 7:46 PM Xueming(Steven) Li wrote:
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob
> > > > Sent: Thursday, August 26, 2021 7:58 PM
> > > > To: Xueming(Steven) Li
> > > > Cc: dpdk-dev; Ferruh Yigit; NBU-Contact-Thomas Monjalon;
> > > > Andrew Rybchenko
> > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx queue
> > > >
> > > > On Thu, Aug 19, 2021 at 5:39 PM Xueming(Steven) Li wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob
> > > > > > Sent: Thursday, August 19, 2021 1:27 PM
> > > > > > To: Xueming(Steven) Li
> > > > > > Cc: dpdk-dev; Ferruh Yigit; NBU-Contact-Thomas Monjalon;
> > > > > > Andrew Rybchenko
> > > > > >
> > > > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx queue
> > > > > >
> > > > > > On Wed, Aug 18, 2021 at 4:44 PM Xueming(Steven) Li wrote:
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Jerin Jacob
> > > > > > > > Sent: Tuesday, August 17, 2021 11:12 PM
> > > > > > > > To: Xueming(Steven) Li
> > > > > > > > Cc: dpdk-dev; Ferruh Yigit; NBU-Contact-Thomas Monjalon;
> > > > > > > > Andrew Rybchenko
> > > > > > > >
> > > > > > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx queue
> > > > > > > >
> > > > > > > > On Tue, Aug 17, 2021 at 5:01 PM Xueming(Steven) Li wrote:
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Jerin Jacob
> > > > > > > > > > Sent: Tuesday, August 17, 2021 5:33 PM
> > > > > > > > > > To: Xueming(Steven) Li
> > > > > > > > > > Cc: dpdk-dev; Ferruh Yigit; NBU-Contact-Thomas Monjalon;
> > > > > > > > > > Andrew Rybchenko
> > > > > > > > > >
> > > > > > > > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx queue
> > > > > > > > > >
> > > > > > > > > > On Wed, Aug 11, 2021 at 7:34 PM Xueming Li wrote:
> > > > > > > > > > >
> > > > > > > > > > > In the current DPDK framework, each RX queue is pre-loaded with mbufs for incoming packets. When the number of representors scales out in a switch domain, the memory consumption becomes significant. Most importantly, polling all ports leads to high cache miss, high latency and low throughput.
> > > > > > > > > > >
> > > > > > > > > > > This patch introduces a shared RX queue. Ports with the same configuration in a switch domain can share an RX queue set by specifying a sharing group.
> > > > > > > > > > > Polling any queue using the same shared RX queue receives packets from all member ports. The source port is identified by mbuf->port.
> > > > > > > > > > >
> > > > > > > > > > > The port queue number in a shared group should be identical. The queue index is 1:1 mapped in the shared group.
> > > > > > > > > > >
> > > > > > > > > > > A shared RX queue must be polled on a single thread or core.
> > > > > > > > > > >
> > > > > > > > > > > Multiple groups are supported by group ID.
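For illustration, receiving from a shared Rx queue as described above might look like the minimal sketch below: any member port (e.g. the PF) is polled and packets are demultiplexed by mbuf->port. The burst size and the handle_packet() helper are assumptions for the example, not part of the patch.

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Hypothetical consumer of a received packet; not part of the patch. */
static void handle_packet(uint16_t src_port, struct rte_mbuf *m);

/* Poll one queue of the shared Rx queue group through any member port
 * (e.g. the PF) and demultiplex by the real source port in mbuf->port. */
static void
poll_shared_rxq(uint16_t anchor_port, uint16_t queue_id)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx, i;

	nb_rx = rte_eth_rx_burst(anchor_port, queue_id, pkts, BURST_SIZE);
	for (i = 0; i < nb_rx; i++)
		handle_packet(pkts[i]->port, pkts[i]);
}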
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Xueming Li
> > > > > > > > > > > Cc: Jerin Jacob
> > > > > > > > > > > ---
> > > > > > > > > > > Rx queue object could be used as shared Rx queue object, it's important to clear all queue control callback APIs that use the queue object:
> > > > > > > > > > > https://mails.dpdk.org/archives/dev/2021-July/215574.html
> > > > > > > > > > >
> > > > > > > > > > >  #undef RTE_RX_OFFLOAD_BIT2STR
> > > > > > > > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > > > > > > > > index d2b27c351f..a578c9db9d 100644
> > > > > > > > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > > > > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > > > > > > > @@ -1047,6 +1047,7 @@ struct rte_eth_rxconf {
> > > > > > > > > > >         uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
> > > > > > > > > > >         uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> > > > > > > > > > >         uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> > > > > > > > > > > +       uint32_t shared_group; /**< Shared port group index in switch domain. */
> > > > > > > > > >
> > > > > > > > > > Not able to see anyone setting/creating this group ID in the test application.
> > > > > > > > > > How is this group created?
> > > > > > > > >
> > > > > > > > > Nice catch, the initial testpmd version only supports one default group (0).
> > > > > > > > > All ports that support shared-rxq are assigned to the same group.
> > > > > > > > >
> > > > > > > > > We should be able to change "--rxq-shared" to "--rxq-shared-group" to support groups other than the default.
> > > > > > > > >
> > > > > > > > > To support more groups simultaneously, we need to consider testpmd forwarding stream core assignment: all streams in the same group need to stay on the same core.
> > > > > > > > > It's possible to specify how many ports to increase the group number, but the user must schedule stream affinity carefully - error prone.
> > > > > > > > >
> > > > > > > > > On the other hand, one group should be sufficient for most customers; the doubt is whether it is valuable to support a multiple-groups test.
> > > > > > > >
> > > > > > > > Ack. One group is enough in testpmd.
> > > > > > > >
> > > > > > > > My question was more about who creates this group and how.
> > > > > > > > Shouldn't we need an API to create shared_group? If we do the following, at least, I can think how it can be implemented in SW or other HW.
> > > > > > > >
> > > > > > > > - Create aggregation queue group
> > > > > > > > - Attach multiple Rx queues to the aggregation queue group
> > > > > > > > - Pull the packets from the queue group (which internally fetch from the Rx queues _attached_)
> > > > > > > >
> > > > > > > > Does the above kind of sequence break your representor use case?
> > > > > > >
> > > > > > > Seems more like a set of EAL wrappers. The current API tries to minimize the application effort to adapt shared-rxq.
> > > > > > > - step 1, not sure how important it is to create the group with an API; in rte_flow, a group is created on demand.
> > > > > >
> > > > > > Which rte_flow pattern/action for this?
> > > > >
> > > > > No rte_flow for this, just recalled that the group in rte_flow is not created along with the flow, not via an API.
> > > > > I don't see anything else to create along with the group, just doubt whether it is valuable to introduce a new API set to manage groups.
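For illustration, opting a port's Rx queues into a shared group with the rxconf field added above might look like the following sketch. It assumes the offload flag defined later in this patch, and the port id, queue/descriptor counts and mempool are placeholders supplied by the application.

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Illustrative only: join every Rx queue of @port_id to shared group 0
 * using the rxconf field proposed in this patch. */
static int
setup_shared_rxqs(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_rxd,
		  struct rte_mempool *mb_pool)
{
	struct rte_eth_dev_info info;
	struct rte_eth_rxconf rxconf;
	uint16_t q;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &info);
	if (ret != 0)
		return ret;

	rxconf = info.default_rxconf;
	rxconf.offloads |= RTE_ETH_RX_OFFLOAD_SHARED_RXQ; /* flag added by this patch */
	rxconf.shared_group = 0; /* default group; queue index is 1:1 mapped across members */

	for (q = 0; q < nb_rxq; q++) {
		ret = rte_eth_rx_queue_setup(port_id, q, nb_rxd,
					     rte_eth_dev_socket_id(port_id),
					     &rxconf, mb_pool);
		if (ret != 0)
			return ret;
	}
	return 0;
}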
> > > >
> > > > See below.
> > > >
> > > > > > >
> > > > > > > - step 2, currently, the attaching is done in rte_eth_rx_queue_setup, specifying the offload and group in the rx_conf struct.
> > > > > > > - step 3, define a dedicated API to receive packets from the shared rxq? Looks clear to receive packets from the shared rxq.
> > > > > > >   Currently, the rxq objects in a share group are the same - the shared rxq, so the eth callback eth_rx_burst_t(rxq_obj, mbufs, n) could be used to receive packets from any port in the group, normally the first port (PF) in the group.
> > > > > > >   An alternative way is defining a vdev with the same queue number; copying the rxq objects will make the vdev a proxy of the shared rxq group - this could be a helper API.
> > > > > > >
> > > > > > > Anyway the wrapper doesn't break the use case; the step 3 API is more clear, need to understand how to implement it efficiently.
> > > > > >
> > > > > > Are you doing this feature based on any HW support or is it a pure SW thing? If it is SW, it is better to have just a new vdev, like drivers/net/bonding/. With this we can help aggregate multiple Rxqs across the multiple ports of the same driver.
> > > > >
> > > > > Based on HW support.
> > > >
> > > > In Marvell HW, we do some support. I will outline it here along with some queries on this.
> > > >
> > > > # We need to create some new HW structure for aggregation
> > > > # Connect each Rxq to the new HW structure for aggregation
> > > > # Use rx_burst from the new HW structure.
> > > >
> > > > Could you outline your HW support?
> > > >
> > > > Also, I am not able to understand how this will reduce the memory; at least in our HW we need to create more memory now to deal with this, as we need to deal with a new HW structure.
> > > >
> > > > How does it reduce the memory in your HW? Also, if memory is the constraint, why NOT reduce the number of queues?
> > >
> > > Glad to know that Marvell is working on this, what's the status of the driver implementation?
> > >
> > > In my PMD implementation it's very similar: a new HW object, a shared memory pool, is created to replace the per-rxq memory pool.
> > > A legacy rxq feeds the queue with allocated mbufs as the number of descriptors; now shared rxqs share the same pool, no need to supply mbufs for each rxq, just feed the shared rxq.
> > >
> > > So the memory saving reflects the mbufs per rxq: even with 1000 representors in a shared rxq group, the mbufs consumed are those of one rxq.
> > > In other words, new members in a shared rxq don't allocate new mbufs to feed the rxq, they just share with the existing shared rxq (HW mempool).
> > > The memory required to set up each rxq doesn't change too much, agree.
> >
> > We can ask the application to configure the same mempool for multiple
> > RQ too. Right? If the saving is based on sharing the mempool
> > with multiple RQs.
> >
> > > >
> > > > # Also, I was thinking, one way to avoid the fast path or ABI change would be like this.
> > > >
> > > > # Driver initializes one more eth_dev_ops in the driver as an aggregator ethdev
> > > > # devargs of the new ethdev or a specific API like drivers/net/bonding/rte_eth_bond.h can take the argument (port, queue) tuples which need to be aggregated by the new ethdev port
> > > > # No change in fastpath or ABI is required in this model.
> > >
> > > This could be an option to access the shared rxq. What's the difference of the new PMD?
> >
> > No ABI and fast path changes are required.
>
> > > What's the difference for the PMD driver to create the new device?
> > >
> > > Is it important in your implementation? Does it work with the existing rx_burst API?
> >
> > Yes. It will work with the existing rx_burst API.
>
> The aggregator ethdev required by the user is a port, maybe it is good to add
> a callback for the PMD to prepare a complete ethdev just like creating a
> representor ethdev - the PMD registers the new port internally. If the PMD
> doesn't provide the callback, the ethdev API falls back to initializing an
> empty ethdev by copying the rxq data (shared) and rx_burst API from the source port
> and share group. Actually users can do this fallback themselves or with
> a util API.
>
> IIUC, an aggregator ethdev is not a must; do you think we can continue and
> leave that design to a later stage?

IMO the aggregator ethdev reduces the complexity for the application and hence
avoids any change in the test application etc. IMO, I prefer to take that.
I will leave the decision to the ethdev maintainers.

> >
> > > > > Most users might use the PF in a group as the anchor port for rx burst; the current definition should be easy for them to migrate.
> > > > > But some users might prefer grouping some hot plug/unplugged representors; EAL could provide wrappers, users could do that either way, given the strategy is not complex.
> > > >
> > > > Anyway, welcome any suggestion.
> > > >
> > > > > > > > > > >
> > > > > > > > > > >                 /**
> > > > > > > > > > >                  * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
> > > > > > > > > > >                  * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> > > > > > > > > > > @@ -1373,6 +1374,12 @@ struct rte_eth_conf {
> > > > > > > > > > >  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> > > > > > > > > > >  #define DEV_RX_OFFLOAD_RSS_HASH         0x00080000
> > > > > > > > > > >  #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
> > > > > > > > > > > +/**
> > > > > > > > > > > + * Rx queue is shared among ports in same switch domain to save memory,
> > > > > > > > > > > + * avoid polling each port. Any port in group can be used to receive packets.
> > > > > > > > > > > + * Real source port number saved in mbuf->port field.
> > > > > > > > > > > + */
> > > > > > > > > > > +#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ   0x00200000
> > > > > > > > > > >
> > > > > > > > > > >  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
> > > > > > > > > > >                                  DEV_RX_OFFLOAD_UDP_CKSUM | \
> > > > > > > > > > > --
> > > > > > > > > > > 2.25.1
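For illustration, since the new flag is defined as a per-queue Rx offload, an application would presumably check the reported capability before requesting it. A minimal sketch, assuming only the flag added by this patch:

#include <stdbool.h>
#include <stdint.h>
#include <rte_ethdev.h>

/* Sketch: request the shared Rx queue offload only on ports that
 * report it in their Rx offload capabilities. */
static bool
port_supports_shared_rxq(uint16_t port_id)
{
	struct rte_eth_dev_info info;

	if (rte_eth_dev_info_get(port_id, &info) != 0)
		return false;
	return (info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_SHARED_RXQ) != 0;
}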