From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jerin Jacob
Date: Mon, 30 Aug 2021 15:01:14 +0530
To: "Xueming(Steven) Li"
Cc: dpdk-dev, Ferruh Yigit, NBU-Contact-Thomas Monjalon, Andrew Rybchenko
Subject: Re: [dpdk-dev] [PATCH v2 01/15] ethdev: introduce shared Rx queue
References: <20210727034204.20649-1-xuemingl@nvidia.com> <20210811140418.393264-1-xuemingl@nvidia.com>
List-Id: DPDK patches and discussions

On Sat, Aug 28, 2021 at 7:46 PM Xueming(Steven) Li wrote:
>
> > -----Original Message-----
> > From: Jerin Jacob
> > Sent: Thursday, August 26, 2021 7:58 PM
> > To: Xueming(Steven) Li
> > Cc: dpdk-dev; Ferruh Yigit; NBU-Contact-Thomas Monjalon; Andrew Rybchenko
> > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx queue
> >
> > On Thu, Aug 19, 2021 at 5:39 PM Xueming(Steven) Li wrote:
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob
> > > > Sent: Thursday, August 19, 2021 1:27 PM
> > > > To: Xueming(Steven) Li
> > > > Cc: dpdk-dev; Ferruh Yigit; NBU-Contact-Thomas Monjalon; Andrew Rybchenko
> > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx queue
> > > >
> > > > On Wed, Aug 18, 2021 at 4:44 PM Xueming(Steven) Li wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob
> > > > > > Sent: Tuesday, August 17, 2021 11:12 PM
> > > > > > To: Xueming(Steven) Li
> > > > > > Cc: dpdk-dev; Ferruh Yigit; NBU-Contact-Thomas Monjalon; Andrew Rybchenko
> > > > > >
> > > > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx queue
> > > > > >
> > > > > > On Tue, Aug 17, 2021 at 5:01 PM Xueming(Steven) Li wrote:
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Jerin Jacob
> > > > > > > > Sent: Tuesday, August 17, 2021 5:33 PM
> > > > > > > > To: Xueming(Steven) Li
> > > > > > > > Cc: dpdk-dev; Ferruh Yigit; NBU-Contact-Thomas Monjalon; Andrew Rybchenko
> > > > > > > >
> > > > > > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx queue
> > > > > > > >
> > > > > > > > On Wed, Aug 11, 2021 at 7:34 PM Xueming Li wrote:
> > > > > > > > >
> > > > > > > > > In the current DPDK framework, each RX queue is pre-loaded with
> > > > > > > > > mbufs for incoming packets. When the number of representors
> > > > > > > > > scales out in a switch domain, the memory consumption becomes
> > > > > > > > > significant. More importantly, polling all ports leads to high
> > > > > > > > > cache misses, high latency and low throughput.
> > > > > > > > >
> > > > > > > > > This patch introduces the shared RX queue. Ports with the same
> > > > > > > > > configuration in a switch domain can share an RX queue set by
> > > > > > > > > specifying a sharing group. Polling any queue using the same
> > > > > > > > > shared RX queue receives packets from all member ports. The
> > > > > > > > > source port is identified by mbuf->port.
> > > > > > > > >
> > > > > > > > > The port queue number in a shared group should be identical.
> > > > > > > > > The queue index is 1:1 mapped in the shared group.
> > > > > > > > >
> > > > > > > > > A shared RX queue must be polled on a single thread or core.
> > > > > > > > >
> > > > > > > > > Multiple groups are supported by group ID.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Xueming Li
> > > > > > > > > Cc: Jerin Jacob
> > > > > > > > > ---
> > > > > > > > > The Rx queue object could be used as a shared Rx queue object;
> > > > > > > > > it's important to clear up all queue control callback APIs that
> > > > > > > > > use the queue object:
> > > > > > > > > https://mails.dpdk.org/archives/dev/2021-July/215574.html
> > > > > > > >
> > > > > > > > > #undef RTE_RX_OFFLOAD_BIT2STR
> > > > > > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > > > > > > index d2b27c351f..a578c9db9d 100644
> > > > > > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > > > > > @@ -1047,6 +1047,7 @@ struct rte_eth_rxconf {
> > > > > > > > >         uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
> > > > > > > > >         uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> > > > > > > > >         uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> > > > > > > > > +       uint32_t shared_group; /**< Shared port group index in switch domain. */
> > > > > > > >
> > > > > > > > Not able to see anyone setting/creating this group ID in the test application.
> > > > > > > > How is this group created?
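For concreteness, a minimal sketch (not part of the patch) of how an application might opt one of its Rx queues into the proposed shared group when calling rte_eth_rx_queue_setup(). The shared_group field and the RTE_ETH_RX_OFFLOAD_SHARED_RXQ flag are the ones added by this patch; the helper name, queue index, descriptor count and group value 0 are illustrative assumptions, everything else is existing ethdev API.

#include <errno.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>

/* Sketch only: opt queue 0 of "port_id" into shared group 0, assuming the
 * shared-rxq patch is applied and "mb_pool" was created elsewhere with
 * rte_pktmbuf_pool_create(). Error handling is trimmed for brevity. */
static int
setup_shared_rxq(uint16_t port_id, struct rte_mempool *mb_pool)
{
        struct rte_eth_dev_info dev_info;
        struct rte_eth_rxconf rx_conf;
        int ret;

        ret = rte_eth_dev_info_get(port_id, &dev_info);
        if (ret != 0)
                return ret;

        /* Skip ports that do not advertise the proposed capability. */
        if ((dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_SHARED_RXQ) == 0)
                return -ENOTSUP;

        rx_conf = dev_info.default_rxconf;
        rx_conf.offloads |= RTE_ETH_RX_OFFLOAD_SHARED_RXQ;
        rx_conf.shared_group = 0; /* assumed default group, per the patch */

        /* Queue index 0 is 1:1 mapped across all member ports of the group. */
        return rte_eth_rx_queue_setup(port_id, 0, 512, rte_socket_id(),
                                      &rx_conf, mb_pool);
}

Calling this for every port of the switch domain with the same group value is what would place them into one shared Rx queue group.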
> > > > > > >
> > > > > > > Nice catch, the initial testpmd version only supports one default group (0).
> > > > > > > All ports that support shared-rxq are assigned to the same group.
> > > > > > >
> > > > > > > We should be able to change "--rxq-shared" to "--rxq-shared-group"
> > > > > > > to support groups other than the default.
> > > > > > >
> > > > > > > To support more groups simultaneously, we need to consider testpmd
> > > > > > > forwarding stream core assignment; all streams in the same group
> > > > > > > need to stay on the same core.
> > > > > > > It's possible to specify how many ports to increase the group
> > > > > > > number, but the user must schedule stream affinity carefully - error prone.
> > > > > > >
> > > > > > > On the other hand, one group should be sufficient for most
> > > > > > > customers; the doubt is whether it is valuable to support testing
> > > > > > > multiple groups.
> > > > > >
> > > > > > Ack. One group is enough in testpmd.
> > > > > >
> > > > > > My question was more about who creates this group and how.
> > > > > > Shouldn't we need an API to create the shared_group? If we do the
> > > > > > following, at least I can think of how it can be implemented in SW
> > > > > > or other HW.
> > > > > >
> > > > > > - Create aggregation queue group
> > > > > > - Attach multiple Rx queues to the aggregation queue group
> > > > > > - Pull the packets from the queue group (which internally fetches
> > > > > >   from the Rx queues _attached_)
> > > > > >
> > > > > > Does the above kind of sequence break your representor use case?
> > > > >
> > > > > Seems more like a set of EAL wrappers. The current API tries to
> > > > > minimize the application effort to adapt to shared-rxq.
> > > > > - step 1, not sure how important it is to create the group with an
> > > > >   API; in rte_flow, a group is created on demand.
> > > >
> > > > Which rte_flow pattern/action is this for?
> > >
> > > No rte_flow for this; I just recalled that the group in rte_flow is not
> > > created along with the flow, nor via an API.
> > > I don't see anything else to create along with the group; I just doubt
> > > whether it is valuable to introduce a new API set to manage groups.
> >
> > See below.
> >
> > > > > - step 2, currently, the attaching is done in rte_eth_rx_queue_setup,
> > > > >   specifying the offload and group in the rx_conf struct.
> > > > > - step 3, define a dedicated API to receive packets from the shared
> > > > >   rxq? Looks clear to receive packets from the shared rxq.
> > > > >   Currently, the rxq objects in a share group are the same - the
> > > > >   shared rxq, so the eth callback eth_rx_burst_t(rxq_obj, mbufs, n)
> > > > >   could be used to receive packets from any port in the group,
> > > > >   normally the first port (PF) in the group.
> > > > >   An alternative way is defining a vdev with the same queue number
> > > > >   and copying the rxq objects, which will make the vdev a proxy of
> > > > >   the shared rxq group - this could be a helper API.
> > > > >
> > > > > Anyway the wrapper doesn't break the use case; the step 3 API is
> > > > > clearer, need to understand how to implement it efficiently.
> > > >
> > > > Are you doing this feature based on any HW support, or is it a pure SW
> > > > thing? If it is SW, it is better to have just a new vdev, like
> > > > drivers/net/bonding/. With this we can help aggregate multiple Rxqs
> > > > across the multiple ports of the same driver.
> > >
> > > Based on HW support.
> >
> > In Marvell HW, we do some support; I will outline it here along with some
> > queries on this.
> >
> > # We need to create some new HW structure for aggregation
> > # Connect each Rxq to the new HW structure for aggregation
> > # Use rx_burst from the new HW structure.
> >
> > Could you outline your HW support?
> >
> > Also, I am not able to understand how this will reduce the memory; at
> > least in our HW we need to create more memory now to deal with this, as
> > we need to deal with the new HW structure.
> >
> > How does it reduce the memory in your HW? Also, if memory is the
> > constraint, why NOT reduce the number of queues?
> >
> Glad to know that Marvell is working on this, what's the status of the
> driver implementation?
>
> In my PMD implementation, it's very similar: a new HW object, a shared
> memory pool, is created to replace the per-rxq memory pool.
> A legacy rxq feeds the queue with allocated mbufs up to the number of
> descriptors; now shared rxqs share the same pool, so there is no need to
> supply mbufs for each rxq, just feed the shared rxq.
>
> So the memory saving shows up in the mbufs per rxq: even with 1000
> representors in a shared rxq group, the mbufs consumed are those of one rxq.
> In other words, a new member in a shared rxq doesn't allocate new mbufs to
> feed its rxq, it just shares with the existing shared rxq (HW mempool).
> The memory required to set up each rxq doesn't change too much, agree.

We can ask the application to configure the same mempool for multiple RQs
too, right? If the saving is based on sharing the mempool with multiple RQs.

> >
> > # Also, I was thinking, one way to avoid the fast path or ABI change
> > would be like the following.
> >
> > # Driver initializes one more eth_dev_ops in the driver as an aggregator
> > ethdev
> > # devargs of the new ethdev, or a specific API like
> > drivers/net/bonding/rte_eth_bond.h, can take the argument (port, queue)
> > tuples which need to be aggregated by the new ethdev port
> > # No change in fastpath or ABI is required in this model.
> >
> This could be an option to access the shared rxq. What's the difference of
> the new PMD? No ABI and fast path changes are required.
> What's the difference of the PMD driver creating the new device?
> Is it important in your implementation? Does it work with the existing
> rx_burst API?

Yes. It will work with the existing rx_burst API.

> > >
> > > Most users might use the PF in the group as the anchor port to rx
> > > burst; the current definition should be easy for them to migrate to,
> > > but some users might prefer grouping some hot plug/unplugged
> > > representors; EAL could provide wrappers, users could do that either,
> > > as the strategy is not complex enough.
> > > Anyway, welcome any suggestion.
> > >
> > > > > > > > >
> > > > > > > > >         /**
> > > > > > > > >          * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
> > > > > > > > >          * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> > > > > > > > > @@ -1373,6 +1374,12 @@ struct rte_eth_conf {
> > > > > > > > >  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> > > > > > > > >  #define DEV_RX_OFFLOAD_RSS_HASH         0x00080000
> > > > > > > > >  #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
> > > > > > > > > +/**
> > > > > > > > > + * Rx queue is shared among ports in same switch domain to save memory,
> > > > > > > > > + * avoid polling each port. Any port in group can be used to receive packets.
> > > > > > > > > + * Real source port number saved in mbuf->port field.
> > > > > > > > > + */
> > > > > > > > > +#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ   0x00200000
> > > > > > > > >
> > > > > > > > >  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
> > > > > > > > >                                  DEV_RX_OFFLOAD_UDP_CKSUM | \
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > > >
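On the receive side, a minimal sketch of what polling the shared queue through one anchor port and splitting the burst by mbuf->port could look like. rte_eth_rx_burst(), rte_pktmbuf_free() and mbuf->port are existing ethdev/mbuf API; the anchor-port choice (e.g. the PF of the group), the burst size and the per-port counters are illustrative assumptions, not part of the patch.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define SHARED_BURST_SZ 32

/* Per-source-port packet counters, indexed by mbuf->port. */
static uint64_t per_port_rx[RTE_MAX_ETHPORTS];

/* Sketch only: one burst on the anchor port drains traffic that arrived on
 * any member port of the shared Rx queue group; mbuf->port identifies the
 * real source port (PF or representor), as described in the patch. */
static void
poll_shared_rxq(uint16_t anchor_port, uint16_t queue_id)
{
        struct rte_mbuf *pkts[SHARED_BURST_SZ];
        uint16_t nb_rx, i;

        nb_rx = rte_eth_rx_burst(anchor_port, queue_id, pkts, SHARED_BURST_SZ);
        for (i = 0; i < nb_rx; i++) {
                per_port_rx[pkts[i]->port]++;
                rte_pktmbuf_free(pkts[i]); /* real applications would forward/process here */
        }
}

This is also where the single-thread constraint from the commit message applies: only one core should call poll_shared_rxq() for a given shared queue.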