From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 08CE9A0C4C; Tue, 28 Sep 2021 15:39:18 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 78B3340E3C; Tue, 28 Sep 2021 15:39:18 +0200 (CEST) Received: from mail-io1-f53.google.com (mail-io1-f53.google.com [209.85.166.53]) by mails.dpdk.org (Postfix) with ESMTP id 31AE740DF6 for ; Tue, 28 Sep 2021 15:39:17 +0200 (CEST) Received: by mail-io1-f53.google.com with SMTP id n71so27461859iod.0 for ; Tue, 28 Sep 2021 06:39:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=m94sx3ivQw/itGf7ZlpXtmYbINoH0SdymIOPCiodxM4=; b=CO0Cp5Ou3KfXVZBBv9CK7q7F+tTCgUH/YDrYmN35eU8B7Ivkn7+s2Vlsm5amaUJazf rpgtRJCgX1xmvqKuJVyOUSFGL+x80pdf+CkbaB8foMDFLpvbiMssR11+TPaIqlC5JucK PnmXN7sIUB2AboWEe84elKrsKSwHkuFKYuo/VCdmW+x767Fp9ogbs7xy3dGwduADYKji Z/aeEmp3z0LkGx58o99mWOfBCh5A+EqwEp52ztqyxlr7l+6j0Yei/pda13BryK+3DAQQ TLMLNgJcRGJU7N2/4yp3rOQl2ad8koPHABXMgSAfdW8s8XxF3cPw9gxzOVrWlTWyS6AT zd3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=m94sx3ivQw/itGf7ZlpXtmYbINoH0SdymIOPCiodxM4=; b=2Et4v7s8lSZhh9ezC9aAEeZfwN7GaIaRQmX895bzt5g8Z7w+uYRegFvbopuh2C68cE sFBA90ftXhV2c3FHplbdjq7YVTAHr2SwGbUoysOgEt4ICN/S+Pc0HIRk3pjkZYY5UJo7 QDyO7sU31g8Xy/ODVEAiy7SsEQ/6WbYUXtwYooBTmqt8lakNLeMVa5/dZCNreSSYwhsm 7il0GgxeFlH00elZSoQM89PU/g0bR9vYqxBcTUOcqCTubAJzjdSukinUIhzapGle3sRz vH1YtHGitQColGmQfVCHo6hm5fEGJ8FC0Yujqkny3QHHBbKm4FYyyDhGPRtc5MvFkzrv J6Og== X-Gm-Message-State: AOAM5335b7xn9fb1d0XNdNUVeuJATsUi1m3KqgkTaTgsqjdfwbLDJV/b A6QsC9d5zzd6aDMdLWn+62bM+uPgJnbJ6j1eUpg= X-Google-Smtp-Source: ABdhPJy0+P3wHwAXKOeH76TPeclyE+HZyZLCVqc/9tjX0SkVXIIxnYQxuH0afLxMKw6t4ggOQWRs9Mleowpqx5G8n+s= X-Received: by 2002:a05:6602:208f:: with SMTP id a15mr3898339ioa.123.1632836356354; Tue, 28 Sep 2021 06:39:16 -0700 (PDT) MIME-Version: 1.0 References: <20210727034204.20649-1-xuemingl@nvidia.com> <20210809114716.22035-1-xuemingl@nvidia.com> <6d4308c307a72d62b0be4d61f70f8d0c64a4e7ba.camel@nvidia.com> <15b1590a8899d85e85bb4f7c104b9399654ee160.camel@nvidia.com> In-Reply-To: From: Jerin Jacob Date: Tue, 28 Sep 2021 19:08:50 +0530 Message-ID: To: "Xueming(Steven) Li" Cc: NBU-Contact-Thomas Monjalon , "andrew.rybchenko@oktetlabs.ru" , "dev@dpdk.org" , "ferruh.yigit@intel.com" Content-Type: text/plain; charset="UTF-8" Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Tue, Sep 28, 2021 at 6:55 PM Xueming(Steven) Li wrote: > > On Tue, 2021-09-28 at 18:28 +0530, Jerin Jacob wrote: > > On Tue, Sep 28, 2021 at 5:07 PM Xueming(Steven) Li wrote: > > > > > > On Tue, 2021-09-28 at 15:05 +0530, Jerin Jacob wrote: > > > > On Sun, Sep 26, 2021 at 11:06 AM Xueming(Steven) Li wrote: > > > > > > > > > > On Wed, 2021-08-11 at 13:04 +0100, Ferruh Yigit wrote: > > > > > > On 8/11/2021 9:28 AM, Xueming(Steven) Li wrote: > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Jerin Jacob > > > > > > > > Sent: Wednesday, August 11, 2021 4:03 PM > > > > > > > > To: Xueming(Steven) Li > > > > > > > > Cc: dpdk-dev ; Ferruh Yigit ; NBU-Contact-Thomas Monjalon ; > > > > > > > > Andrew Rybchenko > > > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue > > > > > > > > > > > > > > > > On Mon, Aug 9, 2021 at 7:46 PM Xueming(Steven) Li wrote: > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > From: Jerin Jacob > > > > > > > > > > Sent: Monday, August 9, 2021 9:51 PM > > > > > > > > > > To: Xueming(Steven) Li > > > > > > > > > > Cc: dpdk-dev ; Ferruh Yigit ; > > > > > > > > > > NBU-Contact-Thomas Monjalon ; Andrew Rybchenko > > > > > > > > > > > > > > > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue > > > > > > > > > > > > > > > > > > > > On Mon, Aug 9, 2021 at 5:18 PM Xueming Li wrote: > > > > > > > > > > > > > > > > > > > > > > In current DPDK framework, each RX queue is pre-loaded with mbufs > > > > > > > > > > > for incoming packets. When number of representors scale out in a > > > > > > > > > > > switch domain, the memory consumption became significant. Most > > > > > > > > > > > important, polling all ports leads to high cache miss, high > > > > > > > > > > > latency and low throughput. > > > > > > > > > > > > > > > > > > > > > > This patch introduces shared RX queue. Ports with same > > > > > > > > > > > configuration in a switch domain could share RX queue set by specifying sharing group. > > > > > > > > > > > Polling any queue using same shared RX queue receives packets from > > > > > > > > > > > all member ports. Source port is identified by mbuf->port. > > > > > > > > > > > > > > > > > > > > > > Port queue number in a shared group should be identical. Queue > > > > > > > > > > > index is > > > > > > > > > > > 1:1 mapped in shared group. > > > > > > > > > > > > > > > > > > > > > > Share RX queue is supposed to be polled on same thread. > > > > > > > > > > > > > > > > > > > > > > Multiple groups is supported by group ID. > > > > > > > > > > > > > > > > > > > > Is this offload specific to the representor? If so can this name be changed specifically to representor? > > > > > > > > > > > > > > > > > > Yes, PF and representor in switch domain could take advantage. > > > > > > > > > > > > > > > > > > > If it is for a generic case, how the flow ordering will be maintained? > > > > > > > > > > > > > > > > > > Not quite sure that I understood your question. The control path of is > > > > > > > > > almost same as before, PF and representor port still needed, rte flows not impacted. > > > > > > > > > Queues still needed for each member port, descriptors(mbuf) will be > > > > > > > > > supplied from shared Rx queue in my PMD implementation. > > > > > > > > > > > > > > > > My question was if create a generic RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload, multiple ethdev receive queues land into the same > > > > > > > > receive queue, In that case, how the flow order is maintained for respective receive queues. > > > > > > > > > > > > > > I guess the question is testpmd forward stream? The forwarding logic has to be changed slightly in case of shared rxq. > > > > > > > basically for each packet in rx_burst result, lookup source stream according to mbuf->port, forwarding to target fs. > > > > > > > Packets from same source port could be grouped as a small burst to process, this will accelerates the performance if traffic come from > > > > > > > limited ports. I'll introduce some common api to do shard rxq forwarding, call it with packets handling callback, so it suites for > > > > > > > all forwarding engine. Will sent patches soon. > > > > > > > > > > > > > > > > > > > All ports will put the packets in to the same queue (share queue), right? Does > > > > > > this means only single core will poll only, what will happen if there are > > > > > > multiple cores polling, won't it cause problem? > > > > > > > > > > > > And if this requires specific changes in the application, I am not sure about > > > > > > the solution, can't this work in a transparent way to the application? > > > > > > > > > > Discussed with Jerin, new API introduced in v3 2/8 that aggregate ports > > > > > in same group into one new port. Users could schedule polling on the > > > > > aggregated port instead of all member ports. > > > > > > > > The v3 still has testpmd changes in fastpath. Right? IMO, For this > > > > feature, we should not change fastpath of testpmd > > > > application. Instead, testpmd can use aggregated ports probably as > > > > separate fwd_engine to show how to use this feature. > > > > > > Good point to discuss :) There are two strategies to polling a shared > > > Rxq: > > > 1. polling each member port > > > All forwarding engines can be reused to work as before. > > > My testpmd patches are efforts towards this direction. > > > Does your PMD support this? > > > > Not unfortunately. More than that, every application needs to change > > to support this model. > > Both strategies need user application to resolve port ID from mbuf and > process accordingly. > This one doesn't demand aggregated port, no polling schedule change. I was thinking, mbuf will be updated from driver/aggregator port as when it comes to application. > > > > > > 2. polling aggregated port > > > Besides forwarding engine, need more work to to demo it. > > > This is an optional API, not supported by my PMD yet. > > > > We are thinking of implementing this PMD when it comes to it, ie. > > without application change in fastpath > > logic. > > Fastpath have to resolve port ID anyway and forwarding according to > logic. Forwarding engine need to adapt to support shard Rxq. > Fortunately, in testpmd, this can be done with an abstract API. > > Let's defer part 2 until some PMD really support it and tested, how do > you think? We are not planning to use this feature so either way it is OK to me. I leave to ethdev maintainers decide between 1 vs 2. I do have a strong opinion not changing the testpmd basic forward engines for this feature.I would like to keep it simple as fastpath optimized and would like to add a separate Forwarding engine as means to verify this feature. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Overall, is this for optimizing memory for the port represontors? If so can't we > > > > > > have a port representor specific solution, reducing scope can reduce the > > > > > > complexity it brings? > > > > > > > > > > > > > > If this offload is only useful for representor case, Can we make this offload specific to representor the case by changing its name and > > > > > > > > scope. > > > > > > > > > > > > > > It works for both PF and representors in same switch domain, for application like OVS, few changes to apply. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Xueming Li > > > > > > > > > > > --- > > > > > > > > > > > doc/guides/nics/features.rst | 11 +++++++++++ > > > > > > > > > > > doc/guides/nics/features/default.ini | 1 + > > > > > > > > > > > doc/guides/prog_guide/switch_representation.rst | 10 ++++++++++ > > > > > > > > > > > lib/ethdev/rte_ethdev.c | 1 + > > > > > > > > > > > lib/ethdev/rte_ethdev.h | 7 +++++++ > > > > > > > > > > > 5 files changed, 30 insertions(+) > > > > > > > > > > > > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst > > > > > > > > > > > b/doc/guides/nics/features.rst index a96e12d155..2e2a9b1554 100644 > > > > > > > > > > > --- a/doc/guides/nics/features.rst > > > > > > > > > > > +++ b/doc/guides/nics/features.rst > > > > > > > > > > > @@ -624,6 +624,17 @@ Supports inner packet L4 checksum. > > > > > > > > > > > ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_OUTER_UDP_CKSUM``. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +.. _nic_features_shared_rx_queue: > > > > > > > > > > > + > > > > > > > > > > > +Shared Rx queue > > > > > > > > > > > +--------------- > > > > > > > > > > > + > > > > > > > > > > > +Supports shared Rx queue for ports in same switch domain. > > > > > > > > > > > + > > > > > > > > > > > +* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_SHARED_RXQ``. > > > > > > > > > > > +* **[provides] mbuf**: ``mbuf.port``. > > > > > > > > > > > + > > > > > > > > > > > + > > > > > > > > > > > .. _nic_features_packet_type_parsing: > > > > > > > > > > > > > > > > > > > > > > Packet type parsing > > > > > > > > > > > diff --git a/doc/guides/nics/features/default.ini > > > > > > > > > > > b/doc/guides/nics/features/default.ini > > > > > > > > > > > index 754184ddd4..ebeb4c1851 100644 > > > > > > > > > > > --- a/doc/guides/nics/features/default.ini > > > > > > > > > > > +++ b/doc/guides/nics/features/default.ini > > > > > > > > > > > @@ -19,6 +19,7 @@ Free Tx mbuf on demand = > > > > > > > > > > > Queue start/stop = > > > > > > > > > > > Runtime Rx queue setup = > > > > > > > > > > > Runtime Tx queue setup = > > > > > > > > > > > +Shared Rx queue = > > > > > > > > > > > Burst mode info = > > > > > > > > > > > Power mgmt address monitor = > > > > > > > > > > > MTU update = > > > > > > > > > > > diff --git a/doc/guides/prog_guide/switch_representation.rst > > > > > > > > > > > b/doc/guides/prog_guide/switch_representation.rst > > > > > > > > > > > index ff6aa91c80..45bf5a3a10 100644 > > > > > > > > > > > --- a/doc/guides/prog_guide/switch_representation.rst > > > > > > > > > > > +++ b/doc/guides/prog_guide/switch_representation.rst > > > > > > > > > > > @@ -123,6 +123,16 @@ thought as a software "patch panel" front-end for applications. > > > > > > > > > > > .. [1] `Ethernet switch device driver model (switchdev) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > `_ > > > > > > > > > > > > > > > > > > > > > > +- Memory usage of representors is huge when number of representor > > > > > > > > > > > +grows, > > > > > > > > > > > + because PMD always allocate mbuf for each descriptor of Rx queue. > > > > > > > > > > > + Polling the large number of ports brings more CPU load, cache > > > > > > > > > > > +miss and > > > > > > > > > > > + latency. Shared Rx queue can be used to share Rx queue between > > > > > > > > > > > +PF and > > > > > > > > > > > + representors in same switch domain. > > > > > > > > > > > +``RTE_ETH_RX_OFFLOAD_SHARED_RXQ`` > > > > > > > > > > > + is present in Rx offloading capability of device info. Setting > > > > > > > > > > > +the > > > > > > > > > > > + offloading flag in device Rx mode or Rx queue configuration to > > > > > > > > > > > +enable > > > > > > > > > > > + shared Rx queue. Polling any member port of shared Rx queue can > > > > > > > > > > > +return > > > > > > > > > > > + packets of all ports in group, port ID is saved in ``mbuf.port``. > > > > > > > > > > > + > > > > > > > > > > > Basic SR-IOV > > > > > > > > > > > ------------ > > > > > > > > > > > > > > > > > > > > > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c > > > > > > > > > > > index 9d95cd11e1..1361ff759a 100644 > > > > > > > > > > > --- a/lib/ethdev/rte_ethdev.c > > > > > > > > > > > +++ b/lib/ethdev/rte_ethdev.c > > > > > > > > > > > @@ -127,6 +127,7 @@ static const struct { > > > > > > > > > > > RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM), > > > > > > > > > > > RTE_RX_OFFLOAD_BIT2STR(RSS_HASH), > > > > > > > > > > > RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT), > > > > > > > > > > > + RTE_ETH_RX_OFFLOAD_BIT2STR(SHARED_RXQ), > > > > > > > > > > > }; > > > > > > > > > > > > > > > > > > > > > > #undef RTE_RX_OFFLOAD_BIT2STR > > > > > > > > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h > > > > > > > > > > > index d2b27c351f..a578c9db9d 100644 > > > > > > > > > > > --- a/lib/ethdev/rte_ethdev.h > > > > > > > > > > > +++ b/lib/ethdev/rte_ethdev.h > > > > > > > > > > > @@ -1047,6 +1047,7 @@ struct rte_eth_rxconf { > > > > > > > > > > > uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */ > > > > > > > > > > > uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */ > > > > > > > > > > > uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. > > > > > > > > > > > */ > > > > > > > > > > > + uint32_t shared_group; /**< Shared port group index in > > > > > > > > > > > + switch domain. */ > > > > > > > > > > > /** > > > > > > > > > > > * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags. > > > > > > > > > > > * Only offloads set on rx_queue_offload_capa or > > > > > > > > > > > rx_offload_capa @@ -1373,6 +1374,12 @@ struct rte_eth_conf { > > > > > > > > > > > #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM 0x00040000 > > > > > > > > > > > #define DEV_RX_OFFLOAD_RSS_HASH 0x00080000 > > > > > > > > > > > #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000 > > > > > > > > > > > +/** > > > > > > > > > > > + * Rx queue is shared among ports in same switch domain to save > > > > > > > > > > > +memory, > > > > > > > > > > > + * avoid polling each port. Any port in group can be used to receive packets. > > > > > > > > > > > + * Real source port number saved in mbuf->port field. > > > > > > > > > > > + */ > > > > > > > > > > > +#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ 0x00200000 > > > > > > > > > > > > > > > > > > > > > > #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \ > > > > > > > > > > > DEV_RX_OFFLOAD_UDP_CKSUM | \ > > > > > > > > > > > -- > > > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >