From: Morten Brørup
To: Honnappa Nagarahalli, "Ananyev, Konstantin"
Cc: dev@dpdk.org, nd, Feifei Wang, "Yigit, Ferruh", Andrew Rybchenko, "Zhang, Qi Z", "Xing, Beilei"
Subject: RE: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side
Date: Fri, 28 Jan 2022 12:29:11 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D86E57@smartserver.smartshare.dk>
References: <20211224164613.32569-1-feifei.wang2@arm.com> <98CBD80474FA8B44BF855DF32C47DC35D86DAF@smartserver.smartshare.dk>

> From: Morten Brørup
> Sent: Thursday, 27 January 2022 18.14
>
> > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > Sent: Thursday, 27 January 2022 05.07
> >
> > Thanks Morten, appreciate your comments. Few responses inline.
> >
> > > -----Original Message-----
> > > From: Morten Brørup
> > > Sent: Sunday, December 26, 2021 4:25 AM
> > >
> > > > From: Feifei Wang [mailto:feifei.wang2@arm.com]
> > > > Sent: Friday, 24 December 2021 17.46
> > > >
> > > > However, this solution poses several constraints:
> > > >
> > > > 1) The receive queue needs to know which transmit queue it should
> > > > take the buffers from. The application logic decides which transmit
> > > > port to use to send out the packets. In many use cases the NIC might
> > > > have a single port ([1], [2], [3]), in which case a given transmit
> > > > queue is always mapped to a single receive queue (1:1 RX queue : TX
> > > > queue). This is easy to configure.
> > > >
> > > > If the NIC has 2 ports (there are several references), then we will
> > > > have a 1:2 (RX queue : TX queue) mapping, which is still easy to
> > > > configure. However, if this is generalized to 'N' ports, the
> > > > configuration can be long. Moreover, the PMD would have to scan a
> > > > list of transmit queues to pull the buffers from.
> > >
> > > I disagree with the description of this constraint.
> > >
> > > As I understand it, it doesn't matter how many ports or queues are in
> > > a NIC or system.
> > >
> > > The constraint is more narrow:
> > >
> > > This patch requires that all packets ingressing on some port/queue
> > > must egress on the specific port/queue that it has been configured to
> > > re-arm its buffers from. I.e. an application cannot route packets
> > > between multiple ports with this patch.
> > Agree, this patch as is has this constraint. It is not a constraint
> > that would apply to NICs with a single port. The above text is
> > describing some of the issues associated with generalizing the
> > solution for N number of ports. If N is small, the configuration is
> > small and scanning should not be bad.

But I think N is the number of queues, not the number of ports.
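
To illustrate the configuration aspect: for the 1:1 case, I would
imagine something along these lines. This is only a sketch; the struct
and function names below are invented, not taken from the RFC:

	#include <stdint.h>

	/* Invented names, for illustration only - not the RFC's API. */
	struct rxq_rearm_conf {
		uint16_t tx_port_id;  /* port owning the TX queue */
		uint16_t tx_queue_id; /* TX queue whose completed mbufs
		                       * re-arm this RX queue */
	};

	/* Hypothetical per-RX-queue configuration call. */
	int rx_queue_rearm_config(uint16_t port_id, uint16_t rx_queue_id,
			const struct rxq_rearm_conf *conf);

	/* 1:1 mapping on a single-port NIC:
	 * RX queue q re-arms from TX queue q. */
	static int
	setup_direct_rearm_1to1(uint16_t port_id, uint16_t nb_queues)
	{
		for (uint16_t q = 0; q < nb_queues; q++) {
			struct rxq_rearm_conf conf = {
				.tx_port_id = port_id,
				.tx_queue_id = q,
			};
			if (rx_queue_rearm_config(port_id, q, &conf) != 0)
				return -1; /* stay on mempool-only re-arm */
		}
		return 0;
	}

Generalizing to N TX queues per RX queue would turn tx_queue_id into a
list that the PMD has to scan when re-arming, so the configuration and
the fast path grow with the number of queues, not ports.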
> Perhaps we can live with the 1:1 limitation, if that is the primary use
> case. Or some similar limitation for NICs with dual ports for
> redundancy.
>
> Alternatively, the feature could fall back to using the mempool if
> unable to get/put buffers directly from/to a participating NIC. In this
> case, I envision a library serving as a shim layer between the NICs and
> the mempool. In other words: take a step back from the implementation,
> and discuss the high-level requirements and architecture of the
> proposed feature.

Please ignore my comment above. I had missed the fact that the direct
re-arm feature only works inside a single NIC, and not across multiple
NICs. And it is not going to work across multiple NICs, unless they are
exactly the same type, because their internal descriptor structures may
differ. Also, taking a deeper look at the i40e part of the patch, I
notice that it already falls back to using the mempool.

> > > You are missing the fourth constraint:
> > >
> > > 4) The application must transmit all received packets immediately,
> > > i.e. QoS queueing and similar is prohibited.
> > I do not understand this, can you please elaborate? Even if there is
> > QoS queuing, there would be a steady stream of packets being
> > transmitted. These transmitted packets will fill the buffers on the
> > RX side.
>
> E.g. an appliance may receive packets on a 10 Gbps backbone port, and
> queue some of the packets up for a customer with a 20 Mbit/s
> subscription. When there is a large burst of packets towards that
> subscriber, they will queue up in the QoS queue dedicated to that
> subscriber. During that traffic burst, there is much more RX than TX.
> And after the traffic burst, there will be more TX than RX.

While such a burst sits in the QoS queue, its mbufs are neither
transmitted nor freed, so the RX queue cannot be re-armed from TX
completions alone and must fall back to the mempool.

> > > The patch provides a significant performance improvement, but I am
> > > wondering if any real world applications exist that would use this.
> > > Only a "router on a stick" (i.e. a single-port router) comes to my
> > > mind, and that is probably sufficient to call it useful in the real
> > > world. Do you have any other examples to support the usefulness of
> > > this patch?
> > SmartNIC is a clear and dominant use case; typically they have a
> > single port for data plane traffic (dual ports are mostly for
> > redundancy). This patch avoids a good amount of store operations. The
> > smaller CPUs found in SmartNICs have smaller store buffers, which can
> > become bottlenecks. Avoiding the lcore cache saves valuable HW cache
> > space.
>
> OK. This is an important use case!

Some NICs have many queues, so the number of RX/TX queue mappings is
big. Aren't SmartNICs going to use many RX/TX queues?

> > > Anyway, the patch doesn't do any harm if unused, and the only
> > > performance cost is the "if (rxq->direct_rxrearm_enable)" branch in
> > > the Ethdev driver. So I don't oppose it.

If a PMD maintainer agrees to maintaining such a feature, I don't
oppose either.
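
For the record, this is roughly the branch I assume we are talking
about in the RX re-arm path. My own sketch with invented helper names,
not code from the patch:

	/* Minimal stand-in for a PMD's private RX queue structure. */
	struct rxq {
		int direct_rxrearm_enable; /* mapped to a TX queue when set */
		/* ... descriptor ring, mempool pointer, etc. ... */
	};

	/* Invented helper: returns the number of descriptors re-armed. */
	int rxq_rearm_from_mapped_txq(struct rxq *rxq);
	/* Normal mempool-based refill. */
	void rxq_rearm_from_mempool(struct rxq *rxq);

	static inline void
	rxq_rearm(struct rxq *rxq)
	{
		if (rxq->direct_rxrearm_enable) {
			/* Refill RX descriptors with mbufs just freed by
			 * the mapped TX queue's completions. */
			if (rxq_rearm_from_mapped_txq(rxq) > 0)
				return; /* no mempool or lcore cache access */
		}
		/* Fallback and normal path: bulk allocate fresh mbufs from
		 * the mempool, as the i40e part of the patch already does. */
		rxq_rearm_from_mempool(rxq);
	}

With the feature disabled, the only cost is one well-predicted branch
per re-arm, which supports the claim that the impact is negligible.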
The PMDs are full of cruft already, so why bother complaining about
more, if the performance impact is negligible. :-)