Subject: RE: RFC - Tap io_uring PMD
Date: Thu, 31 Oct 2024 11:27:25 +0100
From: Morten Brørup
To: Stephen Hemminger
Cc: dev@dpdk.org
In-Reply-To: <20241030145644.0b97f23c@hermes.local>

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 30 October 2024 22.57
>
> The current tap device is slow both due to architectural choices and
> the overhead of Linux system calls.

Yes; but isn't it only used for (low volume) management traffic?
Is the TAP PMD performance an issue for anyone? What is their use case?

Or is the key issue that the TAP PMD makes system calls in the fast
path, so you are looking to implement a new TAP PMD that doesn't make
any system calls in the fast path?

> I am exploring how to fix that, but some of the choices require
> tradeoffs, which leads to some open questions:
>
> 1. DPDK tap also supports tunnel (TUN) mode where there is no
> Ethernet header, only L3. Does anyone actually use this? It is
> different from what every other PMD expects.

If used for high volume (data plane) traffic, I would assume standard
PMD behavior (i.e. incl. Ethernet headers) would suffice.

> 2. The fastest way to use the kernel TAP device would be io_uring.
> But this was added in the 5.1 kernel (2019). Rather than having a
> conditional or dual mode in the DPDK tap device, perhaps there
> should just be a new PMD tap_uring?

If the features differ significantly, I'm in favor of a new PMD.
And it would be an opportunity to get rid of useless cruft, which I
think you are already asking about here. :-)

Furthermore, a "clean sheet" implementation - adding all the
experience accumulated since the old TAP PMD - could serve as a
showcase of "best practices" for software PMDs.

> 3. The current TAP device provides hooks for several rte_flow types
> by playing games with the kernel qdisc. Does anyone really use this?
> I propose just not doing this in the new tap_uring.
>
> 4. What other features of the TAP device beyond basic send/receive
> make sense? It looks like the new device could support better
> statistics.

IMHO, statistics about missed packets are relevant. If the ingress
(kernel->DPDK) queue is full, and the kernel has to drop packets, this
drop counter should be exposed to the application through the PMD, as
sketched below.
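For illustration, a minimal sketch of how such a drop counter could be
surfaced, assuming the PMD reads the kernel's per-interface sysfs
statistics. tap_uring_stats_get and struct pmd_internals are made-up
names for illustration, not existing code; only the sysfs path and the
eth_dev_ops stats_get signature are standard:

/*
 * Sketch only: expose kernel-side RX drops through the standard
 * stats_get callback. tap_uring_stats_get and struct pmd_internals
 * are hypothetical.
 */
#include <stdio.h>
#include <inttypes.h>
#include <limits.h>
#include <net/if.h>
#include <rte_ethdev.h>

struct pmd_internals {
	char ifname[IFNAMSIZ];	/* name of the kernel tap interface */
};

static uint64_t
sysfs_stat(const char *ifname, const char *stat)
{
	char path[PATH_MAX];
	uint64_t val = 0;
	FILE *f;

	snprintf(path, sizeof(path),
		"/sys/class/net/%s/statistics/%s", ifname, stat);
	f = fopen(path, "r");
	if (f != NULL) {
		if (fscanf(f, "%" SCNu64, &val) != 1)
			val = 0;
		fclose(f);
	}
	return val;
}

static int
tap_uring_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
{
	struct pmd_internals *pmd = dev->data->dev_private;

	/* Packets the kernel dropped because the PMD did not consume
	 * them fast enough map naturally to the imissed counter. */
	stats->imissed = sysfs_stat(pmd->ifname, "rx_dropped");
	/* ipackets/opackets etc. would come from PMD-side counters. */
	return 0;
}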
I don't know if the existing TAP PMD supports it, but associating a
port/queue with a "network namespace" or VRF in the kernel could also
be relevant.

> 5. What about Rx interrupt support?

RX interrupt support seems closely related to power management.
It could be used to reduce jitter/latency (and burstiness) when
someone on the network communicates with an in-band management
interface.

> Probably the hardest part of using io_uring is figuring out how to
> collect completions. The simplest way would be to handle all
> completions, rx and tx, in the rx_burst function.

Please don't mix RX and TX, unless explicitly requested by the
application through the recently introduced "mbuf recycle" feature.

Currently, rte_rx() does two jobs:
* Deliver packets received from the HW to the application.
* Replenish RX descriptors.

Similarly, rte_tx() does two jobs:
* Deliver packets to be transmitted from the application to the HW.
* Release completed TX descriptors.

It would complicate things, but these two associated jobs could be
split into separate functions, rx_pre_rx() for RX replenishment and
tx_post_tx() for TX completion; see the sketch at the end of this
message.

This would also give latency-sensitive applications more control over
when to do what. And it could introduce a TX completion interrupt.

Why does this PMD need to handle TX completions differently than other
PMDs?
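To illustrate the split suggested above, a minimal sketch, assuming
each RX queue and each TX queue owns a separate io_uring instance so
RX and TX completions never mix. All names here (tap_rx_pre_rx,
tap_tx_post_tx, struct tap_uring_queue) are hypothetical; only the
liburing and mbuf APIs are real:

/*
 * Sketch only: one ring per queue and direction; error handling
 * omitted. Note: IORING_OP_READ requires kernel >= 5.6; older
 * kernels would need io_uring_prep_readv() instead.
 */
#include <liburing.h>
#include <rte_mbuf.h>

struct tap_uring_queue {
	struct io_uring ring;	/* per-queue, per-direction ring */
	int tap_fd;
	struct rte_mempool *mp;
};

/* rx_pre_rx(): RX replenishment. Post read requests with fresh mbufs
 * so the kernel always has buffers for incoming packets. */
static void
tap_rx_pre_rx(struct tap_uring_queue *rxq, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		struct rte_mbuf *m = rte_pktmbuf_alloc(rxq->mp);
		struct io_uring_sqe *sqe;

		if (m == NULL)
			break;
		sqe = io_uring_get_sqe(&rxq->ring);
		if (sqe == NULL) {
			rte_pktmbuf_free(m);
			break;
		}
		io_uring_prep_read(sqe, rxq->tap_fd,
			rte_pktmbuf_mtod(m, void *),
			rte_pktmbuf_tailroom(m), 0);
		io_uring_sqe_set_data(sqe, m);
	}
	io_uring_submit(&rxq->ring);
}

/* tx_post_tx(): TX completion. Reap finished write requests and free
 * their mbufs, decoupled from the RX path. */
static void
tap_tx_post_tx(struct tap_uring_queue *txq)
{
	struct io_uring_cqe *cqe;
	unsigned int head, reaped = 0;

	io_uring_for_each_cqe(&txq->ring, head, cqe) {
		struct rte_mbuf *m = io_uring_cqe_get_data(cqe);

		rte_pktmbuf_free(m);
		reaped++;
	}
	io_uring_cq_advance(&txq->ring, reaped);
}

rx_burst itself would then only reap RX completions (setting the mbuf
lengths from cqe->res) and hand the packets to the application.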