From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 31 Oct 2024 17:34:50 -0700
From: Stephen Hemminger <stephen@networkplumber.org>
To: Morten Brørup
Cc: dev@dpdk.org
Subject: Re: RFC - Tap io_uring PMD
Message-ID: <20241031173450.26cdb54c@hermes.local>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F858@smartserver.smartshare.dk>
References: <20241030145644.0b97f23c@hermes.local>
 <98CBD80474FA8B44BF855DF32C47DC35E9F858@smartserver.smartshare.dk>
List-Id: DPDK patches and discussions

On Thu, 31 Oct 2024 11:27:25 +0100
Morten Brørup wrote:

> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Wednesday, 30 October 2024 22.57
> >
> > The current tap device is slow both due to architectural choices and
> > the overhead of Linux system calls.
>
> Yes; but isn't it only being used for (low volume) management traffic?
> Is the TAP PMD performance an issue for anyone? What is their use case?
In embedded systems, if you want to use DPDK for the dataplane, you still
need a control plane path to the kernel. And most of the hardware used
does not support a bifurcated driver. Either that, or you need two NICs.

> Or is the key issue that the TAP PMD makes system calls in the fast
> path, so you are looking to implement a new TAP PMD that doesn't make
> any system calls in the fast path?

Even the control path performance matters. Think of a router with lots of
BGP connections, or one doing route updates.

> > I am exploring how to fix that, but some of the choices require
> > tradeoffs, which leads to some open questions:
> >
> > 1. DPDK tap also supports tunnel (TUN) mode where there is no
> > Ethernet header, only L3. Does anyone actually use this? It is
> > different from what every other PMD expects.
>
> If used for high volume (data plane) traffic, I would assume standard
> PMD behavior (i.e. incl. Ethernet headers) would suffice.
>
> > 2. The fastest way to use the kernel TAP device would be io_uring.
> > But this was added in the 5.1 kernel (2019). Rather than having
> > conditional or dual mode in the DPDK tap device, perhaps there should
> > just be a new PMD, tap_uring?
>
> If the features differ significantly, I'm in favor of a new PMD.
> And it would be an opportunity to get rid of useless cruft, which I
> think you are already asking about here. :-)

Yes, and the TAP device was written to support a niche use case (all the
flow stuff). Also, the TAP device has lots of extra code; at some point,
doing bit-by-bit cleanup gets annoying.

> Furthermore, a "clean sheet" implementation - adding all the experience
> accumulated since the old TAP PMD - could serve as a showcase for "best
> practices" for software PMDs.
>
> > 3. Current TAP device provides hooks for several rte_flow types by
> > playing games with kernel qdisc. Does anyone really use this?
> > Propose just not doing this in the new tap_uring.
> >
> > 4. What other features of the TAP device beyond basic send/receive
> > make sense? It looks like the new device could support better
> > statistics.
>
> IMHO, statistics about missed packets are relevant. If the ingress
> (kernel->DPDK) queue is full, and the kernel has to drop packets, this
> drop counter should be exposed to the application through the PMD.

It may require some kernel-side additions to extract that, but it is not
out of scope.

> I don't know if the existing TAP PMD supports it, but associating a
> port/queue with a "network namespace" or VRF in the kernel could also
> be relevant.

All network devices can be put in a network namespace. VRF in Linux is
separate from netns; it has to do with which routing table is associated
with the net device.

> > 5. What about Rx interrupt support?
>
> RX interrupt support seems closely related to power management.
> It could be used to reduce jitter/latency (and burstiness) when someone
> on the network communicates with an in-band management interface.

Not sure if io_uring has a wakeup mechanism, but epoll() is probably
possible.

> > Probably the hardest part of using io_uring is figuring out how to
> > collect completions. The simplest way would be to handle all
> > completions, rx and tx, in the rx_burst function.
>
> Please don't mix RX and TX, unless explicitly requested by the
> application through the recently introduced "mbuf recycle" feature.

The issue is that Rx and Tx share a single fd, and the io_uring
completion ring is per fd. The io_uring implementation came from the
storage side, so initially it was for fixing the broken Linux AIO
support.

Some other devices only have a single interrupt or a ring shared between
Rx and Tx, so this is not unique: virtio, netvsc, and some NICs.

The problem is that if Tx completes descriptors, then there needs to be
locking to prevent the Rx thread and Tx thread overlapping.
And a spin lock is a performance buzz kill.