From: Ciara Loftus <ciara.loftus@intel.com>
To: dev@dpdk.org
Cc: Ciara Loftus <ciara.loftus@intel.com>
Date: Wed, 10 Mar 2021 07:48:13 +0000
Message-Id: <20210310074816.3029-1-ciara.loftus@intel.com>
In-Reply-To: <20210309101958.27355-1-ciara.loftus@intel.com>
References: <20210309101958.27355-1-ciara.loftus@intel.com>
Subject: [dpdk-dev] [PATCH v3 0/3] AF_XDP Preferred Busy Polling

Single-core performance of AF_XDP at high loads can
be poor because a heavily loaded NAPI context will never enter or allow for
busy-polling.

1C testpmd rxonly (both IRQs and PMD on core 0):
./dpdk-testpmd -l 0-1 --vdev=net_af_xdp0,iface=eth0 --main-lcore=1 -- \
--forward-mode=rxonly
0.088Mpps

In order to achieve decent performance at high loads, it is currently
recommended to ensure the IRQs for the netdev queue and the core running
the PMD are different.

2C testpmd rxonly (IRQs on core 0, PMD on core 1):
./dpdk-testpmd -l 0-1 --vdev=net_af_xdp0,iface=eth0 --main-lcore=0 -- \
--forward-mode=rxonly
19.26Mpps

However, using an extra core is of course not ideal. The SO_PREFER_BUSY_POLL
socket option was introduced in kernel v5.11 to help improve 1C performance.
See [1].

This series sets this socket option on xsks created with DPDK (ie. instances
of the AF_XDP PMD) unless explicitly disabled or not supported by the kernel.
It was chosen to be enabled by default in order to bring the AF_XDP PMD in
line with most other PMDs, which execute on a single core.

The following system and netdev settings are recommended in conjunction with
busy polling:

echo 2 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
echo 200000 | sudo tee /sys/class/net/eth0/gro_flush_timeout

Re-running the 1C test with busy polling support and the above settings:
./dpdk-testpmd -l 0-1 --vdev=net_af_xdp0,iface=eth0 --main-lcore=1 -- \
--forward-mode=rxonly
10.45Mpps

A new vdev arg is introduced called 'busy_budget' whose default value is 64.
busy_budget is the value supplied to the kernel with the SO_BUSY_POLL_BUDGET
socket option and represents the busy-polling NAPI budget, ie. the number of
packets the kernel will attempt to process in the netdev's NAPI context.
To set the busy budget to 256:
./dpdk-testpmd --vdev=net_af_xdp0,iface=eth0,busy_budget=256
14.06Mpps

If you still wish to run using 2 cores (one for the PMD, one for IRQs) it is
recommended to disable busy polling to achieve optimal 2C performance:
./dpdk-testpmd --vdev=net_af_xdp0,iface=eth0,busy_budget=0
19.09Mpps

v2->v3:
* Moved release notes update to correct location
* Changed busy_budget from uint32_t to int since this is the type expected
  by setsockopt
* Validate busy_budget arg is <= UINT16_MAX during parse

v1->v2:
* Set batch size to default size of ring (2048)
* Split batches > 2048 into multiples of 2048 or less and process all
  packets in the same manner that is done for other drivers eg. ixgbe:
  http://code.dpdk.org/dpdk/v21.02/source/drivers/net/ixgbe/ixgbe_rxtx.c#L318
* Update commit log with reasoning behind batching changes
* Update release notes with note on busy polling support
* Fix return type for syscall_needed function when the wakeup flag is not
  present
* Appropriate log leveling
* Set default_*xportconf burst sizes to the default busy budget size (64)
* Detect support for busy polling via setsockopt instead of using the
  presence of the flag

RFC->v1:
* Fixed behaviour of busy_budget=0
* Ensure we bail out if any of the new setsockopts fail

[1] https://lwn.net/Articles/837010/

Ciara Loftus (3):
  net/af_xdp: allow bigger batch sizes
  net/af_xdp: Use recvfrom() instead of poll()
  net/af_xdp: preferred busy polling

 doc/guides/nics/af_xdp.rst             |  38 ++++-
 doc/guides/rel_notes/release_21_05.rst |   4 +
 drivers/net/af_xdp/compat.h            |  14 ++
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 208 ++++++++++++++++++++++---
 4 files changed, 240 insertions(+), 24 deletions(-)

-- 
2.17.1