From: Tudor Cornea
To: stephen@networkplumber.org
Cc: linville@tuxdriver.com, ferruh.yigit@amd.com, andrew.rybchenko@oktetlabs.ru, dev@dpdk.org, Tudor Cornea
Subject: [PATCH v3] net/af_packet: allow changing fanout mode
Date: Mon, 20 Jan 2025 17:39:17 +0200
Message-Id: <20250120153917.343511-1-tudor.cornea@gmail.com>
In-Reply-To: <20250106153508.11262-1-tudor.cornea@gmail.com>
References: <20250106153508.11262-1-tudor.cornea@gmail.com>
List-Id: DPDK patches and discussions

This allows us to control the algorithm used to
spread traffic between sockets, adding more fine grained control.

If the user does not specify a fanout mode, the PMD driver will default to
PACKET_FANOUT_HASH.

Signed-off-by: Tudor Cornea
---
v3:
* Removed PACKET_FANOUT ifdef. The feature has existed since Linux 3.1
* Removed PACKET_FANOUT_FLAG_ROLLOVER ifdef
* Simplified the get_fanout() function
* Renamed mode variable to load_balance. I was also thinking of
  load_balance_mode as an alternative.
* Removed space between get_fanout and following if statement
* Improved documentation. Added link to manual page for PACKET_FANOUT

v2:
* Renamed the patch
* Replaced packet_fanout argument with fanout_mode, which allows
  more fine grained control
---
 doc/guides/nics/af_packet.rst             | 16 +++-
 drivers/net/af_packet/rte_eth_af_packet.c | 89 ++++++++++++++++++-----
 2 files changed, 84 insertions(+), 21 deletions(-)

diff --git a/doc/guides/nics/af_packet.rst b/doc/guides/nics/af_packet.rst
index a343d3a961..2ec9e9e674 100644
--- a/doc/guides/nics/af_packet.rst
+++ b/doc/guides/nics/af_packet.rst
@@ -13,8 +13,6 @@ PACKET_MMAP, which provides a mmapped ring buffer, shared between user space and
 kernel, that's used to send and receive packets. This helps reducing system
 calls and the copies needed between user space and Kernel.
 
-The PACKET_FANOUT_HASH behavior of AF_PACKET is used for frame reception.
-
 Options and inherent limitations
 --------------------------------
 
@@ -25,11 +23,23 @@ Some of these, in turn, will be used to configure the PACKET_MMAP settings.
 *   ``qpairs`` - number of Rx and Tx queues (optional, default 1);
 *   ``qdisc_bypass`` - set PACKET_QDISC_BYPASS option in AF_PACKET (optional,
     disabled by default);
+*   ``fanout_mode`` - set fanout algorithm.
+    Possible choices: hash, lb, cpu, rollover, rnd, qm (optional, default hash);
 *   ``blocksz`` - PACKET_MMAP block size (optional, default 4096);
 *   ``framesz`` - PACKET_MMAP frame size (optional, default 2048B; Note: multiple
     of 16B);
 *   ``framecnt`` - PACKET_MMAP frame count (optional, default 512).
 
+For details regarding ``fanout_mode`` argument, you can consult the
+`PACKET_FANOUT documentation `_.
+
+As an example, when ``fanout_mode=cpu`` is selected, the PACKET_FANOUT_CPU
+mode will be set on the sockets, so that on frame reception, the socket
+will be selected based on the CPU on which the packet arrived.
+
+Only one ``fanout_mode`` can be chosen. If left unspecified, the default is to
+use the PACKET_FANOUT_HASH behavior of AF_PACKET for frame reception.
+
 Because this implementation is based on PACKET_MMAP, and PACKET_MMAP has its
 own pre-requisites, it should be noted that the inner workings of PACKET_MMAP
 should be carefully considered before modifying some of these options (namely,
@@ -64,7 +74,7 @@ framecnt=512):
 
 .. code-block:: console
 
-    --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0
+    --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0,fanout_mode=hash
 
 Features and Limitations
 ------------------------
diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index ceb8d9356a..9ca43fc54e 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -36,6 +36,7 @@
 #define ETH_AF_PACKET_FRAMESIZE_ARG	"framesz"
 #define ETH_AF_PACKET_FRAMECOUNT_ARG	"framecnt"
 #define ETH_AF_PACKET_QDISC_BYPASS_ARG	"qdisc_bypass"
+#define ETH_AF_PACKET_FANOUT_MODE_ARG	"fanout_mode"
 
 #define DFLT_FRAME_SIZE		(1 << 11)
 #define DFLT_FRAME_COUNT	(1 << 9)
@@ -96,6 +97,7 @@ static const char *valid_arguments[] = {
 	ETH_AF_PACKET_FRAMESIZE_ARG,
 	ETH_AF_PACKET_FRAMECOUNT_ARG,
 	ETH_AF_PACKET_QDISC_BYPASS_ARG,
+	ETH_AF_PACKET_FANOUT_MODE_ARG,
 	NULL
 };
 
@@ -700,6 +702,53 @@ open_packet_iface(const char *key __rte_unused,
 	return 0;
 }
 
+#define PACKET_FANOUT_INVALID -1
+
+static int
+get_fanout_group_id(int if_index)
+{
+	return (getpid() ^ if_index) & 0xffff;
+}
+
+static int
+get_fanout_mode(const char *fanout_mode)
+{
+	int load_balance = PACKET_FANOUT_FLAG_DEFRAG |
+			   PACKET_FANOUT_FLAG_ROLLOVER;
+
+	if (!fanout_mode) {
+		/* Default */
+		load_balance |= PACKET_FANOUT_HASH;
+	} else if (!strcmp(fanout_mode, "hash")) {
+		load_balance |= PACKET_FANOUT_HASH;
+	} else if (!strcmp(fanout_mode, "lb")) {
+		load_balance |= PACKET_FANOUT_LB;
+	} else if (!strcmp(fanout_mode, "cpu")) {
+		load_balance |= PACKET_FANOUT_CPU;
+	} else if (!strcmp(fanout_mode, "rollover")) {
+		load_balance |= PACKET_FANOUT_ROLLOVER;
+	} else if (!strcmp(fanout_mode, "rnd")) {
+		load_balance |= PACKET_FANOUT_RND;
+	} else if (!strcmp(fanout_mode, "qm")) {
+		load_balance |= PACKET_FANOUT_QM;
+	} else {
+		/* Invalid Fanout Mode */
+		load_balance = PACKET_FANOUT_INVALID;
+	}
+
+	return load_balance;
+}
+
+static int
+get_fanout(const char *fanout_mode, int if_index)
+{
+	int load_balance = get_fanout_mode(fanout_mode);
+	if (load_balance != PACKET_FANOUT_INVALID)
+		return get_fanout_group_id(if_index) | (load_balance << 16);
+	else
+		return PACKET_FANOUT_INVALID;
+}
+
 static int
 rte_pmd_init_internals(struct rte_vdev_device *dev,
 		       const int sockfd,
@@ -709,6 +758,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
 		       unsigned int framesize,
 		       unsigned int framecnt,
 		       unsigned int qdisc_bypass,
+		       const char *fanout_mode,
 		       struct pmd_internals **internals,
 		       struct rte_eth_dev **eth_dev,
 		       struct rte_kvargs *kvlist)
@@ -727,9 +777,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
 	int rc, tpver, discard;
 	int qsockfd = -1;
 	unsigned int i, q, rdsize;
-#if defined(PACKET_FANOUT)
 	int fanout_arg;
-#endif
 
 	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
 		pair = &kvlist->pairs[k_idx];
@@ -809,13 +857,11 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
 	sockaddr.sll_protocol = htons(ETH_P_ALL);
 	sockaddr.sll_ifindex = (*internals)->if_index;
 
-#if defined(PACKET_FANOUT)
-	fanout_arg = (getpid() ^ (*internals)->if_index) & 0xffff;
-	fanout_arg |= (PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG) << 16;
-#if defined(PACKET_FANOUT_FLAG_ROLLOVER)
-	fanout_arg |= PACKET_FANOUT_FLAG_ROLLOVER << 16;
-#endif
-#endif
+	fanout_arg = get_fanout(fanout_mode, (*internals)->if_index);
+	if (fanout_arg == PACKET_FANOUT_INVALID) {
+		PMD_LOG(ERR, "Invalid fanout mode: %s", fanout_mode);
+		goto error;
+	}
 
 	for (q = 0; q < nb_queues; q++) {
 		/* Open an AF_PACKET socket for this queue... */
@@ -926,16 +972,17 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
 			goto error;
 		}
 
-#if defined(PACKET_FANOUT)
-		rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT,
-				&fanout_arg, sizeof(fanout_arg));
-		if (rc == -1) {
-			PMD_LOG_ERRNO(ERR,
-				"%s: could not set PACKET_FANOUT on AF_PACKET socket for %s",
-				name, pair->value);
-			goto error;
+		if (nb_queues > 1) {
+			rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT,
+					&fanout_arg, sizeof(fanout_arg));
+			if (rc == -1) {
+				PMD_LOG_ERRNO(ERR,
+					"%s: could not set PACKET_FANOUT "
+					"on AF_PACKET socket for %s",
+					name, pair->value);
+				goto error;
+			}
 		}
-#endif
 	}
 
 	/* reserve an ethdev entry */
@@ -1003,6 +1050,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
 	unsigned int framecount = DFLT_FRAME_COUNT;
 	unsigned int qpairs = 1;
 	unsigned int qdisc_bypass = 1;
+	const char *fanout_mode = NULL;
 
 	/* do some parameter checking */
 	if (*sockfd < 0)
@@ -1065,6 +1113,10 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
 			}
 			continue;
 		}
+		if (strstr(pair->key, ETH_AF_PACKET_FANOUT_MODE_ARG) != NULL) {
+			fanout_mode = pair->value;
+			continue;
+		}
 	}
 
 	if (framesize > blocksize) {
@@ -1091,6 +1143,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
 				   blocksize, blockcount,
 				   framesize, framecount,
 				   qdisc_bypass,
+				   fanout_mode,
 				   &internals, &eth_dev,
 				   kvlist) < 0)
 		return -1;
-- 
2.34.1
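
Note for readers following the fanout_arg computation in get_fanout(): the value handed to PACKET_FANOUT packs the group id into the low 16 bits and the mode plus flags into the high 16 bits. The sketch below is not part of the patch. The example_* helpers are hypothetical names used only for illustration, and the fallback constants are assumed to match linux/if_packet.h. It uses an unsigned type so that shifting PACKET_FANOUT_FLAG_DEFRAG into the top bit stays well defined.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __linux__
#include <linux/if_packet.h>
#endif

/* Fallbacks for builds without the Linux UAPI header (assumed values). */
#ifndef PACKET_FANOUT_HASH
#define PACKET_FANOUT_HASH 0
#endif
#ifndef PACKET_FANOUT_CPU
#define PACKET_FANOUT_CPU 2
#endif
#ifndef PACKET_FANOUT_FLAG_ROLLOVER
#define PACKET_FANOUT_FLAG_ROLLOVER 0x1000
#endif
#ifndef PACKET_FANOUT_FLAG_DEFRAG
#define PACKET_FANOUT_FLAG_DEFRAG 0x8000
#endif

/*
 * Hypothetical helper: build the 32-bit PACKET_FANOUT argument.
 * Low 16 bits: fanout group id. High 16 bits: mode (low byte) and flags.
 */
static uint32_t
example_fanout_arg(uint16_t group_id, unsigned int mode, unsigned int flags)
{
	/* The mode occupies the low byte of the upper half-word. */
	assert((mode & ~0xffu) == 0);
	return (uint32_t)group_id | ((uint32_t)(mode | flags) << 16);
}

/* Hypothetical helper: recover the group id from a packed argument. */
static uint16_t
example_fanout_group(uint32_t arg)
{
	return (uint16_t)(arg & 0xffff);
}

/* Hypothetical helper: recover the fanout mode from a packed argument. */
static unsigned int
example_fanout_type(uint32_t arg)
{
	return (arg >> 16) & 0xff;
}
```

With group id 0x1234, mode PACKET_FANOUT_CPU and both flags set, the packed value is 0x90021234. A value built this way is what the driver passes to setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT, ...) for each queue socket.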