* [PATCH] net/af_packet: allow disabling packet fanout @ 2024-12-12 8:04 Tudor Cornea 2024-12-12 17:11 ` Stephen Hemminger 2025-01-06 15:35 ` [PATCH v2] net/af_packet: allow changing fanout mode Tudor Cornea 0 siblings, 2 replies; 7+ messages in thread From: Tudor Cornea @ 2024-12-12 8:04 UTC (permalink / raw) To: linville; +Cc: ferruh.yigit, andrew.rybchenko, dev, Tudor Cornea This allows us to control whether the PMD will attempt to use the PACKET_FANOUT socket option, and allows the binary compiled against newer kernel headers to run on an older kernel, which lacks support for it. Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com> --- doc/guides/nics/af_packet.rst | 4 ++- drivers/net/af_packet/rte_eth_af_packet.c | 32 ++++++++++++++++++----- 2 files changed, 28 insertions(+), 8 deletions(-) diff --git a/doc/guides/nics/af_packet.rst b/doc/guides/nics/af_packet.rst index a343d3a961..42837031de 100644 --- a/doc/guides/nics/af_packet.rst +++ b/doc/guides/nics/af_packet.rst @@ -25,6 +25,8 @@ Some of these, in turn, will be used to configure the PACKET_MMAP settings. * ``qpairs`` - number of Rx and Tx queues (optional, default 1); * ``qdisc_bypass`` - set PACKET_QDISC_BYPASS option in AF_PACKET (optional, disabled by default); +* ``packet_fanout`` - set PACKET_FANOUT option in AF_PACKET (optional, + enabled by default); * ``blocksz`` - PACKET_MMAP block size (optional, default 4096); * ``framesz`` - PACKET_MMAP frame size (optional, default 2048B; Note: multiple of 16B); @@ -64,7 +66,7 @@ framecnt=512): .. code-block:: console - --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0 + --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0,packet_fanout=0 Features and Limitations ------------------------ diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c index ceb8d9356a..521dbef317 100644 --- a/drivers/net/af_packet/rte_eth_af_packet.c +++ b/drivers/net/af_packet/rte_eth_af_packet.c @@ -36,6 +36,7 @@ #define ETH_AF_PACKET_FRAMESIZE_ARG "framesz" #define ETH_AF_PACKET_FRAMECOUNT_ARG "framecnt" #define ETH_AF_PACKET_QDISC_BYPASS_ARG "qdisc_bypass" +#define ETH_AF_PACKET_PACKET_FANOUT_ARG "packet_fanout" #define DFLT_FRAME_SIZE (1 << 11) #define DFLT_FRAME_COUNT (1 << 9) @@ -96,6 +97,7 @@ static const char *valid_arguments[] = { ETH_AF_PACKET_FRAMESIZE_ARG, ETH_AF_PACKET_FRAMECOUNT_ARG, ETH_AF_PACKET_QDISC_BYPASS_ARG, + ETH_AF_PACKET_PACKET_FANOUT_ARG, NULL }; @@ -709,6 +711,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, unsigned int framesize, unsigned int framecnt, unsigned int qdisc_bypass, + unsigned int packet_fanout, struct pmd_internals **internals, struct rte_eth_dev **eth_dev, struct rte_kvargs *kvlist) @@ -926,14 +929,17 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, goto error; } + if (packet_fanout) { #if defined(PACKET_FANOUT) - rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT, - &fanout_arg, sizeof(fanout_arg)); - if (rc == -1) { - PMD_LOG_ERRNO(ERR, - "%s: could not set PACKET_FANOUT on AF_PACKET socket for %s", - name, pair->value); - goto error; + rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT, + &fanout_arg, sizeof(fanout_arg)); + if (rc == -1) { + PMD_LOG_ERRNO(ERR, + "%s: could not set PACKET_FANOUT " + "on AF_PACKET socket for %s", + name, pair->value); + goto error; + } } #endif } @@ -1003,6 +1009,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev, unsigned int framecount = DFLT_FRAME_COUNT; unsigned int qpairs = 1; unsigned int qdisc_bypass = 1; + unsigned int packet_fanout = 1; /* do some parameter checking */ if (*sockfd < 0) @@ -1065,6 +1072,16 @@ rte_eth_from_packet(struct rte_vdev_device *dev, } continue; } + if (strstr(pair->key, ETH_AF_PACKET_PACKET_FANOUT_ARG) != NULL) { + packet_fanout = atoi(pair->value); + if (packet_fanout > 1) { + PMD_LOG(ERR, + "%s: invalid packet fanout value", + name); + return -1; + } + continue; + } } if (framesize > blocksize) { @@ -1091,6 +1108,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev, blocksize, blockcount, framesize, framecount, qdisc_bypass, + packet_fanout, &internals, ð_dev, kvlist) < 0) return -1; -- 2.34.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net/af_packet: allow disabling packet fanout 2024-12-12 8:04 [PATCH] net/af_packet: allow disabling packet fanout Tudor Cornea @ 2024-12-12 17:11 ` Stephen Hemminger 2024-12-16 9:58 ` Tudor Cornea 2025-01-06 15:35 ` [PATCH v2] net/af_packet: allow changing fanout mode Tudor Cornea 1 sibling, 1 reply; 7+ messages in thread From: Stephen Hemminger @ 2024-12-12 17:11 UTC (permalink / raw) To: Tudor Cornea; +Cc: linville, ferruh.yigit, andrew.rybchenko, dev On Thu, 12 Dec 2024 10:04:42 +0200 Tudor Cornea <tudor.cornea@gmail.com> wrote: > This allows us to control whether the PMD will attempt to use > the PACKET_FANOUT socket option, and allows the binary compiled > against newer kernel headers to run on an older kernel, which > lacks support for it. > > Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com> Controlling fanout more is a good idea but not sure what this patch is trying to do with it. - DPDK minimum kernel version is now 4.19 so no point in worrying about backward compatibility. According to man page for packet, fanout was added in 3.1 kernel. - It would be useful to allow application to control fanout in more detail. According to man page: • The load-balance mode PACKET_FANOUT_LB implements a round- robin algorithm. • PACKET_FANOUT_CPU selects the socket based on the CPU that the packet arrived on. • PACKET_FANOUT_ROLLOVER processes all data on a single socket, moving to the next when one becomes backlogged. • PACKET_FANOUT_RND selects the socket using a pseudo-random number generator. • PACKET_FANOUT_QM (available since Linux 3.14) selects the socket using the recorded queue_mapping of the received skb. The default should be for packet to behave like a hardware NIC if RSS is enabled. And use a single queue if RSS is not enabled. The PMD kind of does this now but not the same. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net/af_packet: allow disabling packet fanout 2024-12-12 17:11 ` Stephen Hemminger @ 2024-12-16 9:58 ` Tudor Cornea 2025-01-06 15:29 ` Tudor Cornea 0 siblings, 1 reply; 7+ messages in thread From: Tudor Cornea @ 2024-12-16 9:58 UTC (permalink / raw) To: Stephen Hemminger; +Cc: linville, ferruh.yigit, andrew.rybchenko, dev > Controlling fanout more is a good idea but not sure what this patch > is trying to do with it. I am maintaining a DPDK application which should run on a large number of setups. Unfortunately, I do not have a lot of control over the environment the application runs on (e.g kernel). The problem I was trying to solve is that the Af_Packet PMD seems to fail to initialize properly on some setups. I managed to reproduce the issue with testpmd. sudo ./x86_64-native-linuxapp-gcc/app/dpdk-testpmd \ -l 0-3 \ -m 1024 \ --no-huge \ --no-shconf \ --no-pci \ --vdev=net_af_packet0,iface=eth1,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0 \ --vdev=net_af_packet1,iface=eth2,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0 \ -- \ -i Error: Unknown device type. EAL: Detected CPU lcores: 8 EAL: Detected NUMA nodes: 1 EAL: Detected static linkage of DPDK EAL: Selected IOVA mode 'VA' AFPACKET: rte_pmd_init_internals(): net_af_packet0: could not set PACKET_FANOUT on AF_PACKET socket for eth1:Invalid argument VDEV_BUS: vdev_probe(): failed to initialize net_af_packet0 device AFPACKET: rte_pmd_init_internals(): net_af_packet1: could not set PACKET_FANOUT on AF_PACKET socket for eth1:Invalid argument VDEV_BUS: vdev_probe(): failed to initialize net_af_packet1 device EAL: Bus (vdev) probe failed. testpmd: No probed ethernet devices It seems that the call to PACKET_FANOUT setsockopt failed with EINVAL. I debugged the issue further. It seems that the root cause is that when the interface is down (e.g eth1), after the bind operation succeeds, the PACKET_FANOUT setsockopt will fail. This is a behavior of AF_Packet in Linux which I'm not sure is that well documented. It seems to be fixed in a recent Linux Kernel commit. I re-compiled the kernel with the change, and did not see the issue afterwards. Commit 2cee3e6e2e4b74bec96694169f01cd3feec1f264 af_packet: allow fanout_add when socket is not RUNNING [1] https://github.com/torvalds/linux/commit/2cee3e6e2e4b74bec96694169f01cd3feec1f264 I was trying to find a solution for my application to run on setups which don't have the kernel patch. Introducing a devarg to control when packet_fanout is enabled seemed like a possible way around the issue. As a second solution around the issue, I was also thinking of using the PACKET_FANOUT setsockopt only when nb_queues > 1. Conceptually, I think it would make some sense, because fanout doesn't really help when having a single socket. We would need more sockets in order to load balance incoming packets. And this would still allow me to control the behavior from the existing 'qpairs' devarg. The idea would be something along these lines if (nb_queues > 1) { rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT, &fanout_arg, sizeof(fanout_arg)); /* ... */ } Would a patch using this idea be considered more suitable ? > - DPDK minimum kernel version is now 4.19 so no point in worrying about > backward compatibility. According to man page for packet, fanout > was added in 3.1 kernel. > > - It would be useful to allow application to control fanout in more detail. This is a great idea. Would introducing a new devarg, (e.g 'fanout_mode') be the proper way to allow the application to customize fanout in more detail ? --vdev=net_af_packet0,iface=eth1,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,fanout_mode=[fanout_hash|fanout_cpu|fanout_rnd|fanout_qm] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net/af_packet: allow disabling packet fanout 2024-12-16 9:58 ` Tudor Cornea @ 2025-01-06 15:29 ` Tudor Cornea 0 siblings, 0 replies; 7+ messages in thread From: Tudor Cornea @ 2025-01-06 15:29 UTC (permalink / raw) To: Stephen Hemminger; +Cc: linville, ferruh.yigit, andrew.rybchenko, dev > This is a great idea. Would introducing a new devarg, (e.g > 'fanout_mode') be the proper way to allow the application to customize > fanout in more detail ? > > --vdev=net_af_packet0,iface=eth1,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,fanout_mode=[fanout_hash|fanout_cpu|fanout_rnd|fanout_qm] I'll send a new version of the patch which will add some additional customization ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2] net/af_packet: allow changing fanout mode 2024-12-12 8:04 [PATCH] net/af_packet: allow disabling packet fanout Tudor Cornea 2024-12-12 17:11 ` Stephen Hemminger @ 2025-01-06 15:35 ` Tudor Cornea 2025-01-14 18:38 ` Stephen Hemminger 2025-01-20 15:39 ` [PATCH v3] " Tudor Cornea 1 sibling, 2 replies; 7+ messages in thread From: Tudor Cornea @ 2025-01-06 15:35 UTC (permalink / raw) To: stephen; +Cc: linville, ferruh.yigit, andrew.rybchenko, dev, Tudor Cornea This allows us to control the algorithm used to spread traffic between sockets, adding more fine grained control. If the user does not specify a fanout mode, the PMD driver will default to PACKET_FANOUT_HASH. Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com> --- v2: * Renamed the patch * Replaced packet_fanout argument with fanout_mode, which allows more fine grained control --- doc/guides/nics/af_packet.rst | 4 +- drivers/net/af_packet/rte_eth_af_packet.c | 92 ++++++++++++++++++++--- 2 files changed, 83 insertions(+), 13 deletions(-) diff --git a/doc/guides/nics/af_packet.rst b/doc/guides/nics/af_packet.rst index a343d3a961..3443f95004 100644 --- a/doc/guides/nics/af_packet.rst +++ b/doc/guides/nics/af_packet.rst @@ -25,6 +25,8 @@ Some of these, in turn, will be used to configure the PACKET_MMAP settings. * ``qpairs`` - number of Rx and Tx queues (optional, default 1); * ``qdisc_bypass`` - set PACKET_QDISC_BYPASS option in AF_PACKET (optional, disabled by default); +* ``fanout_mode`` - set fanout algorithm. Possible choices: hash,lb,cpu,rollover,rnd,qm (optional, + default hash); * ``blocksz`` - PACKET_MMAP block size (optional, default 4096); * ``framesz`` - PACKET_MMAP frame size (optional, default 2048B; Note: multiple of 16B); @@ -64,7 +66,7 @@ framecnt=512): .. code-block:: console - --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0 + --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0,fanout_mode=hash Features and Limitations ------------------------ diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c index ceb8d9356a..8449975384 100644 --- a/drivers/net/af_packet/rte_eth_af_packet.c +++ b/drivers/net/af_packet/rte_eth_af_packet.c @@ -36,6 +36,7 @@ #define ETH_AF_PACKET_FRAMESIZE_ARG "framesz" #define ETH_AF_PACKET_FRAMECOUNT_ARG "framecnt" #define ETH_AF_PACKET_QDISC_BYPASS_ARG "qdisc_bypass" +#define ETH_AF_PACKET_FANOUT_MODE_ARG "fanout_mode" #define DFLT_FRAME_SIZE (1 << 11) #define DFLT_FRAME_COUNT (1 << 9) @@ -96,6 +97,7 @@ static const char *valid_arguments[] = { ETH_AF_PACKET_FRAMESIZE_ARG, ETH_AF_PACKET_FRAMECOUNT_ARG, ETH_AF_PACKET_QDISC_BYPASS_ARG, + ETH_AF_PACKET_FANOUT_MODE_ARG, NULL }; @@ -700,6 +702,61 @@ open_packet_iface(const char *key __rte_unused, return 0; } +#if defined(PACKET_FANOUT) +#define PACKET_FANOUT_INVALID -1 + +static int +get_fanout_group_id(int if_index) +{ + return (getpid() ^ if_index) & 0xffff; +} + +static int +get_fanout_mode(const char *fanout_mode) +{ + int mode = PACKET_FANOUT_FLAG_DEFRAG; + +#if defined(PACKET_FANOUT_FLAG_ROLLOVER) + mode |= PACKET_FANOUT_FLAG_ROLLOVER; +#endif + + if (!fanout_mode) { + /* Default */ + mode |= PACKET_FANOUT_HASH; + } else if (!strcmp(fanout_mode, "hash")) { + mode |= PACKET_FANOUT_HASH; + } else if (!strcmp(fanout_mode, "lb")) { + mode |= PACKET_FANOUT_LB; + } else if (!strcmp(fanout_mode, "cpu")) { + mode |= PACKET_FANOUT_CPU; + } else if (!strcmp(fanout_mode, "rollover")) { + mode |= PACKET_FANOUT_ROLLOVER; + } else if (!strcmp(fanout_mode, "rnd")) { + mode |= PACKET_FANOUT_RND; + } else if (!strcmp(fanout_mode, "qm")) { + mode |= PACKET_FANOUT_QM; + } else { + /* Invalid Fanout Mode */ + mode = PACKET_FANOUT_INVALID; + } + + return mode; +} + +static int +get_fanout(const char *fanout_mode, int if_index) +{ + int group_id = get_fanout_group_id(if_index); + int mode = get_fanout_mode(fanout_mode); + int fanout = PACKET_FANOUT_INVALID; + + if (mode != PACKET_FANOUT_INVALID) + fanout = group_id | (mode << 16); + + return fanout; +} +#endif + static int rte_pmd_init_internals(struct rte_vdev_device *dev, const int sockfd, @@ -709,6 +766,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, unsigned int framesize, unsigned int framecnt, unsigned int qdisc_bypass, + const char *fanout_mode, struct pmd_internals **internals, struct rte_eth_dev **eth_dev, struct rte_kvargs *kvlist) @@ -810,11 +868,12 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, sockaddr.sll_ifindex = (*internals)->if_index; #if defined(PACKET_FANOUT) - fanout_arg = (getpid() ^ (*internals)->if_index) & 0xffff; - fanout_arg |= (PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG) << 16; -#if defined(PACKET_FANOUT_FLAG_ROLLOVER) - fanout_arg |= PACKET_FANOUT_FLAG_ROLLOVER << 16; -#endif + fanout_arg = get_fanout(fanout_mode, (*internals)->if_index); + + if (fanout_arg == PACKET_FANOUT_INVALID) { + PMD_LOG(ERR, "Invalid fanout mode: %s", fanout_mode); + goto error; + } #endif for (q = 0; q < nb_queues; q++) { @@ -927,13 +986,16 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, } #if defined(PACKET_FANOUT) - rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT, - &fanout_arg, sizeof(fanout_arg)); - if (rc == -1) { - PMD_LOG_ERRNO(ERR, - "%s: could not set PACKET_FANOUT on AF_PACKET socket for %s", - name, pair->value); - goto error; + if (nb_queues > 1) { + rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT, + &fanout_arg, sizeof(fanout_arg)); + if (rc == -1) { + PMD_LOG_ERRNO(ERR, + "%s: could not set PACKET_FANOUT " + "on AF_PACKET socket for %s", + name, pair->value); + goto error; + } } #endif } @@ -1003,6 +1065,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev, unsigned int framecount = DFLT_FRAME_COUNT; unsigned int qpairs = 1; unsigned int qdisc_bypass = 1; + const char *fanout_mode = NULL; /* do some parameter checking */ if (*sockfd < 0) @@ -1065,6 +1128,10 @@ rte_eth_from_packet(struct rte_vdev_device *dev, } continue; } + if (strstr(pair->key, ETH_AF_PACKET_FANOUT_MODE_ARG) != NULL) { + fanout_mode = pair->value; + continue; + } } if (framesize > blocksize) { @@ -1091,6 +1158,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev, blocksize, blockcount, framesize, framecount, qdisc_bypass, + fanout_mode, &internals, ð_dev, kvlist) < 0) return -1; -- 2.34.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2] net/af_packet: allow changing fanout mode 2025-01-06 15:35 ` [PATCH v2] net/af_packet: allow changing fanout mode Tudor Cornea @ 2025-01-14 18:38 ` Stephen Hemminger 2025-01-20 15:39 ` [PATCH v3] " Tudor Cornea 1 sibling, 0 replies; 7+ messages in thread From: Stephen Hemminger @ 2025-01-14 18:38 UTC (permalink / raw) To: Tudor Cornea; +Cc: linville, ferruh.yigit, andrew.rybchenko, dev On Mon, 6 Jan 2025 17:35:08 +0200 Tudor Cornea <tudor.cornea@gmail.com> wrote: > This allows us to control the algorithm used to spread traffic between > sockets, adding more fine grained control. If the user does not > specify a fanout mode, the PMD driver will default to > PACKET_FANOUT_HASH. > > Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com> > > --- > v2: > * Renamed the patch > * Replaced packet_fanout argument with fanout_mode, which allows > more fine grained control > --- > doc/guides/nics/af_packet.rst | 4 +- > drivers/net/af_packet/rte_eth_af_packet.c | 92 ++++++++++++++++++++--- > 2 files changed, 83 insertions(+), 13 deletions(-) > > diff --git a/doc/guides/nics/af_packet.rst b/doc/guides/nics/af_packet.rst > index a343d3a961..3443f95004 100644 > --- a/doc/guides/nics/af_packet.rst > +++ b/doc/guides/nics/af_packet.rst > @@ -25,6 +25,8 @@ Some of these, in turn, will be used to configure the PACKET_MMAP settings. > * ``qpairs`` - number of Rx and Tx queues (optional, default 1); > * ``qdisc_bypass`` - set PACKET_QDISC_BYPASS option in AF_PACKET (optional, > disabled by default); > +* ``fanout_mode`` - set fanout algorithm. Possible choices: hash,lb,cpu,rollover,rnd,qm (optional, > + default hash); Use proper punctuation here, need space after comma. A description of what the differences are or reference to af-packet man page would help. https://www.man7.org/linux/man-pages/man7/packet.7.html The user should not have to huting to find what "qm" means for example. Should also address the earlier part in this document that is: The PACKET_FANOUT_HASH behavior of AF_PACKET is used for frame reception. > * ``blocksz`` - PACKET_MMAP block size (optional, default 4096); > * ``framesz`` - PACKET_MMAP frame size (optional, default 2048B; Note: multiple > of 16B); > @@ -64,7 +66,7 @@ framecnt=512): > > .. code-block:: console > > - --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0 > + --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0,fanout_mode=hash > > Features and Limitations > ------------------------ > diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c > index ceb8d9356a..8449975384 100644 > --- a/drivers/net/af_packet/rte_eth_af_packet.c > +++ b/drivers/net/af_packet/rte_eth_af_packet.c > @@ -36,6 +36,7 @@ > #define ETH_AF_PACKET_FRAMESIZE_ARG "framesz" > #define ETH_AF_PACKET_FRAMECOUNT_ARG "framecnt" > #define ETH_AF_PACKET_QDISC_BYPASS_ARG "qdisc_bypass" > +#define ETH_AF_PACKET_FANOUT_MODE_ARG "fanout_mode" > > #define DFLT_FRAME_SIZE (1 << 11) > #define DFLT_FRAME_COUNT (1 << 9) > @@ -96,6 +97,7 @@ static const char *valid_arguments[] = { > ETH_AF_PACKET_FRAMESIZE_ARG, > ETH_AF_PACKET_FRAMECOUNT_ARG, > ETH_AF_PACKET_QDISC_BYPASS_ARG, > + ETH_AF_PACKET_FANOUT_MODE_ARG, > NULL > }; > > @@ -700,6 +702,61 @@ open_packet_iface(const char *key __rte_unused, > return 0; > } > > +#if defined(PACKET_FANOUT) Since PACKET_FANOUT has existed since Linux 3.1, the #ifdef for this can be removed now. > +#define PACKET_FANOUT_INVALID -1 > + > +static int > +get_fanout_group_id(int if_index) > +{ > + return (getpid() ^ if_index) & 0xffff; > +} > + > +static int > +get_fanout_mode(const char *fanout_mode) > +{ > + int mode = PACKET_FANOUT_FLAG_DEFRAG; > + > +#if defined(PACKET_FANOUT_FLAG_ROLLOVER) No more #ifdefs please > + mode |= PACKET_FANOUT_FLAG_ROLLOVER; > +#endif > + > + if (!fanout_mode) { > + /* Default */ > + mode |= PACKET_FANOUT_HASH; > + } else if (!strcmp(fanout_mode, "hash")) { > + mode |= PACKET_FANOUT_HASH; > + } else if (!strcmp(fanout_mode, "lb")) { > + mode |= PACKET_FANOUT_LB; > + } else if (!strcmp(fanout_mode, "cpu")) { > + mode |= PACKET_FANOUT_CPU; > + } else if (!strcmp(fanout_mode, "rollover")) { > + mode |= PACKET_FANOUT_ROLLOVER; > + } else if (!strcmp(fanout_mode, "rnd")) { > + mode |= PACKET_FANOUT_RND; > + } else if (!strcmp(fanout_mode, "qm")) { > + mode |= PACKET_FANOUT_QM; > + } else { > + /* Invalid Fanout Mode */ > + mode = PACKET_FANOUT_INVALID; > + } Maybe allow longer names like "load_balance" or "queue_map" > + > + return mode; > +} > + > +static int > +get_fanout(const char *fanout_mode, int if_index) > +{ > + int group_id = get_fanout_group_id(if_index); > + int mode = get_fanout_mode(fanout_mode); > + int fanout = PACKET_FANOUT_INVALID; > + > + if (mode != PACKET_FANOUT_INVALID) > + fanout = group_id | (mode << 16); > + > + return fanout; > +} This could be simplified as: static int get_fanout(const char *fanout_mode, int if_index) { int mode = get_fanout_mode(fanout_mode); if (mode != PACKET_FANOUT_INVALID) { return get_fanout_group_id(if_index) | (mode << 16); else return PACKOUT_FANOUT_INVALID; } > +#endif > + > static int > rte_pmd_init_internals(struct rte_vdev_device *dev, > const int sockfd, > @@ -709,6 +766,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, > unsigned int framesize, > unsigned int framecnt, > unsigned int qdisc_bypass, > + const char *fanout_mode, > struct pmd_internals **internals, > struct rte_eth_dev **eth_dev, > struct rte_kvargs *kvlist) > @@ -810,11 +868,12 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, > sockaddr.sll_ifindex = (*internals)->if_index; > > #if defined(PACKET_FANOUT) > - fanout_arg = (getpid() ^ (*internals)->if_index) & 0xffff; > - fanout_arg |= (PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG) << 16; > -#if defined(PACKET_FANOUT_FLAG_ROLLOVER) > - fanout_arg |= PACKET_FANOUT_FLAG_ROLLOVER << 16; > -#endif > + fanout_arg = get_fanout(fanout_mode, (*internals)->if_index); > + > + if (fanout_arg == PACKET_FANOUT_INVALID) { Looks better without blank line between get_fanout() and the if() but that is totally a matter of personal taste. > + PMD_LOG(ERR, "Invalid fanout mode: %s", fanout_mode); > + goto error; > + } ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3] net/af_packet: allow changing fanout mode 2025-01-06 15:35 ` [PATCH v2] net/af_packet: allow changing fanout mode Tudor Cornea 2025-01-14 18:38 ` Stephen Hemminger @ 2025-01-20 15:39 ` Tudor Cornea 1 sibling, 0 replies; 7+ messages in thread From: Tudor Cornea @ 2025-01-20 15:39 UTC (permalink / raw) To: stephen; +Cc: linville, ferruh.yigit, andrew.rybchenko, dev, Tudor Cornea This allows us to control the algorithm used to spread traffic between sockets, adding more fine grained control. If the user does not specify a fanout mode, the PMD driver will default to PACKET_FANOUT_HASH. Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com> --- v3: * Removed PACKET_FANOUT ifdef. The feature has existed since Linux 3.1 * Removed PACKET_FANOUT_FLAG_ROLLOVER ifdef * Simplified the get_fanout() function * Renamed mode variable to load_balance. I was also thinking of load_balance_mode as an alternative. * Removed space between get_fanout and following if statement * Improved documentation. Added link to manual page for PACKET_FANOUT v2: * Renamed the patch * Replaced packet_fanout argument with fanout_mode, which allows more fine grained control --- doc/guides/nics/af_packet.rst | 16 +++- drivers/net/af_packet/rte_eth_af_packet.c | 89 ++++++++++++++++++----- 2 files changed, 84 insertions(+), 21 deletions(-) diff --git a/doc/guides/nics/af_packet.rst b/doc/guides/nics/af_packet.rst index a343d3a961..2ec9e9e674 100644 --- a/doc/guides/nics/af_packet.rst +++ b/doc/guides/nics/af_packet.rst @@ -13,8 +13,6 @@ PACKET_MMAP, which provides a mmapped ring buffer, shared between user space and kernel, that's used to send and receive packets. This helps reducing system calls and the copies needed between user space and Kernel. -The PACKET_FANOUT_HASH behavior of AF_PACKET is used for frame reception. - Options and inherent limitations -------------------------------- @@ -25,11 +23,23 @@ Some of these, in turn, will be used to configure the PACKET_MMAP settings. * ``qpairs`` - number of Rx and Tx queues (optional, default 1); * ``qdisc_bypass`` - set PACKET_QDISC_BYPASS option in AF_PACKET (optional, disabled by default); +* ``fanout_mode`` - set fanout algorithm. + Possible choices: hash, lb, cpu, rollover, rnd, qm (optional, default hash); * ``blocksz`` - PACKET_MMAP block size (optional, default 4096); * ``framesz`` - PACKET_MMAP frame size (optional, default 2048B; Note: multiple of 16B); * ``framecnt`` - PACKET_MMAP frame count (optional, default 512). +For details regarding ``fanout_mode`` argument, you can consult the +`PACKET_FANOUT documentation <https://www.man7.org/linux/man-pages/man7/packet.7.html>`_. + +As an example, when ``fanout_mode=cpu`` is selected, the PACKET_FANOUT_CPU +mode will be set on the sockets, so that on frame reception, the socket +will be selected based on the CPU on which the packet arrived. + +Only one ``fanout_mode`` can be chosen. If left unspecified, the default is to +use the PACKET_FANOUT_HASH behavior of AF_PACKET for frame reception. + Because this implementation is based on PACKET_MMAP, and PACKET_MMAP has its own pre-requisites, it should be noted that the inner workings of PACKET_MMAP should be carefully considered before modifying some of these options (namely, @@ -64,7 +74,7 @@ framecnt=512): .. code-block:: console - --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0 + --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0,fanout_mode=hash Features and Limitations ------------------------ diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c index ceb8d9356a..9ca43fc54e 100644 --- a/drivers/net/af_packet/rte_eth_af_packet.c +++ b/drivers/net/af_packet/rte_eth_af_packet.c @@ -36,6 +36,7 @@ #define ETH_AF_PACKET_FRAMESIZE_ARG "framesz" #define ETH_AF_PACKET_FRAMECOUNT_ARG "framecnt" #define ETH_AF_PACKET_QDISC_BYPASS_ARG "qdisc_bypass" +#define ETH_AF_PACKET_FANOUT_MODE_ARG "fanout_mode" #define DFLT_FRAME_SIZE (1 << 11) #define DFLT_FRAME_COUNT (1 << 9) @@ -96,6 +97,7 @@ static const char *valid_arguments[] = { ETH_AF_PACKET_FRAMESIZE_ARG, ETH_AF_PACKET_FRAMECOUNT_ARG, ETH_AF_PACKET_QDISC_BYPASS_ARG, + ETH_AF_PACKET_FANOUT_MODE_ARG, NULL }; @@ -700,6 +702,53 @@ open_packet_iface(const char *key __rte_unused, return 0; } +#define PACKET_FANOUT_INVALID -1 + +static int +get_fanout_group_id(int if_index) +{ + return (getpid() ^ if_index) & 0xffff; +} + +static int +get_fanout_mode(const char *fanout_mode) +{ + int load_balance = PACKET_FANOUT_FLAG_DEFRAG | + PACKET_FANOUT_FLAG_ROLLOVER; + + if (!fanout_mode) { + /* Default */ + load_balance |= PACKET_FANOUT_HASH; + } else if (!strcmp(fanout_mode, "hash")) { + load_balance |= PACKET_FANOUT_HASH; + } else if (!strcmp(fanout_mode, "lb")) { + load_balance |= PACKET_FANOUT_LB; + } else if (!strcmp(fanout_mode, "cpu")) { + load_balance |= PACKET_FANOUT_CPU; + } else if (!strcmp(fanout_mode, "rollover")) { + load_balance |= PACKET_FANOUT_ROLLOVER; + } else if (!strcmp(fanout_mode, "rnd")) { + load_balance |= PACKET_FANOUT_RND; + } else if (!strcmp(fanout_mode, "qm")) { + load_balance |= PACKET_FANOUT_QM; + } else { + /* Invalid Fanout Mode */ + load_balance = PACKET_FANOUT_INVALID; + } + + return load_balance; +} + +static int +get_fanout(const char *fanout_mode, int if_index) +{ + int load_balance = get_fanout_mode(fanout_mode); + if (load_balance != PACKET_FANOUT_INVALID) + return get_fanout_group_id(if_index) | (load_balance << 16); + else + return PACKET_FANOUT_INVALID; +} + static int rte_pmd_init_internals(struct rte_vdev_device *dev, const int sockfd, @@ -709,6 +758,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, unsigned int framesize, unsigned int framecnt, unsigned int qdisc_bypass, + const char *fanout_mode, struct pmd_internals **internals, struct rte_eth_dev **eth_dev, struct rte_kvargs *kvlist) @@ -727,9 +777,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, int rc, tpver, discard; int qsockfd = -1; unsigned int i, q, rdsize; -#if defined(PACKET_FANOUT) int fanout_arg; -#endif for (k_idx = 0; k_idx < kvlist->count; k_idx++) { pair = &kvlist->pairs[k_idx]; @@ -809,13 +857,11 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, sockaddr.sll_protocol = htons(ETH_P_ALL); sockaddr.sll_ifindex = (*internals)->if_index; -#if defined(PACKET_FANOUT) - fanout_arg = (getpid() ^ (*internals)->if_index) & 0xffff; - fanout_arg |= (PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG) << 16; -#if defined(PACKET_FANOUT_FLAG_ROLLOVER) - fanout_arg |= PACKET_FANOUT_FLAG_ROLLOVER << 16; -#endif -#endif + fanout_arg = get_fanout(fanout_mode, (*internals)->if_index); + if (fanout_arg == PACKET_FANOUT_INVALID) { + PMD_LOG(ERR, "Invalid fanout mode: %s", fanout_mode); + goto error; + } for (q = 0; q < nb_queues; q++) { /* Open an AF_PACKET socket for this queue... */ @@ -926,16 +972,17 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, goto error; } -#if defined(PACKET_FANOUT) - rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT, - &fanout_arg, sizeof(fanout_arg)); - if (rc == -1) { - PMD_LOG_ERRNO(ERR, - "%s: could not set PACKET_FANOUT on AF_PACKET socket for %s", - name, pair->value); - goto error; + if (nb_queues > 1) { + rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT, + &fanout_arg, sizeof(fanout_arg)); + if (rc == -1) { + PMD_LOG_ERRNO(ERR, + "%s: could not set PACKET_FANOUT " + "on AF_PACKET socket for %s", + name, pair->value); + goto error; + } } -#endif } /* reserve an ethdev entry */ @@ -1003,6 +1050,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev, unsigned int framecount = DFLT_FRAME_COUNT; unsigned int qpairs = 1; unsigned int qdisc_bypass = 1; + const char *fanout_mode = NULL; /* do some parameter checking */ if (*sockfd < 0) @@ -1065,6 +1113,10 @@ rte_eth_from_packet(struct rte_vdev_device *dev, } continue; } + if (strstr(pair->key, ETH_AF_PACKET_FANOUT_MODE_ARG) != NULL) { + fanout_mode = pair->value; + continue; + } } if (framesize > blocksize) { @@ -1091,6 +1143,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev, blocksize, blockcount, framesize, framecount, qdisc_bypass, + fanout_mode, &internals, ð_dev, kvlist) < 0) return -1; -- 2.34.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-01-20 15:39 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-12-12 8:04 [PATCH] net/af_packet: allow disabling packet fanout Tudor Cornea 2024-12-12 17:11 ` Stephen Hemminger 2024-12-16 9:58 ` Tudor Cornea 2025-01-06 15:29 ` Tudor Cornea 2025-01-06 15:35 ` [PATCH v2] net/af_packet: allow changing fanout mode Tudor Cornea 2025-01-14 18:38 ` Stephen Hemminger 2025-01-20 15:39 ` [PATCH v3] " Tudor Cornea
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).