DPDK patches and discussions
 help / color / mirror / Atom feed
From: Ferruh Yigit <ferruh.yigit@amd.com>
To: Gur Stavi <gur.stavi@huawei.com>
Cc: dev@dpdk.org, "John W. Linville" <linville@tuxdriver.com>,
	Stephen Hemminger <stephen@networkplumber.org>
Subject: Re: [PATCH v02] net/af_packet: add rollover and defrag options
Date: Fri, 1 Nov 2024 22:49:40 +0000	[thread overview]
Message-ID: <89d610c3-ebae-4fd9-91e1-f200f13584f9@amd.com> (raw)
In-Reply-To: <0dee3d4cf2a563262bddfba86f750122b8dae5ac.1730381951.git.gur.stavi@huawei.com>

On 10/31/2024 1:50 PM, Gur Stavi wrote:
> net_af_packet PMD multi "queue" support relies on Linux FANOUT capability.
> Linux FANOUT is a SW based load balancer that is similar to HW RSS which
> is more common for DPDK PMDs. Instead of multiple HW descriptor queues,
> AF PACKET uses multiple sockets.
> HW RSS will typically drop a packet if its selected RX queue is empty.
> However, Linux FANOUT, as a SW load balancer, can be configured to avoid
> this packet drop by rolling over to the next socket.
> This rollover functionality was ALWAYS enabled in net_af_packet. It is
> surrounded by ifdef, but only to allow compilation on ancient Linux
> versions that did not have it.
> 
> Since DPDK applications are usually designed for HW based PMDs, this
> rollover functionality, which the developers are likely unaware of, could
> be confusing.
> 
> Another option that is part of Linux FANOUT is DEFRAG that instructs Linux
> to compose complete IP packet out of fragments before delivering it to the
> PACKET socket. Again, this behavior typically does not exist for HW based
> PMDs and may confuse users.
> 
> This patch adds 2 options to control these features:
> rollover=[0|1],defrag=[0|1]
> For backward compatibility both features are enabled by default even
> though most users will probably want both of them disabled.
> 
> Signed-off-by: Gur Stavi <gur.stavi@huawei.com>
> ---
> v2:
> * Improve error handling for parsing of arg strings. Add a wrapper around
>   strtoul to replace atoi.
> * Fix checkpatch indentation error due to copy-paste.
> 
> v1: https://mails.dpdk.org/archives/dev/2024-October/307451.html
> ---
>  doc/guides/nics/af_packet.rst             |  4 ++
>  drivers/net/af_packet/rte_eth_af_packet.c | 77 ++++++++++++++++++-----
>  2 files changed, 65 insertions(+), 16 deletions(-)
> 
> diff --git a/doc/guides/nics/af_packet.rst b/doc/guides/nics/af_packet.rst
> index 66b977e1a2..4bc748b50b 100644
> --- a/doc/guides/nics/af_packet.rst
> +++ b/doc/guides/nics/af_packet.rst
> @@ -29,6 +29,8 @@ Some of these, in turn, will be used to configure the PACKET_MMAP settings.
>  *   ``framesz`` - PACKET_MMAP frame size (optional, default 2048B; Note: multiple
>      of 16B);
>  *   ``framecnt`` - PACKET_MMAP frame count (optional, default 512).
> +*   ``rollover`` - disable(0)/enable(1) PACKET_FANOUT_ROLLOVER (optional, default 1).
> +*   ``defrag`` - disable(0)/enable(1) PACKET_FANOUT_FLAG_DEFRAG (optional, default 1).
> 

+1 to introduce both options, thanks.
Do you have any practical usecase to disable them?


>  Because this implementation is based on PACKET_MMAP, and PACKET_MMAP has its
>  own pre-requisites, it should be noted that the inner workings of PACKET_MMAP
> @@ -47,6 +49,8 @@ For the full details behind PACKET_MMAP's structures and settings, consider
>  reading the `PACKET_MMAP documentation in the Kernel
>  <https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt>`_.
> 
> +For details of ``rollover`` anb ``defrag`` refer to the *packet(7)* man page.
> +
>  Prerequisites
>  -------------
> 
> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
> index 68870dd3e2..1a043e8398 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -36,6 +36,8 @@
>  #define ETH_AF_PACKET_FRAMESIZE_ARG	"framesz"
>  #define ETH_AF_PACKET_FRAMECOUNT_ARG	"framecnt"
>  #define ETH_AF_PACKET_QDISC_BYPASS_ARG	"qdisc_bypass"
> +#define ETH_AF_PACKET_ROLLOVER		"rollover"
> +#define ETH_AF_PACKET_DEFRAG		"defrag"
> 

Can you please add new arguments to 'RTE_PMD_REGISTER_PARAM_STRING'
macro below?

>  #define DFLT_FRAME_SIZE		(1 << 11)
>  #define DFLT_FRAME_COUNT	(1 << 9)
> @@ -91,6 +93,8 @@ static const char *valid_arguments[] = {
>  	ETH_AF_PACKET_FRAMESIZE_ARG,
>  	ETH_AF_PACKET_FRAMECOUNT_ARG,
>  	ETH_AF_PACKET_QDISC_BYPASS_ARG,
> +	ETH_AF_PACKET_ROLLOVER,
> +	ETH_AF_PACKET_DEFRAG,
>  	NULL
>  };
> 
> @@ -659,6 +663,20 @@ open_packet_iface(const char *key __rte_unused,
>  	return 0;
>  }
> 
> +static int
> +uint_arg_parse(const char *arg_str, unsigned int *arg_val)
> +{
> +	unsigned long val;
> +	char *endptr;
> +
> +	val = strtoul(arg_str, &endptr, 10);
> +	if (*arg_str == '\0' || *endptr != '\0')
> +		return -EINVAL;
> +
> +	*arg_val = val;
> +	return 0;
> +}
> +
>  static int
>  rte_pmd_init_internals(struct rte_vdev_device *dev,
>                         const int sockfd,
> @@ -676,6 +694,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  	const unsigned int numa_node = dev->device.numa_node;
>  	struct rte_eth_dev_data *data = NULL;
>  	struct rte_kvargs_pair *pair = NULL;
> +	const char *ifname = NULL;
>  	struct ifreq ifr;
>  	size_t ifnamelen;
>  	unsigned k_idx;
> @@ -686,6 +705,8 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  	int rc, tpver, discard;
>  	int qsockfd = -1;
>  	unsigned int i, q, rdsize;
> +	unsigned int rollover = 1;
> +	unsigned int defrag = 1;
>  #if defined(PACKET_FANOUT)
>  	int fanout_arg;
>  #endif
> @@ -693,9 +714,30 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
>  		pair = &kvlist->pairs[k_idx];
>  		if (strstr(pair->key, ETH_AF_PACKET_IFACE_ARG) != NULL)
> -			break;
> +			ifname = pair->value;
> +		if (strcmp(pair->key, ETH_AF_PACKET_ROLLOVER) == 0) {
> +			rc = uint_arg_parse(pair->value, &rollover);
> +			if (rc || rollover > 1) {
> +				PMD_LOG(ERR,
> +					"%s: invalid rollover value: %s",
> +					name, pair->value);
> +				return -1;
> +			}
> +			continue;
> +		}
> +		if (strcmp(pair->key, ETH_AF_PACKET_DEFRAG) == 0) {
> +			rc = uint_arg_parse(pair->value, &defrag);
> +			if (rc || defrag > 1) {
> +				PMD_LOG(ERR,
> +					"%s: invalid defrag value: %s",
> +					name, pair->value);
> +				return -1;
> +			}
> +			continue;
> +		}
>

Instead of adding parameter parsing here, they are already in
'rte_eth_from_packet()' function, better to add there,

BUT, before adding more paramter, can you please fix the parameter
parsing in the af_packet PMD? This is in my list for a long time now.

Instead of accessing to 'kvlist' internals (kvlist->count,
kvlist->pairs[]), and use 'strstr' (or 'strcmp'), there is already
existing 'rte_kvargs_process()' API for it, you can find multiple
samples in other drivers.


>  	}
> -	if (pair == NULL) {
> +
> +	if (ifname == NULL) {
>  		PMD_LOG(ERR,
>  			"%s: no interface specified for AF_PACKET ethdev",
>  		        name);
> @@ -738,21 +780,21 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  	req->tp_frame_size = framesize;
>  	req->tp_frame_nr = framecnt;
> 
> -	ifnamelen = strlen(pair->value);
> +	ifnamelen = strlen(ifname);
>  	if (ifnamelen < sizeof(ifr.ifr_name)) {
> -		memcpy(ifr.ifr_name, pair->value, ifnamelen);
> +		memcpy(ifr.ifr_name, ifname, ifnamelen);
>  		ifr.ifr_name[ifnamelen] = '\0';
>  	} else {
>  		PMD_LOG(ERR,
>  			"%s: I/F name too long (%s)",
> -			name, pair->value);
> +			name, ifname);
>  		goto free_internals;
>  	}
>  	if (ioctl(sockfd, SIOCGIFINDEX, &ifr) == -1) {
>  		PMD_LOG_ERRNO(ERR, "%s: ioctl failed (SIOCGIFINDEX)", name);
>  		goto free_internals;
>  	}
> -	(*internals)->if_name = strdup(pair->value);
> +	(*internals)->if_name = strdup(ifname);
>  	if ((*internals)->if_name == NULL)
>  		goto free_internals;
>  	(*internals)->if_index = ifr.ifr_ifindex;
> @@ -770,9 +812,12 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
> 
>  #if defined(PACKET_FANOUT)
>  	fanout_arg = (getpid() ^ (*internals)->if_index) & 0xffff;
> -	fanout_arg |= (PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG) << 16;
> +	fanout_arg |= PACKET_FANOUT_HASH << 16;
> +	if (defrag)
> +		fanout_arg |= PACKET_FANOUT_FLAG_DEFRAG << 16;
>  #if defined(PACKET_FANOUT_FLAG_ROLLOVER)
> -	fanout_arg |= PACKET_FANOUT_FLAG_ROLLOVER << 16;
> +	if (rollover)
> +		fanout_arg |= PACKET_FANOUT_FLAG_ROLLOVER << 16;
>  #endif
>  #endif
> 
> @@ -792,7 +837,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  		if (rc == -1) {
>  			PMD_LOG_ERRNO(ERR,
>  				"%s: could not set PACKET_VERSION on AF_PACKET socket for %s",
> -				name, pair->value);
> +				name, ifname);
>  			goto error;
>  		}
> 
> @@ -802,7 +847,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  		if (rc == -1) {
>  			PMD_LOG_ERRNO(ERR,
>  				"%s: could not set PACKET_LOSS on AF_PACKET socket for %s",
> -				name, pair->value);
> +				name, ifname);
>  			goto error;
>  		}
> 
> @@ -813,7 +858,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  			if (rc == -1) {
>  				PMD_LOG_ERRNO(ERR,
>  					"%s: could not set PACKET_QDISC_BYPASS on AF_PACKET socket for %s",
> -					name, pair->value);
> +					name, ifname);
>  				goto error;
>  			}
>  #endif
> @@ -823,7 +868,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  		if (rc == -1) {
>  			PMD_LOG_ERRNO(ERR,
>  				"%s: could not set PACKET_RX_RING on AF_PACKET socket for %s",
> -				name, pair->value);
> +				name, ifname);
>  			goto error;
>  		}
> 
> @@ -831,7 +876,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  		if (rc == -1) {
>  			PMD_LOG_ERRNO(ERR,
>  				"%s: could not set PACKET_TX_RING on AF_PACKET "
> -				"socket for %s", name, pair->value);
> +				"socket for %s", name, ifname);
>  			goto error;
>  		}
> 
> @@ -844,7 +889,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  		if (rx_queue->map == MAP_FAILED) {
>  			PMD_LOG_ERRNO(ERR,
>  				"%s: call to mmap failed on AF_PACKET socket for %s",
> -				name, pair->value);
> +				name, ifname);
>  			goto error;
>  		}
> 
> @@ -881,7 +926,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  		if (rc == -1) {
>  			PMD_LOG_ERRNO(ERR,
>  				"%s: could not bind AF_PACKET socket to %s",
> -				name, pair->value);
> +				name, ifname);
>  			goto error;
>  		}
> 
> @@ -891,7 +936,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>  		if (rc == -1) {
>  			PMD_LOG_ERRNO(ERR,
>  				"%s: could not set PACKET_FANOUT on AF_PACKET socket for %s",
> -				name, pair->value);
> +				name, ifname);
>  			goto error;
>  		}
>  #endif
> 
> base-commit: 98613d32e3dac58d685f4f236cf8cc9733abaaf3
> --
> 2.40.1
> 


      reply	other threads:[~2024-11-01 22:49 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-31 13:50 Gur Stavi
2024-11-01 22:49 ` Ferruh Yigit [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=89d610c3-ebae-4fd9-91e1-f200f13584f9@amd.com \
    --to=ferruh.yigit@amd.com \
    --cc=dev@dpdk.org \
    --cc=gur.stavi@huawei.com \
    --cc=linville@tuxdriver.com \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).