DPDK patches and discussions
 help / color / mirror / Atom feed
From: Pascal Mazon <pascal.mazon@6wind.com>
To: Ophir Munk <ophirmu@mellanox.com>
Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>,
	dev@dpdk.org, Thomas Monjalon <thomas@monjalon.net>,
	Olga Shern <olgas@mellanox.com>,
	stable@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2] net/tap: fix flow and port commands
Date: Fri, 15 Sep 2017 09:31:19 +0200	[thread overview]
Message-ID: <0e18a611-3b35-6624-013b-59c72f71307b@6wind.com> (raw)
In-Reply-To: <1505401645-19008-1-git-send-email-ophirmu@mellanox.com>

Hi,

The patch looks good globally, but I have a few comments/requests, inline.

Best regards,

Pascal

On 14/09/2017 17:07, Ophir Munk wrote:
> This commit fixes two bugs related to tap devices. The first bug occurs
> when executing in testpmd the following flow rule assuming tap device has
> 4 rx and tx pair queues
> "flow create 0 ingress pattern eth / end actions queue index 5 / end"
> This command will report on success and will print ""Flow rule #0 created"
> although it should have failed as queue index number 5 does not exist
>
> The second bug occurs when executing in testpmd "port start all" following
> a port configuration. Assuming 1 pair of rx and tx queues an error is
> reported: "Fail to start port 0"
>
> Before this commit a fixed max number (16) of rx and tx queue pairs were
> created on startup where the file descriptors (fds) of rx and tx pairs were
> identical.  As a result in the first bug queue index 5 existed because the
> tap device was created with 16 rx and tx queue pairs regardless of the
> configured number of queues. In the second bug when tap device was started
> tx fd was closed before opening it and executing ioctl() on it. However
> closing the sole fd of the device caused ioctl to fail with "No such
> device".
>
> This commit creates the configured number of rx and tx queue pairs (up to
> max 16) and assigns a unique fd to each queue. It was written to solve the
> first bug and was found as the right fix for the second bug as well.
>
> Fixes: 02f96a0a82d1 ("net/tap: add TUN/TAP device PMD")
> Fixes: bf7b7f437b49 ("net/tap: create netdevice during probing")
> Fixes: de96fe68ae95 ("net/tap: add basic flow API patterns and actions")
> Cc: stable@dpdk.org
>
> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> ---
> Comment 1:
> Reason for V2: fixing coding style. Previous versions of this patch are supersseded
>
> Comment 2:
> Pascal - I would appreciate your feedback on the following discussion:
>
>>> /* Immediately create the netdevice (this will create the 1st 
>>> queue). */
>>> - if (tap_setup_queue(dev, pmd, 0) == -1)
>>> + // rx queue
>>> + if (tap_setup_queue(dev, pmd, 0, 1) == -1) goto error_exit; // tx 
>>> + queue if (tap_setup_queue(dev, pmd, 0, 0) == -1)
>>> goto error_exit;
>>>
>> Adrien Mazarguil wrote:
>> Can this be removed entirely to instead wait until the application 
>> actually requests the creation of the first queue?
> Ophir Munk wrote:
> Removing this piece of code is causing failure during tap device initialization and testpmd as well.
> The subsequent code following this removal contains hundreds of lines so I prefer getting Pasca advise why the immediate creation is required here and if possible - how to avoid it.
I think it was required to have the netdevice present as soon as we left
the create function, as some other function occurred just a little later
(prior to queue_setup) that needed ioctl() on the netdevice.
I would leave it like this for this patch, but it would be more elegant
to create the netdevice only when really needed.

>
>  drivers/net/tap/rte_eth_tap.c | 130 ++++++++++++++++++++++++++++++++----------
>  drivers/net/tap/rte_eth_tap.h |   1 +
>  drivers/net/tap/tap_flow.c    |   3 +-
>  3 files changed, 104 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index 9acea83..f8cf330 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -603,8 +603,31 @@ enum ioctl_mode {
>  }
>  
>  static int
> -tap_dev_configure(struct rte_eth_dev *dev __rte_unused)
> +tap_dev_configure(struct rte_eth_dev *dev)
>  {
> +	if (dev->data->nb_rx_queues > RTE_PMD_TAP_MAX_QUEUES) {
> +		RTE_LOG(ERR, PMD,
> +			"%s: number of rx queues %d exceeds max num of queues %d\n",
> +			dev->device->name,
> +			dev->data->nb_rx_queues,
> +			RTE_PMD_TAP_MAX_QUEUES);
> +		return -1;
> +	}
> +	if (dev->data->nb_tx_queues > RTE_PMD_TAP_MAX_QUEUES) {
> +		RTE_LOG(ERR, PMD,
> +			"%s: number of tx queues %d exceeds max num of queues %d\n",
> +			dev->device->name,
> +			dev->data->nb_tx_queues,
> +			RTE_PMD_TAP_MAX_QUEUES);
> +		return -1;
> +	}
> +
> +	RTE_LOG(INFO, PMD, "%s: %p: TX queues number update: 0 -> %u\n",
> +	     dev->device->name, (void *)dev, dev->data->nb_tx_queues);
> +
> +	RTE_LOG(INFO, PMD, "%s: %p: RX queues number update: 0 -> %u\n",
> +	     dev->device->name, (void *)dev, dev->data->nb_rx_queues);
> +
>  	return 0;
>  }
tap_dev_configure() can be called when restarting a port, I'm not so
sure the number update changes number of queues from "0".
Why not a simple "TX/RX configured queues number: %u"?
> @@ -673,9 +696,9 @@ enum ioctl_mode {
>  	unsigned long rx_nombuf = 0, ierrors = 0;
>  	const struct pmd_internals *pmd = dev->data->dev_private;
>  
> -	imax = (pmd->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS) ?
> -		pmd->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS;
> -
> +	/* rx queue statistics */
> +	imax = (dev->data->nb_rx_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS) ?
> +		dev->data->nb_rx_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS;
>  	for (i = 0; i < imax; i++) {
>  		tap_stats->q_ipackets[i] = pmd->rxq[i].stats.ipackets;
>  		tap_stats->q_ibytes[i] = pmd->rxq[i].stats.ibytes;
> @@ -683,7 +706,13 @@ enum ioctl_mode {
>  		rx_bytes_total += tap_stats->q_ibytes[i];
>  		rx_nombuf += pmd->rxq[i].stats.rx_nombuf;
>  		ierrors += pmd->rxq[i].stats.ierrors;
> +	}
> +
> +	/* tx queue statistics */
> +	imax = (dev->data->nb_tx_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS) ?
> +		dev->data->nb_tx_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS;
>  
> +	for (i = 0; i < imax; i++) {
>  		tap_stats->q_opackets[i] = pmd->txq[i].stats.opackets;
>  		tap_stats->q_errors[i] = pmd->txq[i].stats.errs;
>  		tap_stats->q_obytes[i] = pmd->txq[i].stats.obytes;
> @@ -730,10 +759,14 @@ enum ioctl_mode {
>  	tap_flow_implicit_flush(internals, NULL);
>  
>  	for (i = 0; i < internals->nb_queues; i++) {
> -		if (internals->rxq[i].fd != -1)
> +		if (internals->rxq[i].fd != -1) {
>  			close(internals->rxq[i].fd);
> -		internals->rxq[i].fd = -1;
> -		internals->txq[i].fd = -1;
> +			internals->rxq[i].fd = -1;
> +		}
> +		if (internals->txq[i].fd != -1) {
> +			close(internals->txq[i].fd);
> +			internals->txq[i].fd = -1;
> +		}
>  	}
>  
>  	if (internals->remote_if_index) {
> @@ -887,30 +920,57 @@ enum ioctl_mode {
>  static int
>  tap_setup_queue(struct rte_eth_dev *dev,
>  		struct pmd_internals *internals,
> -		uint16_t qid)
> +		uint16_t qid,
> +		int is_rx)
>  {
> +	int *fd;
> +	int *other_fd;
> +	const char *dir;
>  	struct pmd_internals *pmd = dev->data->dev_private;
>  	struct rx_queue *rx = &internals->rxq[qid];
>  	struct tx_queue *tx = &internals->txq[qid];
> -	int fd = rx->fd == -1 ? tx->fd : rx->fd;
>  
> -	if (fd == -1) {
> -		RTE_LOG(INFO, PMD, "Add queue to TAP %s for qid %d\n",
> -			pmd->name, qid);
> -		fd = tun_alloc(pmd);
> -		if (fd < 0) {
> +	if (is_rx) {
> +		fd = &rx->fd;
> +		other_fd = &tx->fd;
> +		dir = "rx";
> +	} else {
> +		fd = &tx->fd;
> +		other_fd = &rx->fd;
> +		dir = "tx";
> +	}
> +	if (*fd != -1) {
> +		/* fd for this queue already exists */
> +		RTE_LOG(DEBUG, PMD, "%s: fd %d for %s queue qid %d exists\n",
> +			pmd->name, *fd, dir, qid);
> +	} else if (*other_fd != -1) {
> +		/* Only other_fd exists. dup it */
> +		*fd = dup(*other_fd);
> +		if (*fd < 0) {
> +			*fd = -1;
> +			RTE_LOG(ERR, PMD, "%s: dup() failed.\n",
> +				pmd->name);
> +			return -1;
> +		}
> +		RTE_LOG(DEBUG, PMD, "%s: dup fd %d for %s queue qid %d (%d)\n",
> +			pmd->name, *other_fd, dir, qid, *fd);
> +	} else {
> +		/* Both RX and TX fds do not exist (equal -1). Create fd */
> +		*fd = tun_alloc(pmd);
> +		if (*fd < 0) {
> +			*fd = -1; /* restore original value */
>  			RTE_LOG(ERR, PMD, "%s: tun_alloc() failed.\n",
>  				pmd->name);
>  			return -1;
>  		}
> +		RTE_LOG(DEBUG, PMD, "%s: add %s queue for qid %d fd %d\n",
> +			pmd->name, dir, qid, *fd);
>  	}
>  
> -	rx->fd = fd;
> -	tx->fd = fd;
>  	tx->mtu = &dev->data->mtu;
>  	rx->rxmode = &dev->data->dev_conf.rxmode;
>  
> -	return fd;
> +	return *fd;
>  }
>  
>  static int
> @@ -932,10 +992,10 @@ enum ioctl_mode {
>  	int fd;
>  	int i;
>  
> -	if ((rx_queue_id >= internals->nb_queues) || !mp) {
> +	if (rx_queue_id >= dev->data->nb_rx_queues || !mp) {
>  		RTE_LOG(WARNING, PMD,
> -			"nb_queues %d too small or mempool NULL\n",
> -			internals->nb_queues);
> +			"nb_rx_queues %d too small or mempool NULL\n",
> +			dev->data->nb_rx_queues);
>  		return -1;
>  	}
>  
> @@ -954,7 +1014,7 @@ enum ioctl_mode {
>  	rxq->iovecs = iovecs;
>  
>  	dev->data->rx_queues[rx_queue_id] = rxq;
> -	fd = tap_setup_queue(dev, internals, rx_queue_id);
> +	fd = tap_setup_queue(dev, internals, rx_queue_id, 1);
>  	if (fd == -1) {
>  		ret = fd;
>  		goto error;
> @@ -1002,11 +1062,11 @@ enum ioctl_mode {
>  	struct pmd_internals *internals = dev->data->dev_private;
>  	int ret;
>  
> -	if (tx_queue_id >= internals->nb_queues)
> +	if (tx_queue_id >= dev->data->nb_tx_queues)
>  		return -1;
>  
>  	dev->data->tx_queues[tx_queue_id] = &internals->txq[tx_queue_id];
> -	ret = tap_setup_queue(dev, internals, tx_queue_id);
> +	ret = tap_setup_queue(dev, internals, tx_queue_id, 0);
>  	if (ret == -1)
>  		return -1;
>  
> @@ -1166,7 +1226,6 @@ enum ioctl_mode {
>  	.filter_ctrl            = tap_dev_filter_ctrl,
>  };
>  
> -
>  static int
>  eth_dev_tap_create(struct rte_vdev_device *vdev, char *tap_name,
>  		   char *remote_iface, int fixed_mac_type)
> @@ -1193,6 +1252,7 @@ enum ioctl_mode {
>  	}
>  
>  	pmd = dev->data->dev_private;
> +	pmd->dev = dev;
You use pmd->dev only to access dev->nb_rx_queues.
I would suggest instead to add *nb_rx_queues in struct pmd_internals,
and remove *dev from it.
You can even remove nb_queues from pmd_internals, and only use
nb_rx_queues and RTE_PMD_TAP_MAX_QUEUES where relevant.

>  	snprintf(pmd->name, sizeof(pmd->name), "%s", tap_name);
>  	pmd->nb_queues = RTE_PMD_TAP_MAX_QUEUES;
>  
> @@ -1212,8 +1272,9 @@ enum ioctl_mode {
>  
>  	data->dev_link = pmd_link;
>  	data->mac_addrs = &pmd->eth_addr;
> -	data->nb_rx_queues = pmd->nb_queues;
> -	data->nb_tx_queues = pmd->nb_queues;
> +	/* Set the number of RX and TX queues */
> +	data->nb_rx_queues = 0;
> +	data->nb_tx_queues = 0;
>  
>  	dev->data = data;
>  	dev->dev_ops = &ops;
> @@ -1241,7 +1302,11 @@ enum ioctl_mode {
>  	}
>  
>  	/* Immediately create the netdevice (this will create the 1st queue). */
> -	if (tap_setup_queue(dev, pmd, 0) == -1)
> +	/* rx queue */
> +	if (tap_setup_queue(dev, pmd, 0, 1) == -1)
> +		goto error_exit;
> +	/* tx queue */
> +	if (tap_setup_queue(dev, pmd, 0, 0) == -1)
>  		goto error_exit;
>  
>  	ifr.ifr_mtu = dev->data->mtu;
> @@ -1515,9 +1580,16 @@ enum ioctl_mode {
>  		tap_flow_implicit_flush(internals, NULL);
>  		nl_final(internals->nlsk_fd);
>  	}
> -	for (i = 0; i < internals->nb_queues; i++)
> -		if (internals->rxq[i].fd != -1)
> +	for (i = 0; i < internals->nb_queues; i++) {
> +		if (internals->rxq[i].fd != -1) {
>  			close(internals->rxq[i].fd);
> +			internals->rxq[i].fd = -1;
> +		}
> +		if (internals->txq[i].fd != -1) {
> +			close(internals->txq[i].fd);
> +			internals->txq[i].fd = -1;
> +		}
> +	}
>  
>  	close(internals->ioctl_sock);
>  	rte_free(eth_dev->data->dev_private);
> diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
> index 928a045..be57cc7 100644
> --- a/drivers/net/tap/rte_eth_tap.h
> +++ b/drivers/net/tap/rte_eth_tap.h
> @@ -80,6 +80,7 @@ struct tx_queue {
>  };
>  
>  struct pmd_internals {
> +	struct rte_eth_dev *dev;          /* Ethernet device. */
>  	char remote_iface[RTE_ETH_NAME_MAX_LEN]; /* Remote netdevice name */
>  	char name[RTE_ETH_NAME_MAX_LEN];  /* Internal Tap device name */
>  	uint16_t nb_queues;               /* Number of queues supported */
See comment above.
> diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
> index 41f7345..eefa868 100644
> --- a/drivers/net/tap/tap_flow.c
> +++ b/drivers/net/tap/tap_flow.c
> @@ -1092,7 +1092,8 @@ struct tap_flow_items {
>  			if (action)
>  				goto exit_action_not_supported;
>  			action = 1;
> -			if (!queue || (queue->index >= pmd->nb_queues))
> +			if (!queue ||
> +			(queue->index > pmd->dev->data->nb_rx_queues - 1))
>  				goto exit_action_not_supported;
>  			if (flow)
>  				err = add_action_skbedit(flow, queue->index);

  reply	other threads:[~2017-09-15  7:31 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-14 10:48 [dpdk-dev] [PATCH] " Ophir Munk
2017-09-14 15:07 ` [dpdk-dev] [PATCH v2] " Ophir Munk
2017-09-15  7:31   ` Pascal Mazon [this message]
2017-09-16 22:32     ` [dpdk-dev] [PATCH v3] " Ophir Munk
2017-09-18  7:46       ` Pascal Mazon
2017-09-18 19:35         ` [dpdk-dev] [dpdk-stable] " Ferruh Yigit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0e18a611-3b35-6624-013b-59c72f71307b@6wind.com \
    --to=pascal.mazon@6wind.com \
    --cc=adrien.mazarguil@6wind.com \
    --cc=dev@dpdk.org \
    --cc=olgas@mellanox.com \
    --cc=ophirmu@mellanox.com \
    --cc=stable@dpdk.org \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).