From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <keith.wiles@intel.com>
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id A31F42BA7
 for <dev@dpdk.org>; Fri, 20 Jul 2018 17:35:18 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
 by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 20 Jul 2018 08:35:16 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.51,380,1526367600"; d="scan'208";a="74098061"
Received: from fmsmsx107.amr.corp.intel.com ([10.18.124.205])
 by fmsmga001.fm.intel.com with ESMTP; 20 Jul 2018 08:35:15 -0700
Received: from fmsmsx156.amr.corp.intel.com (10.18.116.74) by
 fmsmsx107.amr.corp.intel.com (10.18.124.205) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Fri, 20 Jul 2018 08:35:15 -0700
Received: from fmsmsx117.amr.corp.intel.com ([169.254.3.3]) by
 fmsmsx156.amr.corp.intel.com ([169.254.13.214]) with mapi id 14.03.0319.002;
 Fri, 20 Jul 2018 08:35:15 -0700
From: "Wiles, Keith" <keith.wiles@intel.com>
To: Thomas Monjalon <thomas@monjalon.net>
CC: Raslan Darawsheh <rasland@mellanox.com>, "ophirmu@mellanox.com"
 <ophirmu@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [PATCH v3] net/tap: add queues when attaching from secondary
 process
Thread-Index: AQHUIBsZNWHFL4UZz0qhG0RtLsBPE6SYs6MA
Date: Fri, 20 Jul 2018 15:35:14 +0000
Message-ID: <A1EE6F3E-FAD5-44DB-923C-D714F351542C@intel.com>
References: <1528374591-26126-1-git-send-email-rasland@mellanox.com>
 <20180720111552.14132-1-thomas@monjalon.net>
In-Reply-To: <20180720111552.14132-1-thomas@monjalon.net>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.255.76.126]
Content-Type: text/plain; charset="us-ascii"
Content-ID: <8990FC4B6FCEFD498B76728F240D5FD3@intel.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v3] net/tap: add queues when attaching from
 secondary process
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 20 Jul 2018 15:35:19 -0000



> On Jul 20, 2018, at 4:15 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
>=20
> From: Raslan Darawsheh <rasland@mellanox.com>
>=20
> In the case the device is created by the primary process,
> the secondary must request some file descriptors to attach the queues.
> The file descriptors are shared via IPC Unix socket.
>=20
> Thanks to the IPC synchronization, the secondary process
> is now able to do Rx/Tx on a TAP created by the primary process.
>=20
> Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
> Note: there is a bug in EAL IPC regarding fd translation.
> A fix will be sent later for EAL.
>=20
> v3:
>   - split some long lines
> v2:
>   - translate file descriptors via IPC API
>   - add documentation
> ---
> doc/guides/nics/tap.rst                |  16 +++
> doc/guides/rel_notes/release_18_08.rst |   5 +
> drivers/net/tap/Makefile               |   1 +
> drivers/net/tap/rte_eth_tap.c          | 131 ++++++++++++++++++++++++-
> 4 files changed, 152 insertions(+), 1 deletion(-)
>=20
> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
> index 27148681c..d1f3e1c24 100644
> --- a/doc/guides/nics/tap.rst
> +++ b/doc/guides/nics/tap.rst
> @@ -152,6 +152,22 @@ Distribute IPv4 TCP packets using RSS to a given MAC=
 address over queues 0-3::
>    testpmd> flow create 0 priority 4 ingress pattern eth dst is 0a:0b:0c:=
0d:0e:0f \
>             / ipv4 / tcp / end actions rss queues 0 1 2 3 end / end
>=20
> +Multi-process sharing
> +---------------------
> +
> +It is possible to attach an existing TAP device in a secondary process,
> +by declaring it as a vdev with the same name as in the primary process,
> +and without any parameter.
> +
> +The port attached in a secondary process will give access to the
> +statistics and the queues.
> +Therefore it can be used for monitoring or Rx/Tx processing.
> +
> +The IPC synchronization of Rx/Tx queues is currently limited:
> +
> +  - Only 8 queues
> +  - Synchronized on probing, but not on later port update
> +
> Example
> -------
>=20
> diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_note=
s/release_18_08.rst
> index dd611b571..ec6a81236 100644
> --- a/doc/guides/rel_notes/release_18_08.rst
> +++ b/doc/guides/rel_notes/release_18_08.rst
> @@ -74,6 +74,11 @@ New Features
>   * Add handlers to add/delete VxLAN port number.
>   * Add devarg to specify ingress VLAN rewrite mode.
>=20
> +* **Added TAP Rx/Tx queues sharing with a secondary process.**
> +
> +  A secondary process can attach a TAP device created in the primary pro=
cess,
> +  probe the queues, and process Rx/Tx in a secondary process.
> +
>=20
> API Changes
> -----------
> diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
> index 324336535..3dcf05a72 100644
> --- a/drivers/net/tap/Makefile
> +++ b/drivers/net/tap/Makefile
> @@ -27,6 +27,7 @@ LDLIBS +=3D -lrte_ethdev -lrte_net -lrte_kvargs -lrte_h=
ash
> LDLIBS +=3D -lrte_bus_vdev -lrte_gso
>=20
> CFLAGS +=3D -DTAP_MAX_QUEUES=3D$(TAP_MAX_QUEUES)
> +CFLAGS +=3D -DALLOW_EXPERIMENTAL_API
>=20
> #
> # all source are stored in SRCS-y
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.=
c
> index 4493507ed..98cbdf614 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -16,6 +16,8 @@
> #include <rte_debug.h>
> #include <rte_ip.h>
> #include <rte_string_fns.h>
> +#include <rte_ethdev.h>
> +#include <rte_errno.h>
>=20
> #include <assert.h>
> #include <sys/types.h>
> @@ -62,6 +64,9 @@
> #define TAP_GSO_MBUFS_NUM \
> 	(TAP_GSO_MBUFS_PER_CORE * TAP_GSO_MBUF_CACHE_SIZE)
>=20
> +/* IPC key for queue fds sync */
> +#define TAP_MP_KEY "tap_mp_sync_queues"
> +
> static struct rte_vdev_driver pmd_tap_drv;
> static struct rte_vdev_driver pmd_tun_drv;
>=20
> @@ -100,6 +105,17 @@ enum ioctl_mode {
> 	REMOTE_ONLY,
> };
>=20
> +/* Message header to synchronize queues via IPC */
> +struct ipc_queues {
> +	char port_name[RTE_DEV_NAME_MAX_LEN];
> +	int rxq_count;
> +	int txq_count;
> +	/*
> +	 * The file descriptors are in the dedicated part
> +	 * of the Unix message to be translated by the kernel.
> +	 */
> +};
> +
> static int tap_intr_handle_set(struct rte_eth_dev *dev, int set);
>=20
> /**
> @@ -1920,6 +1936,100 @@ rte_pmd_tun_probe(struct rte_vdev_device *dev)
> 	return ret;
> }
>=20
> +/* Request queue file descriptors from secondary to primary. */
> +static int
> +tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
> +{
> +	int ret;
> +	struct pmd_internals *devpriv;
> +	struct timespec timeout =3D {.tv_sec =3D 1, .tv_nsec =3D 0};
> +	struct rte_mp_msg request, *reply;
> +	struct rte_mp_reply replies;
> +	struct ipc_queues *request_param =3D (struct ipc_queues *)request.param=
;
> +	struct ipc_queues *reply_param;
> +	int queue, fd_iterator;
> +
> +	/* Prepare the request */
> +	strcpy(request.name, TAP_MP_KEY);
> +	strcpy(request_param->port_name, port_name);
> +	request.len_param =3D sizeof(*request_param);
> +
> +	/* Send request and receive reply */
> +	ret =3D rte_mp_request_sync(&request, &replies, &timeout);
> +	if (ret < 0) {
> +		TAP_LOG(ERR, "Failed to request queues from primary: %d",
> +			rte_errno);
> +		return -1;
> +	}
> +	/* FIXME: handle replies.nb_received > 1 */

I am not a big fan of having TODO or FIXME comments in the code. Can we rem=
ove them and just describe the problem and what would happen or not happen =
if the condition occurs? If we need to add this support in the future then =
we need to put these in a enhancement tracker or someplace else.
> +	reply =3D &replies.msgs[0];
> +	reply_param =3D (struct ipc_queues *)reply->param;
> +	TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name);
> +
> +	/* Attach the queues from received file descriptors */
> +	devpriv =3D dev->data->dev_private;
> +	fd_iterator =3D 0;
> +	for (queue =3D 0; queue < reply_param->rxq_count; queue++)
> +		devpriv->rxq[queue].fd =3D reply->fds[fd_iterator++];
> +	for (queue =3D 0; queue < reply_param->txq_count; queue++)
> +		devpriv->txq[queue].fd =3D reply->fds[fd_iterator++];
> +
> +	return 0;
> +}
> +
> +/* Send the queue file descriptors from the primary process to secondary=
. */
> +static int
> +tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
> +{
> +	struct rte_eth_dev *dev;
> +	struct pmd_internals *devpriv;
> +	struct rte_mp_msg reply;
> +	const struct ipc_queues *request_param =3D
> +		(const struct ipc_queues *)request->param;
> +	struct ipc_queues *reply_param =3D
> +		(struct ipc_queues *)reply.param;
> +	uint16_t port_id;
> +	int queue;
> +	int ret;
> +
> +	/* Get requested port */
> +	TAP_LOG(DEBUG, "Received IPC request for %s", request_param->port_name)=
;
> +	ret =3D rte_eth_dev_get_port_by_name(request_param->port_name, &port_id=
);
> +	if (ret) {
> +		TAP_LOG(ERR, "Failed to get port id for %s",
> +			request_param->port_name);
> +		return -1;
> +	}
> +	dev =3D &rte_eth_devices[port_id];
> +	devpriv =3D dev->data->dev_private;
> +
> +	/* Fill file descriptors for all queues */
> +	reply.num_fds =3D 0;
> +	reply_param->rxq_count =3D 0;
> +	for (queue =3D 0; queue < dev->data->nb_rx_queues; queue++) {
> +		reply.fds[reply.num_fds++] =3D devpriv->rxq[queue].fd;
> +		reply_param->rxq_count++;
> +	}
> +	reply_param->txq_count =3D 0;
> +	for (queue =3D 0; queue < dev->data->nb_tx_queues; queue++) {
> +		reply.fds[reply.num_fds++] =3D devpriv->txq[queue].fd;
> +		reply_param->txq_count++;
> +	}
> +	/* FIXME: split message if more queues than RTE_MP_MAX_FD_NUM */

Here too.
> +	RTE_ASSERT(reply.num_fds <=3D RTE_MP_MAX_FD_NUM);
> +
> +	/* Send reply */
> +	strcpy(reply.name, request->name);
> +	strcpy(reply_param->port_name, request_param->port_name);

Normally we use the snprintf or strlcpy() functions for the above should we=
 do that here too?

> +	reply.len_param =3D sizeof(*reply_param);
> +	if (rte_mp_reply(&reply, peer) < 0) {
> +		TAP_LOG(ERR, "Failed to reply an IPC request to sync queues");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> /* Open a TAP interface device.
>  */
> static int
> @@ -1946,8 +2056,18 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev)
> 			TAP_LOG(ERR, "Failed to probe %s", name);
> 			return -1;
> 		}
> -		/* TODO: request info from primary to set up Rx and Tx */
> 		eth_dev->dev_ops =3D &ops;
> +		eth_dev->rx_pkt_burst =3D pmd_rx_burst;
> +		eth_dev->tx_pkt_burst =3D pmd_tx_burst;
> +
> +		if (!rte_eal_primary_proc_alive(NULL)) {
> +			TAP_LOG(ERR, "Primary process is missing");
> +			return -1;
> +		}
> +		ret =3D tap_mp_attach_queues(name, eth_dev);
> +		if (ret !=3D 0)
> +			return -1;

Does the call above need to be wrapped using if secondary process or is thi=
s for both primary and secondary?
> +
> 		rte_eth_dev_probing_finish(eth_dev);
> 		return 0;
> 	}
> @@ -1998,10 +2118,19 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev)
> 	ret =3D eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac,
> 		ETH_TUNTAP_TYPE_TAP);
>=20
> +	/* Register IPC feed callback */
> +	ret =3D rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues);
> +	if (ret < 0 && rte_errno !=3D EEXIST) {
> +		TAP_LOG(ERR, "%s: Failed to register IPC callback: %s",
> +			tuntap_name, strerror(rte_errno));
> +		goto leave;
> +	}

Same for this one as above?
> +
> leave:
> 	if (ret =3D=3D -1) {
> 		TAP_LOG(ERR, "Failed to create pmd for %s as %s",
> 			name, tap_name);
> +		rte_mp_action_unregister(TAP_MP_KEY);
> 		tap_unit--;		/* Restore the unit number */
> 	}
> 	rte_kvargs_free(kvlist);
> --=20
> 2.17.1
>=20

This looks fine to me except for the couple tests and the FIXME comments. L=
et me know about the other comments and with the FIXME changes I can Ack it=
.

Regards,
Keith