On 11/26/2021 4:15 AM, Kumara Parameshwaran wrote:
> From: Kumara Parameshwaran <kparameshwar@vmware.com>
>
> When a tap device is hotplugged to primary process which in turn
> adds the device to all secondary process, the secondary process
> does a tap_mp_attach_queues, but the fds are not populated in
> the primary during the probe they are populated during the queue_setup,
> added a fix to sync the queues during rte_eth_dev_start
>
Hi Kumara,
Original intention seems first running primary application, later secondary,
so yes it doesn't looks like covering the hotplug case.
Thanks. The issue is when we use the vdev_netvsc driver which uses TAP interface, in multiprocess and RSS mode things are broken. We encountered this issue while using Azure DPDK PMD.
I have a few questions below, can you please check?
> Signed-off-by: Kumara Parameshwaran <kparameshwar@vmware.com>
> ---
> drivers/net/tap/rte_eth_tap.c | 80 +++++++++++++++++++++++++++++++++++
> 1 file changed, 80 insertions(+)
>
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index f1b48cae82..8e7915656b 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -67,6 +67,7 @@
>
> /* IPC key for queue fds sync */
> #define TAP_MP_KEY "tap_mp_sync_queues"
> +#define TAP_MP_REQ_START_RXTX "tap_mp_req_start_rxtx"
>
> #define TAP_IOV_DEFAULT_MAX 1024
>
> @@ -880,11 +881,51 @@ tap_link_set_up(struct rte_eth_dev *dev)
> return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
> }
>
> +static int tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
> +{
> + struct rte_mp_msg msg;
> + struct ipc_queues *request_param = (struct ipc_queues *)msg.param;
> + int err;
> + int fd_iterator = 0;
> + struct pmd_process_private *process_private = dev->process_private;
> + int i;
> +
> + memset(&msg, 0, sizeof(msg));
> + strcpy(msg.name, TAP_MP_REQ_START_RXTX);
> + strlcpy(request_param->port_name, dev->data->name,
> + sizeof(request_param->port_name));
Why one use 'strcpy' and other 'strlcpy', can you please unify both to 'strlcpy'?
Sure, will unify it.
> + msg.len_param = sizeof(*request_param);
> + for (i = 0; i < dev->data->nb_tx_queues; i++) {
> + msg.fds[fd_iterator++] = process_private->txq_fds[i];
> + msg.num_fds++;
> + request_param->txq_count++;
> + }
> + for (i = 0; i < dev->data->nb_rx_queues; i++) {
> + msg.fds[fd_iterator++] = process_private->rxq_fds[i];
> + msg.num_fds++;
> + request_param->rxq_count++;
> + }
> +
> + RTE_ASSERT(request_param->txq_count == dev->data->nb_tx_queues);
> + RTE_ASSERT(request_param->rxq_count == dev->data->nb_rx_queues);
> +
Are these assertions really useful?
Asserts are good to verify the external assumptions, but for this case
the values are all related to above logic already.
I agree, will remove it.
> + err = rte_mp_sendmsg(&msg);
> + if (err < 0) {
> + TAP_LOG(ERR, "Failed to send start req to secondary %d",
> + rte_errno);
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> static int
> tap_dev_start(struct rte_eth_dev *dev)
> {
> int err, i;
>
> + tap_mp_req_on_rxtx(dev);
> +
As for as I understand your logic is primary sends the message to the secondar(y|ies),
so what happens first secondary is started?
In case of TAP PMD looks like there is an assumption where primary should be started first. There is an existing check below during the probe function call.
if (!rte_eal_primary_proc_alive(NULL)) {
TAP_LOG(ERR, "Primary process is missing");
return -1;
}
What about secondary sends the message when they are started?
IMHO, since primary process setups the queue it should be sufficient for the primary processes to the send the message and secondary need not send anything.
Also above functions is called by both primary and secondary, what happens when it is
called by secondary? And the logic is not clear, it can be good to add a process type
check to clarify.
Sure, these are for tap_intr_handle_set and tap_dev_start functions?
> err = tap_intr_handle_set(dev, 1);
> if (err)
> return err;
> @@ -901,6 +942,39 @@ tap_dev_start(struct rte_eth_dev *dev)
> return err;
> }
>
> +static int
> +tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void *peer)
> +{
> + struct rte_eth_dev *dev;
> + int ret;
> + uint16_t port_id;
> + const struct ipc_queues *request_param =
> + (const struct ipc_queues *)request->param;
> + int fd_iterator;
> + int queue;
> + struct pmd_process_private *process_private;
> +
> + ret = rte_eth_dev_get_port_by_name(request_param->port_name, &port_id);
> + if (ret) {
> + TAP_LOG(ERR, "Failed to get port id for %s",
> + request_param->port_name);
> + return -1;
> + }
> + dev = &rte_eth_devices[port_id];> + process_private = dev->process_private;
> + dev->data->nb_rx_queues = request_param->rxq_count;
> + dev->data->nb_tx_queues = request_param->txq_count;
> + fd_iterator = 0;
> + RTE_LOG(DEBUG, PMD, "tap_attach rx_q:%d tx_q:%d\n", request_param->rxq_count,
> + request_param->txq_count);
> + for (queue = 0; queue < request_param->txq_count; queue++)
> + process_private->txq_fds[queue] = request->fds[fd_iterator++];
> + for (queue = 0; queue < request_param->rxq_count; queue++)
> + process_private->rxq_fds[queue] = request->fds[fd_iterator++];
> +
> + return 0;
> +}
> +
> /* This function gets called when the current port gets stopped.
> */
> static int
> @@ -2445,6 +2519,12 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev)
> ret = tap_mp_attach_queues(name, eth_dev);
> if (ret != 0)
> return -1;
> +
> + ret = rte_mp_action_register(TAP_MP_REQ_START_RXTX, tap_mp_req_start_rxtx);
> + if (ret < 0 && rte_errno != ENOTSUP)
> + TAP_LOG(ERR, "tap: Failed to register IPC callback: %s",
> + strerror(rte_errno));
> +
While this update fds in start logic is added, do we still need the 'tap_mp_attach_queues()'?
Can't we remove the 'tap_mp_sync_queues()' after this change?
Agreed, will remove the old tap_mp_attach_queues and tap_mp_sync_queues.
And need to unregister the mp_action, possibly in the 'tap_dev_close()'?
Yes, will do it.