From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 5CD5942809 for ; Wed, 22 Mar 2023 14:25:28 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 56F7F40E09; Wed, 22 Mar 2023 14:25:28 +0100 (CET) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by mails.dpdk.org (Postfix) with ESMTP id 2461040A84 for ; Wed, 22 Mar 2023 14:25:26 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679491525; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=eLIe2tIeIMottnPYM54HfoXZLgoNOyqr+3Ke7fl9JV4=; b=een5wqJiQRYlQ+1EN5pDIe0rqeNAmNTUwPTBvTGC3Jfgj9u5DJYEdZbZ8rzLx8QVprwD5Z 9ekin3vquEZ4Ko+clO+lMF2ZOGDPk+KBN7pm3uf9JcuPY13oeU7Te+2NtOBjr4Wa+ZVISC aw59SPcUb3nay0jkXM946DhRgATrpYM= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-184-vlhmhjkaORKbPgAjQw5iEg-1; Wed, 22 Mar 2023 09:25:22 -0400 X-MC-Unique: vlhmhjkaORKbPgAjQw5iEg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C41132817226; Wed, 22 Mar 2023 13:25:21 +0000 (UTC) Received: from dmarchan.redhat.com (unknown [10.45.224.63]) by smtp.corp.redhat.com (Postfix) with ESMTP id 06457140EBF4; Wed, 22 Mar 2023 13:25:19 +0000 (UTC) From: David Marchand To: stable@dpdk.org, luca.boccassi@gmail.com Cc: Chenbo Xia , Dukai Yuan , Maxime Coquelin , Junjie Chen , Jianfeng Tan , Hyong Youb Kim , Harman Kalra Subject: [PATCH 20.11] net/vhost: fix Rx interrupt Date: Wed, 22 Mar 2023 14:25:15 +0100 Message-Id: <20230322132515.2145238-1-david.marchand@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII"; x-default=true X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: stable-bounces@dpdk.org [ upstream commit 7995bf4c295464c3a4c3f4ac500f890ab55567e7 ] In the situation when a port was started while no virtio driver was connected, Rx interrupts were broken. They were also broken after a virtio driver reconnects. There were several issues mixed in: - this driver was not exposing a fixed file descriptor per Rx queue, If a virtio driver was not connected yet, each Rx queue vector was pointing at a -1 fd, and an application could interpret this as a lack of Rx interrupt support, - when a virtio driver later (re)connected, this net/vhost driver was hacking into the EAL layer epoll fd to remove a old vring kickfd and insert the new vring kickfd. This hack constitutes a layer violation plus users of rte_eth_dev_rx_intr_ctl_q_get_fd() were not notified of this change, - in the case of reconnection, because the interrupt handle was reallocated, a 0 fd was failing to be removed from the EAL layer epoll fd, which resulted in never fixing the EAL epoll fd, To fix Rx interrupts: - allocating (eth_vhost_install_intr) / releasing (eth_vhost_uninstall_intr) the interrupt handle is moved when starting / closing the port, while setting / resetting per rxq fd is triggered by vhost events via some new helpers (see eth_vhost_configure_intr and eth_vhost_unconfigure_intr), - a "proxy" epoll fd is created per Rx queue at the time the interrupt handle is allocated, so applications can start waiting for events on those fds, even before a virtio driver initialises, - when available, vring kickd are populated in the "proxy" epoll fd, Bugzilla ID: 1135 Fixes: 3f8ff12821e4 ("vhost: support interrupt mode") Fixes: 3d4cd4be577c ("net/vhost: fix interrupt mode") Fixes: d61138d4f0e2 ("drivers: remove direct access to interrupt handle") Signed-off-by: David Marchand Reviewed-by: Chenbo Xia Tested-by: Dukai Yuan --- Some notes on this backport. The main reason for the conflict is the interrupt handle rework from v21.11. The l3fwd-power example is the only in-tree example using Rx interrupts. But trying to directly reproduce bugzilla 1135 is not possible because commit f0b00d983e38 ("examples/l3fwd-power: support virtio/vhost") is missing in v20.11. So for testing purpose, I cherry-picked this commit. For the record, I tested as a normal user on my laptop. I started (in this order), the l3fwd-power application, then testpmd. $ rm -f plop.sock; ./build/examples/dpdk-l3fwd-power -l0,1 --in-memory \ -a 0000:00:00.0 --vdev net_vhost0,iface=plop.sock,client=1 \ --log-level=lib.*:debug -- \ -p 0x1 --interrupt-only --config '(0,0,1)' --parse-ptype 0 $ ./build/app/dpdk-testpmd -l0,2 --log-level=*:debug \ --single-file-segment --in-memory -a 0000:00:00.0 \ --vdev net_virtio_user0,path=plop.sock,server=1 -- \ -i --total-num-mbufs=2048 Then I started / stopped traffic in testpmd. I closed / restarted testpmd, and then started / stopped traffic. In all those steps, I checked l3fwd-power application logs which indicate Rx interrupt delivery / packet reception. --- drivers/net/vhost/rte_eth_vhost.c | 312 +++++++++++++----------------- 1 file changed, 130 insertions(+), 182 deletions(-) diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c index 5e57e35ebd..44ebcd3b53 100644 --- a/drivers/net/vhost/rte_eth_vhost.c +++ b/drivers/net/vhost/rte_eth_vhost.c @@ -97,8 +97,9 @@ struct vhost_queue { uint16_t port; uint16_t virtqueue_id; struct vhost_stats stats; - int intr_enable; rte_spinlock_t intr_lock; + struct epoll_event ev; + int kickfd; }; struct pmd_internal { @@ -525,112 +526,68 @@ find_internal_resource(char *ifname) return list; } -static int +static void eth_vhost_update_intr(struct rte_eth_dev *eth_dev, uint16_t rxq_idx) { - struct rte_intr_handle *handle = eth_dev->intr_handle; - struct rte_epoll_event rev; - int epfd, ret; - - if (!handle) - return 0; + struct rte_vhost_vring vring; + struct vhost_queue *vq; - if (handle->efds[rxq_idx] == handle->elist[rxq_idx].fd) - return 0; + vq = eth_dev->data->rx_queues[rxq_idx]; + if (vq == NULL || vq->vid < 0) + return; - VHOST_LOG(INFO, "kickfd for rxq-%d was changed, updating handler.\n", - rxq_idx); + if (rte_vhost_get_vhost_vring(vq->vid, (rxq_idx << 1) + 1, &vring) < 0) { + VHOST_LOG(DEBUG, "Failed to get rxq-%d's vring, skip!\n", rxq_idx); + return; + } - if (handle->elist[rxq_idx].fd != -1) - VHOST_LOG(ERR, "Unexpected previous kickfd value (Got %d, expected -1).\n", - handle->elist[rxq_idx].fd); + rte_spinlock_lock(&vq->intr_lock); - /* - * First remove invalid epoll event, and then install - * the new one. May be solved with a proper API in the - * future. - */ - epfd = handle->elist[rxq_idx].epfd; - rev = handle->elist[rxq_idx]; - ret = rte_epoll_ctl(epfd, EPOLL_CTL_DEL, rev.fd, - &handle->elist[rxq_idx]); - if (ret) { - VHOST_LOG(ERR, "Delete epoll event failed.\n"); - return ret; + /* Remove previous kickfd from proxy epoll */ + if (vq->kickfd >= 0 && vq->kickfd != vring.kickfd) { + if (epoll_ctl(vq->ev.data.fd, EPOLL_CTL_DEL, vq->kickfd, &vq->ev) < 0) { + VHOST_LOG(DEBUG, "Failed to unregister %d from rxq-%d epoll: %s\n", + vq->kickfd, rxq_idx, strerror(errno)); + } else { + VHOST_LOG(DEBUG, "Unregistered %d from rxq-%d epoll\n", + vq->kickfd, rxq_idx); + } + vq->kickfd = -1; } - rev.fd = handle->efds[rxq_idx]; - handle->elist[rxq_idx] = rev; - ret = rte_epoll_ctl(epfd, EPOLL_CTL_ADD, rev.fd, - &handle->elist[rxq_idx]); - if (ret) { - VHOST_LOG(ERR, "Add epoll event failed.\n"); - return ret; + /* Add new one, if valid */ + if (vq->kickfd != vring.kickfd && vring.kickfd >= 0) { + if (epoll_ctl(vq->ev.data.fd, EPOLL_CTL_ADD, vring.kickfd, &vq->ev) < 0) { + VHOST_LOG(ERR, "Failed to register %d in rxq-%d epoll: %s\n", + vring.kickfd, rxq_idx, strerror(errno)); + } else { + vq->kickfd = vring.kickfd; + VHOST_LOG(DEBUG, "Registered %d in rxq-%d epoll\n", + vq->kickfd, rxq_idx); + } } - return 0; + rte_spinlock_unlock(&vq->intr_lock); } static int eth_rxq_intr_enable(struct rte_eth_dev *dev, uint16_t qid) { - struct rte_vhost_vring vring; - struct vhost_queue *vq; - int old_intr_enable, ret = 0; - - vq = dev->data->rx_queues[qid]; - if (!vq) { - VHOST_LOG(ERR, "rxq%d is not setup yet\n", qid); - return -1; - } - - rte_spinlock_lock(&vq->intr_lock); - old_intr_enable = vq->intr_enable; - vq->intr_enable = 1; - ret = eth_vhost_update_intr(dev, qid); - rte_spinlock_unlock(&vq->intr_lock); - - if (ret < 0) { - VHOST_LOG(ERR, "Failed to update rxq%d's intr\n", qid); - vq->intr_enable = old_intr_enable; - return ret; - } + struct vhost_queue *vq = dev->data->rx_queues[qid]; - ret = rte_vhost_get_vhost_vring(vq->vid, (qid << 1) + 1, &vring); - if (ret < 0) { - VHOST_LOG(ERR, "Failed to get rxq%d's vring\n", qid); - return ret; - } - VHOST_LOG(INFO, "Enable interrupt for rxq%d\n", qid); - rte_vhost_enable_guest_notification(vq->vid, (qid << 1) + 1, 1); - rte_wmb(); + if (vq->vid >= 0) + rte_vhost_enable_guest_notification(vq->vid, (qid << 1) + 1, 1); - return ret; + return 0; } static int eth_rxq_intr_disable(struct rte_eth_dev *dev, uint16_t qid) { - struct rte_vhost_vring vring; - struct vhost_queue *vq; - int ret = 0; - - vq = dev->data->rx_queues[qid]; - if (!vq) { - VHOST_LOG(ERR, "rxq%d is not setup yet\n", qid); - return -1; - } - - ret = rte_vhost_get_vhost_vring(vq->vid, (qid << 1) + 1, &vring); - if (ret < 0) { - VHOST_LOG(ERR, "Failed to get rxq%d's vring\n", qid); - return ret; - } - VHOST_LOG(INFO, "Disable interrupt for rxq%d\n", qid); - rte_vhost_enable_guest_notification(vq->vid, (qid << 1) + 1, 0); - rte_wmb(); + struct vhost_queue *vq = dev->data->rx_queues[qid]; - vq->intr_enable = 0; + if (vq->vid >= 0) + rte_vhost_enable_guest_notification(vq->vid, (qid << 1) + 1, 0); return 0; } @@ -641,70 +598,64 @@ eth_vhost_uninstall_intr(struct rte_eth_dev *dev) struct rte_intr_handle *intr_handle = dev->intr_handle; if (intr_handle) { + int i; + + for (i = 0; i < dev->data->nb_rx_queues; i++) { + int epoll_fd = intr_handle->efds[i]; + + if (epoll_fd >= 0) + close(epoll_fd); + } if (intr_handle->intr_vec) free(intr_handle->intr_vec); free(intr_handle); } - dev->intr_handle = NULL; } static int eth_vhost_install_intr(struct rte_eth_dev *dev) { - struct rte_vhost_vring vring; - struct vhost_queue *vq; int nb_rxq = dev->data->nb_rx_queues; - int i; - int ret; + struct vhost_queue *vq; - /* uninstall firstly if we are reconnecting */ - if (dev->intr_handle) - eth_vhost_uninstall_intr(dev); + int ret; + int i; dev->intr_handle = malloc(sizeof(*dev->intr_handle)); if (!dev->intr_handle) { VHOST_LOG(ERR, "Fail to allocate intr_handle\n"); - return -ENOMEM; + ret = -ENOMEM; + goto error; } memset(dev->intr_handle, 0, sizeof(*dev->intr_handle)); - dev->intr_handle->efd_counter_size = sizeof(uint64_t); - dev->intr_handle->intr_vec = malloc(nb_rxq * sizeof(dev->intr_handle->intr_vec[0])); if (!dev->intr_handle->intr_vec) { VHOST_LOG(ERR, "Failed to allocate memory for interrupt vector\n"); - free(dev->intr_handle); - return -ENOMEM; + ret = -ENOMEM; + goto error; } - VHOST_LOG(INFO, "Prepare intr vec\n"); + VHOST_LOG(DEBUG, "Prepare intr vec\n"); for (i = 0; i < nb_rxq; i++) { - dev->intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + i; - dev->intr_handle->efds[i] = -1; - vq = dev->data->rx_queues[i]; - if (!vq) { - VHOST_LOG(INFO, "rxq-%d not setup yet, skip!\n", i); - continue; - } + int epoll_fd = epoll_create1(0); - ret = rte_vhost_get_vhost_vring(vq->vid, (i << 1) + 1, &vring); - if (ret < 0) { - VHOST_LOG(INFO, - "Failed to get rxq-%d's vring, skip!\n", i); - continue; + if (epoll_fd < 0) { + VHOST_LOG(ERR, "Failed to create proxy epoll fd for rxq-%d\n", i); + ret = -errno; + goto error; } - if (vring.kickfd < 0) { - VHOST_LOG(INFO, - "rxq-%d's kickfd is invalid, skip!\n", i); - continue; - } - dev->intr_handle->efds[i] = vring.kickfd; - VHOST_LOG(INFO, "Installed intr vec for rxq-%d\n", i); + dev->intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + i; + dev->intr_handle->efds[i] = epoll_fd; + vq = dev->data->rx_queues[i]; + memset(&vq->ev, 0, sizeof(vq->ev)); + vq->ev.events = EPOLLIN; + vq->ev.data.fd = epoll_fd; } dev->intr_handle->nb_efd = nb_rxq; @@ -712,6 +663,50 @@ eth_vhost_install_intr(struct rte_eth_dev *dev) dev->intr_handle->type = RTE_INTR_HANDLE_VDEV; return 0; + +error: + eth_vhost_uninstall_intr(dev); + return ret; +} + +static void +eth_vhost_configure_intr(struct rte_eth_dev *dev) +{ + int i; + + VHOST_LOG(DEBUG, "Configure intr vec\n"); + for (i = 0; i < dev->data->nb_rx_queues; i++) + eth_vhost_update_intr(dev, i); +} + +static void +eth_vhost_unconfigure_intr(struct rte_eth_dev *eth_dev) +{ + struct vhost_queue *vq; + int i; + + VHOST_LOG(DEBUG, "Unconfigure intr vec\n"); + for (i = 0; i < eth_dev->data->nb_rx_queues; i++) { + vq = eth_dev->data->rx_queues[i]; + if (vq == NULL || vq->vid < 0) + continue; + + rte_spinlock_lock(&vq->intr_lock); + + /* Remove previous kickfd from proxy epoll */ + if (vq->kickfd >= 0) { + if (epoll_ctl(vq->ev.data.fd, EPOLL_CTL_DEL, vq->kickfd, &vq->ev) < 0) { + VHOST_LOG(DEBUG, "Failed to unregister %d from rxq-%d epoll: %s\n", + vq->kickfd, i, strerror(errno)); + } else { + VHOST_LOG(DEBUG, "Unregistered %d from rxq-%d epoll\n", + vq->kickfd, i); + } + vq->kickfd = -1; + } + + rte_spinlock_unlock(&vq->intr_lock); + } } static void @@ -815,16 +810,8 @@ new_device(int vid) internal->vid = vid; if (rte_atomic32_read(&internal->started) == 1) { queue_setup(eth_dev, internal); - - if (dev_conf->intr_conf.rxq) { - if (eth_vhost_install_intr(eth_dev) < 0) { - VHOST_LOG(INFO, - "Failed to install interrupt handler.\n"); - return -1; - } - } - } else { - VHOST_LOG(INFO, "RX/TX queues not exist yet\n"); + if (dev_conf->intr_conf.rxq) + eth_vhost_configure_intr(eth_dev); } for (i = 0; i < rte_vhost_get_vring_num(vid); i++) @@ -866,6 +853,7 @@ destroy_device(int vid) rte_atomic32_set(&internal->dev_attached, 0); update_queuing_status(eth_dev, true); + eth_vhost_unconfigure_intr(eth_dev); eth_dev->data->dev_link.link_status = ETH_LINK_DOWN; @@ -894,52 +882,10 @@ destroy_device(int vid) rte_spinlock_unlock(&state->lock); VHOST_LOG(INFO, "Vhost device %d destroyed\n", vid); - eth_vhost_uninstall_intr(eth_dev); rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC, NULL); } -static int -vring_conf_update(int vid, struct rte_eth_dev *eth_dev, uint16_t vring_id) -{ - struct rte_eth_conf *dev_conf = ð_dev->data->dev_conf; - struct pmd_internal *internal = eth_dev->data->dev_private; - struct vhost_queue *vq; - struct rte_vhost_vring vring; - int rx_idx = vring_id % 2 ? (vring_id - 1) >> 1 : -1; - int ret = 0; - - /* - * The vring kickfd may be changed after the new device notification. - * Update it when the vring state is updated. - */ - if (rx_idx >= 0 && rx_idx < eth_dev->data->nb_rx_queues && - rte_atomic32_read(&internal->dev_attached) && - rte_atomic32_read(&internal->started) && - dev_conf->intr_conf.rxq) { - ret = rte_vhost_get_vhost_vring(vid, vring_id, &vring); - if (ret) { - VHOST_LOG(ERR, "Failed to get vring %d information.\n", - vring_id); - return ret; - } - eth_dev->intr_handle->efds[rx_idx] = vring.kickfd; - - vq = eth_dev->data->rx_queues[rx_idx]; - if (!vq) { - VHOST_LOG(ERR, "rxq%d is not setup yet\n", rx_idx); - return -1; - } - - rte_spinlock_lock(&vq->intr_lock); - if (vq->intr_enable) - ret = eth_vhost_update_intr(eth_dev, rx_idx); - rte_spinlock_unlock(&vq->intr_lock); - } - - return ret; -} - static int vring_state_changed(int vid, uint16_t vring, int enable) { @@ -959,9 +905,8 @@ vring_state_changed(int vid, uint16_t vring, int enable) /* won't be NULL */ state = vring_states[eth_dev->data->port_id]; - if (enable && vring_conf_update(vid, eth_dev, vring)) - VHOST_LOG(INFO, "Failed to update vring-%d configuration.\n", - (int)vring); + if (eth_dev->data->dev_conf.intr_conf.rxq && vring % 2) + eth_vhost_update_intr(eth_dev, (vring - 1) >> 1); rte_spinlock_lock(&state->lock); if (state->cur[vring] == enable) { @@ -1146,18 +1091,17 @@ eth_dev_start(struct rte_eth_dev *eth_dev) struct pmd_internal *internal = eth_dev->data->dev_private; struct rte_eth_conf *dev_conf = ð_dev->data->dev_conf; - queue_setup(eth_dev, internal); - - if (rte_atomic32_read(&internal->dev_attached) == 1) { - if (dev_conf->intr_conf.rxq) { - if (eth_vhost_install_intr(eth_dev) < 0) { - VHOST_LOG(INFO, - "Failed to install interrupt handler.\n"); - return -1; - } - } + eth_vhost_uninstall_intr(eth_dev); + if (dev_conf->intr_conf.rxq && eth_vhost_install_intr(eth_dev) < 0) { + VHOST_LOG(ERR, "Failed to install interrupt handler.\n"); + return -1; } + queue_setup(eth_dev, internal); + if (rte_atomic32_read(&internal->dev_attached) == 1 && + dev_conf->intr_conf.rxq) + eth_vhost_configure_intr(eth_dev); + rte_atomic32_set(&internal->started, 1); update_queuing_status(eth_dev, false); @@ -1212,6 +1156,8 @@ eth_dev_close(struct rte_eth_dev *dev) rte_free(internal->iface_name); rte_free(internal); + eth_vhost_uninstall_intr(dev); + dev->data->dev_private = NULL; rte_free(vring_states[dev->data->port_id]); @@ -1239,6 +1185,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id, vq->mb_pool = mb_pool; vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ; rte_spinlock_init(&vq->intr_lock); + vq->kickfd = -1; dev->data->rx_queues[rx_queue_id] = vq; return 0; @@ -1261,6 +1208,7 @@ eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id, vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ; rte_spinlock_init(&vq->intr_lock); + vq->kickfd = -1; dev->data->tx_queues[tx_queue_id] = vq; return 0; -- 2.39.2