From: Matan Azrad <matan@mellanox.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>,
"Wang, Yinan" <yinan.wang@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"Xia, Chenbo" <chenbo.xia@intel.com>,
"Liu, Yong" <yong.liu@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>,
"Yigit, Ferruh" <ferruh.yigit@intel.com>,
"david.marchand@redhat.com" <david.marchand@redhat.com>
Subject: Re: [dpdk-dev] [PATCH 0/2] Fix vhost performance regression
Date: Mon, 27 Jul 2020 08:28:16 +0000 [thread overview]
Message-ID: <AM0PR0502MB401924510DFF8263AD16A53AD2720@AM0PR0502MB4019.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <aae8a1a9-0592-3a15-08fa-ea16abd9b933@redhat.com>
Hi Maxime
I understand the direction...
See below
From: Maxime Coquelin
> On 7/24/20 10:54 AM, Maxime Coquelin wrote:
> >
> >
> > On 7/24/20 9:06 AM, Maxime Coquelin wrote:
> >> Hi Yinan,
> >>
> >> On 7/24/20 6:55 AM, Wang, Yinan wrote:
> >>> Hi Maxime,
> >>>
> >>> The performance drop issue can be fixed, thanks!
> >>> The multi-queues interrupt issue still exist w/ this patch set.
> >>
> >> Thanks for the test report, so that's only half good.
> >> I'm setting up the multi-queues interrupt test case to further debug it.
> >
> > I have now a reproducer, i.e. only interrupts are received on rxq0.
> >
> > (gdb) p *((struct internal_list *)internal_list)->eth_dev->intr_handle
> > $20 = {
> > {
> > vfio_dev_fd = 0,
> > uio_cfg_fd = 0
> > },
> > fd = 0,
> > type = RTE_INTR_HANDLE_VDEV,
> > max_intr = 2,
> > nb_efd = 1,
> > efd_counter_size = 8 '\b',
> > efds = {622, 621, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642,
> > 692, 646, 701, 650, 0 <repeats 496 times>},
> > elist = {{
> > status = 1,
> > fd = 622,
> > epfd = 645,
> > epdata = {
> > event = 2147483651,
> > data = 0x1,
> > cb_fun = 0x8af840 <eal_intr_proc_rxtx_intr>,
> > cb_arg = 0x7f4df0001580
> > }
> > }, {
> > status = 0,
> > fd = 0,
> > epfd = 0,
> > epdata = {
> > event = 0,
> > data = 0x0,
> > cb_fun = 0x0,
> > cb_arg = 0x0
> > }
> > } <repeats 511 times>},
> > intr_vec = 0x7f4df0007db0
> > }
> >
> > In above dump, we can see the efds are well set via the fix provided
> > by Matan, but max_intr and nb_efd aren't so polling won't take them
> > into account.
> >
> > I'm working on a fix.
>
> So it is a bit more complex than I imagined.
> There are no DPDK API to update the FD in the epoll, so it seems we need to
> do it directly in the driver by removing the old one and adding the new one.
>
> I have cooked a patch that makes it work, but I would like to know if that
> would be acceptable for this release? We could imagine introducing new
> rte_epoll API to handle that properly in v20.11.
>
> Matan, could you review below patch and confirm whether it is safe?
>
> (The patch needs some style clean-up before being submitted).
>
> Thanks in advance,
> Maxime
>
> ==========================================================
> ========
>
> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> index 14b7b59f67..f36ea4b24c 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -5,6 +5,7 @@
> #include <unistd.h>
> #include <pthread.h>
> #include <stdbool.h>
> +#include <sys/epoll.h>
>
> #include <rte_mbuf.h>
> #include <rte_ethdev_driver.h>
> @@ -593,7 +594,6 @@ eth_vhost_install_intr(struct rte_eth_dev *dev) {
> struct rte_vhost_vring vring;
> struct vhost_queue *vq;
> - int count = 0;
> int nb_rxq = dev->data->nb_rx_queues;
> int i;
> int ret;
> @@ -623,6 +623,8 @@ eth_vhost_install_intr(struct rte_eth_dev *dev)
>
> VHOST_LOG(INFO, "Prepare intr vec\n");
> for (i = 0; i < nb_rxq; i++) {
> + dev->intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET
> + i;
> + dev->intr_handle->efds[i] = -1;
> vq = dev->data->rx_queues[i];
> if (!vq) {
> VHOST_LOG(INFO, "rxq-%d not setup yet, skip!\n", i); @@ -
> 641,14 +643,12 @@ eth_vhost_install_intr(struct rte_eth_dev *dev)
> "rxq-%d's kickfd is invalid, skip!\n", i);
> continue;
> }
> - dev->intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET
> + i;
> dev->intr_handle->efds[i] = vring.kickfd;
> - count++;
> VHOST_LOG(INFO, "Installed intr vec for rxq-%d\n", i);
> }
>
> - dev->intr_handle->nb_efd = count;
> - dev->intr_handle->max_intr = count + 1;
> + dev->intr_handle->nb_efd = nb_rxq;
> + dev->intr_handle->max_intr = nb_rxq + 1;
> dev->intr_handle->type = RTE_INTR_HANDLE_VDEV;
>
> return 0;
> @@ -836,8 +836,11 @@ vring_conf_update(int vid, struct rte_eth_dev
> *eth_dev, uint16_t vring_id)
> struct rte_eth_conf *dev_conf = ð_dev->data->dev_conf;
> struct pmd_internal *internal = eth_dev->data->dev_private;
> struct rte_vhost_vring vring;
> + struct rte_intr_handle *handle;
> + struct rte_epoll_event rev;
> int rx_idx = vring_id % 2 ? (vring_id - 1) >> 1 : -1;
> int ret = 0;
> + int epfd;
>
> /*
> * The vring kickfd may be changed after the new device notification.
> @@ -852,9 +855,17 @@ vring_conf_update(int vid, struct rte_eth_dev
> *eth_dev, uint16_t vring_id)
> return ret;
>
> if (vring.kickfd != eth_dev->intr_handle->efds[rx_idx]) {
> + handle = eth_dev->intr_handle;
> VHOST_LOG(INFO, "kickfd for rxq-%d was changed.\n",
> rx_idx);
> - eth_dev->intr_handle->efds[rx_idx] = vring.kickfd;
> +
> + handle->efds[rx_idx] = vring.kickfd;
> + epfd = handle->elist[rx_idx].epfd;
> + rev = handle->elist[rx_idx];
> + rev.fd = vring.kickfd;
> + rte_epoll_ctl(epfd, EPOLL_CTL_DEL,
We may have race with application code here and below, no?
> handle->elist[rx_idx].fd, &handle->elist[rx_idx]);
> + handle->elist[rx_idx] = rev;
> + rte_epoll_ctl(epfd, EPOLL_CTL_ADD, rev.fd,
> &handle->elist[rx_idx]);
> }
> }
One more option is to let application do it by the QUEUE_STATE event callback...
But it requires app change ☹
>
> > Regards,
> > Maxime
> >
> >> Regards,
> >> Maxime
> >>
> >>> BR,
> >>> Yinan
> >>>
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>> Sent: 2020?7?23? 21:09
> >>>> To: dev@dpdk.org; matan@mellanox.com; Xia, Chenbo
> >>>> <chenbo.xia@intel.com>; Liu, Yong <yong.liu@intel.com>; Wang, Yinan
> >>>> <yinan.wang@intel.com>
> >>>> Cc: thomas@monjalon.net; Yigit, Ferruh <ferruh.yigit@intel.com>;
> >>>> david.marchand@redhat.com; Maxime Coquelin
> >>>> <maxime.coquelin@redhat.com>
> >>>> Subject: [PATCH 0/2] Fix vhost performance regression
> >>>>
> >>>> Hi,
> >>>>
> >>>> This series aims at fixing the performance degradation reported by
> >>>> Intel QE. I managed to reproduce the issue, and this series fixes
> >>>> it.
> >>>>
> >>>> I only tested the first test case provided in the Bz[0], but wanted
> >>>> to send early for Intel QE to try and confirm it solves the issue.
> >>>>
> >>>> I will work on reproducing the other test cases, and see if this
> >>>> also fixes them.
> >>>>
> >>>> Thanks to Intel QE team for finding this issue.
> >>>> Maxime
> >>>>
> >>>> [0]:
> >>>>
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fb
> >>>>
> ugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D507%23c0&data=02%7C01%7C
> mata
> >>>>
> n%40mellanox.com%7Cb76cf49cc7f04f000f0108d82fe850cf%7Ca652971c7d2e
> 4
> >>>>
> d9ba6a4d149256f461b%7C0%7C0%7C637312022123706774&sdata=r2m9
> XpBc
> >>>> rFs7tb14a4Pcs8Qx7fiBpB2E4beRXenuVa0%3D&reserved=0
> >>>>
> >>>> Maxime Coquelin (2):
> >>>> vhost: fix guest notification setting
> >>>> net/vhost: fix queue update
> >>>>
> >>>> drivers/net/vhost/rte_eth_vhost.c | 25 ++++++-------------------
> >>>> lib/librte_vhost/vhost.c | 24 ++++++++++++++++++++----
> >>>> lib/librte_vhost/vhost.h | 5 +++++
> >>>> lib/librte_vhost/vhost_user.c | 11 ++++++++---
> >>>> 4 files changed, 39 insertions(+), 26 deletions(-)
> >>>>
> >>>> --
> >>>> 2.26.2
> >>>
> >>
> >
next prev parent reply other threads:[~2020-07-27 8:28 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-23 13:08 Maxime Coquelin
2020-07-23 13:08 ` [dpdk-dev] [PATCH 1/2] vhost: fix guest notification setting Maxime Coquelin
2020-07-27 13:46 ` Xia, Chenbo
2020-07-23 13:08 ` [dpdk-dev] [PATCH 2/2] net/vhost: fix queue update Maxime Coquelin
2020-07-27 13:54 ` Xia, Chenbo
2020-07-24 4:55 ` [dpdk-dev] [PATCH 0/2] Fix vhost performance regression Wang, Yinan
2020-07-24 7:06 ` Maxime Coquelin
2020-07-24 8:54 ` Maxime Coquelin
2020-07-24 15:43 ` Maxime Coquelin
2020-07-27 8:28 ` Matan Azrad [this message]
2020-07-24 12:42 ` David Marchand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=AM0PR0502MB401924510DFF8263AD16A53AD2720@AM0PR0502MB4019.eurprd05.prod.outlook.com \
--to=matan@mellanox.com \
--cc=chenbo.xia@intel.com \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=ferruh.yigit@intel.com \
--cc=maxime.coquelin@redhat.com \
--cc=thomas@monjalon.net \
--cc=yinan.wang@intel.com \
--cc=yong.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).