From: "Hyong Youb Kim (hyonkim)"
To: Jerin Jacob Kollanukkaran, Thomas Monjalon
CC: David Marchand, "dev@dpdk.org", "anatoly.burakov@intel.com",
 "alex.williamson@redhat.com", "maxime.coquelin@redhat.com",
 "stephen@networkplumber.org", "igor.russkikh@aquantia.com",
 "pavel.belous@aquantia.com", "allain.legacy@windriver.com",
 "matt.peters@windriver.com", "ravi1.kumar@amd.com", Rasesh Mody,
 Shahed Shaikh, "ajit.khaparde@broadcom.com", "somnath.kotur@broadcom.com",
 "hemant.agrawal@nxp.com", "shreyansh.jain@nxp.com", "wenzhuo.lu@intel.com",
 "mw@semihalf.com", "mk@semihalf.com", "gtzalik@amazon.com",
 "evgenys@amazon.com", "John Daley (johndale)", "qi.z.zhang@intel.com",
 "xiao.w.wang@intel.com", "xuanziyang2@huawei.com",
 "cloud.wangxiaoyun@huawei.com", "zhouguoyang@huawei.com",
 "beilei.xing@intel.com", "jingjing.wu@intel.com", "qiming.yang@intel.com",
 "konstantin.ananyev@intel.com", "alejandro.lucero@netronome.com",
 "arybchenko@solarflare.com", "tiwei.bie@intel.com", "zhihong.wang@intel.com",
 "yongwang@vmware.com", "stable@dpdk.org", Nithin Kumar Dabilpuram
Date: Mon, 15 Jul 2019 07:20:03 +0000
References: <1562071706-11009-1-git-send-email-david.marchand@redhat.com>
 <1796500.5oFe8j95cd@xps> <4647179.CTOrK8BQiK@xps>
Subject: Re: [dpdk-dev] [PATCH] vfio: fix interrupts race condition
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Jerin Jacob Kollanukkaran
> Sent: Monday, July 15, 2019 2:35 PM
[...]
> Subject: RE: [dpdk-dev] [PATCH] vfio: fix interrupts race condition
>
> > > >
> > > > This is a real bug which should be fixed in this release.
> > > > As the patch is quite big and needs a strong validation, I prefer
> > > > merging it quickly to give a lot of time before releasing 19.08-rc2.
> > > > The maintainers of all concerned PMDs are Cc.
> > > > Please make sure the interrupts are still working well with VFIO.
> > > >
> > > > Applied, thanks
> > > >
> >
> > > [Apologies in advance if email format gets messed up. Forced to use
> > > outlook for the first time..]
> > >
> > > Hi,
> > >
> > > This commit breaks MSI-X + rxq interrupts. I think others are seeing
> > > the same error?
> > >
> > > sudo ~/dpdk/examples/l3fwd-power/build/l3fwd-power \
> > > -c 0x1e -n 4 -w 0000:1a:00.0 --log-level=pmd,debug -- -p 0x1 -P \
> > > --config "(0,0,2),(0,1,3),(0,2,4)"
> > > [...]
> > > EAL: Error enabling MSI-X interrupts for fd 35
> > >
> > > A rough sequence of events goes like this. The above test is using 3
> > > rxqs (3 interrupts).
> > >
> > > 1. During probe, pci_vfio_setup_interrupts() runs.
> > > This now does ioctl(VFIO_DEVICE_SET_IRQS) for the 1st efd
> > > (intr_handle->fd).
> > >
> > > ioctl does:
> > > - pci_enable_msix(1 vector) because this is the first time enabling
> > > interrupts.
> > > - request_irq(vector 0)
> > >
> > > 2. App configs
> > > The app sets port_conf.intr_conf.rxq=1, configs 3 rxqs, etc.
> > >
> > > 3. rte_eth_dev_start()
> > > PMD calls:
> > > - rte_intr_efd_enable()
> > > This creates 3 efds (intr_handle->nb_efd = 3).
> > > - rte_intr_enable() => vfio_enable_msix()
> > > This does ioctl(VFIO_DEVICE_SET_IRQS) for the 3 efds.
> > >
> > > ioctl now needs to request_irq() for vectors 1, 2, 3 for the 3 new
> > > efds. It does not do another pci_enable_msix() as it has been done
> > > earlier. Before calling request_irq(), it sees that only 1 vector was
> > > enabled in earlier pci_enable_msix(), so it fails with EINVAL.
> > >
> > > We would need pci_enable_msix(4 vectors) for this to work
> > > (intr_handle->fd + 3 efds).
> > >
> > > Prior to this patch, VFIO_DEVICE_SET_IRQS is done only in
> > > vfio_enable_msix(). So, ioctl ends up doing pci_enable_msix(4 vectors)
> > > and request_irq() for each of the 4 efds, which completes
> > > successfully.
> > >
> > > Not an expert in this area..
> > > Perhaps, defer enabling 1st efd
> > > (intr_handle->fd) until the first invocation of vfio_enable_msix(), so
> > > it knows the app wants to use 4 vectors in total?
> > >
> > > Also, vfio_disable_msix() looks a bit wrong.
> > >
> > > irq_set.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
> > > irq_set.index = VFIO_PCI_MSIX_IRQ_INDEX;
> > > irq_set.start = RTE_INTR_VEC_RXTX_OFFSET;
> > > irq_set.count = intr_handle->nb_efd;
> > >
> > > This tells vfio-pci to simulate interrupts by triggering efds? To
> > > free_irq() specific efds, I think we need DATA_EVENTFD and set fd =
> > > -1.
> > >
> > > flags = DATA_EVENTFD | ACTION_TRIGGER
> > > data = [fd(-1), fd(-1), ...]
> > >
> > > I have not tested this part myself yet.
>
>
> We do see the following failure[1] on octeontx2 PMD with this patch.
> We will try to find a fix.
>
> irq_set = (struct vfio_irq_set *)irq_set_buf;
> irq_set->argsz = len;
> irq_set->start = 0;
> irq_set->count = intr_handle->max_intr;
> irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
>                  VFIO_IRQ_SET_ACTION_TRIGGER;
> irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
>
> fd_ptr = (int32_t *)&irq_set->data[0];
> for (i = 0; i < irq_set->count; i++)
>         fd_ptr[i] = -1;
>
> rc = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> if (rc)
>         otx2_err("Failed to set irqs vector rc=%d", rc);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^[1]
>
>
> >
> > Thanks for your detailed report Hyong.
> > Would you be able to propose a fix?
> >

Did more digging. Another lengthy email.. It feels tricky to fix the
problem properly, as it is getting late..

Here is a recap of the problem, as best I can.

1. INTx:
Both vfio-pci and igb_uio mask the interrupt (i.e. write 1 to Interrupt
Disable in PCI config) and then trigger callback to PMD. PMD needs to
unmask the interrupt when exiting its irq callback. This is currently
achieved by calling rte_intr_enable.
Several PMDs use this pattern.

irq_callback()
{
	take_action()
	rte_intr_enable()
}

2. MSI/MSI-X:
No automatic masking done within vfio-pci/igb_uio before triggering
callback to PMD. No need to "re-enable" interrupt by calling
rte_intr_enable.

If we forget INTx, we can simply remove rte_intr_enable from all PMD irq
callbacks..

The current vfio commit effectively turns rte_intr_enable into a no-op for
MSI/MSI-X (ignore rxq interrupts for now).. So it is equivalent to
removing rte_intr_enable from the irq callback.

Prior to this commit, rte_intr_enable ends up re-doing irq setup:
free_irq() -> request_irq(). In the bugzilla issue (qede), an interrupt
arrives in between these and gets "lost", which causes something that is
waiting for it to time out, etc.

In the bz, note that INTx has no issues, probably because it is
level-triggered.

Now, about this commit (beware unorganized thoughts from me)..

I think David wants to turn rte_intr_enable/disable into unmask/mask,
and avoid free_irq()/request_irq() "post probe". 3 cases to consider...

1. INTx:
Mission accomplished via ACTION_(UN)MASK.

2. MSI/MSI-X and 1st vector (intr_handle->fd):
rte_intr_enable/disable is now a no-op. This is not quite right, since
the interrupt remains enabled even after a call to rte_intr_disable.
For MSI/MSI-X, ACTION_(UN)MASK is a no-op (unimplemented) in vfio-pci, so
there is no way to mask. It's been that way ever since, as far as I can
tell. Prior to this commit, rte_intr_disable() does free_irq(), so the
interrupt does get disabled.

3. MSI/MSI-X and rxq vectors (intr_handle->efds):
Broken as reported earlier.

If we limit scope to only qede, then a variation of David's earlier patch
(self NACKed) would be sufficient.
http://patchwork.dpdk.org/patch/55310/

qede has separate handlers for INTx and MSI/MSI-X. So, just need to
remove rte_intr_enable() from the MSI/MSI-X handler.

--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -261,8 +261,6 @@ qede_interrupt_handler(void *param)
 	struct ecore_dev *edev = &qdev->edev;
 	qede_interrupt_action(ECORE_LEADING_HWFN(edev));
-	if (rte_intr_enable(eth_dev->intr_handle))
-		DP_ERR(edev, "rte_intr_enable failed\n");
 }

Trying to see if the following works. Do not have a patch yet.

- Revert pci_vfio.c so we do not enable interrupt during probe
- In eal_interrupts.c
  - Add state bit so we know if interrupt is enabled
  - For INTx, if enabled, use David's code to mask/unmask
  - For MSI/MSI-X, if enabled, do not enable again (i.e. do not do
    VFIO_DEVICE_SET_IRQS)

Jerin or others, do not let me stop you. Kinda reluctant to be the owner
of this issue at the moment :-)

Thank you.
-Hyong
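
The eal_interrupts.c part of the list above can be modelled as a tiny
state machine. What follows is only a sketch under assumptions, not a
patch: every name in it (sketch_intr_handle, sketch_intr_enable,
sketch_intr_disable, the counters) is hypothetical and does not exist in
DPDK, the counters merely stand in for the real
ioctl(VFIO_DEVICE_SET_IRQS) calls, and the disable-side behaviour (mask
for INTx, full teardown for MSI/MSI-X) is an assumption that the list
does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in DPDK. */
enum sketch_intr_type { SKETCH_INTX, SKETCH_MSI, SKETCH_MSIX };

struct sketch_intr_handle {
	enum sketch_intr_type type;
	bool enabled;                /* the proposed "state bit" */
	unsigned int set_irqs_calls; /* stands in for the full SET_IRQS setup */
	unsigned int unmask_calls;   /* stands in for ACTION_UNMASK (INTx) */
	unsigned int mask_calls;     /* stands in for ACTION_MASK (INTx) */
};

/* Enable: do the full setup only once, afterwards unmask (INTx) or
 * do nothing (MSI/MSI-X), so no free_irq()/request_irq() window opens. */
static void sketch_intr_enable(struct sketch_intr_handle *h)
{
	assert(h != NULL);
	if (!h->enabled) {
		h->set_irqs_calls++; /* first enable: request all vectors */
		h->enabled = true;
		return;
	}
	if (h->type == SKETCH_INTX)
		h->unmask_calls++;   /* re-arm after the kernel auto-masked */
	/* MSI/MSI-X already enabled: intentionally nothing to do */
}

/* Disable (assumed behaviour): mask INTx, tear down MSI/MSI-X. */
static void sketch_intr_disable(struct sketch_intr_handle *h)
{
	assert(h != NULL);
	if (!h->enabled)
		return;
	if (h->type == SKETCH_INTX) {
		h->mask_calls++;     /* handle stays set up, only masked */
		return;
	}
	h->enabled = false;          /* models free_irq() on all vectors */
}
```

In this model, calling sketch_intr_enable() twice on an MSI-X handle
leaves set_irqs_calls at 1, while on an INTx handle the second call only
bumps unmask_calls.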