From: Matan Azrad <matan@mellanox.com>
To: Matan Azrad, Maxime Coquelin, Tiwei Bie, Zhihong Wang, Xiao Wang
CC: Ferruh Yigit, dev@dpdk.org, Thomas Monjalon, Andrew Rybchenko
Date: Thu, 9 Jan 2020 17:25:41 +0000
In-Reply-To: <1578567617-3541-4-git-send-email-matan@mellanox.com>
Subject: Re: [dpdk-dev] [PATCH v2 3/3] drivers: move ifc driver to the vDPA class
List-Id: DPDK patches and discussions

Small typo inline.

From: Matan Azrad
> A new vDPA class was recently introduced.
>
> IFC driver implements the vDPA operations, hence it should be moved to
> the vDPA class.
>
> Move it.
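(Aside for readers new to the vDPA class: what makes a driver "vDPA" rather than a NIC PMD is that it registers a set of rte_vdpa_dev_ops with the vhost library instead of creating an ethdev. A minimal, hypothetical sketch of that registration pattern, assuming the pre-20.08 device-id based vDPA API this patch targets; the truncated ifcvf probe at the end of this mail follows the same general shape.)

#include <rte_bus_pci.h>
#include <rte_vdpa.h>

/* Hypothetical fragment (not part of the patch): register a PCI VF as a
 * vDPA device. On success the vhost library hands back a device id
 * ("did") that the driver's ops callbacks are keyed on.
 */
static int
example_vdpa_register(struct rte_pci_device *pci_dev,
		struct rte_vdpa_dev_ops *ops)
{
	struct rte_vdpa_dev_addr dev_addr;

	dev_addr.type = PCI_ADDR;
	dev_addr.pci_addr = pci_dev->addr;

	return rte_vdpa_register_device(&dev_addr, ops);
}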
>
> Signed-off-by: Matan Azrad
> ---
>  MAINTAINERS | 14 +-
>  doc/guides/nics/features/ifcvf.ini | 8 -
>  doc/guides/nics/ifc.rst | 106 ---
>  doc/guides/nics/index.rst | 1 -
>  doc/guides/vdpadevs/features/ifcvf.ini | 8 +
>  doc/guides/vdpadevs/ifc.rst | 106 +++
>  doc/guides/vdpadevs/index.rst | 1 +
>  drivers/net/Makefile | 3 -
>  drivers/net/ifc/Makefile | 34 -
>  drivers/net/ifc/base/ifcvf.c | 329 --------
>  drivers/net/ifc/base/ifcvf.h | 162 ----
>  drivers/net/ifc/base/ifcvf_osdep.h | 52 --
>  drivers/net/ifc/ifcvf_vdpa.c | 1280 ------------------------------
>  drivers/net/ifc/meson.build | 9 -
>  drivers/net/ifc/rte_pmd_ifc_version.map | 3 -
>  drivers/net/meson.build | 1 -
>  drivers/vdpa/Makefile | 6 +
>  drivers/vdpa/ifc/Makefile | 34 +
>  drivers/vdpa/ifc/base/ifcvf.c | 329 ++++
>  drivers/vdpa/ifc/base/ifcvf.h | 162 ++++
>  drivers/vdpa/ifc/base/ifcvf_osdep.h | 52 ++
>  drivers/vdpa/ifc/ifcvf_vdpa.c | 1280 ++++++++++++++++++++++++++++++
>  drivers/vdpa/ifc/meson.build | 9 +
>  drivers/vdpa/ifc/rte_pmd_ifc_version.map | 3 +
>  drivers/vdpa/meson.build | 2 +-
>  25 files changed, 1997 insertions(+), 1997 deletions(-)
>  delete mode 100644 doc/guides/nics/features/ifcvf.ini
>  delete mode 100644 doc/guides/nics/ifc.rst
>  create mode 100644 doc/guides/vdpadevs/features/ifcvf.ini
>  create mode 100644 doc/guides/vdpadevs/ifc.rst
>  delete mode 100644 drivers/net/ifc/Makefile
>  delete mode 100644 drivers/net/ifc/base/ifcvf.c
>  delete mode 100644 drivers/net/ifc/base/ifcvf.h
>  delete mode 100644 drivers/net/ifc/base/ifcvf_osdep.h
>  delete mode 100644 drivers/net/ifc/ifcvf_vdpa.c
>  delete mode 100644 drivers/net/ifc/meson.build
>  delete mode 100644 drivers/net/ifc/rte_pmd_ifc_version.map
>  create mode 100644 drivers/vdpa/ifc/Makefile
>  create mode 100644 drivers/vdpa/ifc/base/ifcvf.c
>  create mode 100644 drivers/vdpa/ifc/base/ifcvf.h
>  create mode 100644 drivers/vdpa/ifc/base/ifcvf_osdep.h
>  create mode 100644 drivers/vdpa/ifc/ifcvf_vdpa.c
>  create mode 100644 drivers/vdpa/ifc/meson.build
>  create mode 100644 drivers/vdpa/ifc/rte_pmd_ifc_version.map
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 17c2df7..16facba 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -679,14 +679,6 @@ T: git://dpdk.org/next/dpdk-next-net-intel
>  F: drivers/net/iavf/
>  F: doc/guides/nics/features/iavf*.ini
>
> -Intel ifc
> -M: Xiao Wang
> -T: git://dpdk.org/next/dpdk-next-net-intel
> -F: drivers/net/ifc/
> -F: doc/guides/nics/ifc.rst
> -F: doc/guides/nics/features/ifc*.ini
> -
> -Intel ice

Removing this line is a typo; it will be fixed in the next version if needed, or during integration.

>  M: Qiming Yang
>  M: Wenzhuo Lu
>  T: git://dpdk.org/next/dpdk-next-net-intel
> @@ -1093,6 +1085,12 @@ vDPA Drivers
>  ------------
>  T: git://dpdk.org/next/dpdk-next-virtio
>
> +Intel ifc
> +M: Xiao Wang
> +F: drivers/vdpa/ifc/
> +F: doc/guides/vdpadevs/ifc.rst
> +F: doc/guides/vdpadevs/features/ifcvf.ini
> +
>
>  Eventdev Drivers
>  ----------------
> diff --git a/doc/guides/nics/features/ifcvf.ini b/doc/guides/nics/features/ifcvf.ini
> deleted file mode 100644
> index ef1fc47..0000000
> --- a/doc/guides/nics/features/ifcvf.ini
> +++ /dev/null
> @@ -1,8 +0,0 @@
> -;
> -; Supported features of the 'ifcvf' vDPA driver.
> -;
> -; Refer to default.ini for the full list of available PMD features.
> -;
> -[Features]
> -x86-32 = Y
> -x86-64 = Y
> diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst
> deleted file mode 100644
> index 12a2a34..0000000
> --- a/doc/guides/nics/ifc.rst
> +++ /dev/null
> @@ -1,106 +0,0 @@
> -.. SPDX-License-Identifier: BSD-3-Clause
> -   Copyright(c) 2018 Intel Corporation.
> -
> -IFCVF vDPA driver
> -=================
> -
> -The IFCVF vDPA (vhost data path acceleration) driver provides support for the
> -Intel FPGA 100G VF (IFCVF). IFCVF's datapath is virtio ring compatible, it
> -works as a HW vhost backend which can send/receive packets to/from virtio
> -directly by DMA. Besides, it supports dirty page logging and device state
> -report/restore, this driver enables its vDPA functionality.
> -
> -
> -Pre-Installation Configuration
> -------------------------------
> -
> -Config File Options
> -~~~~~~~~~~~~~~~~~~~
> -
> -The following option can be modified in the ``config`` file.
> -
> -- ``CONFIG_RTE_LIBRTE_IFC_PMD`` (default ``y`` for linux)
> -
> -  Toggle compilation of the ``librte_pmd_ifc`` driver.
> -
> -
> -IFCVF vDPA Implementation
> --------------------------
> -
> -IFCVF's vendor ID and device ID are same as that of virtio net pci device,
> -with its specific subsystem vendor ID and device ID. To let the device be
> -probed by IFCVF driver, adding "vdpa=1" parameter helps to specify that this
> -device is to be used in vDPA mode, rather than polling mode, virtio pmd will
> -skip when it detects this message. If no this parameter specified, device
> -will not be used as a vDPA device, and it will be driven by virtio pmd.
> -
> -Different VF devices serve different virtio frontends which are in different
> -VMs, so each VF needs to have its own DMA address translation service. During
> -the driver probe a new container is created for this device, with this
> -container vDPA driver can program DMA remapping table with the VM's memory
> -region information.
> -
> -The device argument "sw-live-migration=1" will configure the driver into SW
> -assisted live migration mode. In this mode, the driver will set up a SW relay
> -thread when LM happens, this thread will help device to log dirty pages. Thus
> -this mode does not require HW to implement a dirty page logging function block,
> -but will consume some percentage of CPU resource depending on the network
> -throughput. If no this parameter specified, driver will rely on device's logging
> -capability.
> -
> -Key IFCVF vDPA driver ops
> -~~~~~~~~~~~~~~~~~~~~~~~~~
> -
> -- ifcvf_dev_config:
> -  Enable VF data path with virtio information provided by vhost lib, including
> -  IOMMU programming to enable VF DMA to VM's memory, VFIO interrupt setup to
> -  route HW interrupt to virtio driver, create notify relay thread to translate
> -  virtio driver's kick to a MMIO write onto HW, HW queues configuration.
> -
> -  This function gets called to set up HW data path backend when virtio driver
> -  in VM gets ready.
> -
> -- ifcvf_dev_close:
> -  Revoke all the setup in ifcvf_dev_config.
> -
> -  This function gets called when virtio driver stops device in VM.
> -
> -To create a vhost port with IFC VF
> -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -
> -- Create a vhost socket and assign a VF's device ID to this socket via
> -  vhost API. When QEMU vhost connection gets ready, the assigned VF will
> -  get configured automatically.
> -
> -
> -Features
> ---------
> -
> -Features of the IFCVF driver are:
> -
> -- Compatibility with virtio 0.95 and 1.0.
> -- SW assisted vDPA live migration.
> -
> -
> -Prerequisites
> --------------
> -
> -- Platform with IOMMU feature. IFC VF needs address translation service to
> -  Rx/Tx directly with virtio driver in VM.
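(Side note on the "To create a vhost port with IFC VF" section above: the vhost API sequence it refers to can be sketched roughly as below. This is a hypothetical helper, not part of the patch, assuming the device-id based API of this release; feature and callback setup such as rte_vhost_driver_set_features is simplified out.)

#include <rte_vdpa.h>
#include <rte_vhost.h>

/* Hypothetical helper: bind an already-probed IFC VF to a vhost-user
 * socket, so that once the QEMU vhost connection is up the assigned VF
 * gets configured automatically, as described in the doc above.
 */
static int
attach_ifcvf_to_socket(const char *path, struct rte_vdpa_dev_addr *addr)
{
	int did = rte_vdpa_find_device_id(addr);

	if (did < 0)
		return -1;
	if (rte_vhost_driver_register(path, 0) != 0)
		return -1;
	if (rte_vhost_driver_attach_vdpa_device(path, did) != 0)
		return -1;

	return rte_vhost_driver_start(path);
}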
> -
> -
> -Limitations
> ------------
> -
> -Dependency on vfio-pci
> -~~~~~~~~~~~~~~~~~~~~~~
> -
> -vDPA driver needs to setup VF MSIX interrupts, each queue's interrupt vector
> -is mapped to a callfd associated with a virtio ring. Currently only vfio-pci
> -allows multiple interrupts, so the IFCVF driver is dependent on vfio-pci.
> -
> -Live Migration with VIRTIO_NET_F_GUEST_ANNOUNCE
> -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -
> -IFC VF doesn't support RARP packet generation, virtio frontend supporting
> -VIRTIO_NET_F_GUEST_ANNOUNCE feature can help to do that.
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index d61c27f..8c540c0 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -31,7 +31,6 @@ Network Interface Controller Drivers
>      hns3
>      i40e
>      ice
> -    ifc
>      igb
>      ipn3ke
>      ixgbe
> diff --git a/doc/guides/vdpadevs/features/ifcvf.ini b/doc/guides/vdpadevs/features/ifcvf.ini
> new file mode 100644
> index 0000000..ef1fc47
> --- /dev/null
> +++ b/doc/guides/vdpadevs/features/ifcvf.ini
> @@ -0,0 +1,8 @@
> +;
> +; Supported features of the 'ifcvf' vDPA driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +x86-32 = Y
> +x86-64 = Y
> diff --git a/doc/guides/vdpadevs/ifc.rst b/doc/guides/vdpadevs/ifc.rst
> new file mode 100644
> index 0000000..12a2a34
> --- /dev/null
> +++ b/doc/guides/vdpadevs/ifc.rst
> @@ -0,0 +1,106 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +   Copyright(c) 2018 Intel Corporation.
> +
> +IFCVF vDPA driver
> +=================
> +
> +The IFCVF vDPA (vhost data path acceleration) driver provides support for the
> +Intel FPGA 100G VF (IFCVF). IFCVF's datapath is virtio ring compatible, it
> +works as a HW vhost backend which can send/receive packets to/from virtio
> +directly by DMA. Besides, it supports dirty page logging and device state
> +report/restore, this driver enables its vDPA functionality.
> +
> +
> +Pre-Installation Configuration
> +------------------------------
> +
> +Config File Options
> +~~~~~~~~~~~~~~~~~~~
> +
> +The following option can be modified in the ``config`` file.
> +
> +- ``CONFIG_RTE_LIBRTE_IFC_PMD`` (default ``y`` for linux)
> +
> +  Toggle compilation of the ``librte_pmd_ifc`` driver.
> +
> +
> +IFCVF vDPA Implementation
> +-------------------------
> +
> +IFCVF's vendor ID and device ID are same as that of virtio net pci device,
> +with its specific subsystem vendor ID and device ID. To let the device be
> +probed by IFCVF driver, adding "vdpa=1" parameter helps to specify that this
> +device is to be used in vDPA mode, rather than polling mode, virtio pmd will
> +skip when it detects this message. If no this parameter specified, device
> +will not be used as a vDPA device, and it will be driven by virtio pmd.
> +
> +Different VF devices serve different virtio frontends which are in different
> +VMs, so each VF needs to have its own DMA address translation service. During
> +the driver probe a new container is created for this device, with this
> +container vDPA driver can program DMA remapping table with the VM's memory
> +region information.
> +
> +The device argument "sw-live-migration=1" will configure the driver into SW
> +assisted live migration mode. In this mode, the driver will set up a SW relay
> +thread when LM happens, this thread will help device to log dirty pages. Thus
> +this mode does not require HW to implement a dirty page logging function block,
> +but will consume some percentage of CPU resource depending on the network
> +throughput. If no this parameter specified, driver will rely on device's logging
> +capability.
> +
> +Key IFCVF vDPA driver ops
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +- ifcvf_dev_config:
> +  Enable VF data path with virtio information provided by vhost lib, including
> +  IOMMU programming to enable VF DMA to VM's memory, VFIO interrupt setup to
> +  route HW interrupt to virtio driver, create notify relay thread to translate
> +  virtio driver's kick to a MMIO write onto HW, HW queues configuration.
> +
> +  This function gets called to set up HW data path backend when virtio driver
> +  in VM gets ready.
> +
> +- ifcvf_dev_close:
> +  Revoke all the setup in ifcvf_dev_config.
> +
> +  This function gets called when virtio driver stops device in VM.
> +
> +To create a vhost port with IFC VF
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +- Create a vhost socket and assign a VF's device ID to this socket via
> +  vhost API. When QEMU vhost connection gets ready, the assigned VF will
> +  get configured automatically.
> +
> +
> +Features
> +--------
> +
> +Features of the IFCVF driver are:
> +
> +- Compatibility with virtio 0.95 and 1.0.
> +- SW assisted vDPA live migration.
> +
> +
> +Prerequisites
> +-------------
> +
> +- Platform with IOMMU feature. IFC VF needs address translation service to
> +  Rx/Tx directly with virtio driver in VM.
> +
> +
> +Limitations
> +-----------
> +
> +Dependency on vfio-pci
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +vDPA driver needs to setup VF MSIX interrupts, each queue's interrupt vector
> +is mapped to a callfd associated with a virtio ring. Currently only vfio-pci
> +allows multiple interrupts, so the IFCVF driver is dependent on vfio-pci.
> +
> +Live Migration with VIRTIO_NET_F_GUEST_ANNOUNCE
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +IFC VF doesn't support RARP packet generation, virtio frontend supporting
> +VIRTIO_NET_F_GUEST_ANNOUNCE feature can help to do that.
> diff --git a/doc/guides/vdpadevs/index.rst b/doc/guides/vdpadevs/index.rst
> index 89e2b03..6cf0827 100644
> --- a/doc/guides/vdpadevs/index.rst
> +++ b/doc/guides/vdpadevs/index.rst
> @@ -12,3 +12,4 @@ which can be used from an application through vhost API.
>      :numbered:
>
>      features_overview
> +    ifc
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index cee3036..cca3c44 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -71,9 +71,6 @@ endif # $(CONFIG_RTE_LIBRTE_SCHED)
>
>  ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
> -ifeq ($(CONFIG_RTE_EAL_VFIO),y)
> -DIRS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifc
> -endif
>  endif # $(CONFIG_RTE_LIBRTE_VHOST)
>
>  ifeq ($(CONFIG_RTE_LIBRTE_MVPP2_PMD),y)
> diff --git a/drivers/net/ifc/Makefile b/drivers/net/ifc/Makefile
> deleted file mode 100644
> index fe227b8..0000000
> --- a/drivers/net/ifc/Makefile
> +++ /dev/null
> @@ -1,34 +0,0 @@
> -# SPDX-License-Identifier: BSD-3-Clause
> -# Copyright(c) 2018 Intel Corporation
> -
> -include $(RTE_SDK)/mk/rte.vars.mk
> -
> -#
> -# library name
> -#
> -LIB = librte_pmd_ifc.a
> -
> -LDLIBS += -lpthread
> -LDLIBS += -lrte_eal -lrte_pci -lrte_vhost -lrte_bus_pci
> -LDLIBS += -lrte_kvargs
> -
> -CFLAGS += -O3
> -CFLAGS += $(WERROR_FLAGS)
> -CFLAGS += -DALLOW_EXPERIMENTAL_API
> -
> -#
> -# Add extra flags for base driver source files to disable warnings in them
> -#
> -BASE_DRIVER_OBJS=$(sort $(patsubst %.c,%.o,$(notdir $(wildcard $(SRCDIR)/base/*.c))))
> -
> -VPATH += $(SRCDIR)/base
> -
> -EXPORT_MAP := rte_pmd_ifc_version.map
> -
> -#
> -# all source are stored in SRCS-y
> -#
> -SRCS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifcvf_vdpa.c
> -SRCS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifcvf.c
> -
> -include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/ifc/base/ifcvf.c b/drivers/net/ifc/base/ifcvf.c
> deleted file mode 100644
> index 3c0b2df..0000000
> --- a/drivers/net/ifc/base/ifcvf.c
> +++ /dev/null
> @@ -1,329 +0,0 @@
> -/* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2018 Intel Corporation
> - */
> -
> -#include "ifcvf.h"
> -#include "ifcvf_osdep.h"
> -
> -STATIC void *
> -get_cap_addr(struct ifcvf_hw *hw, struct ifcvf_pci_cap *cap)
> -{
> -	u8 bar = cap->bar;
> -	u32 length = cap->length;
> -	u32 offset = cap->offset;
> -
> -	if (bar > IFCVF_PCI_MAX_RESOURCE - 1) {
> -		DEBUGOUT("invalid bar: %u\n", bar);
> -		return NULL;
> -	}
> -
> -	if (offset + length < offset) {
> -		DEBUGOUT("offset(%u) + length(%u) overflows\n",
> -			offset, length);
> -		return NULL;
> -	}
> -
> -	if (offset + length > hw->mem_resource[cap->bar].len) {
> -		DEBUGOUT("offset(%u) + length(%u) overflows bar length(%u)",
> -			offset, length, (u32)hw->mem_resource[cap->bar].len);
> -		return NULL;
> -	}
> -
> -	return hw->mem_resource[bar].addr + offset;
> -}
> -
> -int
> -ifcvf_init_hw(struct ifcvf_hw *hw, PCI_DEV *dev)
> -{
> -	int ret;
> -	u8 pos;
> -	struct ifcvf_pci_cap cap;
> -
> -	ret = PCI_READ_CONFIG_BYTE(dev, &pos, PCI_CAPABILITY_LIST);
> -	if (ret < 0) {
> -		DEBUGOUT("failed to read pci capability list\n");
> -		return -1;
> -	}
> -
> -	while (pos) {
> -		ret = PCI_READ_CONFIG_RANGE(dev, (u32 *)&cap,
> -				sizeof(cap), pos);
> -		if (ret < 0) {
> -			DEBUGOUT("failed to read cap at pos: %x", pos);
> -			break;
> -		}
> -
> -		if (cap.cap_vndr != PCI_CAP_ID_VNDR)
> -			goto next;
> -
> -		DEBUGOUT("cfg type: %u, bar: %u, offset: %u, "
> -				"len: %u\n", cap.cfg_type, cap.bar,
> -				cap.offset, cap.length);
> -
> -		switch (cap.cfg_type) {
> -		case IFCVF_PCI_CAP_COMMON_CFG:
> -			hw->common_cfg = get_cap_addr(hw, &cap);
> -			break;
> -		case IFCVF_PCI_CAP_NOTIFY_CFG:
> -			PCI_READ_CONFIG_DWORD(dev, &hw->notify_off_multiplier,
> -					pos + sizeof(cap));
> -			hw->notify_base = get_cap_addr(hw,
&cap); > - hw->notify_region =3D cap.bar; > - break; > - case IFCVF_PCI_CAP_ISR_CFG: > - hw->isr =3D get_cap_addr(hw, &cap); > - break; > - case IFCVF_PCI_CAP_DEVICE_CFG: > - hw->dev_cfg =3D get_cap_addr(hw, &cap); > - break; > - } > -next: > - pos =3D cap.cap_next; > - } > - > - hw->lm_cfg =3D hw->mem_resource[4].addr; > - > - if (hw->common_cfg =3D=3D NULL || hw->notify_base =3D=3D NULL || > - hw->isr =3D=3D NULL || hw->dev_cfg =3D=3D NULL) { > - DEBUGOUT("capability incomplete\n"); > - return -1; > - } > - > - DEBUGOUT("capability mapping:\ncommon cfg: %p\n" > - "notify base: %p\nisr cfg: %p\ndevice cfg: %p\n" > - "multiplier: %u\n", > - hw->common_cfg, hw->dev_cfg, > - hw->isr, hw->notify_base, > - hw->notify_off_multiplier); > - > - return 0; > -} > - > -STATIC u8 > -ifcvf_get_status(struct ifcvf_hw *hw) > -{ > - return IFCVF_READ_REG8(&hw->common_cfg->device_status); > -} > - > -STATIC void > -ifcvf_set_status(struct ifcvf_hw *hw, u8 status) > -{ > - IFCVF_WRITE_REG8(status, &hw->common_cfg->device_status); > -} > - > -STATIC void > -ifcvf_reset(struct ifcvf_hw *hw) > -{ > - ifcvf_set_status(hw, 0); > - > - /* flush status write */ > - while (ifcvf_get_status(hw)) > - msec_delay(1); > -} > - > -STATIC void > -ifcvf_add_status(struct ifcvf_hw *hw, u8 status) > -{ > - if (status !=3D 0) > - status |=3D ifcvf_get_status(hw); > - > - ifcvf_set_status(hw, status); > - ifcvf_get_status(hw); > -} > - > -u64 > -ifcvf_get_features(struct ifcvf_hw *hw) > -{ > - u32 features_lo, features_hi; > - struct ifcvf_pci_common_cfg *cfg =3D hw->common_cfg; > - > - IFCVF_WRITE_REG32(0, &cfg->device_feature_select); > - features_lo =3D IFCVF_READ_REG32(&cfg->device_feature); > - > - IFCVF_WRITE_REG32(1, &cfg->device_feature_select); > - features_hi =3D IFCVF_READ_REG32(&cfg->device_feature); > - > - return ((u64)features_hi << 32) | features_lo; > -} > - > -STATIC void > -ifcvf_set_features(struct ifcvf_hw *hw, u64 features) > -{ > - struct ifcvf_pci_common_cfg *cfg =3D hw->common_cfg; > - > - IFCVF_WRITE_REG32(0, &cfg->guest_feature_select); > - IFCVF_WRITE_REG32(features & ((1ULL << 32) - 1), &cfg- > >guest_feature); > - > - IFCVF_WRITE_REG32(1, &cfg->guest_feature_select); > - IFCVF_WRITE_REG32(features >> 32, &cfg->guest_feature); > -} > - > -STATIC int > -ifcvf_config_features(struct ifcvf_hw *hw) > -{ > - u64 host_features; > - > - host_features =3D ifcvf_get_features(hw); > - hw->req_features &=3D host_features; > - > - ifcvf_set_features(hw, hw->req_features); > - ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_FEATURES_OK); > - > - if (!(ifcvf_get_status(hw) & > IFCVF_CONFIG_STATUS_FEATURES_OK)) { > - DEBUGOUT("failed to set FEATURES_OK status\n"); > - return -1; > - } > - > - return 0; > -} > - > -STATIC void > -io_write64_twopart(u64 val, u32 *lo, u32 *hi) > -{ > - IFCVF_WRITE_REG32(val & ((1ULL << 32) - 1), lo); > - IFCVF_WRITE_REG32(val >> 32, hi); > -} > - > -STATIC int > -ifcvf_hw_enable(struct ifcvf_hw *hw) > -{ > - struct ifcvf_pci_common_cfg *cfg; > - u8 *lm_cfg; > - u32 i; > - u16 notify_off; > - > - cfg =3D hw->common_cfg; > - lm_cfg =3D hw->lm_cfg; > - > - IFCVF_WRITE_REG16(0, &cfg->msix_config); > - if (IFCVF_READ_REG16(&cfg->msix_config) =3D=3D > IFCVF_MSI_NO_VECTOR) { > - DEBUGOUT("msix vec alloc failed for device config\n"); > - return -1; > - } > - > - for (i =3D 0; i < hw->nr_vring; i++) { > - IFCVF_WRITE_REG16(i, &cfg->queue_select); > - io_write64_twopart(hw->vring[i].desc, &cfg- > >queue_desc_lo, > - &cfg->queue_desc_hi); > - io_write64_twopart(hw->vring[i].avail, &cfg- > 
>queue_avail_lo, > - &cfg->queue_avail_hi); > - io_write64_twopart(hw->vring[i].used, &cfg- > >queue_used_lo, > - &cfg->queue_used_hi); > - IFCVF_WRITE_REG16(hw->vring[i].size, &cfg->queue_size); > - > - *(u32 *)(lm_cfg + IFCVF_LM_RING_STATE_OFFSET + > - (i / 2) * IFCVF_LM_CFG_SIZE + (i % 2) * 4) =3D > - (u32)hw->vring[i].last_avail_idx | > - ((u32)hw->vring[i].last_used_idx << 16); > - > - IFCVF_WRITE_REG16(i + 1, &cfg->queue_msix_vector); > - if (IFCVF_READ_REG16(&cfg->queue_msix_vector) =3D=3D > - IFCVF_MSI_NO_VECTOR) { > - DEBUGOUT("queue %u, msix vec alloc failed\n", > - i); > - return -1; > - } > - > - notify_off =3D IFCVF_READ_REG16(&cfg->queue_notify_off); > - hw->notify_addr[i] =3D (void *)((u8 *)hw->notify_base + > - notify_off * hw->notify_off_multiplier); > - IFCVF_WRITE_REG16(1, &cfg->queue_enable); > - } > - > - return 0; > -} > - > -STATIC void > -ifcvf_hw_disable(struct ifcvf_hw *hw) > -{ > - u32 i; > - struct ifcvf_pci_common_cfg *cfg; > - u32 ring_state; > - > - cfg =3D hw->common_cfg; > - > - IFCVF_WRITE_REG16(IFCVF_MSI_NO_VECTOR, &cfg->msix_config); > - for (i =3D 0; i < hw->nr_vring; i++) { > - IFCVF_WRITE_REG16(i, &cfg->queue_select); > - IFCVF_WRITE_REG16(0, &cfg->queue_enable); > - IFCVF_WRITE_REG16(IFCVF_MSI_NO_VECTOR, &cfg- > >queue_msix_vector); > - ring_state =3D *(u32 *)(hw->lm_cfg + > IFCVF_LM_RING_STATE_OFFSET + > - (i / 2) * IFCVF_LM_CFG_SIZE + (i % 2) * 4); > - hw->vring[i].last_avail_idx =3D (u16)(ring_state >> 16); > - hw->vring[i].last_used_idx =3D (u16)(ring_state >> 16); > - } > -} > - > -int > -ifcvf_start_hw(struct ifcvf_hw *hw) > -{ > - ifcvf_reset(hw); > - ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_ACK); > - ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_DRIVER); > - > - if (ifcvf_config_features(hw) < 0) > - return -1; > - > - if (ifcvf_hw_enable(hw) < 0) > - return -1; > - > - ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_DRIVER_OK); > - return 0; > -} > - > -void > -ifcvf_stop_hw(struct ifcvf_hw *hw) > -{ > - ifcvf_hw_disable(hw); > - ifcvf_reset(hw); > -} > - > -void > -ifcvf_enable_logging(struct ifcvf_hw *hw, u64 log_base, u64 log_size) > -{ > - u8 *lm_cfg; > - > - lm_cfg =3D hw->lm_cfg; > - > - *(u32 *)(lm_cfg + IFCVF_LM_BASE_ADDR_LOW) =3D > - log_base & IFCVF_32_BIT_MASK; > - > - *(u32 *)(lm_cfg + IFCVF_LM_BASE_ADDR_HIGH) =3D > - (log_base >> 32) & IFCVF_32_BIT_MASK; > - > - *(u32 *)(lm_cfg + IFCVF_LM_END_ADDR_LOW) =3D > - (log_base + log_size) & IFCVF_32_BIT_MASK; > - > - *(u32 *)(lm_cfg + IFCVF_LM_END_ADDR_HIGH) =3D > - ((log_base + log_size) >> 32) & IFCVF_32_BIT_MASK; > - > - *(u32 *)(lm_cfg + IFCVF_LM_LOGGING_CTRL) =3D > IFCVF_LM_ENABLE_VF; > -} > - > -void > -ifcvf_disable_logging(struct ifcvf_hw *hw) > -{ > - u8 *lm_cfg; > - > - lm_cfg =3D hw->lm_cfg; > - *(u32 *)(lm_cfg + IFCVF_LM_LOGGING_CTRL) =3D IFCVF_LM_DISABLE; > -} > - > -void > -ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid) > -{ > - IFCVF_WRITE_REG16(qid, hw->notify_addr[qid]); > -} > - > -u8 > -ifcvf_get_notify_region(struct ifcvf_hw *hw) > -{ > - return hw->notify_region; > -} > - > -u64 > -ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid) > -{ > - return (u8 *)hw->notify_addr[qid] - > - (u8 *)hw->mem_resource[hw->notify_region].addr; > -} > diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h > deleted file mode 100644 > index 9be2770..0000000 > --- a/drivers/net/ifc/base/ifcvf.h > +++ /dev/null > @@ -1,162 +0,0 @@ > -/* SPDX-License-Identifier: BSD-3-Clause > - * Copyright(c) 2018 Intel Corporation > - */ > - > -#ifndef _IFCVF_H_ > -#define _IFCVF_H_ > - 
> -#include "ifcvf_osdep.h" > - > -#define IFCVF_VENDOR_ID 0x1AF4 > -#define IFCVF_DEVICE_ID 0x1041 > -#define IFCVF_SUBSYS_VENDOR_ID 0x8086 > -#define IFCVF_SUBSYS_DEVICE_ID 0x001A > - > -#define IFCVF_MAX_QUEUES 1 > -#define VIRTIO_F_IOMMU_PLATFORM 33 > - > -/* Common configuration */ > -#define IFCVF_PCI_CAP_COMMON_CFG 1 > -/* Notifications */ > -#define IFCVF_PCI_CAP_NOTIFY_CFG 2 > -/* ISR Status */ > -#define IFCVF_PCI_CAP_ISR_CFG 3 > -/* Device specific configuration */ > -#define IFCVF_PCI_CAP_DEVICE_CFG 4 > -/* PCI configuration access */ > -#define IFCVF_PCI_CAP_PCI_CFG 5 > - > -#define IFCVF_CONFIG_STATUS_RESET 0x00 > -#define IFCVF_CONFIG_STATUS_ACK 0x01 > -#define IFCVF_CONFIG_STATUS_DRIVER 0x02 > -#define IFCVF_CONFIG_STATUS_DRIVER_OK 0x04 > -#define IFCVF_CONFIG_STATUS_FEATURES_OK 0x08 > -#define IFCVF_CONFIG_STATUS_FAILED 0x80 > - > -#define IFCVF_MSI_NO_VECTOR 0xffff > -#define IFCVF_PCI_MAX_RESOURCE 6 > - > -#define IFCVF_LM_CFG_SIZE 0x40 > -#define IFCVF_LM_RING_STATE_OFFSET 0x20 > - > -#define IFCVF_LM_LOGGING_CTRL 0x0 > - > -#define IFCVF_LM_BASE_ADDR_LOW 0x10 > -#define IFCVF_LM_BASE_ADDR_HIGH 0x14 > -#define IFCVF_LM_END_ADDR_LOW 0x18 > -#define IFCVF_LM_END_ADDR_HIGH 0x1c > - > -#define IFCVF_LM_DISABLE 0x0 > -#define IFCVF_LM_ENABLE_VF 0x1 > -#define IFCVF_LM_ENABLE_PF 0x3 > -#define IFCVF_LOG_BASE 0x100000000000 > -#define IFCVF_MEDIATED_VRING 0x200000000000 > - > -#define IFCVF_32_BIT_MASK 0xffffffff > - > - > -struct ifcvf_pci_cap { > - u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */ > - u8 cap_next; /* Generic PCI field: next ptr. */ > - u8 cap_len; /* Generic PCI field: capability length */ > - u8 cfg_type; /* Identifies the structure. */ > - u8 bar; /* Where to find it. */ > - u8 padding[3]; /* Pad to full dword. */ > - u32 offset; /* Offset within bar. */ > - u32 length; /* Length of the structure, in bytes. */ > -}; > - > -struct ifcvf_pci_notify_cap { > - struct ifcvf_pci_cap cap; > - u32 notify_off_multiplier; /* Multiplier for queue_notify_off. */ > -}; > - > -struct ifcvf_pci_common_cfg { > - /* About the whole device. */ > - u32 device_feature_select; > - u32 device_feature; > - u32 guest_feature_select; > - u32 guest_feature; > - u16 msix_config; > - u16 num_queues; > - u8 device_status; > - u8 config_generation; > - > - /* About a specific virtqueue. */ > - u16 queue_select; > - u16 queue_size; > - u16 queue_msix_vector; > - u16 queue_enable; > - u16 queue_notify_off; > - u32 queue_desc_lo; > - u32 queue_desc_hi; > - u32 queue_avail_lo; > - u32 queue_avail_hi; > - u32 queue_used_lo; > - u32 queue_used_hi; > -}; > - > -struct ifcvf_net_config { > - u8 mac[6]; > - u16 status; > - u16 max_virtqueue_pairs; > -} __attribute__((packed)); > - > -struct ifcvf_pci_mem_resource { > - u64 phys_addr; /**< Physical address, 0 if not resource. */ > - u64 len; /**< Length of the resource. */ > - u8 *addr; /**< Virtual address, NULL when not mapped. 
*/ > -}; > - > -struct vring_info { > - u64 desc; > - u64 avail; > - u64 used; > - u16 size; > - u16 last_avail_idx; > - u16 last_used_idx; > -}; > - > -struct ifcvf_hw { > - u64 req_features; > - u8 notify_region; > - u32 notify_off_multiplier; > - struct ifcvf_pci_common_cfg *common_cfg; > - struct ifcvf_net_config *dev_cfg; > - u8 *isr; > - u16 *notify_base; > - u16 *notify_addr[IFCVF_MAX_QUEUES * 2]; > - u8 *lm_cfg; > - struct vring_info vring[IFCVF_MAX_QUEUES * 2]; > - u8 nr_vring; > - struct ifcvf_pci_mem_resource > mem_resource[IFCVF_PCI_MAX_RESOURCE]; > -}; > - > -int > -ifcvf_init_hw(struct ifcvf_hw *hw, PCI_DEV *dev); > - > -u64 > -ifcvf_get_features(struct ifcvf_hw *hw); > - > -int > -ifcvf_start_hw(struct ifcvf_hw *hw); > - > -void > -ifcvf_stop_hw(struct ifcvf_hw *hw); > - > -void > -ifcvf_enable_logging(struct ifcvf_hw *hw, u64 log_base, u64 log_size); > - > -void > -ifcvf_disable_logging(struct ifcvf_hw *hw); > - > -void > -ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid); > - > -u8 > -ifcvf_get_notify_region(struct ifcvf_hw *hw); > - > -u64 > -ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid); > - > -#endif /* _IFCVF_H_ */ > diff --git a/drivers/net/ifc/base/ifcvf_osdep.h > b/drivers/net/ifc/base/ifcvf_osdep.h > deleted file mode 100644 > index 6aef25e..0000000 > --- a/drivers/net/ifc/base/ifcvf_osdep.h > +++ /dev/null > @@ -1,52 +0,0 @@ > -/* SPDX-License-Identifier: BSD-3-Clause > - * Copyright(c) 2018 Intel Corporation > - */ > - > -#ifndef _IFCVF_OSDEP_H_ > -#define _IFCVF_OSDEP_H_ > - > -#include > -#include > - > -#include > -#include > -#include > -#include > -#include > - > -#define DEBUGOUT(S, args...) RTE_LOG(DEBUG, PMD, S, ##args) > -#define STATIC static > - > -#define msec_delay(x) rte_delay_us_sleep(1000 * (x)) > - > -#define IFCVF_READ_REG8(reg) rte_read8(reg) > -#define IFCVF_WRITE_REG8(val, reg) rte_write8((val), (reg)) > -#define IFCVF_READ_REG16(reg) rte_read16(reg) > -#define IFCVF_WRITE_REG16(val, reg) rte_write16((val), (reg)) > -#define IFCVF_READ_REG32(reg) rte_read32(reg) > -#define IFCVF_WRITE_REG32(val, reg) rte_write32((val), (reg)) > - > -typedef struct rte_pci_device PCI_DEV; > - > -#define PCI_READ_CONFIG_BYTE(dev, val, where) \ > - rte_pci_read_config(dev, val, 1, where) > - > -#define PCI_READ_CONFIG_DWORD(dev, val, where) \ > - rte_pci_read_config(dev, val, 4, where) > - > -typedef uint8_t u8; > -typedef int8_t s8; > -typedef uint16_t u16; > -typedef int16_t s16; > -typedef uint32_t u32; > -typedef int32_t s32; > -typedef int64_t s64; > -typedef uint64_t u64; > - > -static inline int > -PCI_READ_CONFIG_RANGE(PCI_DEV *dev, uint32_t *val, int size, int > where) > -{ > - return rte_pci_read_config(dev, val, size, where); > -} > - > -#endif /* _IFCVF_OSDEP_H_ */ > diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c > deleted file mode 100644 > index da4667b..0000000 > --- a/drivers/net/ifc/ifcvf_vdpa.c > +++ /dev/null > @@ -1,1280 +0,0 @@ > -/* SPDX-License-Identifier: BSD-3-Clause > - * Copyright(c) 2018 Intel Corporation > - */ > - > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > - > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > - > -#include "base/ifcvf.h" > - > -#define DRV_LOG(level, fmt, args...) 
\ > - rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \ > - "IFCVF %s(): " fmt "\n", __func__, ##args) > - > -#ifndef PAGE_SIZE > -#define PAGE_SIZE 4096 > -#endif > - > -#define IFCVF_USED_RING_LEN(size) \ > - ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) > - > -#define IFCVF_VDPA_MODE "vdpa" > -#define IFCVF_SW_FALLBACK_LM "sw-live-migration" > - > -static const char * const ifcvf_valid_arguments[] =3D { > - IFCVF_VDPA_MODE, > - IFCVF_SW_FALLBACK_LM, > - NULL > -}; > - > -static int ifcvf_vdpa_logtype; > - > -struct ifcvf_internal { > - struct rte_vdpa_dev_addr dev_addr; > - struct rte_pci_device *pdev; > - struct ifcvf_hw hw; > - int vfio_container_fd; > - int vfio_group_fd; > - int vfio_dev_fd; > - pthread_t tid; /* thread for notify relay */ > - int epfd; > - int vid; > - int did; > - uint16_t max_queues; > - uint64_t features; > - rte_atomic32_t started; > - rte_atomic32_t dev_attached; > - rte_atomic32_t running; > - rte_spinlock_t lock; > - bool sw_lm; > - bool sw_fallback_running; > - /* mediated vring for sw fallback */ > - struct vring m_vring[IFCVF_MAX_QUEUES * 2]; > - /* eventfd for used ring interrupt */ > - int intr_fd[IFCVF_MAX_QUEUES * 2]; > -}; > - > -struct internal_list { > - TAILQ_ENTRY(internal_list) next; > - struct ifcvf_internal *internal; > -}; > - > -TAILQ_HEAD(internal_list_head, internal_list); > -static struct internal_list_head internal_list =3D > - TAILQ_HEAD_INITIALIZER(internal_list); > - > -static pthread_mutex_t internal_list_lock =3D PTHREAD_MUTEX_INITIALIZER; > - > -static void update_used_ring(struct ifcvf_internal *internal, uint16_t q= id); > - > -static struct internal_list * > -find_internal_resource_by_did(int did) > -{ > - int found =3D 0; > - struct internal_list *list; > - > - pthread_mutex_lock(&internal_list_lock); > - > - TAILQ_FOREACH(list, &internal_list, next) { > - if (did =3D=3D list->internal->did) { > - found =3D 1; > - break; > - } > - } > - > - pthread_mutex_unlock(&internal_list_lock); > - > - if (!found) > - return NULL; > - > - return list; > -} > - > -static struct internal_list * > -find_internal_resource_by_dev(struct rte_pci_device *pdev) > -{ > - int found =3D 0; > - struct internal_list *list; > - > - pthread_mutex_lock(&internal_list_lock); > - > - TAILQ_FOREACH(list, &internal_list, next) { > - if (pdev =3D=3D list->internal->pdev) { > - found =3D 1; > - break; > - } > - } > - > - pthread_mutex_unlock(&internal_list_lock); > - > - if (!found) > - return NULL; > - > - return list; > -} > - > -static int > -ifcvf_vfio_setup(struct ifcvf_internal *internal) > -{ > - struct rte_pci_device *dev =3D internal->pdev; > - char devname[RTE_DEV_NAME_MAX_LEN] =3D {0}; > - int iommu_group_num; > - int i, ret; > - > - internal->vfio_dev_fd =3D -1; > - internal->vfio_group_fd =3D -1; > - internal->vfio_container_fd =3D -1; > - > - rte_pci_device_name(&dev->addr, devname, > RTE_DEV_NAME_MAX_LEN); > - ret =3D rte_vfio_get_group_num(rte_pci_get_sysfs_path(), devname, > - &iommu_group_num); > - if (ret <=3D 0) { > - DRV_LOG(ERR, "%s failed to get IOMMU group", devname); > - return -1; > - } > - > - internal->vfio_container_fd =3D rte_vfio_container_create(); > - if (internal->vfio_container_fd < 0) > - return -1; > - > - internal->vfio_group_fd =3D rte_vfio_container_group_bind( > - internal->vfio_container_fd, iommu_group_num); > - if (internal->vfio_group_fd < 0) > - goto err; > - > - if (rte_pci_map_device(dev)) > - goto err; > - > - internal->vfio_dev_fd =3D dev->intr_handle.vfio_dev_fd; > - > - for (i =3D 0; i < 
RTE_MIN(PCI_MAX_RESOURCE, > IFCVF_PCI_MAX_RESOURCE); > - i++) { > - internal->hw.mem_resource[i].addr =3D > - internal->pdev->mem_resource[i].addr; > - internal->hw.mem_resource[i].phys_addr =3D > - internal->pdev->mem_resource[i].phys_addr; > - internal->hw.mem_resource[i].len =3D > - internal->pdev->mem_resource[i].len; > - } > - > - return 0; > - > -err: > - rte_vfio_container_destroy(internal->vfio_container_fd); > - return -1; > -} > - > -static int > -ifcvf_dma_map(struct ifcvf_internal *internal, int do_map) > -{ > - uint32_t i; > - int ret; > - struct rte_vhost_memory *mem =3D NULL; > - int vfio_container_fd; > - > - ret =3D rte_vhost_get_mem_table(internal->vid, &mem); > - if (ret < 0) { > - DRV_LOG(ERR, "failed to get VM memory layout."); > - goto exit; > - } > - > - vfio_container_fd =3D internal->vfio_container_fd; > - > - for (i =3D 0; i < mem->nregions; i++) { > - struct rte_vhost_mem_region *reg; > - > - reg =3D &mem->regions[i]; > - DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", " > - "GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".", > - do_map ? "DMA map" : "DMA unmap", i, > - reg->host_user_addr, reg->guest_phys_addr, reg- > >size); > - > - if (do_map) { > - ret =3D > rte_vfio_container_dma_map(vfio_container_fd, > - reg->host_user_addr, reg- > >guest_phys_addr, > - reg->size); > - if (ret < 0) { > - DRV_LOG(ERR, "DMA map failed."); > - goto exit; > - } > - } else { > - ret =3D > rte_vfio_container_dma_unmap(vfio_container_fd, > - reg->host_user_addr, reg- > >guest_phys_addr, > - reg->size); > - if (ret < 0) { > - DRV_LOG(ERR, "DMA unmap failed."); > - goto exit; > - } > - } > - } > - > -exit: > - if (mem) > - free(mem); > - return ret; > -} > - > -static uint64_t > -hva_to_gpa(int vid, uint64_t hva) > -{ > - struct rte_vhost_memory *mem =3D NULL; > - struct rte_vhost_mem_region *reg; > - uint32_t i; > - uint64_t gpa =3D 0; > - > - if (rte_vhost_get_mem_table(vid, &mem) < 0) > - goto exit; > - > - for (i =3D 0; i < mem->nregions; i++) { > - reg =3D &mem->regions[i]; > - > - if (hva >=3D reg->host_user_addr && > - hva < reg->host_user_addr + reg->size) { > - gpa =3D hva - reg->host_user_addr + reg- > >guest_phys_addr; > - break; > - } > - } > - > -exit: > - if (mem) > - free(mem); > - return gpa; > -} > - > -static int > -vdpa_ifcvf_start(struct ifcvf_internal *internal) > -{ > - struct ifcvf_hw *hw =3D &internal->hw; > - int i, nr_vring; > - int vid; > - struct rte_vhost_vring vq; > - uint64_t gpa; > - > - vid =3D internal->vid; > - nr_vring =3D rte_vhost_get_vring_num(vid); > - rte_vhost_get_negotiated_features(vid, &hw->req_features); > - > - for (i =3D 0; i < nr_vring; i++) { > - rte_vhost_get_vhost_vring(vid, i, &vq); > - gpa =3D hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc); > - if (gpa =3D=3D 0) { > - DRV_LOG(ERR, "Fail to get GPA for descriptor ring."); > - return -1; > - } > - hw->vring[i].desc =3D gpa; > - > - gpa =3D hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.avail); > - if (gpa =3D=3D 0) { > - DRV_LOG(ERR, "Fail to get GPA for available ring."); > - return -1; > - } > - hw->vring[i].avail =3D gpa; > - > - gpa =3D hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used); > - if (gpa =3D=3D 0) { > - DRV_LOG(ERR, "Fail to get GPA for used ring."); > - return -1; > - } > - hw->vring[i].used =3D gpa; > - > - hw->vring[i].size =3D vq.size; > - rte_vhost_get_vring_base(vid, i, &hw- > >vring[i].last_avail_idx, > - &hw->vring[i].last_used_idx); > - } > - hw->nr_vring =3D i; > - > - return ifcvf_start_hw(&internal->hw); > -} > - > -static void > -vdpa_ifcvf_stop(struct ifcvf_internal 
*internal) > -{ > - struct ifcvf_hw *hw =3D &internal->hw; > - uint32_t i; > - int vid; > - uint64_t features =3D 0; > - uint64_t log_base =3D 0, log_size =3D 0; > - uint64_t len; > - > - vid =3D internal->vid; > - ifcvf_stop_hw(hw); > - > - for (i =3D 0; i < hw->nr_vring; i++) > - rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, > - hw->vring[i].last_used_idx); > - > - if (internal->sw_lm) > - return; > - > - rte_vhost_get_negotiated_features(vid, &features); > - if (RTE_VHOST_NEED_LOG(features)) { > - ifcvf_disable_logging(hw); > - rte_vhost_get_log_base(internal->vid, &log_base, > &log_size); > - rte_vfio_container_dma_unmap(internal- > >vfio_container_fd, > - log_base, IFCVF_LOG_BASE, log_size); > - /* > - * IFCVF marks dirty memory pages for only packet buffer, > - * SW helps to mark the used ring as dirty after device stops. > - */ > - for (i =3D 0; i < hw->nr_vring; i++) { > - len =3D IFCVF_USED_RING_LEN(hw->vring[i].size); > - rte_vhost_log_used_vring(vid, i, 0, len); > - } > - } > -} > - > -#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + \ > - sizeof(int) * (IFCVF_MAX_QUEUES * 2 + 1)) > -static int > -vdpa_enable_vfio_intr(struct ifcvf_internal *internal, bool m_rx) > -{ > - int ret; > - uint32_t i, nr_vring; > - char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; > - struct vfio_irq_set *irq_set; > - int *fd_ptr; > - struct rte_vhost_vring vring; > - int fd; > - > - vring.callfd =3D -1; > - > - nr_vring =3D rte_vhost_get_vring_num(internal->vid); > - > - irq_set =3D (struct vfio_irq_set *)irq_set_buf; > - irq_set->argsz =3D sizeof(irq_set_buf); > - irq_set->count =3D nr_vring + 1; > - irq_set->flags =3D VFIO_IRQ_SET_DATA_EVENTFD | > - VFIO_IRQ_SET_ACTION_TRIGGER; > - irq_set->index =3D VFIO_PCI_MSIX_IRQ_INDEX; > - irq_set->start =3D 0; > - fd_ptr =3D (int *)&irq_set->data; > - fd_ptr[RTE_INTR_VEC_ZERO_OFFSET] =3D internal->pdev- > >intr_handle.fd; > - > - for (i =3D 0; i < nr_vring; i++) > - internal->intr_fd[i] =3D -1; > - > - for (i =3D 0; i < nr_vring; i++) { > - rte_vhost_get_vhost_vring(internal->vid, i, &vring); > - fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] =3D vring.callfd; > - if ((i & 1) =3D=3D 0 && m_rx =3D=3D true) { > - fd =3D eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); > - if (fd < 0) { > - DRV_LOG(ERR, "can't setup eventfd: %s", > - strerror(errno)); > - return -1; > - } > - internal->intr_fd[i] =3D fd; > - fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] =3D fd; > - } > - } > - > - ret =3D ioctl(internal->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); > - if (ret) { > - DRV_LOG(ERR, "Error enabling MSI-X interrupts: %s", > - strerror(errno)); > - return -1; > - } > - > - return 0; > -} > - > -static int > -vdpa_disable_vfio_intr(struct ifcvf_internal *internal) > -{ > - int ret; > - uint32_t i, nr_vring; > - char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; > - struct vfio_irq_set *irq_set; > - > - irq_set =3D (struct vfio_irq_set *)irq_set_buf; > - irq_set->argsz =3D sizeof(irq_set_buf); > - irq_set->count =3D 0; > - irq_set->flags =3D VFIO_IRQ_SET_DATA_NONE | > VFIO_IRQ_SET_ACTION_TRIGGER; > - irq_set->index =3D VFIO_PCI_MSIX_IRQ_INDEX; > - irq_set->start =3D 0; > - > - nr_vring =3D rte_vhost_get_vring_num(internal->vid); > - for (i =3D 0; i < nr_vring; i++) { > - if (internal->intr_fd[i] >=3D 0) > - close(internal->intr_fd[i]); > - internal->intr_fd[i] =3D -1; > - } > - > - ret =3D ioctl(internal->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); > - if (ret) { > - DRV_LOG(ERR, "Error disabling MSI-X interrupts: %s", > - strerror(errno)); > - return -1; > - } > - > - return 0; > -} > - 
> -static void * > -notify_relay(void *arg) > -{ > - int i, kickfd, epfd, nfds =3D 0; > - uint32_t qid, q_num; > - struct epoll_event events[IFCVF_MAX_QUEUES * 2]; > - struct epoll_event ev; > - uint64_t buf; > - int nbytes; > - struct rte_vhost_vring vring; > - struct ifcvf_internal *internal =3D (struct ifcvf_internal *)arg; > - struct ifcvf_hw *hw =3D &internal->hw; > - > - q_num =3D rte_vhost_get_vring_num(internal->vid); > - > - epfd =3D epoll_create(IFCVF_MAX_QUEUES * 2); > - if (epfd < 0) { > - DRV_LOG(ERR, "failed to create epoll instance."); > - return NULL; > - } > - internal->epfd =3D epfd; > - > - vring.kickfd =3D -1; > - for (qid =3D 0; qid < q_num; qid++) { > - ev.events =3D EPOLLIN | EPOLLPRI; > - rte_vhost_get_vhost_vring(internal->vid, qid, &vring); > - ev.data.u64 =3D qid | (uint64_t)vring.kickfd << 32; > - if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) { > - DRV_LOG(ERR, "epoll add error: %s", > strerror(errno)); > - return NULL; > - } > - } > - > - for (;;) { > - nfds =3D epoll_wait(epfd, events, q_num, -1); > - if (nfds < 0) { > - if (errno =3D=3D EINTR) > - continue; > - DRV_LOG(ERR, "epoll_wait return fail\n"); > - return NULL; > - } > - > - for (i =3D 0; i < nfds; i++) { > - qid =3D events[i].data.u32; > - kickfd =3D (uint32_t)(events[i].data.u64 >> 32); > - do { > - nbytes =3D read(kickfd, &buf, 8); > - if (nbytes < 0) { > - if (errno =3D=3D EINTR || > - errno =3D=3D EWOULDBLOCK || > - errno =3D=3D EAGAIN) > - continue; > - DRV_LOG(INFO, "Error reading " > - "kickfd: %s", > - strerror(errno)); > - } > - break; > - } while (1); > - > - ifcvf_notify_queue(hw, qid); > - } > - } > - > - return NULL; > -} > - > -static int > -setup_notify_relay(struct ifcvf_internal *internal) > -{ > - int ret; > - > - ret =3D pthread_create(&internal->tid, NULL, notify_relay, > - (void *)internal); > - if (ret) { > - DRV_LOG(ERR, "failed to create notify relay pthread."); > - return -1; > - } > - return 0; > -} > - > -static int > -unset_notify_relay(struct ifcvf_internal *internal) > -{ > - void *status; > - > - if (internal->tid) { > - pthread_cancel(internal->tid); > - pthread_join(internal->tid, &status); > - } > - internal->tid =3D 0; > - > - if (internal->epfd >=3D 0) > - close(internal->epfd); > - internal->epfd =3D -1; > - > - return 0; > -} > - > -static int > -update_datapath(struct ifcvf_internal *internal) > -{ > - int ret; > - > - rte_spinlock_lock(&internal->lock); > - > - if (!rte_atomic32_read(&internal->running) && > - (rte_atomic32_read(&internal->started) && > - rte_atomic32_read(&internal->dev_attached))) { > - ret =3D ifcvf_dma_map(internal, 1); > - if (ret) > - goto err; > - > - ret =3D vdpa_enable_vfio_intr(internal, 0); > - if (ret) > - goto err; > - > - ret =3D vdpa_ifcvf_start(internal); > - if (ret) > - goto err; > - > - ret =3D setup_notify_relay(internal); > - if (ret) > - goto err; > - > - rte_atomic32_set(&internal->running, 1); > - } else if (rte_atomic32_read(&internal->running) && > - (!rte_atomic32_read(&internal->started) || > - !rte_atomic32_read(&internal->dev_attached))) { > - ret =3D unset_notify_relay(internal); > - if (ret) > - goto err; > - > - vdpa_ifcvf_stop(internal); > - > - ret =3D vdpa_disable_vfio_intr(internal); > - if (ret) > - goto err; > - > - ret =3D ifcvf_dma_map(internal, 0); > - if (ret) > - goto err; > - > - rte_atomic32_set(&internal->running, 0); > - } > - > - rte_spinlock_unlock(&internal->lock); > - return 0; > -err: > - rte_spinlock_unlock(&internal->lock); > - return ret; > -} > - > -static int > 
-m_ifcvf_start(struct ifcvf_internal *internal) > -{ > - struct ifcvf_hw *hw =3D &internal->hw; > - uint32_t i, nr_vring; > - int vid, ret; > - struct rte_vhost_vring vq; > - void *vring_buf; > - uint64_t m_vring_iova =3D IFCVF_MEDIATED_VRING; > - uint64_t size; > - uint64_t gpa; > - > - memset(&vq, 0, sizeof(vq)); > - vid =3D internal->vid; > - nr_vring =3D rte_vhost_get_vring_num(vid); > - rte_vhost_get_negotiated_features(vid, &hw->req_features); > - > - for (i =3D 0; i < nr_vring; i++) { > - rte_vhost_get_vhost_vring(vid, i, &vq); > - > - size =3D RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE), > - PAGE_SIZE); > - vring_buf =3D rte_zmalloc("ifcvf", size, PAGE_SIZE); > - vring_init(&internal->m_vring[i], vq.size, vring_buf, > - PAGE_SIZE); > - > - ret =3D rte_vfio_container_dma_map(internal- > >vfio_container_fd, > - (uint64_t)(uintptr_t)vring_buf, m_vring_iova, size); > - if (ret < 0) { > - DRV_LOG(ERR, "mediated vring DMA map failed."); > - goto error; > - } > - > - gpa =3D hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc); > - if (gpa =3D=3D 0) { > - DRV_LOG(ERR, "Fail to get GPA for descriptor ring."); > - return -1; > - } > - hw->vring[i].desc =3D gpa; > - > - gpa =3D hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.avail); > - if (gpa =3D=3D 0) { > - DRV_LOG(ERR, "Fail to get GPA for available ring."); > - return -1; > - } > - hw->vring[i].avail =3D gpa; > - > - /* Direct I/O for Tx queue, relay for Rx queue */ > - if (i & 1) { > - gpa =3D hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used); > - if (gpa =3D=3D 0) { > - DRV_LOG(ERR, "Fail to get GPA for used > ring."); > - return -1; > - } > - hw->vring[i].used =3D gpa; > - } else { > - hw->vring[i].used =3D m_vring_iova + > - (char *)internal->m_vring[i].used - > - (char *)internal->m_vring[i].desc; > - } > - > - hw->vring[i].size =3D vq.size; > - > - rte_vhost_get_vring_base(vid, i, > - &internal->m_vring[i].avail->idx, > - &internal->m_vring[i].used->idx); > - > - rte_vhost_get_vring_base(vid, i, &hw- > >vring[i].last_avail_idx, > - &hw->vring[i].last_used_idx); > - > - m_vring_iova +=3D size; > - } > - hw->nr_vring =3D nr_vring; > - > - return ifcvf_start_hw(&internal->hw); > - > -error: > - for (i =3D 0; i < nr_vring; i++) > - if (internal->m_vring[i].desc) > - rte_free(internal->m_vring[i].desc); > - > - return -1; > -} > - > -static int > -m_ifcvf_stop(struct ifcvf_internal *internal) > -{ > - int vid; > - uint32_t i; > - struct rte_vhost_vring vq; > - struct ifcvf_hw *hw =3D &internal->hw; > - uint64_t m_vring_iova =3D IFCVF_MEDIATED_VRING; > - uint64_t size, len; > - > - vid =3D internal->vid; > - ifcvf_stop_hw(hw); > - > - for (i =3D 0; i < hw->nr_vring; i++) { > - /* synchronize remaining new used entries if any */ > - if ((i & 1) =3D=3D 0) > - update_used_ring(internal, i); > - > - rte_vhost_get_vhost_vring(vid, i, &vq); > - len =3D IFCVF_USED_RING_LEN(vq.size); > - rte_vhost_log_used_vring(vid, i, 0, len); > - > - size =3D RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE), > - PAGE_SIZE); > - rte_vfio_container_dma_unmap(internal- > >vfio_container_fd, > - (uint64_t)(uintptr_t)internal->m_vring[i].desc, > - m_vring_iova, size); > - > - rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, > - hw->vring[i].last_used_idx); > - rte_free(internal->m_vring[i].desc); > - m_vring_iova +=3D size; > - } > - > - return 0; > -} > - > -static void > -update_used_ring(struct ifcvf_internal *internal, uint16_t qid) > -{ > - rte_vdpa_relay_vring_used(internal->vid, qid, &internal- > >m_vring[qid]); > - rte_vhost_vring_call(internal->vid, qid); > -} 
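(For readers of the relay code below: both relay threads multiplex guest kick fds and HW interrupt eventfds through a single epoll instance by packing a tag into epoll_event.data.u64. A sketch of that packing with hypothetical helper names, mirroring vring_relay(); notify_relay() uses the same idea without the flag bit:)

#include <stdint.h>

/* Bit 0 flags an interrupt eventfd (vs. a guest kick fd), bits 1..31
 * carry the queue id, and the fd sits in the high 32 bits so the handler
 * can drain it without a second lookup.
 */
static inline uint64_t
relay_ev_pack(uint32_t qid, int fd, int is_intr)
{
	return (is_intr ? 1u : 0u) | (qid << 1) | ((uint64_t)fd << 32);
}

static inline void
relay_ev_unpack(uint64_t data, uint32_t *qid, int *fd, int *is_intr)
{
	*is_intr = (uint32_t)data & 1;
	*qid = (uint32_t)data >> 1;
	*fd = (int)(data >> 32);
}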
> - > -static void * > -vring_relay(void *arg) > -{ > - int i, vid, epfd, fd, nfds; > - struct ifcvf_internal *internal =3D (struct ifcvf_internal *)arg; > - struct rte_vhost_vring vring; > - uint16_t qid, q_num; > - struct epoll_event events[IFCVF_MAX_QUEUES * 4]; > - struct epoll_event ev; > - int nbytes; > - uint64_t buf; > - > - vid =3D internal->vid; > - q_num =3D rte_vhost_get_vring_num(vid); > - > - /* add notify fd and interrupt fd to epoll */ > - epfd =3D epoll_create(IFCVF_MAX_QUEUES * 2); > - if (epfd < 0) { > - DRV_LOG(ERR, "failed to create epoll instance."); > - return NULL; > - } > - internal->epfd =3D epfd; > - > - vring.kickfd =3D -1; > - for (qid =3D 0; qid < q_num; qid++) { > - ev.events =3D EPOLLIN | EPOLLPRI; > - rte_vhost_get_vhost_vring(vid, qid, &vring); > - ev.data.u64 =3D qid << 1 | (uint64_t)vring.kickfd << 32; > - if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) { > - DRV_LOG(ERR, "epoll add error: %s", > strerror(errno)); > - return NULL; > - } > - } > - > - for (qid =3D 0; qid < q_num; qid +=3D 2) { > - ev.events =3D EPOLLIN | EPOLLPRI; > - /* leave a flag to mark it's for interrupt */ > - ev.data.u64 =3D 1 | qid << 1 | > - (uint64_t)internal->intr_fd[qid] << 32; > - if (epoll_ctl(epfd, EPOLL_CTL_ADD, internal->intr_fd[qid], > &ev) > - < 0) { > - DRV_LOG(ERR, "epoll add error: %s", > strerror(errno)); > - return NULL; > - } > - update_used_ring(internal, qid); > - } > - > - /* start relay with a first kick */ > - for (qid =3D 0; qid < q_num; qid++) > - ifcvf_notify_queue(&internal->hw, qid); > - > - /* listen to the events and react accordingly */ > - for (;;) { > - nfds =3D epoll_wait(epfd, events, q_num * 2, -1); > - if (nfds < 0) { > - if (errno =3D=3D EINTR) > - continue; > - DRV_LOG(ERR, "epoll_wait return fail\n"); > - return NULL; > - } > - > - for (i =3D 0; i < nfds; i++) { > - fd =3D (uint32_t)(events[i].data.u64 >> 32); > - do { > - nbytes =3D read(fd, &buf, 8); > - if (nbytes < 0) { > - if (errno =3D=3D EINTR || > - errno =3D=3D EWOULDBLOCK || > - errno =3D=3D EAGAIN) > - continue; > - DRV_LOG(INFO, "Error reading " > - "kickfd: %s", > - strerror(errno)); > - } > - break; > - } while (1); > - > - qid =3D events[i].data.u32 >> 1; > - > - if (events[i].data.u32 & 1) > - update_used_ring(internal, qid); > - else > - ifcvf_notify_queue(&internal->hw, qid); > - } > - } > - > - return NULL; > -} > - > -static int > -setup_vring_relay(struct ifcvf_internal *internal) > -{ > - int ret; > - > - ret =3D pthread_create(&internal->tid, NULL, vring_relay, > - (void *)internal); > - if (ret) { > - DRV_LOG(ERR, "failed to create ring relay pthread."); > - return -1; > - } > - return 0; > -} > - > -static int > -unset_vring_relay(struct ifcvf_internal *internal) > -{ > - void *status; > - > - if (internal->tid) { > - pthread_cancel(internal->tid); > - pthread_join(internal->tid, &status); > - } > - internal->tid =3D 0; > - > - if (internal->epfd >=3D 0) > - close(internal->epfd); > - internal->epfd =3D -1; > - > - return 0; > -} > - > -static int > -ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal) > -{ > - int ret; > - int vid =3D internal->vid; > - > - /* stop the direct IO data path */ > - unset_notify_relay(internal); > - vdpa_ifcvf_stop(internal); > - vdpa_disable_vfio_intr(internal); > - > - ret =3D rte_vhost_host_notifier_ctrl(vid, false); > - if (ret && ret !=3D -ENOTSUP) > - goto error; > - > - /* set up interrupt for interrupt relay */ > - ret =3D vdpa_enable_vfio_intr(internal, 1); > - if (ret) > - goto unmap; > - > - /* config the 
VF */ > - ret =3D m_ifcvf_start(internal); > - if (ret) > - goto unset_intr; > - > - /* set up vring relay thread */ > - ret =3D setup_vring_relay(internal); > - if (ret) > - goto stop_vf; > - > - rte_vhost_host_notifier_ctrl(vid, true); > - > - internal->sw_fallback_running =3D true; > - > - return 0; > - > -stop_vf: > - m_ifcvf_stop(internal); > -unset_intr: > - vdpa_disable_vfio_intr(internal); > -unmap: > - ifcvf_dma_map(internal, 0); > -error: > - return -1; > -} > - > -static int > -ifcvf_dev_config(int vid) > -{ > - int did; > - struct internal_list *list; > - struct ifcvf_internal *internal; > - > - did =3D rte_vhost_get_vdpa_device_id(vid); > - list =3D find_internal_resource_by_did(did); > - if (list =3D=3D NULL) { > - DRV_LOG(ERR, "Invalid device id: %d", did); > - return -1; > - } > - > - internal =3D list->internal; > - internal->vid =3D vid; > - rte_atomic32_set(&internal->dev_attached, 1); > - update_datapath(internal); > - > - if (rte_vhost_host_notifier_ctrl(vid, true) !=3D 0) > - DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did); > - > - return 0; > -} > - > -static int > -ifcvf_dev_close(int vid) > -{ > - int did; > - struct internal_list *list; > - struct ifcvf_internal *internal; > - > - did =3D rte_vhost_get_vdpa_device_id(vid); > - list =3D find_internal_resource_by_did(did); > - if (list =3D=3D NULL) { > - DRV_LOG(ERR, "Invalid device id: %d", did); > - return -1; > - } > - > - internal =3D list->internal; > - > - if (internal->sw_fallback_running) { > - /* unset ring relay */ > - unset_vring_relay(internal); > - > - /* reset VF */ > - m_ifcvf_stop(internal); > - > - /* remove interrupt setting */ > - vdpa_disable_vfio_intr(internal); > - > - /* unset DMA map for guest memory */ > - ifcvf_dma_map(internal, 0); > - > - internal->sw_fallback_running =3D false; > - } else { > - rte_atomic32_set(&internal->dev_attached, 0); > - update_datapath(internal); > - } > - > - return 0; > -} > - > -static int > -ifcvf_set_features(int vid) > -{ > - uint64_t features =3D 0; > - int did; > - struct internal_list *list; > - struct ifcvf_internal *internal; > - uint64_t log_base =3D 0, log_size =3D 0; > - > - did =3D rte_vhost_get_vdpa_device_id(vid); > - list =3D find_internal_resource_by_did(did); > - if (list =3D=3D NULL) { > - DRV_LOG(ERR, "Invalid device id: %d", did); > - return -1; > - } > - > - internal =3D list->internal; > - rte_vhost_get_negotiated_features(vid, &features); > - > - if (!RTE_VHOST_NEED_LOG(features)) > - return 0; > - > - if (internal->sw_lm) { > - ifcvf_sw_fallback_switchover(internal); > - } else { > - rte_vhost_get_log_base(vid, &log_base, &log_size); > - rte_vfio_container_dma_map(internal->vfio_container_fd, > - log_base, IFCVF_LOG_BASE, log_size); > - ifcvf_enable_logging(&internal->hw, IFCVF_LOG_BASE, > log_size); > - } > - > - return 0; > -} > - > -static int > -ifcvf_get_vfio_group_fd(int vid) > -{ > - int did; > - struct internal_list *list; > - > - did =3D rte_vhost_get_vdpa_device_id(vid); > - list =3D find_internal_resource_by_did(did); > - if (list =3D=3D NULL) { > - DRV_LOG(ERR, "Invalid device id: %d", did); > - return -1; > - } > - > - return list->internal->vfio_group_fd; > -} > - > -static int > -ifcvf_get_vfio_device_fd(int vid) > -{ > - int did; > - struct internal_list *list; > - > - did =3D rte_vhost_get_vdpa_device_id(vid); > - list =3D find_internal_resource_by_did(did); > - if (list =3D=3D NULL) { > - DRV_LOG(ERR, "Invalid device id: %d", did); > - return -1; > - } > - > - return list->internal->vfio_dev_fd; > -} > - > 
-static int > -ifcvf_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size= ) > -{ > - int did; > - struct internal_list *list; > - struct ifcvf_internal *internal; > - struct vfio_region_info reg =3D { .argsz =3D sizeof(reg) }; > - int ret; > - > - did =3D rte_vhost_get_vdpa_device_id(vid); > - list =3D find_internal_resource_by_did(did); > - if (list =3D=3D NULL) { > - DRV_LOG(ERR, "Invalid device id: %d", did); > - return -1; > - } > - > - internal =3D list->internal; > - > - reg.index =3D ifcvf_get_notify_region(&internal->hw); > - ret =3D ioctl(internal->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, > ®); > - if (ret) { > - DRV_LOG(ERR, "Get not get device region info: %s", > - strerror(errno)); > - return -1; > - } > - > - *offset =3D ifcvf_get_queue_notify_off(&internal->hw, qid) + > reg.offset; > - *size =3D 0x1000; > - > - return 0; > -} > - > -static int > -ifcvf_get_queue_num(int did, uint32_t *queue_num) > -{ > - struct internal_list *list; > - > - list =3D find_internal_resource_by_did(did); > - if (list =3D=3D NULL) { > - DRV_LOG(ERR, "Invalid device id: %d", did); > - return -1; > - } > - > - *queue_num =3D list->internal->max_queues; > - > - return 0; > -} > - > -static int > -ifcvf_get_vdpa_features(int did, uint64_t *features) > -{ > - struct internal_list *list; > - > - list =3D find_internal_resource_by_did(did); > - if (list =3D=3D NULL) { > - DRV_LOG(ERR, "Invalid device id: %d", did); > - return -1; > - } > - > - *features =3D list->internal->features; > - > - return 0; > -} > - > -#define VDPA_SUPPORTED_PROTOCOL_FEATURES \ > - (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | \ > - 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ | \ > - 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD | \ > - 1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER | \ > - 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) > -static int > -ifcvf_get_protocol_features(int did __rte_unused, uint64_t *features) > -{ > - *features =3D VDPA_SUPPORTED_PROTOCOL_FEATURES; > - return 0; > -} > - > -static struct rte_vdpa_dev_ops ifcvf_ops =3D { > - .get_queue_num =3D ifcvf_get_queue_num, > - .get_features =3D ifcvf_get_vdpa_features, > - .get_protocol_features =3D ifcvf_get_protocol_features, > - .dev_conf =3D ifcvf_dev_config, > - .dev_close =3D ifcvf_dev_close, > - .set_vring_state =3D NULL, > - .set_features =3D ifcvf_set_features, > - .migration_done =3D NULL, > - .get_vfio_group_fd =3D ifcvf_get_vfio_group_fd, > - .get_vfio_device_fd =3D ifcvf_get_vfio_device_fd, > - .get_notify_area =3D ifcvf_get_notify_area, > -}; > - > -static inline int > -open_int(const char *key __rte_unused, const char *value, void > *extra_args) > -{ > - uint16_t *n =3D extra_args; > - > - if (value =3D=3D NULL || extra_args =3D=3D NULL) > - return -EINVAL; > - > - *n =3D (uint16_t)strtoul(value, NULL, 0); > - if (*n =3D=3D USHRT_MAX && errno =3D=3D ERANGE) > - return -1; > - > - return 0; > -} > - > -static int > -ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, > - struct rte_pci_device *pci_dev) > -{ > - uint64_t features; > - struct ifcvf_internal *internal =3D NULL; > - struct internal_list *list =3D NULL; > - int vdpa_mode =3D 0; > - int sw_fallback_lm =3D 0; > - struct rte_kvargs *kvlist =3D NULL; > - int ret =3D 0; > - > - if (rte_eal_process_type() !=3D RTE_PROC_PRIMARY) > - return 0; > - > - if (!pci_dev->device.devargs) > - return 1; > - > - kvlist =3D rte_kvargs_parse(pci_dev->device.devargs->args, > - ifcvf_valid_arguments); > - if (kvlist =3D=3D NULL) > - return 1; > - > - /* probe only when vdpa mode is specified */ > 
- if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) =3D=3D 0) { > - rte_kvargs_free(kvlist); > - return 1; > - } > - > - ret =3D rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int, > - &vdpa_mode); > - if (ret < 0 || vdpa_mode =3D=3D 0) { > - rte_kvargs_free(kvlist); > - return 1; > - } > - > - list =3D rte_zmalloc("ifcvf", sizeof(*list), 0); > - if (list =3D=3D NULL) > - goto error; > - > - internal =3D rte_zmalloc("ifcvf", sizeof(*internal), 0); > - if (internal =3D=3D NULL) > - goto error; > - > - internal->pdev =3D pci_dev; > - rte_spinlock_init(&internal->lock); > - > - if (ifcvf_vfio_setup(internal) < 0) { > - DRV_LOG(ERR, "failed to setup device %s", pci_dev->name); > - goto error; > - } > - > - if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) { > - DRV_LOG(ERR, "failed to init device %s", pci_dev->name); > - goto error; > - } > - > - internal->max_queues =3D IFCVF_MAX_QUEUES; > - features =3D ifcvf_get_features(&internal->hw); > - internal->features =3D (features & > - ~(1ULL << VIRTIO_F_IOMMU_PLATFORM)) | > - (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | > - (1ULL << VIRTIO_NET_F_CTRL_VQ) | > - (1ULL << VIRTIO_NET_F_STATUS) | > - (1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | > - (1ULL << VHOST_F_LOG_ALL); > - > - internal->dev_addr.pci_addr =3D pci_dev->addr; > - internal->dev_addr.type =3D PCI_ADDR; > - list->internal =3D internal; > - > - if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) { > - ret =3D rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM, > - &open_int, &sw_fallback_lm); > - if (ret < 0) > - goto error; > - } > - internal->sw_lm =3D sw_fallback_lm; > - > - internal->did =3D rte_vdpa_register_device(&internal->dev_addr, > - &ifcvf_ops); > - if (internal->did < 0) { > - DRV_LOG(ERR, "failed to register device %s", pci_dev- > >name); > - goto error; > - } > - > - pthread_mutex_lock(&internal_list_lock); > - TAILQ_INSERT_TAIL(&internal_list, list, next); > - pthread_mutex_unlock(&internal_list_lock); > - > - rte_atomic32_set(&internal->started, 1); > - update_datapath(internal); > - > - rte_kvargs_free(kvlist); > - return 0; > - > -error: > - rte_kvargs_free(kvlist); > - rte_free(list); > - rte_free(internal); > - return -1; > -} > - > -static int > -ifcvf_pci_remove(struct rte_pci_device *pci_dev) > -{ > - struct ifcvf_internal *internal; > - struct internal_list *list; > - > - if (rte_eal_process_type() !=3D RTE_PROC_PRIMARY) > - return 0; > - > - list =3D find_internal_resource_by_dev(pci_dev); > - if (list =3D=3D NULL) { > - DRV_LOG(ERR, "Invalid device: %s", pci_dev->name); > - return -1; > - } > - > - internal =3D list->internal; > - rte_atomic32_set(&internal->started, 0); > - update_datapath(internal); > - > - rte_pci_unmap_device(internal->pdev); > - rte_vfio_container_destroy(internal->vfio_container_fd); > - rte_vdpa_unregister_device(internal->did); > - > - pthread_mutex_lock(&internal_list_lock); > - TAILQ_REMOVE(&internal_list, list, next); > - pthread_mutex_unlock(&internal_list_lock); > - > - rte_free(list); > - rte_free(internal); > - > - return 0; > -} > - > -/* > - * IFCVF has the same vendor ID and device ID as virtio net PCI > - * device, with its specific subsystem vendor ID and device ID. 
> - */ > -static const struct rte_pci_id pci_id_ifcvf_map[] =3D { > - { .class_id =3D RTE_CLASS_ANY_ID, > - .vendor_id =3D IFCVF_VENDOR_ID, > - .device_id =3D IFCVF_DEVICE_ID, > - .subsystem_vendor_id =3D IFCVF_SUBSYS_VENDOR_ID, > - .subsystem_device_id =3D IFCVF_SUBSYS_DEVICE_ID, > - }, > - > - { .vendor_id =3D 0, /* sentinel */ > - }, > -}; > - > -static struct rte_pci_driver rte_ifcvf_vdpa =3D { > - .id_table =3D pci_id_ifcvf_map, > - .drv_flags =3D 0, > - .probe =3D ifcvf_pci_probe, > - .remove =3D ifcvf_pci_remove, > -}; > - > -RTE_PMD_REGISTER_PCI(net_ifcvf, rte_ifcvf_vdpa); > -RTE_PMD_REGISTER_PCI_TABLE(net_ifcvf, pci_id_ifcvf_map); > -RTE_PMD_REGISTER_KMOD_DEP(net_ifcvf, "* vfio-pci"); > - > -RTE_INIT(ifcvf_vdpa_init_log) > -{ > - ifcvf_vdpa_logtype =3D rte_log_register("pmd.net.ifcvf_vdpa"); > - if (ifcvf_vdpa_logtype >=3D 0) > - rte_log_set_level(ifcvf_vdpa_logtype, RTE_LOG_NOTICE); > -} > diff --git a/drivers/net/ifc/meson.build b/drivers/net/ifc/meson.build > deleted file mode 100644 > index adc9ed9..0000000 > --- a/drivers/net/ifc/meson.build > +++ /dev/null > @@ -1,9 +0,0 @@ > -# SPDX-License-Identifier: BSD-3-Clause > -# Copyright(c) 2018 Intel Corporation > - > -build =3D dpdk_conf.has('RTE_LIBRTE_VHOST') > -reason =3D 'missing dependency, DPDK vhost library' > -allow_experimental_apis =3D true > -sources =3D files('ifcvf_vdpa.c', 'base/ifcvf.c') > -includes +=3D include_directories('base') > -deps +=3D 'vhost' > diff --git a/drivers/net/ifc/rte_pmd_ifc_version.map > b/drivers/net/ifc/rte_pmd_ifc_version.map > deleted file mode 100644 > index f9f17e4..0000000 > --- a/drivers/net/ifc/rte_pmd_ifc_version.map > +++ /dev/null > @@ -1,3 +0,0 @@ > -DPDK_20.0 { > - local: *; > -}; > diff --git a/drivers/net/meson.build b/drivers/net/meson.build > index c300afb..b0ea8fe 100644 > --- a/drivers/net/meson.build > +++ b/drivers/net/meson.build > @@ -21,7 +21,6 @@ drivers =3D ['af_packet', > 'hns3', > 'iavf', > 'ice', > - 'ifc', > 'ipn3ke', > 'ixgbe', > 'kni', > diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile > index 82a2b70..27fec96 100644 > --- a/drivers/vdpa/Makefile > +++ b/drivers/vdpa/Makefile > @@ -5,4 +5,10 @@ include $(RTE_SDK)/mk/rte.vars.mk >=20 > # DIRS-$() +=3D >=20 > +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y) > +ifeq ($(CONFIG_RTE_EAL_VFIO),y) > +DIRS-$(CONFIG_RTE_LIBRTE_IFC_PMD) +=3D ifc > +endif > +endif # $(CONFIG_RTE_LIBRTE_VHOST) > + > include $(RTE_SDK)/mk/rte.subdir.mk > diff --git a/drivers/vdpa/ifc/Makefile b/drivers/vdpa/ifc/Makefile > new file mode 100644 > index 0000000..fe227b8 > --- /dev/null > +++ b/drivers/vdpa/ifc/Makefile > @@ -0,0 +1,34 @@ > +# SPDX-License-Identifier: BSD-3-Clause > +# Copyright(c) 2018 Intel Corporation > + > +include $(RTE_SDK)/mk/rte.vars.mk > + > +# > +# library name > +# > +LIB =3D librte_pmd_ifc.a > + > +LDLIBS +=3D -lpthread > +LDLIBS +=3D -lrte_eal -lrte_pci -lrte_vhost -lrte_bus_pci > +LDLIBS +=3D -lrte_kvargs > + > +CFLAGS +=3D -O3 > +CFLAGS +=3D $(WERROR_FLAGS) > +CFLAGS +=3D -DALLOW_EXPERIMENTAL_API > + > +# > +# Add extra flags for base driver source files to disable warnings in th= em > +# > +BASE_DRIVER_OBJS=3D$(sort $(patsubst %.c,%.o,$(notdir $(wildcard > $(SRCDIR)/base/*.c)))) > + > +VPATH +=3D $(SRCDIR)/base > + > +EXPORT_MAP :=3D rte_pmd_ifc_version.map > + > +# > +# all source are stored in SRCS-y > +# > +SRCS-$(CONFIG_RTE_LIBRTE_IFC_PMD) +=3D ifcvf_vdpa.c > +SRCS-$(CONFIG_RTE_LIBRTE_IFC_PMD) +=3D ifcvf.c > + > +include $(RTE_SDK)/mk/rte.lib.mk > diff --git a/drivers/vdpa/ifc/base/ifcvf.c 
b/drivers/vdpa/ifc/base/ifcvf.c
> new file mode 100644
> index 0000000..3c0b2df
> --- /dev/null
> +++ b/drivers/vdpa/ifc/base/ifcvf.c
> @@ -0,0 +1,329 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include "ifcvf.h"
> +#include "ifcvf_osdep.h"
> +
> +STATIC void *
> +get_cap_addr(struct ifcvf_hw *hw, struct ifcvf_pci_cap *cap)
> +{
> +	u8 bar = cap->bar;
> +	u32 length = cap->length;
> +	u32 offset = cap->offset;
> +
> +	if (bar > IFCVF_PCI_MAX_RESOURCE - 1) {
> +		DEBUGOUT("invalid bar: %u\n", bar);
> +		return NULL;
> +	}
> +
> +	if (offset + length < offset) {
> +		DEBUGOUT("offset(%u) + length(%u) overflows\n",
> +			offset, length);
> +		return NULL;
> +	}
> +
> +	if (offset + length > hw->mem_resource[cap->bar].len) {
> +		DEBUGOUT("offset(%u) + length(%u) overflows bar length(%u)",
> +			offset, length, (u32)hw->mem_resource[cap->bar].len);
> +		return NULL;
> +	}
> +
> +	return hw->mem_resource[bar].addr + offset;
> +}
> +
> +int
> +ifcvf_init_hw(struct ifcvf_hw *hw, PCI_DEV *dev)
> +{
> +	int ret;
> +	u8 pos;
> +	struct ifcvf_pci_cap cap;
> +
> +	ret = PCI_READ_CONFIG_BYTE(dev, &pos, PCI_CAPABILITY_LIST);
> +	if (ret < 0) {
> +		DEBUGOUT("failed to read pci capability list\n");
> +		return -1;
> +	}
> +
> +	while (pos) {
> +		ret = PCI_READ_CONFIG_RANGE(dev, (u32 *)&cap,
> +				sizeof(cap), pos);
> +		if (ret < 0) {
> +			DEBUGOUT("failed to read cap at pos: %x", pos);
> +			break;
> +		}
> +
> +		if (cap.cap_vndr != PCI_CAP_ID_VNDR)
> +			goto next;
> +
> +		DEBUGOUT("cfg type: %u, bar: %u, offset: %u, "
> +				"len: %u\n", cap.cfg_type, cap.bar,
> +				cap.offset, cap.length);
> +
> +		switch (cap.cfg_type) {
> +		case IFCVF_PCI_CAP_COMMON_CFG:
> +			hw->common_cfg = get_cap_addr(hw, &cap);
> +			break;
> +		case IFCVF_PCI_CAP_NOTIFY_CFG:
> +			PCI_READ_CONFIG_DWORD(dev, &hw->notify_off_multiplier,
> +					pos + sizeof(cap));
> +			hw->notify_base = get_cap_addr(hw, &cap);
> +			hw->notify_region = cap.bar;
> +			break;
> +		case IFCVF_PCI_CAP_ISR_CFG:
> +			hw->isr = get_cap_addr(hw, &cap);
> +			break;
> +		case IFCVF_PCI_CAP_DEVICE_CFG:
> +			hw->dev_cfg = get_cap_addr(hw, &cap);
> +			break;
> +		}
> +next:
> +		pos = cap.cap_next;
> +	}
> +
> +	hw->lm_cfg = hw->mem_resource[4].addr;
> +
> +	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
> +			hw->isr == NULL || hw->dev_cfg == NULL) {
> +		DEBUGOUT("capability incomplete\n");
> +		return -1;
> +	}
> +
> +	DEBUGOUT("capability mapping:\ncommon cfg: %p\n"
> +			"notify base: %p\nisr cfg: %p\ndevice cfg: %p\n"
> +			"multiplier: %u\n",
> +			hw->common_cfg, hw->notify_base,
> +			hw->isr, hw->dev_cfg,
> +			hw->notify_off_multiplier);
> +
> +	return 0;
> +}
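
A note for readers new to this base code: ifcvf_init_hw() above is the
standard virtio-1.0 vendor-capability walk. The generic pattern, shown here
as a stand-alone sketch (helper names are mine, not from the patch), is to
start at the PCI capability pointer and follow the next-pointer chain:

#include <stdint.h>
#include <rte_bus_pci.h>

/* Sketch: visit every PCI capability of a device; cb() gets the capability
 * id and its offset in config space. 0x34 is the standard capability-list
 * pointer (PCI_CAPABILITY_LIST). */
static void
walk_pci_caps(struct rte_pci_device *dev, void (*cb)(uint8_t id, uint8_t pos))
{
	uint8_t pos, hdr[2]; /* hdr[0] = cap id, hdr[1] = next pointer */

	if (rte_pci_read_config(dev, &pos, 1, 0x34) < 0)
		return;
	while (pos) {
		if (rte_pci_read_config(dev, hdr, 2, pos) < 0)
			return;
		cb(hdr[0], pos);
		pos = hdr[1];
	}
}
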
> +
> +STATIC u8
> +ifcvf_get_status(struct ifcvf_hw *hw)
> +{
> +	return IFCVF_READ_REG8(&hw->common_cfg->device_status);
> +}
> +
> +STATIC void
> +ifcvf_set_status(struct ifcvf_hw *hw, u8 status)
> +{
> +	IFCVF_WRITE_REG8(status, &hw->common_cfg->device_status);
> +}
> +
> +STATIC void
> +ifcvf_reset(struct ifcvf_hw *hw)
> +{
> +	ifcvf_set_status(hw, 0);
> +
> +	/* flush status write */
> +	while (ifcvf_get_status(hw))
> +		msec_delay(1);
> +}
> +
> +STATIC void
> +ifcvf_add_status(struct ifcvf_hw *hw, u8 status)
> +{
> +	if (status != 0)
> +		status |= ifcvf_get_status(hw);
> +
> +	ifcvf_set_status(hw, status);
> +	ifcvf_get_status(hw);
> +}
> +
> +u64
> +ifcvf_get_features(struct ifcvf_hw *hw)
> +{
> +	u32 features_lo, features_hi;
> +	struct ifcvf_pci_common_cfg *cfg = hw->common_cfg;
> +
> +	IFCVF_WRITE_REG32(0, &cfg->device_feature_select);
> +	features_lo = IFCVF_READ_REG32(&cfg->device_feature);
> +
> +	IFCVF_WRITE_REG32(1, &cfg->device_feature_select);
> +	features_hi = IFCVF_READ_REG32(&cfg->device_feature);
> +
> +	return ((u64)features_hi << 32) | features_lo;
> +}
> +
> +STATIC void
> +ifcvf_set_features(struct ifcvf_hw *hw, u64 features)
> +{
> +	struct ifcvf_pci_common_cfg *cfg = hw->common_cfg;
> +
> +	IFCVF_WRITE_REG32(0, &cfg->guest_feature_select);
> +	IFCVF_WRITE_REG32(features & ((1ULL << 32) - 1), &cfg->guest_feature);
> +
> +	IFCVF_WRITE_REG32(1, &cfg->guest_feature_select);
> +	IFCVF_WRITE_REG32(features >> 32, &cfg->guest_feature);
> +}
> +
> +STATIC int
> +ifcvf_config_features(struct ifcvf_hw *hw)
> +{
> +	u64 host_features;
> +
> +	host_features = ifcvf_get_features(hw);
> +	hw->req_features &= host_features;
> +
> +	ifcvf_set_features(hw, hw->req_features);
> +	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_FEATURES_OK);
> +
> +	if (!(ifcvf_get_status(hw) & IFCVF_CONFIG_STATUS_FEATURES_OK)) {
> +		DEBUGOUT("failed to set FEATURES_OK status\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +STATIC void
> +io_write64_twopart(u64 val, u32 *lo, u32 *hi)
> +{
> +	IFCVF_WRITE_REG32(val & ((1ULL << 32) - 1), lo);
> +	IFCVF_WRITE_REG32(val >> 32, hi);
> +}
> +
> +STATIC int
> +ifcvf_hw_enable(struct ifcvf_hw *hw)
> +{
> +	struct ifcvf_pci_common_cfg *cfg;
> +	u8 *lm_cfg;
> +	u32 i;
> +	u16 notify_off;
> +
> +	cfg = hw->common_cfg;
> +	lm_cfg = hw->lm_cfg;
> +
> +	IFCVF_WRITE_REG16(0, &cfg->msix_config);
> +	if (IFCVF_READ_REG16(&cfg->msix_config) == IFCVF_MSI_NO_VECTOR) {
> +		DEBUGOUT("msix vec alloc failed for device config\n");
> +		return -1;
> +	}
> +
> +	for (i = 0; i < hw->nr_vring; i++) {
> +		IFCVF_WRITE_REG16(i, &cfg->queue_select);
> +		io_write64_twopart(hw->vring[i].desc, &cfg->queue_desc_lo,
> +				&cfg->queue_desc_hi);
> +		io_write64_twopart(hw->vring[i].avail, &cfg->queue_avail_lo,
> +				&cfg->queue_avail_hi);
> +		io_write64_twopart(hw->vring[i].used, &cfg->queue_used_lo,
> +				&cfg->queue_used_hi);
> +		IFCVF_WRITE_REG16(hw->vring[i].size, &cfg->queue_size);
> +
> +		*(u32 *)(lm_cfg + IFCVF_LM_RING_STATE_OFFSET +
> +				(i / 2) * IFCVF_LM_CFG_SIZE + (i % 2) * 4) =
> +			(u32)hw->vring[i].last_avail_idx |
> +			((u32)hw->vring[i].last_used_idx << 16);
> +
> +		IFCVF_WRITE_REG16(i + 1, &cfg->queue_msix_vector);
> +		if (IFCVF_READ_REG16(&cfg->queue_msix_vector) ==
> +				IFCVF_MSI_NO_VECTOR) {
> +			DEBUGOUT("queue %u, msix vec alloc failed\n", i);
> +			return -1;
> +		}
> +
> +		notify_off = IFCVF_READ_REG16(&cfg->queue_notify_off);
> +		hw->notify_addr[i] = (void *)((u8 *)hw->notify_base +
> +				notify_off * hw->notify_off_multiplier);
> +		IFCVF_WRITE_REG16(1, &cfg->queue_enable);
> +	}
> +
> +	return 0;
> +}
> +
> +STATIC void
> +ifcvf_hw_disable(struct ifcvf_hw *hw)
> +{
> +	u32 i;
> +	struct ifcvf_pci_common_cfg *cfg;
> +	u32 ring_state;
> +
> +	cfg = hw->common_cfg;
> +
> +	IFCVF_WRITE_REG16(IFCVF_MSI_NO_VECTOR, &cfg->msix_config);
> +	for (i = 0; i < hw->nr_vring; i++) {
> +		IFCVF_WRITE_REG16(i, &cfg->queue_select);
> +		IFCVF_WRITE_REG16(0, &cfg->queue_enable);
> +		IFCVF_WRITE_REG16(IFCVF_MSI_NO_VECTOR, &cfg->queue_msix_vector);
> +		ring_state = *(u32 *)(hw->lm_cfg + IFCVF_LM_RING_STATE_OFFSET +
> +				(i / 2) * IFCVF_LM_CFG_SIZE + (i % 2) * 4);
> +		hw->vring[i].last_avail_idx = (u16)ring_state;
> +		hw->vring[i].last_used_idx = (u16)(ring_state >> 16);
> +	}
> +}
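
Worth spelling out, since ifcvf_hw_enable() and ifcvf_hw_disable() both
touch it: the LM region keeps one 32-bit state word per ring, with
last_avail_idx in the low half and last_used_idx in the high half — which
is also why the restore path must take the low half for last_avail_idx.
A sketch of the packing, with names of my choosing:

#include <stdint.h>

/* Sketch: the per-ring LM state word layout used above. */
static inline uint32_t
ring_state_pack(uint16_t last_avail, uint16_t last_used)
{
	return (uint32_t)last_avail | ((uint32_t)last_used << 16);
}

static inline void
ring_state_unpack(uint32_t w, uint16_t *last_avail, uint16_t *last_used)
{
	*last_avail = (uint16_t)w;         /* low 16 bits */
	*last_used = (uint16_t)(w >> 16);  /* high 16 bits */
}
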
> +
> +int
> +ifcvf_start_hw(struct ifcvf_hw *hw)
> +{
> +	ifcvf_reset(hw);
> +	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_ACK);
> +	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_DRIVER);
> +
> +	if (ifcvf_config_features(hw) < 0)
> +		return -1;
> +
> +	if (ifcvf_hw_enable(hw) < 0)
> +		return -1;
> +
> +	ifcvf_add_status(hw, IFCVF_CONFIG_STATUS_DRIVER_OK);
> +	return 0;
> +}
> +
> +void
> +ifcvf_stop_hw(struct ifcvf_hw *hw)
> +{
> +	ifcvf_hw_disable(hw);
> +	ifcvf_reset(hw);
> +}
> +
> +void
> +ifcvf_enable_logging(struct ifcvf_hw *hw, u64 log_base, u64 log_size)
> +{
> +	u8 *lm_cfg;
> +
> +	lm_cfg = hw->lm_cfg;
> +
> +	*(u32 *)(lm_cfg + IFCVF_LM_BASE_ADDR_LOW) =
> +		log_base & IFCVF_32_BIT_MASK;
> +
> +	*(u32 *)(lm_cfg + IFCVF_LM_BASE_ADDR_HIGH) =
> +		(log_base >> 32) & IFCVF_32_BIT_MASK;
> +
> +	*(u32 *)(lm_cfg + IFCVF_LM_END_ADDR_LOW) =
> +		(log_base + log_size) & IFCVF_32_BIT_MASK;
> +
> +	*(u32 *)(lm_cfg + IFCVF_LM_END_ADDR_HIGH) =
> +		((log_base + log_size) >> 32) & IFCVF_32_BIT_MASK;
> +
> +	*(u32 *)(lm_cfg + IFCVF_LM_LOGGING_CTRL) = IFCVF_LM_ENABLE_VF;
> +}
> +
> +void
> +ifcvf_disable_logging(struct ifcvf_hw *hw)
> +{
> +	u8 *lm_cfg;
> +
> +	lm_cfg = hw->lm_cfg;
> +	*(u32 *)(lm_cfg + IFCVF_LM_LOGGING_CTRL) = IFCVF_LM_DISABLE;
> +}
> +
> +void
> +ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid)
> +{
> +	IFCVF_WRITE_REG16(qid, hw->notify_addr[qid]);
> +}
> +
> +u8
> +ifcvf_get_notify_region(struct ifcvf_hw *hw)
> +{
> +	return hw->notify_region;
> +}
> +
> +u64
> +ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid)
> +{
> +	return (u8 *)hw->notify_addr[qid] -
> +		(u8 *)hw->mem_resource[hw->notify_region].addr;
> +}
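
One more virtio-1.0 detail that ifcvf_hw_enable() and
ifcvf_get_queue_notify_off() rely on: each queue's doorbell lives at a fixed
offset inside the notify BAR region. A minimal sketch of the derivation
(not from the patch):

#include <stdint.h>

/* Sketch: doorbell address = notify cap base + queue_notify_off *
 * notify_off_multiplier, both values read from the device. */
static inline volatile uint16_t *
queue_doorbell(uint8_t *notify_base, uint16_t queue_notify_off,
		uint32_t notify_off_multiplier)
{
	return (volatile uint16_t *)(notify_base +
		(uint32_t)queue_notify_off * notify_off_multiplier);
}
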
> diff --git a/drivers/vdpa/ifc/base/ifcvf.h b/drivers/vdpa/ifc/base/ifcvf.h
> new file mode 100644
> index 0000000..9be2770
> --- /dev/null
> +++ b/drivers/vdpa/ifc/base/ifcvf.h
> @@ -0,0 +1,162 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#ifndef _IFCVF_H_
> +#define _IFCVF_H_
> +
> +#include "ifcvf_osdep.h"
> +
> +#define IFCVF_VENDOR_ID		0x1AF4
> +#define IFCVF_DEVICE_ID		0x1041
> +#define IFCVF_SUBSYS_VENDOR_ID	0x8086
> +#define IFCVF_SUBSYS_DEVICE_ID	0x001A
> +
> +#define IFCVF_MAX_QUEUES		1
> +#define VIRTIO_F_IOMMU_PLATFORM	33
> +
> +/* Common configuration */
> +#define IFCVF_PCI_CAP_COMMON_CFG	1
> +/* Notifications */
> +#define IFCVF_PCI_CAP_NOTIFY_CFG	2
> +/* ISR Status */
> +#define IFCVF_PCI_CAP_ISR_CFG		3
> +/* Device specific configuration */
> +#define IFCVF_PCI_CAP_DEVICE_CFG	4
> +/* PCI configuration access */
> +#define IFCVF_PCI_CAP_PCI_CFG		5
> +
> +#define IFCVF_CONFIG_STATUS_RESET		0x00
> +#define IFCVF_CONFIG_STATUS_ACK		0x01
> +#define IFCVF_CONFIG_STATUS_DRIVER		0x02
> +#define IFCVF_CONFIG_STATUS_DRIVER_OK		0x04
> +#define IFCVF_CONFIG_STATUS_FEATURES_OK	0x08
> +#define IFCVF_CONFIG_STATUS_FAILED		0x80
> +
> +#define IFCVF_MSI_NO_VECTOR	0xffff
> +#define IFCVF_PCI_MAX_RESOURCE	6
> +
> +#define IFCVF_LM_CFG_SIZE		0x40
> +#define IFCVF_LM_RING_STATE_OFFSET	0x20
> +
> +#define IFCVF_LM_LOGGING_CTRL		0x0
> +
> +#define IFCVF_LM_BASE_ADDR_LOW		0x10
> +#define IFCVF_LM_BASE_ADDR_HIGH	0x14
> +#define IFCVF_LM_END_ADDR_LOW		0x18
> +#define IFCVF_LM_END_ADDR_HIGH		0x1c
> +
> +#define IFCVF_LM_DISABLE		0x0
> +#define IFCVF_LM_ENABLE_VF		0x1
> +#define IFCVF_LM_ENABLE_PF		0x3
> +#define IFCVF_LOG_BASE			0x100000000000
> +#define IFCVF_MEDIATED_VRING		0x200000000000
> +
> +#define IFCVF_32_BIT_MASK		0xffffffff
> +
> +
> +struct ifcvf_pci_cap {
> +	u8 cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
> +	u8 cap_next;	/* Generic PCI field: next ptr. */
> +	u8 cap_len;	/* Generic PCI field: capability length */
> +	u8 cfg_type;	/* Identifies the structure. */
> +	u8 bar;		/* Where to find it. */
> +	u8 padding[3];	/* Pad to full dword. */
> +	u32 offset;	/* Offset within bar. */
> +	u32 length;	/* Length of the structure, in bytes. */
> +};
> +
> +struct ifcvf_pci_notify_cap {
> +	struct ifcvf_pci_cap cap;
> +	u32 notify_off_multiplier;	/* Multiplier for queue_notify_off. */
> +};
> +
> +struct ifcvf_pci_common_cfg {
> +	/* About the whole device. */
> +	u32 device_feature_select;
> +	u32 device_feature;
> +	u32 guest_feature_select;
> +	u32 guest_feature;
> +	u16 msix_config;
> +	u16 num_queues;
> +	u8 device_status;
> +	u8 config_generation;
> +
> +	/* About a specific virtqueue. */
> +	u16 queue_select;
> +	u16 queue_size;
> +	u16 queue_msix_vector;
> +	u16 queue_enable;
> +	u16 queue_notify_off;
> +	u32 queue_desc_lo;
> +	u32 queue_desc_hi;
> +	u32 queue_avail_lo;
> +	u32 queue_avail_hi;
> +	u32 queue_used_lo;
> +	u32 queue_used_hi;
> +};
> +
> +struct ifcvf_net_config {
> +	u8 mac[6];
> +	u16 status;
> +	u16 max_virtqueue_pairs;
> +} __attribute__((packed));
> +
> +struct ifcvf_pci_mem_resource {
> +	u64 phys_addr;	/**< Physical address, 0 if not resource. */
> +	u64 len;	/**< Length of the resource. */
> +	u8 *addr;	/**< Virtual address, NULL when not mapped. */
> +};
> +
> +struct vring_info {
> +	u64 desc;
> +	u64 avail;
> +	u64 used;
> +	u16 size;
> +	u16 last_avail_idx;
> +	u16 last_used_idx;
> +};
> +
> +struct ifcvf_hw {
> +	u64 req_features;
> +	u8 notify_region;
> +	u32 notify_off_multiplier;
> +	struct ifcvf_pci_common_cfg *common_cfg;
> +	struct ifcvf_net_config *dev_cfg;
> +	u8 *isr;
> +	u16 *notify_base;
> +	u16 *notify_addr[IFCVF_MAX_QUEUES * 2];
> +	u8 *lm_cfg;
> +	struct vring_info vring[IFCVF_MAX_QUEUES * 2];
> +	u8 nr_vring;
> +	struct ifcvf_pci_mem_resource mem_resource[IFCVF_PCI_MAX_RESOURCE];
> +};
> +
> +int
> +ifcvf_init_hw(struct ifcvf_hw *hw, PCI_DEV *dev);
> +
> +u64
> +ifcvf_get_features(struct ifcvf_hw *hw);
> +
> +int
> +ifcvf_start_hw(struct ifcvf_hw *hw);
> +
> +void
> +ifcvf_stop_hw(struct ifcvf_hw *hw);
> +
> +void
> +ifcvf_enable_logging(struct ifcvf_hw *hw, u64 log_base, u64 log_size);
> +
> +void
> +ifcvf_disable_logging(struct ifcvf_hw *hw);
> +
> +void
> +ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid);
> +
> +u8
> +ifcvf_get_notify_region(struct ifcvf_hw *hw);
> +
> +u64
> +ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid);
> +
> +#endif /* _IFCVF_H_ */
> diff --git a/drivers/vdpa/ifc/base/ifcvf_osdep.h b/drivers/vdpa/ifc/base/ifcvf_osdep.h
> new file mode 100644
> index 0000000..6aef25e
> --- /dev/null
> +++ b/drivers/vdpa/ifc/base/ifcvf_osdep.h
> @@ -0,0 +1,52 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#ifndef _IFCVF_OSDEP_H_
> +#define _IFCVF_OSDEP_H_
> +
> +#include <stdint.h>
> +#include <linux/pci_regs.h>
> +
> +#include <rte_cycles.h>
> +#include <rte_pci.h>
> +#include <rte_bus_pci.h>
> +#include <rte_log.h>
> +#include <rte_io.h>
> +
> +#define DEBUGOUT(S, args...)	RTE_LOG(DEBUG, PMD, S, ##args)
> +#define STATIC			static
> +
> +#define msec_delay(x)	rte_delay_us_sleep(1000 * (x))
> +
> +#define IFCVF_READ_REG8(reg)		rte_read8(reg)
> +#define IFCVF_WRITE_REG8(val, reg)	rte_write8((val), (reg))
> +#define IFCVF_READ_REG16(reg)		rte_read16(reg)
> +#define IFCVF_WRITE_REG16(val, reg)	rte_write16((val), (reg))
> +#define IFCVF_READ_REG32(reg)		rte_read32(reg)
> +#define IFCVF_WRITE_REG32(val, reg)	rte_write32((val), (reg))
> +
> +typedef struct rte_pci_device PCI_DEV;
> +
> +#define PCI_READ_CONFIG_BYTE(dev, val, where) \
> +	rte_pci_read_config(dev, val, 1, where)
> +
> +#define PCI_READ_CONFIG_DWORD(dev, val, where) \
> +	rte_pci_read_config(dev, val, 4, where)
> +
> +typedef uint8_t    u8;
> +typedef int8_t     s8;
> +typedef uint16_t   u16;
> +typedef int16_t    s16;
> +typedef uint32_t   u32;
> +typedef int32_t    s32;
> +typedef int64_t    s64;
> +typedef uint64_t   u64;
> +
> +static inline int
> +PCI_READ_CONFIG_RANGE(PCI_DEV *dev, uint32_t *val, int size, int where)
> +{
> +	return rte_pci_read_config(dev, val, size, where);
> +}
> +
> +#endif /* _IFCVF_OSDEP_H_ */
> diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
> new file mode 100644
> index 0000000..da4667b
> --- /dev/null
> +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
> @@ -0,0 +1,1280 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include <unistd.h>
> +#include <pthread.h>
> +#include <fcntl.h>
> +#include <string.h>
> +#include <sys/ioctl.h>
> +#include <sys/epoll.h>
> +#include <linux/virtio_net.h>
> +#include <stdbool.h>
> +
> +#include <rte_malloc.h>
> +#include <rte_memory.h>
> +#include <rte_bus_pci.h>
> +#include <rte_vhost.h>
> +#include <rte_vdpa.h>
> +#include <rte_vfio.h>
> +#include <rte_spinlock.h>
> +#include <rte_log.h>
> +#include <rte_kvargs.h>
> +#include <rte_devargs.h>
> +
> +#include "base/ifcvf.h"
> +
> +#define DRV_LOG(level, fmt, args...) \
> +	rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \
> +		"IFCVF %s(): " fmt "\n", __func__, ##args)
> +
> +#ifndef PAGE_SIZE
> +#define PAGE_SIZE 4096
> +#endif
> +
> +#define IFCVF_USED_RING_LEN(size) \
> +	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
> +
> +#define IFCVF_VDPA_MODE		"vdpa"
> +#define IFCVF_SW_FALLBACK_LM	"sw-live-migration"
> +
> +static const char * const ifcvf_valid_arguments[] = {
> +	IFCVF_VDPA_MODE,
> +	IFCVF_SW_FALLBACK_LM,
> +	NULL
> +};
> +
> +static int ifcvf_vdpa_logtype;
> +
> +struct ifcvf_internal {
> +	struct rte_vdpa_dev_addr dev_addr;
> +	struct rte_pci_device *pdev;
> +	struct ifcvf_hw hw;
> +	int vfio_container_fd;
> +	int vfio_group_fd;
> +	int vfio_dev_fd;
> +	pthread_t tid;	/* thread for notify relay */
> +	int epfd;
> +	int vid;
> +	int did;
> +	uint16_t max_queues;
> +	uint64_t features;
> +	rte_atomic32_t started;
> +	rte_atomic32_t dev_attached;
> +	rte_atomic32_t running;
> +	rte_spinlock_t lock;
> +	bool sw_lm;
> +	bool sw_fallback_running;
> +	/* mediated vring for sw fallback */
> +	struct vring m_vring[IFCVF_MAX_QUEUES * 2];
> +	/* eventfd for used ring interrupt */
> +	int intr_fd[IFCVF_MAX_QUEUES * 2];
> +};
> +
> +struct internal_list {
> +	TAILQ_ENTRY(internal_list) next;
> +	struct ifcvf_internal *internal;
> +};
> +
> +TAILQ_HEAD(internal_list_head, internal_list);
> +static struct internal_list_head internal_list =
> +	TAILQ_HEAD_INITIALIZER(internal_list);
> +
> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> +static void update_used_ring(struct ifcvf_internal *internal, uint16_t qid);
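
The IFCVF_USED_RING_LEN() macro above encodes the exact byte size of a
split-ring used ring: a 2-byte flags field, a 2-byte idx, one used element
per descriptor, plus the trailing 2-byte avail_event. The same computation
as a function, assuming the standard Linux vring layout:

#include <stdint.h>
#include <linux/virtio_ring.h>

/* Sketch: byte length of a used ring with 'size' entries. */
static inline uint64_t
used_ring_len(uint16_t size)
{
	return (uint64_t)size * sizeof(struct vring_used_elem) +
		sizeof(uint16_t) * 3;
}
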
> +
> +static struct internal_list *
> +find_internal_resource_by_did(int did)
> +{
> +	int found = 0;
> +	struct internal_list *list;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +
> +	TAILQ_FOREACH(list, &internal_list, next) {
> +		if (did == list->internal->did) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	if (!found)
> +		return NULL;
> +
> +	return list;
> +}
> +
> +static struct internal_list *
> +find_internal_resource_by_dev(struct rte_pci_device *pdev)
> +{
> +	int found = 0;
> +	struct internal_list *list;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +
> +	TAILQ_FOREACH(list, &internal_list, next) {
> +		if (pdev == list->internal->pdev) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	if (!found)
> +		return NULL;
> +
> +	return list;
> +}
> +
> +static int
> +ifcvf_vfio_setup(struct ifcvf_internal *internal)
> +{
> +	struct rte_pci_device *dev = internal->pdev;
> +	char devname[RTE_DEV_NAME_MAX_LEN] = {0};
> +	int iommu_group_num;
> +	int i, ret;
> +
> +	internal->vfio_dev_fd = -1;
> +	internal->vfio_group_fd = -1;
> +	internal->vfio_container_fd = -1;
> +
> +	rte_pci_device_name(&dev->addr, devname, RTE_DEV_NAME_MAX_LEN);
> +	ret = rte_vfio_get_group_num(rte_pci_get_sysfs_path(), devname,
> +			&iommu_group_num);
> +	if (ret <= 0) {
> +		DRV_LOG(ERR, "%s failed to get IOMMU group", devname);
> +		return -1;
> +	}
> +
> +	internal->vfio_container_fd = rte_vfio_container_create();
> +	if (internal->vfio_container_fd < 0)
> +		return -1;
> +
> +	internal->vfio_group_fd = rte_vfio_container_group_bind(
> +			internal->vfio_container_fd, iommu_group_num);
> +	if (internal->vfio_group_fd < 0)
> +		goto err;
> +
> +	if (rte_pci_map_device(dev))
> +		goto err;
> +
> +	internal->vfio_dev_fd = dev->intr_handle.vfio_dev_fd;
> +
> +	for (i = 0; i < RTE_MIN(PCI_MAX_RESOURCE, IFCVF_PCI_MAX_RESOURCE);
> +			i++) {
> +		internal->hw.mem_resource[i].addr =
> +			internal->pdev->mem_resource[i].addr;
> +		internal->hw.mem_resource[i].phys_addr =
> +			internal->pdev->mem_resource[i].phys_addr;
> +		internal->hw.mem_resource[i].len =
> +			internal->pdev->mem_resource[i].len;
> +	}
> +
> +	return 0;
> +
> +err:
> +	rte_vfio_container_destroy(internal->vfio_container_fd);
> +	return -1;
> +}
"DMA map" : "DMA unmap", i, > + reg->host_user_addr, reg->guest_phys_addr, reg- > >size); > + > + if (do_map) { > + ret =3D > rte_vfio_container_dma_map(vfio_container_fd, > + reg->host_user_addr, reg- > >guest_phys_addr, > + reg->size); > + if (ret < 0) { > + DRV_LOG(ERR, "DMA map failed."); > + goto exit; > + } > + } else { > + ret =3D > rte_vfio_container_dma_unmap(vfio_container_fd, > + reg->host_user_addr, reg- > >guest_phys_addr, > + reg->size); > + if (ret < 0) { > + DRV_LOG(ERR, "DMA unmap failed."); > + goto exit; > + } > + } > + } > + > +exit: > + if (mem) > + free(mem); > + return ret; > +} > + > +static uint64_t > +hva_to_gpa(int vid, uint64_t hva) > +{ > + struct rte_vhost_memory *mem =3D NULL; > + struct rte_vhost_mem_region *reg; > + uint32_t i; > + uint64_t gpa =3D 0; > + > + if (rte_vhost_get_mem_table(vid, &mem) < 0) > + goto exit; > + > + for (i =3D 0; i < mem->nregions; i++) { > + reg =3D &mem->regions[i]; > + > + if (hva >=3D reg->host_user_addr && > + hva < reg->host_user_addr + reg->size) { > + gpa =3D hva - reg->host_user_addr + reg- > >guest_phys_addr; > + break; > + } > + } > + > +exit: > + if (mem) > + free(mem); > + return gpa; > +} > + > +static int > +vdpa_ifcvf_start(struct ifcvf_internal *internal) > +{ > + struct ifcvf_hw *hw =3D &internal->hw; > + int i, nr_vring; > + int vid; > + struct rte_vhost_vring vq; > + uint64_t gpa; > + > + vid =3D internal->vid; > + nr_vring =3D rte_vhost_get_vring_num(vid); > + rte_vhost_get_negotiated_features(vid, &hw->req_features); > + > + for (i =3D 0; i < nr_vring; i++) { > + rte_vhost_get_vhost_vring(vid, i, &vq); > + gpa =3D hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc); > + if (gpa =3D=3D 0) { > + DRV_LOG(ERR, "Fail to get GPA for descriptor ring."); > + return -1; > + } > + hw->vring[i].desc =3D gpa; > + > + gpa =3D hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.avail); > + if (gpa =3D=3D 0) { > + DRV_LOG(ERR, "Fail to get GPA for available ring."); > + return -1; > + } > + hw->vring[i].avail =3D gpa; > + > + gpa =3D hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used); > + if (gpa =3D=3D 0) { > + DRV_LOG(ERR, "Fail to get GPA for used ring."); > + return -1; > + } > + hw->vring[i].used =3D gpa; > + > + hw->vring[i].size =3D vq.size; > + rte_vhost_get_vring_base(vid, i, &hw- > >vring[i].last_avail_idx, > + &hw->vring[i].last_used_idx); > + } > + hw->nr_vring =3D i; > + > + return ifcvf_start_hw(&internal->hw); > +} > + > +static void > +vdpa_ifcvf_stop(struct ifcvf_internal *internal) > +{ > + struct ifcvf_hw *hw =3D &internal->hw; > + uint32_t i; > + int vid; > + uint64_t features =3D 0; > + uint64_t log_base =3D 0, log_size =3D 0; > + uint64_t len; > + > + vid =3D internal->vid; > + ifcvf_stop_hw(hw); > + > + for (i =3D 0; i < hw->nr_vring; i++) > + rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, > + hw->vring[i].last_used_idx); > + > + if (internal->sw_lm) > + return; > + > + rte_vhost_get_negotiated_features(vid, &features); > + if (RTE_VHOST_NEED_LOG(features)) { > + ifcvf_disable_logging(hw); > + rte_vhost_get_log_base(internal->vid, &log_base, > &log_size); > + rte_vfio_container_dma_unmap(internal- > >vfio_container_fd, > + log_base, IFCVF_LOG_BASE, log_size); > + /* > + * IFCVF marks dirty memory pages for only packet buffer, > + * SW helps to mark the used ring as dirty after device stops. 
> +
> +static int
> +vdpa_ifcvf_start(struct ifcvf_internal *internal)
> +{
> +	struct ifcvf_hw *hw = &internal->hw;
> +	int i, nr_vring;
> +	int vid;
> +	struct rte_vhost_vring vq;
> +	uint64_t gpa;
> +
> +	vid = internal->vid;
> +	nr_vring = rte_vhost_get_vring_num(vid);
> +	rte_vhost_get_negotiated_features(vid, &hw->req_features);
> +
> +	for (i = 0; i < nr_vring; i++) {
> +		rte_vhost_get_vhost_vring(vid, i, &vq);
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> +			return -1;
> +		}
> +		hw->vring[i].desc = gpa;
> +
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.avail);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for available ring.");
> +			return -1;
> +		}
> +		hw->vring[i].avail = gpa;
> +
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for used ring.");
> +			return -1;
> +		}
> +		hw->vring[i].used = gpa;
> +
> +		hw->vring[i].size = vq.size;
> +		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
> +				&hw->vring[i].last_used_idx);
> +	}
> +	hw->nr_vring = i;
> +
> +	return ifcvf_start_hw(&internal->hw);
> +}
> +
> +static void
> +vdpa_ifcvf_stop(struct ifcvf_internal *internal)
> +{
> +	struct ifcvf_hw *hw = &internal->hw;
> +	uint32_t i;
> +	int vid;
> +	uint64_t features = 0;
> +	uint64_t log_base = 0, log_size = 0;
> +	uint64_t len;
> +
> +	vid = internal->vid;
> +	ifcvf_stop_hw(hw);
> +
> +	for (i = 0; i < hw->nr_vring; i++)
> +		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
> +				hw->vring[i].last_used_idx);
> +
> +	if (internal->sw_lm)
> +		return;
> +
> +	rte_vhost_get_negotiated_features(vid, &features);
> +	if (RTE_VHOST_NEED_LOG(features)) {
> +		ifcvf_disable_logging(hw);
> +		rte_vhost_get_log_base(internal->vid, &log_base, &log_size);
> +		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
> +				log_base, IFCVF_LOG_BASE, log_size);
> +		/*
> +		 * IFCVF marks dirty memory pages for only packet buffer,
> +		 * SW helps to mark the used ring as dirty after device stops.
> +		 */
> +		for (i = 0; i < hw->nr_vring; i++) {
> +			len = IFCVF_USED_RING_LEN(hw->vring[i].size);
> +			rte_vhost_log_used_vring(vid, i, 0, len);
> +		}
> +	}
> +}
> +
> +#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + \
> +		sizeof(int) * (IFCVF_MAX_QUEUES * 2 + 1))
> +static int
> +vdpa_enable_vfio_intr(struct ifcvf_internal *internal, bool m_rx)
> +{
> +	int ret;
> +	uint32_t i, nr_vring;
> +	char irq_set_buf[MSIX_IRQ_SET_BUF_LEN];
> +	struct vfio_irq_set *irq_set;
> +	int *fd_ptr;
> +	struct rte_vhost_vring vring;
> +	int fd;
> +
> +	vring.callfd = -1;
> +
> +	nr_vring = rte_vhost_get_vring_num(internal->vid);
> +
> +	irq_set = (struct vfio_irq_set *)irq_set_buf;
> +	irq_set->argsz = sizeof(irq_set_buf);
> +	irq_set->count = nr_vring + 1;
> +	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> +			 VFIO_IRQ_SET_ACTION_TRIGGER;
> +	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> +	irq_set->start = 0;
> +	fd_ptr = (int *)&irq_set->data;
> +	fd_ptr[RTE_INTR_VEC_ZERO_OFFSET] = internal->pdev->intr_handle.fd;
> +
> +	for (i = 0; i < nr_vring; i++)
> +		internal->intr_fd[i] = -1;
> +
> +	for (i = 0; i < nr_vring; i++) {
> +		rte_vhost_get_vhost_vring(internal->vid, i, &vring);
> +		fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd;
> +		if ((i & 1) == 0 && m_rx == true) {
> +			fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> +			if (fd < 0) {
> +				DRV_LOG(ERR, "can't setup eventfd: %s",
> +					strerror(errno));
> +				return -1;
> +			}
> +			internal->intr_fd[i] = fd;
> +			fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd;
> +		}
> +	}
> +
> +	ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +	if (ret) {
> +		DRV_LOG(ERR, "Error enabling MSI-X interrupts: %s",
> +			strerror(errno));
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +vdpa_disable_vfio_intr(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +	uint32_t i, nr_vring;
> +	char irq_set_buf[MSIX_IRQ_SET_BUF_LEN];
> +	struct vfio_irq_set *irq_set;
> +
> +	irq_set = (struct vfio_irq_set *)irq_set_buf;
> +	irq_set->argsz = sizeof(irq_set_buf);
> +	irq_set->count = 0;
> +	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
> +	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> +	irq_set->start = 0;
> +
> +	nr_vring = rte_vhost_get_vring_num(internal->vid);
> +	for (i = 0; i < nr_vring; i++) {
> +		if (internal->intr_fd[i] >= 0)
> +			close(internal->intr_fd[i]);
> +		internal->intr_fd[i] = -1;
> +	}
> +
> +	ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +	if (ret) {
> +		DRV_LOG(ERR, "Error disabling MSI-X interrupts: %s",
> +			strerror(errno));
> +		return -1;
> +	}
> +
> +	return 0;
> +}
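
vdpa_enable_vfio_intr() builds a variable-length ioctl argument whose
layout is easy to miss in the casts. A sketch of the same construction
(names are mine):

#include <string.h>
#include <stdint.h>
#include <linux/vfio.h>

/* Sketch: VFIO_DEVICE_SET_IRQS argument — a vfio_irq_set header followed
 * by one eventfd per MSI-X vector; slot 0 is the device config interrupt,
 * slots 1..n are the queue interrupts. */
static void
fill_irq_set(char *buf, size_t buflen, int config_fd,
		const int *queue_fds, uint32_t n)
{
	struct vfio_irq_set *s = (struct vfio_irq_set *)buf;
	int *fds = (int *)&s->data;

	memset(buf, 0, buflen);
	s->argsz = buflen;
	s->count = n + 1;
	s->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	s->index = VFIO_PCI_MSIX_IRQ_INDEX;
	s->start = 0;
	fds[0] = config_fd;
	memcpy(&fds[1], queue_fds, n * sizeof(int));
}
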
> +
> +static void *
> +notify_relay(void *arg)
> +{
> +	int i, kickfd, epfd, nfds = 0;
> +	uint32_t qid, q_num;
> +	struct epoll_event events[IFCVF_MAX_QUEUES * 2];
> +	struct epoll_event ev;
> +	uint64_t buf;
> +	int nbytes;
> +	struct rte_vhost_vring vring;
> +	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
> +	struct ifcvf_hw *hw = &internal->hw;
> +
> +	q_num = rte_vhost_get_vring_num(internal->vid);
> +
> +	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
> +	if (epfd < 0) {
> +		DRV_LOG(ERR, "failed to create epoll instance.");
> +		return NULL;
> +	}
> +	internal->epfd = epfd;
> +
> +	vring.kickfd = -1;
> +	for (qid = 0; qid < q_num; qid++) {
> +		ev.events = EPOLLIN | EPOLLPRI;
> +		rte_vhost_get_vhost_vring(internal->vid, qid, &vring);
> +		ev.data.u64 = qid | (uint64_t)vring.kickfd << 32;
> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> +			return NULL;
> +		}
> +	}
> +
> +	for (;;) {
> +		nfds = epoll_wait(epfd, events, q_num, -1);
> +		if (nfds < 0) {
> +			if (errno == EINTR)
> +				continue;
> +			DRV_LOG(ERR, "epoll_wait returned failure");
> +			return NULL;
> +		}
> +
> +		for (i = 0; i < nfds; i++) {
> +			qid = events[i].data.u32;
> +			kickfd = (uint32_t)(events[i].data.u64 >> 32);
> +			do {
> +				nbytes = read(kickfd, &buf, 8);
> +				if (nbytes < 0) {
> +					if (errno == EINTR ||
> +					    errno == EWOULDBLOCK ||
> +					    errno == EAGAIN)
> +						continue;
> +					DRV_LOG(INFO, "Error reading kickfd: %s",
> +						strerror(errno));
> +				}
> +				break;
> +			} while (1);
> +
> +			ifcvf_notify_queue(hw, qid);
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int
> +setup_notify_relay(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&internal->tid, NULL, notify_relay,
> +			(void *)internal);
> +	if (ret) {
> +		DRV_LOG(ERR, "failed to create notify relay pthread.");
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int
> +unset_notify_relay(struct ifcvf_internal *internal)
> +{
> +	void *status;
> +
> +	if (internal->tid) {
> +		pthread_cancel(internal->tid);
> +		pthread_join(internal->tid, &status);
> +	}
> +	internal->tid = 0;
> +
> +	if (internal->epfd >= 0)
> +		close(internal->epfd);
> +	internal->epfd = -1;
> +
> +	return 0;
> +}
> +
> +static int
> +update_datapath(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +
> +	rte_spinlock_lock(&internal->lock);
> +
> +	if (!rte_atomic32_read(&internal->running) &&
> +	    (rte_atomic32_read(&internal->started) &&
> +	     rte_atomic32_read(&internal->dev_attached))) {
> +		ret = ifcvf_dma_map(internal, 1);
> +		if (ret)
> +			goto err;
> +
> +		ret = vdpa_enable_vfio_intr(internal, 0);
> +		if (ret)
> +			goto err;
> +
> +		ret = vdpa_ifcvf_start(internal);
> +		if (ret)
> +			goto err;
> +
> +		ret = setup_notify_relay(internal);
> +		if (ret)
> +			goto err;
> +
> +		rte_atomic32_set(&internal->running, 1);
> +	} else if (rte_atomic32_read(&internal->running) &&
> +		   (!rte_atomic32_read(&internal->started) ||
> +		    !rte_atomic32_read(&internal->dev_attached))) {
> +		ret = unset_notify_relay(internal);
> +		if (ret)
> +			goto err;
> +
> +		vdpa_ifcvf_stop(internal);
> +
> +		ret = vdpa_disable_vfio_intr(internal);
> +		if (ret)
> +			goto err;
> +
> +		ret = ifcvf_dma_map(internal, 0);
> +		if (ret)
> +			goto err;
> +
> +		rte_atomic32_set(&internal->running, 0);
> +	}
> +
> +	rte_spinlock_unlock(&internal->lock);
> +	return 0;
> +err:
> +	rte_spinlock_unlock(&internal->lock);
> +	return ret;
> +}
> +
> +static int
> +m_ifcvf_start(struct ifcvf_internal *internal)
> +{
> +	struct ifcvf_hw *hw = &internal->hw;
> +	uint32_t i, nr_vring;
> +	int vid, ret;
> +	struct rte_vhost_vring vq;
> +	void *vring_buf;
> +	uint64_t m_vring_iova = IFCVF_MEDIATED_VRING;
> +	uint64_t size;
> +	uint64_t gpa;
> +
> +	memset(&vq, 0, sizeof(vq));
> +	vid = internal->vid;
> +	nr_vring = rte_vhost_get_vring_num(vid);
> +	rte_vhost_get_negotiated_features(vid, &hw->req_features);
> +
> +	for (i = 0; i < nr_vring; i++) {
> +		rte_vhost_get_vhost_vring(vid, i, &vq);
> +
> +		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
> +				PAGE_SIZE);
> +		vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE);
> +		vring_init(&internal->m_vring[i], vq.size, vring_buf,
> +				PAGE_SIZE);
> +
> +		ret = rte_vfio_container_dma_map(internal->vfio_container_fd,
> +			(uint64_t)(uintptr_t)vring_buf, m_vring_iova, size);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "mediated vring DMA map failed.");
> +			goto error;
> +		}
> +
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> +			return -1;
> +		}
> +		hw->vring[i].desc = gpa;
> +
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.avail);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for available ring.");
> +			return -1;
> +		}
> +		hw->vring[i].avail = gpa;
> +
> +		/* Direct I/O for Tx queue, relay for Rx queue */
> +		if (i & 1) {
> +			gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used);
> +			if (gpa == 0) {
> +				DRV_LOG(ERR, "Fail to get GPA for used ring.");
> +				return -1;
> +			}
> +			hw->vring[i].used = gpa;
> +		} else {
> +			hw->vring[i].used = m_vring_iova +
> +				(char *)internal->m_vring[i].used -
> +				(char *)internal->m_vring[i].desc;
> +		}
> +
> +		hw->vring[i].size = vq.size;
> +
> +		rte_vhost_get_vring_base(vid, i,
> +				&internal->m_vring[i].avail->idx,
> +				&internal->m_vring[i].used->idx);
> +
> +		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
> +				&hw->vring[i].last_used_idx);
> +
> +		m_vring_iova += size;
> +	}
> +	hw->nr_vring = nr_vring;
> +
> +	return ifcvf_start_hw(&internal->hw);
> +
> +error:
> +	for (i = 0; i < nr_vring; i++)
> +		if (internal->m_vring[i].desc)
> +			rte_free(internal->m_vring[i].desc);
> +
> +	return -1;
> +}
> +
> +static int
> +m_ifcvf_stop(struct ifcvf_internal *internal)
> +{
> +	int vid;
> +	uint32_t i;
> +	struct rte_vhost_vring vq;
> +	struct ifcvf_hw *hw = &internal->hw;
> +	uint64_t m_vring_iova = IFCVF_MEDIATED_VRING;
> +	uint64_t size, len;
> +
> +	vid = internal->vid;
> +	ifcvf_stop_hw(hw);
> +
> +	for (i = 0; i < hw->nr_vring; i++) {
> +		/* synchronize remaining new used entries if any */
> +		if ((i & 1) == 0)
> +			update_used_ring(internal, i);
> +
> +		rte_vhost_get_vhost_vring(vid, i, &vq);
> +		len = IFCVF_USED_RING_LEN(vq.size);
> +		rte_vhost_log_used_vring(vid, i, 0, len);
> +
> +		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
> +				PAGE_SIZE);
> +		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
> +			(uint64_t)(uintptr_t)internal->m_vring[i].desc,
> +			m_vring_iova, size);
> +
> +		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
> +				hw->vring[i].last_used_idx);
> +		rte_free(internal->m_vring[i].desc);
> +		m_vring_iova += size;
> +	}
> +
> +	return 0;
> +}
> +
> +static void
> +update_used_ring(struct ifcvf_internal *internal, uint16_t qid)
> +{
> +	rte_vdpa_relay_vring_used(internal->vid, qid, &internal->m_vring[qid]);
> +	rte_vhost_vring_call(internal->vid, qid);
> +}
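
The even/odd tests in m_ifcvf_start() and m_ifcvf_stop() encode the
virtio-net queue layout: queue 2i is RX and queue 2i+1 is TX of pair i. In
the fallback path only RX used rings go through the mediated copy, while
the device writes TX used entries to guest memory directly. Assuming that
numbering, the test reads as:

/* Sketch: which queues need the mediated used-ring relay. */
static inline int
uses_mediated_used_ring(uint16_t qid)
{
	return (qid & 1) == 0; /* RX queues only */
}
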
> +
> +static void *
> +vring_relay(void *arg)
> +{
> +	int i, vid, epfd, fd, nfds;
> +	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
> +	struct rte_vhost_vring vring;
> +	uint16_t qid, q_num;
> +	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
> +	struct epoll_event ev;
> +	int nbytes;
> +	uint64_t buf;
> +
> +	vid = internal->vid;
> +	q_num = rte_vhost_get_vring_num(vid);
> +
> +	/* add notify fd and interrupt fd to epoll */
> +	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
> +	if (epfd < 0) {
> +		DRV_LOG(ERR, "failed to create epoll instance.");
> +		return NULL;
> +	}
> +	internal->epfd = epfd;
> +
> +	vring.kickfd = -1;
> +	for (qid = 0; qid < q_num; qid++) {
> +		ev.events = EPOLLIN | EPOLLPRI;
> +		rte_vhost_get_vhost_vring(vid, qid, &vring);
> +		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> +			return NULL;
> +		}
> +	}
> +
> +	for (qid = 0; qid < q_num; qid += 2) {
> +		ev.events = EPOLLIN | EPOLLPRI;
> +		/* leave a flag to mark it's for interrupt */
> +		ev.data.u64 = 1 | qid << 1 |
> +			(uint64_t)internal->intr_fd[qid] << 32;
> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, internal->intr_fd[qid], &ev)
> +				< 0) {
> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> +			return NULL;
> +		}
> +		update_used_ring(internal, qid);
> +	}
> +
> +	/* start relay with a first kick */
> +	for (qid = 0; qid < q_num; qid++)
> +		ifcvf_notify_queue(&internal->hw, qid);
> +
> +	/* listen to the events and react accordingly */
> +	for (;;) {
> +		nfds = epoll_wait(epfd, events, q_num * 2, -1);
> +		if (nfds < 0) {
> +			if (errno == EINTR)
> +				continue;
> +			DRV_LOG(ERR, "epoll_wait returned failure");
> +			return NULL;
> +		}
> +
> +		for (i = 0; i < nfds; i++) {
> +			fd = (uint32_t)(events[i].data.u64 >> 32);
> +			do {
> +				nbytes = read(fd, &buf, 8);
> +				if (nbytes < 0) {
> +					if (errno == EINTR ||
> +					    errno == EWOULDBLOCK ||
> +					    errno == EAGAIN)
> +						continue;
> +					DRV_LOG(INFO, "Error reading kickfd: %s",
> +						strerror(errno));
> +				}
> +				break;
> +			} while (1);
> +
> +			qid = events[i].data.u32 >> 1;
> +
> +			if (events[i].data.u32 & 1)
> +				update_used_ring(internal, qid);
> +			else
> +				ifcvf_notify_queue(&internal->hw, qid);
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int
> +setup_vring_relay(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&internal->tid, NULL, vring_relay,
> +			(void *)internal);
> +	if (ret) {
> +		DRV_LOG(ERR, "failed to create ring relay pthread.");
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int
> +unset_vring_relay(struct ifcvf_internal *internal)
> +{
> +	void *status;
> +
> +	if (internal->tid) {
> +		pthread_cancel(internal->tid);
> +		pthread_join(internal->tid, &status);
> +	}
> +	internal->tid = 0;
> +
> +	if (internal->epfd >= 0)
> +		close(internal->epfd);
> +	internal->epfd = -1;
> +
> +	return 0;
> +}
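
vring_relay() multiplexes two fd types through one epoll instance by
tagging epoll's u64 cookie. The encoding used above, restated as a pair of
helpers (sketch, not from the patch):

#include <stdint.h>

/* Sketch: bit 0 = 1 for interrupt fd / 0 for kick fd,
 * bits 1..31 = queue id, bits 32..63 = the fd itself. */
static inline uint64_t
relay_cookie_pack(int is_intr, uint16_t qid, int fd)
{
	return (uint64_t)(is_intr & 1) | (uint64_t)((uint32_t)qid << 1) |
		((uint64_t)(uint32_t)fd << 32);
}

static inline void
relay_cookie_unpack(uint64_t cookie, int *is_intr, uint16_t *qid, int *fd)
{
	*is_intr = cookie & 1;
	*qid = (uint16_t)(((uint32_t)cookie) >> 1);
	*fd = (int)(cookie >> 32);
}
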
> +
> +static int
> +ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +	int vid = internal->vid;
> +
> +	/* stop the direct IO data path */
> +	unset_notify_relay(internal);
> +	vdpa_ifcvf_stop(internal);
> +	vdpa_disable_vfio_intr(internal);
> +
> +	ret = rte_vhost_host_notifier_ctrl(vid, false);
> +	if (ret && ret != -ENOTSUP)
> +		goto error;
> +
> +	/* set up interrupt for interrupt relay */
> +	ret = vdpa_enable_vfio_intr(internal, 1);
> +	if (ret)
> +		goto unmap;
> +
> +	/* config the VF */
> +	ret = m_ifcvf_start(internal);
> +	if (ret)
> +		goto unset_intr;
> +
> +	/* set up vring relay thread */
> +	ret = setup_vring_relay(internal);
> +	if (ret)
> +		goto stop_vf;
> +
> +	rte_vhost_host_notifier_ctrl(vid, true);
> +
> +	internal->sw_fallback_running = true;
> +
> +	return 0;
> +
> +stop_vf:
> +	m_ifcvf_stop(internal);
> +unset_intr:
> +	vdpa_disable_vfio_intr(internal);
> +unmap:
> +	ifcvf_dma_map(internal, 0);
> +error:
> +	return -1;
> +}
> +
> +static int
> +ifcvf_dev_config(int vid)
> +{
> +	int did;
> +	struct internal_list *list;
> +	struct ifcvf_internal *internal;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	internal = list->internal;
> +	internal->vid = vid;
> +	rte_atomic32_set(&internal->dev_attached, 1);
> +	update_datapath(internal);
> +
> +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_dev_close(int vid)
> +{
> +	int did;
> +	struct internal_list *list;
> +	struct ifcvf_internal *internal;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	internal = list->internal;
> +
> +	if (internal->sw_fallback_running) {
> +		/* unset ring relay */
> +		unset_vring_relay(internal);
> +
> +		/* reset VF */
> +		m_ifcvf_stop(internal);
> +
> +		/* remove interrupt setting */
> +		vdpa_disable_vfio_intr(internal);
> +
> +		/* unset DMA map for guest memory */
> +		ifcvf_dma_map(internal, 0);
> +
> +		internal->sw_fallback_running = false;
> +	} else {
> +		rte_atomic32_set(&internal->dev_attached, 0);
> +		update_datapath(internal);
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_set_features(int vid)
> +{
> +	uint64_t features = 0;
> +	int did;
> +	struct internal_list *list;
> +	struct ifcvf_internal *internal;
> +	uint64_t log_base = 0, log_size = 0;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	internal = list->internal;
> +	rte_vhost_get_negotiated_features(vid, &features);
> +
> +	if (!RTE_VHOST_NEED_LOG(features))
> +		return 0;
> +
> +	if (internal->sw_lm) {
> +		ifcvf_sw_fallback_switchover(internal);
> +	} else {
> +		rte_vhost_get_log_base(vid, &log_base, &log_size);
> +		rte_vfio_container_dma_map(internal->vfio_container_fd,
> +				log_base, IFCVF_LOG_BASE, log_size);
> +		ifcvf_enable_logging(&internal->hw, IFCVF_LOG_BASE, log_size);
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_get_vfio_group_fd(int vid)
> +{
> +	int did;
> +	struct internal_list *list;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	return list->internal->vfio_group_fd;
> +}
> +
> +static int
> +ifcvf_get_vfio_device_fd(int vid)
> +{
> +	int did;
> +	struct internal_list *list;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	return list->internal->vfio_dev_fd;
> +}
> +
> +static int
> +ifcvf_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
> +{
> +	int did;
> +	struct internal_list *list;
> +	struct ifcvf_internal *internal;
> +	struct vfio_region_info reg = { .argsz = sizeof(reg) };
> +	int ret;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	internal = list->internal;
> +
> +	reg.index = ifcvf_get_notify_region(&internal->hw);
> +	ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
> +	if (ret) {
> +		DRV_LOG(ERR, "Cannot get device region info: %s",
> +			strerror(errno));
> +		return -1;
> +	}
> +
> +	*offset = ifcvf_get_queue_notify_off(&internal->hw, qid) + reg.offset;
> +	*size = 0x1000;
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_get_queue_num(int did, uint32_t *queue_num)
> +{
> +	struct internal_list *list;
> +
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	*queue_num = list->internal->max_queues;
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_get_vdpa_features(int did, uint64_t *features)
> +{
> +	struct internal_list *list;
> +
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	*features = list->internal->features;
> +
> +	return 0;
> +}
> +
> +#define VDPA_SUPPORTED_PROTOCOL_FEATURES \
> +		(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD)
> +static int
> +ifcvf_get_protocol_features(int did __rte_unused, uint64_t *features)
> +{
> +	*features = VDPA_SUPPORTED_PROTOCOL_FEATURES;
> +	return 0;
> +}
> +
> +static struct rte_vdpa_dev_ops ifcvf_ops = {
> +	.get_queue_num = ifcvf_get_queue_num,
> +	.get_features = ifcvf_get_vdpa_features,
> +	.get_protocol_features = ifcvf_get_protocol_features,
> +	.dev_conf = ifcvf_dev_config,
> +	.dev_close = ifcvf_dev_close,
> +	.set_vring_state = NULL,
> +	.set_features = ifcvf_set_features,
> +	.migration_done = NULL,
> +	.get_vfio_group_fd = ifcvf_get_vfio_group_fd,
> +	.get_vfio_device_fd = ifcvf_get_vfio_device_fd,
> +	.get_notify_area = ifcvf_get_notify_area,
> +};
> +
> +static inline int
> +open_int(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	uint16_t *n = extra_args;
> +
> +	if (value == NULL || extra_args == NULL)
> +		return -EINVAL;
> +
> +	*n = (uint16_t)strtoul(value, NULL, 0);
> +	if (*n == USHRT_MAX && errno == ERANGE)
> +		return -1;
> +
> +	return 0;
> +}
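
Since probing is devargs-gated (see ifcvf_pci_probe() just below), a bound
VF is only claimed when something like "vdpa=1" is passed with the PCI
device, e.g. a whitelist argument of the form "-w 0000:06:00.3,vdpa=1" (the
address is only an example). A sketch of the gating logic in isolation
(helper name is mine; the real probe additionally parses the value):

#include <rte_kvargs.h>

/* Sketch: return 1 when the devargs string requests vdpa mode. */
static int
wants_vdpa_mode(const char *devargs) /* e.g. "vdpa=1,sw-live-migration=1" */
{
	static const char * const keys[] = { "vdpa", "sw-live-migration", NULL };
	struct rte_kvargs *kv;
	int on;

	kv = rte_kvargs_parse(devargs, keys);
	if (kv == NULL)
		return 0;
	on = rte_kvargs_count(kv, "vdpa") != 0;
	rte_kvargs_free(kv);
	return on;
}
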
> +
> +static inline int
> +open_int(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	uint16_t *n = extra_args;
> +
> +	if (value == NULL || extra_args == NULL)
> +		return -EINVAL;
> +
> +	*n = (uint16_t)strtoul(value, NULL, 0);
> +	if (*n == USHRT_MAX && errno == ERANGE)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
> +		struct rte_pci_device *pci_dev)
> +{
> +	uint64_t features;
> +	struct ifcvf_internal *internal = NULL;
> +	struct internal_list *list = NULL;
> +	int vdpa_mode = 0;
> +	int sw_fallback_lm = 0;
> +	struct rte_kvargs *kvlist = NULL;
> +	int ret = 0;
> +
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +		return 0;
> +
> +	if (!pci_dev->device.devargs)
> +		return 1;
> +
> +	kvlist = rte_kvargs_parse(pci_dev->device.devargs->args,
> +			ifcvf_valid_arguments);
> +	if (kvlist == NULL)
> +		return 1;
> +
> +	/* probe only when vdpa mode is specified */
> +	if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) {
> +		rte_kvargs_free(kvlist);
> +		return 1;
> +	}
> +
> +	ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int,
> +			&vdpa_mode);
> +	if (ret < 0 || vdpa_mode == 0) {
> +		rte_kvargs_free(kvlist);
> +		return 1;
> +	}
> +
> +	list = rte_zmalloc("ifcvf", sizeof(*list), 0);
> +	if (list == NULL)
> +		goto error;
> +
> +	internal = rte_zmalloc("ifcvf", sizeof(*internal), 0);
> +	if (internal == NULL)
> +		goto error;
> +
> +	internal->pdev = pci_dev;
> +	rte_spinlock_init(&internal->lock);
> +
> +	if (ifcvf_vfio_setup(internal) < 0) {
> +		DRV_LOG(ERR, "failed to setup device %s", pci_dev->name);
> +		goto error;
> +	}
> +
> +	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) {
> +		DRV_LOG(ERR, "failed to init device %s", pci_dev->name);
> +		goto error;
> +	}
> +
> +	internal->max_queues = IFCVF_MAX_QUEUES;
> +	features = ifcvf_get_features(&internal->hw);
> +	internal->features = (features &
> +		~(1ULL << VIRTIO_F_IOMMU_PLATFORM)) |
> +		(1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) |
> +		(1ULL << VIRTIO_NET_F_CTRL_VQ) |
> +		(1ULL << VIRTIO_NET_F_STATUS) |
> +		(1ULL << VHOST_USER_F_PROTOCOL_FEATURES) |
> +		(1ULL << VHOST_F_LOG_ALL);
> +
> +	internal->dev_addr.pci_addr = pci_dev->addr;
> +	internal->dev_addr.type = PCI_ADDR;
> +	list->internal = internal;
> +
> +	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
> +		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
> +				&open_int, &sw_fallback_lm);
> +		if (ret < 0)
> +			goto error;
> +	}
> +	internal->sw_lm = sw_fallback_lm;
> +
> +	internal->did = rte_vdpa_register_device(&internal->dev_addr,
> +			&ifcvf_ops);
> +	if (internal->did < 0) {
> +		DRV_LOG(ERR, "failed to register device %s", pci_dev->name);
> +		goto error;
> +	}
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_INSERT_TAIL(&internal_list, list, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	rte_atomic32_set(&internal->started, 1);
> +	update_datapath(internal);
> +
> +	rte_kvargs_free(kvlist);
> +	return 0;
> +
> +error:
> +	rte_kvargs_free(kvlist);
> +	rte_free(list);
> +	rte_free(internal);
> +	return -1;
> +}
> +
> +static int
> +ifcvf_pci_remove(struct rte_pci_device *pci_dev)
> +{
> +	struct ifcvf_internal *internal;
> +	struct internal_list *list;
> +
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +		return 0;
> +
> +	list = find_internal_resource_by_dev(pci_dev);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device: %s", pci_dev->name);
> +		return -1;
> +	}
> +
> +	internal = list->internal;
> +	rte_atomic32_set(&internal->started, 0);
> +	update_datapath(internal);
> +
> +	rte_pci_unmap_device(internal->pdev);
> +	rte_vfio_container_destroy(internal->vfio_container_fd);
> +	rte_vdpa_unregister_device(internal->did);
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_REMOVE(&internal_list, list, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	rte_free(list);
> +	rte_free(internal);
> +
> +	return 0;
> +}
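
[Usage note on the probe gating above, for readers new to this driver: the
VF is left to other drivers unless the vdpa devarg is present and non-zero,
so the application must whitelist the device explicitly. Assuming
IFCVF_VDPA_MODE expands to the key "vdpa" and IFCVF_SW_FALLBACK_LM to
"sw-live-migration", as in the existing ifc guide (the PCI address here is
made up), an invocation would look like:

  ./testpmd -c 0x2 -n 4 --socket-mem 1024,1024 -w 0000:06:00.3,vdpa=1 -- -i

Appending ,sw-live-migration=1 to the same devargs string would set
internal->sw_lm and enable the software fallback for live migration.]
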
> +
> +/*
> + * IFCVF has the same vendor ID and device ID as virtio net PCI
> + * device, with its specific subsystem vendor ID and device ID.
> + */
> +static const struct rte_pci_id pci_id_ifcvf_map[] = {
> +	{ .class_id = RTE_CLASS_ANY_ID,
> +	  .vendor_id = IFCVF_VENDOR_ID,
> +	  .device_id = IFCVF_DEVICE_ID,
> +	  .subsystem_vendor_id = IFCVF_SUBSYS_VENDOR_ID,
> +	  .subsystem_device_id = IFCVF_SUBSYS_DEVICE_ID,
> +	},
> +
> +	{ .vendor_id = 0, /* sentinel */
> +	},
> +};
> +
> +static struct rte_pci_driver rte_ifcvf_vdpa = {
> +	.id_table = pci_id_ifcvf_map,
> +	.drv_flags = 0,
> +	.probe = ifcvf_pci_probe,
> +	.remove = ifcvf_pci_remove,
> +};
> +
> +RTE_PMD_REGISTER_PCI(net_ifcvf, rte_ifcvf_vdpa);
> +RTE_PMD_REGISTER_PCI_TABLE(net_ifcvf, pci_id_ifcvf_map);
> +RTE_PMD_REGISTER_KMOD_DEP(net_ifcvf, "* vfio-pci");
> +
> +RTE_INIT(ifcvf_vdpa_init_log)
> +{
> +	ifcvf_vdpa_logtype = rte_log_register("pmd.net.ifcvf_vdpa");
> +	if (ifcvf_vdpa_logtype >= 0)
> +		rte_log_set_level(ifcvf_vdpa_logtype, RTE_LOG_NOTICE);
> +}
> diff --git a/drivers/vdpa/ifc/meson.build b/drivers/vdpa/ifc/meson.build
> new file mode 100644
> index 0000000..adc9ed9
> --- /dev/null
> +++ b/drivers/vdpa/ifc/meson.build
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +build = dpdk_conf.has('RTE_LIBRTE_VHOST')
> +reason = 'missing dependency, DPDK vhost library'
> +allow_experimental_apis = true
> +sources = files('ifcvf_vdpa.c', 'base/ifcvf.c')
> +includes += include_directories('base')
> +deps += 'vhost'
> diff --git a/drivers/vdpa/ifc/rte_pmd_ifc_version.map b/drivers/vdpa/ifc/rte_pmd_ifc_version.map
> new file mode 100644
> index 0000000..f9f17e4
> --- /dev/null
> +++ b/drivers/vdpa/ifc/rte_pmd_ifc_version.map
> @@ -0,0 +1,3 @@
> +DPDK_20.0 {
> +	local: *;
> +};
> diff --git a/drivers/vdpa/meson.build b/drivers/vdpa/meson.build
> index a839ff5..fd164d3 100644
> --- a/drivers/vdpa/meson.build
> +++ b/drivers/vdpa/meson.build
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: BSD-3-Clause
>  # Copyright 2019 Mellanox Technologies, Ltd
>
> -drivers = []
> +drivers = ['ifc']
>  std_deps = ['bus_pci', 'kvargs']
>  std_deps += ['vhost']
>  config_flag_fmt = 'RTE_LIBRTE_@0@_PMD'
> --
> 1.8.3.1