From: "Pei, Andy"
To: "Xia, Chenbo"; dev@dpdk.org
Cc: maxime.coquelin@redhat.com; "Cao, Gang"; "Liu, Changpeng"
Subject: RE: [PATCH v6 06/16] vdpa/ifc: add block device SW live-migration
Date: Tue, 26 Apr 2022 10:07:07 +0000
Message-ID: <4e92f99e31f24e4e9c3d7afb02f0b02c@intel.com>
References: <1643093258-47258-2-git-send-email-andy.pei@intel.com> <1650530034-59744-1-git-send-email-andy.pei@intel.com> <1650530034-59744-7-git-send-email-andy.pei@intel.com>

Hi Chenbo,

Thanks for your reply. My replies are inline.

> -----Original Message-----
> From: Xia, Chenbo
> Sent: Monday, April 25, 2022 9:10 PM
> To: Pei, Andy; dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Cao, Gang; Liu, Changpeng
> Subject: RE: [PATCH v6 06/16] vdpa/ifc: add block device SW live-migration
>
> > -----Original Message-----
> > From: Pei, Andy
> > Sent: Thursday, April 21, 2022 4:34 PM
> > To: dev@dpdk.org
> > Cc: Xia, Chenbo; maxime.coquelin@redhat.com; Cao, Gang; Liu, Changpeng
> > Subject: [PATCH v6 06/16] vdpa/ifc: add block device SW live-migration
> >
> > Add SW live-migration support to block device.
> > Add dirty page logging to block device.
>
> Add SW live-migration support including dirty page logging for block device.
>
Sure, I will remove "Add dirty page logging to block device." in the next version.

> >
> > Signed-off-by: Andy Pei
> > ---
> >  drivers/vdpa/ifc/base/ifcvf.c |   4 +-
> >  drivers/vdpa/ifc/base/ifcvf.h |   6 ++
> >  drivers/vdpa/ifc/ifcvf_vdpa.c | 128 +++++++++++++++++++++++++++++++++++-------
> >  3 files changed, 115 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/vdpa/ifc/base/ifcvf.c b/drivers/vdpa/ifc/base/ifcvf.c
> > index d10c1fd..e417c50 100644
> > --- a/drivers/vdpa/ifc/base/ifcvf.c
> > +++ b/drivers/vdpa/ifc/base/ifcvf.c
> > @@ -191,7 +191,7 @@
> >  	IFCVF_WRITE_REG32(val >> 32, hi);
> >  }
> >
> > -STATIC int
> > +int
> >  ifcvf_hw_enable(struct ifcvf_hw *hw)
> >  {
> >  	struct ifcvf_pci_common_cfg *cfg;
> > @@ -240,7 +240,7 @@
> >  	return 0;
> >  }
> >
> > -STATIC void
> > +void
> >  ifcvf_hw_disable(struct ifcvf_hw *hw)
> >  {
> >  	u32 i;
> > diff --git a/drivers/vdpa/ifc/base/ifcvf.h b/drivers/vdpa/ifc/base/ifcvf.h
> > index 769c603..6dd7925 100644
> > --- a/drivers/vdpa/ifc/base/ifcvf.h
> > +++ b/drivers/vdpa/ifc/base/ifcvf.h
> > @@ -179,4 +179,10 @@ struct ifcvf_hw {
> >  u64
> >  ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid);
> >
> > +int
> > +ifcvf_hw_enable(struct ifcvf_hw *hw);
> > +
> > +void
> > +ifcvf_hw_disable(struct ifcvf_hw *hw);
> > +
> >  #endif /* _IFCVF_H_ */
> > diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
> > index 8d104b7..a23dc2d 100644
> > --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
> > +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
> > @@ -345,6 +345,56 @@ struct rte_vdpa_dev_info {
> >  	}
> >  }
> >
> > +static void
> > +vdpa_ifcvf_blk_pause(struct ifcvf_internal *internal)
> > +{
> > +	struct ifcvf_hw *hw = &internal->hw;
> > +	struct rte_vhost_vring vq;
> > +	int i, vid;
> > +	uint64_t features = 0;
> > +	uint64_t log_base = 0, log_size = 0;
> > +	uint64_t len;
> > +
> > +	vid = internal->vid;
> > +
> > +	if (internal->device_type == IFCVF_BLK) {
> > +		for (i = 0; i < hw->nr_vring; i++) {
> > +			rte_vhost_get_vhost_vring(internal->vid, i, &vq);
> > +			while (vq.avail->idx != vq.used->idx) {
> > +				ifcvf_notify_queue(hw, i);
> > +				usleep(10);
> > +			}
> > +			hw->vring[i].last_avail_idx = vq.avail->idx;
> > +			hw->vring[i].last_used_idx = vq.used->idx;
> > +		}
> > +	}
> > +
> > +	ifcvf_hw_disable(hw);
> > +
> > +	for (i = 0; i < hw->nr_vring; i++)
> > +		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
> > +				hw->vring[i].last_used_idx);
> > +
> > +	if (internal->sw_lm)
> > +		return;
> > +
> > +	rte_vhost_get_negotiated_features(vid, &features);
> > +	if (RTE_VHOST_NEED_LOG(features)) {
> > +		ifcvf_disable_logging(hw);
> > +		rte_vhost_get_log_base(internal->vid, &log_base, &log_size);
> > +		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
> > +				log_base, IFCVF_LOG_BASE, log_size);
> > +		/*
> > +		 * IFCVF marks dirty memory pages for only packet buffer,
> > +		 * SW helps to mark the used ring as dirty after device stops.
> > +		 */
> > +		for (i = 0; i < hw->nr_vring; i++) {
> > +			len = IFCVF_USED_RING_LEN(hw->vring[i].size);
> > +			rte_vhost_log_used_vring(vid, i, 0, len);
> > +		}
> > +	}
> > +}
>
> Can we consider combining vdpa_ifcvf_blk_pause and vdpa_ifcvf_stop into
> one function and checking the device type internally to do different things?
> Because, as I see it, most of the logic is the same.
>
OK, I will address it in the next version.
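Not part of the patch, just to sketch the shape of the combined function being suggested. The types and field names below (struct dev_state, vring_state, hw_enabled) are simplified stand-ins for the driver's struct ifcvf_internal / struct ifcvf_hw and the rte_vhost APIs; only the BLK-specific drain step is guarded by a device-type check, while the teardown logic stays shared:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in types; the real driver uses struct ifcvf_internal,
 * struct ifcvf_hw and the rte_vhost/vfio APIs. */
enum { IFCVF_NET, IFCVF_BLK };

struct vring_state {
	uint16_t avail_idx;
	uint16_t used_idx;
	uint16_t last_avail_idx;
	uint16_t last_used_idx;
};

struct dev_state {
	int device_type;
	int nr_vring;
	struct vring_state vring[4];
	int hw_enabled;
};

/*
 * Combined stop routine: the shared teardown (disable HW, record the
 * ring indexes) runs for both device types; only the BLK-specific
 * drain of in-flight requests is behind a device-type check.
 */
static void
vdpa_dev_stop(struct dev_state *dev)
{
	int i;

	if (dev->device_type == IFCVF_BLK) {
		/* BLK only: the real driver kicks each queue and waits
		 * until avail == used before disabling the HW. */
		for (i = 0; i < dev->nr_vring; i++)
			dev->vring[i].used_idx = dev->vring[i].avail_idx;
	}

	dev->hw_enabled = 0;	/* shared: ifcvf_hw_disable() */

	for (i = 0; i < dev->nr_vring; i++) {
		/* shared: save ring bases for rte_vhost_set_vring_base() */
		dev->vring[i].last_avail_idx = dev->vring[i].avail_idx;
		dev->vring[i].last_used_idx = dev->vring[i].used_idx;
	}
}
```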
> > +
> >  #define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + \
> >  		sizeof(int) * (IFCVF_MAX_QUEUES * 2 + 1))
> >  static int
> > @@ -659,15 +709,22 @@ struct rte_vdpa_dev_info {
> >  		}
> >  		hw->vring[i].avail = gpa;
> >
> > -		/* Direct I/O for Tx queue, relay for Rx queue */
> > -		if (i & 1) {
> > -			gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used);
> > -			if (gpa == 0) {
> > -				DRV_LOG(ERR, "Fail to get GPA for used ring.");
> > -				return -1;
> > +		if (internal->device_type == IFCVF_NET) {
> > +			/* Direct I/O for Tx queue, relay for Rx queue */
> > +			if (i & 1) {
> > +				gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used);
> > +				if (gpa == 0) {
> > +					DRV_LOG(ERR, "Fail to get GPA for used ring.");
> > +					return -1;
> > +				}
> > +				hw->vring[i].used = gpa;
> > +			} else {
> > +				hw->vring[i].used = m_vring_iova +
> > +					(char *)internal->m_vring[i].used -
> > +					(char *)internal->m_vring[i].desc;
> >  			}
> > -			hw->vring[i].used = gpa;
> > -		} else {
> > +		} else if (internal->device_type == IFCVF_BLK) {
> > +			/* BLK: relay every queue */
> >  			hw->vring[i].used = m_vring_iova +
> >  				(char *)internal->m_vring[i].used -
> >  				(char *)internal->m_vring[i].desc;
> > @@ -686,7 +743,10 @@ struct rte_vdpa_dev_info {
> >  	}
> >  	hw->nr_vring = nr_vring;
> >
> > -	return ifcvf_start_hw(&internal->hw);
> > +	if (internal->device_type == IFCVF_NET)
> > +		return ifcvf_start_hw(&internal->hw);
> > +	else if (internal->device_type == IFCVF_BLK)
> > +		return ifcvf_hw_enable(&internal->hw);
> >
> >  error:
> >  	for (i = 0; i < nr_vring; i++)
> > @@ -710,8 +770,12 @@ struct rte_vdpa_dev_info {
> >
> >  	for (i = 0; i < hw->nr_vring; i++) {
> >  		/* synchronize remaining new used entries if any */
> > -		if ((i & 1) == 0)
> > +		if (internal->device_type == IFCVF_NET) {
> > +			if ((i & 1) == 0)
> > +				update_used_ring(internal, i);
> > +		} else if (internal->device_type == IFCVF_BLK) {
> >  			update_used_ring(internal, i);
> > +		}
> >
> >  		rte_vhost_get_vhost_vring(vid, i, &vq);
> >  		len = IFCVF_USED_RING_LEN(vq.size);
> > @@ -773,17 +837,36 @@ struct rte_vdpa_dev_info {
> >  		}
> >  	}
> >
> > -	for (qid = 0; qid < q_num; qid += 2) {
> > -		ev.events = EPOLLIN | EPOLLPRI;
> > -		/* leave a flag to mark it's for interrupt */
> > -		ev.data.u64 = 1 | qid << 1 |
> > -			(uint64_t)internal->intr_fd[qid] << 32;
> > -		if (epoll_ctl(epfd, EPOLL_CTL_ADD, internal->intr_fd[qid], &ev)
> > -			< 0) {
> > -			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> > -			return NULL;
> > +	if (internal->device_type == IFCVF_NET) {
> > +		for (qid = 0; qid < q_num; qid += 2) {
> > +			ev.events = EPOLLIN | EPOLLPRI;
> > +			/* leave a flag to mark it's for interrupt */
> > +			ev.data.u64 = 1 | qid << 1 |
> > +				(uint64_t)internal->intr_fd[qid] << 32;
> > +			if (epoll_ctl(epfd, EPOLL_CTL_ADD,
> > +					internal->intr_fd[qid], &ev) < 0) {
> > +				DRV_LOG(ERR, "epoll add error: %s",
> > +						strerror(errno));
> > +				return NULL;
> > +			}
> > +			update_used_ring(internal, qid);
> > +		}
> > +	} else if (internal->device_type == IFCVF_BLK) {
> > +		for (qid = 0; qid < q_num; qid += 1) {
> > +			ev.events = EPOLLIN | EPOLLPRI;
> > +			/* leave a flag to mark it's for interrupt */
> > +			ev.data.u64 = 1 | qid << 1 |
> > +				(uint64_t)internal->intr_fd[qid] << 32;
> > +			if (epoll_ctl(epfd, EPOLL_CTL_ADD,
> > +					internal->intr_fd[qid], &ev) < 0) {
> > +				DRV_LOG(ERR, "epoll add error: %s",
> > +						strerror(errno));
> > +				return NULL;
> > +			}
> > +			update_used_ring(internal, qid);
>
> It seems we can also reduce the duplicate code for the above case. And for
> the other checks, if we can use only one combined condition to check, I
> prefer to just use one.
>
OK, I will address it in the next version.
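Not part of the patch, just a sketch of what collapsing the two epoll registration loops could look like. Since NET relays every second queue (Rx only) and BLK relays every queue, the only real difference is the loop stride; the function and array names below are illustrative stand-ins, and the epoll_ctl()/update_used_ring() body would sit where visited[] is filled:

```c
#include <assert.h>

/* Stand-in for the driver's device-type enum. */
enum { IFCVF_NET, IFCVF_BLK };

/*
 * Single registration loop for both device types. Records which
 * queue ids one merged loop would visit; in the real driver the
 * loop body would do the epoll_ctl() registration and the initial
 * update_used_ring() call.
 */
static int
collect_relay_queues(int device_type, int q_num, int visited[])
{
	int qid, n = 0;
	/* NET: relay every second queue (Rx); BLK: relay all queues. */
	int step = (device_type == IFCVF_NET) ? 2 : 1;

	for (qid = 0; qid < q_num; qid += step)
		visited[n++] = qid;
	return n;
}
```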
> Thanks,
> Chenbo
>
> >  		}
> > -		update_used_ring(internal, qid);
> >  	}
> >
> >  	/* start relay with a first kick */
> > @@ -871,7 +954,10 @@ struct rte_vdpa_dev_info {
> >
> >  	/* stop the direct IO data path */
> >  	unset_notify_relay(internal);
> > -	vdpa_ifcvf_stop(internal);
> > +	if (internal->device_type == IFCVF_NET)
> > +		vdpa_ifcvf_stop(internal);
> > +	else if (internal->device_type == IFCVF_BLK)
> > +		vdpa_ifcvf_blk_pause(internal);
> >  	vdpa_disable_vfio_intr(internal);
> >
> >  	ret = rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, false);
> > --
> > 1.8.3.1