Date: Tue, 22 Mar 2022 12:10:03 +0100
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: Andy Pei <andy.pei@intel.com>, dev@dpdk.org
Cc: chenbo.xia@intel.com, gang.cao@intel.com, changpeng.liu@intel.com
Subject: Re: [PATCH v3 05/15] vdpa/ifc: add blk dev sw live migration
In-Reply-To: <1643425417-215270-6-git-send-email-andy.pei@intel.com>
References: <1643093258-47258-2-git-send-email-andy.pei@intel.com>
 <1643425417-215270-1-git-send-email-andy.pei@intel.com>
 <1643425417-215270-6-git-send-email-andy.pei@intel.com>
List-Id: DPDK patches and discussions

Hi Andy,

The title could be made clearer, e.g.:
"vdpa/ifc: add block device SW live-migration"

On 1/29/22 04:03, Andy Pei wrote:
> Enable virtio blk sw live migration relay callfd and log the dirty page.

Please try to make the above sentence simpler. Also, it seems that the
patch below changes behaviour for net devices, so the commit message
should explain that.

> In this version we ignore the write cmd and still mark it dirty.
> 
> Signed-off-by: Andy Pei <andy.pei@intel.com>
> ---
>  drivers/vdpa/ifc/base/ifcvf.c |   4 +-
>  drivers/vdpa/ifc/base/ifcvf.h |   6 ++
>  drivers/vdpa/ifc/ifcvf_vdpa.c | 128 +++++++++++++++++++++++++++++++++++-------
>  3 files changed, 116 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/vdpa/ifc/base/ifcvf.c b/drivers/vdpa/ifc/base/ifcvf.c
> index 721cb1d..3a69e53 100644
> --- a/drivers/vdpa/ifc/base/ifcvf.c
> +++ b/drivers/vdpa/ifc/base/ifcvf.c
> @@ -189,7 +189,7 @@
>          IFCVF_WRITE_REG32(val >> 32, hi);
>  }
> 
> -STATIC int
> +int
>  ifcvf_hw_enable(struct ifcvf_hw *hw)
>  {
>          struct ifcvf_pci_common_cfg *cfg;
> @@ -238,7 +238,7 @@
>          return 0;
>  }
> 
> -STATIC void
> +void
>  ifcvf_hw_disable(struct ifcvf_hw *hw)
>  {
>          u32 i;
> diff --git a/drivers/vdpa/ifc/base/ifcvf.h b/drivers/vdpa/ifc/base/ifcvf.h
> index 769c603..6dd7925 100644
> --- a/drivers/vdpa/ifc/base/ifcvf.h
> +++ b/drivers/vdpa/ifc/base/ifcvf.h
> @@ -179,4 +179,10 @@ struct ifcvf_hw {
>  u64
>  ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid);
> 
> +int
> +ifcvf_hw_enable(struct ifcvf_hw *hw);
> +
> +void
> +ifcvf_hw_disable(struct ifcvf_hw *hw);
> +
>  #endif /* _IFCVF_H_ */
> diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
> index 4f99bb3..a930825 100644
> --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
> +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
> @@ -332,10 +332,67 @@ struct rte_vdpa_dev_info {
> 
>          rte_vhost_get_negotiated_features(vid, &features);
>          if (RTE_VHOST_NEED_LOG(features)) {
> -                ifcvf_disable_logging(hw);
> -                rte_vhost_get_log_base(internal->vid, &log_base, &log_size);
> -                rte_vfio_container_dma_unmap(internal->vfio_container_fd,
> -                                log_base, IFCVF_LOG_BASE, log_size);
> +                if (internal->device_type == IFCVF_NET) {
> +                        ifcvf_disable_logging(hw);
> +                        rte_vhost_get_log_base(internal->vid, &log_base,
> +                                        &log_size);
> +                        rte_vfio_container_dma_unmap(
> +                                        internal->vfio_container_fd, log_base,
> +                                        IFCVF_LOG_BASE, log_size);
> +                }
> +                /* IFCVF marks dirty memory pages for only packet buffer,
> +                 * SW helps to mark the used ring as dirty after device stops.
> +                 */
> +                for (i = 0; i < hw->nr_vring; i++) {
> +                        len = IFCVF_USED_RING_LEN(hw->vring[i].size);
> +                        rte_vhost_log_used_vring(vid, i, 0, len);
> +                }
> +        }
> +}
> +
> +static void
> +vdpa_ifcvf_blk_pause(struct ifcvf_internal *internal)
> +{
> +        struct ifcvf_hw *hw = &internal->hw;
> +        struct rte_vhost_vring vq;
> +        int i, vid;
> +        uint64_t features = 0;
> +        uint64_t log_base = 0, log_size = 0;
> +        uint64_t len;
> +
> +        vid = internal->vid;
> +
> +        if (internal->device_type == IFCVF_BLK) {
> +                for (i = 0; i < hw->nr_vring; i++) {
> +                        rte_vhost_get_vhost_vring(internal->vid, i, &vq);
> +                        while (vq.avail->idx != vq.used->idx) {
> +                                ifcvf_notify_queue(hw, i);
> +                                usleep(10);
> +                        }
> +                        hw->vring[i].last_avail_idx = vq.avail->idx;
> +                        hw->vring[i].last_used_idx = vq.used->idx;
> +                }
> +        }
> +
> +        ifcvf_hw_disable(hw);
> +
> +        for (i = 0; i < hw->nr_vring; i++)
> +                rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
> +                                hw->vring[i].last_used_idx);
> +
> +        if (internal->sw_lm)
> +                return;
> +
> +        rte_vhost_get_negotiated_features(vid, &features);
> +        if (RTE_VHOST_NEED_LOG(features)) {
> +                if (internal->device_type == IFCVF_NET) {
> +                        ifcvf_disable_logging(hw);
> +                        rte_vhost_get_log_base(internal->vid, &log_base,
> +                                        &log_size);
> +                        rte_vfio_container_dma_unmap(
> +                                        internal->vfio_container_fd, log_base,
> +                                        IFCVF_LOG_BASE, log_size);
> +                }
>                  /*
>                   * IFCVF marks dirty memory pages for only packet buffer,
>                   * SW helps to mark the used ring as dirty after device stops.
> @@ -661,15 +718,17 @@ struct rte_vdpa_dev_info {
>          }
>          hw->vring[i].avail = gpa;
> 
> -        /* Direct I/O for Tx queue, relay for Rx queue */
> -        if (i & 1) {
> +        /* NETWORK: Direct I/O for Tx queue, relay for Rx queue
> +         * BLK: relay every queue
> +         */
> +        if ((i & 1) && (internal->device_type == IFCVF_NET)) {
>                  gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used);
>                  if (gpa == 0) {
>                          DRV_LOG(ERR, "Fail to get GPA for used ring.");
>                          return -1;
>                  }
>                  hw->vring[i].used = gpa;
> -        } else {
> +        } else if (internal->device_type == IFCVF_BLK) {
>                  hw->vring[i].used = m_vring_iova +
>                          (char *)internal->m_vring[i].used -
>                          (char *)internal->m_vring[i].desc;
> @@ -688,7 +747,10 @@ struct rte_vdpa_dev_info {
>          }
>          hw->nr_vring = nr_vring;
> 
> -        return ifcvf_start_hw(&internal->hw);
> +        if (internal->device_type == IFCVF_NET)
> +                return ifcvf_start_hw(&internal->hw);
> +        else if (internal->device_type == IFCVF_BLK)
> +                return ifcvf_hw_enable(&internal->hw);
> 
>  error:
>          for (i = 0; i < nr_vring; i++)
> @@ -713,8 +775,10 @@ struct rte_vdpa_dev_info {
> 
>          for (i = 0; i < hw->nr_vring; i++) {
>                  /* synchronize remaining new used entries if any */
> -                if ((i & 1) == 0)
> +                if (((i & 1) == 0 && internal->device_type == IFCVF_NET) ||
> +                    internal->device_type == IFCVF_BLK) {
>                          update_used_ring(internal, i);
> +                }
> 
>                  rte_vhost_get_vhost_vring(vid, i, &vq);
>                  len = IFCVF_USED_RING_LEN(vq.size);
> @@ -726,6 +790,8 @@ struct rte_vdpa_dev_info {
>                          (uint64_t)(uintptr_t)internal->m_vring[i].desc,
>                          m_vring_iova, size);
> 
> +                hw->vring[i].last_avail_idx = vq.used->idx;
> +                hw->vring[i].last_used_idx = vq.used->idx;
>                  rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
>                                  hw->vring[i].last_used_idx);
>                  rte_free(internal->m_vring[i].desc);
> @@ -776,17 +842,36 @@ struct rte_vdpa_dev_info {
>                  }
>          }
> 
> -        for (qid = 0; qid < q_num; qid += 2) {
> -                ev.events = EPOLLIN | EPOLLPRI;
> -                /* leave a flag to mark it's for interrupt */
> -                ev.data.u64 = 1 | qid << 1 |
> -                        (uint64_t)internal->intr_fd[qid] << 32;
> -                if (epoll_ctl(epfd, EPOLL_CTL_ADD, internal->intr_fd[qid], &ev)
> -                                < 0) {
> -                        DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> -                        return NULL;
> +        if (internal->device_type == IFCVF_NET) {
> +                for (qid = 0; qid < q_num; qid += 2) {
> +                        ev.events = EPOLLIN | EPOLLPRI;
> +                        /* leave a flag to mark it's for interrupt */
> +                        ev.data.u64 = 1 | qid << 1 |
> +                                (uint64_t)internal->intr_fd[qid] << 32;
> +                        if (epoll_ctl(epfd, EPOLL_CTL_ADD,
> +                                        internal->intr_fd[qid], &ev)
> +                                        < 0) {
> +                                DRV_LOG(ERR, "epoll add error: %s",
> +                                        strerror(errno));
> +                                return NULL;
> +                        }
> +                        update_used_ring(internal, qid);
> +                }
> +        } else if (internal->device_type == IFCVF_BLK) {
> +                for (qid = 0; qid < q_num; qid += 1) {
> +                        ev.events = EPOLLIN | EPOLLPRI;
> +                        /* leave a flag to mark it's for interrupt */
> +                        ev.data.u64 = 1 | qid << 1 |
> +                                (uint64_t)internal->intr_fd[qid] << 32;
> +                        if (epoll_ctl(epfd, EPOLL_CTL_ADD,
> +                                        internal->intr_fd[qid], &ev)
> +                                        < 0) {
> +                                DRV_LOG(ERR, "epoll add error: %s",
> +                                        strerror(errno));
> +                                return NULL;
> +                        }
> +                        update_used_ring(internal, qid);
>                  }
> -                update_used_ring(internal, qid);
>          }
> 
>          /* start relay with a first kick */
> @@ -874,7 +959,10 @@ struct rte_vdpa_dev_info {
> 
>          /* stop the direct IO data path */
>          unset_notify_relay(internal);
> -        vdpa_ifcvf_stop(internal);
> +        if (internal->device_type == IFCVF_NET)
> +                vdpa_ifcvf_stop(internal);
> +        else if (internal->device_type == IFCVF_BLK)
> +                vdpa_ifcvf_blk_pause(internal);
>          vdpa_disable_vfio_intr(internal);
> 
>          ret = rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, false);