From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id 1115A1B2FA for ; Thu, 2 Nov 2017 08:22:15 +0100 (CET) Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Nov 2017 00:22:14 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.44,332,1505804400"; d="scan'208";a="168254542" Received: from fmsmsx107.amr.corp.intel.com ([10.18.124.205]) by orsmga005.jf.intel.com with ESMTP; 02 Nov 2017 00:21:50 -0700 Received: from fmsmsx151.amr.corp.intel.com (10.18.125.4) by fmsmsx107.amr.corp.intel.com (10.18.124.205) with Microsoft SMTP Server (TLS) id 14.3.319.2; Thu, 2 Nov 2017 00:21:48 -0700 Received: from shsmsx103.ccr.corp.intel.com (10.239.4.69) by FMSMSX151.amr.corp.intel.com (10.18.125.4) with Microsoft SMTP Server (TLS) id 14.3.319.2; Thu, 2 Nov 2017 00:21:48 -0700 Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.175]) by SHSMSX103.ccr.corp.intel.com ([169.254.4.213]) with mapi id 14.03.0319.002; Thu, 2 Nov 2017 15:21:46 +0800 From: "Yao, Lei A" To: Maxime Coquelin , "dev@dpdk.org" , "Horton, Remy" , "Bie, Tiwei" , "yliu@fridaylinux.org" CC: "mst@redhat.com" , "jfreiman@redhat.com" , "vkaplans@redhat.com" , "jasowang@redhat.com" Thread-Topic: [dpdk-dev] [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped Thread-Index: AQHTPbZKHRv4v6mGlUCKjsvZRny2V6MA2bbw Date: Thu, 2 Nov 2017 07:21:45 +0000 Message-ID: <2DBBFF226F7CF64BAFCA79B681719D953A2CD10A@shsmsx102.ccr.corp.intel.com> References: <20171005083627.27828-1-maxime.coquelin@redhat.com> <20171005083627.27828-18-maxime.coquelin@redhat.com> In-Reply-To: <20171005083627.27828-18-maxime.coquelin@redhat.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMDRmMDc4MzEtNDA3MC00NDQ1LTgwMmUtODBjMGE1MWUzZGQ1IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6InhUb21lY2dDa05id2V3QzFNblRrazB4T014ZjBHMjJlZDA4R3VqUjlKa2s9In0= x-ctpclassification: CTP_IC dlp-product: dlpe-windows dlp-version: 11.0.0.116 dlp-reaction: no-action x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Nov 2017 07:22:17 -0000 > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime Coquelin > Sent: Thursday, October 5, 2017 4:36 PM > To: dev@dpdk.org; Horton, Remy ; Bie, Tiwei > ; yliu@fridaylinux.org > Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com; > jasowang@redhat.com; Maxime Coquelin > Subject: [dpdk-dev] [PATCH v3 17/19] vhost-user: iommu: postpone device > creation until ring are mapped >=20 > Translating the start addresses of the rings is not enough, we need to > be sure all the ring is made available by the guest. >=20 > It depends on the size of the rings, which is not known on SET_VRING_ADDR > reception. Furthermore, we need to be be safe against vring pages > invalidates. >=20 > This patch introduces a new access_ok flag per virtqueue, which is set > when all the rings are mapped, and cleared as soon as a page used by a > ring is invalidated. The invalidation part is implemented in a following > patch. >=20 > Signed-off-by: Maxime Coquelin > --- > lib/librte_vhost/vhost.c | 37 ++++++++++++++++++++++++++ > lib/librte_vhost/vhost.h | 2 ++ > lib/librte_vhost/vhost_user.c | 62 +++++++++++++++++++++++++++++++-- > ---------- > lib/librte_vhost/virtio_net.c | 60 +++++++++++++++++++++++++------------= - > --- > 4 files changed, 121 insertions(+), 40 deletions(-) >=20 > diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c > index 0e2ad3322..ef54835a6 100644 > --- a/lib/librte_vhost/vhost.c > +++ b/lib/librte_vhost/vhost.c > @@ -135,6 +135,43 @@ free_device(struct virtio_net *dev) > rte_free(dev); > } >=20 > +int > +vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq) > +{ > + uint64_t size; > + > + if (!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) > + goto out; > + > + size =3D sizeof(struct vring_desc) * vq->size; > + vq->desc =3D (struct vring_desc *)vhost_iova_to_vva(dev, vq, > + vq- > >ring_addrs.desc_user_addr, > + size, VHOST_ACCESS_RW); > + if (!vq->desc) > + return -1; > + > + size =3D sizeof(struct vring_avail); > + size +=3D sizeof(uint16_t) * vq->size; > + vq->avail =3D (struct vring_avail *)vhost_iova_to_vva(dev, vq, > + vq- > >ring_addrs.avail_user_addr, > + size, VHOST_ACCESS_RW); > + if (!vq->avail) > + return -1; > + > + size =3D sizeof(struct vring_used); > + size +=3D sizeof(struct vring_used_elem) * vq->size; > + vq->used =3D (struct vring_used *)vhost_iova_to_vva(dev, vq, > + vq- > >ring_addrs.used_user_addr, > + size, VHOST_ACCESS_RW); > + if (!vq->used) > + return -1; > + > +out: > + vq->access_ok =3D 1; > + > + return 0; > +} > + > static void > init_vring_queue(struct virtio_net *dev, uint32_t vring_idx) > { > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index 903da5db5..b3fe6bb8e 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -113,6 +113,7 @@ struct vhost_virtqueue { > /* Currently unused as polling mode is enabled */ > int kickfd; > int enabled; > + int access_ok; >=20 > /* Physical address of used ring, for logging */ > uint64_t log_guest_addr; > @@ -378,6 +379,7 @@ void vhost_backend_cleanup(struct virtio_net *dev); >=20 > uint64_t __vhost_iova_to_vva(struct virtio_net *dev, struct > vhost_virtqueue *vq, > uint64_t iova, uint64_t size, uint8_t perm); > +int vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq); >=20 > static __rte_always_inline uint64_t > vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq, > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.= c > index 90b209764..dd6562fd8 100644 > --- a/lib/librte_vhost/vhost_user.c > +++ b/lib/librte_vhost/vhost_user.c > @@ -391,6 +391,12 @@ vhost_user_set_vring_addr(struct virtio_net *dev, > VhostUserMsg *msg) > */ > memcpy(&vq->ring_addrs, addr, sizeof(*addr)); >=20 > + vq->desc =3D NULL; > + vq->avail =3D NULL; > + vq->used =3D NULL; > + > + vq->access_ok =3D 0; > + > return 0; > } >=20 > @@ -407,10 +413,10 @@ static struct virtio_net > *translate_ring_addresses(struct virtio_net *dev, > vq->desc =3D (struct vring_desc *)(uintptr_t)ring_addr_to_vva(dev, > vq, addr->desc_user_addr, sizeof(struct vring_desc)); > if (vq->desc =3D=3D 0) { > - RTE_LOG(ERR, VHOST_CONFIG, > + RTE_LOG(DEBUG, VHOST_CONFIG, > "(%d) failed to find desc ring address.\n", > dev->vid); > - return NULL; > + return dev; > } >=20 > dev =3D numa_realloc(dev, vq_index); > @@ -419,19 +425,19 @@ static struct virtio_net > *translate_ring_addresses(struct virtio_net *dev, > vq->avail =3D (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev, > vq, addr->avail_user_addr, sizeof(struct vring_avail)); > if (vq->avail =3D=3D 0) { > - RTE_LOG(ERR, VHOST_CONFIG, > + RTE_LOG(DEBUG, VHOST_CONFIG, > "(%d) failed to find avail ring address.\n", > dev->vid); > - return NULL; > + return dev; > } >=20 > vq->used =3D (struct vring_used *)(uintptr_t)ring_addr_to_vva(dev, > vq, addr->used_user_addr, sizeof(struct > vring_used)); > if (vq->used =3D=3D 0) { > - RTE_LOG(ERR, VHOST_CONFIG, > + RTE_LOG(DEBUG, VHOST_CONFIG, > "(%d) failed to find used ring address.\n", > dev->vid); > - return NULL; > + return dev; > } >=20 > if (vq->last_used_idx !=3D vq->used->idx) { > @@ -677,7 +683,7 @@ vhost_user_set_mem_table(struct virtio_net *dev, > struct VhostUserMsg *pmsg) > static int > vq_is_ready(struct vhost_virtqueue *vq) > { > - return vq && vq->desc && > + return vq && vq->desc && vq->avail && vq->used && > vq->kickfd !=3D VIRTIO_UNINITIALIZED_EVENTFD && > vq->callfd !=3D VIRTIO_UNINITIALIZED_EVENTFD; > } > @@ -986,8 +992,29 @@ vhost_user_set_req_fd(struct virtio_net *dev, > struct VhostUserMsg *msg) > } >=20 > static int > -vhost_user_iotlb_msg(struct virtio_net *dev, struct VhostUserMsg *msg) > +is_vring_iotlb_update(struct vhost_virtqueue *vq, struct vhost_iotlb_msg > *imsg) > { > + struct vhost_vring_addr *ra; > + uint64_t start, end; > + > + start =3D imsg->iova; > + end =3D start + imsg->size; > + > + ra =3D &vq->ring_addrs; > + if (ra->desc_user_addr >=3D start && ra->desc_user_addr < end) > + return 1; > + if (ra->avail_user_addr >=3D start && ra->avail_user_addr < end) > + return 1; > + if (ra->used_user_addr >=3D start && ra->used_user_addr < end) > + return 1; > + > + return 0; > +} > + > +static int > +vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg) > +{ > + struct virtio_net *dev =3D *pdev; > struct vhost_iotlb_msg *imsg =3D &msg->payload.iotlb; > uint16_t i; > uint64_t vva; > @@ -1003,6 +1030,9 @@ vhost_user_iotlb_msg(struct virtio_net *dev, > struct VhostUserMsg *msg) >=20 > vhost_user_iotlb_cache_insert(vq, imsg->iova, vva, > imsg->size, imsg->perm); > + > + if (is_vring_iotlb_update(vq, imsg)) > + *pdev =3D dev =3D translate_ring_addresses(dev, > i); > } > break; > case VHOST_IOTLB_INVALIDATE: > @@ -1151,8 +1181,12 @@ vhost_user_msg_handler(int vid, int fd) > } >=20 > ret =3D 0; > - RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n", > - vhost_message_str[msg.request]); > + if (msg.request !=3D VHOST_USER_IOTLB_MSG) > + RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n", > + vhost_message_str[msg.request]); > + else > + RTE_LOG(DEBUG, VHOST_CONFIG, "read message %s\n", > + vhost_message_str[msg.request]); >=20 > ret =3D vhost_user_check_and_alloc_queue_pair(dev, &msg); > if (ret < 0) { > @@ -1254,7 +1288,7 @@ vhost_user_msg_handler(int vid, int fd) > break; >=20 > case VHOST_USER_IOTLB_MSG: > - ret =3D vhost_user_iotlb_msg(dev, &msg); > + ret =3D vhost_user_iotlb_msg(&dev, &msg); > break; >=20 > default: > @@ -1263,12 +1297,6 @@ vhost_user_msg_handler(int vid, int fd) >=20 > } >=20 > - /* > - * The virtio_net struct might have been reallocated on a different > - * NUMA node, so dev pointer might no more be valid. > - */ > - dev =3D get_device(vid); > - > if (msg.flags & VHOST_USER_NEED_REPLY) { > msg.payload.u64 =3D !!ret; > msg.size =3D sizeof(msg.payload.u64); > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.= c > index cdfb6f957..b75c93cf1 100644 > --- a/lib/librte_vhost/virtio_net.c > +++ b/lib/librte_vhost/virtio_net.c > @@ -329,13 +329,23 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t > queue_id, > if (unlikely(vq->enabled =3D=3D 0)) > return 0; >=20 > + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > + vhost_user_iotlb_rd_lock(vq); > + > + if (unlikely(vq->access_ok =3D=3D 0)) { > + if (unlikely(vring_translate(dev, vq) < 0)) { > + count =3D 0; > + goto out; > + } > + } > + > avail_idx =3D *((volatile uint16_t *)&vq->avail->idx); > start_idx =3D vq->last_used_idx; > free_entries =3D avail_idx - start_idx; > count =3D RTE_MIN(count, free_entries); > count =3D RTE_MIN(count, (uint32_t)MAX_PKT_BURST); > if (count =3D=3D 0) > - return 0; > + goto out; >=20 > LOG_DEBUG(VHOST_DATA, "(%d) start_idx %d | end_idx %d\n", > dev->vid, start_idx, start_idx + count); > @@ -356,10 +366,6 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t > queue_id, > } >=20 > rte_prefetch0(&vq->desc[desc_indexes[0]]); > - > - if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > - vhost_user_iotlb_rd_lock(vq); > - > for (i =3D 0; i < count; i++) { > uint16_t desc_idx =3D desc_indexes[i]; > int err; > @@ -394,9 +400,6 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t > queue_id, >=20 > do_data_copy_enqueue(dev, vq); >=20 > - if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > - vhost_user_iotlb_rd_unlock(vq); > - > rte_smp_wmb(); >=20 > *(volatile uint16_t *)&vq->used->idx +=3D count; > @@ -412,6 +415,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t > queue_id, > if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT) > && (vq->callfd >=3D 0)) > eventfd_write(vq->callfd, (eventfd_t)1); > +out: > + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > + vhost_user_iotlb_rd_unlock(vq); > + > return count; > } >=20 > @@ -647,9 +654,16 @@ virtio_dev_merge_rx(struct virtio_net *dev, > uint16_t queue_id, > if (unlikely(vq->enabled =3D=3D 0)) > return 0; >=20 > + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > + vhost_user_iotlb_rd_lock(vq); > + > + if (unlikely(vq->access_ok =3D=3D 0)) > + if (unlikely(vring_translate(dev, vq) < 0)) > + goto out; > + > count =3D RTE_MIN((uint32_t)MAX_PKT_BURST, count); > if (count =3D=3D 0) > - return 0; > + goto out; >=20 > vq->batch_copy_nb_elems =3D 0; >=20 > @@ -657,10 +671,6 @@ virtio_dev_merge_rx(struct virtio_net *dev, > uint16_t queue_id, >=20 > vq->shadow_used_idx =3D 0; > avail_head =3D *((volatile uint16_t *)&vq->avail->idx); > - > - if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > - vhost_user_iotlb_rd_lock(vq); > - > for (pkt_idx =3D 0; pkt_idx < count; pkt_idx++) { > uint32_t pkt_len =3D pkts[pkt_idx]->pkt_len + dev- > >vhost_hlen; >=20 > @@ -689,9 +699,6 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t > queue_id, >=20 > do_data_copy_enqueue(dev, vq); >=20 > - if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > - vhost_user_iotlb_rd_unlock(vq); > - > if (likely(vq->shadow_used_idx)) { > flush_shadow_used_ring(dev, vq); >=20 > @@ -704,6 +711,10 @@ virtio_dev_merge_rx(struct virtio_net *dev, > uint16_t queue_id, > eventfd_write(vq->callfd, (eventfd_t)1); > } >=20 > +out: > + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > + vhost_user_iotlb_rd_unlock(vq); > + > return pkt_idx; > } >=20 > @@ -1173,6 +1184,13 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, >=20 > vq->batch_copy_nb_elems =3D 0; >=20 > + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > + vhost_user_iotlb_rd_lock(vq); > + > + if (unlikely(vq->access_ok =3D=3D 0)) > + if (unlikely(vring_translate(dev, vq) < 0)) > + goto out; > + > if (unlikely(dev->dequeue_zero_copy)) { > struct zcopy_mbuf *zmbuf, *next; > int nr_updated =3D 0; > @@ -1262,10 +1280,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, >=20 > /* Prefetch descriptor index. */ > rte_prefetch0(&vq->desc[desc_indexes[0]]); > - > - if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > - vhost_user_iotlb_rd_lock(vq); > - > for (i =3D 0; i < count; i++) { > struct vring_desc *desc; > uint16_t sz, idx; > @@ -1329,9 +1343,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, > TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next); > } > } > - if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > - vhost_user_iotlb_rd_unlock(vq); > - > vq->last_avail_idx +=3D i; >=20 > if (likely(dev->dequeue_zero_copy =3D=3D 0)) { > @@ -1341,6 +1352,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, > } >=20 > out: > + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > + vhost_user_iotlb_rd_unlock(vq); > + > if (unlikely(rarp_mbuf !=3D NULL)) { > /* > * Inject it to the head of "pkts" array, so that switch's mac > -- > 2.13.6 Hi, Maxime I met one issue with your patch set during the v17.11 test. The test scenario is following,=20 1. Bind one NIC, use test-pmd set vhost-user with 2 queue usertools/dpdk-devbind.py --bind=3Digb_uio 0000:05:00.0 ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xe -n 4 --socket-mem 1024,1024= \ --vdev 'net_vhost0,iface=3Dvhost-net,queues=3D2' - -i --rxq=3D2 --txq=3D2 -= -nb-cores=3D2 --rss-ip 2. Launch qemu with virtio device which has 2 queue=20 3. In VM, launch testpmd with virtio-pmd using only 1 queue. x86_64-native-linuxapp-gcc/app/testpmd -c 0x07 -n 3 - -i --txqflags=3D0xf01= \ --rxq=3D1 --txq=3D1 --rss-ip --nb-cores=3D1 First,=20 commit 09927b5249694bad1c094d3068124673722e6b8f vhost: translate ring addresses when IOMMU enabled The patch causes no traffic in PVP test. but link status is still up in vho= st-user. Second,=20 eefac9536a901a1f0bb52aa3b6fec8f375f09190=20 vhost: postpone device creation until rings are mapped The patch causes link status "down" in vhost-user. Could you have a check at your side? Thanks. BRs Lei