From: Maxime Coquelin
To: Mike Pattrick, dev@dpdk.org
Cc: david.marchand@redhat.com, chenbo.xia@intel.com
Date: Thu, 23 Feb 2023 17:12:14 +0100
Subject: Re: [PATCH v2] vhost: fix madvise arguments alignment
In-Reply-To: <20230223043551.1108541-1-mkp@redhat.com>
References: <20230222214342.1092333-1-mkp@redhat.com> <20230223043551.1108541-1-mkp@redhat.com>
List-Id: DPDK patches and discussions

Hi Mike,

Thanks for looking into this issue.

On 2/23/23 05:35, Mike Pattrick wrote:
> The arguments passed to madvise should be aligned to the alignment of
> the backing memory. Now we keep track of each region's alignment and
> use it when setting coredump preferences. To facilitate this, a new
> member was added to rte_vhost_mem_region. A new function was added to
> easily translate a memory address back to its region's alignment.
> Unneeded calls to madvise were reduced, as the cache removal case
> should already be covered by the cache insertion case. The previously
> inline function mem_set_dump was moved out of the header file and is
> no longer inline.
>
> Fixes: 338ad77c9ed3 ("vhost: exclude VM hugepages from coredumps")
>
> Signed-off-by: Mike Pattrick
> ---
> Since v1:
>  - Corrected a cast for 32bit compiles
> ---
>  lib/vhost/iotlb.c      |  9 +++---
>  lib/vhost/rte_vhost.h  |  1 +
>  lib/vhost/vhost.h      | 12 ++------
>  lib/vhost/vhost_user.c | 63 +++++++++++++++++++++++++++++++++++-------
>  4 files changed, 60 insertions(+), 25 deletions(-)
>
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index a0b8fd7302..5293507b63 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -149,7 +149,6 @@ vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
>  	rte_rwlock_write_lock(&vq->iotlb_lock);
>
>  	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> -		mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);

Hmm, it should have been called with enable=false here, since we are
removing the entry from the IOTLB cache. It should be kept in order to
"DONTDUMP" pages evicted from the cache.

>  		TAILQ_REMOVE(&vq->iotlb_list, node, next);
>  		vhost_user_iotlb_pool_put(vq, node);
>  	}
> @@ -171,7 +170,6 @@ vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
>
>  	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
>  		if (!entry_idx) {
> -			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);

Same here.
>  			TAILQ_REMOVE(&vq->iotlb_list, node, next);
>  			vhost_user_iotlb_pool_put(vq, node);
>  			vq->iotlb_cache_nr--;
> @@ -224,14 +222,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
>  			vhost_user_iotlb_pool_put(vq, new_node);
>  			goto unlock;
>  		} else if (node->iova > new_node->iova) {
> -			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> +			mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
> +				hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
>  			TAILQ_INSERT_BEFORE(node, new_node, next);
>  			vq->iotlb_cache_nr++;
>  			goto unlock;
>  		}
>  	}
>
> -	mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> +	mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
> +		hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
>  	TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
>  	vq->iotlb_cache_nr++;
>
> @@ -259,7 +259,6 @@ vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
>  			break;
>
>  		if (iova < node->iova + node->size) {
> -			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
>  			TAILQ_REMOVE(&vq->iotlb_list, node, next);
>  			vhost_user_iotlb_pool_put(vq, node);
>  			vq->iotlb_cache_nr--;
> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> index a395843fe9..c5c97ea67e 100644
> --- a/lib/vhost/rte_vhost.h
> +++ b/lib/vhost/rte_vhost.h
> @@ -136,6 +136,7 @@ struct rte_vhost_mem_region {
>  	void *mmap_addr;
>  	uint64_t mmap_size;
>  	int fd;
> +	uint64_t alignment;

It is not possible to do this, as it breaks the ABI. You have to store
the information somewhere else, or simply call get_blk_size() in
hua_to_alignment() since the fd is not closed.
>  };
>
>  /**
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 5750f0c005..a2467ba509 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -1009,14 +1009,6 @@ mbuf_is_consumed(struct rte_mbuf *m)
>  	return true;
>  }
>
> -static __rte_always_inline void
> -mem_set_dump(__rte_unused void *ptr, __rte_unused size_t size, __rte_unused bool enable)
> -{
> -#ifdef MADV_DONTDUMP
> -	if (madvise(ptr, size, enable ? MADV_DODUMP : MADV_DONTDUMP) == -1) {
> -		rte_log(RTE_LOG_INFO, vhost_config_log_level,
> -			"VHOST_CONFIG: could not set coredump preference (%s).\n", strerror(errno));
> -	}
> -#endif
> -}
> +uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
> +void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
>  #endif /* _VHOST_NET_CDEV_H_ */
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index d702d082dd..6d09597fbe 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -737,6 +737,40 @@ log_addr_to_gpa(struct virtio_net *dev, struct vhost_virtqueue *vq)
>  	return log_gpa;
>  }
>
> +uint64_t
> +hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
> +{
> +	struct rte_vhost_mem_region *r;
> +	uint32_t i;
> +	uintptr_t hua = (uintptr_t)ptr;
> +
> +	for (i = 0; i < mem->nregions; i++) {
> +		r = &mem->regions[i];
> +		if (hua >= r->host_user_addr &&
> +		    hua < r->host_user_addr + r->size) {
> +			return r->alignment;
> +		}
> +	}
> +
> +	/* If region isn't found, don't align at all */
> +	return 1;
> +}
> +
> +void
> +mem_set_dump(void *ptr, size_t size, bool enable, uint64_t pagesz)
> +{
> +#ifdef MADV_DONTDUMP
> +	void *start = RTE_PTR_ALIGN_FLOOR(ptr, pagesz);
> +	uintptr_t end = RTE_ALIGN_CEIL((uintptr_t)ptr + size, pagesz);
> +	size_t len = end - (uintptr_t)start;
> +
> +	if (madvise(start, len, enable ? MADV_DODUMP : MADV_DONTDUMP) == -1) {
> +		rte_log(RTE_LOG_INFO, vhost_config_log_level,
> +			"VHOST_CONFIG: could not set coredump preference (%s).\n", strerror(errno));
> +	}
> +#endif
> +}
> +
>  static void
>  translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>  {
> @@ -767,6 +801,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>  		return;
>  	}
>
> +	mem_set_dump(vq->desc_packed, len, true,
> +		hua_to_alignment(dev->mem, vq->desc_packed));
>  	numa_realloc(&dev, &vq);
>  	*pdev = dev;
>  	*pvq = vq;
> @@ -782,6 +818,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>  		return;
>  	}
>
> +	mem_set_dump(vq->driver_event, len, true,
> +		hua_to_alignment(dev->mem, vq->driver_event));
>  	len = sizeof(struct vring_packed_desc_event);
>  	vq->device_event = (struct vring_packed_desc_event *)
>  				(uintptr_t)ring_addr_to_vva(dev,
> @@ -793,9 +831,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>  		return;
>  	}
>
> -	mem_set_dump(vq->desc_packed, len, true);
> -	mem_set_dump(vq->driver_event, len, true);
> -	mem_set_dump(vq->device_event, len, true);
> +	mem_set_dump(vq->device_event, len, true,
> +		hua_to_alignment(dev->mem, vq->device_event));
>  	vq->access_ok = true;
>  	return;
>  }
> @@ -812,6 +849,7 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>  		return;
>  	}
>
> +	mem_set_dump(vq->desc, len, true, hua_to_alignment(dev->mem, vq->desc));
>  	numa_realloc(&dev, &vq);
>  	*pdev = dev;
>  	*pvq = vq;
> @@ -827,6 +865,7 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>  		return;
>  	}
>
> +	mem_set_dump(vq->avail, len, true, hua_to_alignment(dev->mem, vq->avail));
>  	len = sizeof(struct vring_used) +
>  		sizeof(struct vring_used_elem) * vq->size;
>  	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> @@ -839,6 +878,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>  		return;
>  	}
>
> +	mem_set_dump(vq->used, len, true, hua_to_alignment(dev->mem, vq->used));
> +
>  	if (vq->last_used_idx != vq->used->idx) {
>  		VHOST_LOG_CONFIG(dev->ifname, WARNING,
>  			"last_used_idx (%u) and vq->used->idx (%u) mismatches;\n",
> @@ -849,9 +890,6 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>  			"some packets maybe resent for Tx and dropped for Rx\n");
>  	}
>
> -	mem_set_dump(vq->desc, len, true);
> -	mem_set_dump(vq->avail, len, true);
> -	mem_set_dump(vq->used, len, true);
>  	vq->access_ok = true;
>
>  	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "mapped address desc: %p\n", vq->desc);
> @@ -1230,7 +1268,8 @@ vhost_user_mmap_region(struct virtio_net *dev,
>  	region->mmap_addr = mmap_addr;
>  	region->mmap_size = mmap_size;
>  	region->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset;
> -	mem_set_dump(mmap_addr, mmap_size, false);
> +	region->alignment = alignment;
> +	mem_set_dump(mmap_addr, mmap_size, false, alignment);
>
>  	if (dev->async_copy) {
>  		if (add_guest_pages(dev, region, alignment) < 0) {
> @@ -1535,7 +1574,6 @@ inflight_mem_alloc(struct virtio_net *dev, const char *name, size_t size, int *fd)
>  		return NULL;
>  	}
>
> -	mem_set_dump(ptr, size, false);
>  	*fd = mfd;
>  	return ptr;
>  }
> @@ -1566,6 +1604,7 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
>  	uint64_t pervq_inflight_size, mmap_size;
>  	uint16_t num_queues, queue_size;
>  	struct virtio_net *dev = *pdev;
> +	uint64_t alignment;
>  	int fd, i, j;
>  	int numa_node = SOCKET_ID_ANY;
>  	void *addr;
> @@ -1628,6 +1667,8 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
>  		dev->inflight_info->fd = -1;
>  	}
>
> +	alignment = get_blk_size(fd);
> +	mem_set_dump(addr, mmap_size, false, alignment);
>  	dev->inflight_info->addr = addr;
>  	dev->inflight_info->size = ctx->msg.payload.inflight.mmap_size = mmap_size;
>  	dev->inflight_info->fd = ctx->fds[0] = fd;
> @@ -1744,10 +1785,10 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev,
>  		dev->inflight_info->fd = -1;
>  	}
>
> -	mem_set_dump(addr, mmap_size, false);
>  	dev->inflight_info->fd = fd;
>  	dev->inflight_info->addr = addr;
>  	dev->inflight_info->size = mmap_size;
> +	mem_set_dump(addr, mmap_size, false, get_blk_size(fd));
>
>  	for (i = 0; i < num_queues; i++) {
>  		vq = dev->virtqueue[i];
> @@ -2242,6 +2283,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
>  	struct virtio_net *dev = *pdev;
>  	int fd = ctx->fds[0];
>  	uint64_t size, off;
> +	uint64_t alignment;
>  	void *addr;
>  	uint32_t i;
>
> @@ -2280,6 +2322,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
>  	 * fail when offset is not page size aligned.
>  	 */
>  	addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +	alignment = get_blk_size(fd);
>  	close(fd);
>  	if (addr == MAP_FAILED) {
>  		VHOST_LOG_CONFIG(dev->ifname, ERR, "mmap log base failed!\n");
> @@ -2296,7 +2339,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
>  	dev->log_addr = (uint64_t)(uintptr_t)addr;
>  	dev->log_base = dev->log_addr + off;
>  	dev->log_size = size;
> -	mem_set_dump(addr, size, false);
> +	mem_set_dump(addr, size + off, false, alignment);
>
>  	for (i = 0; i < dev->nr_vring; i++) {
>  		struct vhost_virtqueue *vq = dev->virtqueue[i];