From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 0098D1B119 for ; Mon, 29 Apr 2019 12:55:12 +0200 (CEST) X-Amp-Result: UNSCANNABLE X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 29 Apr 2019 03:55:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,409,1549958400"; d="scan'208";a="153235491" Received: from npg_dpdk_virtio_tbie_2.sh.intel.com (HELO ___) ([10.67.104.173]) by FMSMGA003.fm.intel.com with ESMTP; 29 Apr 2019 03:55:09 -0700 Date: Mon, 29 Apr 2019 18:54:25 +0800 From: Tiwei Bie To: lin li Cc: maxime.coquelin@redhat.com, zhihong.wang@intel.com, dev@dpdk.org, dariusz.stojaczyk@intel.com, changpeng.liu@intel.com, james.r.harris@intel.com, lilin24 , Ni Xun , Zhang Yu , mst@redhat.com Message-ID: <20190429105425.GA12868@___> References: <1556190633-8099-1-git-send-email-chuckylinchuckylin@gmail.com> <1556271621-8594-1-git-send-email-chuckylinchuckylin@gmail.com> <20190428111748.GA16470@___> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) Subject: Re: [dpdk-dev] [PATCH v3] vhost: support inflight share memory protocol feature X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Apr 2019 10:55:13 -0000 On Mon, Apr 29, 2019 at 12:07:05PM +0800, lin li wrote: > Tiwei Bie 于2019年4月28日周日 下午7:18写道: > > On Fri, Apr 26, 2019 at 05:40:21AM -0400, Li Lin wrote: [...] > > > @@ -98,12 +102,26 @@ struct rte_vhost_memory { > > > struct rte_vhost_mem_region regions[]; > > > }; > > > > > > +typedef struct VhostUserInflightEntry { > > > + uint8_t inflight; > > > +} VhostUserInflightEntry; > > > + > > > +typedef struct VhostInflightInfo { > > > + uint16_t version; > > > + uint16_t last_inflight_io; > > > + uint16_t used_idx; > > > + VhostUserInflightEntry desc[0]; > > > +} VhostInflightInfo; > > > > Is there any details on above structure? Why does it not match > > QueueRegionSplit or QueueRegionPacked structures described in > > qemu/docs/interop/vhost-user.txt? > > Qemu have its vhost-user backend, Do you mean contrib/libvhost-user in QEMU? > qemu did the submission of IO in it. What does this mean? > The implementation of dpdk is more general. It is just to mark inflight entry. Based on the discussion in below threads: https://patchwork.kernel.org/patch/10753893/ https://patchwork.kernel.org/patch/10817735/ IIUC, above structure is the extra buffer allocated by slave to store the information of inflight descriptors and share with master for persistent. All vhost-user backends' implementation should follow the vhost-user protocol (including the format of above structure) defined in qemu/docs/interop/vhost-user.txt. Right? > The submission of inflight entry is handle over to different backends. > They have their own ways to handle it, such as spdk. > So there are some differences in data structure. > > > [...] > > > +static int > > > +vhost_user_set_inflight_fd(struct virtio_net **pdev, VhostUserMsg *msg, > > > + int main_fd __rte_unused) > > > +{ > > > + int fd, i; > > > + uint64_t mmap_size, mmap_offset; > > > + uint16_t num_queues, queue_size; > > > + uint32_t pervq_inflight_size; > > > + void *rc; > > > + struct vhost_virtqueue *vq; > > > + struct virtio_net *dev = *pdev; > > > + > > > + fd = msg->fds[0]; > > > + if (msg->size != sizeof(msg->payload.inflight) || fd < 0) { > > > + RTE_LOG(ERR, VHOST_CONFIG, "Invalid set_inflight_fd message size is %d,fd is %d\n", > > > + msg->size, fd); > > > + return -1; > > > > Ditto. > > > > > + } > > > + > > > + mmap_size = msg->payload.inflight.mmap_size; > > > + mmap_offset = msg->payload.inflight.mmap_offset; > > > + num_queues = msg->payload.inflight.num_queues; > > > + queue_size = msg->payload.inflight.queue_size; > > > + pervq_inflight_size = get_pervq_shm_size(queue_size); > > > + > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd mmap_size: %lu\n", mmap_size); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd mmap_offset: %lu\n", mmap_offset); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd num_queues: %u\n", num_queues); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd queue_size: %u\n", queue_size); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd fd: %d\n", fd); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd pervq_inflight_size: %d\n", > > > + pervq_inflight_size); > > > + > > > + if (dev->inflight_info.addr) > > > + munmap(dev->inflight_info.addr, dev->inflight_info.size); > > > + > > > + rc = mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, > > > + fd, mmap_offset); > > > > Why call it rc? Maybe addr is a better name? > > In some scenarios, shared memory is reallocated or resized by qemu, so > again mmap is needed. In which case the shared memory will be reallocated or resized by QEMU? > > > > > > + if (rc == MAP_FAILED) { > > > + RTE_LOG(ERR, VHOST_CONFIG, "failed to mmap share memory.\n"); > > > + return -1; > > > > Should always return RTE_VHOST_MSG_RESULT_* in > > message handler. > > > > > + } > > > + > > > + if (dev->inflight_info.fd) > > > + close(dev->inflight_info.fd); > > > + > > > + dev->inflight_info.fd = fd; > > > + dev->inflight_info.addr = rc; > > > + dev->inflight_info.size = mmap_size; > > > + > > > + for (i = 0; i < num_queues; i++) { > > > + vq = dev->virtqueue[i]; > > > + vq->inflight = (VhostInflightInfo *)rc; > > > + rc = (void *)((char *)rc + pervq_inflight_size); > > > + } > > > + > > > + return RTE_VHOST_MSG_RESULT_OK; > > > +} [...] > > > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h > > > index 2a650fe4b..35f969b1b 100644 > > > --- a/lib/librte_vhost/vhost_user.h > > > +++ b/lib/librte_vhost/vhost_user.h > > > @@ -23,7 +23,8 @@ > > > (1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \ > > > (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \ > > > (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \ > > > - (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT)) > > > + (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT) | \ > > > + (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) > > > > It will advertise this feature for builtin net and crypto > > backends. It's probably not what you intended. > > > > Indeed, this feature is mainly used for spdk-like backends. You mean > this function is disabled by default? External backends should use rte_vhost_driver_set_protocol_features() to advertise the protocol features they support. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id 0387BA0679 for ; Mon, 29 Apr 2019 12:55:14 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id EEA921B129; Mon, 29 Apr 2019 12:55:13 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 0098D1B119 for ; Mon, 29 Apr 2019 12:55:12 +0200 (CEST) X-Amp-Result: UNSCANNABLE X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 29 Apr 2019 03:55:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,409,1549958400"; d="scan'208";a="153235491" Received: from npg_dpdk_virtio_tbie_2.sh.intel.com (HELO ___) ([10.67.104.173]) by FMSMGA003.fm.intel.com with ESMTP; 29 Apr 2019 03:55:09 -0700 Date: Mon, 29 Apr 2019 18:54:25 +0800 From: Tiwei Bie To: lin li Cc: maxime.coquelin@redhat.com, zhihong.wang@intel.com, dev@dpdk.org, dariusz.stojaczyk@intel.com, changpeng.liu@intel.com, james.r.harris@intel.com, lilin24 , Ni Xun , Zhang Yu , mst@redhat.com Message-ID: <20190429105425.GA12868@___> References: <1556190633-8099-1-git-send-email-chuckylinchuckylin@gmail.com> <1556271621-8594-1-git-send-email-chuckylinchuckylin@gmail.com> <20190428111748.GA16470@___> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) Subject: Re: [dpdk-dev] [PATCH v3] vhost: support inflight share memory protocol feature X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Message-ID: <20190429105425.kFfBFg_rwx6_ww9rGUL4ObnvDWi7YffFK0pJbZaQ_lE@z> On Mon, Apr 29, 2019 at 12:07:05PM +0800, lin li wrote: > Tiwei Bie 于2019年4月28日周日 下午7:18写道: > > On Fri, Apr 26, 2019 at 05:40:21AM -0400, Li Lin wrote: [...] > > > @@ -98,12 +102,26 @@ struct rte_vhost_memory { > > > struct rte_vhost_mem_region regions[]; > > > }; > > > > > > +typedef struct VhostUserInflightEntry { > > > + uint8_t inflight; > > > +} VhostUserInflightEntry; > > > + > > > +typedef struct VhostInflightInfo { > > > + uint16_t version; > > > + uint16_t last_inflight_io; > > > + uint16_t used_idx; > > > + VhostUserInflightEntry desc[0]; > > > +} VhostInflightInfo; > > > > Is there any details on above structure? Why does it not match > > QueueRegionSplit or QueueRegionPacked structures described in > > qemu/docs/interop/vhost-user.txt? > > Qemu have its vhost-user backend, Do you mean contrib/libvhost-user in QEMU? > qemu did the submission of IO in it. What does this mean? > The implementation of dpdk is more general. It is just to mark inflight entry. Based on the discussion in below threads: https://patchwork.kernel.org/patch/10753893/ https://patchwork.kernel.org/patch/10817735/ IIUC, above structure is the extra buffer allocated by slave to store the information of inflight descriptors and share with master for persistent. All vhost-user backends' implementation should follow the vhost-user protocol (including the format of above structure) defined in qemu/docs/interop/vhost-user.txt. Right? > The submission of inflight entry is handle over to different backends. > They have their own ways to handle it, such as spdk. > So there are some differences in data structure. > > > [...] > > > +static int > > > +vhost_user_set_inflight_fd(struct virtio_net **pdev, VhostUserMsg *msg, > > > + int main_fd __rte_unused) > > > +{ > > > + int fd, i; > > > + uint64_t mmap_size, mmap_offset; > > > + uint16_t num_queues, queue_size; > > > + uint32_t pervq_inflight_size; > > > + void *rc; > > > + struct vhost_virtqueue *vq; > > > + struct virtio_net *dev = *pdev; > > > + > > > + fd = msg->fds[0]; > > > + if (msg->size != sizeof(msg->payload.inflight) || fd < 0) { > > > + RTE_LOG(ERR, VHOST_CONFIG, "Invalid set_inflight_fd message size is %d,fd is %d\n", > > > + msg->size, fd); > > > + return -1; > > > > Ditto. > > > > > + } > > > + > > > + mmap_size = msg->payload.inflight.mmap_size; > > > + mmap_offset = msg->payload.inflight.mmap_offset; > > > + num_queues = msg->payload.inflight.num_queues; > > > + queue_size = msg->payload.inflight.queue_size; > > > + pervq_inflight_size = get_pervq_shm_size(queue_size); > > > + > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd mmap_size: %lu\n", mmap_size); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd mmap_offset: %lu\n", mmap_offset); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd num_queues: %u\n", num_queues); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd queue_size: %u\n", queue_size); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd fd: %d\n", fd); > > > + RTE_LOG(INFO, VHOST_CONFIG, > > > + "set_inflight_fd pervq_inflight_size: %d\n", > > > + pervq_inflight_size); > > > + > > > + if (dev->inflight_info.addr) > > > + munmap(dev->inflight_info.addr, dev->inflight_info.size); > > > + > > > + rc = mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, > > > + fd, mmap_offset); > > > > Why call it rc? Maybe addr is a better name? > > In some scenarios, shared memory is reallocated or resized by qemu, so > again mmap is needed. In which case the shared memory will be reallocated or resized by QEMU? > > > > > > + if (rc == MAP_FAILED) { > > > + RTE_LOG(ERR, VHOST_CONFIG, "failed to mmap share memory.\n"); > > > + return -1; > > > > Should always return RTE_VHOST_MSG_RESULT_* in > > message handler. > > > > > + } > > > + > > > + if (dev->inflight_info.fd) > > > + close(dev->inflight_info.fd); > > > + > > > + dev->inflight_info.fd = fd; > > > + dev->inflight_info.addr = rc; > > > + dev->inflight_info.size = mmap_size; > > > + > > > + for (i = 0; i < num_queues; i++) { > > > + vq = dev->virtqueue[i]; > > > + vq->inflight = (VhostInflightInfo *)rc; > > > + rc = (void *)((char *)rc + pervq_inflight_size); > > > + } > > > + > > > + return RTE_VHOST_MSG_RESULT_OK; > > > +} [...] > > > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h > > > index 2a650fe4b..35f969b1b 100644 > > > --- a/lib/librte_vhost/vhost_user.h > > > +++ b/lib/librte_vhost/vhost_user.h > > > @@ -23,7 +23,8 @@ > > > (1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \ > > > (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \ > > > (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \ > > > - (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT)) > > > + (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT) | \ > > > + (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) > > > > It will advertise this feature for builtin net and crypto > > backends. It's probably not what you intended. > > > > Indeed, this feature is mainly used for spdk-like backends. You mean > this function is disabled by default? External backends should use rte_vhost_driver_set_protocol_features() to advertise the protocol features they support.