From mboxrd@z Thu Jan 1 00:00:00 1970
To: Cheng Jiang, chenbo.xia@intel.com
Cc: dev@dpdk.org, jiayu.hu@intel.com, yvonnex.yang@intel.com, yinan.wang@intel.com, yong.liu@intel.com
References:
 <20210317085426.10119-1-Cheng1.jiang@intel.com>
 <20210412113430.17587-1-Cheng1.jiang@intel.com>
 <20210412113430.17587-2-Cheng1.jiang@intel.com>
From: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-ID: <2ccb21b8-bff9-7334-6f63-adceef1b7641@redhat.com>
Date: Tue, 13 Apr 2021 09:11:45 +0200
MIME-Version: 1.0
In-Reply-To: <20210412113430.17587-2-Cheng1.jiang@intel.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Subject: Re: [dpdk-dev] [PATCH v5 1/4] vhost: abstract and reorganize async split ring code
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Hi Cheng,

On 4/12/21 1:34 PM, Cheng Jiang wrote:
> In order to improve code efficiency and readability when async packed
> ring support is enabled, this patch abstracts some functions such as
> shadow_ring_store and write_back_completed_descs_split, and improves
> the efficiency of some pointer offset calculations.
>
> Signed-off-by: Cheng Jiang
> ---
>  lib/librte_vhost/virtio_net.c | 146 +++++++++++++++++++---------------
>  1 file changed, 84 insertions(+), 62 deletions(-)
>
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index ff3987860..c43ab0093 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -1458,6 +1458,29 @@ virtio_dev_rx_async_get_info_idx(uint16_t pkts_idx,
>  		(vq_size - n_inflight + pkts_idx) & (vq_size - 1);
>  }
>
> +static __rte_always_inline void
> +shadow_ring_store(struct vhost_virtqueue *vq, void *shadow_ring, void *d_ring,
> +		uint16_t s_idx, uint16_t d_idx,
> +		uint16_t count, uint16_t elem_size)
> +{
> +	if (d_idx + count <= vq->size) {
> +		rte_memcpy((void *)((uintptr_t)d_ring + d_idx * elem_size),
> +			(void *)((uintptr_t)shadow_ring + s_idx * elem_size),
> +			count * elem_size);
> +	} else {
> +		uint16_t size = vq->size - d_idx;
> +
> +		rte_memcpy((void *)((uintptr_t)d_ring + d_idx * elem_size),
> +			(void *)((uintptr_t)shadow_ring + s_idx * elem_size),
> +			size * elem_size);
> +
> +		rte_memcpy((void *)((uintptr_t)d_ring),
> +			(void *)((uintptr_t)shadow_ring +
> +				(s_idx + size) * elem_size),
> +			(count - size) * elem_size);
> +	}
> +}
> +
>  static __rte_noinline uint32_t
>  virtio_dev_rx_async_submit_split(struct virtio_net *dev,
>  	struct vhost_virtqueue *vq, uint16_t queue_id,
> @@ -1478,6 +1501,7 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
>  	struct rte_vhost_iov_iter *dst_it = it_pool + 1;
>  	uint16_t slot_idx = 0;
>  	uint16_t segs_await = 0;
> +	uint16_t iovec_idx = 0, it_idx = 0;
>  	struct async_inflight_info *pkts_info = vq->async_pkts_info;
>  	uint32_t n_pkts = 0, pkt_err = 0;
>  	uint32_t num_async_pkts = 0, num_done_pkts = 0;
> @@ -1513,27 +1537,32 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
>
>  		if (async_mbuf_to_desc(dev, vq, pkts[pkt_idx],
>  				buf_vec, nr_vec, num_buffers,
> -				src_iovec, dst_iovec, src_it, dst_it) < 0) {
> +				&src_iovec[iovec_idx],
> +				&dst_iovec[iovec_idx],
> +				&src_it[it_idx],
> +				&dst_it[it_idx]) < 0) {
>  			vq->shadow_used_idx -= num_buffers;
>  			break;
>  		}
>
>  		slot_idx = (vq->async_pkts_idx + num_async_pkts) &
>  			(vq->size - 1);
> -		if (src_it->count) {
> +		if (src_it[it_idx].count) {
>  			uint16_t from, to;
>
> -			async_fill_desc(&tdes[pkt_burst_idx++], src_it, dst_it);
> +			async_fill_desc(&tdes[pkt_burst_idx++],
> +				&src_it[it_idx],
> +				&dst_it[it_idx]);
>  			pkts_info[slot_idx].descs = num_buffers;
>  			pkts_info[slot_idx].mbuf = pkts[pkt_idx];
>  			async_pkts_log[num_async_pkts].pkt_idx = pkt_idx;
>  			async_pkts_log[num_async_pkts++].last_avail_idx =
>  				vq->last_avail_idx;
> -			src_iovec += src_it->nr_segs;
> -			dst_iovec += dst_it->nr_segs;
> -			src_it += 2;
> -			dst_it += 2;
> -			segs_await += src_it->nr_segs;
> +
> +			iovec_idx += src_it[it_idx].nr_segs;
> +			it_idx += 2;
> +
> +			segs_await += src_it[it_idx].nr_segs;
>
>  			/**
>  			 * recover shadow used ring and keep DMA-occupied
> @@ -1541,23 +1570,12 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
>  			 */
>  			from = vq->shadow_used_idx - num_buffers;
>  			to = vq->async_desc_idx & (vq->size - 1);
> -			if (num_buffers + to <= vq->size) {
> -				rte_memcpy(&vq->async_descs_split[to],
> -					&vq->shadow_used_split[from],
> -					num_buffers *
> -					sizeof(struct vring_used_elem));
> -			} else {
> -				int size = vq->size - to;
> -
> -				rte_memcpy(&vq->async_descs_split[to],
> -					&vq->shadow_used_split[from],
> -					size *
> -					sizeof(struct vring_used_elem));
> -				rte_memcpy(vq->async_descs_split,
> -					&vq->shadow_used_split[from +
> -					size], (num_buffers - size) *
> -					sizeof(struct vring_used_elem));
> -			}
> +
> +			shadow_ring_store(vq, vq->shadow_used_split,
> +					vq->async_descs_split,
> +					from, to, num_buffers,
> +					sizeof(struct vring_used_elem));
> +

I'm not convinced by this rework. It is good to create a dedicated
function here to simplify the huge virtio_dev_rx_async_submit_split()
function, but we should have a dedicated version for the split ring.
Having a single function for both split and packed rings does not
improve readability, and is unlikely to improve performance.

>  		vq->async_desc_idx += num_buffers;
>  		vq->shadow_used_idx -= num_buffers;
>  	} else
> @@ -1575,10 +1593,9 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
>  			BUF_VECTOR_MAX))) {
>  			n_pkts = vq->async_ops.transfer_data(dev->vid,
>  				queue_id, tdes, 0, pkt_burst_idx);
> -			src_iovec = vec_pool;
> -			dst_iovec = vec_pool + (VHOST_MAX_ASYNC_VEC >> 1);
> -			src_it = it_pool;
> -			dst_it = it_pool + 1;
> +			iovec_idx = 0;
> +			it_idx = 0;
> +
>  			segs_await = 0;
>  			vq->async_pkts_inflight_n += n_pkts;
>
> @@ -1639,6 +1656,43 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
>  	return pkt_idx;
>  }
>
> +static __rte_always_inline void
> +write_back_completed_descs_split(struct vhost_virtqueue *vq, uint16_t n_descs)
> +{
> +	uint16_t nr_left = n_descs;
> +	uint16_t nr_copy;
> +	uint16_t to, from;
> +
> +	do {
> +		from = vq->last_async_desc_idx & (vq->size - 1);
> +		nr_copy = nr_left + from <= vq->size ? nr_left :
> +			vq->size - from;
> +		to = vq->last_used_idx & (vq->size - 1);
> +
> +		if (to + nr_copy <= vq->size) {
> +			rte_memcpy(&vq->used->ring[to],
> +				&vq->async_descs_split[from],
> +				nr_copy *
> +				sizeof(struct vring_used_elem));
> +		} else {
> +			uint16_t size = vq->size - to;
> +
> +			rte_memcpy(&vq->used->ring[to],
> +				&vq->async_descs_split[from],
> +				size *
> +				sizeof(struct vring_used_elem));
> +			rte_memcpy(vq->used->ring,

Please use &vq->used->ring[0] here, for consistency.

> +				&vq->async_descs_split[from +
> +				size], (nr_copy - size) *
> +				sizeof(struct vring_used_elem));

Lines can now be up to 100 characters. Please take the opportunity to
indent properly, so that parts of a single argument do not end up
spread across lines; it will help readability.
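For the split-ring case, I would picture something like the sketch below. It is a standalone, compilable illustration only: the helper name, the plain memcpy() in place of rte_memcpy(), and the trimmed-down types are my assumptions, not existing vhost code. Typing the helper to struct vring_used_elem removes the void pointer and elem_size arithmetic entirely:

```c
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for the vhost used-element type, for illustration. */
struct vring_used_elem {
	uint32_t id;
	uint32_t len;
};

/*
 * Hypothetical split-ring-only helper: copy `count` shadow entries
 * starting at s_idx into the async descriptor ring starting at d_idx,
 * wrapping the destination at ring_size (a power of two).
 */
static void
store_dma_desc_info_split(struct vring_used_elem *s_ring,
		struct vring_used_elem *d_ring, uint16_t ring_size,
		uint16_t s_idx, uint16_t d_idx, uint16_t count)
{
	if (d_idx + count <= ring_size) {
		/* No wraparound: a single copy suffices. */
		memcpy(&d_ring[d_idx], &s_ring[s_idx],
			count * sizeof(struct vring_used_elem));
	} else {
		/* Wraparound: copy up to the end, then restart at 0. */
		uint16_t size = ring_size - d_idx;

		memcpy(&d_ring[d_idx], &s_ring[s_idx],
			size * sizeof(struct vring_used_elem));
		memcpy(&d_ring[0], &s_ring[s_idx + size],
			(count - size) * sizeof(struct vring_used_elem));
	}
}
```

A packed-ring variant can then be added as its own typed helper when the packed support lands, instead of funnelling both through one generic byte-copy routine.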
> +		}
> +
> +		vq->last_async_desc_idx += nr_copy;
> +		vq->last_used_idx += nr_copy;
> +		nr_left -= nr_copy;
> +	} while (nr_left > 0);
> +}
> +
>  uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
>  	struct rte_mbuf **pkts, uint16_t count)
>  {
> @@ -1695,39 +1749,7 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
>  	vq->async_pkts_inflight_n -= n_pkts_put;
>
>  	if (likely(vq->enabled && vq->access_ok)) {
> -		uint16_t nr_left = n_descs;
> -		uint16_t nr_copy;
> -		uint16_t to;
> -
> -		/* write back completed descriptors to used ring */
> -		do {
> -			from = vq->last_async_desc_idx & (vq->size - 1);
> -			nr_copy = nr_left + from <= vq->size ? nr_left :
> -				vq->size - from;
> -			to = vq->last_used_idx & (vq->size - 1);
> -
> -			if (to + nr_copy <= vq->size) {
> -				rte_memcpy(&vq->used->ring[to],
> -					&vq->async_descs_split[from],
> -					nr_copy *
> -					sizeof(struct vring_used_elem));
> -			} else {
> -				uint16_t size = vq->size - to;
> -
> -				rte_memcpy(&vq->used->ring[to],
> -					&vq->async_descs_split[from],
> -					size *
> -					sizeof(struct vring_used_elem));
> -				rte_memcpy(vq->used->ring,
> -					&vq->async_descs_split[from +
> -					size], (nr_copy - size) *
> -					sizeof(struct vring_used_elem));
> -			}
> -
> -			vq->last_async_desc_idx += nr_copy;
> -			vq->last_used_idx += nr_copy;
> -			nr_left -= nr_copy;
> -		} while (nr_left > 0);
> +		write_back_completed_descs_split(vq, n_descs);
>
>  		__atomic_add_fetch(&vq->used->idx, n_descs, __ATOMIC_RELEASE);
>  		vhost_vring_call_split(dev, vq);
>
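For reference, the loop being moved into write_back_completed_descs_split() has two independent wraparounds: the read index into the async descriptor ring and the write index into the used ring can each wrap within one drain. The standalone model below illustrates that; the struct, the field names, and the plain memcpy() are simplifications for illustration, not the actual vhost structures:

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8 /* power of two, as vhost requires */

struct vring_used_elem {
	uint32_t id;
	uint32_t len;
};

/* Simplified stand-in for the relevant vhost_virtqueue state. */
struct ring_state {
	struct vring_used_elem async_descs[RING_SIZE];
	struct vring_used_elem used_ring[RING_SIZE];
	uint16_t last_async_desc_idx; /* free-running read index */
	uint16_t last_used_idx;       /* free-running write index */
};

/*
 * Drain n_descs completed entries from async_descs into used_ring.
 * The outer loop splits the copy at the source wraparound; the inner
 * branch splits each chunk again if the destination wraps too.
 */
static void
write_back_completed_descs(struct ring_state *st, uint16_t n_descs)
{
	uint16_t nr_left = n_descs;

	do {
		uint16_t from = st->last_async_desc_idx & (RING_SIZE - 1);
		uint16_t nr_copy = (uint16_t)(nr_left + from <= RING_SIZE ?
			nr_left : RING_SIZE - from);
		uint16_t to = st->last_used_idx & (RING_SIZE - 1);

		if (to + nr_copy <= RING_SIZE) {
			memcpy(&st->used_ring[to], &st->async_descs[from],
				nr_copy * sizeof(struct vring_used_elem));
		} else {
			uint16_t size = RING_SIZE - to;

			memcpy(&st->used_ring[to], &st->async_descs[from],
				size * sizeof(struct vring_used_elem));
			memcpy(&st->used_ring[0], &st->async_descs[from + size],
				(nr_copy - size) * sizeof(struct vring_used_elem));
		}

		st->last_async_desc_idx += nr_copy;
		st->last_used_idx += nr_copy;
		nr_left -= nr_copy;
	} while (nr_left > 0);
}
```

Keeping this as a named helper, as the patch does, makes the double-wraparound invariant much easier to review in isolation than inline in rte_vhost_poll_enqueue_completed().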