From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
	by dpdk.org (Postfix) with ESMTP id DD8766833;
	Thu, 6 Dec 2018 14:48:21 +0100 (CET)
Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 0DD7B72D8A;
	Thu, 6 Dec 2018 13:48:21 +0000 (UTC)
Received: from redhat.com (ovpn-121-223.rdu2.redhat.com [10.10.121.223])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 97BFF5D967;
	Thu, 6 Dec 2018 13:48:14 +0000 (UTC)
Date: Thu, 6 Dec 2018 08:48:14 -0500
From: "Michael S. Tsirkin"
To: Jason Wang
Cc: Ilya Maximets, Maxime Coquelin, dev@dpdk.org, jfreimann@redhat.com,
	tiwei.bie@intel.com, zhihong.wang@intel.com, stable@dpdk.org
Message-ID: <20181206083048-mutt-send-email-mst@kernel.org>
References: <20181205094957.1938-2-maxime.coquelin@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.38]); Thu, 06 Dec 2018 13:48:21 +0000 (UTC)
Subject: Re: [dpdk-dev] [1/5] vhost: enforce avail index and desc read ordering
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
X-List-Received-Date: Thu, 06 Dec 2018 13:48:22 -0000

On Thu, Dec 06, 2018 at 12:17:38PM +0800, Jason Wang wrote:
>
> On 2018/12/5 7:30 PM, Ilya Maximets wrote:
> > On 05.12.2018 12:49, Maxime Coquelin wrote:
> > > A read barrier is required to ensure the ordering between
> > > available index and the descriptor reads is enforced.
> > >
> > > Fixes: 4796ad63ba1f ("examples/vhost: import userspace vhost application")
> > > Cc: stable@dpdk.org
> > >
> > > Reported-by: Jason Wang
> > > Signed-off-by: Maxime Coquelin
> > > ---
> > >  lib/librte_vhost/virtio_net.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > > index 5e1a1a727..f11ebb54f 100644
> > > --- a/lib/librte_vhost/virtio_net.c
> > > +++ b/lib/librte_vhost/virtio_net.c
> > > @@ -791,6 +791,12 @@ virtio_dev_rx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > >  	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
> > >  	avail_head = *((volatile uint16_t *)&vq->avail->idx);
> > > +	/*
> > > +	 * The ordering between avail index and
> > > +	 * desc reads needs to be enforced.
> > > +	 */
> > > +	rte_smp_rmb();
> > > +
> > Hmm. This looks weird to me.
> > Could you please describe the bad scenario here? (It'll be good to have it
> > in commit message too)
> >
> > As I understand, you're enforcing the read of avail->idx to happen before
> > reading the avail->ring[avail_idx]. Is it correct?
> >
> > But we have following code sequence:
> >
> > 1. read avail->idx (avail_head).
> > 2. check that last_avail_idx != avail_head.
> > 3. read from the ring using last_avail_idx.
> >
> > So, there is a strict dependency between all 3 steps and the memory
> > transaction will be finished at the step #2 in any case. There is no
> > way to read the ring before reading the avail->idx.
> >
> > Am I missing something?
>
>
> Nope, I kind of get what you meaning now. And even if we will
>
> 4. read descriptor from descriptor ring using the id read from 3
>
> 5. read descriptor content according to the address from 4
>
> They still have dependent memory access. So there's no need for rmb.

I am pretty sure on some architectures there is a need for a barrier
here. This is an execution dependency since avail_head is not used as an
index.
And reads can be speculated. So the read from the ring can be
speculated and execute before the read of avail_head and the check.

However SMP rmb is/should be free on x86. So unless someone on this
thread is actually testing performance on non-x86, you are both wasting
cycles discussing removal of nop macros and also risk pushing untested
software on users.

> > >
> > >  	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
> > >  		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
> > >  		uint16_t nr_vec = 0;
> > > @@ -1373,6 +1379,12 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > >  	if (free_entries == 0)
> > >  		return 0;
> > > +	/*
> > > +	 * The ordering between avail index and
> > > +	 * desc reads needs to be enforced.
> > > +	 */
> > > +	rte_smp_rmb();
> > > +
> > This one is strange too.
> >
> > 	free_entries = *((volatile uint16_t *)&vq->avail->idx) -
> > 			vq->last_avail_idx;
> > 	if (free_entries == 0)
> > 		return 0;
> >
> > The code reads the value of avail->idx and uses the value on the next
> > line even with any compiler optimizations. There is no way for CPU to
> > postpone the actual read.
>
>
> Yes.
>
> Thanks
>
>
> >
> > >  	VHOST_LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
> > >  	count = RTE_MIN(count, MAX_PKT_BURST);
> > >
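
For illustration only, the ordering argument made in this reply can be
sketched in portable C11. This is not the DPDK code: struct toy_avail,
toy_publish() and toy_consume() are hypothetical names, and
atomic_thread_fence() is assumed here as a stand-in for the write and
read barriers (rte_smp_rmb() on the read side in the patch). The point
it tries to show is that the ring slot address is computed from
last_avail_idx, not from avail_head, so the check in step 2 is only a
control dependency and a weakly ordered CPU could speculate the ring
load ahead of the index load unless a read barrier sits in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* power of two, so "& (RING_SIZE - 1)" wraps the index */

/* Hypothetical, simplified stand-in for the split-ring avail area. */
struct toy_avail {
	_Atomic uint16_t idx;     /* advanced by the producer (guest driver) */
	uint16_t ring[RING_SIZE]; /* descriptor heads published by the producer */
};

/* Producer side: fill the ring slot first, then publish the new index. */
static void toy_publish(struct toy_avail *avail, uint16_t desc_head)
{
	uint16_t idx = atomic_load_explicit(&avail->idx, memory_order_relaxed);

	avail->ring[idx & (RING_SIZE - 1)] = desc_head;
	/* Write barrier: the slot contents must be visible before the index. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&avail->idx, (uint16_t)(idx + 1),
			      memory_order_relaxed);
}

/*
 * Consumer side: returns 1 and stores the next descriptor head in *out,
 * or returns 0 when nothing new is available.
 */
static int toy_consume(struct toy_avail *avail, uint16_t *last_avail_idx,
		       uint16_t *out)
{
	assert(avail && last_avail_idx && out);

	/* Step 1: read avail->idx. */
	uint16_t avail_head = atomic_load_explicit(&avail->idx,
						   memory_order_relaxed);

	/* Step 2: this check is only a control dependency on avail_head. */
	if (*last_avail_idx == avail_head)
		return 0;

	/*
	 * Read barrier, standing in for rte_smp_rmb() from the patch.
	 * Without it, a weakly ordered CPU may satisfy the ring load
	 * below before the avail->idx load above.
	 */
	atomic_thread_fence(memory_order_acquire);

	/* Step 3: the address depends on last_avail_idx, not on avail_head. */
	*out = avail->ring[*last_avail_idx & (RING_SIZE - 1)];
	(*last_avail_idx)++;
	return 1;
}
```

A caller would zero-initialise a struct toy_avail, call toy_publish()
from the producer side, and poll toy_consume() with its own
last_avail_idx. The sketch places the fence after the emptiness check,
while the patch places it right after the index read in the rx path;
both positions order the index load before the ring load.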