From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by dpdk.org (Postfix) with ESMTP id C24665F72; Thu, 6 Dec 2018 14:26:12 +0100 (CET) Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C359A300294A; Thu, 6 Dec 2018 13:26:11 +0000 (UTC) Received: from [10.72.12.143] (ovpn-12-143.pek2.redhat.com [10.72.12.143]) by smtp.corp.redhat.com (Postfix) with ESMTPS id CC42328DF0; Thu, 6 Dec 2018 13:25:59 +0000 (UTC) To: Ilya Maximets , Maxime Coquelin , dev@dpdk.org, jfreimann@redhat.com, tiwei.bie@intel.com, zhihong.wang@intel.com Cc: stable@dpdk.org, "Michael S. Tsirkin" References: <20181205094957.1938-2-maxime.coquelin@redhat.com> From: Jason Wang Message-ID: Date: Thu, 6 Dec 2018 21:25:56 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.43]); Thu, 06 Dec 2018 13:26:11 +0000 (UTC) Subject: Re: [dpdk-dev] [1/5] vhost: enforce avail index and desc read ordering X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2018 13:26:13 -0000 On 2018/12/6 下午8:48, Ilya Maximets wrote: > On 06.12.2018 7:17, Jason Wang wrote: >> On 2018/12/5 下午7:30, Ilya Maximets wrote: >>> On 05.12.2018 12:49, Maxime Coquelin wrote: >>>> A read barrier is required to ensure the ordering between >>>> available index and the descriptor reads is enforced. >>>> >>>> Fixes: 4796ad63ba1f ("examples/vhost: import userspace vhost application") >>>> Cc: stable@dpdk.org >>>> >>>> Reported-by: Jason Wang >>>> Signed-off-by: Maxime Coquelin >>>> --- >>>>   lib/librte_vhost/virtio_net.c | 12 ++++++++++++ >>>>   1 file changed, 12 insertions(+) >>>> >>>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c >>>> index 5e1a1a727..f11ebb54f 100644 >>>> --- a/lib/librte_vhost/virtio_net.c >>>> +++ b/lib/librte_vhost/virtio_net.c >>>> @@ -791,6 +791,12 @@ virtio_dev_rx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, >>>>       rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]); >>>>       avail_head = *((volatile uint16_t *)&vq->avail->idx); >>>>   +    /* >>>> +     * The ordering between avail index and >>>> +     * desc reads needs to be enforced. >>>> +     */ >>>> +    rte_smp_rmb(); >>>> + >>> Hmm. This looks weird to me. >>> Could you please describe the bad scenario here? (It'll be good to have it >>> in commit message too) >>> >>> As I understand, you're enforcing the read of avail->idx to happen before >>> reading the avail->ring[avail_idx]. Is it correct? >>> >>> But we have following code sequence: >>> >>> 1. read avail->idx (avail_head). >>> 2. check that last_avail_idx != avail_head. >>> 3. read from the ring using last_avail_idx. >>> >>> So, there is a strict dependency between all 3 steps and the memory >>> transaction will be finished at the step #2 in any case. There is no >>> way to read the ring before reading the avail->idx. >>> >>> Am I missing something? >> >> Nope, I kind of get what you meaning now. And even if we will >> >> 4. read descriptor from descriptor ring using the id read from 3 >> >> 5. read descriptor content according to the address from 4 >> >> They still have dependent memory access. So there's no need for rmb. >> > On a second glance I changed my mind. > The code looks like this: > > 1. read avail_head = avail->idx > 2. read cur_idx = last_avail_idx > if (cur_idx != avail_head) { > 3. read idx = avail->ring[cur_idx] > 4. read desc[idx] > } > > There is an address (data) dependency: 2 -> 3 -> 4. > These reads could not be reordered. > > But it's only control dependency between 1 and (3, 4), because 'avail_head' > is not used to calculate 'cur_idx'. In case of aggressive speculative > execution, 1 could be reordered with 3 resulting with reading of not yet > updated 'idx'. > > Not sure if speculative execution could go so far while 'avail_head' is not > read yet, but it's should be possible in theory. > > Thoughts ? I think I change my mind as well, this is similar to the discussion of desc_is_avail(). So I think it's possible. > >>>>       for (pkt_idx = 0; pkt_idx < count; pkt_idx++) { >>>>           uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen; >>>>           uint16_t nr_vec = 0; >>>> @@ -1373,6 +1379,12 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, >>>>       if (free_entries == 0) >>>>           return 0; >>>>   +    /* >>>> +     * The ordering between avail index and >>>> +     * desc reads needs to be enforced. >>>> +     */ >>>> +    rte_smp_rmb(); >>>> + >>> This one is strange too. >>> >>>     free_entries = *((volatile uint16_t *)&vq->avail->idx) - >>>             vq->last_avail_idx; >>>     if (free_entries == 0) >>>         return 0; >>> >>> The code reads the value of avail->idx and uses the value on the next >>> line even with any compiler optimizations. There is no way for CPU to >>> postpone the actual read. >> >> Yes. >> > It's kind of similar situation here, but 'avail_head' is involved somehow > in 'cur_idx' calculation because of > fill_vec_buf_split(..., vq->last_avail_idx + i, ...) > And 'i' depends on 'free_entries'. I agree it depends on compiler,  it can choose to remove such data dependency. > But we need to look at the exact asm > code to be sure. I think it's probably hard to get a conclusion by checking asm code generated by one specific version or kind of a compiler > I think, we may add barrier here to avoid possible issues. Yes. Thanks. > >> Thanks >> >> >>>>       VHOST_LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__); >>>>         count = RTE_MIN(count, MAX_PKT_BURST); >>>> >>