From: Ilya Maximets <i.maximets@samsung.com>
To: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>,
dev@dpdk.org, jfreimann@redhat.com, tiwei.bie@intel.com,
zhihong.wang@intel.com, stable@dpdk.org
Subject: Re: [dpdk-stable] [1/5] vhost: enforce avail index and desc read ordering
Date: Fri, 7 Dec 2018 17:58:24 +0300
Message-ID: <07187c69-54b8-c5bd-9c02-a3f25e437a9a@samsung.com>
In-Reply-To: <20181206083048-mutt-send-email-mst@kernel.org>
On 06.12.2018 16:48, Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2018 at 12:17:38PM +0800, Jason Wang wrote:
>>
>> On 2018/12/5 7:30 PM, Ilya Maximets wrote:
>>> On 05.12.2018 12:49, Maxime Coquelin wrote:
>>>> A read barrier is required to ensure the ordering between
>>>> available index and the descriptor reads is enforced.
>>>>
>>>> Fixes: 4796ad63ba1f ("examples/vhost: import userspace vhost application")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Reported-by: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> ---
>>>> lib/librte_vhost/virtio_net.c | 12 ++++++++++++
>>>> 1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>>>> index 5e1a1a727..f11ebb54f 100644
>>>> --- a/lib/librte_vhost/virtio_net.c
>>>> +++ b/lib/librte_vhost/virtio_net.c
>>>> @@ -791,6 +791,12 @@ virtio_dev_rx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>>> rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
>>>> avail_head = *((volatile uint16_t *)&vq->avail->idx);
>>>> + /*
>>>> + * The ordering between avail index and
>>>> + * desc reads needs to be enforced.
>>>> + */
>>>> + rte_smp_rmb();
>>>> +
>>> Hmm. This looks weird to me.
>>> Could you please describe the bad scenario here? (It'll be good to have it
>>> in commit message too)
>>>
>>> As I understand, you're enforcing the read of avail->idx to happen before
>>> reading the avail->ring[avail_idx]. Is it correct?
>>>
>>> But we have following code sequence:
>>>
>>> 1. read avail->idx (avail_head).
>>> 2. check that last_avail_idx != avail_head.
>>> 3. read from the ring using last_avail_idx.
>>>
>>> So, there is a strict dependency between all 3 steps and the memory
>>> transaction will be finished at the step #2 in any case. There is no
>>> way to read the ring before reading the avail->idx.
>>>
>>> Am I missing something?
>>
>>
>> Nope, I kind of get what you mean now. And even if we will
>>
>> 4. read descriptor from descriptor ring using the id read from 3
>>
>> 5. read descriptor content according to the address from 4
>>
>> They still have dependent memory access. So there's no need for rmb.
>
> I am pretty sure on some architectures there is a need for a barrier
> here. This is an execution dependency since avail_head is not used as an
> index. And reads can be speculated. So the read from the ring can be
> speculated and execute before the read of avail_head and the check.
>
> However SMP rmb is/should be free on x86.
rte_smp_rmb() turns into a compiler barrier on x86. And compiler barriers
can be harmful too in some cases.
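To make the sequence under discussion concrete, here is a minimal sketch of
the consumer side in portable C11, not the actual DPDK code. The names
avail_ring, RING_SIZE and consume_one() are made up for illustration, and
atomic_thread_fence(memory_order_acquire) is only a stand-in for
rte_smp_rmb(). On x86 such a fence typically costs no instruction and acts
as a compiler barrier; on ARMv8 it typically emits a load barrier.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical size; must be a power of two */

/* Hypothetical, simplified stand-in for the avail ring of a split virtqueue. */
struct avail_ring {
	_Atomic uint16_t idx;     /* published by the producer */
	uint16_t ring[RING_SIZE]; /* descriptor ids written before idx */
};

/*
 * Try to take one entry. Returns 1 and stores the descriptor id in *out
 * if an entry is available, 0 otherwise.
 */
static int
consume_one(struct avail_ring *avail, uint16_t *last_avail_idx, uint16_t *out)
{
	/* 1. Read avail->idx. */
	uint16_t avail_head =
		atomic_load_explicit(&avail->idx, memory_order_relaxed);

	/* 2. Check that last_avail_idx != avail_head. */
	if (*last_avail_idx == avail_head)
		return 0;

	/*
	 * The branch above is only a control dependency: avail_head is not
	 * used to compute the address read below, so a weakly ordered CPU
	 * may speculate the ring read. The fence orders it after step 1.
	 */
	atomic_thread_fence(memory_order_acquire);

	/* 3. Read from the ring using last_avail_idx. */
	*out = avail->ring[*last_avail_idx & (RING_SIZE - 1)];
	(*last_avail_idx)++;
	return 1;
}
```

The fence sits after the emptiness check here, so the empty-ring path pays
nothing for it. The producer is assumed to publish idx with release
semantics after filling the ring.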
> So unless someone on this
> thread is actually testing performance on non-x86, you are both wasting
> cycles discussing removal of nop macros and also risk pushing untested
> software on users.
Since DPDK supports more than just x86, we have to consider possible performance
issues on other architectures. Given that this patch changes essentially nothing
on x86, the only things we need to consider are stability and performance
on non-x86 architectures. If we don't pay attention to things like this,
vhost-user could become completely unusable on non-x86 architectures someday.
It would be great if someone could test patches (automated tests would be nice
too) on ARM at least. Unfortunately, DPDK testing is still far from
ideal, and the lack of hardware is the main issue. I run vhost with
qemu on my ARMv8 platform from time to time, but that's definitely not enough,
and I cannot test every patch on the list.
However, I ran a few tests on ARMv8, and this patch shows no significant
performance difference. It does make the performance a bit more stable
between runs, which is nice.
>
>
>>
>>>
>>>> for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
>>>> uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
>>>> uint16_t nr_vec = 0;
>>>> @@ -1373,6 +1379,12 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>>> if (free_entries == 0)
>>>> return 0;
>>>> + /*
>>>> + * The ordering between avail index and
>>>> + * desc reads needs to be enforced.
>>>> + */
>>>> + rte_smp_rmb();
>>>> +
>>> This one is strange too.
>>>
>>> free_entries = *((volatile uint16_t *)&vq->avail->idx) -
>>> vq->last_avail_idx;
>>> if (free_entries == 0)
>>> return 0;
>>>
>>> The code reads the value of avail->idx and uses the value on the next
>>> line even with any compiler optimizations. There is no way for CPU to
>>> postpone the actual read.
>>
>>
>> Yes.
>>
>> Thanks
>>
>>
>>>
>>>> VHOST_LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
>>>> count = RTE_MIN(count, MAX_PKT_BURST);
>>>>
>
>