patches for DPDK stable branches
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Ilya Maximets <i.maximets@samsung.com>
Cc: Jason Wang <jasowang@redhat.com>,
	Maxime Coquelin <maxime.coquelin@redhat.com>,
	dev@dpdk.org, jfreimann@redhat.com, tiwei.bie@intel.com,
	zhihong.wang@intel.com, stable@dpdk.org
Subject: Re: [dpdk-stable] [1/5] vhost: enforce avail index and desc read ordering
Date: Fri, 7 Dec 2018 10:44:53 -0500
Message-ID: <20181207104125-mutt-send-email-mst@kernel.org>
In-Reply-To: <07187c69-54b8-c5bd-9c02-a3f25e437a9a@samsung.com>

On Fri, Dec 07, 2018 at 05:58:24PM +0300, Ilya Maximets wrote:
> On 06.12.2018 16:48, Michael S. Tsirkin wrote:
> > On Thu, Dec 06, 2018 at 12:17:38PM +0800, Jason Wang wrote:
> >>
> >> On 2018/12/5 7:30 PM, Ilya Maximets wrote:
> >>> On 05.12.2018 12:49, Maxime Coquelin wrote:
> >>>> A read barrier is required to ensure the ordering between
> >>>> available index and the descriptor reads is enforced.
> >>>>
> >>>> Fixes: 4796ad63ba1f ("examples/vhost: import userspace vhost application")
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Reported-by: Jason Wang <jasowang@redhat.com>
> >>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>> ---
> >>>>   lib/librte_vhost/virtio_net.c | 12 ++++++++++++
> >>>>   1 file changed, 12 insertions(+)
> >>>>
> >>>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> >>>> index 5e1a1a727..f11ebb54f 100644
> >>>> --- a/lib/librte_vhost/virtio_net.c
> >>>> +++ b/lib/librte_vhost/virtio_net.c
> >>>> @@ -791,6 +791,12 @@ virtio_dev_rx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >>>>   	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
> >>>>   	avail_head = *((volatile uint16_t *)&vq->avail->idx);
> >>>> +	/*
> >>>> +	 * The ordering between avail index and
> >>>> +	 * desc reads needs to be enforced.
> >>>> +	 */
> >>>> +	rte_smp_rmb();
> >>>> +
> >>> Hmm. This looks weird to me.
> >>> Could you please describe the bad scenario here? (It'll be good to have it
> >>> in commit message too)
> >>>
> >>> As I understand, you're enforcing the read of avail->idx to happen before
> >>> reading the avail->ring[avail_idx]. Is it correct?
> >>>
> >>> But we have the following code sequence:
> >>>
> >>> 1. read avail->idx (avail_head).
> >>> 2. check that last_avail_idx != avail_head.
> >>> 3. read from the ring using last_avail_idx.
> >>>
> >>> So, there is a strict dependency between all 3 steps and the memory
> >>> transaction will be finished at the step #2 in any case. There is no
> >>> way to read the ring before reading the avail->idx.
> >>>
> >>> Am I missing something?
> >>
> >>
> >> Nope, I think I get what you mean now. And even if we go on to
> >>
> >> 4. read descriptor from descriptor ring using the id read from 3
> >>
> >> 5. read descriptor content according to the address from 4
> >>
> >> these are still dependent memory accesses, so there's no need for an rmb.
> > 
> > I am pretty sure that on some architectures a barrier is needed
> > here.  This is an execution (control) dependency, not a data dependency,
> > since avail_head is not used as an index. And reads can be speculated.  So
> > the read from the ring can be speculated and executed before the read of
> > avail_head and the check.
> > 
> > However SMP rmb is/should be free on x86.
> 
> rte_smp_rmb() turns into a compiler barrier on x86. And compiler barriers
> can be harmful too in some cases.
> 
> > So unless someone on this
> > thread is actually testing performance on non-x86, you are both wasting
> > cycles discussing removal of nop macros and also risk pushing untested
> > software on users.
> 
> Since DPDK supports more than just x86, we have to consider possible
> performance issues on other architectures. Given that this patch makes no
> difference on x86, the only thing we need to consider is stability and
> performance on non-x86 architectures. If we don't pay attention to things
> like this, vhost-user could become completely unusable on non-x86
> architectures someday.
> 
> It'll be cool if someone could test patches (autotest would be nice too) on
> ARM at least. But, unfortunately, testing of DPDK is still far from being
> ideal. And the lack of hardware is the main issue. I'm running vhost with
> qemu on my ARMv8 platform from time to time, but it's definitely not enough.
> And I cannot test every patch on the list.
> 
> However I made a few tests on ARMv8 and this patch shows no significant
> performance difference. But it makes the performance a bit more stable
> between runs, which is nice.

I'm sorry about being unclear. I think a barrier is required, so this
patch is good.  I was trying to say that splitting hairs trying to prove
that the barrier can be omitted without testing that omitting it gives a
performance benefit doesn't make sense. Since you observed that adding a
barrier actually helps performance stability, it's all good.
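
To make the ordering concrete, here is a minimal sketch in portable C11. It
is not the DPDK code: `struct avail_ring`, `RING_SIZE`, `avail_publish()` and
`avail_consume()` are hypothetical names made up for illustration, and
`atomic_thread_fence(memory_order_acquire)` only stands in for
rte_smp_rmb() as an analogy. The point it tries to show is that the check
on the index is only a control dependency, so the fence is what keeps the
ring reads from being satisfied ahead of the index read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size, chosen for the sketch */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "RING_SIZE must be a power of two");

/* Simplified, hypothetical stand-in for a split-ring avail structure. */
struct avail_ring {
    _Atomic uint16_t idx;     /* advanced by the producer */
    uint16_t ring[RING_SIZE]; /* descriptor head ids */
};

/* Producer side: fill the ring slot first, then publish the new index. */
static void avail_publish(struct avail_ring *a, uint16_t desc_head)
{
    uint16_t idx = atomic_load_explicit(&a->idx, memory_order_relaxed);

    a->ring[idx & (RING_SIZE - 1)] = desc_head;
    /* Release: the slot store becomes visible before the new index. */
    atomic_store_explicit(&a->idx, (uint16_t)(idx + 1),
                          memory_order_release);
}

/* Consumer side: copies up to max entries to out, returns the count. */
static uint16_t avail_consume(struct avail_ring *a, uint16_t *last_avail_idx,
                              uint16_t *out, uint16_t max)
{
    uint16_t avail_head = atomic_load_explicit(&a->idx,
                                               memory_order_relaxed);
    uint16_t n = 0;

    if (*last_avail_idx == avail_head)
        return 0;

    /*
     * Read barrier (assumed analogue of rte_smp_rmb()): the branch above
     * is only a control dependency, so without this fence the ring reads
     * below could be speculated ahead of the index read.
     */
    atomic_thread_fence(memory_order_acquire);

    while (*last_avail_idx != avail_head && n < max) {
        out[n++] = a->ring[*last_avail_idx & (RING_SIZE - 1)];
        (*last_avail_idx)++;
    }
    return n;
}
```

On x86 a fence like this would be expected to cost no more than a compiler
barrier, which matches the point made earlier in the thread; on weakly
ordered CPUs it is what actually orders the two reads.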


> > 
> > 
> >>
> >>>
> >>>>   	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
> >>>>   		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
> >>>>   		uint16_t nr_vec = 0;
> >>>> @@ -1373,6 +1379,12 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >>>>   	if (free_entries == 0)
> >>>>   		return 0;
> >>>> +	/*
> >>>> +	 * The ordering between avail index and
> >>>> +	 * desc reads needs to be enforced.
> >>>> +	 */
> >>>> +	rte_smp_rmb();
> >>>> +
> >>> This one is strange too.
> >>>
> >>> 	free_entries = *((volatile uint16_t *)&vq->avail->idx) -
> >>> 			vq->last_avail_idx;
> >>> 	if (free_entries == 0)
> >>> 		return 0;
> >>>
> >>> The code reads the value of avail->idx and uses that value on the next
> >>> line, regardless of compiler optimizations. There is no way for the CPU
> >>> to postpone the actual read.
> >>
> >>
> >> Yes.
> >>
> >> Thanks
> >>
> >>
> >>>
> >>>>   	VHOST_LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
> >>>>   	count = RTE_MIN(count, MAX_PKT_BURST);
> >>>>
> > 
> > 

Thread overview: 14+ messages
     [not found] <20181205094957.1938-1-maxime.coquelin@redhat.com>
2018-12-05  9:49 ` [dpdk-stable] [PATCH 1/5] " Maxime Coquelin
     [not found]   ` <CGME20181205113041eucas1p1943b9c13af2fb5b736ba4906b59a9cd5@eucas1p1.samsung.com>
2018-12-05 11:30     ` [dpdk-stable] [1/5] " Ilya Maximets
2018-12-06  4:17       ` Jason Wang
2018-12-06 12:48         ` Ilya Maximets
2018-12-06 13:25           ` Jason Wang
2018-12-06 13:48         ` Michael S. Tsirkin
2018-12-07 14:58           ` Ilya Maximets
2018-12-07 15:44             ` Michael S. Tsirkin [this message]
     [not found]   ` <CGME20181211103848eucas1p10c270ca8997fea8a2f55c2d94d02baea@eucas1p1.samsung.com>
2018-12-11 10:38     ` Ilya Maximets
2018-12-11 14:46       ` Maxime Coquelin
2018-12-05  9:49 ` [dpdk-stable] [PATCH 2/5] vhost: enforce desc flags and content " Maxime Coquelin
     [not found]   ` <CGME20181205133332eucas1p195b3864ed146403e314d7004d27be285@eucas1p1.samsung.com>
2018-12-05 13:33     ` [dpdk-stable] [2/5] " Ilya Maximets
2018-12-06  4:24       ` Jason Wang
2018-12-06 11:34         ` Ilya Maximets
