From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Xie, Huawei" <huawei.xie@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>
Subject: Re: [dpdk-dev] virtio optimization idea
Date: Thu, 10 Sep 2015 10:20:07 +0300 [thread overview]
Message-ID: <20150910101319-mutt-send-email-mst@redhat.com> (raw)
In-Reply-To: <C37D651A908B024F974696C65296B57B2BDC4012@SHSMSX101.ccr.corp.intel.com>
On Thu, Sep 10, 2015 at 06:32:35AM +0000, Xie, Huawei wrote:
> On 9/9/2015 3:34 PM, Michael S. Tsirkin wrote:
> > On Fri, Sep 04, 2015 at 08:25:05AM +0000, Xie, Huawei wrote:
> >> Hi:
> >>
> >> Recently I have done one virtio optimization proof of concept. The
> >> optimization includes two parts:
> >> 1) avail ring set with fixed descriptors
> >> 2) RX vectorization
> >> With the optimizations, we could have several times of performance boost
> >> for purely vhost-virtio throughput.
> > Thanks!
> > I'm very happy to see people work on the virtio ring format
> > optimizations.
> >
> > I think it's best to analyze each optimization separately,
> > unless you see a reason why they would only give benefit when applied
> > together.
>
> Agree. Will split the patch to see each change's benefit. Of course it
> is also very common two give much more gain than the sum of individual.
>
> >
> > Also ideally, we'd need a unit test to show the performance impact.
> > We've been using the tests in tools/virtio/ under linux,
> > feel free to enhance these to simulate more workloads, or
> > to suggest something else entirely.
> If possible, we would provide perf test case under tools/virtio.
> I am interested if the optimization could help kernel virtio-net driver
> performance, if not the other is the bottleneck.
> >
> > Avail ring is initialized with fixed descriptor and is never changed,
> > i.e, the index value of the nth avail ring entry is always n, which
> > means virtio PMD is actually refilling desc ring only, without having to
> > change avail ring.
> > When vhost fetches avail ring, if not evicted, it is always in its first
> > level cache.
> >
> > When RX receives packets from used ring, we use the used->idx as the
> > desc idx. This requires that vhost processes and returns descs from
> > avail ring to used ring in order, which is true for both current dpdk
> > vhost and kernel vhost implementation. In my understanding, there is no
> > necessity for vhost net to process descriptors OOO. One case could be
> > zero copy, for example, if one descriptor doesn't meet zero copy
> > requirment, we could directly return it to used ring, earlier than the
> > descriptors in front of it.
> > To enforce this, i want to use a reserved bit to indicate in order
> > processing of descriptors.
> > So what's the point in changing the idx for the used ring?
> > You need to communicate the length to the guest anyway, don't you?
> For guest virtio driver, we only use the length field in the entry of
> the used ring and don't use the index in the entry. Instead, use
> used->idx & 255 as the desc idx for RX, and used->idx & 127 as the desc
> idx for TX.
OK but length and index are in the same cache line.
As long as we read the length, does it make sense
to skip reading the index?
> For vhost driver, as it couldn't assume fixed ring layout(to support
> legacy virtio), it needs to update the idx in the used ring entry.
> >
> >> For tx ring, the arrangement is like below. Each transmitted mbuf needs
> >> a desc for virtio_net_hdr, so actually we have only 128 free slots.
> > Just fix this one. Support ANY_LAYOUT and then you can put data
> > linearly. And/or support INDIRECT_DESC and then you can
> > use an indirect descriptor.
> Would check those two features.
> >
> >>
> >>
> > This one came out corrupted.
> Actually i ever replied to the original mail and fixed it. Copy it here
> again.
>
> ++
> ||
> ||
> +-----+-----+-----+--------------+------+------+------+
> | 0 | 1 | ... | 127 || 128 | 129 | ... | 255 | avail ring
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> | | | || | | |
> v v v || v v v
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> | 127 | 128 | ... | 255 || 127 | 128 | ... | 255 | desc ring for virtio_net_hdr
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> | | | || | | |
> v v v || v v v
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> | 0 | 1 | ... | 127 || 0 | 1 | ... | 127 | desc ring for tx dat
>
>
> >>
> >> /huawei
> >
> > Please Cc virtio related discussion more widely.
> > I added the virtualization mailing list.
> >
> >
> > So what you want to do is avoid changing the avail
> > ring, isn't it enough to pre-format it and cache
> > the values in the guest?
> >
> > Host can then keep using avail ring without changes, it will stay in cache.
> > Something like the below for guest should do the trick (untested):
>
> Good optimization to tackle the cache line transfer issue.
> Your change could avoid changing avail ring if it points to same head
> index. It could improve kernel virtio-net driver's performance if the
> avail ring doesn't change in running.
> We would also investigating applying this idea to DPDK virtio PMD slow path.
> For the optimization i did, which i call fast path, the driver "knows"
> in theory the avail ring willn't never get changed, so it doesn't a)
> allocate and free descriptors b) care the avail ring.
> Based on the fact the DPDK pmd is actually using the simple desc ring
> below, we could directly map avail->idx and used->idx to desc idx, and
> use vector instruction to do parallel processing.
>
> +-+--+-+--+-+-+---------+---+--+---+
> | 0 | 1 | 2 | ... | 254 | 255 | rx desc ring
> +----+----+---+-------------+------+
Yes, using these instructions in kernel is generally problematic,
but can work in userspace. Waiting for a description of that.
>
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 096b857..9363b50 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -91,6 +91,7 @@ struct vring_virtqueue {
> > bool last_add_time_valid;
> > ktime_t last_add_time;
> > #endif
> > + u16 *avail;
> >
> > /* Tokens for callbacks. */
> > void *data[];
> > @@ -236,7 +237,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > /* Put entry in available array (but don't update avail->idx until they
> > * do sync). */
> > avail = virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) & (vq->vring.num - 1);
> > - vq->vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
> > + if (vq->avail[avail] != head) {
> > + vq->avail[avail] = head;
> > + vq->vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
> > + }
> >
> > /* Descriptors and available array need to be set before we expose the
> > * new available array entries. */
> > @@ -724,6 +728,11 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> > vq = kmalloc(sizeof(*vq) + sizeof(void *)*num, GFP_KERNEL);
> > if (!vq)
> > return NULL;
> > + vq->avail = kzalloc(sizeof (*vq->avail) * num, GFP_KERNEL);
> > + if (!va->avail) {
> > + kfree(vq);
> > + return NULL;
> > + }
> >
> > vring_init(&vq->vring, num, pages, vring_align);
> > vq->vq.callback = callback;
> >
>
next prev parent reply other threads:[~2015-09-10 7:20 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-04 8:25 Xie, Huawei
2015-09-04 16:50 ` Xie, Huawei
2015-09-08 8:21 ` Tetsuya Mukawa
2015-09-08 9:42 ` Xie, Huawei
2015-09-08 15:39 ` Stephen Hemminger
2015-09-08 15:52 ` Xie, Huawei
2015-09-17 15:41 ` Xie, Huawei
2015-09-09 7:33 ` Michael S. Tsirkin
2015-09-10 6:32 ` Xie, Huawei
2015-09-10 7:20 ` Michael S. Tsirkin [this message]
2015-09-14 3:08 ` Xie, Huawei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150910101319-mutt-send-email-mst@redhat.com \
--to=mst@redhat.com \
--cc=dev@dpdk.org \
--cc=huawei.xie@intel.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).