DPDK patches and discussions
From: "Xie, Huawei" <huawei.xie@intel.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"virtualization@lists.linux-foundation.org"
	<virtualization@lists.linux-foundation.org>,
	"ms >> Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [dpdk-dev] virtio optimization idea
Date: Thu, 17 Sep 2015 15:41:36 +0000
Message-ID: <C37D651A908B024F974696C65296B57B40F1D8C1@SHSMSX101.ccr.corp.intel.com>
In-Reply-To: <C37D651A908B024F974696C65296B57B2BDC0872@SHSMSX101.ccr.corp.intel.com>

On 9/8/2015 11:54 PM, Xie, Huawei wrote:
> On 9/8/2015 11:39 PM, Stephen Hemminger wrote:
>> On Fri, 4 Sep 2015 08:25:05 +0000
>> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>>
>>> Hi:
>>>
>>> Recently I did a proof of concept of a virtio optimization. The
>>> optimization has two parts:
>>> 1) avail ring set with fixed descriptors
>>> 2) RX vectorization
>>> With these optimizations, we could get a several-fold performance boost
>>> for pure vhost-virtio throughput.
>>>
>>> Here I will only cover the first part, which is the prerequisite for the
>>> second part.
>>> Let us first take RX as an example. Currently, when we fill the avail
>>> ring with guest mbufs, we need to
>>> a) allocate one descriptor (for a non-sg mbuf) from the free descriptors
>>> b) set the idx of the desc into the entry of the avail ring
>>> c) set the addr/len fields of the descriptor to point to the blank guest
>>> mbuf data area
>>>
>>> Those operations take time, and step b in particular leaves the cache
>>> line for the avail ring in modified (M) state on the virtio processing
>>> core. When vhost processes the avail ring, the cache line transfer from
>>> the virtio processing core to the vhost processing core costs a
>>> significant number of CPU cycles.
>>> To solve this problem, this is the arrangement of the RX ring for the
>>> DPDK PMD (for the non-mergeable case).
>>>    
>>>                     avail                      
>>>                     idx                        
>>>                     +                          
>>>                     |                          
>>> +----+----+---+-------------+------+           
>>> | 0  | 1  | 2 | ... |  254  | 255  |  avail ring
>>> +-+--+-+--+-+-+---------+---+--+---+           
>>>   |    |    |       |   |      |               
>>>   |    |    |       |   |      |               
>>>   v    v    v       |   v      v               
>>> +-+--+-+--+-+-+---------+---+--+---+           
>>> | 0  | 1  | 2 | ... |  254  | 255  |  desc ring
>>> +----+----+---+-------------+------+           
>>>                     |                          
>>>                     |                          
>>> +----+----+---+-------------+------+           
>>> | 0  | 1  | 2 |     |  254  | 255  |  used ring
>>> +----+----+---+-------------+------+           
>>>                     |                          
>>>                     +    
>>> The avail ring is initialized with fixed descriptors and is never
>>> changed, i.e., the index value of the nth avail ring entry is always n,
>>> which means the virtio PMD actually refills only the desc ring, without
>>> having to change the avail ring.
>>> When vhost fetches the avail ring, it is always in vhost's first level
>>> cache unless it has been evicted.
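
To make the fixed arrangement concrete, here is a minimal sketch in C of
the one-time avail ring setup and the refill path that follows from it.
The structure layouts follow the split-ring format of the virtio spec, but
the function names (rx_ring_init_fixed, rx_refill_fixed), the buf_addrs
parameter, and the fixed RING_SIZE of 256 are illustrative assumptions
and not the actual DPDK PMD code.

```c
#include <stdint.h>

#define RING_SIZE 256u            /* matches the 256-entry ring in the diagram */
#define VRING_DESC_F_WRITE 2      /* device (vhost) writes into this buffer */

/* Simplified split-ring structures; not the PMD's real definitions. */
struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

struct vring_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[RING_SIZE];
};

/* One-time setup: entry n of the avail ring always names descriptor n. */
static void rx_ring_init_fixed(struct vring_avail *avail)
{
    for (uint16_t i = 0; i < RING_SIZE; i++)
        avail->ring[i] = i;
    avail->flags = 0;
    avail->idx = 0;
}

/*
 * Refill 'count' RX buffers (hypothetical helper). Only the descriptor
 * table and avail->idx are written; avail->ring[] is never touched, so
 * its cache lines are not dirtied by the refill.
 */
static void rx_refill_fixed(struct vring_desc *desc, struct vring_avail *avail,
                            const uint64_t *buf_addrs, uint32_t buf_len,
                            uint16_t count)
{
    for (uint16_t i = 0; i < count; i++) {
        uint16_t slot = (uint16_t)((avail->idx + i) & (RING_SIZE - 1));

        desc[slot].addr = buf_addrs[i];
        desc[slot].len = buf_len;
        desc[slot].flags = VRING_DESC_F_WRITE;
        desc[slot].next = 0;
    }
    /* A real driver needs a write barrier here before publishing idx. */
    avail->idx = (uint16_t)(avail->idx + count);
}
```

Note that avail->idx still has to be updated on every refill; what the
fixed layout removes is the per-buffer write to avail->ring[].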
>>>
>>> When RX receives packets from the used ring, we use used->idx as the
>>> desc idx. This requires that vhost processes and returns descs from
>>> the avail ring to the used ring in order, which is true for both the
>>> current dpdk vhost and kernel vhost implementations. In my
>>> understanding, there is no necessity for vhost net to process
>>> descriptors out of order (OOO). One case could be zero copy: for
>>> example, if one descriptor doesn't meet the zero copy requirement, we
>>> could directly return it to the used ring, earlier than the
>>> descriptors in front of it.
>>> To enforce this, I want to use a reserved bit to indicate in-order
>>> processing of descriptors.
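
The in-order assumption on the receive side can be sketched as follows.
This is only an illustration: rx_harvest_in_order and its parameters are
hypothetical names, and the id check is a debugging aid added here to show
what "in order" means. A PMD relying on a negotiated in-order bit would
not need to read the id field at all.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u

struct vring_used_elem {
    uint32_t id;    /* head descriptor index written by vhost */
    uint32_t len;   /* number of bytes vhost wrote into the buffer */
};

struct vring_used {
    uint16_t flags;
    uint16_t idx;   /* free-running producer index, owned by vhost */
    struct vring_used_elem ring[RING_SIZE];
};

/*
 * Harvest up to 'max' completed RX buffers. With a fixed avail ring and
 * in-order completion, the descriptor index is simply the position in
 * the used ring. Returns the number of entries harvested; clears
 * *in_order and stops at the first entry whose id does not match its
 * position.
 */
static uint16_t rx_harvest_in_order(const struct vring_used *used,
                                    uint16_t *last_used,
                                    uint16_t *slots, uint32_t *lens,
                                    uint16_t max, bool *in_order)
{
    uint16_t n = 0;

    *in_order = true;
    /* A real driver needs a read barrier after loading used->idx. */
    while (n < max && *last_used != used->idx) {
        uint16_t slot = (uint16_t)(*last_used & (RING_SIZE - 1));

        if (used->ring[slot].id != slot) {
            *in_order = false;   /* vhost returned a desc out of order */
            break;
        }
        slots[n] = slot;
        lens[n] = used->ring[slot].len;
        n++;
        (*last_used)++;
    }
    return n;
}
```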
>>>
>>> For the TX ring, the arrangement is as shown below. Each transmitted
>>> mbuf needs a desc for virtio_net_hdr, so we actually have only 128
>>> free slots.
>>>
>>>                            ++
>>>                            ||
>>>                            ||
>>>   +-----+-----+-----+--------------+------+------+------+
>>>   |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
>>>   +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>>      |     |            |  ||  |      |             |
>>>      v     v            v  ||  v      v             v
>>>   +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>>   | 127 | 128 | ... |  255 || 127  | 128  | ...  | 255  |   desc ring for virtio_net_hdr
>>>   +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>>      |     |            |  ||  |      |             |
>>>      v     v            v  ||  v      v             v
>>>   +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>>   |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
>>>
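
A rough sketch of how the TX arrangement could be set up is given below.
It assumes that the 128 virtio_net_hdr descriptors occupy the upper half
of the descriptor table (indexes 128..255) and that each one chains to the
data descriptor in the lower half (indexes 0..127). The names
tx_ring_init_fixed and tx_enqueue_fixed, and the hdr_base_addr and hdr_len
parameters, are hypothetical and used only for illustration.

```c
#include <stdint.h>

#define RING_SIZE 256u
#define TX_SLOTS  (RING_SIZE / 2)   /* 128 usable TX slots */
#define HDR_BASE  TX_SLOTS          /* assumption: header descs are 128..255 */
#define VRING_DESC_F_NEXT 1         /* descriptor chains to desc[next] */

struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

struct vring_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[RING_SIZE];
};

/* One-time setup of the header chains and of the fixed avail ring. */
static void tx_ring_init_fixed(struct vring_desc *desc,
                               struct vring_avail *avail,
                               uint64_t hdr_base_addr, uint32_t hdr_len)
{
    for (uint16_t i = 0; i < TX_SLOTS; i++) {
        uint16_t hdr = (uint16_t)(HDR_BASE + i);

        /* Header descriptor: fixed buffer, always chained to data desc i. */
        desc[hdr].addr = hdr_base_addr + (uint64_t)i * hdr_len;
        desc[hdr].len = hdr_len;
        desc[hdr].flags = VRING_DESC_F_NEXT;
        desc[hdr].next = i;

        /* Data descriptor: addr/len are filled in at transmit time. */
        desc[i].addr = 0;
        desc[i].len = 0;
        desc[i].flags = 0;
        desc[i].next = 0;
    }
    /* Both halves of the avail ring point at the same 128 header descs. */
    for (uint16_t i = 0; i < RING_SIZE; i++)
        avail->ring[i] = (uint16_t)(HDR_BASE + (i & (TX_SLOTS - 1)));
    avail->flags = 0;
    avail->idx = 0;
}

/*
 * Queue one packet. Only the data descriptor and avail->idx are written.
 * The caller must make sure fewer than TX_SLOTS packets are in flight.
 */
static void tx_enqueue_fixed(struct vring_desc *desc,
                             struct vring_avail *avail,
                             uint64_t data_addr, uint32_t data_len)
{
    uint16_t data = (uint16_t)(avail->idx & (TX_SLOTS - 1));

    desc[data].addr = data_addr;
    desc[data].len = data_len;
    /* A real driver needs a write barrier here before publishing idx. */
    avail->idx = (uint16_t)(avail->idx + 1);
}
```
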
>> Does this still work with a Linux (or BSD) guest/host?
>> If you are assuming both virtio/vhost are DPDK, this is never going
>> to be usable.
> It works with both dpdk vhost and kernel vhost implementations.
> But to enforce this, we had better add a new feature bit.
Hi Stephen, an update on compatibility:
In theory, this optimization is compatible with the current kernel vhost,
qemu, and dpdk vhost implementations.
Today I ran the dpdk virtio PMD with qemu and kernel vhost, and it works fine.


>> On a related note, have you looked at getting virtio to support the
>> new standard (not legacy) mode?
> Yes, support for virtio 1.0 has been added to our plan.
>>
>



Thread overview: 11+ messages
2015-09-04  8:25 Xie, Huawei
2015-09-04 16:50 ` Xie, Huawei
2015-09-08  8:21   ` Tetsuya Mukawa
2015-09-08  9:42     ` Xie, Huawei
2015-09-08 15:39 ` Stephen Hemminger
2015-09-08 15:52   ` Xie, Huawei
2015-09-17 15:41     ` Xie, Huawei [this message]
2015-09-09  7:33 ` Michael S. Tsirkin
2015-09-10  6:32   ` Xie, Huawei
2015-09-10  7:20     ` Michael S. Tsirkin
2015-09-14  3:08       ` Xie, Huawei
