From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <55EE9A75.7020306@igel.co.jp>
Date: Tue, 08 Sep 2015 17:21:09 +0900
From: Tetsuya Mukawa
To: "Xie, Huawei", "dev@dpdk.org", Thomas Monjalon, Linhaifeng
Cc: "ms >> Michael S. Tsirkin"
Subject: Re: [dpdk-dev] virtio optimization idea

On 2015/09/05 1:50, Xie, Huawei wrote:
> There is some format issue with the ascii chart of the tx ring. Update
> that chart.
> Sorry for the trouble.

Hi Xie,

Thanks for sharing a way to optimize virtio. I have a few questions.

> On 9/4/2015 4:25 PM, Xie, Huawei wrote:
>> Hi:
>>
>> Recently I have done one virtio optimization proof of concept. The
>> optimization includes two parts:
>> 1) avail ring set with fixed descriptors
>> 2) RX vectorization
>> With these optimizations, we could get a several-fold performance
>> boost for pure vhost-virtio throughput.

When you checked performance, did you optimize only the virtio-net
driver? If so, can we also optimize the vhost backend (librte_vhost)
using your optimization?

>>
>> Here I will only cover the first part, which is the prerequisite for
>> the second part.
>> Let us first take RX as an example. Currently, when we fill the avail
>> ring with guest mbufs, we need to:
>> a) allocate one descriptor (for a non-sg mbuf) from the free
>>    descriptors
>> b) set the idx of the desc into the entry of the avail ring
>> c) set the addr/len fields of the descriptor to point to the guest's
>>    blank mbuf data area
>>
>> Those operations take time, and especially step b results in the
>> modified (M) state of the cache line for the avail ring on the virtio
>> processing core. When vhost processes the avail ring, the cache line
>> transfer from the virtio processing core to the vhost processing core
>> takes quite a few CPU cycles.
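To make sure I follow steps a)-c), the conventional refill path would be
roughly like the sketch below. This is only a simplified illustration
with made-up names (refill_one, free_head), not the actual virtio PMD
code, and memory barriers are omitted.

#include <stdint.h>

struct vring_desc {
        uint64_t addr;          /* guest physical address of mbuf data */
        uint32_t len;
        uint16_t flags;
        uint16_t next;          /* free-list / chain link */
};

struct vring_avail {
        uint16_t flags;
        uint16_t idx;
        uint16_t ring[];        /* QUEUE_SIZE entries */
};

/* Refill one RX slot the conventional way (steps a-c above). */
static void
refill_one(struct vring_desc *desc, struct vring_avail *avail,
           uint16_t *free_head, uint16_t size,
           uint64_t mbuf_addr, uint32_t mbuf_len)
{
        /* a) allocate one descriptor from the free list */
        uint16_t d = *free_head;
        *free_head = desc[d].next;

        /* b) publish the descriptor index in the avail ring; this store
         * is what puts the avail ring cache line into the M state */
        avail->ring[avail->idx & (size - 1)] = d;

        /* c) point the descriptor at the blank mbuf data area */
        desc[d].addr = mbuf_addr;
        desc[d].len  = mbuf_len;

        avail->idx++;           /* make the new entry visible to vhost */
}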
>> To solve this problem, this is the arrangement of the RX ring for the
>> DPDK pmd (for the non-mergeable case).
>>
>>                    avail
>>                    idx
>>                     +
>>                     |
>> +----+----+---+-------------+------+------+
>> | 0  | 1  | 2 |     ...     | 254  | 255  |   avail ring
>> +-+--+-+--+-+-+---------+---+--+---+--+---+
>>   |    |    |           |      |      |
>>   |    |    |           |      |      |
>>   v    v    v           |      v      v
>> +-+--+-+--+-+-+---------+---+--+---+--+---+
>> | 0  | 1  | 2 |     ...     | 254  | 255  |   desc ring
>> +----+----+---+-------------+------+------+
>>                     |
>>                     |
>> +----+----+---+-------------+------+------+
>> | 0  | 1  | 2 |     ...     | 254  | 255  |   used ring
>> +----+----+---+-------------+------+------+
>>                     |
>>                     +
>>
>> The avail ring is initialized with fixed descriptors and is never
>> changed, i.e., the index value of the nth avail ring entry is always
>> n, which means the virtio PMD is actually refilling the desc ring
>> only, without having to change the avail ring.

For example, the avail ring is like below.

struct vring_avail {
        uint16_t flags;
        uint16_t idx;
        uint16_t ring[QUEUE_SIZE];
};

My understanding is that the virtio-net driver still needs to change
avail_ring.idx, but doesn't need to change avail_ring.ring[]. Is this
correct?

Tetsuya

>> When vhost fetches the avail ring, if not evicted, it is always in
>> its first-level cache.
>>
>> When RX receives packets from the used ring, we use used->idx as the
>> desc idx. This requires that vhost processes and returns descs from
>> the avail ring to the used ring in order, which is true for both the
>> current DPDK vhost and the kernel vhost implementations. In my
>> understanding, there is no necessity for vhost-net to process
>> descriptors out of order (OOO). One case could be zero copy: for
>> example, if one descriptor doesn't meet the zero-copy requirement, we
>> could directly return it to the used ring, earlier than the
>> descriptors in front of it.
>> To enforce this, I want to use a reserved bit to indicate in-order
>> processing of descriptors.
>>
>> For the tx ring, the arrangement is like below. Each transmitted mbuf
>> needs a desc for the virtio_net_hdr, so actually we have only 128
>> free slots.
>>
>>                          ++
>>                          ||
>>                          ||
>> +-----+-----+-----+--------------+------+------+------+
>> |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>    |     |            |      ||     |      |      |
>>    v     v            v      ||     v      v      v
>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>> | 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>    |     |            |      ||     |      |      |
>>    v     v            v      ||     v      v      v
>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>> |  0  |  1  | ... |  127 ||   0  |  1   | ...  | 127  |   desc ring for tx data
>> +-----+-----+-----+--------------+------+------+------+
>>
>> /huawei
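By the way, to check that I understand the fixed-descriptor arrangement,
the ring initialization would be roughly like the sketch below. The
names (Q, rx_avail_init, tx_ring_init) are mine, not from the actual
proof of concept.

#include <stdint.h>

#define Q 256                   /* queue size */
#define TX_SLOTS (Q / 2)        /* only 128 usable tx slots */
#define VRING_DESC_F_NEXT 1     /* "next" flag from the virtio spec */

struct vring_desc {
        uint64_t addr;
        uint32_t len;
        uint16_t flags;
        uint16_t next;
};

struct vring_avail {
        uint16_t flags;
        uint16_t idx;
        uint16_t ring[Q];
};

/* RX: avail entry n always carries descriptor n, written once at init;
 * afterwards the PMD only rewrites desc[n].addr/len and bumps
 * avail->idx. */
static void
rx_avail_init(struct vring_avail *avail)
{
        for (uint16_t i = 0; i < Q; i++)
                avail->ring[i] = i;
}

/* TX: avail entry i always points to header desc TX_SLOTS + (i % 128),
 * which chains to data desc i % 128, matching the tx chart above. */
static void
tx_ring_init(struct vring_avail *avail, struct vring_desc *desc)
{
        for (uint16_t i = 0; i < TX_SLOTS; i++) {
                uint16_t hdr = (uint16_t)(TX_SLOTS + i);

                avail->ring[i]            = hdr;  /* first half */
                avail->ring[TX_SLOTS + i] = hdr;  /* second half mirrors it */

                desc[hdr].flags = VRING_DESC_F_NEXT;
                desc[hdr].next  = i;              /* chain to the data desc */
        }
}

With in-order completion, used->idx alone would then tell the virtio PMD
which fixed descriptor finished next, as you describe for RX.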