From: "Xie, Huawei"
To: "Xu, Qian Q", "dev@dpdk.org", "Thomas Monjalon", Stephen Hemminger
Date: Thu, 17 Dec 2015 05:22:54 +0000
Subject: Re: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing

On 11/27/2015 2:03 PM, Xu, Qian Q wrote:
> Some virtio-pmd optimization performance data sharing:
> 1. Use a simplified vhost-sample that only does the dequeue and free, so virtio only does tx, then test the virtio tx performance improvement. In the VM, use one virtio device to run txonly. The txonly file was also modified to remove the memory copy part, and then the virtio TX rate was checked. The optimized virtio-pmd achieves ~2x the performance of the non-optimized virtio-pmd.
> 2. Same as item 1, but with the default txonly file, so with memory copy. The optimized virtio-pmd shows a ~37% performance improvement over the non-optimized virtio-pmd.
> 3. In the OVS test scenario, with one physical NIC + one virtio in the VM, let the virtio do the loopback (both rx and tx), running testpmd in the VM. The optimized virtio-pmd shows a 60% performance improvement over the non-optimized virtio-pmd.

Thomas:
You once asked about the performance data.
Another thing: how about adding a simple vhost performance example,
like the vring bench which is used to test virtio performance, so that
each time we have performance-related patches, we could use this
benchmark to report the performance difference?
>
> Thanks
> Qian
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> Sent: Thursday, October 29, 2015 10:53 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing
>
> Changes in v6:
> - Update release notes
> - Fix the error in virtio tx ring layout ascii chart in the cover-letter
>
> Changes in v5:
> - Call __rte_pktmbuf_prefree_seg to check refcnt when freeing mbufs
>
> Changes in v4:
> - Fix the error in virtio tx ring layout ascii chart in the commit message
> - Move virtio_xmit_cleanup ahead to free descriptors earlier
> - Test merge-able feature when selecting simple rx/tx functions
>
> Changes in v3:
> - Remove unnecessary NULL test for rte_free
> - Remove unnecessary assign of local var after free
> - Remove return at the end of void function
> - Remove always_inline attribute for virtio_xmit_cleanup
> - Reword some commit messages
> - Add TODO in the commit message of simple tx patch
>
> Changes in v2:
> - Remove the configure macro
> - Enable simple R/TX processing when user specifies simple txq flags
> - Reword some comments and commit messages
>
> In a DPDK-based switching environment, vhost mostly runs on a dedicated core while virtio processing in guest VMs runs on other cores.
> Take RX for example: with the generic implementation, for each guest buffer,
> a) the virtio driver allocates a descriptor from the free descriptor list
> b) modifies the entry of the avail ring to point to the allocated descriptor
> c) after the packet is received, frees the descriptor
>
> When vhost fetches the avail ring, it needs to fetch the modified L1 cache line from the virtio core, which is a heavy cost in current CPU implementations.
>
> The idea of this optimization is:
>     allocate a fixed descriptor for each entry of the avail ring, so the avail ring will always be the same during the run.
> This removes the L1M cache transfer from the virtio core to the vhost core for the avail ring.
> (Note that we can't avoid the cache transfer for descriptors.)
> Besides, descriptor allocation and free operations are eliminated.
> This also makes vector processing possible to further accelerate the processing.
>
> This is the layout for the avail ring (take 256 ring entries for example), with each entry pointing to the descriptor with the same index.
>                     avail
>                     idx
>                     +
>                     |
> +----+----+---+-------------+------+
> | 0  | 1  | 2 | ...  |  254 | 255  |  avail ring
> +-+--+-+--+-+-+---------+---+--+---+
>   |    |    |       |   |      |
>   |    |    |       |   |      |
>   v    v    v       |   v      v
> +-+--+-+--+-+-+---------+---+--+---+
> | 0  | 1  | 2 | ...  |  254 | 255  |  desc ring
> +----+----+---+-------------+------+
>                     |
>                     |
> +----+----+---+-------------+------+
> | 0  | 1  | 2 |      |  254 | 255  |  used ring
> +----+----+---+-------------+------+
>                     |
>                     +
>
> This is the ring layout for TX.
> As we need one virtio header for each xmit packet, we have 128 slots available.
>
>                          ++
>                          ||
>                          ||
> +-----+-----+-----+--------------+------+------+------+
> |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>    |     |            |  ||  |      |             |
>    v     v            v  ||  v      v             v
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> | 128 | 129 | ... |  255 || 127  | 128  | ...  | 255  |   desc ring for virtio_net_hdr
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>    |     |            |  ||  |      |             |
>    v     v            v  ||  v      v             v
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
> +-----+-----+-----+--------------+------+------+------+
>                          ||
>                          ||
>                          ++
>
>
> A performance boost can be observed only if the virtio backend isn't the bottleneck, or in the VM2VM case.
> There are also several vhost optimization patches to be submitted later.
>
>
> Huawei Xie (8):
>   virtio: add virtio_rxtx.h header file
>   virtio: add software rx ring, fake_buf into virtqueue
>   virtio: rx/tx ring layout optimization
>   virtio: fill RX avail ring with blank mbufs
>   virtio: virtio vec rx
>   virtio: simple tx routine
>   virtio: pick simple rx/tx func
>   doc: update release notes 2.2 about virtio performance optimization
>
>  doc/guides/rel_notes/release_2_2.rst    |   3 +
>  drivers/net/virtio/Makefile             |   2 +-
>  drivers/net/virtio/virtio_ethdev.c      |  12 +-
>  drivers/net/virtio/virtio_ethdev.h      |   5 +
>  drivers/net/virtio/virtio_rxtx.c        |  56 ++++-
>  drivers/net/virtio/virtio_rxtx.h        |  39 +++
>  drivers/net/virtio/virtio_rxtx_simple.c | 414 ++++++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.h          |   5 +
>  8 files changed, 532 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/net/virtio/virtio_rxtx.h
>  create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c
>
> --
> 1.8.1.4
>
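
The fixed ring layouts described in the quoted cover letter can be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from the patch set: struct sketch_desc, struct sketch_ring, RING_SIZE, DESC_F_NEXT and both functions are hypothetical stand-ins for the real vring definitions. Having the second half of the TX avail ring refer to the same header descriptors again is an assumption read off the chart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the vring structures. These are
 * NOT the DPDK definitions; they hold just enough state to show the
 * index mapping from the charts. */
#define RING_SIZE   256u   /* example ring size used in the cover letter */
#define DESC_F_NEXT 1u     /* assumed flag: descriptor chains to .next */

struct sketch_desc {
    uint16_t flags;
    uint16_t next;
};

struct sketch_ring {
    uint16_t avail[RING_SIZE];           /* avail ring: indexes into desc[] */
    struct sketch_desc desc[RING_SIZE];  /* descriptor table */
};

/* RX: avail entry i always points to descriptor i, so the avail ring
 * contents never change after setup. */
static void sketch_rx_layout(struct sketch_ring *r)
{
    assert(r != 0);
    for (uint16_t i = 0; i < RING_SIZE; i++) {
        r->avail[i] = i;
        r->desc[i].flags = 0;
        r->desc[i].next = 0;
    }
}

/* TX: every packet needs a virtio_net_hdr descriptor chained to a data
 * descriptor, so only RING_SIZE / 2 slots are usable. Slot s uses
 * descriptor s + half for the header and descriptor s for the data. */
static void sketch_tx_layout(struct sketch_ring *r)
{
    const uint16_t half = RING_SIZE / 2;

    assert(r != 0);
    for (uint16_t s = 0; s < half; s++) {
        r->avail[s] = (uint16_t)(s + half);      /* header descriptor */
        r->desc[s + half].flags = DESC_F_NEXT;
        r->desc[s + half].next = s;              /* chain to data descriptor */
        r->desc[s].flags = 0;                    /* data ends the chain */
        r->desc[s].next = 0;
        /* Assumption: the second half of the avail ring refers to the
         * same header descriptors again. */
        r->avail[s + half] = (uint16_t)(s + half);
    }
}
```

With either layout the avail ring entries are written once at setup and only the avail index moves afterwards, which is what avoids the avail ring cache-line transfer between the virtio core and the vhost core.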