From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 5C0C68E82 for ; Tue, 20 Oct 2015 16:13:13 +0200 (CEST) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP; 20 Oct 2015 07:13:12 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.17,707,1437462000"; d="scan'208";a="831259472" Received: from irsmsx153.ger.corp.intel.com ([163.33.192.75]) by fmsmga002.fm.intel.com with ESMTP; 20 Oct 2015 07:13:11 -0700 Received: from irsmsx106.ger.corp.intel.com ([169.254.8.229]) by IRSMSX153.ger.corp.intel.com ([169.254.9.191]) with mapi id 14.03.0248.002; Tue, 20 Oct 2015 15:13:08 +0100 From: "Loftus, Ciara" To: Tetsuya Mukawa , "dev@dpdk.org" Thread-Topic: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD Thread-Index: AQHQ46EZDQst5vy2d0C0oFzvXGhqQZ5KaSawgCObewCABrOZEA== Date: Tue, 20 Oct 2015 14:13:08 +0000 Message-ID: <74F120C019F4A64C9B78E802F6AD4CC24F7AC24F@IRSMSX106.ger.corp.intel.com> References: <1440993326-21205-1-git-send-email-mukawa@igel.co.jp> <1440993326-21205-2-git-send-email-mukawa@igel.co.jp> <74F120C019F4A64C9B78E802F6AD4CC24CB97F66@IRSMSX106.ger.corp.intel.com> <5620B804.1050300@igel.co.jp> In-Reply-To: <5620B804.1050300@igel.co.jp> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "ann.zhuangyanying@huawei.com" Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Oct 2015 14:13:13 -0000 >=20 > On 2015/09/24 2:47, Loftus, Ciara wrote: > >> The patch introduces a new PMD. This PMD is implemented as thin > wrapper > >> of librte_vhost. It means librte_vhost is also needed to compile the P= MD. > >> The PMD can have 'iface' parameter like below to specify a path to > connect > >> to a virtio-net device. > >> > >> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=3D/tmp/sock0' -- -i > >> > >> To connect above testpmd, here is qemu command example. > >> > >> $ qemu-system-x86_64 \ > >> > >> -chardev socket,id=3Dchr0,path=3D/tmp/sock0 \ > >> -netdev vhost-user,id=3Dnet0,chardev=3Dchr0,vhostforce \ > >> -device virtio-net-pci,netdev=3Dnet0 > >> > >> Signed-off-by: Tetsuya Mukawa > >> --- > >> config/common_linuxapp | 6 + > >> drivers/net/Makefile | 4 + > >> drivers/net/vhost/Makefile | 61 +++ > >> drivers/net/vhost/rte_eth_vhost.c | 640 > >> ++++++++++++++++++++++++++++ > >> drivers/net/vhost/rte_pmd_vhost_version.map | 4 + > >> mk/rte.app.mk | 8 +- > >> 6 files changed, 722 insertions(+), 1 deletion(-) > >> create mode 100644 drivers/net/vhost/Makefile > >> create mode 100644 drivers/net/vhost/rte_eth_vhost.c > >> create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map > >> > >> +struct pmd_internal { > >> + TAILQ_ENTRY(pmd_internal) next; > >> + char *dev_name; > >> + char *iface_name; > >> + unsigned nb_rx_queues; > >> + unsigned nb_tx_queues; > >> + rte_atomic16_t xfer; > > Is this flag just used to indicate the state of the virtio_net device? > > Ie. if =3D0 then virtio_dev=3DNULL and if =3D1 then virtio_net !=3DNULL= & the > VIRTIO_DEV_RUNNING flag is set? >=20 > Hi Clara, >=20 > I am sorry for very late reply. >=20 > Yes, it is. Probably we can optimize it more. > I will change this implementation a bit in next patches. > Could you please check it? Of course, thanks. >=20 > >> + > >> +static uint16_t > >> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) > >> +{ > >> + struct vhost_queue *r =3D q; > >> + uint16_t i, nb_tx =3D 0; > >> + > >> + if (unlikely(r->internal =3D=3D NULL)) > >> + return 0; > >> + > >> + if (unlikely(rte_atomic16_read(&r->internal->xfer) =3D=3D 0)) > >> + return 0; > >> + > >> + rte_atomic16_set(&r->tx_executing, 1); > >> + > >> + if (unlikely(rte_atomic16_read(&r->internal->xfer) =3D=3D 0)) > >> + goto out; > >> + > >> + nb_tx =3D (uint16_t)rte_vhost_enqueue_burst(r->device, > >> + VIRTIO_RXQ, bufs, nb_bufs); > >> + > >> + rte_atomic64_add(&(r->tx_pkts), nb_tx); > >> + rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx); > >> + > >> + for (i =3D 0; likely(i < nb_tx); i++) > >> + rte_pktmbuf_free(bufs[i]); > > We may not always want to free these mbufs. For example, if a call is m= ade > to rte_eth_tx_burst with buffers from another (non DPDK) source, they may > not be ours to free. >=20 > Sorry, I am not sure what type of buffer you want to transfer. >=20 > This is a PMD that wraps librte_vhost. > And I guess other PMDs cannot handle buffers from another non DPDK > source. > Should we take care such buffers? >=20 > I have also checked af_packet PMD. > It seems the tx function of af_packet PMD just frees mbuf. For example if using the PMD with an application that receives buffers from= another source. Eg. a virtual switch receiving packets from an interface u= sing the kernel driver. I see that af_packet also frees the mbuf. I've checked the ixgbe and ring p= mds though and they don't seem to free the buffers, although I may have mis= sed something, the code for these is rather large and I am unfamiliar with = most of it. If I am correct though, should this behaviour vary from PMD to = PMD I wonder? >=20 > >> + > >> + > >> + eth_dev =3D rte_eth_dev_allocated(internal->dev_name); > >> + if (eth_dev =3D=3D NULL) { > >> + RTE_LOG(INFO, PMD, "failuer to find ethdev\n"); > > Typo: Failure. Same for the destroy_device function >=20 > Thanks, I will fix it in next patches. >=20 > >> + return -1; > >> + } > >> + > >> + internal =3D eth_dev->data->dev_private; > >> + > >> + for (i =3D 0; i < internal->nb_rx_queues; i++) { > >> + vq =3D &internal->rx_vhost_queues[i]; > >> + vq->device =3D dev; > >> + vq->internal =3D internal; > >> + } > >> + for (i =3D 0; i < internal->nb_tx_queues; i++) { > >> + vq =3D &internal->tx_vhost_queues[i]; > >> + vq->device =3D dev; > >> + vq->internal =3D internal; > >> + } > >> + > >> + dev->flags |=3D VIRTIO_DEV_RUNNING; > >> + dev->priv =3D eth_dev; > >> + > >> + eth_dev->data->dev_link.link_status =3D 1; > >> + rte_atomic16_set(&internal->xfer, 1); > >> + > >> + RTE_LOG(INFO, PMD, "New connection established\n"); > >> + > >> + return 0; > > Some freedom is taken away if the new_device and destroy_device > callbacks are implemented in the driver. > > For example if one wishes to call the rte_vhost_enable_guest_notificat= ion > function when a new device is brought up. They cannot now as there is no > scope to modify these callbacks, as is done in for example the vHost samp= le > app. Is this correct? >=20 > So how about adding one more parameter to be able to choose guest > notification behavior? >=20 > ex) > ./testpmd --vdev 'eth_vhost0,iface=3D/tmp/sock0,guest_notification=3D0' >=20 > In above case, all queues in this device will have > VRING_USED_F_NO_NOTIFY. I'm not too concerned about this particular function, I was just making an = example. The main concern I was expressing here and in the other thread wit= h Bruce, is the risk that we will lose some functionality available in the = library but not in the PMD. This function is an example of that. If we coul= d find some way to retain the functionality available in the library, it wo= uld be ideal. Thanks for the response! I will review and test further patches if they bec= ome available. Ciara >=20 > Thanks, > Tetsuya