From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ciara.loftus@intel.com>
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
 by dpdk.org (Postfix) with ESMTP id 5C0C68E82
 for <dev@dpdk.org>; Tue, 20 Oct 2015 16:13:13 +0200 (CEST)
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by orsmga101.jf.intel.com with ESMTP; 20 Oct 2015 07:13:12 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.17,707,1437462000"; d="scan'208";a="831259472"
Received: from irsmsx153.ger.corp.intel.com ([163.33.192.75])
 by fmsmga002.fm.intel.com with ESMTP; 20 Oct 2015 07:13:11 -0700
Received: from irsmsx106.ger.corp.intel.com ([169.254.8.229]) by
 IRSMSX153.ger.corp.intel.com ([169.254.9.191]) with mapi id 14.03.0248.002;
 Tue, 20 Oct 2015 15:13:08 +0100
From: "Loftus, Ciara" <ciara.loftus@intel.com>
To: Tetsuya Mukawa <mukawa@igel.co.jp>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
Thread-Index: AQHQ46EZDQst5vy2d0C0oFzvXGhqQZ5KaSawgCObewCABrOZEA==
Date: Tue, 20 Oct 2015 14:13:08 +0000
Message-ID: <74F120C019F4A64C9B78E802F6AD4CC24F7AC24F@IRSMSX106.ger.corp.intel.com>
References: <1440993326-21205-1-git-send-email-mukawa@igel.co.jp>
 <1440993326-21205-2-git-send-email-mukawa@igel.co.jp>
 <74F120C019F4A64C9B78E802F6AD4CC24CB97F66@IRSMSX106.ger.corp.intel.com>
 <5620B804.1050300@igel.co.jp>
In-Reply-To: <5620B804.1050300@igel.co.jp>
Accept-Language: en-GB, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "ann.zhuangyanying@huawei.com" <ann.zhuangyanying@huawei.com>
Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Oct 2015 14:13:13 -0000

>=20
> On 2015/09/24 2:47, Loftus, Ciara wrote:
> >> The patch introduces a new PMD. This PMD is implemented as thin
> wrapper
> >> of librte_vhost. It means librte_vhost is also needed to compile the P=
MD.
> >> The PMD can have 'iface' parameter like below to specify a path to
> connect
> >> to a virtio-net device.
> >>
> >> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=3D/tmp/sock0' -- -i
> >>
> >> To connect above testpmd, here is qemu command example.
> >>
> >> $ qemu-system-x86_64 \
> >>         <snip>
> >>         -chardev socket,id=3Dchr0,path=3D/tmp/sock0 \
> >>         -netdev vhost-user,id=3Dnet0,chardev=3Dchr0,vhostforce \
> >>         -device virtio-net-pci,netdev=3Dnet0
> >>
> >> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> >> ---
> >>  config/common_linuxapp                      |   6 +
> >>  drivers/net/Makefile                        |   4 +
> >>  drivers/net/vhost/Makefile                  |  61 +++
> >>  drivers/net/vhost/rte_eth_vhost.c           | 640
> >> ++++++++++++++++++++++++++++
> >>  drivers/net/vhost/rte_pmd_vhost_version.map |   4 +
> >>  mk/rte.app.mk                               |   8 +-
> >>  6 files changed, 722 insertions(+), 1 deletion(-)
> >>  create mode 100644 drivers/net/vhost/Makefile
> >>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
> >>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
> >>
> >> +struct pmd_internal {
> >> +	TAILQ_ENTRY(pmd_internal) next;
> >> +	char *dev_name;
> >> +	char *iface_name;
> >> +	unsigned nb_rx_queues;
> >> +	unsigned nb_tx_queues;
> >> +	rte_atomic16_t xfer;
> > Is this flag just used to indicate the state of the virtio_net device?
> > Ie. if =3D0 then virtio_dev=3DNULL and if =3D1 then virtio_net !=3DNULL=
 & the
> VIRTIO_DEV_RUNNING flag is set?
>=20
> Hi Clara,
>=20
> I am sorry for very late reply.
>=20
> Yes, it is. Probably we can optimize it more.
> I will change this implementation a bit in next patches.
> Could you please check it?
Of course, thanks.

>=20
> >> +
> >> +static uint16_t
> >> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> >> +{
> >> +	struct vhost_queue *r =3D q;
> >> +	uint16_t i, nb_tx =3D 0;
> >> +
> >> +	if (unlikely(r->internal =3D=3D NULL))
> >> +		return 0;
> >> +
> >> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) =3D=3D 0))
> >> +		return 0;
> >> +
> >> +	rte_atomic16_set(&r->tx_executing, 1);
> >> +
> >> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) =3D=3D 0))
> >> +		goto out;
> >> +
> >> +	nb_tx =3D (uint16_t)rte_vhost_enqueue_burst(r->device,
> >> +			VIRTIO_RXQ, bufs, nb_bufs);
> >> +
> >> +	rte_atomic64_add(&(r->tx_pkts), nb_tx);
> >> +	rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
> >> +
> >> +	for (i =3D 0; likely(i < nb_tx); i++)
> >> +		rte_pktmbuf_free(bufs[i]);
> > We may not always want to free these mbufs. For example, if a call is m=
ade
> to rte_eth_tx_burst with buffers from another (non DPDK) source, they may
> not be ours to free.
>=20
> Sorry, I am not sure what type of buffer you want to transfer.
>=20
> This is a PMD that wraps librte_vhost.
> And I guess other PMDs cannot handle buffers from another non DPDK
> source.
> Should we take care such buffers?
>=20
> I have also checked af_packet PMD.
> It seems the tx function of af_packet PMD just frees mbuf.

For example if using the PMD with an application that receives buffers from=
 another source. Eg. a virtual switch receiving packets from an interface u=
sing the kernel driver.
I see that af_packet also frees the mbuf. I've checked the ixgbe and ring p=
mds though and they don't seem to free the buffers, although I may have mis=
sed something, the code for these is rather large and I am unfamiliar with =
most of it. If I am correct though, should this behaviour vary from PMD to =
PMD I wonder?
>=20
> >> +
> >> +
> >> +	eth_dev =3D rte_eth_dev_allocated(internal->dev_name);
> >> +	if (eth_dev =3D=3D NULL) {
> >> +		RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
> > Typo: Failure. Same for the destroy_device function
>=20
> Thanks, I will fix it in next patches.
>=20
> >> +		return -1;
> >> +	}
> >> +
> >> +	internal =3D eth_dev->data->dev_private;
> >> +
> >> +	for (i =3D 0; i < internal->nb_rx_queues; i++) {
> >> +		vq =3D &internal->rx_vhost_queues[i];
> >> +		vq->device =3D dev;
> >> +		vq->internal =3D internal;
> >> +	}
> >> +	for (i =3D 0; i < internal->nb_tx_queues; i++) {
> >> +		vq =3D &internal->tx_vhost_queues[i];
> >> +		vq->device =3D dev;
> >> +		vq->internal =3D internal;
> >> +	}
> >> +
> >> +	dev->flags |=3D VIRTIO_DEV_RUNNING;
> >> +	dev->priv =3D eth_dev;
> >> +
> >> +	eth_dev->data->dev_link.link_status =3D 1;
> >> +	rte_atomic16_set(&internal->xfer, 1);
> >> +
> >> +	RTE_LOG(INFO, PMD, "New connection established\n");
> >> +
> >> +	return 0;
> > Some freedom is taken away if the new_device and destroy_device
> callbacks are implemented in the driver.
> > For example if one wishes to  call the rte_vhost_enable_guest_notificat=
ion
> function when a new device is brought up. They cannot now as there is no
> scope to modify these callbacks, as is done in for example the vHost samp=
le
> app. Is this correct?
>=20
> So how about adding one more parameter to be able to choose guest
> notification behavior?
>=20
> ex)
> ./testpmd --vdev 'eth_vhost0,iface=3D/tmp/sock0,guest_notification=3D0'
>=20
> In above case, all queues in this device will have
> VRING_USED_F_NO_NOTIFY.

I'm not too concerned about this particular function, I was just making an =
example. The main concern I was expressing here and in the other thread wit=
h Bruce, is the risk that we will lose some functionality available in the =
library but not in the PMD. This function is an example of that. If we coul=
d find some way to retain the functionality available in the library, it wo=
uld be ideal.

Thanks for the response! I will review and test further patches if they bec=
ome available.

Ciara

>=20
> Thanks,
> Tetsuya