DPDK patches and discussions
From: "Xia, Chenbo" <chenbo.xia@intel.com>
To: "Hu, Jiayu" <jiayu.hu@intel.com>,
	"maxime.coquelin@redhat.com" <maxime.coquelin@redhat.com>
Cc: "He, Xingguang" <xingguang.he@intel.com>,
	"Jiang, Cheng1" <cheng1.jiang@intel.com>,
	"Ma, WenwuX" <wenwux.ma@intel.com>,
	"Wang, YuanX" <yuanx.wang@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: RE: [PATCH v3] net/vhost: support asynchronous data path
Date: Tue, 18 Oct 2022 11:59:15 +0000	[thread overview]
Message-ID: <SN6PR11MB3504C5A2D10B93164E4562D39C289@SN6PR11MB3504.namprd11.prod.outlook.com> (raw)
In-Reply-To: <CY5PR11MB6487A05028F9A0BC9DCA08AB92209@CY5PR11MB6487.namprd11.prod.outlook.com>

> -----Original Message-----
> From: Hu, Jiayu <jiayu.hu@intel.com>
> Sent: Monday, October 10, 2022 1:17 PM
> To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: He, Xingguang <xingguang.he@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Ma, WenwuX <wenwux.ma@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>; dev@dpdk.org
> Subject: RE: [PATCH v3] net/vhost: support asynchronous data path
> 
> Hi Chenbo and Maxime,
> 
> > -----Original Message-----
> > From: Wang, YuanX <yuanx.wang@intel.com>
> > Sent: Wednesday, August 24, 2022 12:36 AM
> > To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>;
> > dev@dpdk.org
> > Cc: Hu, Jiayu <jiayu.hu@intel.com>; He, Xingguang
> > <xingguang.he@intel.com>; Jiang, Cheng1 <cheng1.jiang@intel.com>; Wang,
> > YuanX <yuanx.wang@intel.com>; Ma, WenwuX <wenwux.ma@intel.com>
> > Subject: [PATCH v3] net/vhost: support asynchronous data path
> >
> > The vhost asynchronous data path offloads packet copies from the CPU
> > to the DMA engine. As a result, large packet copies can be accelerated
> > by the DMA engine, and vhost can free CPU cycles for higher-level
> > functions.
> >
> > In this patch, we enable the asynchronous data path for the vhost PMD.
> > The asynchronous data path is enabled per tx/rx queue, and users need
> > to specify the DMA device used by each such queue. A tx/rx queue can
> > use only one DMA device, but one DMA device can be shared among
> > multiple tx/rx queues of different vhost PMD ports.
> >
> > Two PMD parameters are added:
> > - dmas:	specify the DMA device used by a tx/rx queue.
> > 	(Default: no queues enable asynchronous data path)
> > - dma-ring-size: DMA ring size.
> > 	(Default: 4096).
> >
> > Here is an example:
> > --vdev
> > 'eth_vhost0,iface=./s0,dmas=[txq0@0000:00:01.0;rxq0@0000:00:01.1],dma-ring-size=4096'
> >
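> > On the PMD side, binding a queue listed in 'dmas' to its DMA device
> > amounts to roughly the following (a simplified sketch, not the exact
> > patch code; the vhost_queue_async_setup() helper name is ours, and
> > rte_dma_configure()/rte_dma_start() setup is omitted):
> >
> > 	#include <rte_dmadev.h>
> > 	#include <rte_vhost_async.h>
> >
> > 	static int
> > 	vhost_queue_async_setup(int vid, uint16_t virtqueue_id,
> > 			const char *dma_name)
> > 	{
> > 		/* Resolve the PCI address given in 'dmas' to a dmadev id. */
> > 		int16_t dma_id = rte_dma_get_dev_id_by_name(dma_name);
> >
> > 		if (dma_id < 0)
> > 			return -1;
> >
> > 		/* Let the vhost library use vchan 0 of this DMA device. */
> > 		if (rte_vhost_async_dma_configure(dma_id, 0) < 0)
> > 			return -1;
> >
> > 		/* Enable the asynchronous data path on this virtqueue. */
> > 		return rte_vhost_async_channel_register(vid, virtqueue_id);
> > 	}
> >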
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > ---
> >
> >  static uint16_t
> >  eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> >  {
> > @@ -403,7 +469,7 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> >  	uint16_t nb_receive = nb_bufs;
> >
> >  	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> > -		return 0;
> > +		goto tx_poll;
> >
> >  	rte_atomic32_set(&r->while_queuing, 1);
> >
> > @@ -411,19 +477,36 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> >  		goto out;
> >
> >  	/* Dequeue packets from guest TX queue */
> > -	while (nb_receive) {
> > -		uint16_t nb_pkts;
> > -		uint16_t num = (uint16_t)RTE_MIN(nb_receive,
> > -						 VHOST_MAX_PKT_BURST);
> > -
> > -		nb_pkts = rte_vhost_dequeue_burst(r->vid, r->virtqueue_id,
> > -						  r->mb_pool, &bufs[nb_rx],
> > -						  num);
> > -
> > -		nb_rx += nb_pkts;
> > -		nb_receive -= nb_pkts;
> > -		if (nb_pkts < num)
> > -			break;
> > +	if (!r->async_register) {
> > +		while (nb_receive) {
> > +			uint16_t nb_pkts;
> > +			uint16_t num = (uint16_t)RTE_MIN(nb_receive,
> > +							 VHOST_MAX_PKT_BURST);
> > +
> > +			nb_pkts = rte_vhost_dequeue_burst(r->vid, r->virtqueue_id,
> > +						r->mb_pool, &bufs[nb_rx],
> > +						num);
> > +
> > +			nb_rx += nb_pkts;
> > +			nb_receive -= nb_pkts;
> > +			if (nb_pkts < num)
> > +				break;
> > +		}
> > +	} else {
> > +		while (nb_receive) {
> > +			uint16_t nb_pkts;
> > +			uint16_t num = (uint16_t)RTE_MIN(nb_receive, VHOST_MAX_PKT_BURST);
> > +			int nr_inflight;
> > +
> > +			nb_pkts = rte_vhost_async_try_dequeue_burst(r->vid, r->virtqueue_id,
> > +						r->mb_pool, &bufs[nb_rx], num, &nr_inflight,
> > +						r->dma_id, 0);
> > +
> > +			nb_rx += nb_pkts;
> > +			nb_receive -= nb_pkts;
> > +			if (nb_pkts < num)
> > +				break;
> > +		}
> >  	}
> >
> >  	r->stats.pkts += nb_rx;
> > @@ -444,6 +527,17 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> >  out:
> >  	rte_atomic32_set(&r->while_queuing, 0);
> >
> > +tx_poll:
> > +	/**
> > +	 * Poll and free completed packets for the virtqueue of Tx queue.
> > +	 * Note that we access Tx queue's virtqueue, which is protected
> > +	 * by vring lock.
> > +	 */
> > +	if (!async_tx_poll_completed && r->txq->async_register) {
> > +		vhost_tx_free_completed(r->vid, r->txq->virtqueue_id, r->txq->dma_id,
> > +				r->cmpl_pkts, VHOST_MAX_PKT_BURST);
> > +	}
> > +
> 
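> For context, vhost_tx_free_completed() above is essentially a thin
> wrapper around the library's completion API; roughly (a sketch, not the
> exact patch code):
> 
> 	static inline void
> 	vhost_tx_free_completed(int vid, uint16_t virtqueue_id,
> 			int16_t dma_id, struct rte_mbuf **pkts, uint16_t count)
> 	{
> 		uint16_t i, ret;
> 
> 		/* Reap descriptors whose DMA copies have finished. */
> 		ret = rte_vhost_poll_enqueue_completed(vid, virtqueue_id,
> 				pkts, count, dma_id, 0);
> 
> 		/* The completed mbufs now belong to the PMD; free them. */
> 		for (i = 0; i < ret; i++)
> 			rte_pktmbuf_free(pkts[i]);
> 	}
> 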
> There are two places where the Tx queue's rte_vhost_poll_enqueue_completed
> function can be called: the TX path and the RX path. Calling it in the RX
> path lets the front-end receive the last burst of TX packets even when
> there are no more TX operations, but it also potentially causes more DMA
> contention between lcores. For example, say testpmd has 2 lcores, one NIC
> port and one vhost PMD port with one queue, and the RXQ and TXQ of the
> vhost PMD port use dedicated DMA devices. The traffic flow is as below:
> lcore0: NIC RXQ -> vhost TXQ
> lcore1: vhost RXQ -> NIC TXQ
> Calling rte_vhost_poll_enqueue_completed in the vhost RX path will cause
> lcore1 to contend for the DMA device used by the vhost TXQ while it
> operates the vhost RXQ. The performance degradation depends on the case;
> in the 4-core 2-queue PVP case, testpmd throughput degrades by ~10%.
> 
> Calling it in the TX path avoids the DMA contention described above, but
> then the front-end cannot receive the last burst of TX packets if there
> are no more TX operations on the same TXQ. A TX-path variant would look
> roughly like the sketch below.
> 
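> A minimal sketch of that TX-path variant (illustration only; the
> eth_vhost_tx_enqueue() name is just a stand-in for the existing enqueue
> logic, not a function from the patch):
> 
> 	static uint16_t
> 	eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> 	{
> 		struct vhost_queue *r = q;
> 
> 		/* Reap DMA completions on the TXQ's own lcore before
> 		 * submitting new packets, instead of from the RX path.
> 		 */
> 		if (async_tx_poll_completed && r->async_register)
> 			vhost_tx_free_completed(r->vid, r->virtqueue_id,
> 					r->dma_id, r->cmpl_pkts,
> 					VHOST_MAX_PKT_BURST);
> 
> 		/* existing enqueue logic, unchanged */
> 		return eth_vhost_tx_enqueue(r, bufs, nb_bufs);
> 	}
> 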
> In the current implementation, we select the first design, but at runtime
> users can make the vhost PMD call rte_vhost_poll_enqueue_completed in the
> TX path via a testpmd command, so the DMA contention above can be avoided
> in testpmd.
> 
> I wonder if you have any comments on the design?

I now prefer the current design:

1. Complete the packets by calling it in the RX path.
2. Still provide a testpmd option to avoid the performance and tx-only
   issues, so we can get the best performance and keep tx-only working.

As the vhost PMD is mainly targeted at testing for now, let's make sure the
functionality works well and also leave one way to get reference performance
numbers (which will be useful for real production-level applications like
OVS to refer to). That's good for the vhost PMD, IMHO.

Maxime, any different opinion?

Thanks,
Chenbo

> 
> Thanks,
> Jiayu
> 
> >  	return nb_rx;
> >  }
> >



Thread overview: 23+ messages
2022-08-14 15:06 [PATCH] " Jiayu Hu
2022-08-18  2:05 ` [PATCH v2] " Jiayu Hu
2022-08-23 16:35 ` [PATCH v3] " Yuan Wang
2022-09-26  6:55   ` Xia, Chenbo
2022-09-27  7:34     ` Wang, YuanX
2022-09-28  8:13       ` Wang, YuanX
2022-10-10  5:17   ` Hu, Jiayu
2022-10-18 11:59     ` Xia, Chenbo [this message]
2022-09-29 19:47 ` [PATCH v4] " Yuan Wang
2022-10-19 14:10   ` Xia, Chenbo
2022-10-20 14:00     ` Wang, YuanX
2022-10-24 15:14 ` [PATCH v5] " Yuan Wang
2022-10-24  9:02   ` Xia, Chenbo
2022-10-24  9:25     ` Wang, YuanX
2022-10-24  9:08   ` Maxime Coquelin
2022-10-25  2:14     ` Hu, Jiayu
2022-10-25  7:52       ` Maxime Coquelin
2022-10-25  9:15         ` Hu, Jiayu
2022-10-25 15:33           ` Maxime Coquelin
2022-10-25 15:44             ` Bruce Richardson
2022-10-25 16:04               ` Maxime Coquelin
2022-10-26  2:48                 ` Hu, Jiayu
2022-10-26  5:00                   ` Maxime Coquelin
