From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <changchun.ouyang@intel.com>
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id 79C6D689B
 for <dev@dpdk.org>; Fri, 26 Dec 2014 02:52:31 +0100 (CET)
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga103.fm.intel.com with ESMTP; 25 Dec 2014 17:48:32 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.07,646,1413270000"; d="scan'208";a="653265762"
Received: from pgsmsx104.gar.corp.intel.com ([10.221.44.91])
 by fmsmga002.fm.intel.com with ESMTP; 25 Dec 2014 17:52:28 -0800
Received: from shsmsx103.ccr.corp.intel.com (10.239.4.69) by
 PGSMSX104.gar.corp.intel.com (10.221.44.91) with Microsoft SMTP Server (TLS)
 id 14.3.195.1; Fri, 26 Dec 2014 09:52:27 +0800
Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.216]) by
 SHSMSX103.ccr.corp.intel.com ([169.254.4.240]) with mapi id 14.03.0195.001;
 Fri, 26 Dec 2014 09:52:26 +0800
From: "Ouyang, Changchun" <changchun.ouyang@intel.com>
To: Vlad Zolotarov <vladz@cloudius-systems.com>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
Thread-Index: AQHQHznC9fbr7m6mLEynsGIYrmMT5JyeB1MAgAGQNdCAAC7zAIABUPNQ
Date: Fri, 26 Dec 2014 01:52:25 +0000
Message-ID: <F52918179C57134FAEC9EA62FA2F96251194B681@shsmsx102.ccr.corp.intel.com>
References: <1419389808-9559-1-git-send-email-changchun.ouyang@intel.com>
 <1419398584-19520-1-git-send-email-changchun.ouyang@intel.com>
 <1419398584-19520-6-git-send-email-changchun.ouyang@intel.com>
 <549A97F6.30901@cloudius-systems.com>
 <F52918179C57134FAEC9EA62FA2F96251194A773@shsmsx102.ccr.corp.intel.com>
 <549C0F10.8050402@cloudius-systems.com>
In-Reply-To: <549C0F10.8050402@cloudius-systems.com>
Accept-Language: zh-CN, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 26 Dec 2014 01:52:32 -0000



> -----Original Message-----
> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
> Sent: Thursday, December 25, 2014 9:20 PM
> To: Ouyang, Changchun; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
>=20
>=20
> On 12/25/14 04:43, Ouyang, Changchun wrote:
> > Hi,
> > Sorry miss some comments, so continue my response below,
> >
> >> -----Original Message-----
> >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
> >> Sent: Wednesday, December 24, 2014 6:40 PM
> >> To: Ouyang, Changchun; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
> >>
> >>
> >> On 12/24/14 07:23, Ouyang Changchun wrote:
> >>> It needs config RSS and IXGBE_MRQC and IXGBE_VFPSRTYPE to enable
> VF
> >> RSS.
> >>> The psrtype will determine how many queues the received packets will
> >>> distribute to, and the value of psrtype should depends on both facet:
> >>> max VF rxq number which has been negotiated with PF, and the number
> >>> of
> >> rxq specified in config on guest.
> >>> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> >>> ---
> >>>    lib/librte_pmd_ixgbe/ixgbe_pf.c   | 15 +++++++
> >>>    lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 92
> >> ++++++++++++++++++++++++++++++++++-----
> >>>    2 files changed, 97 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> >>> b/lib/librte_pmd_ixgbe/ixgbe_pf.c index cbb0145..9c9dad8 100644
> >>> --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> >>> +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
> >>> @@ -187,6 +187,21 @@ int ixgbe_pf_host_configure(struct rte_eth_dev
> >> *eth_dev)
> >>>    	IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw-
> mac.num_rar_entries), 0);
> >>>    	IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw-
> mac.num_rar_entries), 0);
> >>>
> >>> +	/*
> >>> +	 * VF RSS can support at most 4 queues for each VF, even if
> >>> +	 * 8 queues are available for each VF, it need refine to 4
> >>> +	 * queues here due to this limitation, otherwise no queue
> >>> +	 * will receive any packet even RSS is enabled.
> >> According to Table 7-3 in the 82599 spec RSS is not available when
> >> port is configured to have 8 queues per pool. This means that if u
> >> see this configuration u may immediately disable RSS flow in your code=
.
> >>
> >>> +	 */
> >>> +	if (eth_dev->data->dev_conf.rxmode.mq_mode =3D=3D
> >> ETH_MQ_RX_VMDQ_RSS) {
> >>> +		if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool =3D=3D 8) {
> >>> +			RTE_ETH_DEV_SRIOV(eth_dev).active =3D
> >> ETH_32_POOLS;
> >>> +			RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool =3D 4;
> >>> +			RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx =3D
> >>> +				dev_num_vf(eth_dev) * 4;
> >> According to 82599 spec u can't do that since RSS is not allowed when
> >> port is configured to have 8 function per-VF. Have u verified that
> >> this works? If yes, then spec should be updated.
> >>
> >>> +		}
> >>> +	}
> >>> +
> >>>    	/* set VMDq map to default PF pool */
> >>>    	hw->mac.ops.set_vmdq(hw, 0,
> >>> RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx);
> >>>
> >>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >>> index f69abda..a7c17a4 100644
> >>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >>> @@ -3327,6 +3327,39 @@ ixgbe_alloc_rx_queue_mbufs(struct
> >> igb_rx_queue *rxq)
> >>>    }
> >>>
> >>>    static int
> >>> +ixgbe_config_vf_rss(struct rte_eth_dev *dev) {
> >>> +	struct ixgbe_hw *hw;
> >>> +	uint32_t mrqc;
> >>> +
> >>> +	ixgbe_rss_configure(dev);
> >>> +
> >>> +	hw =3D IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> >>> +
> >>> +	/* MRQC: enable VF RSS */
> >>> +	mrqc =3D IXGBE_READ_REG(hw, IXGBE_MRQC);
> >>> +	mrqc &=3D ~IXGBE_MRQC_MRQE_MASK;
> >>> +	switch (RTE_ETH_DEV_SRIOV(dev).active) {
> >>> +	case ETH_64_POOLS:
> >>> +		mrqc |=3D IXGBE_MRQC_VMDQRSS64EN;
> >>> +		break;
> >>> +
> >>> +	case ETH_32_POOLS:
> >>> +	case ETH_16_POOLS:
> >>> +		mrqc |=3D IXGBE_MRQC_VMDQRSS32EN;
> >> Again, this contradicts with the spec.
> > Yes, the spec say the hw can't support vf rss at all, but experiment fi=
nd that
> could be done.
>=20
> The spec explicitly say that VF RSS *is* supported in particular in the t=
able
> mentioned above.
But the spec(January 2014 revision 2.9) on my hand says: "in IOV mode, VMDq=
+RSS mode is not available"  in note of section 4.6.10.2.1
> What your code is doing is that in case of 16 VFs u setup a 32 pools
> configuration and use only 16 out of them.
But I don't see any big issue here, in this case, each vf COULD have 8 queu=
es, like I said before, but this is estimation value, actually only 4 queue=
s
Are really available for one vf, you can refer to spec for the correctness =
here.
>=20
> > We can focus on discussing the implementation firstly.
>=20
> >
> >>> +		break;
> >>> +
> >>> +	default:
> >>> +		PMD_INIT_LOG(ERR, "Invalid pool number in IOV mode");
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	IXGBE_WRITE_REG(hw, IXGBE_MRQC, mrqc);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +static int
> >>>    ixgbe_dev_mq_rx_configure(struct rte_eth_dev *dev)
> >>>    {
> >>>    	struct ixgbe_hw *hw =3D
> >>> @@ -3358,24 +3391,38 @@ ixgbe_dev_mq_rx_configure(struct
> >> rte_eth_dev *dev)
> >>>    			default: ixgbe_rss_disable(dev);
> >>>    		}
> >>>    	} else {
> >>> -		switch (RTE_ETH_DEV_SRIOV(dev).active) {
> >>>    		/*
> >>>    		 * SRIOV active scheme
> >>>    		 * FIXME if support DCB/RSS together with VMDq & SRIOV
> >>>    		 */
> >>> -		case ETH_64_POOLS:
> >>> -			IXGBE_WRITE_REG(hw, IXGBE_MRQC,
> >> IXGBE_MRQC_VMDQEN);
> >>> +		switch (dev->data->dev_conf.rxmode.mq_mode) {
> >>> +		case ETH_MQ_RX_RSS:
> >>> +		case ETH_MQ_RX_VMDQ_RSS:
> >>> +			ixgbe_config_vf_rss(dev);
> >>>    			break;
> >>>
> >>> -		case ETH_32_POOLS:
> >>> -			IXGBE_WRITE_REG(hw, IXGBE_MRQC,
> >> IXGBE_MRQC_VMDQRT4TCEN);
> >>> -			break;
> >>> +		default:
> >>> +			switch (RTE_ETH_DEV_SRIOV(dev).active) {
> >> Sorry for nitpicking but have u considered taking this encapsulated
> >> "switch- case" block into a separate function? This could make the
> >> code look a lot nicer. ;)
> >>
> >>> +			case ETH_64_POOLS:
> >>> +				IXGBE_WRITE_REG(hw, IXGBE_MRQC,
> >>> +					IXGBE_MRQC_VMDQEN);
> >>> +				break;
> >>>
> >>> -		case ETH_16_POOLS:
> >>> -			IXGBE_WRITE_REG(hw, IXGBE_MRQC,
> >> IXGBE_MRQC_VMDQRT8TCEN);
> >>> +			case ETH_32_POOLS:
> >>> +				IXGBE_WRITE_REG(hw, IXGBE_MRQC,
> >>> +					IXGBE_MRQC_VMDQRT4TCEN);
> >>> +				break;
> >>> +
> >>> +			case ETH_16_POOLS:
> >>> +				IXGBE_WRITE_REG(hw, IXGBE_MRQC,
> >>> +					IXGBE_MRQC_VMDQRT8TCEN);
> >>> +				break;
> >>> +			default:
> >>> +				PMD_INIT_LOG(ERR,
> >>> +					"invalid pool number in IOV mode");
> >>> +				break;
> >>> +			}
> >>>    			break;
> >>> -		default:
> >>> -			PMD_INIT_LOG(ERR, "invalid pool number in IOV
> >> mode");
> >>>    		}
> >>>    	}
> >>>
> >>> @@ -3989,10 +4036,32 @@ ixgbevf_dev_rx_init(struct rte_eth_dev
> *dev)
> >>>    	uint16_t buf_size;
> >>>    	uint16_t i;
> >>>    	int ret;
> >>> +	uint16_t valid_rxq_num;
> >>>
> >>>    	PMD_INIT_FUNC_TRACE();
> >>>    	hw =3D IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> >>>
> >>> +	valid_rxq_num =3D RTE_MIN(dev->data->nb_rx_queues,
> >>> +hw->mac.max_rx_queues);
> >>> +
> >>> +	/*
> >>> +	 * VMDq RSS can't support 3 queues, so config it into 4 queues,
> >>> +	 * and give user a hint that some packets may loss if it doesn't
> >>> +	 * poll the queue where those packets are distributed to.
> >>> +	 */
> >>> +	if (valid_rxq_num =3D=3D 3)
> >>> +		valid_rxq_num =3D 4;
> >> Why to configure more queues that requested and not less (2)? Why to
> >> configure anything at all and not return an error?
> > Sorry, I don't agree this is "anything" you say, because I don't use 5,=
6,7,
> 8, ..., 16, 2014, 2015,... etc.
> > By considering 2 or 4,
> > I prefer 4, the reason is if user need more than 3 queues per vf to do
> > something, And pf has also the capability to setup 4 queues per vf,
> > confining to 2 queues is also not good thing, So here try to enable 4 q=
ueues,
> and give user hints here.
> > Btw, change it into 2 is another way, depends on other guys' more insig=
ht
> here.
>=20
> Like I said before, trying to guess what user wants is a way to making a =
code
> that is very hard to use and to maintain. Pls., just return an error and =
let the
> user code deal with it the way he/she really wants and not the way u *thi=
nk*
> he/she wants.
>=20
I didn't disagree on this, either :-)
If you have strong reason for this way and more guys agree with it,
I will modify it probably in v4.=20
> >
> >>> +
> >>> +	if (dev->data->nb_rx_queues > valid_rxq_num) {
> >>> +		PMD_INIT_LOG(ERR, "The number of Rx queue invalid, "
> >>> +			"it should be equal to or less than %d",
> >>> +			valid_rxq_num);
> >>> +		return -1;
> >>> +	} else if (dev->data->nb_rx_queues < valid_rxq_num)
> >>> +		PMD_INIT_LOG(ERR, "The number of Rx queue is less "
> >>> +			"than the number of available Rx queues:%d, "
> >>> +			"packets in Rx queues(q_id >=3D %d) may loss.",
> >>> +			valid_rxq_num, dev->data->nb_rx_queues);
> >> Who ever looks in the "INIT_LOG" if everything "work well" and u make
> >> it look so by allowing this call to succeed. And then some packets
> >> will just silently not arrive?! And what the used should somehow guess=
 to
> do?
> >> - Look in the "INIT_LOG"?! This is a nightmare!
> >>
> >>> +
> >>>    	/*
> >>>    	 * When the VF driver issues a IXGBE_VF_RESET request, the PF
> >> driver
> >>>    	 * disables the VF receipt of packets if the PF MTU is > 1500.
> >>> @@ -4094,6 +4163,9 @@ ixgbevf_dev_rx_init(struct rte_eth_dev *dev)
> >>>    			IXGBE_PSRTYPE_IPV6HDR;
> >>>    #endif
> >>>
> >>> +	/* Set RQPL for VF RSS according to max Rx queue */
> >>> +	psrtype |=3D (valid_rxq_num >> 1) <<
> >>> +		IXGBE_PSRTYPE_RQPL_SHIFT;
> >>>    	IXGBE_WRITE_REG(hw, IXGBE_VFPSRTYPE, psrtype);
> >>>
> >>>    	if (dev->data->dev_conf.rxmode.enable_scatter) {