From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <vladz@cloudius-systems.com>
Received: from mail-wg0-f46.google.com (mail-wg0-f46.google.com [74.125.82.46])
 by dpdk.org (Postfix) with ESMTP id 17F2A282
 for <dev@dpdk.org>; Thu, 25 Dec 2014 14:13:43 +0100 (CET)
Received: by mail-wg0-f46.google.com with SMTP id x13so13123134wgg.33
 for <dev@dpdk.org>; Thu, 25 Dec 2014 05:13:43 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to
 :subject:references:in-reply-to:content-type
 :content-transfer-encoding;
 bh=cHPyzvQq9aB/keeybnHxOCkl6Voii4o+/dKil1u0v4s=;
 b=dwP7dvppLjy640uZ0aWXsOUWrYX8VCVzN2ta2nwYPU+0GgjpIelozWUEyInBKwwtmY
 P8DGVnQ2O9tJRiXNM3WCZdw78j/EEhfqhB2ZCnumFse2r7KWDZ3wKrq6uYPEPsJo/4uT
 pbZrEOdFI/s/3toKa9K7J+77G0mh62R7x5j3nN3uurOikWTRG0kuZ7hNTYli8gMY53P5
 LW/z3pdPXw4HqPBaX/OtTwn/h2NeRSybpMqisYC5OlQQveZF8E442PDEBl86kTM+SwX/
 nxuvchY8sQRWTTrUnc1YJyXjSQ541SFWIj1DYAONZp2bIZxNJoMCOa0tirv9eKsiFvJ1
 jeOQ==
X-Gm-Message-State: ALoCoQntEipYXYnclSpg9PSe1co5qE7jRInPgBn3D09aasobq2XqfpmLflrPkdwjJv3Pn3O5dWEW
X-Received: by 10.194.110.69 with SMTP id hy5mr73973302wjb.121.1419513222903; 
 Thu, 25 Dec 2014 05:13:42 -0800 (PST)
Received: from [10.0.0.165] (system.cloudius-systems.com. [84.94.198.183])
 by mx.google.com with ESMTPSA id nj9sm24707383wic.10.2014.12.25.05.13.41
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 25 Dec 2014 05:13:42 -0800 (PST)
Message-ID: <549C0D84.60305@cloudius-systems.com>
Date: Thu, 25 Dec 2014 15:13:40 +0200
From: Vlad Zolotarov <vladz@cloudius-systems.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: "Ouyang, Changchun" <changchun.ouyang@intel.com>, 
 "dev@dpdk.org" <dev@dpdk.org>
References: <1419389808-9559-1-git-send-email-changchun.ouyang@intel.com>
 <1419398584-19520-1-git-send-email-changchun.ouyang@intel.com>
 <1419398584-19520-6-git-send-email-changchun.ouyang@intel.com>
 <549A97F6.30901@cloudius-systems.com>
 <F52918179C57134FAEC9EA62FA2F96251194A6FE@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <F52918179C57134FAEC9EA62FA2F96251194A6FE@shsmsx102.ccr.corp.intel.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 25 Dec 2014 13:13:43 -0000


On 12/25/14 04:14, Ouyang, Changchun wrote:
> Hi,
>
>> -----Original Message-----
>> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
>> Sent: Wednesday, December 24, 2014 6:40 PM
>> To: Ouyang, Changchun; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
>>
>>
>> On 12/24/14 07:23, Ouyang Changchun wrote:
>>> It needs config RSS and IXGBE_MRQC and IXGBE_VFPSRTYPE to enable VF
>> RSS.
>>> The psrtype will determine how many queues the received packets will
>>> distribute to, and the value of psrtype should depends on both facet:
>>> max VF rxq number which has been negotiated with PF, and the number of
>> rxq specified in config on guest.
>>> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
>>> ---
>>>    lib/librte_pmd_ixgbe/ixgbe_pf.c   | 15 +++++++
>>>    lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 92
>> ++++++++++++++++++++++++++++++++++-----
>>>    2 files changed, 97 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c
>>> b/lib/librte_pmd_ixgbe/ixgbe_pf.c index cbb0145..9c9dad8 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
>>> @@ -187,6 +187,21 @@ int ixgbe_pf_host_configure(struct rte_eth_dev
>> *eth_dev)
>>>    	IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw-
>>> mac.num_rar_entries), 0);
>>>    	IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw-
>>> mac.num_rar_entries), 0);
>>>
>>> +	/*
>>> +	 * VF RSS can support at most 4 queues for each VF, even if
>>> +	 * 8 queues are available for each VF, it need refine to 4
>>> +	 * queues here due to this limitation, otherwise no queue
>>> +	 * will receive any packet even RSS is enabled.
>> According to Table 7-3 in the 82599 spec RSS is not available when port is
>> configured to have 8 queues per pool. This means that if u see this
>> configuration u may immediately disable RSS flow in your code.
>>
> 8 queues here means the available number queue per vf, it is calculated according to max vfs,
> e.g. if max vfs is 16(or less than), then each vf 'COULD' have 8 queues evenly, pf early init stage estimate this value,
> but that is not precise, so need refine this.
> User don't know this estimated value, it is internal value, not come from user's input/configure.
> Hope it is clear to you.
>>> +	 */
>>> +	if (eth_dev->data->dev_conf.rxmode.mq_mode ==
>> ETH_MQ_RX_VMDQ_RSS) {
>>> +		if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool == 8) {
>>> +			RTE_ETH_DEV_SRIOV(eth_dev).active =
>> ETH_32_POOLS;
>>> +			RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 4;
>>> +			RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx =
>>> +				dev_num_vf(eth_dev) * 4;
>> According to 82599 spec u can't do that since RSS is not allowed when port is
>> configured to have 8 function per-VF. Have u verified that this works? If yes,
>> then spec should be updated.
>>
> Response as above,
> Of course I have validated this. It works well.
>
>>> +		}
>>> +	}
>>> +
>>>    	/* set VMDq map to default PF pool */
>>>    	hw->mac.ops.set_vmdq(hw, 0,
>>> RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx);
>>>
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> index f69abda..a7c17a4 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> @@ -3327,6 +3327,39 @@ ixgbe_alloc_rx_queue_mbufs(struct
>> igb_rx_queue *rxq)
>>>    }
>>>
>>>    static int
>>> +ixgbe_config_vf_rss(struct rte_eth_dev *dev) {
>>> +	struct ixgbe_hw *hw;
>>> +	uint32_t mrqc;
>>> +
>>> +	ixgbe_rss_configure(dev);
>>> +
>>> +	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>> +
>>> +	/* MRQC: enable VF RSS */
>>> +	mrqc = IXGBE_READ_REG(hw, IXGBE_MRQC);
>>> +	mrqc &= ~IXGBE_MRQC_MRQE_MASK;
>>> +	switch (RTE_ETH_DEV_SRIOV(dev).active) {
>>> +	case ETH_64_POOLS:
>>> +		mrqc |= IXGBE_MRQC_VMDQRSS64EN;
>>> +		break;
>>> +
>>> +	case ETH_32_POOLS:
>>> +	case ETH_16_POOLS:
>>> +		mrqc |= IXGBE_MRQC_VMDQRSS32EN;
>> Again, this contradicts with the spec.
>>
>>> +		break;
>>> +
>>> +	default:
>>> +		PMD_INIT_LOG(ERR, "Invalid pool number in IOV mode");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	IXGBE_WRITE_REG(hw, IXGBE_MRQC, mrqc);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int
>>>    ixgbe_dev_mq_rx_configure(struct rte_eth_dev *dev)
>>>    {
>>>    	struct ixgbe_hw *hw =
>>> @@ -3358,24 +3391,38 @@ ixgbe_dev_mq_rx_configure(struct
>> rte_eth_dev *dev)
>>>    			default: ixgbe_rss_disable(dev);
>>>    		}
>>>    	} else {
>>> -		switch (RTE_ETH_DEV_SRIOV(dev).active) {
>>>    		/*
>>>    		 * SRIOV active scheme
>>>    		 * FIXME if support DCB/RSS together with VMDq & SRIOV
>>>    		 */
>>> -		case ETH_64_POOLS:
>>> -			IXGBE_WRITE_REG(hw, IXGBE_MRQC,
>> IXGBE_MRQC_VMDQEN);
>>> +		switch (dev->data->dev_conf.rxmode.mq_mode) {
>>> +		case ETH_MQ_RX_RSS:
>>> +		case ETH_MQ_RX_VMDQ_RSS:
>>> +			ixgbe_config_vf_rss(dev);
>>>    			break;
>>>
>>> -		case ETH_32_POOLS:
>>> -			IXGBE_WRITE_REG(hw, IXGBE_MRQC,
>> IXGBE_MRQC_VMDQRT4TCEN);
>>> -			break;
>>> +		default:
>>> +			switch (RTE_ETH_DEV_SRIOV(dev).active) {
>> Sorry for nitpicking but have u considered taking this encapsulated "switch-
>> case" block into a separate function? This could make the code look a lot
>> nicer. ;)
> Only one place use it, so don't need make it a function,
> And I prefer to the current code.

Functions may be used not only to have a repeatedly called code but also 
to make a caller code more readable. Encapsulated switch-case is one of 
the examples of a *not* readable code constructs which should be avoided.

>
>>> +			case ETH_64_POOLS:
>>> +				IXGBE_WRITE_REG(hw, IXGBE_MRQC,
>>> +					IXGBE_MRQC_VMDQEN);
>>> +				break;
>>>
>>> -		case ETH_16_POOLS:
>>> -			IXGBE_WRITE_REG(hw, IXGBE_MRQC,
>> IXGBE_MRQC_VMDQRT8TCEN);
>>> +			case ETH_32_POOLS:
>>> +				IXGBE_WRITE_REG(hw, IXGBE_MRQC,
>>> +					IXGBE_MRQC_VMDQRT4TCEN);
>>> +				break;
>>> +
>>> +			case ETH_16_POOLS:
>>> +				IXGBE_WRITE_REG(hw, IXGBE_MRQC,
>>> +					IXGBE_MRQC_VMDQRT8TCEN);
>>> +				break;
>>> +			default:
>>> +				PMD_INIT_LOG(ERR,
>>> +					"invalid pool number in IOV mode");
>>> +				break;
>>> +			}
>>>    			break;
>>> -		default:
>>> -			PMD_INIT_LOG(ERR, "invalid pool number in IOV
>> mode");
>>>    		}
>>>    	}
>>>
>>> @@ -3989,10 +4036,32 @@ ixgbevf_dev_rx_init(struct rte_eth_dev *dev)
>>>    	uint16_t buf_size;
>>>    	uint16_t i;
>>>    	int ret;
>>> +	uint16_t valid_rxq_num;
>>>
>>>    	PMD_INIT_FUNC_TRACE();
>>>    	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>
>>> +	valid_rxq_num = RTE_MIN(dev->data->nb_rx_queues,
>>> +hw->mac.max_rx_queues);
>>> +
>>> +	/*
>>> +	 * VMDq RSS can't support 3 queues, so config it into 4 queues,
>>> +	 * and give user a hint that some packets may loss if it doesn't
>>> +	 * poll the queue where those packets are distributed to.
>>> +	 */
>>> +	if (valid_rxq_num == 3)
>>> +		valid_rxq_num = 4;
>> Why to configure more queues that requested and not less (2)? Why to
>> configure anything at all and not return an error?
>>
>>> +
>>> +	if (dev->data->nb_rx_queues > valid_rxq_num) {
>>> +		PMD_INIT_LOG(ERR, "The number of Rx queue invalid, "
>>> +			"it should be equal to or less than %d",
>>> +			valid_rxq_num);
>>> +		return -1;
>>> +	} else if (dev->data->nb_rx_queues < valid_rxq_num)
>>> +		PMD_INIT_LOG(ERR, "The number of Rx queue is less "
>>> +			"than the number of available Rx queues:%d, "
>>> +			"packets in Rx queues(q_id >= %d) may loss.",
>>> +			valid_rxq_num, dev->data->nb_rx_queues);
>> Who ever looks in the "INIT_LOG" if everything "work well" and u make it
>> look so by allowing this call to succeed. And then some packets will just
>> silently not arrive?! And what the used should somehow guess to do?
>> - Look in the "INIT_LOG"?! This is a nightmare!
> Sorry, I don't think so again, if user find any packets loss, he will care for log,
> Then he can find that log there, then user can refine its rxq number due the wrong rxq number,
> Why is it a nightmare?

Because usually u expect that if the function call returns with a 
success it means a success. Why a user has to learn that a device 
configuration function was provided with wrong parameters from the 
packet loss? If parameters are not allowed u expect to get an error as a 
return value. Since when errors are returned in a form of a log message? 
Why do u think there is a living person running a DPDK based 
application? How do u expect somebody build an automated environment 
when part of errors are returned in some log? Should he/she add a log 
parser?
On the other hand, why do u think 4 queues is a better option for a user 
than 2 queue when he asked for 3 queues? What kind of heuristics is that?

To summarize - it would be much better if u just returned an EINVAL 
error in that case.

>
> I don't agree with you about "silently not arrive", because we have hint/log there.
>
> Return error here is also possible way,

It's the only possible way! ;)

> Again need other guys' insight here.
>
>>> +
>>>    	/*
>>>    	 * When the VF driver issues a IXGBE_VF_RESET request, the PF
>> driver
>>>    	 * disables the VF receipt of packets if the PF MTU is > 1500.
>>> @@ -4094,6 +4163,9 @@ ixgbevf_dev_rx_init(struct rte_eth_dev *dev)
>>>    			IXGBE_PSRTYPE_IPV6HDR;
>>>    #endif
>>>
>>> +	/* Set RQPL for VF RSS according to max Rx queue */
>>> +	psrtype |= (valid_rxq_num >> 1) <<
>>> +		IXGBE_PSRTYPE_RQPL_SHIFT;
>>>    	IXGBE_WRITE_REG(hw, IXGBE_VFPSRTYPE, psrtype);
>>>
>>>    	if (dev->data->dev_conf.rxmode.enable_scatter) {