From: Vlad Zolotarov
Date: Sun, 28 Dec 2014 12:14:02 +0200
To: "Ouyang, Changchun"
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
Message-ID: <549FD7EA.60504@cloudius-systems.com>

On 12/26/14 10:45, Ouyang, Changchun wrote:
> Hi Vladislav,
>
> From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com]
> Sent: Friday, December 26, 2014 3:37 PM
> To: Ouyang, Changchun
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
>
> On Dec 26, 2014 9:28 AM, "Ouyang, Changchun" wrote:
>> Hi Vladislav,
>>
>> From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com]
>> Sent: Friday, December 26, 2014 2:49 PM
>> To: Ouyang, Changchun
>> Cc: dev@dpdk.org
>> Subject: RE: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
>>
>> On Dec 26, 2014 3:52 AM, "Ouyang, Changchun" wrote:
>>>
>>>> -----Original Message-----
>>>> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
>>>> Sent: Thursday, December 25, 2014 9:20 PM
>>>> To: Ouyang, Changchun; dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
>>>>
>>>> On 12/25/14 04:43, Ouyang, Changchun wrote:
>>>>> Hi,
>>>>> Sorry, I missed some comments, so I continue my response below.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
>>>>>> Sent: Wednesday, December 24, 2014 6:40 PM
>>>>>> To: Ouyang, Changchun; dev@dpdk.org
>>>>>> Subject: Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
>>>>>>
>>>>>> On 12/24/14 07:23, Ouyang Changchun wrote:
>>>>>>> It needs to configure RSS and the IXGBE_MRQC and IXGBE_VFPSRTYPE
>>>>>>> registers to enable VF RSS.
>>>>>>> The psrtype will determine how many queues the received packets will
>>>>>>> be distributed to, and the value of psrtype should depend on both
>>>>>>> facets: the max VF rxq number which has been negotiated with the PF,
>>>>>>> and the number of rxq specified in the config on the guest.
>>>>>>> Signed-off-by: Changchun Ouyang
>>>>>>> ---
>>>>>>>  lib/librte_pmd_ixgbe/ixgbe_pf.c   | 15 +++++++
>>>>>>>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 92 ++++++++++++++++++++++++++++++++++-----
>>>>>>>  2 files changed, 97 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c b/lib/librte_pmd_ixgbe/ixgbe_pf.c
>>>>>>> index cbb0145..9c9dad8 100644
>>>>>>> --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
>>>>>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
>>>>>>> @@ -187,6 +187,21 @@ int ixgbe_pf_host_configure(struct rte_eth_dev *eth_dev)
>>>>>>>  	IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw->mac.num_rar_entries), 0);
>>>>>>>  	IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw->mac.num_rar_entries), 0);
>>>>>>> +	/*
>>>>>>> +	 * VF RSS can support at most 4 queues for each VF. Even if
>>>>>>> +	 * 8 queues are available for each VF, they need to be reduced
>>>>>>> +	 * to 4 queues here due to this limitation; otherwise no queue
>>>>>>> +	 * will receive any packets even if RSS is enabled.
>>>>>> According to Table 7-3 in the 82599 spec, RSS is not available when
>>>>>> the port is configured to have 8 queues per pool. This means that if
>>>>>> you see this configuration you may immediately disable the RSS flow
>>>>>> in your code.
>>>>>>> +	 */
>>>>>>> +	if (eth_dev->data->dev_conf.rxmode.mq_mode == ETH_MQ_RX_VMDQ_RSS) {
>>>>>>> +		if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool == 8) {
>>>>>>> +			RTE_ETH_DEV_SRIOV(eth_dev).active = ETH_32_POOLS;
>>>>>>> +			RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 4;
>>>>>>> +			RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx =
>>>>>>> +				dev_num_vf(eth_dev) * 4;
>>>>>> According to the 82599 spec you can't do that, since RSS is not allowed
>>>>>> when the port is configured to have 8 queues per VF. Have you verified
>>>>>> that this works? If yes, then the spec should be updated.
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +
>>>>>>>  	/* set VMDq map to default PF pool */
>>>>>>>  	hw->mac.ops.set_vmdq(hw, 0, RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx);
>>>>>>>
>>>>>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>>>>>> index f69abda..a7c17a4 100644
>>>>>>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>>>>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>>>>>> @@ -3327,6 +3327,39 @@ ixgbe_alloc_rx_queue_mbufs(struct igb_rx_queue *rxq)
>>>>>>>  }
>>>>>>>
>>>>>>>  static int
>>>>>>> +ixgbe_config_vf_rss(struct rte_eth_dev *dev)
>>>>>>> +{
>>>>>>> +	struct ixgbe_hw *hw;
>>>>>>> +	uint32_t mrqc;
>>>>>>> +
>>>>>>> +	ixgbe_rss_configure(dev);
>>>>>>> +
>>>>>>> +	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>>>>> +
>>>>>>> +	/* MRQC: enable VF RSS */
>>>>>>> +	mrqc = IXGBE_READ_REG(hw, IXGBE_MRQC);
>>>>>>> +	mrqc &= ~IXGBE_MRQC_MRQE_MASK;
>>>>>>> +	switch (RTE_ETH_DEV_SRIOV(dev).active) {
>>>>>>> +	case ETH_64_POOLS:
>>>>>>> +		mrqc |= IXGBE_MRQC_VMDQRSS64EN;
>>>>>>> +		break;
>>>>>>> +
>>>>>>> +	case ETH_32_POOLS:
>>>>>>> +	case ETH_16_POOLS:
>>>>>>> +		mrqc |= IXGBE_MRQC_VMDQRSS32EN;
>>>>>> Again, this contradicts the spec.
>>>>> Yes, the spec says the HW can't support VF RSS at all, but experiments
>>>>> found that it can be done.
>>>> The spec explicitly says that VF RSS *is* supported, in particular in
>>>> the table mentioned above.
>>> But the spec (January 2014, revision 2.9) I have on hand says: "in IOV
>>> mode, VMDq+RSS mode is not available" in the note of section 4.6.10.2.1.
>>> And still there is a whole section about configuring packet filtering,
>>> including Rx in the VF mode (including the table I've referred to). It's
>>> quite confusing, I must say...
>> Changchun: do you mind telling me which table you are referring to? I
>> will try to have a look and may share my thoughts if I can.
>>
>>>> What your code is doing is that in the case of 16 VFs you set up a
>>>> 32-pool configuration and use only 16 of them.
>>> But I don't see any big issue here. In this case each VF COULD have
>>> 8 queues, like I said before, but this is an estimated value; actually
>>> only 4 queues are really available for one VF. You can refer to the spec
>>> for the correctness here.
>>> No issues, I just wanted to clarify that it seems like you are doing it
>>> quite according to the spec.
>>>>> We can focus on discussing the implementation first.
>>> Right. So, after we clarified that there is nothing you can do at the
>>> moment about the RSS query flow, there is one more open issue here.
>>> In general we need a way to know how many of the available queues may be
>>> configured for RSS. While the same issue is present with the PF as well
>>> (it's 16 for 82599 but may be a different number for a different
>>> device), for a VF it's more pronounced since it depends on the PF
>>> configuration.
>>> Don't you think it would be logical to add a specific field for it in
>>> the dev_info struct?
>> Changchun: you are right, and we already have max_rx_queues in dev_info.
>> While negotiating between the PF and VF, the negotiated max rx queue
>> number will be set into hw->mac.max_rx_queues, and after that, when you
>> call ixgbe_dev_info_get, that value will be set into
>> dev_info->max_rx_queues.
>> Then you can get the number of queues all packets will be distributed to
>> by reading dev_info->max_rx_queues.
>> I'm afraid you've missed my point here. For instance, for a PF,
>> max_rx_queues will be set to 128 while you may only configure 16 RSS
>> queues.
>> The same will happen for a VF in the 16-VF configuration: max_rx_queues
>> will be set to 8 while you may configure only 4 RSS queues.
>> This is why I suggested adding a separate info field... 😉
> Yes, I got your point this time, but the issue is that when I have 16 VFs
> and try to set max_rx_queues to 8, then no queue can receive any packets
> on the VF. This is why I had to add logic to reduce the rx queue number
> from 8 to 4 queues.
> I have tried to do it the way you suggest, but unfortunately the rx queues
> can't work. If you find any other good method, please let me know.

Please note that RSS is not the only multi-queue mode supported by both the
HW and DPDK - there is also a DCB mode. This mode is also supported in the VF
mode according to the same Table 7-3, and, according to the same table, there
is an 8-TCs-per-16-pools mode. Therefore, if a user desires to utilize all 8
available Rx queues of a VF, he/she could - in a DCB Rx mode.

Now, looking at your code a bit more deeply, I see that you cut the number of
Rx queues per pool down to 4 in the PF configuration when the VMDQ_RSS mode
is requested, which is OK, but it still leaves the general issue open. Let's
describe it in detail for the PF and the VF separately.

For a PF:

 * When a user queries the PF he only gets the maximum number of Rx queues
   and has no way to know the maximum RSS/DCB queue set he/she may configure.
   E.g. for an 82599 PF the maximum number of Rx queues is 128 while the
   maximum RSS set size is 16 (see Table 7-1 in the spec for the full set of
   supported modes).
 * Therefore the user can't write generic, vendor-independent code that is
   able to configure RSS for a PF based on the current
   rte_eth_dev_info_get() output.

For a VF:

 * Similarly to the PF above, if a VF supports both RSS and DCB
   configurations, having max_rx_queues is not enough, since the maximum RSS
   set may be smaller than that number. "Luckily", 82599 supports only either
   an RSS or a DCB VF configuration at a time, and this is configured
   globally during the PF configuration - but who said that later Intel NICs,
   or other vendors' NICs supported by DPDK, are going to have the same
   limitation? So, in the general case, we find ourselves in the same
   uncertainty in the VF case as in the PF case above.
 * Currently, a VF has no way to know what the PF multi-queue configuration
   (Table 7-3) is: RSS or DCB. Your patch set sort of assumes that the PF is
   accessible at the same level where the VF DPDK code runs, but this is not
   the case in the environments SRIOV was originally targeting - the
   virtualization environment, where the Guest code has no access whatsoever
   to the PF and may only query the VF. AWS is one real-life example. So,
   when DPDK code has to initialize an SRIOV VF in a Guest OS, it lacks the
   information about both the available Rx MQ modes and their capabilities
   (you may say we have max_rx_queues for the capabilities, but see the
   bullet above).

What I suggest in order to address all the issues above is to add the
following fields to rte_eth_dev_info (see the sketches below):

 1. rte_eth_rx_mq_mode rx_mode - for the supported Rx MQ modes.
 2. uint16_t max_rss_queues - for the maximum RSS set size.
 3. uint16_t max_dcb_queues - for the maximum number of TCs.

These 3 new fields would clearly describe the Rx MQ capabilities of the
function. Further correctness checking, like the specific RSS queue number
configuration for a VF, should be implemented as an error code returned from
rte_eth_dev_configure().

Please comment.

Vlad

> Thanks and regards,
> Changchun
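
P.S. To make the proposal concrete, below is a minimal sketch of what the new
capability fields and a PMD-side fill-in could look like. It is illustrative
only: the struct, enum and function names are hypothetical and not part of
the existing DPDK API, and the 82599 numbers simply follow Tables 7-1 and 7-3
referenced above.

#include <stdint.h>

/*
 * Stand-ins for the Rx multi-queue modes from rte_ethdev.h, used here as
 * bit positions in a capability mask.
 */
enum rx_mq_mode_bit {
	RX_MQ_RSS_BIT = 0,
	RX_MQ_DCB_BIT = 1,
};

/* Hypothetical additions to struct rte_eth_dev_info */
struct rte_eth_dev_mq_info {
	uint32_t rx_mq_modes;    /* bitmask of the supported Rx MQ modes */
	uint16_t max_rss_queues; /* maximum RSS set size */
	uint16_t max_dcb_queues; /* maximum number of TCs */
};

/*
 * An 82599 PF: 128 Rx queues in total, but at most a 16-queue RSS set and
 * 8 TCs (Table 7-1).
 */
static void ixgbe_pf_fill_mq_info(struct rte_eth_dev_mq_info *mq)
{
	mq->rx_mq_modes = (1u << RX_MQ_RSS_BIT) | (1u << RX_MQ_DCB_BIT);
	mq->max_rss_queues = 16;
	mq->max_dcb_queues = 8;
}

/*
 * An 82599 VF whose PF runs in the 32-pool VMDq+RSS mode: at most 4 RSS
 * queues per VF and no DCB (Table 7-3).
 */
static void ixgbevf_fill_mq_info(struct rte_eth_dev_mq_info *mq)
{
	mq->rx_mq_modes = 1u << RX_MQ_RSS_BIT;
	mq->max_rss_queues = 4;
	mq->max_dcb_queues = 0;
}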
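
And, reusing the definitions from the sketch above, a hypothetical
application-side helper showing how generic code could pick a valid RSS queue
count instead of trusting max_rx_queues alone:

#include <errno.h>

/*
 * Hypothetical application-side helper: clamp the requested Rx queue count
 * to the advertised RSS set size. Returns the usable queue count or a
 * negative errno.
 */
static int pick_rss_queue_count(uint16_t requested,
				const struct rte_eth_dev_mq_info *mq)
{
	if (!(mq->rx_mq_modes & (1u << RX_MQ_RSS_BIT)))
		return -ENOTSUP; /* this function cannot do RSS at all */

	if (requested > mq->max_rss_queues)
		requested = mq->max_rss_queues; /* e.g. 8 -> 4 on a 16-VF 82599 */

	return requested;
}

With something like this in place, rte_eth_dev_configure() would only need to
return a negative errno for the cases the application could not have
predicted from dev_info.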