From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 680D8A04DD; Wed, 21 Oct 2020 12:53:55 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id AF8BEA901; Wed, 21 Oct 2020 12:53:53 +0200 (CEST) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 270569AF1 for ; Wed, 21 Oct 2020 12:53:50 +0200 (CEST) IronPort-SDR: E/o5DSZtxnNN8poMM08UplV+9CyfF0it7f2XogcER6maIy09kYZo4y6YaylDBkdVjGgcx0sGmc lrlU/Qcl+Fag== X-IronPort-AV: E=McAfee;i="6000,8403,9780"; a="252044294" X-IronPort-AV: E=Sophos;i="5.77,401,1596524400"; d="scan'208";a="252044294" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Oct 2020 03:53:48 -0700 IronPort-SDR: TQ6NH6siQD+s7az7pcgCNJNLAdySoggEGLTP4t3aqlfSekaYYo0nrcWiUCFwJ9Qm6bOYFJN46V 6EOg0I8jYTaQ== X-IronPort-AV: E=Sophos;i="5.77,401,1596524400"; d="scan'208";a="466255902" Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.213.217.119]) ([10.213.217.119]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Oct 2020 03:53:44 -0700 To: "Ananyev, Konstantin" , "Yang, SteveX" , "Zhang, Qi Z" , "dev@dpdk.org" Cc: "Zhao1, Wei" , "Guo, Jia" , "Yang, Qiming" , "Wu, Jingjing" , "Xing, Beilei" , "Stokes, Ian" References: <20200923040909.73418-1-stevex.yang@intel.com> <7704b7ce95fd4db2a9c6a8a33c3f0805@intel.com> <77ac2293-e532-e702-2370-c07cdd957c57@intel.com> <483bd509-82b9-9724-d28c-c517ef091e0c@intel.com> <2b14260d-5870-1c4a-2bda-6d35f88c62c3@intel.com> From: Ferruh Yigit Message-ID: Date: Wed, 21 Oct 2020 11:53:40 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [PATCH v4 3/5] net/ice: fix max mtu size packets with vlan tag cannot be received by default X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 10/21/2020 11:44 AM, Ananyev, Konstantin wrote: >>>> On 10/20/2020 10:07 AM, Ananyev, Konstantin wrote: >>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> testpmd will initialize default max packet length to 1518 which >>>>>>>>>>>>>>>>>> doesn't include vlan tag size in ether overheader. Once, send the >>>>>>>>>>>>>>>>>> max mtu length packet with vlan tag, the max packet length will >>>>>>>>>>>>>>>>>> exceed 1518 that will cause packets dropped directly from NIC hw >>>>>>>>>>>>>> side. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max >>>>>>>>>>>>>>>>>> packet size, so, configures the correct max packet size in >>>>>>>>>>>>>>>>>> dev_config >>>>>>>>>>>>>>> ops. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size") >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Signed-off-by: SteveX Yang >>>>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>>>> drivers/net/ice/ice_ethdev.c | 11 +++++++++++ >>>>>>>>>>>>>>>>>> 1 file changed, 11 insertions(+) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c >>>>>>>>>>>>>>>>>> b/drivers/net/ice/ice_ethdev.c index >>>>>>>>>>>>>>>>>> cfd357b05..6b7098444 100644 >>>>>>>>>>>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c >>>>>>>>>>>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c >>>>>>>>>>>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev >>>>>>>>>>>> *dev) >>>>>>>>>>>>>>>>>> struct ice_adapter *ad = >>>>>>>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private); >>>>>>>>>>>>>>>>>> struct ice_pf *pf = >>>>>>>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private); >>>>>>>>>>>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD; >>>>>>>>>>>>>>>>>> int ret; >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /* Initialize to TRUE. If any of Rx queues doesn't meet the @@ >>>>>>>>>>>>>>>>>> -3157,6 >>>>>>>>>>>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev) >>>>>>>>>>>>>>>>>> if (dev->data->dev_conf.rxmode.mq_mode & >>>>>>>>>>>> ETH_MQ_RX_RSS_FLAG) >>>>>>>>>>>>>>>>>> dev->data->dev_conf.rxmode.offloads |= >>>>>>>>>>>>>>> DEV_RX_OFFLOAD_RSS_HASH; >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> +/** >>>>>>>>>>>>>>>>>> + * Considering QinQ packet, max frame size should be equal or >>>>>>>>>>>>>>>>>> + * larger than total size of MTU and Ether overhead. >>>>>>>>>>>>>>>>>> + */ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) { >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Why we need this check? >>>>>>>>>>>>>>>>> Can we just call ice_mtu_set directly >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think that without that check we can silently overwrite provided >>>>>>>>>>>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> OK, I see >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But still have one question >>>>>>>>>>>>>>> dev->data->mtu is initialized to 1518 as default , but if >>>>>>>>>>>>>>> dev->data->application set >>>>>>>>>>>>>>> dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure. >>>>>>>>>>>>>>> does that mean we will still will set mtu to 1518, is this expected? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> max_rx_pkt_len should be larger than mtu at least, so we should raise >>>>>>>>>>>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500). >>>>>>>>>>>>> >>>>>>>>>>>>> Ok, this describe the problem more general and better to replace exist >>>>>>>>>>>> code comment and commit log for easy understanding. >>>>>>>>>>>>> Please send a new version for reword >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I didn't really get this set. >>>>>>>>>>>> >>>>>>>>>>>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than >>>>>>>>>>>> this size is dropped. >>>>>>>>>>> >>>>>>>>>>> Sure, it is normal case for dropping oversize data. >>>>>>>>>>> >>>>>>>>>>>> Isn't this what should be, why we are trying to overwrite user configuration >>>>>>>>>>>> in PMD to prevent this? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> But it is a confliction that application/user sets mtu & max_rx_pkt_len at the same time. >>>>>>>>>>> This fix will make a decision when confliction occurred. >>>>>>>>>>> MTU value will come from user operation (e.g.: port config mtu 0 1500) directly, >>>>>>>>>>> so, the max_rx_pkt_len will resize itself to adapt expected MTU value if its size is smaller than MTU + Ether overhead. >>>>>>>>>>> >>>>>>>>>>>> During eth_dev allocation, mtu set to default '1500', by ethdev layer. >>>>>>>>>>>> And testpmd sets 'max_rx_pkt_len' by default to '1518'. >>>>>>>>>>>> I think Qi's concern above is valid, what is user set 'max_rx_pkt_len' to '1000' >>>>>>>>>>>> and mean it? PMD will not honor the user config. >>>>>>>>>>> >>>>>>>>>>> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', what's the behavior expected? >>>>>>>>>>> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' will be invalid. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Why not simply increase the default 'max_rx_pkt_len' in testpmd? >>>>>>>>>>>> >>>>>>>>>>> The default 'max_rx_pkt_len' has been initialized to generical value (1518) and default 'mtu' is '1500' in testpmd, >>>>>>>>>>> But it isn't suitable to those NIC drivers which Ether overhead is larger than 18. (e.g.: ice, i40e) if 'mtu' value is preferable. >>>>>>>>>>> >>>>>>>>>>>> And I guess even better what we need is to tell to the application what the >>>>>>>>>>>> frame overhead PMD accepts. >>>>>>>>>>>> So the application can set proper 'max_rx_pkt_len' value per port for a >>>>>>>>>>>> given/requested MTU value. >>>>>>>>>>>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD >>>>>>>>>>>> overhead macros and 'max_mtu'/'min_mtu' added because of that, perhaps >>>>>>>>>>>> he has a solution now? >>>>>>>>>> >>>>>>>>>> From my perspective the main problem here: >>>>>>>>>> We have 2 different variables for nearly the same thing: >>>>>>>>>> rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len. >>>>>>>>>> and 2 different API to update them: dev_mtu_set() and dev_configure(). >>>>>>>>> >>>>>>>>> According API 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME enabled' >>>>>>>>> Although not sure that is practically what is done for all drivers. >>>>>>>> >>>>>>>> I think most of Intel PMDs use it unconditionally. >>>>>>>> >>>>>>>>> >>>>>>>>>> And inside majority of Intel PMDs we don't keep these 2 variables in sync: >>>>>>>>>> - mtu_set() will update both variables. >>>>>>>>>> - dev_configure() will update only max_rx_pkt_len, but will keep mtu intact. >>>>>>>>>> >>>>>>>>>> This patch fixes this inconsistency, which I think is a good thing. >>>>>>>>>> Though yes, it introduces change in behaviour. >>>>>>>>>> >>>>>>>>>> Let say the code: >>>>>>>>>> rte_eth_dev_set_mtu(port, 1500); >>>>>>>>>> dev_conf.max_rx_pkt_len = 1000; >>>>>>>>>> rte_eth_dev_configure(port, 1, 1, &dev_conf); >>>>>>>>>> >>>>>>>>> >>>>>>>>> 'rte_eth_dev_configure()' is one of the first APIs called, it is called before >>>>>>>>> 'rte_eth_dev_set_mtu(). >>>>>>>> >>>>>>>> Usually yes. >>>>>>>> But you can still do sometimes later: dev_mtu_set(); ...; dev_stop(); dev_configure(); dev_start(); >>>>>>>> >>>>>>>>> >>>>>>>>> When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by default by >>>>>>>>> ethdev layer, so it is not user configuration, but 'max_rx_pkt_len' is. >>>>>>>> >>>>>>>> See above. >>>>>>>> PMD doesn't know where this MTU value came from (default ethdev value or user specified value) >>>>>>>> and probably it shouldn't care. >>>>>>>> >>>>>>>>> >>>>>>>>> And later, when 'rte_eth_dev_set_mtu()' is called, but MTU and 'max_rx_pkt_len' >>>>>>>>> are updated (mostly). >>>>>>>> >>>>>>>> Yes, in mtu_set() we update both. >>>>>>>> But we don't update MTU in dev_configure(), only max_rx_pkt_len. >>>>>>>> That what this patch tries to fix (as I understand it). >>>>>>> >>>>>>> To be more precise - it doesn't change MTU value in dev_configure(), >>>>>>> but instead doesn't allow max_rx_pkt_len to become smaller >>>>>>> then MTU + OVERHEAD. >>>>>>> Probably changing MTU value instead is a better choice. >>>>>>> >>>>>> >>>>>> +1 to change mtu for this case. >>>>>> And this is what happens in practice when there is no 'rte_eth_dev_set_mtu()' >>>>>> call, since PMD is using ('max_rx_pkt_len' - OVERHEAD) to set MTU. >>>>> >>>>> Hmm, I don't see that happens within Intel PMDs. >>>>> As I can read the code: if user never call mtu_set(), then MTU value is left intact. >>>>> >>>> >>>> I was checking ice, >>>> in 'ice_dev_start()', 'rxmode.max_rx_pkt_len' is used to configure the device. >>> >>> Yes, I am not arguing with that. >>> What I am saying - dev_config() doesn't update MTU based on max_rx_pkt_len. >>> While it probably should. >>> >> >> Yes 'dev_configure()' doesn't update the 'dev->data->mtu' and 'max_rx_pkt_len' & >> 'dev->data->mtu' may diverge there. >> >> I think best place to update 'dev->data->mtu' is where the device is actually >> updated, but to prevent the diversion above we can update 'dev->data->mtu' in >> ethdev layer, in 'rte_eth_dev_configure()' based on 'max_rx_pkt_len', will it work? > > I think - yes. > At least, I don't foresee any implications with that. > Thanks. @Steve, I think there is agreement on two patches: 1- Update testpmd to take overhead account instead of setting 'max_rx_pkt_len' to 1518 blindly. 2- In 'rte_eth_dev_configure()' update 'dev->data->mtu' based on 'max_rx_pkt_len', again taking overhead into the account. Would you mind updating the new version as above? Thanks, ferruh >> >> Only concern I see is if user reads the MTU ('rte_eth_dev_get_mtu()') after >> 'rte_eth_dev_configure()' but before device configured, user will get the wrong >> value, I guess that problem was already there but changing default value may >> make it more visible. >> >>>> >>>>>> But this won't solve the problem Steve is trying to solve. >>>>> >>>>> You mean we still need to update test-pmd code to calculate max_rx_pkt_len >>>>> properly for default mtu value? >>>>> >>>> >>>> Yes. >>>> Because target of this set is able to receive packets with payload size >>>> 'RTE_ETHER_MTU', if MTU is updated according to the provided 'max_rx_pkt_len', >>>> device still won't able to receive those packets. >>> >>> Agree. >>> >>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Before the patch will result: >>>>>>>>>> mtu==1500, max_rx_pkt_len=1000; //out of sync looks wrong to me >>>>>>>>>> >>>>>>>>>> After the patch: >>>>>>>>>> mtu=1500, max_rx_ptk_len=1518; // in sync, change in behaviour. >>>>>>>>>> >>>>>>>>>> If you think we need to preserve current behaviour, >>>>>>>>>> then I suppose the easiest thing would be to change dev_config() code >>>>>>>>>> to update mtu value based on max_rx_pkt_len. >>>>>>>>>> I.E: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...} >>>>>>>>>> So the code snippet above will result: >>>>>>>>>> mtu=982,max_rx_pkt_len=1000; >>>>>>>>>> >>>>>>>>> >>>>>>>>> The 'max_rx_ptk_len' is annoyance for a long time, what do you think to just >>>>>>>>> drop it? >>>>>>>>> >>>>>>>>> By default device will be up with default MTU (1500), later >>>>>>>>> 'rte_eth_dev_set_mtu' can be used to set the MTU, no frame size setting at all. >>>>>>>>> >>>>>>>>> Will this work? >>>>>>>> >>>>>>>> I think it might, but that's a big change, probably too risky at that stage... >>>>>>>> >>>>>> >>>>>> Defintely, I was thinking for 21.11. Let me send a deprecation notice and see >>>>>> what happens. >>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> And for short term, for above Intel PMDs, there must be a place this >>>>>>>>> 'max_rx_pkt_len' value taken into account (mostly 'start()' dev_ops), that >>>>>>>>> function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME set, >>>>>>>>> otherwise use the 'MTU' value. >>>>>>>> >>>>>>>> Even if we'll use max_rx_pkt_len only when if JUMBO_FRAME is set, >>>>>>>> I think we still need to keep max_rx_pkt_len and MTU values in sync. >>>>>>>> >>>>>>>>> >>>>>>>>> Without 'start()' updated the current logic won't work after stop & start anyway. >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> And why this same thing can't happen to other PMDs? If this is a problem for >>>>>>>>>>>> all PMDs, we should solve in other level, not for only some PMDs. >>>>>>>>>>>> >>>>>>>>>>> No, all PMDs exist the same issue, another proposal: >>>>>>>>>>> - rte_ethdev provides the unique resize 'max_rx_pkt_len' in rte_eth_dev_configure(); >>>>>>>>>>> - provide the uniform API for fetching the NIC's supported Ether Overhead size; >>>>>>>>>>> Is it feasible? >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Generally, the mtu value can be adjustable from user (e.g.: ip link >>>>>>>>>>>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to >>>>>>>>>>>>>> satisfy mtu requirement. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len) >>>>>>>>>>>>>>> here? >>>>>>>>>>>>>> ice_mtu_set(dev, mtu) will append ether overhead to >>>>>>>>>>>>>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd >>>>>>>>>>>>>> parameter, or not the max_rx_pkt_len. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> And please remove above comment, since ether overhead is already >>>>>>>>>>>>>>>> considered in ice_mtu_set. >>>>>>>>>>>>>> Ether overhead is already considered in ice_mtu_set, but it also >>>>>>>>>>>>>> should be considered as the adjustment condition that if ice_mtu_set >>>>>>>>>>>> need be invoked. >>>>>>>>>>>>>> So, it perhaps should remain this comment before this if() condition. >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> +ret = ice_mtu_set(dev, dev->data->mtu); if (ret != 0) return >>>>>>>>>>>>>>>>>> +ret; } >>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>> ret = ice_init_rss(pf); >>>>>>>>>>>>>>>>>> if (ret) { >>>>>>>>>>>>>>>>>> PMD_DRV_LOG(ERR, "Failed to enable rss for PF"); >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> 2.17.1 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>> >>> >