To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
 "Yang, SteveX" <stevex.yang@intel.com>, "Zhang, Qi Z"
 <qi.z.zhang@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "Zhao1, Wei" <wei.zhao1@intel.com>, "Guo, Jia" <jia.guo@intel.com>,
 "Yang, Qiming" <qiming.yang@intel.com>, "Wu, Jingjing"
 <jingjing.wu@intel.com>, "Xing, Beilei" <beilei.xing@intel.com>,
 "Stokes, Ian" <ian.stokes@intel.com>
References: <20200923040909.73418-1-stevex.yang@intel.com>
 <20200928065541.7520-4-stevex.yang@intel.com>
 <8459e979b76c43cdbd5a9fbd809f9b00@intel.com>
 <BYAPR11MB330182D3293C02EE38EBF53C9A320@BYAPR11MB3301.namprd11.prod.outlook.com>
 <6ad9e3ec00194e31891d97849135655c@intel.com>
 <DM6PR11MB4362515283D00E27A793E6B0F9330@DM6PR11MB4362.namprd11.prod.outlook.com>
 <7704b7ce95fd4db2a9c6a8a33c3f0805@intel.com>
 <77ac2293-e532-e702-2370-c07cdd957c57@intel.com>
 <DM6PR11MB43628BBF9DCE7CC4D7C05AD8F91E0@DM6PR11MB4362.namprd11.prod.outlook.com>
 <BYAPR11MB330105FA146CB24BA6791CC39A1E0@BYAPR11MB3301.namprd11.prod.outlook.com>
 <483bd509-82b9-9724-d28c-c517ef091e0c@intel.com>
 <BYAPR11MB3301983CE1B7968BAE8D30279A1E0@BYAPR11MB3301.namprd11.prod.outlook.com>
 <BYAPR11MB33017C521B1CA651C27AC3CB9A1E0@BYAPR11MB3301.namprd11.prod.outlook.com>
 <e4aa7658-2ccd-39a8-a93c-e867a68470a4@intel.com>
 <BYAPR11MB33014CE4FE31BE166D3396EE9A1F0@BYAPR11MB3301.namprd11.prod.outlook.com>
From: Ferruh Yigit <ferruh.yigit@intel.com>
Message-ID: <d496fe2c-c873-3d43-a2af-f56260578ed9@intel.com>
Date: Tue, 20 Oct 2020 13:29:56 +0100
MIME-Version: 1.0
In-Reply-To: <BYAPR11MB33014CE4FE31BE166D3396EE9A1F0@BYAPR11MB3301.namprd11.prod.outlook.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Subject: Re: [dpdk-dev] [PATCH v4 3/5] net/ice: fix max mtu size packets
 with vlan tag cannot be received by default

On 10/20/2020 10:07 AM, Ananyev, Konstantin wrote:
> 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> testpmd initializes the default max packet length to 1518,
>>>>>>>>>>>>>> which doesn't include the VLAN tag size in the Ethernet
>>>>>>>>>>>>>> overhead. When a max-MTU-length packet carrying a VLAN tag is
>>>>>>>>>>>>>> sent, the packet length exceeds 1518 and the packet is dropped
>>>>>>>>>>>>>> directly on the NIC HW side.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ice can support dual VLAN tags, which need 8 more bytes of max
>>>>>>>>>>>>>> packet size; so, configure the correct max packet size in the
>>>>>>>>>>>>>> dev_configure ops.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>      drivers/net/ice/ice_ethdev.c | 11 +++++++++++
>>>>>>>>>>>>>>      1 file changed, 11 insertions(+)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>>>> b/drivers/net/ice/ice_ethdev.c index
>>>>>>>>>>>>>> cfd357b05..6b7098444 100644
>>>>>>>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>>>>>>>>>>  	struct ice_adapter *ad =
>>>>>>>>>>>>>>  		ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
>>>>>>>>>>>>>>  	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
>>>>>>>>>>>>>> +	uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
>>>>>>>>>>>>>>  	int ret;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>  	/* Initialize to TRUE. If any of Rx queues doesn't meet the
>>>>>>>>>>>>>> @@ -3157,6 +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>>>>>>>>>>  	if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
>>>>>>>>>>>>>>  		dev->data->dev_conf.rxmode.offloads |= DEV_RX_OFFLOAD_RSS_HASH;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * Considering QinQ packet, max frame size should be equal or
>>>>>>>>>>>>>> + * larger than total size of MTU and Ether overhead.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why we need this check?
>>>>>>>>>>>>> Can we just call ice_mtu_set directly
>>>>>>>>>>>>
>>>>>>>>>>>> I think that without that check we could silently overwrite the
>>>>>>>>>>>> dev_conf.rxmode.max_rx_pkt_len value provided by the user.
>>>>>>>>>>>
>>>>>>>>>>> OK, I see
>>>>>>>>>>>
>>>>>>>>>>> But I still have one question:
>>>>>>>>>>> dev->data->mtu is initialized to 1518 by default, but if the
>>>>>>>>>>> application sets dev_conf.rxmode.max_rx_pkt_len = 1000 in
>>>>>>>>>>> dev_configure, does that mean we will still set mtu to 1518?
>>>>>>>>>>> Is this expected?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> max_rx_pkt_len should be at least larger than the mtu, so we should
>>>>>>>>>> raise max_rx_pkt_len (e.g.: 1518) to hold the expected mtu value
>>>>>>>>>> (e.g.: 1500).
>>>>>>>>>
>>>>>>>>> OK, this describes the problem more generally; it would be better to
>>>>>>>>> replace the existing code comment and commit log with it for easier
>>>>>>>>> understanding.
>>>>>>>>> Please send a new version with the reworded text.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I didn't really get this set.
>>>>>>>>
>>>>>>>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
>>>>>>>> this size is dropped.
>>>>>>>
>>>>>>> Sure, it is the normal case for dropping oversized data.
>>>>>>>
>>>>>>>> Isn't this what should be, why we are trying to overwrite user configuration
>>>>>>>> in PMD to prevent this?
>>>>>>>>
>>>>>>>
>>>>>>> But there is a conflict when the application/user sets mtu and
>>>>>>> max_rx_pkt_len at the same time; this fix makes a decision when that
>>>>>>> conflict occurs.
>>>>>>> The MTU value comes directly from a user operation (e.g.: port config
>>>>>>> mtu 0 1500), so max_rx_pkt_len will resize itself to fit the expected
>>>>>>> MTU value if it is smaller than MTU + Ether overhead.
>>>>>>>
>>>>>>>> During eth_dev allocation, mtu set to default '1500', by ethdev layer.
>>>>>>>> And testpmd sets 'max_rx_pkt_len' by default to '1518'.
>>>>>>>> I think Qi's concern above is valid: what if the user sets
>>>>>>>> 'max_rx_pkt_len' to '1000' and means it? The PMD will not honor the
>>>>>>>> user config.
>>>>>>>
>>>>>>> I'm not sure what behavior is expected when 'mtu' is set to '1500' and
>>>>>>> 'max_rx_pkt_len' to '1000'.
>>>>>>> If we keep the 'max_rx_pkt_len' value, that means the larger 'mtu'
>>>>>>> will be invalid.
>>>>>>>
>>>>>>>>
>>>>>>>> Why not simply increase the default 'max_rx_pkt_len' in testpmd?
>>>>>>>>
>>>>>>> The default 'max_rx_pkt_len' is initialized to a generic value (1518)
>>>>>>> and the default 'mtu' is '1500' in testpmd, but that isn't suitable
>>>>>>> for NIC drivers whose Ether overhead is larger than 18 (e.g.: ice,
>>>>>>> i40e) if the 'mtu' value is to be honored.
>>>>>>>
>>>>>>>> And I guess even better what we need is to tell to the application what the
>>>>>>>> frame overhead PMD accepts.
>>>>>>>> So the application can set proper 'max_rx_pkt_len' value per port for a
>>>>>>>> given/requested MTU value.
>>>>>>>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD
>>>>>>>> overhead macros and 'max_mtu'/'min_mtu' added because of that, perhaps
>>>>>>>> he has a solution now?
>>>>>>
>>>>>> From my perspective the main problem here is that we have 2 different
>>>>>> variables for nearly the same thing:
>>>>>> rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len,
>>>>>> and 2 different APIs to update them: dev_mtu_set() and dev_configure().
>>>>>
>>>>> According to the API, 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME
>>>>> enabled', although I am not sure that is practically what is done by
>>>>> all drivers.
>>>>
>>>> I think most of Intel PMDs use it unconditionally.
>>>>
>>>>>
>>>>>> And inside majority of Intel PMDs we don't keep these 2 variables in sync:
>>>>>> - mtu_set() will update both variables.
>>>>>> - dev_configure() will update only max_rx_pkt_len, but will keep mtu intact.
>>>>>>
>>>>>> This patch fixes this inconsistency, which I think is a good thing.
>>>>>> Though yes, it introduces change in behaviour.
>>>>>>
>>>>>> Let say the code:
>>>>>> rte_eth_dev_set_mtu(port, 1500);
>>>>>> dev_conf.max_rx_pkt_len = 1000;
>>>>>> rte_eth_dev_configure(port, 1, 1, &dev_conf);
>>>>>>
>>>>>
>>>>> 'rte_eth_dev_configure()' is one of the first APIs called; it is called
>>>>> before 'rte_eth_dev_set_mtu()'.
>>>>
>>>> Usually yes.
>>>> But you can still do it sometime later: dev_mtu_set(); ...; dev_stop(); dev_configure(); dev_start();
>>>>
>>>>>
>>>>> When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by default by
>>>>> ethdev layer, so it is not user configuration, but 'max_rx_pkt_len' is.
>>>>
>>>> See above.
>>>> PMD doesn't know where this MTU value came from (default ethdev value or user specified value)
>>>> and probably it shouldn't care.
>>>>
>>>>>
>>>>> And later, when 'rte_eth_dev_set_mtu()' is called, both MTU and
>>>>> 'max_rx_pkt_len' are updated (mostly).
>>>>
>>>> Yes, in mtu_set() we update both.
>>>> But we don't update MTU in dev_configure(), only max_rx_pkt_len.
>>>> That what this patch tries to fix (as I understand it).
>>>
>>> To be more precise - it doesn't change the MTU value in dev_configure(),
>>> but instead doesn't allow max_rx_pkt_len to become smaller
>>> than MTU + OVERHEAD.
>>> Probably changing the MTU value instead is a better choice.
>>>
>>
>> +1 to change mtu for this case.
>> And this is what happens in practice when there is no 'rte_eth_dev_set_mtu()'
>> call, since the PMD uses ('max_rx_pkt_len' - OVERHEAD) to set the MTU.
> 
> Hmm, I don't see that happens within Intel PMDs.
> As I can read the code: if user never call mtu_set(), then MTU value is left intact.
> 

I was checking ice,
in 'ice_dev_start()', 'rxmode.max_rx_pkt_len' is used to configure the device.

>> But this won't solve the problem Steve is trying to solve.
> 
> You mean we still need to update test-pmd code to calculate max_rx_pkt_len
> properly for default mtu value?
> 

Yes.
Because the target of this set is to be able to receive packets with payload 
size 'RTE_ETHER_MTU'; if the MTU is updated according to the provided 
'max_rx_pkt_len', the device still won't be able to receive those packets.

>>>>>
>>>>>
>>>>>> Before the patch will result:
>>>>>> mtu==1500, max_rx_pkt_len=1000;  //out of sync looks wrong to me
>>>>>>
>>>>>> After the patch:
>>>>>> mtu=1500, max_rx_pkt_len=1518; // in sync, change in behaviour.
>>>>>>
>>>>>> If you think we need to preserve current behaviour,
>>>>>> then I suppose the easiest thing would be to change dev_config() code
>>>>>> to update mtu value based on max_rx_pkt_len.
>>>>>> I.E: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...}
>>>>>> So the code snippet above will result:
>>>>>> mtu=982,max_rx_pkt_len=1000;
>>>>>>
>>>>>
>>>>> The 'max_rx_pkt_len' has been an annoyance for a long time; what do you
>>>>> think about just dropping it?
>>>>>
>>>>> By default device will be up with default MTU (1500), later
>>>>> 'rte_eth_dev_set_mtu' can be used to set the MTU, no frame size setting at all.
>>>>>
>>>>> Will this work?
>>>>
>>>> I think it might, but that's a big change, probably too risky at that stage...
>>>>
>>
>> Definitely, I was thinking of 21.11. Let me send a deprecation notice and
>> see what happens.
>>
>>>>
>>>>>
>>>>>
>>>>> And for the short term, for the above Intel PMDs, there must be a place
>>>>> where this 'max_rx_pkt_len' value is taken into account (mostly the
>>>>> 'start()' dev_ops); that function can be updated to use 'max_rx_pkt_len'
>>>>> only if JUMBO_FRAME is set, and otherwise use the 'MTU' value.
>>>>
>>>> Even if we'll use max_rx_pkt_len only when if JUMBO_FRAME is set,
>>>> I think we still need to keep max_rx_pkt_len and MTU values in sync.
>>>>
>>>>>
>>>>> Without 'start()' updated, the current logic won't work after stop &
>>>>> start anyway.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> And why can't this same thing happen to other PMDs? If this is a
>>>>>>>> problem for all PMDs, we should solve it at another level, not only
>>>>>>>> for some PMDs.
>>>>>>>>
>>>>>>> No, all PMDs have the same issue. Another proposal:
>>>>>>>     - rte_ethdev alone resizes 'max_rx_pkt_len' in rte_eth_dev_configure();
>>>>>>>     - provide a uniform API for fetching the NIC's supported Ether overhead size;
>>>>>>> Is it feasible?
>>>>>>>
>>>>>>>>>
>>>>>>>>>> Generally, the mtu value is adjustable by the user (e.g.: ip link
>>>>>>>>>> set ens801f0 mtu 1400); hence, we just adjust max_rx_pkt_len to
>>>>>>>>>> satisfy the mtu requirement.
>>>>>>>>>>
>>>>>>>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
>>>>>>>>>>> here?
>>>>>>>>>> ice_mtu_set(dev, mtu) will append the Ether overhead to
>>>>>>>>>> frame_size/max_rx_pkt_len, so we need to pass the mtu value as the
>>>>>>>>>> 2nd parameter, not the max_rx_pkt_len.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> And please remove above comment, since ether overhead is already
>>>>>>>>>>>> considered in ice_mtu_set.
>>>>>>>>>> Ether overhead is already considered in ice_mtu_set, but it also
>>>>>>>>>> should be considered in the condition that decides whether
>>>>>>>>>> ice_mtu_set needs to be invoked.
>>>>>>>>>> So this comment should perhaps remain before the if() condition.
>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +	ret = ice_mtu_set(dev, dev->data->mtu);
>>>>>>>>>>>>>> +	if (ret != 0)
>>>>>>>>>>>>>> +		return ret;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>      ret = ice_init_rss(pf);
>>>>>>>>>>>>>>      if (ret) {
>>>>>>>>>>>>>>      PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> 2.17.1
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>
>