From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 7108AA0A0A;
	Sat, 23 Jan 2021 11:54:34 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 25B1D140E46;
	Sat, 23 Jan 2021 11:54:34 +0100 (CET)
Received: from szxga05-in.huawei.com (szxga05-in.huawei.com [45.249.212.191])
 by mails.dpdk.org (Postfix) with ESMTP id ECF29140E40
 for <dev@dpdk.org>; Sat, 23 Jan 2021 11:54:18 +0100 (CET)
Received: from DGGEMS412-HUB.china.huawei.com (unknown [172.30.72.60])
 by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4DN9D23JyZzMBvd
 for <dev@dpdk.org>; Sat, 23 Jan 2021 17:04:02 +0800 (CST)
Received: from [10.67.103.119] (10.67.103.119) by
 DGGEMS412-HUB.china.huawei.com (10.3.19.212) with Microsoft SMTP Server id
 14.3.498.0; Sat, 23 Jan 2021 17:05:27 +0800
To: <dev@dpdk.org>
References: <20201217092312.27033-1-stevex.yang@intel.com>
 <20210114094537.13661-1-stevex.yang@intel.com>
 <c79413fe-06ab-a2eb-96ec-d55d968fd2b2@huawei.com>
 <7ec898cd-60b0-0d8d-b006-06d6ac3822b8@intel.com>
 <c2f61061-14b2-c621-af83-31f956a654ca@huawei.com>
From: oulijun <oulijun@huawei.com>
Message-ID: <a56ea062-b272-2433-3b7e-7f93eb0987ae@huawei.com>
Date: Sat, 23 Jan 2021 17:05:23 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.1.0
MIME-Version: 1.0
In-Reply-To: <c2f61061-14b2-c621-af83-31f956a654ca@huawei.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.67.103.119]
X-CFilter-Loop: Reflected
Subject: Re: [dpdk-dev] [PATCH v3 01/22] ethdev: fix MTU size exceeds max rx
 packet length
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>



On 2021/1/19 16:46, oulijun wrote:
> 
> 
>> On 2021/1/18 18:42, Ferruh Yigit wrote:
>> On 1/15/2021 10:44 AM, oulijun wrote:
>>> Hi Steve,
>>> This is a very good job! But I have some questions and suggestions.
>>> Please check them.
>>>
>>> On 2021/1/14 17:45, Steve Yang wrote:
>>>> Ethdev is using the default Ethernet overhead to decide if the
>>>> provided 'max_rx_pkt_len' value is bigger than the max (non-jumbo)
>>>> MTU value, and limits it to the max if it is.
>>>>
>>>> Since the Ethernet overhead used by the application/driver is
>>>> different from the ethdev one, the check result is wrong.
>>>>
>>>> If the driver is using an Ethernet overhead bigger than the default
>>>> one, the provided 'max_rx_pkt_len' is trimmed down, and when the
>>>> correct Ethernet overhead is used in the driver to convert back, the
>>>> resulting MTU is less than the intended one, causing some packets to
>>>> be dropped.
>>>>
>>>> Like,
>>>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>>>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>>>> driver  -> MTU = 1518 - 22 = 1496
>>>> Packets with size 1497-1500 are dropped although the intention is to
>>>> be able to send/receive them.
>>>>
>>>> The fix is to make ethdev use the correct Ethernet overhead for port,
>>>> instead of default one.
>>>>
>>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>>>
>>>> Signed-off-by: Steve Yang <stevex.yang@intel.com>
>>>> ---
>>>>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>>>>   1 file changed, 24 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/librte_ethdev/rte_ethdev.c 
>>>> b/lib/librte_ethdev/rte_ethdev.c
>>>> index 17ddacc78d..19ca4c4512 100644
>>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, 
>>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>>       struct rte_eth_dev *dev;
>>>>       struct rte_eth_dev_info dev_info;
>>>>       struct rte_eth_conf orig_conf;
>>>> +    uint16_t overhead_len;
>>>>       int diag;
>>>>       int ret;
>>>> +    uint16_t old_mtu;
>>>>       RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, 
>>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>>           memcpy(&dev->data->dev_conf, dev_conf,
>>>>                  sizeof(dev->data->dev_conf));
>>>> +    /* Backup mtu for rollback */
>>>> +    old_mtu = dev->data->mtu;
>>>> +
>>>>       ret = rte_eth_dev_info_get(port_id, &dev_info);
>>>>       if (ret != 0)
>>>>           goto rollback;
>>>> +    /* Get the real Ethernet overhead length */
>>>> +    if (dev_info.max_mtu != UINT16_MAX &&
>>>> +        dev_info.max_rx_pktlen > dev_info.max_mtu)
>>>> +        overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>>>> +    else
>>>> +        overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>>>> +
>>> The Ethernet frame header length supported by each NIC may be
>>> different, which causes the difference in allowed packet size and the
>>> drops you describe.
>>> The difference in Ethernet frame header length affects the maximum
>>> Ethernet frame length, which is the boundary value
>>> for enabling jumbo frames through 'max_rx_pkt_len'.
>>> However, we need to repeat the derivation you do above to get the
>>> overhead_len every time, which will
>>> cause a lot of duplicated code in the framework and apps. For example,
>>> the parsing and processing of the '--max-pkt-len=xxx' parameter,
>>> and "cmd_config_max_pkt_len_parsed" in testpmd, and here in the
>>> dev_configure API.
>>> It's a little redundant and troublesome.
>>>
>>> Maybe it is necessary for the driver to directly report the supported
>>> maximum Ethernet frame length via the rte_eth_dev_info_get() API.
>>> As following:
>>> struct rte_eth_dev_info {
>>>      xxxx
>>>      /**
>>>       * The maximum Ethernet frame length supported by each
>>>       * driver varies with the Ethernet header length.
>>>       */
>>>      uint16_t eth_max_len;
>>>      xxxx
>>> }
>>>
>>> int
>>> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info 
>>> *dev_info)
>>> {
>>>      xxx
>>>      dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>>>      dev_info->max_mtu = UINT16_MAX;
>>>      dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
>>>      xxx
>>> }
>>>
>>> And then:
>>> xxx_dev_infos_get(struct rte_eth_dev *eth_dev, struct
>>> rte_eth_dev_info *info)
>>> {
>>>      xxx
>>>      info->eth_max_len = XXX_ETHER_MAX_LEN;
>>>      xxx
>>> }
>>> What do you think?
>>
>> Hi Lijun,
>>
>> The 'max_mtu' field was added in the past to solve this exact same
>> problem; perhaps it would have been better to add something like
>> 'eth_max_len' at that time.
>>
>> But as Steve mentioned, we are planning to switch to a more MTU-based
>> configuration, which should remove the need for the field you mentioned.
>>
>>>>       /* If number of queues specified by application for both Rx 
>>>> and Tx is
>>>>        * zero, use driver preferred values. This cannot be done 
>>>> individually
>>>>        * as it is valid for either Tx or Rx (but not both) to be zero.
>>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, 
>>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>>               goto rollback;
>>>>           }
>>>>       } else {
>>>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>>>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>>>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>>>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>>>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>>>               /* Use default value */
>>>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>>>> -                            RTE_ETHER_MAX_LEN;
>>>> +                        RTE_ETHER_MTU + overhead_len;
>>>> +    }
>>>> +
>>>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>>>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>>>> +                overhead_len;
>>>>       }
>>> Now that we update the MTU here when jumbo frames are enabled, it is
>>> necessary to check the validity of 'max_rx_pkt_len'.
>>> As follows:
>>> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>>      if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
>>>          dev_conf->rxmode.max_rx_pkt_len <=  RTE_ETHER_MTU + 
>>> overhead_len) {
>>
>> It is not clear from the API whether "max_rx_pkt_len <= RTE_ETHER_MTU +
>> overhead_len" is a valid value or not; the API only says:
>> "max_rx_pkt_len;  /**< Only used if JUMBO_FRAME enabled. */"
>>
>> But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set,
>> 'max_rx_pkt_len' can't be less than 'RTE_ETHER_MTU'
>>
> I agree that the MTU setting path does need to be adjusted! But I think
> this place should be modified as well, and it's probably more reasonable.
> 
> Generally speaking, max_rx_pkt_len will also be updated to 'mtu +
> overhead_len' when the application sets mtu > 1500, which means
> JUMBO_FRAME is enabled.
> Namely, JUMBO_FRAME is enabled when mtu > 1500 or max_rx_pkt_len >
> 1500 + overhead_len.
> Besides, there is the following check for "(rxmode.offloads &
> DEV_RX_OFFLOAD_JUMBO_FRAME) == 0":
> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
> nb_rx_q, uint16_t nb_tx_q,
>                goto rollback;
>            }
>        } else {
>          uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>          if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>              pktlen > RTE_ETHER_MTU + overhead_len)
>                /* Use default value */
>                dev->data->dev_conf.rxmode.max_rx_pkt_len =
>                          RTE_ETHER_MTU + overhead_len;
> }
> One of the check conditions for that branch is "pktlen > RTE_ETHER_MTU +
> overhead_len".
> 
> In view of the above two aspects, when the application sets
> DEV_RX_OFFLOAD_JUMBO_FRAME in offloads, max_rx_pkt_len must be greater
> than "RTE_ETHER_MTU + overhead_len".
> 
> Since DEV_RX_OFFLOAD_JUMBO_FRAME has not been removed, the check is
> necessary.
> 
>>>              xxxx
>>>          }
>>> } else {
>>>          xxx
>>> }
>>>
>>> If it does not check the above, an unreasonable case is allowed.
>>> Like:
>>> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
>>> dev_conf->rxmode.max_rx_pkt_len = 1500
>>> overhead_len = 26
>>> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 
>>> 1500 - 26 = 1474
>>>
>>
>> This looks correct: if the application is requesting 'max_rx_pkt_len =
>> 1500', the requested MTU is 1474;
>> if the application wants an MTU of 1500, it should set
>> 'max_rx_pkt_len' to "1500 + overhead".
>>
>>> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set in rxmode.offloads when
>>> mtu > 1500.
There are now two interfaces for modifying the MTU, namely
'rte_eth_dev_configure()' and 'rte_eth_dev_set_mtu()'.
They modify the MTU based on the 'max_rx_pkt_len' and 'mtu' values,
respectively. The 'max_rx_pkt_len' field is already used in the
xxx_dev_configure ops of many PMD drivers. In addition, if
'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, the value of 'max_rx_pkt_len' is
also verified against 'max_rx_pktlen' and 'RTE_ETHER_MTU + overhead_len'
in the i40e and ice drivers.

Currently, DEV_RX_OFFLOAD_JUMBO_FRAME has not been removed, so it is
necessary and reasonable to check 'max_rx_pkt_len' in
'rte_eth_dev_configure()'.

Please check it again. Looking forward to your reply.
>>
>> Overall, the 'max_rx_pkt_len' handling is a little messy; we tried to
>> fix it in the last release but reverted the changes because of
>> unexpected side effects.
>>
>> The current long-term plan is:
>> - Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to
>> work on MTU values (instead of one working on frame length and the
>> other on MTU)
>> - Remove 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide the jumbo frame
>> request by checking the requested MTU value
>> - App decides the jumbo frame capability from the 'max_mtu' check
>> - App calculates the Ethernet overhead using 'max_mtu' and 'max_rx_pktlen'
>>
>>>>       /*
>>>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, 
>>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>>       eth_dev_tx_queue_config(dev, 0);
>>>>   rollback:
>>>>       memcpy(&dev->data->dev_conf, &orig_conf, 
>>>> sizeof(dev->data->dev_conf));
>>>> +    if (old_mtu != dev->data->mtu)
>>>> +        dev->data->mtu = old_mtu;
>>>>       rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, 
>>>> dev_conf, ret);
>>>>       return ret;
>>>>
>>