From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 7108AA0A0A; Sat, 23 Jan 2021 11:54:34 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 25B1D140E46; Sat, 23 Jan 2021 11:54:34 +0100 (CET) Received: from szxga05-in.huawei.com (szxga05-in.huawei.com [45.249.212.191]) by mails.dpdk.org (Postfix) with ESMTP id ECF29140E40 for ; Sat, 23 Jan 2021 11:54:18 +0100 (CET) Received: from DGGEMS412-HUB.china.huawei.com (unknown [172.30.72.60]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4DN9D23JyZzMBvd for ; Sat, 23 Jan 2021 17:04:02 +0800 (CST) Received: from [10.67.103.119] (10.67.103.119) by DGGEMS412-HUB.china.huawei.com (10.3.19.212) with Microsoft SMTP Server id 14.3.498.0; Sat, 23 Jan 2021 17:05:27 +0800 To: References: <20201217092312.27033-1-stevex.yang@intel.com> <20210114094537.13661-1-stevex.yang@intel.com> <7ec898cd-60b0-0d8d-b006-06d6ac3822b8@intel.com> From: oulijun Message-ID: Date: Sat, 23 Jan 2021 17:05:23 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.103.119] X-CFilter-Loop: Reflected Subject: Re: [dpdk-dev] [PATCH v3 01/22] ethdev: fix MTU size exceeds max rx packet length X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" 在 2021/1/19 16:46, oulijun 写道: > > > 在 2021/1/18 18:42, Ferruh Yigit 写道: >> On 1/15/2021 10:44 AM, oulijun wrote: >>> Hi Steve >>> This is a very good job! But I have some question and suggestions. >>> Please check it. >>> >>> 在 2021/1/14 17:45, Steve Yang 写道: >>>> Ethdev is using default Ethernet overhead to decide if provided >>>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value, >>>> and limits it to MAX if it is. >>>> >>>> Since the application/driver used Ethernet overhead is different than >>>> the ethdev one, check result is wrong. >>>> >>>> If the driver is using Ethernet overhead bigger than the default one, >>>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when >>>> correct Ethernet overhead is used to convert back, the resulting MTU is >>>> less than the intended one, causing some packets to be dropped. >>>> >>>> Like, >>>> app -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522 >>>> ethdev -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518 >>>> driver -> MTU = 1518 - 22 = 1496 >>>> Packets with size 1497-1500 are dropped although intention is to be >>>> able >>>> to send/receive them. >>>> >>>> The fix is to make ethdev use the correct Ethernet overhead for port, >>>> instead of default one. >>>> >>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors") >>>> >>>> Signed-off-by: Steve Yang >>>> --- >>>> lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++--- >>>> 1 file changed, 24 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/lib/librte_ethdev/rte_ethdev.c >>>> b/lib/librte_ethdev/rte_ethdev.c >>>> index 17ddacc78d..19ca4c4512 100644 >>>> --- a/lib/librte_ethdev/rte_ethdev.c >>>> +++ b/lib/librte_ethdev/rte_ethdev.c >>>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, >>>> uint16_t nb_rx_q, uint16_t nb_tx_q, >>>> struct rte_eth_dev *dev; >>>> struct rte_eth_dev_info dev_info; >>>> struct rte_eth_conf orig_conf; >>>> + uint16_t overhead_len; >>>> int diag; >>>> int ret; >>>> + uint16_t old_mtu; >>>> RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); >>>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, >>>> uint16_t nb_rx_q, uint16_t nb_tx_q, >>>> memcpy(&dev->data->dev_conf, dev_conf, >>>> sizeof(dev->data->dev_conf)); >>>> + /* Backup mtu for rollback */ >>>> + old_mtu = dev->data->mtu; >>>> + >>>> ret = rte_eth_dev_info_get(port_id, &dev_info); >>>> if (ret != 0) >>>> goto rollback; >>>> + /* Get the real Ethernet overhead length */ >>>> + if (dev_info.max_mtu != UINT16_MAX && >>>> + dev_info.max_rx_pktlen > dev_info.max_mtu) >>>> + overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu; >>>> + else >>>> + overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN; >>>> + >>> The Ethernet frame header length supported by each NIC may be >>> different, which cause the difference of allowed packets size and >>> drop you say. >>> The diference of Ethernet frame header length have an impact for the >>> maximum Ethernet frame length, which is a boundary value >>> of enabling jumbo frame through the ' max_rx_pkt_len '. >>> However, we need to do the same thing you do above to get the >>> overhead_len every time, which will >>> cause a lot of duplicate code in the framework and app. For examples, >>> parsing and processing for '--max-pkt-len=xxx' parameter, >>> and " cmd_config_max_pkt_len_parsed " in testpmd, and here >>> modifying here dev_configure API. >>> It's a little redundant and troublesome. >>> >>> Maybe, it is necessary for driver to directly report the supported >>> maximum Ethernet frame length by rte_dev_info_get API. >>> As following: >>> struct rte_eth_dev_info { >>> xxxx >>> /** >>> * The maximum Ethernet frame length supported by each >>> * driver varies with the Ethernet header length. >>> */ >>> uint16_t eth_max_len; >>> xxxx >>> } >>> >>> int >>> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info >>> *dev_info) >>> { >>> xxx >>> dev_info->min_mtu = RTE_ETHER_MIN_MTU; >>> dev_info->max_mtu = UINT16_MAX; >>> dev_info->eth_max_len = RTE_ETHER_MAX_LEN; >>> xxx >>> } >>> >>> And then: >>> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct >>> rte_eth_dev_info *info) >>> { >>> xxx >>> info->eth_max_len = xxx_ETHER_MAX _LEN; >>> xxx >>> } >>> What do you think? >> >> Hi Lijun, >> >> The 'max_mtu' has been added in the past to solve exact same problem, >> perhaps it would be better to add something like 'eth_max_len' at that >> time. >> >> But as Steve mentioned we are planning to switch to more MTU based >> configuration, which should remove the need of the field you mentioned. >> >>>> /* If number of queues specified by application for both Rx >>>> and Tx is >>>> * zero, use driver preferred values. This cannot be done >>>> individually >>>> * as it is valid for either Tx or Rx (but not both) to be zero. >>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, >>>> uint16_t nb_rx_q, uint16_t nb_tx_q, >>>> goto rollback; >>>> } >>>> } else { >>>> - if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN || >>>> - dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN) >>>> + uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len; >>>> + if (pktlen < RTE_ETHER_MIN_MTU + overhead_len || >>>> + pktlen > RTE_ETHER_MTU + overhead_len) >>>> /* Use default value */ >>>> dev->data->dev_conf.rxmode.max_rx_pkt_len = >>>> - RTE_ETHER_MAX_LEN; >>>> + RTE_ETHER_MTU + overhead_len; >>>> + } >>>> + >>>> + /* Scale the MTU size to adapt max_rx_pkt_len */ >>>> + if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) { >>>> + dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len - >>>> + overhead_len; >>>> } >>> Now that we update mtu here when jumbo frame enabled. It is necessary >>> to check validity of max_rx_pkt_len. >>> As following: >>> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) { >>> if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen || >>> dev_conf->rxmode.max_rx_pkt_len <= RTE_ETHER_MTU + >>> overhead_len) { >> >> It is not clear from API that if "max_rx_pkt_len <= RTE_ETHER_MTU + >> overhead_len" is a valid value or not, the API says: >> "max_rx_pkt_len; /**< Only used if JUMBO_FRAME enabled. */" >> >> But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, >> 'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU' >> > The MTU setting entry does need to be adjusted. I agree! But I think > this place should be modified, and it's probably more reasonable. > > Generally speaking, max_rx_pkt_len will be also updated with 'mtu + > overhead_len' also when application sets mtu > 1500, which means > JUMBO_FRAME is enabled. > Namely, JUMBO_FRAME is enabled based on mtu > 1500 or max_rx_pkt_len > > 1500 + overhead_len. > Besides, having following check for "rxmode.offloads & > DEV_RX_OFFLOAD_JUMBO_FRAME == 0": > @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t > nb_rx_q, uint16_t nb_tx_q, > goto rollback; > } > } else { > uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len; > if (pktlen < RTE_ETHER_MIN_MTU + overhead_len || > pktlen > RTE_ETHER_MTU + overhead_len) > /* Use default value */ > dev->data->dev_conf.rxmode.max_rx_pkt_len = > RTE_ETHER_MTU + overhead_len; > } > One of check condition for the branch is "pktlen > RTE_ETHER_MTU + > overhead_len ". > > In view of the above two aspects, when application set > DEV_RX_OFFLOAD_JUMBO_FRAME in offload, the max_rx_pkt_len must be > greater than " RTE_ETHER_MTU + overhead_len " > > Since DEV_RX_OFFLOAD_JUMBO_FRAME is not removed, so the check is necessary. > >>> xxxx >>> } >>> } else { >>> xxx >>> } >>> >>> If it does not thing above, which will cause an unreasonable case. >>> Like, >>> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1 >>> dev_conf->rxmode.max_rx_pkt_len = 1500 >>> overhead_len = 26 >>> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = >>> 1500 - 26 = 1474 >>> >> >> This looks correct, if the applicatin is requesting 'max_rx_pkt_len = >> 1500', the requeted MTU is 1474, >> if application wants an MTU as 1500, it should set the >> 'max_rx_pkt_len' as "1500 + overhead". >> >>> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when >>> mtu > 1500. Now that there are two interfaces for modifying MTU, namely, 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()'. They modify MTU based on 'max_rx_pkt_len' and 'mtu' value, respectively. The 'max_rx_pkt_len' field has been used in xxx_dev_configure ops in many PMD drivers. In addition, if 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, the value of 'max_rx_pkt_len' is also verified by using max_rx_pktlen and RTE_ETHER_MTU + overhead_len in i40e and ice driver. Currently, DEV_RX_OFFLOAD_JUMBO_FRAME has not been removed, so it is necessary and reasonable to check 'max_rx_pkt_len' in 'rte_eth_dev_configure()'. Please check it again. Looking forward to your reply. >> >> Overall the 'max_rx_pkt_len' setting is a little messy, it has been >> tried to fix in last release but we reverted them because of >> unexpected side affects. >> >> Current long term plan is, >> - Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to >> work on MTU values. (instead one working on frame lenght and other on >> MTU) >> - Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame >> request by requested MTU value check >> - App decide the jumbo frame capability from the 'max_mtu' check >> - App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen' >> >>>> /* >>>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, >>>> uint16_t nb_rx_q, uint16_t nb_tx_q, >>>> eth_dev_tx_queue_config(dev, 0); >>>> rollback: >>>> memcpy(&dev->data->dev_conf, &orig_conf, >>>> sizeof(dev->data->dev_conf)); >>>> + if (old_mtu != dev->data->mtu) >>>> + dev->data->mtu = old_mtu; >>>> rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, >>>> dev_conf, ret); >>>> return ret; >>>> >> >> . >> > . >