From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2AB39A0A03; Tue, 19 Jan 2021 09:47:04 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id DA8A0140D4D; Tue, 19 Jan 2021 09:47:03 +0100 (CET) Received: from szxga05-in.huawei.com (szxga05-in.huawei.com [45.249.212.191]) by mails.dpdk.org (Postfix) with ESMTP id 8FC61140D46 for ; Tue, 19 Jan 2021 09:47:00 +0100 (CET) Received: from DGGEMS413-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4DKj0Z3ld9zMMKZ; Tue, 19 Jan 2021 16:45:34 +0800 (CST) Received: from [10.67.103.119] (10.67.103.119) by DGGEMS413-HUB.china.huawei.com (10.3.19.213) with Microsoft SMTP Server id 14.3.498.0; Tue, 19 Jan 2021 16:46:56 +0800 To: Ferruh Yigit , Steve Yang , CC: , References: <20201217092312.27033-1-stevex.yang@intel.com> <20210114094537.13661-1-stevex.yang@intel.com> <7ec898cd-60b0-0d8d-b006-06d6ac3822b8@intel.com> From: oulijun Message-ID: Date: Tue, 19 Jan 2021 16:46:56 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0 MIME-Version: 1.0 In-Reply-To: <7ec898cd-60b0-0d8d-b006-06d6ac3822b8@intel.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.103.119] X-CFilter-Loop: Reflected Subject: Re: [dpdk-dev] [PATCH v3 01/22] ethdev: fix MTU size exceeds max rx packet length X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" 在 2021/1/18 18:42, Ferruh Yigit 写道: > On 1/15/2021 10:44 AM, oulijun wrote: >> Hi Steve >> This is a very good job! But I have some question and suggestions. >> Please check it. >> >> 在 2021/1/14 17:45, Steve Yang 写道: >>> Ethdev is using default Ethernet overhead to decide if provided >>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value, >>> and limits it to MAX if it is. >>> >>> Since the application/driver used Ethernet overhead is different than >>> the ethdev one, check result is wrong. >>> >>> If the driver is using Ethernet overhead bigger than the default one, >>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when >>> correct Ethernet overhead is used to convert back, the resulting MTU is >>> less than the intended one, causing some packets to be dropped. >>> >>> Like, >>> app -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522 >>> ethdev -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518 >>> driver -> MTU = 1518 - 22 = 1496 >>> Packets with size 1497-1500 are dropped although intention is to be able >>> to send/receive them. >>> >>> The fix is to make ethdev use the correct Ethernet overhead for port, >>> instead of default one. >>> >>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors") >>> >>> Signed-off-by: Steve Yang >>> --- >>> lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++--- >>> 1 file changed, 24 insertions(+), 3 deletions(-) >>> >>> diff --git a/lib/librte_ethdev/rte_ethdev.c >>> b/lib/librte_ethdev/rte_ethdev.c >>> index 17ddacc78d..19ca4c4512 100644 >>> --- a/lib/librte_ethdev/rte_ethdev.c >>> +++ b/lib/librte_ethdev/rte_ethdev.c >>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, >>> uint16_t nb_rx_q, uint16_t nb_tx_q, >>> struct rte_eth_dev *dev; >>> struct rte_eth_dev_info dev_info; >>> struct rte_eth_conf orig_conf; >>> + uint16_t overhead_len; >>> int diag; >>> int ret; >>> + uint16_t old_mtu; >>> RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); >>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, >>> uint16_t nb_rx_q, uint16_t nb_tx_q, >>> memcpy(&dev->data->dev_conf, dev_conf, >>> sizeof(dev->data->dev_conf)); >>> + /* Backup mtu for rollback */ >>> + old_mtu = dev->data->mtu; >>> + >>> ret = rte_eth_dev_info_get(port_id, &dev_info); >>> if (ret != 0) >>> goto rollback; >>> + /* Get the real Ethernet overhead length */ >>> + if (dev_info.max_mtu != UINT16_MAX && >>> + dev_info.max_rx_pktlen > dev_info.max_mtu) >>> + overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu; >>> + else >>> + overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN; >>> + >> The Ethernet frame header length supported by each NIC may be >> different, which cause the difference of allowed packets size and drop >> you say. >> The diference of Ethernet frame header length have an impact for the >> maximum Ethernet frame length, which is a boundary value >> of enabling jumbo frame through the ' max_rx_pkt_len '. >> However, we need to do the same thing you do above to get the >> overhead_len every time, which will >> cause a lot of duplicate code in the framework and app. For examples, >> parsing and processing for '--max-pkt-len=xxx' parameter, >> and " cmd_config_max_pkt_len_parsed " in testpmd, and here modifying >> here dev_configure API. >> It's a little redundant and troublesome. >> >> Maybe, it is necessary for driver to directly report the supported >> maximum Ethernet frame length by rte_dev_info_get API. >> As following: >> struct rte_eth_dev_info { >> xxxx >> /** >> * The maximum Ethernet frame length supported by each >> * driver varies with the Ethernet header length. >> */ >> uint16_t eth_max_len; >> xxxx >> } >> >> int >> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info) >> { >> xxx >> dev_info->min_mtu = RTE_ETHER_MIN_MTU; >> dev_info->max_mtu = UINT16_MAX; >> dev_info->eth_max_len = RTE_ETHER_MAX_LEN; >> xxx >> } >> >> And then: >> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct >> rte_eth_dev_info *info) >> { >> xxx >> info->eth_max_len = xxx_ETHER_MAX _LEN; >> xxx >> } >> What do you think? > > Hi Lijun, > > The 'max_mtu' has been added in the past to solve exact same problem, > perhaps it would be better to add something like 'eth_max_len' at that > time. > > But as Steve mentioned we are planning to switch to more MTU based > configuration, which should remove the need of the field you mentioned. > >>> /* If number of queues specified by application for both Rx and >>> Tx is >>> * zero, use driver preferred values. This cannot be done >>> individually >>> * as it is valid for either Tx or Rx (but not both) to be zero. >>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, >>> uint16_t nb_rx_q, uint16_t nb_tx_q, >>> goto rollback; >>> } >>> } else { >>> - if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN || >>> - dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN) >>> + uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len; >>> + if (pktlen < RTE_ETHER_MIN_MTU + overhead_len || >>> + pktlen > RTE_ETHER_MTU + overhead_len) >>> /* Use default value */ >>> dev->data->dev_conf.rxmode.max_rx_pkt_len = >>> - RTE_ETHER_MAX_LEN; >>> + RTE_ETHER_MTU + overhead_len; >>> + } >>> + >>> + /* Scale the MTU size to adapt max_rx_pkt_len */ >>> + if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) { >>> + dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len - >>> + overhead_len; >>> } >> Now that we update mtu here when jumbo frame enabled. It is necessary >> to check validity of max_rx_pkt_len. >> As following: >> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) { >> if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen || >> dev_conf->rxmode.max_rx_pkt_len <= RTE_ETHER_MTU + >> overhead_len) { > > It is not clear from API that if "max_rx_pkt_len <= RTE_ETHER_MTU + > overhead_len" is a valid value or not, the API says: > "max_rx_pkt_len; /**< Only used if JUMBO_FRAME enabled. */" > > But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, > 'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU' > The MTU setting entry does need to be adjusted. I agree! But I think this place should be modified, and it's probably more reasonable. Generally speaking, max_rx_pkt_len will be also updated with 'mtu + overhead_len' also when application sets mtu > 1500, which means JUMBO_FRAME is enabled. Namely, JUMBO_FRAME is enabled based on mtu > 1500 or max_rx_pkt_len > 1500 + overhead_len. Besides, having following check for "rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 0": @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q, goto rollback; } } else { uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len; if (pktlen < RTE_ETHER_MIN_MTU + overhead_len || pktlen > RTE_ETHER_MTU + overhead_len) /* Use default value */ dev->data->dev_conf.rxmode.max_rx_pkt_len = RTE_ETHER_MTU + overhead_len; } One of check condition for the branch is "pktlen > RTE_ETHER_MTU + overhead_len ". In view of the above two aspects, when application set DEV_RX_OFFLOAD_JUMBO_FRAME in offload, the max_rx_pkt_len must be greater than " RTE_ETHER_MTU + overhead_len " Since DEV_RX_OFFLOAD_JUMBO_FRAME is not removed, so the check is necessary. >> xxxx >> } >> } else { >> xxx >> } >> >> If it does not thing above, which will cause an unreasonable case. >> Like, >> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1 >> dev_conf->rxmode.max_rx_pkt_len = 1500 >> overhead_len = 26 >> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 >> - 26 = 1474 >> > > This looks correct, if the applicatin is requesting 'max_rx_pkt_len = > 1500', the requeted MTU is 1474, > if application wants an MTU as 1500, it should set the 'max_rx_pkt_len' > as "1500 + overhead". > >> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when mtu >> > 1500. > > Overall the 'max_rx_pkt_len' setting is a little messy, it has been > tried to fix in last release but we reverted them because of unexpected > side affects. > > Current long term plan is, > - Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to > work on MTU values. (instead one working on frame lenght and other on MTU) > - Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame request > by requested MTU value check > - App decide the jumbo frame capability from the 'max_mtu' check > - App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen' > >>> /* >>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, >>> uint16_t nb_rx_q, uint16_t nb_tx_q, >>> eth_dev_tx_queue_config(dev, 0); >>> rollback: >>> memcpy(&dev->data->dev_conf, &orig_conf, >>> sizeof(dev->data->dev_conf)); >>> + if (old_mtu != dev->data->mtu) >>> + dev->data->mtu = old_mtu; >>> rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, >>> ret); >>> return ret; >>> > > . >