From: Ferruh Yigit
To: oulijun, Steve Yang, dev@dpdk.org
Cc: thomas@monjalon.net, andrew.rybchenko@oktetlabs.ru, linuxarm@openeuler.org
Subject: Re: [dpdk-dev] [PATCH v3 01/22] ethdev: fix MTU size exceeds max rx packet length
Date: Mon, 18 Jan 2021 10:42:03 +0000
Message-ID: <7ec898cd-60b0-0d8d-b006-06d6ac3822b8@intel.com>
References: <20201217092312.27033-1-stevex.yang@intel.com> <20210114094537.13661-1-stevex.yang@intel.com>

On 1/15/2021 10:44 AM, oulijun wrote:
> Hi Steve
> This is very good work! But I have some questions and suggestions.
> Please check them.
>
> On 2021/1/14 17:45, Steve Yang wrote:
>> Ethdev is using the default Ethernet overhead to decide if the provided
>> 'max_rx_pkt_len' value is bigger than the max (non-jumbo) MTU value,
>> and limits it to the max if it is.
>>
>> Since the Ethernet overhead used by the application/driver is different
>> from the one used by ethdev, the check result is wrong.
>>
>> If the driver is using an Ethernet overhead bigger than the default one,
>> the provided 'max_rx_pkt_len' is trimmed down, and when the correct
>> Ethernet overhead is used in the driver to convert back, the resulting
>> MTU is less than the intended one, causing some packets to be dropped.
>>
>> For example:
>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>> driver  -> MTU = 1518 - 22 = 1496
>> Packets with size 1497-1500 are dropped, although the intention is to be
>> able to send/receive them.
>>
>> The fix is to make ethdev use the correct Ethernet overhead for the port,
>> instead of the default one.
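
To put the same mismatch in code form (a minimal, hypothetical sketch; the
22-byte overhead is just the driver value from the example above, i.e.
Ethernet header + CRC + one VLAN tag, and RTE_ETHER_MAX_LEN is the 1518-byte
default limit from rte_ether.h):

#include <rte_ether.h>

/* Hypothetical illustration only: what the old check does to a driver
 * whose real overhead is 22 bytes. */
static uint16_t
resulting_mtu(uint16_t requested_mtu)
{
        uint16_t drv_overhead = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN + 4; /* 22 */
        uint32_t max_rx_pkt_len = requested_mtu + drv_overhead;  /* 1500 -> 1522 */

        if (max_rx_pkt_len > RTE_ETHER_MAX_LEN)      /* ethdev compares against 1518 */
                max_rx_pkt_len = RTE_ETHER_MAX_LEN;  /* and trims to 1518 */

        return max_rx_pkt_len - drv_overhead;        /* driver converts back: 1496 */
}

So resulting_mtu(1500) == 1496, and frames of 1497-1500 bytes get dropped.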
>>
>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>
>> Signed-off-by: Steve Yang
>> ---
>>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>>   1 file changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
>> index 17ddacc78d..19ca4c4512 100644
>> --- a/lib/librte_ethdev/rte_ethdev.c
>> +++ b/lib/librte_ethdev/rte_ethdev.c
>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>>       struct rte_eth_dev *dev;
>>       struct rte_eth_dev_info dev_info;
>>       struct rte_eth_conf orig_conf;
>> +    uint16_t overhead_len;
>>       int diag;
>>       int ret;
>> +    uint16_t old_mtu;
>>       RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>>           memcpy(&dev->data->dev_conf, dev_conf,
>>                  sizeof(dev->data->dev_conf));
>> +    /* Backup mtu for rollback */
>> +    old_mtu = dev->data->mtu;
>> +
>>       ret = rte_eth_dev_info_get(port_id, &dev_info);
>>       if (ret != 0)
>>           goto rollback;
>> +    /* Get the real Ethernet overhead length */
>> +    if (dev_info.max_mtu != UINT16_MAX &&
>> +        dev_info.max_rx_pktlen > dev_info.max_mtu)
>> +        overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>> +    else
>> +        overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>> +
> The Ethernet frame header length supported by each NIC may be different,
> which causes the difference in allowed packet sizes and the drops you
> describe.
> The difference in Ethernet frame header length affects the maximum Ethernet
> frame length, which is the boundary value for enabling jumbo frames through
> 'max_rx_pkt_len'.
> However, we need to do the same thing you do above to get the overhead_len
> every time, which will cause a lot of duplicate code in the framework and in
> applications. For example, parsing and processing of the '--max-pkt-len=xxx'
> parameter and "cmd_config_max_pkt_len_parsed" in testpmd, and here in the
> dev_configure API.
> It's a little redundant and troublesome.
>
> Maybe it is necessary for the driver to directly report the supported
> maximum Ethernet frame length via the rte_eth_dev_info_get API.
> As follows:
> struct rte_eth_dev_info {
>     xxxx
>     /**
>      * The maximum Ethernet frame length supported by each
>      * driver varies with the Ethernet header length.
>      */
>     uint16_t eth_max_len;
>     xxxx
> }
>
> int
> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
> {
>     xxx
>     dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>     dev_info->max_mtu = UINT16_MAX;
>     dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
>     xxx
> }
>
> And then:
> xxx_dev_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
> {
>     xxx
>     info->eth_max_len = xxx_ETHER_MAX_LEN;
>     xxx
> }
> What do you think?

Hi Lijun,

The 'max_mtu' field was added in the past to solve this exact same problem;
perhaps it would have been better to add something like 'eth_max_len' at that
time.

But as Steve mentioned, we are planning to switch to a more MTU-based
configuration, which should remove the need for the field you mention.

>>       /* If number of queues specified by application for both Rx and Tx is
>>        * zero, use driver preferred values. This cannot be done individually
>>        * as it is valid for either Tx or Rx (but not both) to be zero.
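
To the duplication point above: until that rework happens, an application can
derive the per-port overhead from the fields rte_eth_dev_info_get() already
reports, mirroring the check this patch adds (a minimal sketch, assuming
<rte_ethdev.h> is included; the helper name is only illustrative, not an
existing API):

/* Derive the per-port Ethernet overhead from the reported dev_info. */
static uint16_t
eth_dev_get_overhead_len(const struct rte_eth_dev_info *dev_info)
{
        if (dev_info->max_mtu != UINT16_MAX &&
            dev_info->max_rx_pktlen > dev_info->max_mtu)
                return dev_info->max_rx_pktlen - dev_info->max_mtu;

        /* Driver does not report a meaningful 'max_mtu', fall back to the
         * default overhead. */
        return RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
}

Each application needing it would have to carry those few lines, which is
indeed the duplication you point out.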
>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>>               goto rollback;
>>           }
>>       } else {
>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>               /* Use default value */
>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>> -                            RTE_ETHER_MAX_LEN;
>> +                        RTE_ETHER_MTU + overhead_len;
>> +    }
>> +
>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>> +                overhead_len;
>>       }
> Now that we update the MTU here when jumbo frame is enabled, it is necessary
> to check the validity of max_rx_pkt_len.
> As follows:
> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>     if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
>         dev_conf->rxmode.max_rx_pkt_len <= RTE_ETHER_MTU + overhead_len) {

It is not clear from the API whether "max_rx_pkt_len <= RTE_ETHER_MTU +
overhead_len" is a valid value or not; the API only says:
"max_rx_pkt_len; /**< Only used if JUMBO_FRAME enabled. */"

But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set,
'max_rx_pkt_len' shouldn't be less than 'RTE_ETHER_MTU'.

>             xxxx
>         }
> } else {
>         xxx
> }
>
> If it does not do the check above, it will allow an unreasonable case.
> For example,
> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
> dev_conf->rxmode.max_rx_pkt_len = 1500
> overhead_len = 26
> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 - 26 = 1474
>

This looks correct: if the application is requesting 'max_rx_pkt_len = 1500',
the requested MTU is 1474; if the application wants an MTU of 1500, it should
set 'max_rx_pkt_len' to "1500 + overhead".

> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set in rxmode.offloads when mtu > 1500.

Overall the 'max_rx_pkt_len' handling is a little messy; we tried to fix it in
the last release but reverted the changes because of unexpected side effects.

The current long-term plan is:
- Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to work on
  MTU values (instead of one working on frame length and the other on MTU).
- Remove 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide on a jumbo frame request from
  the requested MTU value.
- The app decides the jumbo frame capability from the 'max_mtu' check.
- The app calculates the Ethernet overhead using 'max_mtu' and 'max_rx_pktlen'.
A rough sketch of this flow is at the end of this mail.

>>       /*
>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>>       eth_dev_tx_queue_config(dev, 0);
>>   rollback:
>>       memcpy(&dev->data->dev_conf, &orig_conf, sizeof(dev->data->dev_conf));
>> +    if (old_mtu != dev->data->mtu)
>> +        dev->data->mtu = old_mtu;
>>       rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, ret);
>>       return ret;
>>
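
As for the long-term plan above, a rough sketch of how the application side
could look (purely illustrative, assuming <rte_ethdev.h> and <errno.h>; the
helper name and checks are not a settled API):

/* Sketch: the application works in MTU terms only and derives jumbo
 * capability and overhead from dev_info. */
static int
app_configure_mtu(uint16_t port_id, uint16_t req_mtu, uint32_t *frame_len)
{
        struct rte_eth_dev_info dev_info;
        int ret;

        ret = rte_eth_dev_info_get(port_id, &dev_info);
        if (ret != 0)
                return ret;

        /* Jumbo capability is decided from 'max_mtu' instead of the
         * DEV_RX_OFFLOAD_JUMBO_FRAME flag. */
        if (req_mtu > dev_info.max_mtu)
                return -EINVAL;

        /* The app derives the Ethernet overhead (and hence the frame length,
         * e.g. for mbuf sizing) from 'max_rx_pktlen' and 'max_mtu'. */
        *frame_len = req_mtu + (dev_info.max_rx_pktlen - dev_info.max_mtu);

        /* 'rte_eth_dev_configure()' and 'rte_eth_dev_set_mtu()' would then
         * both take the same MTU value; only set_mtu is shown here. */
        return rte_eth_dev_set_mtu(port_id, req_mtu);
}

With no separate frame-length knob left to validate, the 'max_rx_pkt_len'
ambiguity discussed above also goes away.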