From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 0AA73A052A; Mon, 25 Jan 2021 13:22:10 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 77945140EBE; Mon, 25 Jan 2021 13:22:10 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by mails.dpdk.org (Postfix) with ESMTP id EEB34140E8C for ; Mon, 25 Jan 2021 13:22:08 +0100 (CET) IronPort-SDR: CvgSpEr2sRl8LeLHOh6bG3OFrxjsNW2xXWlI8ZhmURMI4D8C4VBTGbnbUVmYX9ufFkBZkFzK28 6D5SBhfjOrHQ== X-IronPort-AV: E=McAfee;i="6000,8403,9874"; a="264539442" X-IronPort-AV: E=Sophos;i="5.79,373,1602572400"; d="scan'208";a="264539442" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jan 2021 04:22:08 -0800 IronPort-SDR: oxq25hFe4iKw9STDwTvCoRl89lkCF0LkXvq/CJxU7Ae5BQIxVTBMddIuJ/t03Gs6D+H49+xrzf A1UN5VuVu3iw== X-IronPort-AV: E=Sophos;i="5.79,373,1602572400"; d="scan'208";a="361472587" Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.213.243.89]) ([10.213.243.89]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jan 2021 04:22:06 -0800 To: oulijun , Steve Yang , dev@dpdk.org Cc: thomas@monjalon.net, andrew.rybchenko@oktetlabs.ru References: <20201217092312.27033-1-stevex.yang@intel.com> <20210114094537.13661-1-stevex.yang@intel.com> <7ec898cd-60b0-0d8d-b006-06d6ac3822b8@intel.com> From: Ferruh Yigit Message-ID: <88bcfc14-acdb-19c5-857b-157af922901e@intel.com> Date: Mon, 25 Jan 2021 12:22:03 +0000 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [PATCH v3 01/22] ethdev: fix MTU size exceeds max rx packet length X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 1/19/2021 8:46 AM, oulijun wrote: > > > 在 2021/1/18 18:42, Ferruh Yigit 写道: >> On 1/15/2021 10:44 AM, oulijun wrote: >>> Hi Steve >>> This is a very good job! But I have some question and suggestions. >>> Please check it. >>> >>> 在 2021/1/14 17:45, Steve Yang 写道: >>>> Ethdev is using default Ethernet overhead to decide if provided >>>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value, >>>> and limits it to MAX if it is. >>>> >>>> Since the application/driver used Ethernet overhead is different than >>>> the ethdev one, check result is wrong. >>>> >>>> If the driver is using Ethernet overhead bigger than the default one, >>>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when >>>> correct Ethernet overhead is used to convert back, the resulting MTU is >>>> less than the intended one, causing some packets to be dropped. >>>> >>>> Like, >>>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522 >>>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518 >>>> driver  -> MTU = 1518 - 22 = 1496 >>>> Packets with size 1497-1500 are dropped although intention is to be able >>>> to send/receive them. >>>> >>>> The fix is to make ethdev use the correct Ethernet overhead for port, >>>> instead of default one. >>>> >>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors") >>>> >>>> Signed-off-by: Steve Yang >>>> --- >>>>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++--- >>>>   1 file changed, 24 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c >>>> index 17ddacc78d..19ca4c4512 100644 >>>> --- a/lib/librte_ethdev/rte_ethdev.c >>>> +++ b/lib/librte_ethdev/rte_ethdev.c >>>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t >>>> nb_rx_q, uint16_t nb_tx_q, >>>>       struct rte_eth_dev *dev; >>>>       struct rte_eth_dev_info dev_info; >>>>       struct rte_eth_conf orig_conf; >>>> +    uint16_t overhead_len; >>>>       int diag; >>>>       int ret; >>>> +    uint16_t old_mtu; >>>>       RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); >>>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t >>>> nb_rx_q, uint16_t nb_tx_q, >>>>           memcpy(&dev->data->dev_conf, dev_conf, >>>>                  sizeof(dev->data->dev_conf)); >>>> +    /* Backup mtu for rollback */ >>>> +    old_mtu = dev->data->mtu; >>>> + >>>>       ret = rte_eth_dev_info_get(port_id, &dev_info); >>>>       if (ret != 0) >>>>           goto rollback; >>>> +    /* Get the real Ethernet overhead length */ >>>> +    if (dev_info.max_mtu != UINT16_MAX && >>>> +        dev_info.max_rx_pktlen > dev_info.max_mtu) >>>> +        overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu; >>>> +    else >>>> +        overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN; >>>> + >>> The Ethernet frame header length supported by each NIC may be different, >>> which cause the difference of allowed packets size and drop you say. >>> The diference of Ethernet frame header length have an impact for the maximum >>> Ethernet frame length, which is a boundary value >>> of enabling jumbo frame through the ' max_rx_pkt_len '. >>> However, we need to do the same thing you do above to get the overhead_len >>> every time, which will >>> cause a lot of duplicate code in the framework and app. For examples, parsing >>> and processing for '--max-pkt-len=xxx' parameter, >>>   and " cmd_config_max_pkt_len_parsed " in testpmd, and here modifying here >>> dev_configure API. >>> It's a little redundant and troublesome. >>> >>> Maybe, it is necessary for driver to directly report the supported maximum >>> Ethernet frame length by rte_dev_info_get API. >>> As following: >>> struct rte_eth_dev_info { >>>      xxxx >>>      /** >>>       * The maximum Ethernet frame length supported by each >>>       * driver varies with the Ethernet header length. >>>       */ >>>      uint16_t eth_max_len; >>>      xxxx >>> } >>> >>> int >>> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info) >>> { >>>      xxx >>>      dev_info->min_mtu = RTE_ETHER_MIN_MTU; >>>      dev_info->max_mtu = UINT16_MAX; >>>      dev_info->eth_max_len = RTE_ETHER_MAX_LEN; >>>      xxx >>> } >>> >>> And then: >>> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info) >>> { >>>      xxx >>>      info->eth_max_len = xxx_ETHER_MAX _LEN; >>>      xxx >>> } >>> What do you think? >> >> Hi Lijun, >> >> The 'max_mtu' has been added in the past to solve exact same problem, perhaps >> it would be better to add something like 'eth_max_len' at that time. >> >> But as Steve mentioned we are planning to switch to more MTU based >> configuration, which should remove the need of the field you mentioned. >> >>>>       /* If number of queues specified by application for both Rx and Tx is >>>>        * zero, use driver preferred values. This cannot be done individually >>>>        * as it is valid for either Tx or Rx (but not both) to be zero. >>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t >>>> nb_rx_q, uint16_t nb_tx_q, >>>>               goto rollback; >>>>           } >>>>       } else { >>>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN || >>>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN) >>>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len; >>>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len || >>>> +            pktlen > RTE_ETHER_MTU + overhead_len) >>>>               /* Use default value */ >>>>               dev->data->dev_conf.rxmode.max_rx_pkt_len = >>>> -                            RTE_ETHER_MAX_LEN; >>>> +                        RTE_ETHER_MTU + overhead_len; >>>> +    } >>>> + >>>> +    /* Scale the MTU size to adapt max_rx_pkt_len */ >>>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) { >>>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len - >>>> +                overhead_len; >>>>       } >>> Now that we update mtu here when jumbo frame enabled. It is necessary to >>> check validity of max_rx_pkt_len. >>> As following: >>> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) { >>>      if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen || >>>          dev_conf->rxmode.max_rx_pkt_len <=  RTE_ETHER_MTU + overhead_len) { >> >> It is not clear from API that if "max_rx_pkt_len <=  RTE_ETHER_MTU + >> overhead_len" is a valid value or not, the API says: >> "max_rx_pkt_len;  /**< Only used if JUMBO_FRAME enabled. */" >> >> But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, >> 'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU' >> > The MTU setting entry does need to be adjusted. I agree! But I think this place > should be modified, and it's probably more reasonable. > > Generally speaking, max_rx_pkt_len will be also updated with 'mtu + > overhead_len' also when application sets mtu > 1500, which means JUMBO_FRAME is > enabled. > Namely, JUMBO_FRAME is enabled based on mtu > 1500 or max_rx_pkt_len > 1500 + > overhead_len. > Besides, having following check for "rxmode.offloads & > DEV_RX_OFFLOAD_JUMBO_FRAME == 0": > @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t > nb_rx_q, uint16_t nb_tx_q, >               goto rollback; >           } >       } else { >         uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len; >         if (pktlen < RTE_ETHER_MIN_MTU + overhead_len || >             pktlen > RTE_ETHER_MTU + overhead_len) >               /* Use default value */ >               dev->data->dev_conf.rxmode.max_rx_pkt_len = >                         RTE_ETHER_MTU + overhead_len; > } > One of check condition for the branch is "pktlen > RTE_ETHER_MTU + overhead_len ". > > In view of the above two aspects, when application set > DEV_RX_OFFLOAD_JUMBO_FRAME in offload, the max_rx_pkt_len must be greater than " > RTE_ETHER_MTU + overhead_len " > Hi Lijun, Most probably intention was like you said, but that is not implemented or documented that way. The API doc only says 'max_rx_pkt_len' is valid only when 'JUMBO_FRAME' is set, but it doesn't say what is the valid range. And when 'JUMBO_FRAME' is set, the check in the 'rte_eth_dev_configure()' is same from the beggining of the project, and it is between 'RTE_ETHER_MIN_LEN' to 'dev_info.max_rx_pktlen', and changing it will be the behaviour change. So I suggest lets not add this new config restriction, but it is application's responsiblity to set/unset the 'JUMBO_FRAME' flag based on the packet size, and we can implemet this in the testpmd. > Since DEV_RX_OFFLOAD_JUMBO_FRAME is not removed, so the check is necessary. > >>>              xxxx >>>          } >>> } else { >>>          xxx >>> } >>> >>> If it does not thing above, which will cause an unreasonable case. >>> Like, >>> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1 >>> dev_conf->rxmode.max_rx_pkt_len = 1500 >>> overhead_len = 26 >>> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 - 26 = >>> 1474 >>> >> >> This looks correct, if the applicatin is requesting 'max_rx_pkt_len = 1500', >> the requeted MTU is 1474, >> if application wants an MTU as 1500, it should set the 'max_rx_pkt_len' as >> "1500 + overhead". >> >>> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when mtu > 1500. >> >> Overall the 'max_rx_pkt_len' setting is a little messy, it has been tried to >> fix in last release but we reverted them because of unexpected side affects. >> >> Current long term plan is, >> - Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to work on >> MTU values. (instead one working on frame lenght and other on MTU) >> - Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame request by >> requested MTU value check >> - App decide the jumbo frame capability from the 'max_mtu' check >> - App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen' >> >>>>       /* >>>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t >>>> nb_rx_q, uint16_t nb_tx_q, >>>>       eth_dev_tx_queue_config(dev, 0); >>>>   rollback: >>>>       memcpy(&dev->data->dev_conf, &orig_conf, sizeof(dev->data->dev_conf)); >>>> +    if (old_mtu != dev->data->mtu) >>>> +        dev->data->mtu = old_mtu; >>>>       rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, ret); >>>>       return ret; >>>> >> >> . >>