From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5d2ab42c-4b56-4a40-8e0c-3ac9a5e34ec6@huawei.com>
Date: Sat, 30 Mar 2024 19:38:00 +0800
Subject: Re: [PATCH v2 1/6] ethdev: support setting lanes
From: huangdengdui
To: Damodharam Ammepalli, Ajit Khaparde, Thomas Monjalon
Cc: lihuisong (C), Thomas Monjalon
References: <20240312075238.3319480-4-huangdengdui@huawei.com>
 <3325989.AxlXzFCzgd@thomas>
 <68ee0a54-c0b4-293c-67ee-efed8964c33b@huawei.com>
 <3628913.0YcMNavOfZ@thomas>
List-Id: DPDK patches and discussions
Content-Type: text/plain; charset="UTF-8"

On 2024/3/27 2:21, Damodharam Ammepalli wrote:
> On Tue, Mar 26, 2024 at 11:12 AM Ajit Khaparde wrote:
>>
>> On Tue, Mar 26, 2024 at 6:47 AM Ajit Khaparde wrote:
>>>
>>> On Tue, Mar 26, 2024 at 4:15 AM lihuisong (C) wrote:
>>>>
>>>> On 2024/3/26 18:30, Thomas Monjalon wrote:
>>>>> 26/03/2024 02:42, lihuisong (C):
>>>>>> On 2024/3/25 17:30, Thomas Monjalon wrote:
>>>>>>> 25/03/2024 07:24, huangdengdui:
>>>>>>>> On 2024/3/22 21:58, Thomas Monjalon wrote:
>>>>>>>>> 22/03/2024 08:09, Dengdui Huang:
>>>>>>>>>> -#define RTE_ETH_LINK_SPEED_10G    RTE_BIT32(8)  /**< 10 Gbps */
>>>>>>>>>> -#define RTE_ETH_LINK_SPEED_20G    RTE_BIT32(9)  /**< 20 Gbps */
>>>>>>>>>> -#define RTE_ETH_LINK_SPEED_25G    RTE_BIT32(10) /**< 25 Gbps */
>>>>>>>>>> -#define RTE_ETH_LINK_SPEED_40G    RTE_BIT32(11) /**< 40 Gbps */
>>>>>>>>>> -#define RTE_ETH_LINK_SPEED_50G    RTE_BIT32(12) /**< 50 Gbps */
>>>>>>>>>> -#define RTE_ETH_LINK_SPEED_56G    RTE_BIT32(13) /**< 56 Gbps */
>>>>>>>>>> -#define RTE_ETH_LINK_SPEED_100G   RTE_BIT32(14) /**< 100 Gbps */
>>>>>>>>>> -#define RTE_ETH_LINK_SPEED_200G   RTE_BIT32(15) /**< 200 Gbps */
>>>>>>>>>> -#define RTE_ETH_LINK_SPEED_400G   RTE_BIT32(16) /**< 400 Gbps */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_10G    RTE_BIT32(8)  /**< 10 Gbps */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_20G    RTE_BIT32(9)  /**< 20 Gbps 2lanes */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_25G    RTE_BIT32(10) /**< 25 Gbps */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_40G    RTE_BIT32(11) /**< 40 Gbps 4lanes */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_50G          RTE_BIT32(12) /**< 50 Gbps */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_56G          RTE_BIT32(13) /**< 56 Gbps 4lanes */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_100G         RTE_BIT32(14) /**< 100 Gbps */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_200G         RTE_BIT32(15) /**< 200 Gbps 4lanes */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_400G         RTE_BIT32(16) /**< 400 Gbps 4lanes */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_10G_4LANES   RTE_BIT32(17) /**< 10 Gbps 4lanes */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_50G_2LANES   RTE_BIT32(18) /**< 50 Gbps 2 lanes */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_100G_2LANES  RTE_BIT32(19) /**< 100 Gbps 2 lanes */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_100G_4LANES  RTE_BIT32(20) /**< 100 Gbps 4lanes */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_200G_2LANES  RTE_BIT32(21) /**< 200 Gbps 2lanes */
>>>>>>>>>> +#define RTE_ETH_LINK_SPEED_400G_8LANES  RTE_BIT32(22) /**< 400 Gbps 8lanes */
>>>>>>>>> I don't think it is a good idea to make this more complex.
>>>>>>>>> It brings nothing as far as I can see, compared to having speed and lanes separated.
>>>>>>>>> Can we have the lanes information as a separate value? No need for a bitmask.
>>>>>>>>>
>>>>>>>> Hi Thomas, Ajit, roretzla, damodharam,
>>>>>>>>
>>>>>>>> I also considered that option at the beginning of the design,
>>>>>>>> but did not use it for the following reasons:
>>>>>>>> 1. For the user, ethtool couples speed and lanes.
>>>>>>>> The result of querying the NIC capability is as follows:
>>>>>>>>     Supported link modes:
>>>>>>>>         100000baseSR4/Full
>>>>>>>>         100000baseSR2/Full
>>>>>>>> The NIC capability is configured as follows:
>>>>>>>>     ethtool -s eth1 speed 100000 lanes 4 autoneg off
>>>>>>>>     ethtool -s eth1 speed 100000 lanes 2 autoneg off
>>>>>>>>
>>>>>>>> Therefore, users are more accustomed to the coupling of speed and lanes.
>>>>>>>>
>>>>>>>> 2. For the PHY, when the physical layer capability is configured through MDIO,
>>>>>>>> the speed and lanes are also coupled.
>>>>>>>> For example:
>>>>>>>> Table 45-7, PMA/PMD control 2 register bit definitions [1]
>>>>>>>>     PMA/PMD type selection
>>>>>>>>     1 0 0 1 0 1 0 = 100GBASE-SR2 PMA/PMD
>>>>>>>>     0 1 0 1 1 1 1 = 100GBASE-SR4 PMA/PMD
>>>>>>>>
>>>>>>>> Therefore, coupling speeds and lanes is easier to understand,
>>>>>>>> and it is easier for the driver to report the supported lanes.
>>>>>>>>
>>>>>>>> In addition, the code implementation is compatible with the old version.
>>>>>>>> When the driver does not support the lanes setting, the code does not need to be modified.
>>>>>>>>
>>>>>>>> So I think the speed and lanes coupling is better.
>>>>>>> I don't think so.
>>>>>>> You are mixing hardware implementation, user tool, and API.
>>>>>>> Having a separate and simple API is cleaner and not more difficult to handle
>>>>>>> in some get/set style functions.
>>>>>> Having a separate and simple API is cleaner. It's good.
>>>>>> But the supported lane capabilities have a lot to do with the specified
>>>>>> speed. This is determined by the related specification.
>>>>>> If we add a separate API for speed lanes, it is probably hard to check
>>>>>> the validity of the speed and lanes configuration.
>>>>>> And a lane setting API separated from speed is not good for
>>>>>> uniforming all PMDs' behavior in the ethdev layer.
>>>>> Please let's be more specific.
>>>>> There are 3 needs:
>>>>>     - set PHY lane config
>>>>>     - get PHY lane config
>>>>>     - get PHY lane capabilities
>>>> IMO, these lane capabilities should be reported based on the supported speed
>>>> capabilities.
>>>>>
>>>>> There is no problem providing a function to get the number of PHY lanes.
>>>>> It is possible to set the PHY lanes number after defining a fixed speed.
>>>> Yes, it's OK.
>>>>>
>>>>>> The patch [1] is an example of this separate API.
>>>>>> I think it is not very good. It cannot tell the user and the PMD the following points:
>>>>>> 1) The user doesn't know what lanes should or can be set for a specified speed
>>>>>> on one NIC.
>>>>> This is about capabilities.
>>>>> Can we say a HW will support a maximum number of PHY lanes in general?
>>>>> We may need to associate the maximum speed per lane?
>>>>> Do we really need to associate PHY lane and PHY speed numbers for capabilities?
>>>> Personally, it should contain at least the relationship below.
>>>>     speed 10G  --> 1 lane | 4 lanes
>>>>     speed 100G --> 2 lanes | 4 lanes
>>>>> Example: if a HW supports 100G-4-lanes and 200G-2-lanes,
>>>>> may we assume it is also supporting 200G-4-lanes?
>>>> I think we cannot assume that the NIC also supports 200G-4-lanes,
>>>> because it has a lot to do with the HW design.
>>>>>
>>>>>> 2) how should PMD do for a supported lanes in their HW?
>>>>> I don't understand this question. Please rephrase.
>>>> I mean that the PMD doesn't know how many lanes to set when the lanes from
>>>> the user are not supported at a fixed speed by the PMD.
>>>> So the ethdev layer should limit the available lane number based on a fixed
>>>> speed.
>>>
>>> ethdev layer has generally been opaque. We should keep it that way.
>> I mis-typed.
>> %s/opaque/transparent
>>
>>> The PMD should know what the HW supports.
>>> So it should show the capabilities correctly. Right?
>>> And if the user provides incorrect settings, it should reject them.
>>>
>>>>>
>>>>>> Anyway, if we add a speed lanes setting feature, we must report and set
>>>>>> the speed and lanes capabilities for the user well.
>>>>>> Otherwise, the user will be more confused.
>>>>> Well, it is not necessarily exposing all raw combinations as ethtool does.
>>>> Agreed.
>>>>>
>>>>>> [1] https://patchwork.dpdk.org/project/dpdk/list/?series=31606
>>>>>
>>>>>
> Our RFC patch's cmdline design is inspired by how ethtool works, as it
> provides carriage return at user choice, which makes it backward compatible
> when no lanes config is given as well. testpmd does not have that flexibility
> in the speed command, so we resorted to a separate command for lanes, and for
> all the other reasons mentioned earlier.
>
> 2nd, on where the lanes validation logic rests: irrespective of lanes being
> part of the speed command or a separate lanes command, like others said, the
> autonegotiation itself should suffice for link training and bring-up. Taking
> this example, if the link came up autonegotiated at 100G PAM4-112, and the
> user, for whatever reasons (including others mentioned earlier), wants to
> force 100Gb NRZ, which is 25G per lane (4 lanes), the user should be aware of
> the commands the tool offers, and the driver can do final validation for
> anomalies.
>
> In any case, in the RFC patch we are doing lanes validation in
> cmd_validate_lanes(portid_t pid, uint32_t *lanes), which gets populated by the
> hw/driver based on the current autonegotiated link-up speed and signalling
> type.
>
> Today 400Gig is 4 (pam4_56) or 8 (pam4_112) lanes, and in the future, with a
> new HW design, it may be 2 x 200Gig lanes; at that time we don't need to
> update testpmd, we handle it in the driver.
> Maybe a new speed of 800Gb+ would demand an update to the app/lib entries.
>

As Thomas said, there are 3 needs:
 - set the number of lanes for a device.
 - get the number of current lanes for a device.
 - get the number of lanes supported by the device.
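
To make the discussion concrete, below is a rough sketch of what the first two
needs could look like as separate ethdev calls. The function names and
prototypes are only illustrative assumptions for this thread; they are not part
of any existing patch or of the current ethdev API:

    /*
     * Illustrative sketch only; these names and prototypes are assumptions
     * made for the discussion, not existing ethdev API.
     */
    #include <stdint.h>

    /* Need 1: request the number of PHY lanes to use for the configured speed. */
    int rte_eth_speed_lanes_set(uint16_t port_id, uint32_t lanes);

    /* Need 2: query the number of PHY lanes currently in use on the link. */
    int rte_eth_speed_lanes_get(uint16_t port_id, uint32_t *lanes);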
For the first two needs, similar to Damodharam's RFC patch [1], the lane
setting and the speed setting are separated, and I'm happy to set the lanes
this way.

However, there are different solutions for how the device reports its lane
setting capability, as follows:

1. Like the current patch, report the device capabilities with speed and lanes
   coupled. However, if we use this solution, we will have to couple the lanes
   setting with the speed setting.

2. Like Damodharam's RFC patch [1], the device reports the maximum number of
   supported lanes. Users can configure any lane count, completely separated
   from the speed.

3. Similar to how a device reports its FEC capability, the device reports a
   table of the lane counts supported at each speed, for example:

   speed    lanes_capa
   50G      1,2
   100G     1,2,4
   200G     2,4

Options 1 and 2 have been discussed a lot above.

For solution 1, the speed and lanes are over-coupled and the implementation is
too complex. But I think it is easier to understand and easier for the device
to report capabilities. In addition, ethtool also reports capabilities in this
mode.

For solution 2, as huisong said, the user doesn't know which lanes should or
can be set for a specified speed on a given NIC. I think that when the device
reports its capability, the lanes should be associated with the speed. In this
way, users can know which lane counts are supported at the current speed and
verify the validity of the configuration.

So I think solution 3 is better. What do you think?
(A rough sketch of what solution 3 could look like follows below.)

[1] https://patchwork.dpdk.org/project/dpdk/list/?series=31606
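
To make solution 3 concrete, here is a rough sketch of how the capability
report could be modelled on the existing FEC capability query
(rte_eth_fec_get_capability()). All struct and function names below are
illustrative assumptions for the discussion, not a proposed final API:

    /*
     * Illustrative sketch only, modelled on the FEC capability query style;
     * the struct and function names are assumptions for this discussion and
     * do not exist in ethdev today.
     */
    #include <stdint.h>

    /* One row of the speed/lanes relationship table (need 3). */
    struct rte_eth_speed_lanes_capa {
            uint32_t speed; /* link speed in Mbps, e.g. 100000 for 100G */
            uint32_t capa;  /* bitmask of lane counts supported at this speed,
                             * e.g. bits 2 and 4 set for "2 or 4 lanes" */
    };

    /*
     * Fill 'capa' with up to 'num' rows and return the total number of rows
     * the driver can report, so the application can size the array.
     */
    int rte_eth_speed_lanes_get_capability(uint16_t port_id,
                                           struct rte_eth_speed_lanes_capa *capa,
                                           unsigned int num);

With such a table, the ethdev layer (or the application) can validate a
requested speed/lanes pair against the driver-reported rows before calling the
set function, which addresses huisong's concern about checking configuration
validity.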