From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id DC333A00BE; Fri, 1 Nov 2019 10:32:48 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 4C20A1D440; Fri, 1 Nov 2019 10:32:48 +0100 (CET) Received: from relay8-d.mail.gandi.net (relay8-d.mail.gandi.net [217.70.183.201]) by dpdk.org (Postfix) with ESMTP id DCB951D424 for ; Fri, 1 Nov 2019 10:32:46 +0100 (CET) X-Originating-IP: 90.177.210.238 Received: from [192.168.1.110] (238.210.broadband10.iol.cz [90.177.210.238]) (Authenticated sender: i.maximets@ovn.org) by relay8-d.mail.gandi.net (Postfix) with ESMTPSA id 5D26D1BF205; Fri, 1 Nov 2019 09:32:44 +0000 (UTC) To: Thomas Monjalon , Ilya Maximets Cc: dev@dpdk.org, Shahaf Shuler , Jerin Jacob , Andrew Rybchenko , Ferruh Yigit , Stephen Hemminger , Roni Bar Yanai , Rony Efraim , declan.doherty@intel.com, bernard.iremonger@intel.com, ajit.khaparde@broadcom.com References: <4165509.5enYigmRGf@xps> <5190422.ormd5srm06@xps> <1968866.mbH2BcW0Fd@xps> From: Ilya Maximets Message-ID: Date: Fri, 1 Nov 2019 10:32:43 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: <1968866.mbH2BcW0Fd@xps> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 30.10.2019 22:42, Thomas Monjalon wrote: > 30/10/2019 17:09, Ilya Maximets: >> On 30.10.2019 16:49, Thomas Monjalon wrote: >>> 30/10/2019 16:07, Ilya Maximets: >>>> On 29.10.2019 19:50, Thomas Monjalon wrote: >>>>> In a virtual environment, the network controller may have to configure >>>>> some SR-IOV VF parameters for security reasons. >>>>> >>>>> When the PF (host port) is driven by DPDK (OVS-DPDK case), >>>>> we face two different cases: >>>>> - driver is bifurcated (Mellanox case), >>>>> so the VF can be configured via the kernel. >>>>> - driver is on top of UIO or VFIO, so DPDK API is required, >>>>> and PMD-specific APIs were used. >>>> >>>> So, what is wrong with setting VF mac via the representor port as >>>> it done now? From our previous discussion, I thought that we concluded >>>> that is enough to have current API, i.e. just call set_default_mac() >>>> for a representor port and this will lead to setting mac for VF. >>>> This is how it's implemented in Linux kernel and this is how it's >>>> implemented in current DPDK drivers that supports setting mac for >>>> the representor. >>> >>> I don't know what is done in the Linux kernel. >>> In DPDK, setting the MAC of the representor is really setting >>> the MAC of the representor. Is it crazy? >> >> Kind of. And no, it doesn't work this way in DPDK. >> Just look at the i40e driver: >> >> 325 static int >> 326 i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev, >> 327 struct ether_addr *mac_addr) >> 328 { >> 329 struct i40e_vf_representor *representor = ethdev->data->dev_private; >> 330 >> 331 return rte_pmd_i40e_set_vf_mac_addr( >> 332 representor->adapter->eth_dev->data->port_id, >> 333 representor->vf_id, mac_addr); >> 334 } >> .... >> 423 static const struct eth_dev_ops i40e_representor_dev_ops = { >> <...> >> 445 .mac_addr_set = i40e_vf_representor_mac_addr_set, >> >> >> It really calls the function to set VF mac address. > > Indeed, I missed that i40e_vf_representor_mac_addr_set() is calling > rte_pmd_i40e_set_vf_mac_addr(). > >> And for MLX drivers it's the same. >> MLX driver call netlink to set representor MAC, but netlink is in >> kernel and this call inside the kernel translates to the setting >> of mac address of the VF. >> >> Am I missing something? > > I am not sure about this kernel translation. At least not in mlx5. > Setting MAC address of a VF representor seems not supported on mlx5. > But there is a specific netlink request to set a VF MAC address: > ip link set vf mac > > >>>> The only use case for this new API is to be able to control mac of >>>> the representor itself, which doesn't make much sense. At least there >>>> are only hypothetical use cases. And once again, Linux kernel doesn't >>>> support this behavior. >>> >>> I think it is sane to be able to set different MAC addresses >>> for the representor and the VF. > > Let me explain better my thoughts. > In the switchdev design, VF and uplink ports are all connected > together via a switch. > The representors are mirrors of those switch ports. > So a VF representor port is supposed to mirror the switch port > where a VF is connected to. It is different of the VF itself. > > This is a drawing of my understanding of switchdev design: > > VF1 ------ VF1 rep /--------\ > | switch | uplink rep ------ uplink ------ wire > VF2 ------ VF2 rep \--------/ (PF) > > Of course, there can be more VF and uplink ports. > > With some switch-aware protocols, it might be interesting to have > different MAC addresses on a VF and its representor. > And more generally, I imagine that the config of a VF representor > could be different of the config of a VF. > > >>>>> This new generic API will avoid vendors fragmentation. >>>> >>>> I don't see the fragmentation. Right now you can set VF mac from DPDK >>>> by calling set_default_mac() for representor. This API exists for all >>>> vendors. Not implemented for Intel, but new API is not implemented too. >>> >>> No, the current situation is the following: >>> - for mlx5, VF MAC can be configured with iproute2 or netlink >>> - for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr() >>> - for i40e, rte_pmd_i40e_set_vf_mac_addr() >>> - for bnxt, rte_pmd_bnxt_set_vf_mac_addr() > > Thanks to Ilya's comment, this is an update of the DPDK situation: > - for mlx5, VF MAC can be configured with iproute2 or netlink > and rte_eth_dev_default_mac_addr_set(rep) is not supported > - for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr() > and rte_eth_dev_default_mac_addr_set(rep) does the same > - for i40e, rte_pmd_i40e_set_vf_mac_addr() > and rte_eth_dev_default_mac_addr_set(rep) does the same > - for bnxt, rte_pmd_bnxt_set_vf_mac_addr() > and no representor > > If we consider what Intel did, i.e. configure VF in place of representor > for some operations, there are two drawbacks: > - confusing that some ops apply to representor, others apply to VF > - some ops are not possible on representor (because targetted to VF) > > I still feel that the addition of one single bit in the port ID > is an elegant solution to target either the VF or its representor. Since we already have a confusion about what is configured when operations are performed on a representor port we have 2 options: 1. Have this proposed API to configure representor itself while setting config to representor and configuring VF if special bit enabled. 2. Reverse the logic of current proposal, i.e. always apply configuration to VF while working with representor and apply configuration to representor itself if special bit is set. I'd probably prefer option #2, because: - From the OVS and OpenStack point of view, I think, we don't really need to configure representor itself in most cases. And OVS really should not know if it works with representor or some real port. - It seems that most of the existing code in DPDK already works like this, i.e. applying configs to VF itself. Intel drivers works like this and Mellanox drivers, as Thomas said, doesn't have this functionality at all. > > >>>> The this is that this new API will produce conceptual fragmentation >>>> between DPDK and the Linux kernel, because to do the same thing you'll >>>> have to use different ways. I mean, to change mac of VF in kernel you >>>> need to set mac to the representor, but in DPDK changing setting mac to >>>> representor will lead to changing the mac of the representor itself, >>>> not the VF. This will be really confusing for users. >>> >>> I am not responsible of the choices in Linux. >>> But I agree it would be interesting to check the reason of such decision. >>> Rony, please could you explain? > > I looked at few Linux drivers: > > bnxt_vf_rep_netdev_ops has no op to set MAC > bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF > > lio_vf_rep_ndev_ops has no op to set MAC > lionetdevops.ndo_set_vf_mac = set VF MAC from PF > > mlx5e_netdev_ops_rep has no op to set MAC > mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF > mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from PF > > nfp_repr_netdev_ops.ndo_set_mac_address = set representor MAC > nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from representor > nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF > > There is a big chance that the behaviour is not standardized in Linux > (as usual). So it is already confusing for users of Linux. > >