From: "lihuisong (C)"
To: Long Li, Ferruh Yigit
CC: "dev@dpdk.org", Ajay Sharma, Stephen Hemminger
Date: Wed, 7 Sep 2022 10:16:39 +0800
Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build environment and doc
Message-ID: <6a33d5a2-7b5a-d143-2979-451a60e413ae@huawei.com>
References: <1661899911-13086-1-git-send-email-longli@linuxonhyperv.com>
 <1661899911-13086-2-git-send-email-longli@linuxonhyperv.com>
 <020abf35-2bb0-795c-fc2f-c970f12c875c@huawei.com>
 <849ec1cc-3774-6aa0-65a9-7461a483169d@huawei.com>
List-Id: DPDK patches and discussions

On 2022/9/7 9:36, Long Li wrote:
>> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build environment
>> and doc
>>
>>
>> On 2022/9/1 2:05, Long Li wrote:
>>>> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build
>>>> environment and doc
>>>>
>>>>
>>>> On 2022/8/31 6:51, longli@linuxonhyperv.com wrote:
>>>>> From: Long Li
>>>>>
>>>>> MANA is a PCI device. It uses IB verbs to access hardware through
>>>>> the kernel RDMA layer. This patch introduces build environment and
>>>>> basic device probe functions.
>>>>>
>>>>> Signed-off-by: Long Li
>>>>> ---
>>>>> Change log:
>>>>> v2:
>>>>> Fix typos.
>>>>> Make the driver build only on x86-64 and Linux.
>>>>> Remove unused header files.
>>>>> Change port definition to uint16_t or uint8_t (for IB).
>>>>> Use getline() in place of fgets() to read and truncate a line.
>>>>> v3:
>>>>> Add meson build check for required functions from RDMA direct verb
>>>>> header file
>>>>> v4:
>>>>> Remove extra "\n" in logging code.
>>>>> Use "r" in place of "rb" in fopen() to read text files.
>>>>>
>>>>> [snip]
>>>>> +
>>>>> +static int mana_pci_probe_mac(struct rte_pci_driver *pci_drv __rte_unused,
>>>>> +			      struct rte_pci_device *pci_dev,
>>>>> +			      struct rte_ether_addr *mac_addr) {
>>>>> +	struct ibv_device **ibv_list;
>>>>> +	int ibv_idx;
>>>>> +	struct ibv_context *ctx;
>>>>> +	struct ibv_device_attr_ex dev_attr;
>>>>> +	int num_devices;
>>>>> +	int ret = 0;
>>>>> +	uint8_t port;
>>>>> +	struct mana_priv *priv = NULL;
>>>>> +	struct rte_eth_dev *eth_dev = NULL;
>>>>> +	bool found_port;
>>>>> +
>>>>> +	ibv_list = ibv_get_device_list(&num_devices);
>>>>> +	for (ibv_idx = 0; ibv_idx < num_devices; ibv_idx++) {
>>>>> +		struct ibv_device *ibdev = ibv_list[ibv_idx];
>>>>> +		struct rte_pci_addr pci_addr;
>>>>> +
>>>>> +		DRV_LOG(INFO, "Probe device name %s dev_name %s ibdev_path %s",
>>>>> +			ibdev->name, ibdev->dev_name, ibdev->ibdev_path);
>>>>> +
>>>>> +		if (mana_ibv_device_to_pci_addr(ibdev, &pci_addr))
>>>>> +			continue;
>>>>> +
>>>>> +		/* Ignore if this IB device is not this PCI device */
>>>>> +		if (pci_dev->addr.domain != pci_addr.domain ||
>>>>> +		    pci_dev->addr.bus != pci_addr.bus ||
>>>>> +		    pci_dev->addr.devid != pci_addr.devid ||
>>>>> +		    pci_dev->addr.function != pci_addr.function)
>>>>> +			continue;
>>>>> +
>>>>> +		ctx = ibv_open_device(ibdev);
>>>>> +		if (!ctx) {
>>>>> +			DRV_LOG(ERR, "Failed to open IB device %s",
>>>>> +				ibdev->name);
>>>>> +			continue;
>>>>> +		}
>>>>> +
>>>>> +		ret = ibv_query_device_ex(ctx, NULL, &dev_attr);
>>>>> +		DRV_LOG(INFO, "dev_attr.orig_attr.phys_port_cnt %u",
>>>>> +			dev_attr.orig_attr.phys_port_cnt);
>>>>> +		found_port = false;
>>>>> +
>>>>> +		for (port = 1; port <= dev_attr.orig_attr.phys_port_cnt;
>>>>> +		     port++) {
>>>>> +			struct ibv_parent_domain_init_attr attr = {};
>>>>> +			struct rte_ether_addr addr;
>>>>> +			char address[64];
>>>>> +			char name[RTE_ETH_NAME_MAX_LEN];
>>>>> +
>>>>> +			ret = get_port_mac(ibdev, port, &addr);
>>>>> +			if (ret)
>>>>> +				continue;
>>>>> +
>>>>> +			if (mac_addr && !rte_is_same_ether_addr(&addr, mac_addr))
>>>>> +				continue;
>>>>> +
>>>>> +			rte_ether_format_addr(address, sizeof(address), &addr);
>>>>> +			DRV_LOG(INFO, "device located port %u address %s",
>>>>> +				port, address);
>>>>> +			found_port = true;
>>>>> +
>>>>> +			priv = rte_zmalloc_socket(NULL, sizeof(*priv),
>>>>> +						  RTE_CACHE_LINE_SIZE,
>>>>> +						  SOCKET_ID_ANY);
>>>>> +			if (!priv) {
>>>>> +				ret = -ENOMEM;
>>>>> +				goto failed;
>>>>> +			}
>>>>> +
>>>>> +			snprintf(name, sizeof(name), "%s_port%d",
>>>>> +				 pci_dev->device.name, port);
>>>>> +
>>>>> +			if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
>>>>> +				int fd;
>>>>> +
>>>>> +				eth_dev = rte_eth_dev_attach_secondary(name);
>>>>> +				if (!eth_dev) {
>>>>> +					DRV_LOG(ERR, "Can't attach to dev %s",
>>>>> +						name);
>>>>> +					ret = -ENOMEM;
>>>>> +					goto failed;
>>>>> +				}
>>>>> +
>>>>> +				eth_dev->device = &pci_dev->device;
>>>>> +				eth_dev->dev_ops = &mana_dev_sec_ops;
>>>>> +				ret = mana_proc_priv_init(eth_dev);
>>>>> +				if (ret)
>>>>> +					goto failed;
>>>>> +				priv->process_priv = eth_dev->process_private;
>>>>> +
>>>>> +				/* Get the IB FD from the primary process */
>>>>> +				fd = mana_mp_req_verbs_cmd_fd(eth_dev);
>>>>> +				if (fd < 0) {
>>>>> +					DRV_LOG(ERR, "Failed to get FD %d", fd);
>>>>> +					ret = -ENODEV;
>>>>> +					goto failed;
>>>>> +				}
>>>>> +
>>>>> +				ret = mana_map_doorbell_secondary(eth_dev, fd);
>>>>> +				if (ret) {
>>>>> +					DRV_LOG(ERR, "Failed secondary map %d",
>>>>> +						fd);
>>>>> +					goto failed;
>>>>> +				}
>>>>> +
>>>>> +				/* fd is no not used after mapping doorbell */
>>>>> +				close(fd);
>>>>> +
>>>>> +				rte_spinlock_lock(&mana_shared_data->lock);
>>>>> +				mana_shared_data->secondary_cnt++;
>>>>> +				mana_local_data.secondary_cnt++;
>>>>> +				rte_spinlock_unlock(&mana_shared_data->lock);
>>>>> +
>>>>> +				rte_eth_copy_pci_info(eth_dev, pci_dev);
>>>>> +				rte_eth_dev_probing_finish(eth_dev);
>>>>> +
>>>>> +				/* Impossible to have more than one port
>>>>> +				 * matching a MAC address
>>>>> +				 */
>>>>> +				continue;
>>>>> +			}
>>>>> +
>>>>> +			eth_dev = rte_eth_dev_allocate(name);
>>>>> +			if (!eth_dev) {
>>>>> +				ret = -ENOMEM;
>>>>> +				goto failed;
>>>>> +			}
>>>>> +
>>>>> +			eth_dev->data->mac_addrs =
>>>>> +				rte_calloc("mana_mac", 1,
>>>>> +					   sizeof(struct rte_ether_addr), 0);
>>>>> +			if (!eth_dev->data->mac_addrs) {
>>>>> +				ret = -ENOMEM;
>>>>> +				goto failed;
>>>>> +			}
>>>>> +
>>>>> +			rte_ether_addr_copy(&addr, eth_dev->data->mac_addrs);
>>>>> +
>>>>> +			priv->ib_pd = ibv_alloc_pd(ctx);
>>>>> +			if (!priv->ib_pd) {
>>>>> +				DRV_LOG(ERR, "ibv_alloc_pd failed port %d", port);
>>>>> +				ret = -ENOMEM;
>>>>> +				goto failed;
>>>>> +			}
>>>>> +
>>>>> +			/* Create a parent domain with the port number */
>>>>> +			attr.pd = priv->ib_pd;
>>>>> +			attr.comp_mask = IBV_PARENT_DOMAIN_INIT_ATTR_PD_CONTEXT;
>>>>> +			attr.pd_context = (void *)(uint64_t)port;
>>>>> +			priv->ib_parent_pd = ibv_alloc_parent_domain(ctx, &attr);
>>>>> +			if (!priv->ib_parent_pd) {
>>>>> +				DRV_LOG(ERR,
>>>>> +					"ibv_alloc_parent_domain failed port %d",
>>>>> +					port);
>>>>> +				ret = -ENOMEM;
>>>>> +				goto failed;
>>>>> +			}
>>>>> +
>>>>> +			priv->ib_ctx = ctx;
>>>>> +			priv->port_id = eth_dev->data->port_id;
>>>>> +			priv->dev_port = port;
>>>>> +			eth_dev->data->dev_private = priv;
>>>>> +			priv->dev_data = eth_dev->data;
>>>>> +
>>>>> +			priv->max_rx_queues = dev_attr.orig_attr.max_qp;
>>>>> +			priv->max_tx_queues = dev_attr.orig_attr.max_qp;
>>>>> +
>>>>> +			priv->max_rx_desc =
>>>>> +				RTE_MIN(dev_attr.orig_attr.max_qp_wr,
>>>>> +					dev_attr.orig_attr.max_cqe);
>>>>> +			priv->max_tx_desc =
>>>>> +				RTE_MIN(dev_attr.orig_attr.max_qp_wr,
>>>>> +					dev_attr.orig_attr.max_cqe);
>>>>> +
>>>>> +			priv->max_send_sge = dev_attr.orig_attr.max_sge;
>>>>> +			priv->max_recv_sge = dev_attr.orig_attr.max_sge;
>>>>> +
>>>>> +			priv->max_mr = dev_attr.orig_attr.max_mr;
>>>>> +			priv->max_mr_size = dev_attr.orig_attr.max_mr_size;
>>>>> +
>>>>> +			DRV_LOG(INFO, "dev %s max queues %d desc %d sge %d",
>>>>> +				name, priv->max_rx_queues, priv->max_rx_desc,
>>>>> +				priv->max_send_sge);
>>>>> +
>>>>> +			rte_spinlock_lock(&mana_shared_data->lock);
>>>>> +			mana_shared_data->primary_cnt++;
>>>>> +			rte_spinlock_unlock(&mana_shared_data->lock);
>>>>> +
>>>>> +			eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_RMV;
>>>>> +
>>>>> +			eth_dev->device = &pci_dev->device;
>>>>> +			eth_dev->data->dev_flags |=
>>>>> +				RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
>>>>> +
>>>> Please do not use the temporary macro. Please review this patch:
>>>>
>>>> f30e69b41f94 ("ethdev: add device flag to bypass auto-filled queue
>>>> xstats")
>>>>
>>>> This patch requires that per queue statistics are filled in
>>>> .xstats_get() by PMD.
>>> Thanks for pointing this out.
>>>
>>> It seems some PMDs are still depending on this flag for xstats.
>>>
>>> MANA doesn't implement xstats_get() currently, this flag is useful. Is it
>>> okay to keep using this flag before it's finally the time to remove it from all
>>> PMDs, or when MANA implements xstats?
>> Yes, your xstats doesn't implement now. Per queue stats should be filled in
>> xstats API, and the stats API cannot see per queue stats, so stats API in driver
>> shouldn't fill it(suggest that delete it from patch 17/18).
>>
>> I guess this flag can be removed if PMD does not support xstats.
>>>
> I don't understand your suggestion. An application can call rte_eth_stats_get() to get port stats, and this will call into stats_get() in the driver, as implemented in patch 17/18.
>
> When flag RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS is set, an application can also use rte_eth_xstats_get() to get port stats even the driver doesn't implement xstats_get().

I think a new PMD should follow the announced switch that Ferruh mentioned;
otherwise the switch will never be completed. I suggest that the mana driver
implement a simple xstats_get() that fills in the per-queue stats, if you want
to support per-queue stats.

@Ferruh, what do you think?
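
To make the suggestion concrete, here is a rough sketch of what a simple
per-queue xstats_get() could look like. It is not taken from the mana patch
set. The demo_* types, macros and function are hypothetical stand-ins so the
snippet builds without DPDK headers; demo_xstat only mirrors the id/value
layout of struct rte_eth_xstat. It assumes the usual xstats_get() convention
of returning the required entry count when the caller's array is too small.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_eth_xstat (id/value pair). */
struct demo_xstat {
	uint64_t id;
	uint64_t value;
};

/* Hypothetical per-queue counters kept by the driver's datapath. */
struct demo_queue_stats {
	uint64_t packets;
	uint64_t bytes;
};

#define DEMO_MAX_QUEUES 4
#define DEMO_STATS_PER_QUEUE 2

/* Hypothetical driver private data holding the per-queue counters. */
struct demo_priv {
	unsigned int num_rx_queues;
	unsigned int num_tx_queues;
	struct demo_queue_stats rxq[DEMO_MAX_QUEUES];
	struct demo_queue_stats txq[DEMO_MAX_QUEUES];
};

/*
 * Fill per-queue counters into the xstats array.
 * Returns the number of entries the driver exposes. Nothing is written
 * when xstats is NULL or n is smaller than that number, so the caller
 * can size its array from the return value and call again.
 */
static int
demo_xstats_get(const struct demo_priv *priv, struct demo_xstat *xstats,
		unsigned int n)
{
	unsigned int count = (priv->num_rx_queues + priv->num_tx_queues) *
			     DEMO_STATS_PER_QUEUE;
	unsigned int i, idx = 0;

	if (xstats == NULL || n < count)
		return (int)count;

	/* Rx queues first: packets, then bytes, for each queue. */
	for (i = 0; i < priv->num_rx_queues; i++) {
		xstats[idx].id = idx;
		xstats[idx++].value = priv->rxq[i].packets;
		xstats[idx].id = idx;
		xstats[idx++].value = priv->rxq[i].bytes;
	}

	/* Tx queues follow in the same layout. */
	for (i = 0; i < priv->num_tx_queues; i++) {
		xstats[idx].id = idx;
		xstats[idx++].value = priv->txq[i].packets;
		xstats[idx].id = idx;
		xstats[idx++].value = priv->txq[i].bytes;
	}

	return (int)count;
}
```

In a real PMD the callback would take the struct rte_eth_dev and be paired
with an xstats_get_names() callback that returns the names in the same order.
With something along these lines the driver would not need
RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS.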