From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 91886A034C;
	Mon, 5 Sep 2022 09:15:21 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 39BB440697;
	Mon, 5 Sep 2022 09:15:21 +0200 (CEST)
Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255])
	by mails.dpdk.org (Postfix) with ESMTP id 00262400D4
	for ; Mon, 5 Sep 2022 09:15:19 +0200 (CEST)
Received: from dggemv704-chm.china.huawei.com (unknown [172.30.72.54])
	by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4MLfnv5Zz6z1P6gG;
	Mon, 5 Sep 2022 15:11:31 +0800 (CST)
Received: from kwepemm600004.china.huawei.com (7.193.23.242)
	by dggemv704-chm.china.huawei.com (10.3.19.47) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
	id 15.1.2375.24; Mon, 5 Sep 2022 15:15:13 +0800
Received: from [10.67.103.231] (10.67.103.231)
	by kwepemm600004.china.huawei.com (7.193.23.242) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
	id 15.1.2375.24; Mon, 5 Sep 2022 15:15:12 +0800
Message-ID: <849ec1cc-3774-6aa0-65a9-7461a483169d@huawei.com>
Date: Mon, 5 Sep 2022 15:15:12 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0
Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build environment and doc
To: Long Li, Ferruh Yigit
CC: "dev@dpdk.org", Ajay Sharma, Stephen Hemminger
References: <1661899911-13086-1-git-send-email-longli@linuxonhyperv.com>
	<1661899911-13086-2-git-send-email-longli@linuxonhyperv.com>
	<020abf35-2bb0-795c-fc2f-c970f12c875c@huawei.com>
From: "lihuisong (C)"
In-Reply-To:
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.67.103.231]
X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To
kwepemm600004.china.huawei.com (7.193.23.242)
X-CFilter-Loop: Reflected
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org

On 2022/9/1 2:05, Long Li wrote:
>> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build environment
>> and doc
>>
>>
>> On 2022/8/31 6:51, longli@linuxonhyperv.com wrote:
>>> From: Long Li
>>>
>>> MANA is a PCI device. It uses IB verbs to access hardware through the
>>> kernel RDMA layer. This patch introduces build environment and basic
>>> device probe functions.
>>>
>>> Signed-off-by: Long Li
>>> ---
>>> Change log:
>>> v2:
>>> Fix typos.
>>> Make the driver build only on x86-64 and Linux.
>>> Remove unused header files.
>>> Change port definition to uint16_t or uint8_t (for IB).
>>> Use getline() in place of fgets() to read and truncate a line.
>>> v3:
>>> Add meson build check for required functions from RDMA direct verb
>>> header file
>>> v4:
>>> Remove extra "\n" in logging code.
>>> Use "r" in place of "rb" in fopen() to read text files.
>>>
>>> [snip]
>>> +
>>> +static int mana_pci_probe_mac(struct rte_pci_driver *pci_drv __rte_unused,
>>> +				struct rte_pci_device *pci_dev,
>>> +				struct rte_ether_addr *mac_addr) {
>>> +	struct ibv_device **ibv_list;
>>> +	int ibv_idx;
>>> +	struct ibv_context *ctx;
>>> +	struct ibv_device_attr_ex dev_attr;
>>> +	int num_devices;
>>> +	int ret = 0;
>>> +	uint8_t port;
>>> +	struct mana_priv *priv = NULL;
>>> +	struct rte_eth_dev *eth_dev = NULL;
>>> +	bool found_port;
>>> +
>>> +	ibv_list = ibv_get_device_list(&num_devices);
>>> +	for (ibv_idx = 0; ibv_idx < num_devices; ibv_idx++) {
>>> +		struct ibv_device *ibdev = ibv_list[ibv_idx];
>>> +		struct rte_pci_addr pci_addr;
>>> +
>>> +		DRV_LOG(INFO, "Probe device name %s dev_name %s ibdev_path %s",
>>> +			ibdev->name, ibdev->dev_name, ibdev->ibdev_path);
>>> +
>>> +		if (mana_ibv_device_to_pci_addr(ibdev, &pci_addr))
>>> +			continue;
>>> +
>>> +		/* Ignore if this IB device is not this PCI device */
>>> +		if (pci_dev->addr.domain != pci_addr.domain ||
>>> +		    pci_dev->addr.bus != pci_addr.bus ||
>>> +		    pci_dev->addr.devid != pci_addr.devid ||
>>> +		    pci_dev->addr.function != pci_addr.function)
>>> +			continue;
>>> +
>>> +		ctx = ibv_open_device(ibdev);
>>> +		if (!ctx) {
>>> +			DRV_LOG(ERR, "Failed to open IB device %s",
>>> +				ibdev->name);
>>> +			continue;
>>> +		}
>>> +
>>> +		ret = ibv_query_device_ex(ctx, NULL, &dev_attr);
>>> +		DRV_LOG(INFO, "dev_attr.orig_attr.phys_port_cnt %u",
>>> +			dev_attr.orig_attr.phys_port_cnt);
>>> +		found_port = false;
>>> +
>>> +		for (port = 1; port <= dev_attr.orig_attr.phys_port_cnt;
>>> +		     port++) {
>>> +			struct ibv_parent_domain_init_attr attr = {};
>>> +			struct rte_ether_addr addr;
>>> +			char address[64];
>>> +			char name[RTE_ETH_NAME_MAX_LEN];
>>> +
>>> +			ret = get_port_mac(ibdev, port, &addr);
>>> +			if (ret)
>>> +				continue;
>>> +
>>> +			if (mac_addr && !rte_is_same_ether_addr(&addr, mac_addr))
>>> +				continue;
>>> +
>>> +			rte_ether_format_addr(address, sizeof(address), &addr);
>>> +			DRV_LOG(INFO, "device located port %u address %s",
>>> +				port, address);
>>> +			found_port = true;
>>> +
>>> +			priv = rte_zmalloc_socket(NULL, sizeof(*priv),
>>> +						  RTE_CACHE_LINE_SIZE,
>>> +						  SOCKET_ID_ANY);
>>> +			if (!priv) {
>>> +				ret = -ENOMEM;
>>> +				goto failed;
>>> +			}
>>> +
>>> +			snprintf(name, sizeof(name), "%s_port%d",
>>> +				 pci_dev->device.name, port);
>>> +
>>> +			if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
>>> +				int fd;
>>> +
>>> +				eth_dev = rte_eth_dev_attach_secondary(name);
>>> +				if (!eth_dev) {
>>> +					DRV_LOG(ERR, "Can't attach to dev %s",
>>> +						name);
>>> +					ret = -ENOMEM;
>>> +					goto failed;
>>> +				}
>>> +
>>> +				eth_dev->device = &pci_dev->device;
>>> +				eth_dev->dev_ops = &mana_dev_sec_ops;
>>> +				ret = mana_proc_priv_init(eth_dev);
>>> +				if (ret)
>>> +					goto failed;
>>> +				priv->process_priv = eth_dev->process_private;
>>> +
>>> +				/* Get the IB FD from the primary process */
>>> +				fd = mana_mp_req_verbs_cmd_fd(eth_dev);
>>> +				if (fd < 0) {
>>> +					DRV_LOG(ERR, "Failed to get FD %d", fd);
>>> +					ret = -ENODEV;
>>> +					goto failed;
>>> +				}
>>> +
>>> +				ret = mana_map_doorbell_secondary(eth_dev, fd);
>>> +				if (ret) {
>>> +					DRV_LOG(ERR, "Failed secondary map %d",
>>> +						fd);
>>> +					goto failed;
>>> +				}
>>> +
>>> +				/* fd is no not used after mapping doorbell */
>>> +				close(fd);
>>> +
>>> +				rte_spinlock_lock(&mana_shared_data->lock);
>>> +				mana_shared_data->secondary_cnt++;
>>> +				mana_local_data.secondary_cnt++;
>>> +				rte_spinlock_unlock(&mana_shared_data->lock);
>>> +
>>> +				rte_eth_copy_pci_info(eth_dev, pci_dev);
>>> +				rte_eth_dev_probing_finish(eth_dev);
>>> +
>>> +				/* Impossible to have more than one port
>>> +				 * matching a MAC address
>>> +				 */
>>> +				continue;
>>> +			}
>>> +
>>> +			eth_dev = rte_eth_dev_allocate(name);
>>> +			if (!eth_dev) {
>>> +				ret = -ENOMEM;
>>> +				goto failed;
>>> +			}
>>> +
>>> +			eth_dev->data->mac_addrs =
>>> +				rte_calloc("mana_mac", 1,
>>> +					   sizeof(struct rte_ether_addr), 0);
>>> +			if (!eth_dev->data->mac_addrs) {
>>> +				ret = -ENOMEM;
>>> +				goto failed;
>>> +			}
>>> +
>>> +			rte_ether_addr_copy(&addr, eth_dev->data->mac_addrs);
>>> +
>>> +			priv->ib_pd = ibv_alloc_pd(ctx);
>>> +			if (!priv->ib_pd) {
>>> +				DRV_LOG(ERR, "ibv_alloc_pd failed port %d", port);
>>> +				ret = -ENOMEM;
>>> +				goto failed;
>>> +			}
>>> +
>>> +			/* Create a parent domain with the port number */
>>> +			attr.pd = priv->ib_pd;
>>> +			attr.comp_mask = IBV_PARENT_DOMAIN_INIT_ATTR_PD_CONTEXT;
>>> +			attr.pd_context = (void *)(uint64_t)port;
>>> +			priv->ib_parent_pd = ibv_alloc_parent_domain(ctx, &attr);
>>> +			if (!priv->ib_parent_pd) {
>>> +				DRV_LOG(ERR,
>>> +					"ibv_alloc_parent_domain failed port %d",
>>> +					port);
>>> +				ret = -ENOMEM;
>>> +				goto failed;
>>> +			}
>>> +
>>> +			priv->ib_ctx = ctx;
>>> +			priv->port_id = eth_dev->data->port_id;
>>> +			priv->dev_port = port;
>>> +			eth_dev->data->dev_private = priv;
>>> +			priv->dev_data = eth_dev->data;
>>> +
>>> +			priv->max_rx_queues = dev_attr.orig_attr.max_qp;
>>> +			priv->max_tx_queues = dev_attr.orig_attr.max_qp;
>>> +
>>> +			priv->max_rx_desc =
>>> +				RTE_MIN(dev_attr.orig_attr.max_qp_wr,
>>> +					dev_attr.orig_attr.max_cqe);
>>> +			priv->max_tx_desc =
>>> +				RTE_MIN(dev_attr.orig_attr.max_qp_wr,
>>> +					dev_attr.orig_attr.max_cqe);
>>> +
>>> +			priv->max_send_sge = dev_attr.orig_attr.max_sge;
>>> +			priv->max_recv_sge = dev_attr.orig_attr.max_sge;
>>> +
>>> +			priv->max_mr = dev_attr.orig_attr.max_mr;
>>> +			priv->max_mr_size = dev_attr.orig_attr.max_mr_size;
>>> +
>>> +			DRV_LOG(INFO, "dev %s max queues %d desc %d sge %d",
>>> +				name, priv->max_rx_queues, priv->max_rx_desc,
>>> +				priv->max_send_sge);
>>> +
>>> +			rte_spinlock_lock(&mana_shared_data->lock);
>>> +			mana_shared_data->primary_cnt++;
>>> +			rte_spinlock_unlock(&mana_shared_data->lock);
>>> +
>>> +			eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_RMV;
>>> +
>>> +			eth_dev->device = &pci_dev->device;
>>> +			eth_dev->data->dev_flags |=
>>> +				RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
>>> +
>> Please do not use the temporary macro. Please review this patch:
>>
>> f30e69b41f94 ("ethdev: add device flag to bypass auto-filled queue xstats")
>>
>> This patch requires that per queue statistics are filled in
>> .xstats_get() by PMD.
> Thanks for pointing this out.
>
> It seems some PMDs are still depending on this flag for xstats.
>
> MANA doesn't implement xstats_get() currently, this flag is useful. Is it okay to keep using this flag before it's finally the time to remove it from all PMDs, or when MANA implements xstats?
Yes, MANA does not implement xstats at the moment. Per-queue stats
should be filled in through the xstats API. The basic stats API is not
supposed to report per-queue stats, so the stats callback in the driver
should not fill them in (I suggest deleting that part from patch 17/18).
I think this flag can be removed if the PMD does not support xstats.
>
>
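To make the suggestion a bit more concrete, here is a rough, self-contained sketch of a driver reporting per-queue counters through an xstats-style callback instead of relying on the auto-fill flag. Everything in it is hypothetical: the demo_* types and demo_xstats_get() are stand-ins invented for this mail, not the ethdev structures or the real .xstats_get() signature, and the counter names are only illustrative. The "return the required size when the table is too small" behaviour is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NAME_LEN 64

/* Hypothetical per-queue counters kept by a driver (not a DPDK type). */
struct demo_queue_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Hypothetical name/value pair (not a DPDK type). */
struct demo_xstat {
	char name[DEMO_NAME_LEN];
	uint64_t value;
};

/*
 * Fill 'out' with two entries (packets, bytes) per Rx and Tx queue.
 * Returns the number of entries written. If 'out' is NULL or 'capacity'
 * is too small, nothing is written and the required count is returned.
 */
static int
demo_xstats_get(const struct demo_queue_stats *rxq, unsigned int nb_rxq,
		const struct demo_queue_stats *txq, unsigned int nb_txq,
		struct demo_xstat *out, unsigned int capacity)
{
	unsigned int needed = 2 * (nb_rxq + nb_txq);
	unsigned int n = 0;
	unsigned int q;

	assert(rxq != NULL || nb_rxq == 0);
	assert(txq != NULL || nb_txq == 0);

	if (out == NULL || capacity < needed)
		return (int)needed;

	for (q = 0; q < nb_rxq; q++) {
		snprintf(out[n].name, sizeof(out[n].name), "rx_q%u_packets", q);
		out[n++].value = rxq[q].packets;
		snprintf(out[n].name, sizeof(out[n].name), "rx_q%u_bytes", q);
		out[n++].value = rxq[q].bytes;
	}

	for (q = 0; q < nb_txq; q++) {
		snprintf(out[n].name, sizeof(out[n].name), "tx_q%u_packets", q);
		out[n++].value = txq[q].packets;
		snprintf(out[n].name, sizeof(out[n].name), "tx_q%u_bytes", q);
		out[n++].value = txq[q].bytes;
	}

	return (int)n;
}
```

A caller would first invoke it with a NULL table to learn the required size, then call again with a large enough array. With something along these lines in the PMD, the basic stats callback would only need to report port-level totals.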