DPDK patches and discussions
From: Ferruh Yigit <ferruh.yigit@xilinx.com>
To: "lihuisong (C)" <lihuisong@huawei.com>, Long Li <longli@microsoft.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Ajay Sharma <sharmaajay@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build environment and doc
Date: Wed, 7 Sep 2022 12:11:09 +0100
Message-ID: <a858c5be-42b5-0401-d16e-7f27d129f1ce@xilinx.com>
In-Reply-To: <6a33d5a2-7b5a-d143-2979-451a60e413ae@huawei.com>

On 9/7/2022 3:16 AM, lihuisong (C) wrote:
> 
> On 2022/9/7 9:36, Long Li wrote:
>>> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build
>>> environment and doc
>>>
>>>
>>> On 2022/9/1 2:05, Long Li wrote:
>>>>> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build
>>>>> environment and doc
>>>>>
>>>>>
>>>>> On 2022/8/31 6:51, longli@linuxonhyperv.com wrote:
>>>>>> From: Long Li <longli@microsoft.com>
>>>>>>
>>>>>> MANA is a PCI device. It uses IB verbs to access hardware through
>>>>>> the kernel RDMA layer. This patch introduces build environment and
>>>>>> basic device probe functions.
>>>>>>
>>>>>> Signed-off-by: Long Li <longli@microsoft.com>
>>>>>> ---
>>>>>> Change log:
>>>>>> v2:
>>>>>> Fix typos.
>>>>>> Make the driver build only on x86-64 and Linux.
>>>>>> Remove unused header files.
>>>>>> Change port definition to uint16_t or uint8_t (for IB).
>>>>>> Use getline() in place of fgets() to read and truncate a line.
>>>>>> v3:
>>>>>> Add meson build check for required functions from RDMA direct verb
>>>>>> header file
>>>>>> v4:
>>>>>> Remove extra "\n" in logging code.
>>>>>> Use "r" in place of "rb" in fopen() to read text files.
>>>>>>
>>>>>> [snip]
>>>>>> +
>>>>>> +static int mana_pci_probe_mac(struct rte_pci_driver *pci_drv __rte_unused,
>>>>>> +                       struct rte_pci_device *pci_dev,
>>>>>> +                       struct rte_ether_addr *mac_addr) {
>>>>>> + struct ibv_device **ibv_list;
>>>>>> + int ibv_idx;
>>>>>> + struct ibv_context *ctx;
>>>>>> + struct ibv_device_attr_ex dev_attr;
>>>>>> + int num_devices;
>>>>>> + int ret = 0;
>>>>>> + uint8_t port;
>>>>>> + struct mana_priv *priv = NULL;
>>>>>> + struct rte_eth_dev *eth_dev = NULL;
>>>>>> + bool found_port;
>>>>>> +
>>>>>> + ibv_list = ibv_get_device_list(&num_devices);
>>>>>> + for (ibv_idx = 0; ibv_idx < num_devices; ibv_idx++) {
>>>>>> +         struct ibv_device *ibdev = ibv_list[ibv_idx];
>>>>>> +         struct rte_pci_addr pci_addr;
>>>>>> +
>>>>>> +         DRV_LOG(INFO, "Probe device name %s dev_name %s ibdev_path %s",
>>>>>> +                 ibdev->name, ibdev->dev_name, ibdev->ibdev_path);
>>>>>> +
>>>>>> +         if (mana_ibv_device_to_pci_addr(ibdev, &pci_addr))
>>>>>> +                 continue;
>>>>>> +
>>>>>> +         /* Ignore if this IB device is not this PCI device */
>>>>>> +         if (pci_dev->addr.domain != pci_addr.domain ||
>>>>>> +             pci_dev->addr.bus != pci_addr.bus ||
>>>>>> +             pci_dev->addr.devid != pci_addr.devid ||
>>>>>> +             pci_dev->addr.function != pci_addr.function)
>>>>>> +                 continue;
>>>>>> +
>>>>>> +         ctx = ibv_open_device(ibdev);
>>>>>> +         if (!ctx) {
>>>>>> +                 DRV_LOG(ERR, "Failed to open IB device %s",
>>>>>> +                         ibdev->name);
>>>>>> +                 continue;
>>>>>> +         }
>>>>>> +
>>>>>> +         ret = ibv_query_device_ex(ctx, NULL, &dev_attr);
>>>>>> +         DRV_LOG(INFO, "dev_attr.orig_attr.phys_port_cnt %u",
>>>>>> +                 dev_attr.orig_attr.phys_port_cnt);
>>>>>> +         found_port = false;
>>>>>> +
>>>>>> +         for (port = 1; port <= dev_attr.orig_attr.phys_port_cnt;
>>>>>> +              port++) {
>>>>>> +                 struct ibv_parent_domain_init_attr attr = {};
>>>>>> +                 struct rte_ether_addr addr;
>>>>>> +                 char address[64];
>>>>>> +                 char name[RTE_ETH_NAME_MAX_LEN];
>>>>>> +
>>>>>> +                 ret = get_port_mac(ibdev, port, &addr);
>>>>>> +                 if (ret)
>>>>>> +                         continue;
>>>>>> +
>>>>>> +                 if (mac_addr && !rte_is_same_ether_addr(&addr, mac_addr))
>>>>>> +                         continue;
>>>>>> +
>>>>>> +                 rte_ether_format_addr(address, sizeof(address), &addr);
>>>>>> +                 DRV_LOG(INFO, "device located port %u address %s",
>>>>>> +                         port, address);
>>>>>> +                 found_port = true;
>>>>>> +
>>>>>> +                 priv = rte_zmalloc_socket(NULL, sizeof(*priv),
>>>>>> +                                           RTE_CACHE_LINE_SIZE,
>>>>>> +                                           SOCKET_ID_ANY);
>>>>>> +                 if (!priv) {
>>>>>> +                         ret = -ENOMEM;
>>>>>> +                         goto failed;
>>>>>> +                 }
>>>>>> +
>>>>>> +                 snprintf(name, sizeof(name), "%s_port%d",
>>>>>> +                          pci_dev->device.name, port);
>>>>>> +
>>>>>> +                 if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
>>>>>> +                         int fd;
>>>>>> +
>>>>>> +                         eth_dev = rte_eth_dev_attach_secondary(name);
>>>>>> +                         if (!eth_dev) {
>>>>>> +                                 DRV_LOG(ERR, "Can't attach to dev %s",
>>>>>> +                                         name);
>>>>>> +                                 ret = -ENOMEM;
>>>>>> +                                 goto failed;
>>>>>> +                         }
>>>>>> +
>>>>>> +                         eth_dev->device = &pci_dev->device;
>>>>>> +                         eth_dev->dev_ops = &mana_dev_sec_ops;
>>>>>> +                         ret = mana_proc_priv_init(eth_dev);
>>>>>> +                         if (ret)
>>>>>> +                                 goto failed;
>>>>>> +                         priv->process_priv = eth_dev->process_private;
>>>>>> +
>>>>>> +                         /* Get the IB FD from the primary process */
>>>>>> +                         fd = mana_mp_req_verbs_cmd_fd(eth_dev);
>>>>>> +                         if (fd < 0) {
>>>>>> +                                 DRV_LOG(ERR, "Failed to get FD %d", fd);
>>>>>> +                                 ret = -ENODEV;
>>>>>> +                                 goto failed;
>>>>>> +                         }
>>>>>> +
>>>>>> +                         ret = mana_map_doorbell_secondary(eth_dev, fd);
>>>>>> +                         if (ret) {
>>>>>> +                                 DRV_LOG(ERR, "Failed secondary map %d",
>>>>>> +                                         fd);
>>>>>> +                                 goto failed;
>>>>>> +                         }
>>>>>> +
>>>>>> +                         /* fd is not used after mapping doorbell */
>>>>>> +                         close(fd);
>>>>>> +
>>>>>> +                         rte_spinlock_lock(&mana_shared_data->lock);
>>>>>> +                         mana_shared_data->secondary_cnt++;
>>>>>> +                         mana_local_data.secondary_cnt++;
>>>>>> +                         rte_spinlock_unlock(&mana_shared_data->lock);
>>>>>> +
>>>>>> +                         rte_eth_copy_pci_info(eth_dev, pci_dev);
>>>>>> +                         rte_eth_dev_probing_finish(eth_dev);
>>>>>> +
>>>>>> +                         /* Impossible to have more than one port
>>>>>> +                          * matching a MAC address
>>>>>> +                          */
>>>>>> +                         continue;
>>>>>> +                 }
>>>>>> +
>>>>>> +                 eth_dev = rte_eth_dev_allocate(name);
>>>>>> +                 if (!eth_dev) {
>>>>>> +                         ret = -ENOMEM;
>>>>>> +                         goto failed;
>>>>>> +                 }
>>>>>> +
>>>>>> +                 eth_dev->data->mac_addrs =
>>>>>> +                         rte_calloc("mana_mac", 1,
>>>>>> +                                    sizeof(struct rte_ether_addr), 0);
>>>>>> +                 if (!eth_dev->data->mac_addrs) {
>>>>>> +                         ret = -ENOMEM;
>>>>>> +                         goto failed;
>>>>>> +                 }
>>>>>> +
>>>>>> +                 rte_ether_addr_copy(&addr, eth_dev->data->mac_addrs);
>>>>>> +
>>>>>> +                 priv->ib_pd = ibv_alloc_pd(ctx);
>>>>>> +                 if (!priv->ib_pd) {
>>>>>> +                         DRV_LOG(ERR, "ibv_alloc_pd failed port %d", port);
>>>>>> +                         ret = -ENOMEM;
>>>>>> +                         goto failed;
>>>>>> +                 }
>>>>>> +
>>>>>> +                 /* Create a parent domain with the port number */
>>>>>> +                 attr.pd = priv->ib_pd;
>>>>>> +                 attr.comp_mask = IBV_PARENT_DOMAIN_INIT_ATTR_PD_CONTEXT;
>>>>>> +                 attr.pd_context = (void *)(uint64_t)port;
>>>>>> +                 priv->ib_parent_pd = ibv_alloc_parent_domain(ctx, &attr);
>>>>>> +                 if (!priv->ib_parent_pd) {
>>>>>> +                         DRV_LOG(ERR,
>>>>>> +                                 "ibv_alloc_parent_domain failed port %d",
>>>>>> +                                 port);
>>>>>> +                         ret = -ENOMEM;
>>>>>> +                         goto failed;
>>>>>> +                 }
>>>>>> +
>>>>>> +                 priv->ib_ctx = ctx;
>>>>>> +                 priv->port_id = eth_dev->data->port_id;
>>>>>> +                 priv->dev_port = port;
>>>>>> +                 eth_dev->data->dev_private = priv;
>>>>>> +                 priv->dev_data = eth_dev->data;
>>>>>> +
>>>>>> +                 priv->max_rx_queues = dev_attr.orig_attr.max_qp;
>>>>>> +                 priv->max_tx_queues = dev_attr.orig_attr.max_qp;
>>>>>> +
>>>>>> +                 priv->max_rx_desc =
>>>>>> +                         RTE_MIN(dev_attr.orig_attr.max_qp_wr,
>>>>>> +                                 dev_attr.orig_attr.max_cqe);
>>>>>> +                 priv->max_tx_desc =
>>>>>> +                         RTE_MIN(dev_attr.orig_attr.max_qp_wr,
>>>>>> +                                 dev_attr.orig_attr.max_cqe);
>>>>>> +
>>>>>> +                 priv->max_send_sge = dev_attr.orig_attr.max_sge;
>>>>>> +                 priv->max_recv_sge = dev_attr.orig_attr.max_sge;
>>>>>> +
>>>>>> +                 priv->max_mr = dev_attr.orig_attr.max_mr;
>>>>>> +                 priv->max_mr_size = dev_attr.orig_attr.max_mr_size;
>>>>>> +
>>>>>> +                 DRV_LOG(INFO, "dev %s max queues %d desc %d sge %d",
>>>>>> +                         name, priv->max_rx_queues, priv->max_rx_desc,
>>>>>> +                         priv->max_send_sge);
>>>>>> +
>>>>>> +                 rte_spinlock_lock(&mana_shared_data->lock);
>>>>>> +                 mana_shared_data->primary_cnt++;
>>>>>> +                 rte_spinlock_unlock(&mana_shared_data->lock);
>>>>>> +
>>>>>> +                 eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_RMV;
>>>>>> +
>>>>>> +                 eth_dev->device = &pci_dev->device;
>>>>>> +                 eth_dev->data->dev_flags |=
>>>>>> +                         RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
>>>>>> +
>>>>> Please do not use the temporary macro. Please review this patch:
>>>>>
>>>>> f30e69b41f94 ("ethdev: add device flag to bypass auto-filled queue
>>>>> xstats")
>>>>>
>>>>> This patch requires that per queue statistics are filled in
>>>>> .xstats_get() by PMD.
>>>> Thanks for pointing this out.
>>>>
>>>> It seems some PMDs are still depending on this flag for xstats.
>>>>
>>>> MANA doesn't currently implement xstats_get(), so this flag is useful.
>>>> Is it okay to keep using this flag until it is finally removed from all
>>>> PMDs, or until MANA implements xstats?
>>> Yes, your driver doesn't implement xstats yet. Per-queue stats should be
>>> filled in through the xstats API; the basic stats API is not meant to
>>> report per-queue stats, so the stats callback in the driver shouldn't
>>> fill them (I suggest deleting that from patch 17/18).
>>>
>>> I guess this flag can be removed if the PMD does not support xstats.
>>>
>> I don't understand your suggestion. An application can call 
>> rte_eth_stats_get() to get port stats, and this will call into 
>> stats_get() in the driver, as implemented in patch 17/18.
>>
>> When the RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS flag is set, an application
>> can also use rte_eth_xstats_get() to get port stats even if the driver
>> doesn't implement xstats_get().
> 
> I think a new PMD should follow the announced switch that Ferruh
> mentioned; otherwise, the switch will never be completed.
> 
> I suggest that the mana driver implement a simple xstats_get() to fill
> per-queue stats if you want to support them.
> 
> @Ferruh, what do you think?
> 

Hi Huisong,

Thanks for the reminder. Yes, it makes sense to implement the new method 
in new drivers.


Long,

There is a long-term plan to move queue stats from the basic stats 
structure to xstats. The reason is that the increasing number of queues 
makes the basic stats struct too big, whereas xstats is more flexible 
and requires no fixed-size array.
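
To illustrate the size concern, here is a small self-contained sketch. It
does not include any DPDK header: 'struct demo_basic_stats' is a
hypothetical stand-in that only mimics the shape of the basic stats
structure (port-level counters followed by five fixed-size per-queue
arrays), and DEMO_QUEUE_STAT_CNTRS is an assumed value mirroring the
default of RTE_ETHDEV_QUEUE_STAT_CNTRS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the default of RTE_ETHDEV_QUEUE_STAT_CNTRS. */
#define DEMO_QUEUE_STAT_CNTRS 16

/* Hypothetical stand-in for the basic stats layout: eight port-level
 * counters followed by five fixed-size per-queue arrays.
 */
struct demo_basic_stats {
	uint64_t ipackets;
	uint64_t opackets;
	uint64_t ibytes;
	uint64_t obytes;
	uint64_t imissed;
	uint64_t ierrors;
	uint64_t oerrors;
	uint64_t rx_nombuf;
	uint64_t q_ipackets[DEMO_QUEUE_STAT_CNTRS];
	uint64_t q_opackets[DEMO_QUEUE_STAT_CNTRS];
	uint64_t q_ibytes[DEMO_QUEUE_STAT_CNTRS];
	uint64_t q_obytes[DEMO_QUEUE_STAT_CNTRS];
	uint64_t q_errors[DEMO_QUEUE_STAT_CNTRS];
};

/* Bytes taken by the five per-queue arrays for a given counter count. */
static size_t demo_queue_array_bytes(size_t nb_counters)
{
	return 5 * nb_counters * sizeof(uint64_t);
}

/* The per-queue arrays dominate the struct even at the default size. */
static_assert(sizeof(struct demo_basic_stats) ==
	      8 * sizeof(uint64_t) +
	      5 * DEMO_QUEUE_STAT_CNTRS * sizeof(uint64_t),
	      "unexpected padding in demo_basic_stats");
```

With 16 counters the per-queue arrays take 640 bytes; sized for 1024
queues they would take 40960 bytes per port, whether or not those queues
are in use.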

You can remove the 'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS' flag, and the 
driver won't support queue stats in xstats by default. To have that 
support you will need to implement xstats instead, either later or in 
this set.
When queue stats are implemented in xstats, please remember to remove 
the updating of 'stats->q_*' in basic stats.
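
As a rough illustration of the xstats approach, below is a minimal,
self-contained sketch of per-queue counters exposed by name rather than
through fixed-size 'q_*' arrays. It does not use the DPDK headers and is
not taken from the MANA patches: 'struct xstat', 'struct xstat_name',
'struct demo_priv', demo_xstats_get_names() and demo_xstats_get() are
hypothetical stand-ins that only mimic the id/value and name pairs of the
ethdev xstats API, and the queue limit and counter names are assumptions.
Both callbacks follow the convention of returning the required number of
entries when the caller's array is missing or too small.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_MAX_QUEUES 4	/* assumption: illustrative queue limit */
#define DEMO_NAME_SIZE 64
#define DEMO_STATS_PER_QUEUE 2	/* packets and bytes */

/* Hypothetical stand-ins for the xstats id/value and name types. */
struct xstat { uint64_t id; uint64_t value; };
struct xstat_name { char name[DEMO_NAME_SIZE]; };

struct demo_queue_stats { uint64_t packets; uint64_t bytes; };

/* Hypothetical driver private data holding software Rx counters. */
struct demo_priv {
	unsigned int nb_rx_queues;
	struct demo_queue_stats rxq[DEMO_MAX_QUEUES];
};

/* Number of xstats entries exposed for the configured queues. */
static unsigned int demo_xstats_count(const struct demo_priv *priv)
{
	assert(priv != NULL && priv->nb_rx_queues <= DEMO_MAX_QUEUES);
	return priv->nb_rx_queues * DEMO_STATS_PER_QUEUE;
}

/* Fill counter names; return the required count if names is NULL or
 * the array is too small.
 */
static int demo_xstats_get_names(const struct demo_priv *priv,
				 struct xstat_name *names, unsigned int n)
{
	unsigned int count = demo_xstats_count(priv);
	unsigned int q, i = 0;

	if (names == NULL || n < count)
		return (int)count;
	for (q = 0; q < priv->nb_rx_queues; q++) {
		snprintf(names[i++].name, DEMO_NAME_SIZE, "rx_q%u_packets", q);
		snprintf(names[i++].name, DEMO_NAME_SIZE, "rx_q%u_bytes", q);
	}
	return (int)count;
}

/* Fill counter values in the same order as the names. */
static int demo_xstats_get(const struct demo_priv *priv,
			   struct xstat *stats, unsigned int n)
{
	unsigned int count = demo_xstats_count(priv);
	unsigned int q, i = 0;

	if (stats == NULL || n < count)
		return (int)count;
	for (q = 0; q < priv->nb_rx_queues; q++) {
		stats[i].id = i;
		stats[i++].value = priv->rxq[q].packets;
		stats[i].id = i;
		stats[i++].value = priv->rxq[q].bytes;
	}
	return (int)count;
}
```

A caller would first pass a NULL array to learn the entry count, then
allocate arrays of that size and call both functions again. The array
length follows the number of configured queues instead of a compile-time
constant.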

Also, in 17/18 the 'Stats per queue' feature seems to have been added, 
but that is not correct; the feature name is misleading here. It is 
actually about queue stats mapping, please check 
'doc/guides/nics/features.rst'.
So can you please drop that feature?


Thread overview: 34+ messages
2022-08-30 22:51 [Patch v6 00/18] Introduce Microsoft Azure Network Adatper (MANA) PMD longli
2022-08-30 22:51 ` [Patch v6 01/18] net/mana: add basic driver, build environment and doc longli
2022-08-31  1:32   ` lihuisong (C)
2022-08-31 18:05     ` Long Li
2022-09-05  7:15       ` lihuisong (C)
2022-09-07  1:36         ` Long Li
2022-09-07  2:16           ` lihuisong (C)
2022-09-07  2:26             ` Long Li
2022-09-07 11:11             ` Ferruh Yigit [this message]
2022-09-07 18:12               ` Long Li
2022-09-02 12:09   ` fengchengwen
2022-09-02 19:45     ` Long Li
2022-09-03  1:44       ` fengchengwen
2022-08-30 22:51 ` [Patch v6 02/18] net/mana: add device configuration and stop longli
2022-08-30 22:51 ` [Patch v6 03/18] net/mana: add function to report support ptypes longli
2022-08-30 22:51 ` [Patch v6 04/18] net/mana: add link update longli
2022-08-30 22:51 ` [Patch v6 05/18] net/mana: add function for device removal interrupts longli
2022-08-30 22:51 ` [Patch v6 06/18] net/mana: add device info longli
2022-09-02 12:11   ` fengchengwen
2022-09-02 19:35     ` Long Li
2022-08-30 22:51 ` [Patch v6 07/18] net/mana: add function to configure RSS longli
2022-08-30 22:51 ` [Patch v6 08/18] net/mana: add function to configure RX queues longli
2022-08-30 22:51 ` [Patch v6 09/18] net/mana: add function to configure TX queues longli
2022-08-30 22:51 ` [Patch v6 10/18] net/mana: implement memory registration longli
2022-08-30 22:51 ` [Patch v6 11/18] net/mana: implement the hardware layer operations longli
2022-08-30 22:51 ` [Patch v6 12/18] net/mana: add function to start/stop TX queues longli
2022-08-30 22:51 ` [Patch v6 13/18] net/mana: add function to start/stop RX queues longli
2022-08-30 22:51 ` [Patch v6 14/18] net/mana: add function to receive packets longli
2022-08-30 22:51 ` [Patch v6 15/18] net/mana: add function to send packets longli
2022-09-02 12:18   ` fengchengwen
2022-09-02 19:40     ` Long Li
2022-08-30 22:51 ` [Patch v6 16/18] net/mana: add function to start/stop device longli
2022-08-30 22:51 ` [Patch v6 17/18] net/mana: add function to report queue stats longli
2022-08-30 22:51 ` [Patch v6 18/18] net/mana: add function to support RX interrupts longli
