From: Maxime Coquelin
To: Daniel Östman, Erez Ferber, Slava Ovsiienko
Cc: users@dpdk.org, Matan Azrad, david.marchand@redhat.com
Date: Thu, 22 Jun 2023 17:47:39 +0200
Subject: Re: mlx5: imissed / out_of_buffer counter always 0

Hi,

On 6/21/23 22:22, Maxime Coquelin wrote:
> Hi Daniel, all,
>
> On 6/5/23 16:00, Daniel Östman wrote:
>> Hi Slava and Erez, and thanks for your answers,
>>
>> Regarding the firmware, I’ve also deployed in a different OpenShift
>> cluster where I see the exact same issue, but with a different Mellanox
>> NIC:
>>
>> Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE
>> QSFP56 PCIe Adapter
>> driver: mlx5_core
>> version: 5.0-0
>> firmware-version: 22.36.1010 (DEL0000000027)
>>
>> From what I can see, the firmware is relatively new on that one?
>
> With the configuration below:
> - ConnectX-6 Dx MT2892
> - Kernel: 6.4.0-rc6
> - FW version: 22.35.1012 (MT_0000000528)
>
> the out-of-buffer counter is fetched via
> mlx5_devx_cmd_queue_counter_query():
>
> [pid  2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
> [pid  2942] write(1, "\n  ######################## NIC "..., 80) = 80
> [pid  2942] write(1, "  RX-packets: 630997736  RX-miss"..., 70) = 70
> [pid  2942] write(1, "  RX-errors: 0\n", 15) = 15
> [pid  2942] write(1, "  RX-nombuf:  0         \n", 25) = 25
> [pid  2942] write(1, "  TX-packets: 0          TX-erro"..., 60) = 60
> [pid  2942] write(1, "\n", 1)           = 1
> [pid  2942] write(1, "  Throughput (since last show)\n", 31) = 31
> [pid  2942] write(1, "  Rx-pps:            0          "..., 106) = 106
> [pid  2942] write(1, "  ##############################"..., 79) = 79
>
> It looks like we may be missing some mlx5 kernel patches needed to use
> mlx5_devx_cmd_queue_counter_query() on RHEL?
>
> Erez, Slava, any idea which patches could be missing?

The above test was run on bare metal as root. I get the same "working"
behaviour on RHEL as root.

We managed to reproduce Daniel's issue by running the same test within a
container. With debug logs enabled, we get this warning:

mlx5_common: DevX create q counter set failed errno=121 status=0x2
syndrome=0x8975f1
mlx5_net: Port 0 queue counter object cannot be created by DevX - fall-back
to use the kernel driver global queue counter.

Running the container as privileged solves the issue, and so does adding
the SYS_RAWIO capability to the container.

Erez, Slava, is it expected that SYS_RAWIO is required just to read a
stats counter?

Daniel, could you try adding SYS_RAWIO to your pod to confirm you are
facing the same issue?

Thanks in advance,
Maxime

> Regards,
> Maxime
>
>>
>> I tried setting dv_flow_en=0 (and saw that it was propagated to
>> config->dv_flow_en), but it didn’t seem to help.
>>
>> Erez, I’m not sure what you mean by shared or non-shared mode in this
>> case; however, it could be related to the fact that the container is
>> running in a separate network namespace, because the hw_counters
>> directory is available on the host (cluster node) but not in the pod
>> container.
>>
>> Best regards,
>>
>> Daniel
>>
>> *From:* Erez Ferber
>> *Sent:* Monday, 5 June 2023 12:29
>> *To:* Slava Ovsiienko
>> *Cc:* Daniel Östman; users@dpdk.org; Matan Azrad;
>> maxime.coquelin@redhat.com; david.marchand@redhat.com
>> *Subject:* Re: mlx5: imissed / out_of_buffer counter always 0
>>
>> Hi Daniel,
>>
>> Is the container running in shared or non-shared mode?
>>
>> For shared mode, I assume the kernel sysfs counters that DPDK relies
>> on for imissed/out_of_buffer are not exposed.
>>
>> Best regards,
>>
>> Erez
>>
>> On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko wrote:
>>
>>     Hi, Daniel
>>
>>     I would recommend taking the following actions:
>>
>>     - Update the firmware; 16.33.xxxx looks a little outdated.
>>       Please try 16.35.1012 or later.
>>       mlx5_glue->devx_obj_create might succeed with the newer FW.
>>
>>     - Try specifying the dv_flow_en=0 devarg. It forces the mlx5 PMD
>>       to use the rdma_core library for queue management, so the
>>       kernel driver will be aware of the Rx queues being created and
>>       attach them to the kernel counter set.
>>
>>     With best regards,
>>     Slava
>>
>>     *From:* Daniel Östman
>>     *Sent:* Friday, June 2, 2023 3:59 PM
>>     *To:* users@dpdk.org
>>     *Cc:* Matan Azrad; Slava Ovsiienko; maxime.coquelin@redhat.com;
>>     david.marchand@redhat.com
>>     *Subject:* mlx5: imissed / out_of_buffer counter always 0
>>
>>     Hi,
>>
>>     I’m deploying a containerized DPDK application in an OpenShift
>>     Kubernetes environment using DPDK 21.11.3.
>>
>>     The application uses a Mellanox ConnectX-5 100G NIC through VFs.
>>
>>     The problem I have is that the ETH stats counter imissed (which
>>     seems to be mapped to “out_of_buffer” internally in the mlx5 PMD)
>>     is 0 when I don’t expect it to be, i.e. when the application doesn’t
>>     read the packets fast enough.
>>
>>     Using GDB, I can see that it tries to access the counter through
>>     /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer, but
>>     the hw_counters directory is missing, so it just returns a zero
>>     value. I don’t know why it is missing.
>>
>>     Looking at mlx5_os_read_dev_stat(), I can see that there is an
>>     alternative way of reading the counter, through
>>     mlx5_devx_cmd_queue_counter_query(), but only if priv->q_counters
>>     is set.
>>
>>     It doesn’t get set in my case because mlx5_glue->devx_obj_create()
>>     fails (errno 22) in mlx5_devx_cmd_queue_counter_alloc().
>>
>>     Have I missed something?
>>
>>     NIC info:
>>
>>     Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port
>>     QSFP28 MCX516A-CCHT
>>     driver: mlx5_core
>>     version: 5.0-0
>>     firmware-version: 16.33.1048 (MT_0000000417)
>>
>>     Please let me know if I need to provide more information.
>>
>>     Best regards,
>>
>>     Daniel
>>
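Daniel's GDB finding (a missing hw_counters directory silently turning into a zero imissed value) can be illustrated with a small standalone C sketch. It is a simplified model of the lookup order described in this thread, not the mlx5 PMD code: read_hw_counter() and imissed_sketch() are hypothetical helpers, and the DevX queue-counter query is represented only by an optional, hypothetical callback. The sysfs path layout is the one from Daniel's report.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a DevX queue-counter query; returns 0 on success. */
typedef int (*q_counter_query_fn)(void *ctx, uint64_t *value);

/*
 * Hypothetical helper: read one counter from
 *   /sys/class/infiniband/<ibdev>/ports/<port>/hw_counters/<name>
 * Returns 0 on success or a negative errno, so "file missing" (-ENOENT)
 * stays distinguishable from a counter whose value really is 0.
 * *value is left untouched on failure.
 */
static int read_hw_counter(const char *ibdev, unsigned int port,
                           const char *name, uint64_t *value)
{
    char path[256];
    int n = snprintf(path, sizeof(path),
                     "/sys/class/infiniband/%s/ports/%u/hw_counters/%s",
                     ibdev, port, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -errno; /* e.g. -ENOENT when hw_counters is not visible */

    uint64_t v;
    int ret = (fscanf(f, "%" SCNu64, &v) == 1) ? 0 : -EIO;
    fclose(f);
    if (ret == 0)
        *value = v;
    return ret;
}

/*
 * Simplified model of the lookup order discussed in this thread:
 * use the per-port queue counter if one could be created, otherwise
 * fall back to the kernel sysfs counter, otherwise report 0.
 */
static uint64_t imissed_sketch(q_counter_query_fn query, void *ctx,
                               const char *ibdev, unsigned int port)
{
    uint64_t v = 0;

    if (query != NULL && query(ctx, &v) == 0)
        return v;   /* DevX queue counter path */
    if (read_hw_counter(ibdev, port, "out_of_buffer", &v) == 0)
        return v;   /* kernel sysfs path */
    return 0;       /* nothing readable: indistinguishable from "always 0" */
}
```

Called as imissed_sketch(NULL, NULL, "mlx5_99", 1), it goes straight to the sysfs path, which mirrors the unprivileged container case where the DevX queue counter could not be created; if hw_counters is not visible there either, the caller just sees 0. Keeping the negative errno from read_hw_counter() is what would let a caller report "unavailable" instead.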
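To check from inside a pod whether the SYS_RAWIO capability discussed above is actually in effect, one option is to read the CapEff mask from /proc/self/status and test bit 17, which is CAP_SYS_RAWIO in <linux/capability.h>. The sketch below is a hypothetical diagnostic, not part of DPDK; parse_capeff() and has_cap_sys_rawio() are illustrative names. In a Kubernetes pod spec, the capability itself is requested under securityContext.capabilities.add.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* CAP_SYS_RAWIO is capability number 17 in <linux/capability.h>;
 * hard-coded so this sketch does not depend on libcap. */
#define CAP_SYS_RAWIO_BIT 17

/* Parse the hex mask from a "CapEff:\t<hex>" line of /proc/<pid>/status.
 * Returns 0 on success, -1 if the line is not a CapEff line. */
static int parse_capeff(const char *line, uint64_t *mask)
{
    return sscanf(line, "CapEff: %" SCNx64, mask) == 1 ? 0 : -1;
}

/* Returns 1 if CAP_SYS_RAWIO is in this process's effective set,
 * 0 if it is not, and -1 if the mask could not be read. */
static int has_cap_sys_rawio(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    int result = -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        uint64_t mask;
        if (parse_capeff(line, &mask) == 0) {
            result = (int)((mask >> CAP_SYS_RAWIO_BIT) & 1u);
            break;
        }
    }
    fclose(f);
    return result;
}
```

Running a small program that prints has_cap_sys_rawio() once in the default pod and once with SYS_RAWIO added would show whether the capability actually reached the container's effective set before looking further at the DevX counter allocation.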