DPDK patches and discussions
 help / color / mirror / Atom feed
From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [dpdk-dev] [Bug 440] net/mlx5: Read of "out_of_buffer" using fopen/fread/fclose causing TLB shootdowns due to mmap/munmap
Date: Mon, 06 Apr 2020 04:10:11 +0000	[thread overview]
Message-ID: <bug-440-3@http.bugs.dpdk.org/> (raw)

https://bugs.dpdk.org/show_bug.cgi?id=440

            Bug ID: 440
           Summary: net/mlx5: Read of "out_of_buffer" using
                    fopen/fread/fclose causing TLB shootdowns due to
                    mmap/munmap
           Product: DPDK
           Version: 19.11
          Hardware: x86
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: mohsinshaikh@niometrics.com
  Target Milestone: ---

Created attachment 86
  --> https://bugs.dpdk.org/attachment.cgi?id=86&action=edit
Potential fix patch

Setup: HPE ProLiant DL560 Gen10 Server with 4 x "Intel(R) Xeon(R) Platinum
8180M CPU @ 2.50GHz" and 6TB of memory (1 TB reserved huge pages) with 2 X
"Mellanox Technologies ConnectX®-5 EN network interface card, 100GbE
single-port QSFP28, PCIe3.0 x16, tall bracket; MCX515A-CCAT". OS is Centos 7
with kernel 3.10.0-957.27.2.el7.x86_64. 

While testing I noticed that the rx cores of the DPDK app were getting TLB
shootdowns periodically. After debugging I found that the stats thread of the
DPDK app was responsible for those. Specifically the read of "out_of_buffer"
stat:

static inline void
mlx5_read_ib_stat(struct mlx5_priv *priv, const char *ctr_name, uint64_t *stat)
{
        FILE *file;
        if (priv->sh) {
                MKSTR(path, "%s/ports/%d/hw_counters/%s",
                          priv->sh->ibdev_path,
                          priv->ibv_port,
                          ctr_name);

                file = fopen(path, "rb");
                if (file) { 
                        int n = fscanf(file, "%" SCNu64, stat); <----- Calls
mmap().

                        fclose(file); <---- Calls munmap which leads to TLB
shootdown
                        if (n == 1)
                                return;
                }
        }
        *stat = 0;
}

NOTE: We almost exclusively use huge pages in the DPDK app for allocations.

Stack traces:

(gdb) where
#0  __mmap (addr=addr@entry=0x0, len=4096, prot=prot@entry=3,
flags=flags@entry=34, fd=fd@entry=-1, offset=offset@entry=0) at
../sysdeps/unix/sysv/linux/wordsize-64/mmap.c:32
#1  0x00007fffdeaf8ed1 in __GI__IO_file_doallocate (fp=0x2b52263ae000) at
filedoalloc.c:127
#2  0x00007fffdeb07d07 in __GI__IO_doallocbuf (fp=fp@entry=0x2b52263ae000) at
genops.c:399
#3  0x00007fffdeb06c8c in _IO_new_file_underflow (fp=0x2b52263ae000) at
fileops.c:557
#4  0x00007fffdeb07dd2 in __GI__IO_default_uflow (fp=0x2b52263ae000) at
genops.c:414
#5  0x00007fffdeae7efa in _IO_vfscanf_internal (s=<optimized out>,
format=<optimized out>, argptr=argptr@entry=0x7fff51f07498,
errp=errp@entry=0x0) at vfscanf.c:600
#6  0x00007fffdeaed617 in ___vfscanf (s=<optimized out>, format=<optimized
out>, argptr=argptr@entry=0x7fff51f07498) at vfscanf.c:2942
#7  0x00007fffdeaf6247 in __fscanf (stream=<optimized out>, format=<optimized
out>) at fscanf.c:31
#8  0x00007fffe632ee9f in mlx5_stats_get () from
/opt/nio/dpdk/lib/librte_pmd_mlx5.so.20.0
#9  0x00007fffeee6b468 in rte_eth_stats_get () from
/opt/nio/dpdk/lib/librte_ethdev.so.20.0


Breakpoint 2, munmap () at ../sysdeps/unix/syscall-template.S:81
81      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) where
#0  munmap () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fffdeb07cb2 in __GI__IO_setb (f=f@entry=0x2b52263ae000,
b=b@entry=0x0, eb=eb@entry=0x0, a=a@entry=0) at genops.c:383
#2  0x00007fffdeb06090 in _IO_new_file_close_it (fp=fp@entry=0x2b52263ae000) at
fileops.c:192
#3  0x00007fffdeaf9138 in _IO_new_fclose (fp=0x2b52263ae000) at iofclose.c:58
#4  0x00007fffe632eeaa in mlx5_stats_get () from
/opt/nio/dpdk/lib/librte_pmd_mlx5.so.20.0
#5  0x00007fffeee6b468 in rte_eth_stats_get () from
/opt/nio/dpdk/lib/librte_ethdev.so.20.0


Output from kernel tracing (using trace-cmd):

           <...>-120694 [000] 502442.376421: function:            
native_flush_tlb_others
           <...>-120694 [000] 502442.376423: kernel_stack:         <stack
trace>
=> tlb_flush_mmu.part.76 (ffffffffa89e57d7)
=> tlb_finish_mmu (ffffffffa89e70d5)
=> unmap_region (ffffffffa89f0754)
=> do_munmap (ffffffffa89f2d45)
=> vm_munmap (ffffffffa89f2f85)
=> SyS_munmap (ffffffffa89f4212)
=> tracesys (ffffffffa8f7706b)


           <...>-120324 [002] 502442.382250: function:            
flush_tlb_func
           <...>-120323 [001] 502442.382250: function:            
flush_tlb_func
           <...>-120325 [003] 502442.382251: function:            
flush_tlb_func
           <...>-120327 [005] 502442.382251: function:            
flush_tlb_func
           <...>-120326 [004] 502442.382251: function:            
flush_tlb_func
           <...>-120332 [010] 502442.382251: function:            
flush_tlb_func
           <...>-120324 [002] 502442.382252: kernel_stack:         <stack
trace>
=> generic_smp_call_function_single_interrupt (ffffffffa8912ea3)
=> smp_call_function_interrupt (ffffffffa885737d)
=> call_function_interrupt (ffffffffa8f79382)


The following patch seems to fix the issue:
=========================================================================

commit e4e93bd3ffb4e75b2178b0fdfe12341bfddf171d                                 
Author: Mohsin Shaikh <mohsinshaikh@niometrics.com>                             
Date:   Tue Mar 31 14:28:41 2020 +0800                                          

    net/mlx5: Use open/read/close for reading ib stat                           

diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c      
index 205e4fe..769f080 100644                                                   
--- a/drivers/net/mlx5/mlx5_stats.c                                             
+++ b/drivers/net/mlx5/mlx5_stats.c                                             
@@ -8,6 +8,8 @@                                                                 
 #include <linux/ethtool.h>                                                     
 #include <stdint.h>
 #include <stdio.h>
+#include <unistd.h>
+#include <fcntl.h>

 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
@@ -139,20 +141,22 @@
 static inline void
 mlx5_read_ib_stat(struct mlx5_priv *priv, const char *ctr_name, uint64_t
*stat)
 {
-       FILE *file;
+       int fd;
        if (priv->sh) {
                MKSTR(path, "%s/ports/%d/hw_counters/%s",
                          priv->sh->ibdev_path,
                          priv->ibv_port,
                          ctr_name);

-               file = fopen(path, "rb");
-               if (file) {
-                       int n = fscanf(file, "%" SCNu64, stat);
-
-                       fclose(file);
-                       if (n == 1)
+               fd = open(path, O_RDONLY);
+               if (fd != -1) {
+                       char buf[32];
+                       ssize_t n = read(fd, buf, sizeof(buf));
+                       close(fd);
+                       if (n != -1) {
+                               sscanf(buf, "%lu", stat);
                                return;
+                       }
                }
        }
        *stat = 0;

=========================================================================

Patch attached to bug.

These TLB shootdowns lead to packets being dropped sporadically by the mlx5
NICs with the "rx_discards_phy" counter being incremented. The above patch
significantly reduced the drops in our app.

-- 
You are receiving this mail because:
You are the assignee for the bug.

                 reply	other threads:[~2020-04-06  4:10 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-440-3@http.bugs.dpdk.org/ \
    --to=bugzilla@dpdk.org \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).