DPDK usage discussions
From: "David Coen" <d.coen@resi.it>
To: "'Kai Zhang'" <kay21s@gmail.com>,
	"'Wiles, Keith'" <keith.wiles@intel.com>
Cc: "'Van Haaren, Harry'" <harry.van.haaren@intel.com>, <users@dpdk.org>
Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
Date: Wed, 15 Mar 2017 16:48:53 +0100
Message-ID: <002b01d29da3$ad1859d0$07490d70$@resi.it>

Hi Kai,

I'm sure that it's not necessary to use the --base-virtaddr option on the secondary process.

Referring to the addresses in your last post: to try my method fully,
you should start your real primary application with

 --base-virtaddr=0x7ffef5000000

which is the smallest address I can see in your post (see "Region 5" below).
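
As a rough illustration of picking that address, here is a minimal Python sketch that scans a region dump for the lowest starting virtual address and formats it as a --base-virtaddr option. The dump text is copied from the output quoted further down in this message. The regular expression and the helper names (lowest_region_start, base_virtaddr_option) are assumptions made for this example and are not part of DPDK.

```python
import re

# Region dump copied from the dummy primary's output quoted later in this
# message. The exact line format is an assumption based on that output.
REGION_DUMP = """\
Region 0: virtual address [0x7fffdc200000, 0x7ffff5a00000], physical address 0x59c00000, len 427819008
Region 1: virtual address [0x7fffdbe00000, 0x7fffdc000000], physical address 0x7b600000, len 2097152
Region 2: virtual address [0x7fffdba00000, 0x7fffdbc00000], physical address 0xf25800000, len 2097152
Region 3: virtual address [0x7ffef5800000, 0x7fffdb800000], physical address 0xf25c00000, len 3858759680
Region 4: virtual address [0x7ffef5400000, 0x7ffef5600000], physical address 0x100f000000, len 2097152
Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address 0x1024000000, len 2097152
"""

# Matches the "[start, end]" pair of virtual addresses on each region line.
REGION_RE = re.compile(r"virtual address \[(0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+)\]")


def lowest_region_start(dump: str) -> int:
    """Return the smallest region start address found in the dump."""
    starts = [int(match.group(1), 16) for match in REGION_RE.finditer(dump)]
    if not starts:
        raise ValueError("no regions found in dump")
    return min(starts)


def base_virtaddr_option(dump: str) -> str:
    """Format the lowest start address as a --base-virtaddr option string."""
    return f"--base-virtaddr={lowest_region_start(dump):#x}"


if __name__ == "__main__":
    print(base_virtaddr_option(REGION_DUMP))
```

For the dump above this selects the start of Region 5, the same value given in the option line above.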

I hope this helps,

David
------------------------------------------------------------------------------------------
From: Kai Zhang [mailto:kay21s@gmail.com]
Sent: Wednesday, 15 March 2017 05:14
To: Wiles, Keith
Cc: David Coen; Van Haaren, Harry
Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file

I have also tried using the same option --base-virtaddr=0x7fffdc200000 on the secondary process, but it does not help.

Thank you, Keith. I think I can try to figure it out first, if the internals are not too complicated ...

Regards,
Kai


On Wed, Mar 15, 2017 at 11:28 AM, Wiles, Keith <keith.wiles@intel.com> wrote:

> On Mar 15, 2017, at 10:56 AM, Kai Zhang <kay21s@gmail.com> wrote:
>
> Hi David,
>
> I find that your method does not work for me :-(
>
> The dummy primary application shows the following regions:
> Region 0: virtual address [0x7fffdc200000, 0x7ffff5a00000], physical address 0x59c00000, len 427819008
> Region 1: virtual address [0x7fffdbe00000, 0x7fffdc000000], physical address 0x7b600000, len 2097152
> Region 2: virtual address [0x7fffdba00000, 0x7fffdbc00000], physical address 0xf25800000, len 2097152
> Region 3: virtual address [0x7ffef5800000, 0x7fffdb800000], physical address 0xf25c00000, len 3858759680
> Region 4: virtual address [0x7ffef5400000, 0x7ffef5600000], physical address 0x100f000000, len 2097152
> Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address 0x1024000000, len 2097152
>
> I set the real primary application with --base-virtaddr=0x7fffdc200000
>
> The error in the secondary process is:
> EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7ffff2bfd000

This one seems like a hardware issue around the PCI device: its resource cannot be mapped at the correct address. The path above is the sysfs path to the device's resource0 file, and the system is having a problem mapping it at that address. Does the secondary process need the same option to set the base address?
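
One quick check that can be made from the numbers already posted is whether the address in the mmap error lies inside any of the regions the dummy primary printed. The Python sketch below is only an illustration: the address pairs are copied from the quoted output above, and the helper name containing_region is hypothetical. Note that those regions come from the dummy primary, so the layout of the real primary started with a different base address may differ.

```python
# (start, end) virtual address pairs copied from the dummy primary's output
# quoted above; index i corresponds to "Region i".
REGIONS = [
    (0x7fffdc200000, 0x7ffff5a00000),
    (0x7fffdbe00000, 0x7fffdc000000),
    (0x7fffdba00000, 0x7fffdbc00000),
    (0x7ffef5800000, 0x7fffdb800000),
    (0x7ffef5400000, 0x7ffef5600000),
    (0x7ffef5000000, 0x7ffef5200000),
]

# Address from the secondary's "Cannot mmap device resource file" error.
FAILED_MMAP_ADDR = 0x7ffff2bfd000


def containing_region(addr, regions):
    """Return the index of the region whose range holds addr, or None."""
    for index, (start, end) in enumerate(regions):
        if start <= addr < end:
            return index
    return None


if __name__ == "__main__":
    print(containing_region(FAILED_MMAP_ADDR, REGIONS))
```

With these values the address falls inside the range listed as Region 0. This is only an observation about the posted numbers and does not by itself establish the cause of the failure.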

Sorry, I am not much help here, as I am not able to focus on the problem more because I am off site at a week-long meeting.

>
> It seems that they are not accessing the same region.
>
> Regards,
> Kai
>
> On Wed, Mar 15, 2017 at 12:47 AM, David Coen <d.coen@resi.it> wrote:
> Hi Kai, I agree with you.
>
>
>
> I have much the same issue: a primary application and a secondary one, sometimes running with more than 4 cores.
>
> I'm using DPDK 16.11 on RedHat 6.7.
>
>
>
> So far I have worked around it this way:
>
>
>
> - Disabling ASLR by adding these two lines to "/etc/sysctl.conf":
>
>                 # Disable Address Space Layout Randomization (ASLR)  (needed by DPDK)
>
>                 kernel.randomize_va_space = 0
>
>
>
> - Getting the virtual address of the first memory segment (the one with the lowest address) returned by the function "rte_eal_get_physmem_layout()", called from a "dummy" primary application used only to obtain this address.
>
> - Passing that virtual address to the "real" primary application with the "--base-virtaddr=" DPDK command-line option. When the secondary application starts, everything then works with the specified base address.
>
>
>
> I've tested this solution on different servers and it's always ok.
>
> I think that there is some kind of limitation in the DPDK primary/secondary initialization process that could be improved.
>
>
>
> Regards,
>
> David
>
>
>
> -----Original Message-----
>
> From: Kai Zhang [mailto:kay21s@gmail.com]
>
> Sent: Monday, 13 March 2017 11:59
>
> To: Van Haaren, Harry
>
> Cc: Wiles, Keith; users@dpdk.org
>
> Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
>
>
>
> Thank you for your info, Harry.
>
>
>
> Even if the ASLR is the root reason, I don't think DPDK should expect users to disable it to use the primary/secondary model. Is it possible for the DPDK team to check this issue and fix the bug?
>
>
>
> Regards,
>
> Kai
>
>
>
> On Mon, Mar 13, 2017 at 5:58 PM, Van Haaren, Harry < harry.van.haaren@intel.com> wrote:
>
>
>
> > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Kai Zhang
>
> > > > Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
>
> > > >
>
> > > > Yes, my application is somewhat special and should run in the primary/secondary mode. I will search for the way to turn off the random page mapping and try it.
>
> >
>
> >
>
> > You're searching for ASLR, or Address Space Layout Randomization.
>
> >
>
> > Some useful links regarding ASLR, DPDK and Linux;
>
> > > http://dpdk.readthedocs.io/en/v16.04/prog_guide/multi_proc_support.html#multi-process-limitations
>
> > > http://askubuntu.com/questions/318315/how-can-i-temporarily-disable-aslr-address-space-layout-randomization
>
> > http://dpdk.org/ml/archives/dev/2015-June/019364.html
>
> >
>
> > > Please note that ASLR is a security feature of the OS; think twice before disabling it.
>
> >
>
> >
>
> > Hope that helps, -Harry
>
> >
>
> >
>
> > > Thanks for your help :)
>
> > >
>
> > > Regards,
>
> > > Kai
>
> > >
>
> > > > On Mon, Mar 13, 2017 at 3:24 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
>
> > >
>
> > > >
>
> > > > > On Mar 12, 2017, at 6:39 PM, Kai Zhang <kay21s@gmail.com> wrote:
>
> > > > >
>
> > > > >
>
> > > > > > Your application may be attaching to the same port for each core. Normally this means that each core could be allocating memory and the 4th core just goes over the amount of memory you have reserved.
>
> > > > >
>
> > > > > > I don't think so, because the error is in rte_eal_init(), which is executed in the first line of the main() function. At that time, the other threads are not even launched.
>
> > > > >
>
> > > > > Is it possible to consider this as a bug in DPDK?
>
> > > >
>
> > > > > One more thing: I run Pktgen as two processes all of the time. The big difference is that I do not run in primary and secondary modes; I run two different instances of pktgen at the same time without seeing this type of problem. If the failure is associated with the primary/secondary application model, then it could be a bug in that code, as a lot of syncing up between the two processes needs to be done because of memory/device sharing. One problem with P/S applications is that memory needs to be mapped at the same address in both processes, and Linux has random memory mapping built in for security reasons. I forget the name of the mode in Linux to turn off the random page mapping, and Google is not working for me ATM.
>
> > > >
>
> > > > > Does your application require running as a primary/secondary application?
>
> > > >
>
> > > > >
>
> > > > > Regards,
>
> > > > > Kai
>
> > > > >
>
> > > > >
>
> > > > > >
>
> > > > > > > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
>
> > > > > > EAL: Error - exiting with code: 1
>
> > > > > >   Cause: Requested device 0000:02:00.0 cannot be used
>
> > > > > >
>
> > > > > > Regards,
>
> > > > > > Kai
>
> > > > > >
>
> > > > > > > On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang <kay21s@gmail.com> wrote:
>
> > > > > >
>
> > > > > > Command line:
>
> > > > > > primary:      sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
>
> > > > > > > secondary: sudo ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
>
> > > > > >
>
> > > > > > The configurations are as follows:
>
> > > > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA],     XL710 40GbE, bind 02:00.0,    2048 x 4k huge page
>
> > > > > > > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)   [<<- Only bind this one]
>
> > > > > > > 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
>
> > > > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
>
> > > > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
>
> > > > > >         Socket 0
>
> > > > > > --------
>
> > > > > > Core 0  [0, 12]
>
> > > > > > Core 1  [1, 13]
>
> > > > > > Core 2  [2, 14]
>
> > > > > > Core 3  [3, 15]
>
> > > > > > Core 4  [4, 16]
>
> > > > > > Core 5  [5, 17]
>
> > > > > > Core 8  [6, 18]
>
> > > > > > Core 9  [7, 19]
>
> > > > > > Core 10 [8, 20]
>
> > > > > > Core 11 [9, 21]
>
> > > > > > Core 12 [10, 22]
>
> > > > > > Core 13 [11, 23]
>
> > > > > >
>
> > > > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA],      No Port Bind,    2048 x 4k huge page
>
> > > > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
>
> > > > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
>
> > > > > >         Socket 0        Socket 1
>
> > > > > >         --------        --------
>
> > > > > > Core 0  [0, 20]         [10, 30]
>
> > > > > > Core 1  [1, 21]         [11, 31]
>
> > > > > > Core 2  [2, 22]         [12, 32]
>
> > > > > > Core 3  [3, 23]         [13, 33]
>
> > > > > > Core 4  [4, 24]         [14, 34]
>
> > > > > > Core 8  [5, 25]         [15, 35]
>
> > > > > > Core 9  [6, 26]         [16, 36]
>
> > > > > > Core 10 [7, 27]         [17, 37]
>
> > > > > > Core 11 [8, 28]         [18, 38]
>
> > > > > > Core 12 [9, 29]         [19, 39]
>
> > > > > >
>
> > > > > > > Ah, as machine B does not have a 40GbE NIC, I did not bind any NIC and ran my program with locally generated packets. But I am using other DPDK features, such as memory sharing and message passing. Maybe that is the reason it works correctly? I can only access machine B remotely, so I am unable to install a NIC on it. I have another PC, used as a client, that only has four cores, so it also cannot be used for verification...
>
> > > > > >
>
> > > > > > Regards,
>
> > > > > > Kai
>
> > > > > >
>
> > > > > >
>
> > > > > > > On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
>
> > > > > >
>
> > > > > > > On Mar 11, 2017, at 9:45 AM, Kai Zhang <kay21s@gmail.com> wrote:
>
> > > > > > >
>
> > > > > > > Hi Keith,
>
> > > > > > >
>
> > > > > > > Thank you for your reply.
>
> > > > > > >
>
> > > > > > > I have tested my program on two machines
>
> > > > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
>
> > > > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
>
> > > > > > >
>
> > > > > > > > I am very sure that the primary process uses different cores from the secondary process. The strange thing is that my program works correctly on machine B. But on machine A, the above issue happens with more than 4 cores assigned to the secondary process.
>
> > > > > > >
>
> > > > > > > > I have tried assigning cores 1-5 to the secondary process and also tried other core assignment policies, but the error still happens in rte_eal_init() with more than 4 cores.
>
> > > > > >
>
> > > > > > > It would be nice to see both command lines. I am not sure I can help more; all I can do is suggest some ideas to look at.
>
> > > > > >
>
> > > > > > > Does machine B have the same number and type of NICs? Use ‘lspci | grep Ethernet’ to get a list of all Ethernet devices on both machines.
>
> > > > > >
>
> > > > > > > What is the number of hugepages you have allocated on both machines?
>
> > > > > >
>
> > > > > > > Also look at the cpu_layout.py script to see why adding the 5th core would be different on the two machines, and try to make them the same.
>
> > > > > >
>
> > > > > > >
>
> > > > > > > Regards,
>
> > > > > > > Kai
>
> > > > > > >
>
> > > > > > > > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
>
> > > > > > >
>
> > > > > > > > > On Mar 10, 2017, at 9:35 PM, Kai Zhang <kay21s@gmail.com> wrote:
>
> > > > > > > >
>
> > > > > > > > Hi, there
>
> > > > > > > >
>
> > > > > > > > > I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS 7.3.1611 with Linux kernel version 3.8.0-30.
>
> > > > > > > >
>
> > > > > > > > > I have a master process and a secondary process. When I run the secondary process with 4 or fewer cores, it works correctly, for example:
>
> > > > > > > > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
>
> > > > > > > > > sudo ./program -c 0x0f -n 4 --proc-type=secondary
>
> > > > > > > >
>
> > > > > > > > > However, there is an error in rte_eal_init if I assign more than 4 cores.
>
> > > > > > > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
>
> > > > > > > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
>
> > > > > > > >
>
> > > > > > > > > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
>
> > > > > > > > EAL: Error - exiting with code: 1
>
> > > > > > > >  Cause: Requested device 0000:02:00.0 cannot be used
>
> > > > > > >
>
> > > > > > > > I assume you have at least 8 cores. Have you tried -l 1-5 on the secondary process?
>
> > > > > > >
>
> > > > > > > > You did not show the primary process command line, but if you use 1-5 then you can only give the primary process -l 6-7, or two cores. It is always reasonable to leave core zero for Linux to use.
>
> > > > > > >
>
> > > > > > > > Also, it could be that you ran out of memory or of the hugepages you allocated to the system.
>
> > > > > > >
>
> > > > > > > >
>
> > > > > > > > > Does anyone know why this happens?
>
> > > > > > > >
>
> > > > > > > > Thanks a lot,
>
> > > > > > > > Kai Zhang
>
> > > > > > >
>
> > > > > > > Regards,
>
> > > > > > > Keith
>
> > > > > > >
>
> > > > > > >
>
> > > > > >
>
> > > > > > Regards,
>
> > > > > > Keith
>
> > > > > >
>
> > > > > >
>
> > > > > >
>
> > > > >
>
> > > > > Regards,
>
> > > > > Keith
>
> > > >
>
> > > > Regards,
>
> > > > Keith
>
> > > >
>
> > > >
>
> >
>
>
>
>
>
>
Regards,
Keith
