From: "David Coen"
To: "'Kai Zhang'", "'Wiles, Keith'"
Cc: "'Van Haaren, Harry'"
Date: Wed, 15 Mar 2017 16:48:53 +0100
Message-ID: <002b01d29da3$ad1859d0$07490d70$@resi.it>
Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
List-Id: DPDK usage discussions

Hi Kai,

I'm sure that it's not necessary to use the --base-virtaddr option on the secondary process.

Referring to the addresses in your last post: to fully try my method, you should start your real primary application with --base-virtaddr=0x7ffef5000000, which is the smallest address I can see in your post (see "Region 5" below).

I hope this helps,
David

------------------------------------------------------------------------------------------
From: Kai Zhang [mailto:kay21s@gmail.com]
Sent: Wednesday, 15 March 2017 05:14
To: Wiles, Keith
Cc: David Coen; Van Haaren, Harry
Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file

I have also tried to use the same option, --base-virtaddr=0x7fffdc200000, on the secondary process, but it does not help.

Thank you, Keith. I think I can try to figure it out first, if the internals are not too complicated ...

Regards,
Kai

On Wed, Mar 15, 2017 at 11:28 AM, Wiles, Keith wrote:

> On Mar 15, 2017, at 10:56 AM, Kai Zhang wrote:
>
> Hi David,
>
> Your method does not work for me :-(
>
> The dummy primary application shows the following regions:
> Region 0: virtual address [0x7fffdc200000, 0x7ffff5a00000], physical address 0x59c00000, len 427819008
> Region 1: virtual address [0x7fffdbe00000, 0x7fffdc000000], physical address 0x7b600000, len 2097152
> Region 2: virtual address [0x7fffdba00000, 0x7fffdbc00000], physical address 0xf25800000, len 2097152
> Region 3: virtual address [0x7ffef5800000, 0x7fffdb800000], physical address 0xf25c00000, len 3858759680
> Region 4: virtual address [0x7ffef5400000, 0x7ffef5600000], physical address 0x100f000000, len 2097152
> Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address 0x1024000000, len 2097152
>
> I set the real primary application with --base-virtaddr=0x7fffdc200000
>
> The error in the secondary process is:
> EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7ffff2bfd000

This seems like a hardware issue where the PCI device cannot be set to the correct address. The path above is the device path to resource0 of the PCI device, and the system is having a problem mapping the address. Does the secondary process need the same option to set the base address?

Sorry, I am not much help here, as I am not able to focus on the problem more because I am off site at a week-long meeting.

> It seems that they are not accessing the same region.
>
> Regards,
> Kai
>
> On Wed, Mar 15, 2017 at 12:47 AM, David Coen wrote:
> > Hi Kai, I agree with you.
> >
> > I have much the same issue: a primary application and a secondary one that sometimes runs with more than 4 cores.
> > I'm using DPDK 16.11 on RedHat 6.7.
> >
> > So far I have solved it this way:
> >
> > - Disabling ASLR by adding these two lines to "/etc/sysctl.conf":
> >     # Disable Address Space Layout Randomization (ASLR) (needed by DPDK)
> >     kernel.randomize_va_space = 0
> >
> > - Getting the virtual address of the first memory segment (the one with the minimum address value) returned from the function "rte_eal_get_physmem_layout()", called from a "dummy" primary application used only to get this address.
> >
> > - Passing the above virtual address as a parameter to the "real" primary application using the "--base-virtaddr=" DPDK command-line option. When the secondary app starts, everything goes well with the specified base address.
> >
> > I've tested this solution on different servers and it has always worked.
> > I think there is some kind of limitation in the DPDK primary/secondary initialization process that could be improved.
> >
> > Regards,
> > David
> >
> > -----Original Message-----
> > From: Kai Zhang [mailto:kay21s@gmail.com]
> > Sent: Monday, 13 March 2017 11:59
> > To: Van Haaren, Harry
> > Cc: Wiles, Keith; users@dpdk.org
> > Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
> >
> > Thank you for your info, Harry.
> >
> > Even if ASLR is the root cause, I don't think DPDK should expect users to disable it in order to use the primary/secondary model. Is it possible for the DPDK team to check this issue and fix the bug?
> >
> > Regards,
> > Kai
> >
> > On Mon, Mar 13, 2017 at 5:58 PM, Van Haaren, Harry <harry.van.haaren@intel.com> wrote:
> >
> > > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Kai Zhang
> > > > Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
> > > >
> > > > Yes, my application is somewhat special and should run in primary/secondary mode. I will search for the way to turn off the random page mapping and try it.
> > >
> > > You're searching for ASLR, or Address Space Layout Randomization.
> > >
> > > Some useful links regarding ASLR, DPDK and Linux:
> > > http://dpdk.readthedocs.io/en/v16.04/prog_guide/multi_proc_support.html#multi-process-limitations
> > > http://askubuntu.com/questions/318315/how-can-i-temporarily-disable-aslr-address-space-layout-randomization
> > > http://dpdk.org/ml/archives/dev/2015-June/019364.html
> > >
> > > Please note that ASLR is a security feature of the OS; think twice before disabling it.
> > >
> > > Hope that helps, -Harry
> > >
> > > > Thanks for your help :)
> > > >
> > > > Regards,
> > > > Kai
> > > >
> > > > On Mon, Mar 13, 2017 at 3:24 AM, Wiles, Keith wrote:
> > > >
> > > > > > On Mar 12, 2017, at 6:39 PM, Kai Zhang wrote:
> > > > > >
> > > > > > > Your application may be attaching to the same port for each core. Normally this means that each core could be allocating memory and the 4th core just goes over the amount of memory you have reserved.
> > > > > >
> > > > > > I don't think so, because the error is in rte_eal_init(), which is executed in the first line of the main() function. At that time, the other threads are not even launched.
> > > > > >
> > > > > > Is it possible to consider this as a bug in DPDK?
> > > > >
> > > > > One more thing: I run Pktgen as two processes all of the time. The big difference is that I do not run in primary and secondary modes. I run two different instances of pktgen at the same time without seeing this type of problem. If the failure is associated with the primary/secondary application model, then it could be a bug in that code, as a lot of syncing up between the two processes needs to be done because of memory/device sharing. One problem with P/S applications is that memory needs to be mapped at the same address in both processes, and Linux has random memory mapping built in for security reasons. I forget the name of the mode in Linux to turn off the random page mapping, and Google is not working for me ATM.
> > > > >
> > > > > Does your application require running as a primary/secondary application?
> > > > > >
> > > > > > Regards,
> > > > > > Kai
> > > > > >
> > > > > > > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > > > > > > EAL: Error - exiting with code: 1
> > > > > > > Cause: Requested device 0000:02:00.0 cannot be used
> > > > > > >
> > > > > > > Regards,
> > > > > > > Kai
> > > > > > >
> > > > > > > On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang wrote:
> > > > > > >
> > > > > > > Command line:
> > > > > > > primary: sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
> > > > > > > secondary: sudo ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
> > > > > > >
> > > > > > > The configurations are as follows:
> > > > > > >
> > > > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA], XL710 40GbE, bind 02:00.0, 2048 x 4k huge page
> > > > > > > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02) [<<- Only bind this one]
> > > > > > > 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
> > > > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > > > >
> > > > > > >          Socket 0
> > > > > > >          --------
> > > > > > > Core 0   [0, 12]
> > > > > > > Core 1   [1, 13]
> > > > > > > Core 2   [2, 14]
> > > > > > > Core 3   [3, 15]
> > > > > > > Core 4   [4, 16]
> > > > > > > Core 5   [5, 17]
> > > > > > > Core 8   [6, 18]
> > > > > > > Core 9   [7, 19]
> > > > > > > Core 10  [8, 20]
> > > > > > > Core 11  [9, 21]
> > > > > > > Core 12  [10, 22]
> > > > > > > Core 13  [11, 23]
> > > > > > >
> > > > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA], No Port Bind, 2048 x 4k huge page
> > > > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > > > >
> > > > > > >          Socket 0    Socket 1
> > > > > > >          --------    --------
> > > > > > > Core 0   [0, 20]     [10, 30]
> > > > > > > Core 1   [1, 21]     [11, 31]
> > > > > > > Core 2   [2, 22]     [12, 32]
> > > > > > > Core 3   [3, 23]     [13, 33]
> > > > > > > Core 4   [4, 24]     [14, 34]
> > > > > > > Core 8   [5, 25]     [15, 35]
> > > > > > > Core 9   [6, 26]     [16, 36]
> > > > > > > Core 10  [7, 27]     [17, 37]
> > > > > > > Core 11  [8, 28]     [18, 38]
> > > > > > > Core 12  [9, 29]     [19, 39]
> > > > > > >
> > > > > > > Ah, as machine B does not have a 40GbE NIC, I did not bind any NIC and ran my program with locally generated packets. But I am using other DPDK features, such as memory sharing and message passing. Maybe that is the reason it works correctly? I can only access machine B remotely, so I am unable to install a NIC on it. I have another PC that is used as a client, but it only has four cores, so it also cannot be used for verification...
> > > > > > >
> > > > > > > Regards,
> > > > > > > Kai
> > > > > > >
> > > > > > > On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> > > > > > >
> > > > > > > > On Mar 11, 2017, at 9:45 AM, Kai Zhang wrote:
> > > > > > > >
> > > > > > > > Hi Keith,
> > > > > > > >
> > > > > > > > Thank you for your reply.
> > > > > > > >
> > > > > > > > I have tested my program on two machines:
> > > > > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
> > > > > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
> > > > > > > >
> > > > > > > > I am very sure that the primary process uses different cores from the secondary process.
> > > > > > > > The strange thing is that my program works correctly on machine B. But on machine A, the above issue happens with more than 4 cores assigned to the secondary process.
> > > > > > > >
> > > > > > > > I have tried to assign cores 1-5 to the secondary process and also tried other core assignment policies, but the error still happens in rte_eal_init() with more than 4 cores.
> > > > > > >
> > > > > > > It would be nice to see both command lines. I am not sure I can help more; all I can do is suggest some ideas to look at.
> > > > > > >
> > > > > > > Does machine B have the same number and type of NICs? Use 'lspci | grep Ethernet' to get a list of all Ethernet devices on both machines.
> > > > > > >
> > > > > > > What is the number of hugepages you have allocated on both machines?
> > > > > > >
> > > > > > > Also look at the cpu_layout.py script to see why adding the 5th core would be different on the two machines, and try to make them the same.
> > > > > > >
> > > > > > > > Regards,
> > > > > > > > Kai
> > > > > > > >
> > > > > > > > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
> > > > > > > >
> > > > > > > > > On Mar 10, 2017, at 9:35 PM, Kai Zhang wrote:
> > > > > > > > >
> > > > > > > > > Hi, there
> > > > > > > > >
> > > > > > > > > I am using DPDK-16.11 on an XL710 40GbE NIC. OS: CentOS 7.3.1611 with Linux kernel version 3.8.0-30.
> > > > > > > > >
> > > > > > > > > I have a master process and a secondary process. When I run the secondary process with less than or equal to 4 cores, it works correctly. Such as:
> > > > > > > > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
> > > > > > > > > sudo ./program -c 0x0f -n 4 --proc-type=secondary
> > > > > > > > >
> > > > > > > > > However, there is an error in rte_eal_init() if I assign more than 4 cores:
> > > > > > > > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
> > > > > > > > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
> > > > > > > > >
> > > > > > > > > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > > > > > > > > EAL: Error - exiting with code: 1
> > > > > > > > > Cause: Requested device 0000:02:00.0 cannot be used
> > > > > > > >
> > > > > > > > I assume you have at least 8 cores. Have you tried -l 1-5 on the secondary process?
> > > > > > > >
> > > > > > > > You did not show the primary process command line, but if you use 1-5 then you can only give the primary process -l 6-7, or two cores. It is always reasonable to leave core zero for Linux to use.
> > > > > > > >
> > > > > > > > Also, it could be that you ran out of the memory or hugepages you allocated to the system.
> > > > > > > >
> > > > > > > > > Does anyone know why this happens?
> > > > > > > > >
> > > > > > > > > Thanks a lot,
> > > > > > > > > Kai Zhang
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Keith
> > > > > > >
> > > > > > > Regards,
> > > > > > > Keith
> > > > >
> > > > > Regards,
> > > > > Keith

Regards,
Keith
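
Editorial sketch: the base-address selection David describes in this thread (take the smallest virtual address among the regions printed by the dummy primary, then pass it to the real primary with --base-virtaddr) could be automated roughly as below. This is only an illustration. It assumes the region lines have exactly the format Kai pasted in his post, and the helper names (parse_regions, lowest_base_virtaddr, base_virtaddr_option) are hypothetical, not part of DPDK.

```python
import re

# Assumed line format, copied from Kai's post (not a documented DPDK format):
#   Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address 0x1024000000, len 2097152
REGION_RE = re.compile(
    r"Region\s+(\d+):\s+virtual address\s+"
    r"\[(0x[0-9a-fA-F]+),\s*(0x[0-9a-fA-F]+)\]"
)

# Region lines as posted in this thread.
SAMPLE = """
Region 0: virtual address [0x7fffdc200000, 0x7ffff5a00000], physical address 0x59c00000, len 427819008
Region 1: virtual address [0x7fffdbe00000, 0x7fffdc000000], physical address 0x7b600000, len 2097152
Region 2: virtual address [0x7fffdba00000, 0x7fffdbc00000], physical address 0xf25800000, len 2097152
Region 3: virtual address [0x7ffef5800000, 0x7fffdb800000], physical address 0xf25c00000, len 3858759680
Region 4: virtual address [0x7ffef5400000, 0x7ffef5600000], physical address 0x100f000000, len 2097152
Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address 0x1024000000, len 2097152
"""


def parse_regions(log_text):
    """Return (index, start, end) integer tuples for every region line found."""
    return [
        (int(index), int(start, 16), int(end, 16))
        for index, start, end in REGION_RE.findall(log_text)
    ]


def lowest_base_virtaddr(log_text):
    """Return the smallest region start address in the log text."""
    regions = parse_regions(log_text)
    if not regions:
        raise ValueError("no 'Region N: virtual address [...]' lines found")
    return min(start for _, start, _ in regions)


def base_virtaddr_option(log_text):
    """Format the smallest start address as an EAL --base-virtaddr argument."""
    return "--base-virtaddr=0x{:x}".format(lowest_base_virtaddr(log_text))


if __name__ == "__main__":
    print(base_virtaddr_option(SAMPLE))
```

With the regions from Kai's post this selects the start of Region 5, which matches the address David suggests, and differs from the 0x7fffdc200000 (start of Region 0) that Kai actually tried. Whether this resolves the mmap failure is not confirmed anywhere in the thread.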