From: Kai Zhang
Date: Mon, 13 Mar 2017 18:59:24 +0800
To: "Van Haaren, Harry"
Cc: "Wiles, Keith", "users@dpdk.org"
Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
References: <53D94D58-3FBF-4DFF-9DC4-70ABF4E1CAB2@intel.com> <305F974F-C9A3-4D54-85C2-6F503A98D890@intel.com>
List-Id: DPDK usage discussions

Thank you for your info, Harry.

Even if ASLR is the root cause, I don't think DPDK should expect users to
disable it to use the primary/secondary model. Is it possible for the DPDK
team to check this issue and fix the bug?

Regards,
Kai

On Mon, Mar 13, 2017 at 5:58 PM, Van Haaren, Harry <harry.van.haaren@intel.com> wrote:

> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Kai Zhang
> > Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
> >
> > Yes, my application is somewhat special and should run with the
> > primary/secondary mode. I will search for the way to turn off the random
> > page mapping and try it.
>
> You're searching for ASLR, or Address Space Layout Randomization.
>
> Some useful links regarding ASLR, DPDK and Linux:
> http://dpdk.readthedocs.io/en/v16.04/prog_guide/multi_proc_support.html#multi-process-limitations
> http://askubuntu.com/questions/318315/how-can-i-temporarily-disable-aslr-address-space-layout-randomization
> http://dpdk.org/ml/archives/dev/2015-June/019364.html
>
> Please note that ASLR is a security feature of the OS, think twice before
> disabling it.
>
> Hope that helps, -Harry
>
> > Thanks for your help :)
> >
> > Regards,
> > Kai
> >
> > On Mon, Mar 13, 2017 at 3:24 AM, Wiles, Keith wrote:
> >
> > > > On Mar 12, 2017, at 6:39 PM, Kai Zhang wrote:
> > > >
> > > > Your application may be attaching to the same port for each core.
> > > > Normally this means each core could be allocating memory and the 4th
> > > > core just goes over the amount of memory you have reserved.
> > > >
> > > > I don't think so. Because the error is in rte_eal_init(), which is
> > > > executed in the first line of the main() function. At that time, the
> > > > other threads are not even launched.
> > > >
> > > > Is it possible to consider this as a bug in DPDK?
> > >
> > > One more thing, I run Pktgen as two processes all of the time. The big
> > > difference is I do not run in primary and secondary modes. I run two
> > > different instances of pktgen at the same time without seeing this type
> > > of problem. If the failure is associated with the primary/secondary
> > > application model, then it could be a bug in that code, as a lot of
> > > syncing up between the two processes needs to be done because of
> > > memory/device sharing. One problem with P/S applications is that memory
> > > needs to be mapped at the same address in both processes, and Linux has
> > > random memory mapping built in for security reasons. I forget the name
> > > of the mode in Linux to turn off the random page mapping and Google is
> > > not working for me ATM.
> > >
> > > Does your application require running as a primary/secondary application?
> > > >
> > > > Regards,
> > > > Kai
> > > >
> > > > > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > > > > EAL: Error - exiting with code: 1
> > > > > Cause: Requested device 0000:02:00.0 cannot be used
> > > > >
> > > > > Regards,
> > > > > Kai
> > > > >
> > > > > On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang wrote:
> > > > >
> > > > > Command line:
> > > > > primary:   sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
> > > > > secondary: sudo ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
> > > > >
> > > > > The configurations are as follows:
> > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA], XL710 40GbE, bind 02:00.0, 2048 x 4k huge page
> > > > > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02) [<<- Only bind this one]
> > > > > 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
> > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > >
> > > > >          Socket 0
> > > > >          --------
> > > > > Core 0   [0, 12]
> > > > > Core 1   [1, 13]
> > > > > Core 2   [2, 14]
> > > > > Core 3   [3, 15]
> > > > > Core 4   [4, 16]
> > > > > Core 5   [5, 17]
> > > > > Core 8   [6, 18]
> > > > > Core 9   [7, 19]
> > > > > Core 10  [8, 20]
> > > > > Core 11  [9, 21]
> > > > > Core 12  [10, 22]
> > > > > Core 13  [11, 23]
> > > > >
> > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA], No Port Bind, 2048 x 4k huge page
> > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > >
> > > > >          Socket 0    Socket 1
> > > > >          --------    --------
> > > > > Core 0   [0, 20]     [10, 30]
> > > > > Core 1   [1, 21]     [11, 31]
> > > > > Core 2   [2, 22]     [12, 32]
> > > > > Core 3   [3, 23]     [13, 33]
> > > > > Core 4   [4, 24]     [14, 34]
> > > > > Core 8   [5, 25]     [15, 35]
> > > > > Core 9   [6, 26]     [16, 36]
> > > > > Core 10  [7, 27]     [17, 37]
> > > > > Core 11  [8, 28]     [18, 38]
> > > > > Core 12  [9, 29]     [19, 39]
> > > > >
> > > > > Ah, as machine B does not have a 40GbE NIC, I did not bind any NIC
> > > > > and ran my program with locally generated packets. But I am using
> > > > > other DPDK features, such as memory sharing and message passing.
> > > > > Maybe that is the reason it works correctly? I can only access
> > > > > machine B remotely, so I am unable to install a NIC on it. I have
> > > > > another PC that is used as a client that only has four cores, which
> > > > > also cannot be used for verification...
> > > > >
> > > > > Regards,
> > > > > Kai
> > > > >
> > > > > On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> > > > >
> > > > > > On Mar 11, 2017, at 9:45 AM, Kai Zhang wrote:
> > > > > >
> > > > > > Hi Keith,
> > > > > >
> > > > > > Thank you for your reply.
> > > > > >
> > > > > > I have tested my program on two machines
> > > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
> > > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
> > > > > >
> > > > > > I am very sure that the primary process uses different cores from
> > > > > > the secondary process. The strange thing is that my program works
> > > > > > correctly on machine B. But on machine A, the above issue happens
> > > > > > with more than 4 cores assigned to the secondary process.
> > > > > >
> > > > > > I have tried to assign cores 1-5 to the secondary process and also
> > > > > > tried other core assignment policies, but the error still happens
> > > > > > in rte_eal_init() with more than 4 cores.
> > > > >
> > > > > It would be nice to see both command lines. I am not sure I can help
> > > > > more; all I can do is suggest some ideas to look at.
> > > > >
> > > > > Does machine B have the same number and type of NICs? Use
> > > > > ‘lspci | grep Ethernet’ to get a list of all Ethernet devices on
> > > > > both machines.
> > > > >
> > > > > What is the number of hugepages you have allocated for both machines?
> > > > >
> > > > > Also look at the cpu_layout.py script to see why adding the 5th core
> > > > > would be different on the two machines and try to make them the same.
> > > > >
> > > > > > Regards,
> > > > > > Kai
> > > > > >
> > > > > > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
> > > > > >
> > > > > > > On Mar 10, 2017, at 9:35 PM, Kai Zhang wrote:
> > > > > > >
> > > > > > > Hi, there
> > > > > > >
> > > > > > > I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS 7.3.1611
> > > > > > > with Linux kernel version 3.8.0-30.
> > > > > > >
> > > > > > > I have a master process and a secondary process. When I run the
> > > > > > > secondary process with less than or equal to 4 cores, it works
> > > > > > > correctly. Such as:
> > > > > > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
> > > > > > > sudo ./program -c 0x0f -n 4 --proc-type=secondary
> > > > > > >
> > > > > > > However, there will be an error in rte_eal_init() if I assign
> > > > > > > more than 4 cores.
> > > > > > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
> > > > > > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
> > > > > > >
> > > > > > > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > > > > > > EAL: Error - exiting with code: 1
> > > > > > > Cause: Requested device 0000:02:00.0 cannot be used
> > > > > >
> > > > > > I assume you have at least 8 cores. Have you tried -l 1-5 on the
> > > > > > secondary process?
> > > > > >
> > > > > > You did not show the primary process command line, but if you use
> > > > > > 1-5 then you can only give the primary process -l 6-7, or two
> > > > > > cores. It is always reasonable to leave core zero for Linux to use.
> > > > > >
> > > > > > Also it could be you ran out of memory or hugepages you allocated
> > > > > > to the system.
> > > > > >
> > > > > > > Does anyone know why this happens?
> > > > > > >
> > > > > > > Thanks a lot,
> > > > > > > Kai Zhang
> > > > > >
> > > > > > Regards,
> > > > > > Keith
> > > > >
> > > > > Regards,
> > > > > Keith
> > > >
> > > > Regards,
> > > > Keith
> > >
> > > Regards,
> > > Keith
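
For anyone reproducing the problem discussed in this thread, here is a minimal Python sketch (not part of the original messages) of two checks that come up above. It reads the kernel ASLR setting from /proc/sys/kernel/randomize_va_space, and it converts between the EAL -c hex core mask and the -l core list. The function names are hypothetical helpers invented for illustration; only the /proc path and the meaning of -c and -l are taken as given.

```python
from pathlib import Path
from typing import Iterable, List

# Linux exposes the ASLR mode here; 0, 1 and 2 are the documented values.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

ASLR_MODES = {
    0: "disabled",
    1: "conservative randomization (stack, mmap base, VDSO)",
    2: "full randomization (mode 1 plus heap/brk)",
}


def describe_aslr(raw: str) -> str:
    """Turn the raw sysctl text (e.g. '2\\n') into a readable description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, "unknown mode {}".format(value))


def coremask_to_cores(mask: str) -> List[int]:
    """Expand a hex core mask as passed to -c into a sorted list of core ids."""
    value = int(mask, 16)  # accepts both '1f' and '0x1f'
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def cores_to_coremask(cores: Iterable[int]) -> str:
    """Build the hex core mask that selects exactly the given core ids."""
    value = 0
    for core in set(cores):
        value |= 1 << core
    return hex(value)


if __name__ == "__main__":
    try:
        print("ASLR:", describe_aslr(ASLR_SYSCTL.read_text()))
    except OSError as exc:  # not Linux, or /proc is not readable
        print("could not read", ASLR_SYSCTL, "-", exc)
    print("-c 0x1f selects cores", coremask_to_cores("0x1f"))
    print("-l 4,5,6,7 corresponds to -c", cores_to_coremask([4, 5, 6, 7]))
```

One thing this makes visible: -c 0x0f selects cores 0-3, while -l 4,5,6,7 corresponds to -c 0xf0, so the two "working" command lines in the first message do not pin the process to the same cores.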