DPDK usage discussions
* [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
@ 2017-03-11  3:35 Kai Zhang
  2017-03-11 14:52 ` Wiles, Keith
  0 siblings, 1 reply; 16+ messages in thread
From: Kai Zhang @ 2017-03-11  3:35 UTC (permalink / raw)
  To: users

Hi, there

I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS 7.3.1611 with Linux
kernel version 3.8.0-30.

I have a primary process and a secondary process. When I run the secondary
process with 4 or fewer cores, it works correctly. For example:
sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
sudo ./program -c 0x0f -n 4 --proc-type=secondary

However, rte_eal_init() fails with an error if I assign more than 4
cores:
sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
sudo ./program -c 0x1f -n 4 --proc-type=secondary

EAL: Cannot mmap device resource file
/sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
EAL: Error - exiting with code: 1
  Cause: Requested device 0000:02:00.0 cannot be used
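
As an aside on the commands above: -l takes a list of lcore ids and -c takes a hexadecimal bitmask with one bit per lcore. The short Python sketch below shows the conversion in both directions; the function names are made up for illustration and are not part of DPDK. Note that -l 4,5,6,7 and -c 0x0f both select four cores but not the same four (0x0f is lcores 0-3), while -l 0,1,2,3,4 and -c 0x1f are equivalent.

```python
def coremask_from_lcores(lcores):
    """Build the integer bitmask (as used by -c) from a list of lcore ids."""
    mask = 0
    for lcore in lcores:
        mask |= 1 << lcore  # bit N set means lcore N is selected
    return mask


def lcores_from_coremask(mask):
    """List the lcore ids (as used by -l) selected by an integer bitmask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    print(hex(coremask_from_lcores([0, 1, 2, 3, 4])))
    print(lcores_from_coremask(0x0F))
    print(hex(coremask_from_lcores([4, 5, 6, 7])))
```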

Does anyone know why this happens?

Thanks a lot,
Kai Zhang

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-11  3:35 [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file Kai Zhang
@ 2017-03-11 14:52 ` Wiles, Keith
  2017-03-11 15:45   ` Kai Zhang
  0 siblings, 1 reply; 16+ messages in thread
From: Wiles, Keith @ 2017-03-11 14:52 UTC (permalink / raw)
  To: Kai Zhang; +Cc: users


> On Mar 10, 2017, at 9:35 PM, Kai Zhang <kay21s@gmail.com> wrote:
> 
> Hi, there
> 
> I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS 7.3.1611 with Linux
> kernel version 3.8.0-30.
> 
> I have a master process and a secondary process. When I run the secondary
> process with less than or equal to 4 cores, it works correctly. Such as:
> sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
> sudo ./program -c 0x0f -n 4 --proc-type=secondary
> 
> However, there will be error in the rte_eal_init if I assign more than 4
> cores.
> sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
> sudo ./program -c 0x1f -n 4 --proc-type=secondary
> 
> EAL: Cannot mmap device resource file
> /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> EAL: Error - exiting with code: 1
>  Cause: Requested device 0000:02:00.0 cannot be used

I assume you have at least 8 cores. Have you tried -l 1-5 on the secondary process?

You did not show the primary process command line, but if you use 1-5 for the secondary then you can only give the primary process -l 6-7, i.e. two cores. It is always reasonable to leave core zero for Linux to use.

Also it could be you ran out of memory or hugepages you allocated to the system.
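
One way to check the hugepage suggestion above is to read the hugepage counters from /proc/meminfo. The Python sketch below is only illustrative: the function names are invented here, and it assumes the usual Linux field names (HugePages_Total, HugePages_Free, Hugepagesize).

```python
def hugepage_summary(meminfo_text):
    """Extract the hugepage-related counters from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("HugePages_") or key == "Hugepagesize":
            # Values look like "2048" or "2048 kB"; keep the number only.
            fields[key] = int(value.split()[0])
    return fields


def read_hugepage_summary(path="/proc/meminfo"):
    """Read the counters from the running system (Linux only)."""
    with open(path) as f:
        return hugepage_summary(f.read())
```

Comparing HugePages_Free before and after starting the primary process, on both machines, would show whether the secondary process has any hugepage memory left to work with.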

> 
> Anyone knows why this happens?
> 
> Thanks a lot,
> Kai Zhang

Regards,
Keith

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-11 14:52 ` Wiles, Keith
@ 2017-03-11 15:45   ` Kai Zhang
  2017-03-11 18:59     ` Wiles, Keith
  0 siblings, 1 reply; 16+ messages in thread
From: Kai Zhang @ 2017-03-11 15:45 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: users

Hi Keith,

Thank you for your reply.

I have tested my program on two machines
A) 1 x Intel E5-2650 v4, 12 cores [UMA]
B) 2 x Intel E5-2640 v4, 10 cores [NUMA]

I am very sure that the primary process uses different cores from the
secondary process. The strange thing is that my program works correctly on
machine B. But on machine A, the above issue happens with more than 4
cores assigned to the secondary process.

I have tried assigning cores 1-5 to the secondary process and also
tried other core assignment policies, but the error still happens in
rte_eal_init() with more than 4 cores.

Regards,
Kai


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-11 15:45   ` Kai Zhang
@ 2017-03-11 18:59     ` Wiles, Keith
  2017-03-12  3:21       ` Kai Zhang
  0 siblings, 1 reply; 16+ messages in thread
From: Wiles, Keith @ 2017-03-11 18:59 UTC (permalink / raw)
  To: Kai Zhang; +Cc: users


> On Mar 11, 2017, at 9:45 AM, Kai Zhang <kay21s@gmail.com> wrote:
> 
> Hi Keith,
> 
> Thank you for your reply.
> 
> I have tested my program on two machines
> A) 1 x Intel E5-2650 v4, 12 cores [UMA]
> B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
> 
> I am very sure that the primary process uses different cores with the secondary process. The strange thing is that my program works correctly on machine B. But on machine A, the above issue happens with more than 4 cores assigned to the secondary process.
> 
> I have tried to assign cores 1-5  to the secondary process and also tried other core assignment policies, but the error still happens rte_eal_init() with more than 4 cores.

It would be nice to see both command lines. I am not sure I can help more; all I can do is suggest some ideas to look at.

Does machine B have the same number and type of NICs? Use ‘lspci | grep Ethernet’ to get a list of all Ethernet devices on both machines.

How many hugepages have you allocated on each machine?

Also look at the cpu_layout.py script to see why adding the 5th core would be different on the two machines and try to make them the same. 
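
For readers without the script at hand, the kind of information it reports can be approximated as follows. This is a rough Python sketch, not the actual cpu_layout.py: it assumes the Linux sysfs files physical_package_id and core_id are present under /sys/devices/system/cpu/cpuN/topology, and the function names are invented for illustration.

```python
import os
from collections import defaultdict


def read_topology(sysfs="/sys/devices/system/cpu"):
    """Return {lcore: (socket_id, core_id)} read from Linux sysfs."""
    topo = {}
    for name in os.listdir(sysfs):
        if not (name.startswith("cpu") and name[3:].isdigit()):
            continue  # skip entries such as "cpufreq" or "online"
        base = os.path.join(sysfs, name, "topology")
        try:
            with open(os.path.join(base, "physical_package_id")) as f:
                socket_id = int(f.read())
            with open(os.path.join(base, "core_id")) as f:
                core_id = int(f.read())
        except OSError:
            continue  # offline CPU or no topology information
        topo[int(name[3:])] = (socket_id, core_id)
    return topo


def group_by_physical_core(topo):
    """Group lcores that share a physical core (hyper-thread siblings)."""
    groups = defaultdict(list)
    for lcore, key in sorted(topo.items()):
        groups[key].append(lcore)
    return dict(groups)
```

Running group_by_physical_core(read_topology()) on both machines gives a layout that can be compared side by side.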


Regards,
Keith


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-11 18:59     ` Wiles, Keith
@ 2017-03-12  3:21       ` Kai Zhang
  2017-03-12  3:29         ` Kai Zhang
  0 siblings, 1 reply; 16+ messages in thread
From: Kai Zhang @ 2017-03-12  3:21 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: users

Command line:
primary:      sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
secondary: sudo ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
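
A small check that the two -l lists above do not share any lcore can be sketched like this in Python. The helper names are hypothetical, and the parser only handles plain comma lists and a-b ranges, not the full syntax the EAL accepts.

```python
def parse_corelist(spec):
    """Parse a simple core list such as '0,1,2,3' or '1-5' into sorted ids."""
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            cores.update(range(int(low), int(high) + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def shared_lcores(primary_spec, secondary_spec):
    """Return the lcores that appear in both core lists."""
    primary = set(parse_corelist(primary_spec))
    secondary = set(parse_corelist(secondary_spec))
    return sorted(primary & secondary)


if __name__ == "__main__":
    # The two command lines from this message: no lcore is shared.
    print(shared_lcores("0,1,2,3", "4,5,6,7,8"))
```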

The configurations are as follows:
A) 1 x Intel E5-2650 v4, 12 cores [UMA], XL710 40GbE, bind 02:00.0, 2048 x 4k huge page
02:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)   [<<- Only bind this one]
02:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
        Socket 0
        --------
Core 0  [0, 12]
Core 1  [1, 13]
Core 2  [2, 14]
Core 3  [3, 15]
Core 4  [4, 16]
Core 5  [5, 17]
Core 8  [6, 18]
Core 9  [7, 19]
Core 10 [8, 20]
Core 11 [9, 21]
Core 12 [10, 22]
Core 13 [11, 23]

B) 2 x Intel E5-2640 v4, 10 cores [NUMA], No Port Bind, 2048 x 4k huge page
05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
        Socket 0        Socket 1
        --------        --------
Core 0  [0, 20]         [10, 30]
Core 1  [1, 21]         [11, 31]
Core 2  [2, 22]         [12, 32]
Core 3  [3, 23]         [13, 33]
Core 4  [4, 24]         [14, 34]
Core 8  [5, 25]         [15, 35]
Core 9  [6, 26]         [16, 36]
Core 10 [7, 27]         [17, 37]
Core 11 [8, 28]         [18, 38]
Core 12 [9, 29]         [19, 39]

Ah, as machine B does not have a 40GbE NIC, I did not bind any NIC and ran
my program with locally generated packets. But I am still using other DPDK
features, such as memory sharing and message passing. Maybe that is the
reason it works correctly? I can only access machine B remotely, so I am
unable to install a NIC in it. I have another PC that is used as a client,
but it has only four cores, so it cannot be used for verification either...

Regards,
Kai



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-12  3:21       ` Kai Zhang
@ 2017-03-12  3:29         ` Kai Zhang
  2017-03-12 10:32           ` Wiles, Keith
  0 siblings, 1 reply; 16+ messages in thread
From: Kai Zhang @ 2017-03-12  3:29 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: users

Update:

After unbinding the ports on machine A, the program passes rte_eal_init()
and works correctly with the primary process for any number of cores.

From the error message when binding the port, I think there are some
resource allocation issues with the bound port. But why is it related to
the number of cores ...

EAL: Cannot mmap device resource file
/sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
EAL: Error - exiting with code: 1
  Cause: Requested device 0000:02:00.0 cannot be used

Regards,
Kai


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-12  3:29         ` Kai Zhang
@ 2017-03-12 10:32           ` Wiles, Keith
  2017-03-12 10:39             ` Kai Zhang
  0 siblings, 1 reply; 16+ messages in thread
From: Wiles, Keith @ 2017-03-12 10:32 UTC (permalink / raw)
  To: Kai Zhang; +Cc: users


> On Mar 12, 2017, at 11:29 AM, Kai Zhang <kay21s@gmail.com> wrote:
> 
> Update: 
> 
> After unbinding the ports on machine A, the program passes rte_eal_init() and works correctly with the primary process for any number of cores.
> 
> From the error message when binding the port, I think there are some resource allocation issues with the bound port. But why is it related with the number of cores … 

Your application may be attaching to the same port from each core. Normally this means that each core could be allocating memory, and the 4th core just goes over the amount of memory you have reserved.


Regards,
Keith


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-12 10:32           ` Wiles, Keith
@ 2017-03-12 10:39             ` Kai Zhang
  2017-03-12 18:55               ` Wiles, Keith
  2017-03-12 19:24               ` Wiles, Keith
  0 siblings, 2 replies; 16+ messages in thread
From: Kai Zhang @ 2017-03-12 10:39 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: users

> Your application may be attaching to the same port for each core. Normally
> this means the each core could be allocating memory and the 4th core just
> goes over the amount of memory you have reserved.

I don't think so. The error occurs in rte_eal_init(), which is called on
the first line of the main() function. At that time, the other threads
have not even been launched.

Is it possible to consider this as a bug in DPDK?

Regards,
Kai





* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-12 10:39             ` Kai Zhang
@ 2017-03-12 18:55               ` Wiles, Keith
  2017-03-12 19:24               ` Wiles, Keith
  1 sibling, 0 replies; 16+ messages in thread
From: Wiles, Keith @ 2017-03-12 18:55 UTC (permalink / raw)
  To: Kai Zhang; +Cc: users


> On Mar 12, 2017, at 6:39 PM, Kai Zhang <kay21s@gmail.com> wrote:
> 
> 
> Your application may be attaching to the same port for each core. Normally this means the each core could be allocating memory and the 4th core just goes over the amount of memory you have reserved.
> 
> I don't think so. Because the error is in the rte_eal_init(), which is executed in the first line of the main() function. At the time, the other threads are not even launched.
> 
> Is it possible to consider this as a bug in DPDK?

It is possible that it is a bug, but we need to be able to reproduce it. Can you reproduce this problem with any of the DPDK example apps? If not, then we need to work out a set of steps or code that produces the failure.
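
A reproduction attempt with a stock example could start from something like the sketch below. It only prints candidate command lines for a secondary process with a growing number of cores; it does not run anything. The helper names `core_list` and `core_mask` are made up for illustration, and `./build/simple_mp` is an assumed path to a built copy of the DPDK multi-process example, which may differ on a given system.

```shell
#!/bin/sh
# Hypothetical helpers: build EAL core arguments for a run of consecutive cores.

# core_list START COUNT -> comma-separated list for -l, e.g. "4,5,6,7,8"
core_list() {
    start=$1; count=$2; list=""; i=0
    while [ "$i" -lt "$count" ]; do
        list="${list:+$list,}$((start + i))"
        i=$((i + 1))
    done
    echo "$list"
}

# core_mask START COUNT -> hex mask for -c, e.g. "0x1f" for cores 0-4
core_mask() {
    printf '0x%x\n' $(( ((1 << $2) - 1) << $1 ))
}

# Print (do not run) secondary command lines for 1..6 cores starting at core 4.
# ./build/simple_mp is an assumed path, not taken from the thread.
n=1
while [ "$n" -le 6 ]; do
    echo "sudo ./build/simple_mp -l $(core_list 4 "$n") -n 4 --proc-type=secondary"
    n=$((n + 1))
done
```

If the failure shows up at the same core count with an example app, that would point at EAL rather than at the application.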

> 
> Regards,
> Kai
> 
>  
> >
> > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > EAL: Error - exiting with code: 1
> >   Cause: Requested device 0000:02:00.0 cannot be used
> >
> > Regards,
> > Kai
> >
> > On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang <kay21s@gmail.com> wrote:
> >
> > Command line:
> > primary:      sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
> > secondary: sudo ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
> >
> > The configurations are as follows:
> > A) 1 x Intel E5-2650 v4, 12 cores [UMA],     XL710 40GbE, bind 02:00.0,    2048 x 4k huge page
> > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)   [<<- Only bind this one]
> > 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
> > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> >         Socket 0
> > --------
> > Core 0  [0, 12]
> > Core 1  [1, 13]
> > Core 2  [2, 14]
> > Core 3  [3, 15]
> > Core 4  [4, 16]
> > Core 5  [5, 17]
> > Core 8  [6, 18]
> > Core 9  [7, 19]
> > Core 10 [8, 20]
> > Core 11 [9, 21]
> > Core 12 [10, 22]
> > Core 13 [11, 23]
> >
> > B) 2 x Intel E5-2640 v4, 10 cores [NUMA],      No Port Bind,    2048 x 4k huge page
> > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> >         Socket 0        Socket 1
> >         --------        --------
> > Core 0  [0, 20]         [10, 30]
> > Core 1  [1, 21]         [11, 31]
> > Core 2  [2, 22]         [12, 32]
> > Core 3  [3, 23]         [13, 33]
> > Core 4  [4, 24]         [14, 34]
> > Core 8  [5, 25]         [15, 35]
> > Core 9  [6, 26]         [16, 36]
> > Core 10 [7, 27]         [17, 37]
> > Core 11 [8, 28]         [18, 38]
> > Core 12 [9, 29]         [19, 39]
> >
> > Ah, as machine B does not have a 40GbE, I did not bind any NIC and run my program with locally generated packets. But I am using other DPDK features, such as memory sharing and message passing. Maybe that is the reason it works correctly? I can only access machine B remotely, so I am unable to install a NIC on it. I have another PC that is used as a client that only has four cores, which also cannot be used for verification...
> >
> > Regards,
> > Kai
> >
> >
> > On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> >
> > > On Mar 11, 2017, at 9:45 AM, Kai Zhang <kay21s@gmail.com> wrote:
> > >
> > > Hi Keith,
> > >
> > > Thank you for your reply.
> > >
> > > I have tested my program on two machines
> > > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
> > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
> > >
> > > I am very sure that the primary process uses different cores with the secondary process. The strange thing is that my program works correctly on machine B. But on machine A, the above issue happens with more than 4 cores assigned to the secondary process.
> > >
> > > I have tried to assign cores 1-5  to the secondary process and also tried other core assignment policies, but the error still happens rte_eal_init() with more than 4 cores.
> >
> > It would be nice to see both command lines. I am not sure I can help more all I can do is suggest some ideas to look at.
> >
> > Does machine B have the same number and type of NICs? Use ‘lspci | grep Ethernet’ to get a list of all Ethernet devices on both machines.
> >
> > What is the number of hugepages you have allocated for both machines.
> >
> > Also look at the cpu_layout.py script to see why adding the 5th core would be different on the two machines and try to make them the same.
> >
> > >
> > > Regards,
> > > Kai
> > >
> > > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
> > >
> > > > On Mar 10, 2017, at 9:35 PM, Kai Zhang <kay21s@gmail.com> wrote:
> > > >
> > > > Hi, there
> > > >
> > > > I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS 7.3.1611 with Linux
> > > > kernel version 3.8.0-30.
> > > >
> > > > I have a master process and a secondary process. When I run the secondary
> > > > process with less than or equal to 4 cores, it works correctly. Such as:
> > > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
> > > > sudo ./program -c 0x0f -n 4 --proc-type=secondary
> > > >
> > > > However, there will be error in the rte_eal_init if I assign more than 4
> > > > cores.
> > > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
> > > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
> > > >
> > > > EAL: Cannot mmap device resource file
> > > > /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > > > EAL: Error - exiting with code: 1
> > > >  Cause: Requested device 0000:02:00.0 cannot be used
> > >
> > > I assume you have at least 8 cores. Have you tried -l 1-5 on the secondary process.
> > >
> > > You did not show the primary process command line, but the if you use 1-5 then you can only give primary process -l 6-7 or two cores. It is always a reasonable thing is to leave core zero for linux to use.
> > >
> > > Also it could be you ran out of memory or hugepages you allocated to the system.
> > >
> > > >
> > > > Anyone knows why this happens?
> > > >
> > > > Thanks a lot,
> > > > Kai Zhang
> > >
> > > Regards,
> > > Keith
> > >
> > >
> >
> > Regards,
> > Keith
> >
> >
> >
> 
> Regards,
> Keith

Regards,
Keith



* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-12 10:39             ` Kai Zhang
  2017-03-12 18:55               ` Wiles, Keith
@ 2017-03-12 19:24               ` Wiles, Keith
  2017-03-12 23:44                 ` Kai Zhang
  1 sibling, 1 reply; 16+ messages in thread
From: Wiles, Keith @ 2017-03-12 19:24 UTC (permalink / raw)
  To: Kai Zhang; +Cc: users


> On Mar 12, 2017, at 6:39 PM, Kai Zhang <kay21s@gmail.com> wrote:
> 
> 
> Your application may be attaching to the same port for each core. Normally this means the each core could be allocating memory and the 4th core just goes over the amount of memory you have reserved.
> 
> I don't think so. Because the error is in the rte_eal_init(), which is executed in the first line of the main() function. At the time, the other threads are not even launched.
> 
> Is it possible to consider this as a bug in DPDK?

One more thing: I run Pktgen as two processes all of the time. The big difference is that I do not run in primary and secondary modes. I run two different instances of pktgen at the same time without seeing this type of problem. If the failure is associated with the primary/secondary application model, then it could be a bug in that code, as a lot of syncing up between the two processes needs to be done because of memory/device sharing. One problem with P/S applications is that memory needs to be mapped at the same address in both processes, and Linux has random memory mapping built in for security reasons. I forget the name of the mode in Linux to turn off the random page mapping, and Google is not working for me ATM.

Does your application require running as a primary/secondary application?
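
Since the secondary has to map the device resource at the same virtual address the primary used, one diagnostic is to check whether that address is already occupied in a process's address space. The sketch below is only an illustration under that assumption: `addr_in_maps` is a hypothetical helper that reads text in `/proc/<pid>/maps` format on stdin and prints the mapping that contains a given address, if any.

```shell
#!/bin/sh
# Hypothetical helper: addr_in_maps ADDR < /proc/<pid>/maps
# Prints the mapping that contains ADDR and returns 0, or returns 1 if none does.
addr_in_maps() {
    addr=$(( $1 ))
    # Each maps line starts with "start-end perms ..."; split on '-' and spaces.
    while IFS=' -' read -r start end rest; do
        # Skip kernel-range entries (e.g. vsyscall) that overflow signed 64-bit math.
        case "$start" in ffffffff*) continue ;; esac
        if [ "$addr" -ge $((0x$start)) ] && [ "$addr" -lt $((0x$end)) ]; then
            echo "$start-$end $rest"
            return 0
        fi
    done
    return 1
}

# Example: look up the address from the EAL error message in this shell's own map.
addr_in_maps 0x7fff65bfc000 < /proc/self/maps || echo "address is free in this process"
```

Addresses differ from run to run when address randomization is active, so a single lookup like this is a hint rather than proof.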

> 
> Regards,
> Kai
> 

Regards,
Keith



* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-12 19:24               ` Wiles, Keith
@ 2017-03-12 23:44                 ` Kai Zhang
  2017-03-13  9:58                   ` Van Haaren, Harry
  0 siblings, 1 reply; 16+ messages in thread
From: Kai Zhang @ 2017-03-12 23:44 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: users

Yes, my application is somewhat special and should run in
primary/secondary mode. I will search for the way to turn off the random
page mapping and try it. Thanks for your help :)

Regards,
Kai

On Mon, Mar 13, 2017 at 3:24 AM, Wiles, Keith <keith.wiles@intel.com> wrote:

>
> > On Mar 12, 2017, at 6:39 PM, Kai Zhang <kay21s@gmail.com> wrote:
> >
> >
> > Your application may be attaching to the same port for each core.
> Normally this means the each core could be allocating memory and the 4th
> core just goes over the amount of memory you have reserved.
> >
> > I don't think so. Because the error is in the rte_eal_init(), which is
> executed in the first line of the main() function. At the time, the other
> threads are not even launched.
> >
> > Is it possible to consider this as a bug in DPDK?
>
> One more thing: I run Pktgen as two processes all of the time. The big
> difference is that I do not run in primary and secondary modes. I run two
> different instances of pktgen at the same time without seeing this type of
> problem. If the failure is associated with the primary/secondary application
> model, then it could be a bug in that code, as a lot of syncing up between
> the two processes needs to be done because of memory/device sharing. One
> problem with P/S applications is that memory needs to be mapped at the same
> address in both processes, and Linux has random memory mapping built in
> for security reasons. I forget the name of the mode in Linux to turn off
> the random page mapping, and Google is not working for me ATM.
>
> Does your application require running as a primary/secondary application?
>
> Regards,
> Keith
>
>


* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-12 23:44                 ` Kai Zhang
@ 2017-03-13  9:58                   ` Van Haaren, Harry
  2017-03-13 10:59                     ` Kai Zhang
  0 siblings, 1 reply; 16+ messages in thread
From: Van Haaren, Harry @ 2017-03-13  9:58 UTC (permalink / raw)
  To: Kai Zhang, Wiles, Keith; +Cc: users

> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Kai Zhang
> Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource
> file
> 
> Yes, my application is somewhat special and should run in
> primary/secondary mode. I will search for the way to turn off the random
> page mapping and try it. 


You're searching for ASLR, or Address Space Layout Randomization.

Some useful links regarding ASLR, DPDK and Linux:
http://dpdk.readthedocs.io/en/v16.04/prog_guide/multi_proc_support.html#multi-process-limitations
http://askubuntu.com/questions/318315/how-can-i-temporarily-disable-aslr-address-space-layout-randomization
http://dpdk.org/ml/archives/dev/2015-June/019364.html

Please note that ASLR is a security feature of the OS, think twice before disabling it.
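
For reference, a minimal sketch of how the current ASLR setting can be inspected on Linux. The function names `aslr_describe` and `aslr_status` are made up for illustration. The commands that actually change behaviour are left as comments on purpose, and the binary names in them are the ones from earlier in this thread.

```shell
#!/bin/sh
# Map a kernel.randomize_va_space value to a human-readable description.
aslr_describe() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "partial (stack, mmap base, vDSO)" ;;
        2) echo "full (also heap/brk)" ;;
        *) echo "unknown" ;;
    esac
}

# Report the current system-wide setting (read-only, no root needed).
aslr_status() {
    aslr_describe "$(cat /proc/sys/kernel/randomize_va_space 2>/dev/null)"
}

echo "ASLR: $(aslr_status)"

# Possible experiments, not executed here:
#   System-wide, until reboot or until set back:
#     sudo sysctl -w kernel.randomize_va_space=0
#   Per process only, leaving the rest of the system untouched:
#     sudo setarch "$(uname -m)" -R ./primary   -l 0,1,2,3   -n 4 --proc-type=primary
#     sudo setarch "$(uname -m)" -R ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
```

The `setarch -R` form limits the change to the launched processes, which keeps the security trade-off narrower than the system-wide sysctl.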


Hope that helps, -Harry


> Thanks for your help :)
> 
> Regards,
> Kai
> 


* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-13  9:58                   ` Van Haaren, Harry
@ 2017-03-13 10:59                     ` Kai Zhang
  0 siblings, 0 replies; 16+ messages in thread
From: Kai Zhang @ 2017-03-13 10:59 UTC (permalink / raw)
  To: Van Haaren, Harry; +Cc: Wiles, Keith, users

Thank you for your info, Harry.

Even if ASLR is the root cause, I don't think DPDK should expect users
to disable it in order to use the primary/secondary model. Could the
DPDK team look into this issue and fix the bug?

Regards,
Kai
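
For reference, a minimal sketch of how the current ASLR setting could be
checked before starting the primary and secondary processes. It assumes a
Linux host that exposes /proc/sys/kernel/randomize_va_space (0 = disabled,
1 = stack, mmap base and VDSO randomized, 2 = heap randomized as well). The
function names are illustrative only and are not part of DPDK.

```python
from pathlib import Path

# Meanings of kernel.randomize_va_space on Linux.
ASLR_MODES = {
    0: "disabled",
    1: "partial (stack, mmap base and VDSO randomized)",
    2: "full (as 1, plus heap/brk randomized)",
}


def describe_aslr(raw):
    """Translate the raw sysctl file content into a readable description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown value {value}")


def read_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Return the current ASLR mode, or a note if the file cannot be read."""
    try:
        return describe_aslr(Path(path).read_text())
    except OSError as exc:
        return f"unavailable ({exc.strerror})"


if __name__ == "__main__":
    print("ASLR:", read_aslr())
```

This only reads the setting. Changing it still requires root and carries
the security trade-off mentioned in this thread.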

On Mon, Mar 13, 2017 at 5:58 PM, Van Haaren, Harry <
harry.van.haaren@intel.com> wrote:

> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Kai Zhang
> > Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap
> device resource
> > file
> >
> > Yes, my application is somewhat special and should run with the
> > primary/secondary mode. I will search for a way to turn off the random
> > page mapping and try it.
>
>
> You're searching for ASLR, or Address Space Layout Randomization.
>
> Some useful links regarding ASLR, DPDK and Linux:
> http://dpdk.readthedocs.io/en/v16.04/prog_guide/multi_proc_support.html#multi-process-limitations
> http://askubuntu.com/questions/318315/how-can-i-temporarily-disable-aslr-address-space-layout-randomization
> http://dpdk.org/ml/archives/dev/2015-June/019364.html
>
> Please note that ASLR is a security feature of the OS, think twice before
> disabling it.
>
>
> Hope that helps, -Harry
>
>
> > Thanks for your help :)
> >
> > Regards,
> > Kai
> >
> > On Mon, Mar 13, 2017 at 3:24 AM, Wiles, Keith <keith.wiles@intel.com>
> wrote:
> >
> > >
> > > > On Mar 12, 2017, at 6:39 PM, Kai Zhang <kay21s@gmail.com> wrote:
> > > >
> > > >
> > > > Your application may be attaching to the same port for each core.
> > > > Normally this means that each core could be allocating memory and the
> > > > 4th core just goes over the amount of memory you have reserved.
> > > >
> > > > I don't think so, because the error is in rte_eal_init(), which is
> > > > executed in the first line of the main() function. At that time, the
> > > > other threads are not even launched.
> > > >
> > > > Is it possible to consider this as a bug in DPDK?
> > >
> > > One more thing, I run Pktgen as two processes all of the time. The big
> > > difference is I do not run in primary and secondary modes. I run two
> > > different instances of pktgen at the same time without seeing this type
> > > of problem. If the failure is associated with the primary/secondary
> > > application model, then it could be a bug in that code, as a lot of
> > > syncing up between the two processes needs to be done because of
> > > memory/device sharing. One problem with P/S applications is that memory
> > > needs to be mapped at the same address in both processes, and Linux has
> > > random memory mapping built in for security reasons. I forget the name
> > > of the mode in Linux to turn off the random page mapping and Google is
> > > not working for me ATM.
> > >
> > > Does your application require running as a primary/secondary application?
> > >
> > > >
> > > > Regards,
> > > > Kai
> > > >
> > > >
> > > > >
> > > > > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > > > > EAL: Error - exiting with code: 1
> > > > >   Cause: Requested device 0000:02:00.0 cannot be used
> > > > >
> > > > > Regards,
> > > > > Kai
> > > > >
> > > > > On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang <kay21s@gmail.com>
> wrote:
> > > > >
> > > > > Command line:
> > > > > primary:      sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
> > > > > secondary: sudo ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
> > > > >
> > > > > The configurations are as follows:
> > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA],     XL710 40GbE, bind 02:00.0,    2048 x 4k huge page
> > > > > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)   [<<- Only bind this one]
> > > > > 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
> > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > >         Socket 0
> > > > >         --------
> > > > > Core 0  [0, 12]
> > > > > Core 1  [1, 13]
> > > > > Core 2  [2, 14]
> > > > > Core 3  [3, 15]
> > > > > Core 4  [4, 16]
> > > > > Core 5  [5, 17]
> > > > > Core 8  [6, 18]
> > > > > Core 9  [7, 19]
> > > > > Core 10 [8, 20]
> > > > > Core 11 [9, 21]
> > > > > Core 12 [10, 22]
> > > > > Core 13 [11, 23]
> > > > >
> > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA],      No Port Bind,    2048 x 4k huge page
> > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > > > >         Socket 0        Socket 1
> > > > >         --------        --------
> > > > > Core 0  [0, 20]         [10, 30]
> > > > > Core 1  [1, 21]         [11, 31]
> > > > > Core 2  [2, 22]         [12, 32]
> > > > > Core 3  [3, 23]         [13, 33]
> > > > > Core 4  [4, 24]         [14, 34]
> > > > > Core 8  [5, 25]         [15, 35]
> > > > > Core 9  [6, 26]         [16, 36]
> > > > > Core 10 [7, 27]         [17, 37]
> > > > > Core 11 [8, 28]         [18, 38]
> > > > > Core 12 [9, 29]         [19, 39]
> > > > >
> > > > > Ah, as machine B does not have a 40GbE NIC, I did not bind any NIC and
> > > > > ran my program with locally generated packets. But I am using other
> > > > > DPDK features, such as memory sharing and message passing. Maybe that
> > > > > is the reason it works correctly? I can only access machine B
> > > > > remotely, so I am unable to install a NIC on it. I have another PC
> > > > > that is used as a client that only has four cores, which also cannot
> > > > > be used for verification...
> > > > >
> > > > > Regards,
> > > > > Kai
> > > > >
> > > > >
> > > > > On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <
> keith.wiles@intel.com>
> > > wrote:
> > > > >
> > > > > > On Mar 11, 2017, at 9:45 AM, Kai Zhang <kay21s@gmail.com> wrote:
> > > > > >
> > > > > > Hi Keith,
> > > > > >
> > > > > > Thank you for your reply.
> > > > > >
> > > > > > I have tested my program on two machines
> > > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
> > > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
> > > > > >
> > > > > > I am very sure that the primary process uses different cores from
> > > > > > the secondary process. The strange thing is that my program works
> > > > > > correctly on machine B. But on machine A, the above issue happens
> > > > > > with more than 4 cores assigned to the secondary process.
> > > > > >
> > > > > > I have tried to assign cores 1-5 to the secondary process and also
> > > > > > tried other core assignment policies, but the error still happens
> > > > > > in rte_eal_init() with more than 4 cores.
> > > > >
> > > > > It would be nice to see both command lines. I am not sure I can help
> > > > > more; all I can do is suggest some ideas to look at.
> > > > >
> > > > > Does machine B have the same number and type of NICs? Use ‘lspci |
> > > > > grep Ethernet’ to get a list of all Ethernet devices on both machines.
> > > > >
> > > > > What is the number of hugepages you have allocated on both machines?
> > > > >
> > > > > Also look at the cpu_layout.py script to see why adding the 5th core
> > > > > would be different on the two machines and try to make them the same.
> > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Kai
> > > > > >
> > > > > > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <
> > > keith.wiles@intel.com> wrote:
> > > > > >
> > > > > > > On Mar 10, 2017, at 9:35 PM, Kai Zhang <kay21s@gmail.com>
> wrote:
> > > > > > >
> > > > > > > Hi, there
> > > > > > >
> > > > > > > I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS 7.3.1611
> with
> > > Linux
> > > > > > > kernel version 3.8.0-30.
> > > > > > >
> > > > > > > I have a master process and a secondary process. When I run the
> > > secondary
> > > > > > > process with less than or equal to 4 cores, it works correctly.
> > > Such as:
> > > > > > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
> > > > > > > sudo ./program -c 0x0f -n 4 --proc-type=secondary
> > > > > > >
> > > > > > > However, there will be error in the rte_eal_init if I assign
> more
> > > than 4
> > > > > > > cores.
> > > > > > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
> > > > > > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
> > > > > > >
> > > > > > > EAL: Cannot mmap device resource file
> > > > > > > /sys/bus/pci/devices/0000:02:00.0/resource0 to address:
> > > 0x7fff65bfc000
> > > > > > > EAL: Error - exiting with code: 1
> > > > > > >  Cause: Requested device 0000:02:00.0 cannot be used
> > > > > >
> > > > > > I assume you have at least 8 cores. Have you tried -l 1-5 on the
> > > > > > secondary process?
> > > > > >
> > > > > > You did not show the primary process command line, but if you use
> > > > > > 1-5 then you can only give the primary process -l 6-7, i.e. two
> > > > > > cores. It is always reasonable to leave core zero for Linux to use.
> > > > > >
> > > > > > Also it could be that you ran out of memory or of the hugepages you
> > > > > > allocated to the system.
> > > > > >
> > > > > > >
> > > > > > > Anyone knows why this happens?
> > > > > > >
> > > > > > > Thanks a lot,
> > > > > > > Kai Zhang
> > > > > >
> > > > > > Regards,
> > > > > > Keith
> > > > > >
> > > > > >
> > > > >
> > > > > Regards,
> > > > > Keith
> > > > >
> > > > >
> > > > >
> > > >
> > > > Regards,
> > > > Keith
> > >
> > > Regards,
> > > Keith
> > >
> > >
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
  2017-03-15 15:48 David Coen
@ 2017-03-15 17:02 ` Kai Zhang
  0 siblings, 0 replies; 16+ messages in thread
From: Kai Zhang @ 2017-03-15 17:02 UTC (permalink / raw)
  To: David Coen; +Cc: Wiles, Keith, Van Haaren, Harry, users

Hi, David

I got your point now ;-)

I don't know why, but my program works correctly now... even without
setting the base-virtaddr. I will try your method when the error happens
again.

Thanks for your detailed explanation, I really appreciate it.

Regards,
Kai

On Wed, Mar 15, 2017 at 11:48 PM, David Coen <d.coen@resi.it> wrote:

> Hi Kai,
>
> I'm sure that it's not necessary to use the --base-virtaddr option on the
> secondary process.
>
> Referring to the addresses in your last post, to fully try my method,
> you should start your real primary application with
>
>  --base-virtaddr=0x7ffef5000000
>
> which is the lowest address I can see in your post (see "Region 5" below).
>
> I hope this helps,
>
> David
> ------------------------------------------------------------
> ------------------------------
> From: Kai Zhang [mailto:kay21s@gmail.com]
> Sent: Wednesday, 15 March 2017 05:14
> To: Wiles, Keith
> Cc: David Coen; Van Haaren, Harry
> Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
>
> I have also tried to use the same option --base-virtaddr=0x7fffdc200000 on
> the secondary process. But it does not help.
>
> Thank you, Keith. I think I can try to figure it out myself first, if the
> internals are not too complicated ...
>
> Regards,
> Kai
>
>
> On Wed, Mar 15, 2017 at 11:28 AM, Wiles, Keith <keith.wiles@intel.com>
> wrote:
>
> > On Mar 15, 2017, at 10:56 AM, Kai Zhang <kay21s@gmail.com> wrote:
> >
> > Hi David,
> >
> > I find that your method does not work for me :-(
> >
> > The dummy primary application shows the following regions:
> > Region 0: virtual address [0x7fffdc200000, 0x7ffff5a00000], physical address 0x59c00000, len 427819008
> > Region 1: virtual address [0x7fffdbe00000, 0x7fffdc000000], physical address 0x7b600000, len 2097152
> > Region 2: virtual address [0x7fffdba00000, 0x7fffdbc00000], physical address 0xf25800000, len 2097152
> > Region 3: virtual address [0x7ffef5800000, 0x7fffdb800000], physical address 0xf25c00000, len 3858759680
> > Region 4: virtual address [0x7ffef5400000, 0x7ffef5600000], physical address 0x100f000000, len 2097152
> > Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address 0x1024000000, len 2097152
> >
> > I set the real primary application with --base-virtaddr=0x7fffdc200000
> >
> > The error in the secondary process is:
> > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7ffff2bfd000
>
> This one seems like a hardware issue where the PCI device cannot be mapped
> to the correct address. The path above is the device path to the resource0
> entry of the PCI device, and the system is having a problem mapping it at
> that address. Does the secondary process need the same option setting the
> base address?
>
> Sorry, not much help here, as I am not able to focus on the problem more
> because I am off site at a week-long meeting.
>
> >
> > It seems that they are not accessing the same region.
> >
> > Regards,
> > Kai
> >
> > On Wed, Mar 15, 2017 at 12:47 AM, David Coen <d.coen@resi.it> wrote:
> > Hi Kai, I agree with you.
> >
> >
> >
> > I have much the same issue: a primary application and a secondary one
> > running, sometimes, with more than 4 cores.
> >
> > I'm using DPDK 16.11 on RedHat 6.7.
> >
> > So far I have solved it this way:
> >
> > - Disabling ASLR by adding these two lines to "/etc/sysctl.conf":
> >
> >                 # Disable Address Space Layout Randomization (ASLR)  (needed by DPDK)
> >                 kernel.randomize_va_space = 0
> >
> > - Getting the virtual address of the first memory segment (the one with
> >   the minimum address value) returned by the function
> >   "rte_eal_get_physmem_layout ()", called from a "dummy" primary
> >   application used only to get this address.
> >
> > - Passing the above virtual address as a parameter to the "real" primary
> >   application using the " --base-virtaddr= " DPDK command line option.
> >   When the secondary app starts, it all goes well with the specified base
> >   address.
> >
> > I've tested this solution on different servers and it has always worked.
> >
> > I think that there is some kind of limitation in the DPDK
> > primary/secondary initialization process that could be improved.
> >
> >
> >
> > Regards,
> >
> > David

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
@ 2017-03-15 15:48 David Coen
  2017-03-15 17:02 ` Kai Zhang
  0 siblings, 1 reply; 16+ messages in thread
From: David Coen @ 2017-03-15 15:48 UTC (permalink / raw)
  To: 'Kai Zhang', 'Wiles, Keith'
  Cc: 'Van Haaren, Harry', users

Hi Kai,

I'm sure that it's not necessary to use the --base-virtaddr option on the secondary process.

Referring to the addresses in your last post, to fully try my method,
you should start your real primary application with

 --base-virtaddr=0x7ffef5000000

which is the lowest address I can see in your post (see "Region 5" below).

I hope this helps,

David
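
The step of picking the lowest segment address can be sketched as follows.
This is only an illustration: it assumes the dummy primary prints its
segments in the "Region N: virtual address [start, end]" form quoted below,
and the helper names (parse_regions, lowest_base) are made up for this
example, not DPDK functions.

```python
import re

# Region lines as printed by the dummy primary application in this thread.
REGION_DUMP = """\
Region 0: virtual address [0x7fffdc200000, 0x7ffff5a00000], physical address 0x59c00000, len 427819008
Region 1: virtual address [0x7fffdbe00000, 0x7fffdc000000], physical address 0x7b600000, len 2097152
Region 2: virtual address [0x7fffdba00000, 0x7fffdbc00000], physical address 0xf25800000, len 2097152
Region 3: virtual address [0x7ffef5800000, 0x7fffdb800000], physical address 0xf25c00000, len 3858759680
Region 4: virtual address [0x7ffef5400000, 0x7ffef5600000], physical address 0x100f000000, len 2097152
Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address 0x1024000000, len 2097152
"""

REGION_RE = re.compile(r"virtual address \[(0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+)\]")


def parse_regions(dump):
    """Return a list of (start, end) virtual addresses found in the dump."""
    return [(int(m.group(1), 16), int(m.group(2), 16))
            for m in REGION_RE.finditer(dump)]


def lowest_base(dump):
    """Return the smallest start address among all regions."""
    return min(start for start, _ in parse_regions(dump))


if __name__ == "__main__":
    # Prints: --base-virtaddr=0x7ffef5000000
    print(f"--base-virtaddr={lowest_base(REGION_DUMP):#x}")
```

The printed value is the one to pass to the real primary application only.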
------------------------------------------------------------------------------------------
From: Kai Zhang [mailto:kay21s@gmail.com]
Sent: Wednesday, 15 March 2017 05:14
To: Wiles, Keith
Cc: David Coen; Van Haaren, Harry
Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file

I have also tried to use the same option --base-virtaddr=0x7fffdc200000 on the secondary process. But it does not help.

Thank you, Keith. I think I can try to figure it out myself first, if the internals are not too complicated ...

Regards,
Kai


On Wed, Mar 15, 2017 at 11:28 AM, Wiles, Keith <keith.wiles@intel.com> wrote:

> On Mar 15, 2017, at 10:56 AM, Kai Zhang <kay21s@gmail.com> wrote:
>
> Hi David,
>
> I find that your method does not work for me :-(
>
> The dummy primary application shows the following regions:
> Region 0: virtual address [0x7fffdc200000, 0x7ffff5a00000], physical address 0x59c00000, len 427819008
> Region 1: virtual address [0x7fffdbe00000, 0x7fffdc000000], physical address 0x7b600000, len 2097152
> Region 2: virtual address [0x7fffdba00000, 0x7fffdbc00000], physical address 0xf25800000, len 2097152
> Region 3: virtual address [0x7ffef5800000, 0x7fffdb800000], physical address 0xf25c00000, len 3858759680
> Region 4: virtual address [0x7ffef5400000, 0x7ffef5600000], physical address 0x100f000000, len 2097152
> Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address 0x1024000000, len 2097152
>
> I set the real primary application with --base-virtaddr=0x7fffdc200000
>
> The error in the secondary process is:
> EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7ffff2bfd000

This one seems like a hardware issue where the PCI device cannot be mapped to the correct address. The path above is the device path to the resource0 entry of the PCI device, and the system is having a problem mapping it at that address. Does the secondary process need the same option setting the base address?

Sorry, not much help here, as I am not able to focus on the problem more because I am off site at a week-long meeting.
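
One way to investigate this kind of failure is to check, inside the
secondary process, whether the address range used by the primary is already
occupied by another mapping. Below is a minimal sketch, assuming the Linux
/proc/<pid>/maps text format. The SAMPLE text, the 4 KiB length (the real
BAR size is not given in this thread) and the function names are all
hypothetical; only the wanted address is taken from the error message above.

```python
def parse_maps(text):
    """Return (start, end, name) for each line in /proc/<pid>/maps format."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        name = fields[5] if len(fields) > 5 else ""
        entries.append((start, end, name))
    return entries


def conflicts(entries, addr, length):
    """List existing mappings that overlap [addr, addr + length)."""
    return [e for e in entries if e[0] < addr + length and addr < e[1]]


# Hypothetical sample, NOT captured from the machines in this thread.
SAMPLE = """\
7ffff2a00000-7ffff2c00000 rw-s 00000000 00:2e 12345 /mnt/huge/rtemap_0
7ffff7dd0000-7ffff7df0000 r-xp 00000000 08:01 67890 /usr/lib64/ld-2.17.so
"""

if __name__ == "__main__":
    wanted = 0x7ffff2bfd000  # address from the EAL error message
    for start, end, name in conflicts(parse_maps(SAMPLE), wanted, 0x1000):
        print(f"{wanted:#x} overlaps {start:#x}-{end:#x} {name}")
```

On a real system the input would come from reading /proc/self/maps (or
/proc/<pid>/maps of the secondary) instead of SAMPLE.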

>
> It seems that they are not accessing the same region.
>
> Regards,
> Kai
>
> On Wed, Mar 15, 2017 at 12:47 AM, David Coen <d.coen@resi.it> wrote:
> Hi Kai, I agree with you.
>
>
>
> I have much the same issue: a primary application and a secondary one running, sometimes, with more than 4 cores.
>
> I'm using DPDK 16.11 on RedHat 6.7.
>
> So far I have solved it this way:
>
> - Disabling ASLR by adding these two lines to "/etc/sysctl.conf":
>
>                 # Disable Address Space Layout Randomization (ASLR)  (needed by DPDK)
>                 kernel.randomize_va_space = 0
>
> - Getting the virtual address of the first memory segment (the one with the minimum address value) returned by the function "rte_eal_get_physmem_layout ()", called from a "dummy" primary application used only to get this address.
>
> - Passing the above virtual address as a parameter to the "real" primary application using the " --base-virtaddr= " DPDK command line option. When the secondary app starts, it all goes well with the specified base address.
>
> I've tested this solution on different servers and it has always worked.
>
> I think that there is some kind of limitation in the DPDK primary/secondary initialization process that could be improved.
>
>
>
> Regards,
>
> David
>
>
>
> -----Original Message-----
> From: Kai Zhang [mailto:kay21s@gmail.com]
> Sent: Monday, 13 March 2017 11:59
> To: Van Haaren, Harry
> Cc: Wiles, Keith; users@dpdk.org
> Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
>
>
>
> Thank you for your info, Harry.
>
>
>
> Even if the ASLR is the root reason, I don't think DPDK should expect users to disable it to use the primary/secondary model. Is it possible for the DPDK team to check this issue and fix the bug?
>
>
>
> Regards,
>
> Kai
>
>
>
> On Mon, Mar 13, 2017 at 5:58 PM, Van Haaren, Harry < harry.van.haaren@intel.com> wrote:
>
>
>
> > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Kai Zhang
>
> > > Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot
>
> > > mmap
>
> > device resource
>
> > > file
>
> > >
>
> > > Yes, my application is somewhat special and should run with the
>
> > > > primary/secondary mode. I will search for the way to turn off the
>
> > > random page mapping and try it.
>
> >
>
> >
>
> > You're searching for ASLR, or Address Space Layout Randomization.
>
> >
>
> > Some useful links regarding ASLR, DPDK and Linux;
>
> > > http://dpdk.readthedocs.io/en/v16.04/prog_guide/multi_proc_support.html#multi-process-limitations
>
> > > http://askubuntu.com/questions/318315/how-can-i-temporarily-disable-aslr-address-space-layout-randomization
>
> > > http://dpdk.org/ml/archives/dev/2015-June/019364.html
>
> >
>
> > Please note that ASLR is a security feature of the OS, think twice
>
> > before disabling it.
>
> >
>
> >
>
> > Hope that helps, -Harry
>
> >
>
> >
>
> > > Thanks for your help :)
>
> > >
>
> > > Regards,
>
> > > Kai
>
> > >
>
> > > On Mon, Mar 13, 2017 at 3:24 AM, Wiles, Keith
>
> > > <keith.wiles@intel.com>
>
> > wrote:
>
> > >
>
> > > >
>
> > > > > On Mar 12, 2017, at 6:39 PM, Kai Zhang <kay21s@gmail.com> wrote:
>
> > > > >
>
> > > > >
>
> > > > > Your application may be attaching to the same port for each core.
>
> > > > Normally this means the each core could be allocating memory and
>
> > > > the
>
> > 4th
>
> > > > core just goes over the amount of memory you have reserved.
>
> > > > >
>
> > > > > I don't think so. Because the error is in the rte_eal_init(),
>
> > > > > which
>
> > is
>
> > > > executed in the first line of the main() function. At the time,
>
> > > > the
>
> > other
>
> > > > threads are not even launched.
>
> > > > >
>
> > > > > Is it possible to consider this as a bug in DPDK?
>
> > > >
>
> > > > One more thing, I run Pktgen as two processes all of the time. The
>
> > > > big difference is I do not run in primary and secondary modes. I
>
> > > > run two different instances of pktgen at the same time without
>
> > > > seeing this type problem. If the failure is associated with
>
> > > > primary/secondary
>
> > application
>
> > > > model, then it could be a bug in that code as a lot of syncing up
>
> > between
>
> > > > the two processes needs to be done because of memory/device sharing.
>
> > One
>
> > > > problem with P/S applications is memory needs to be mapped at the
>
> > > > same address between the processes and Linux has the Random memory
>
> > > > mapping builtin for security reasons. I forget the name of the
>
> > > > > mode in Linux to turn off the random page mapping and Google is not working for me ATM.
>
> > > >
>
> > > > Does your application require running as a primary/secondary
>
> > application?
>
> > > >
>
> > > > >
>
> > > > > Regards,
>
> > > > > Kai
>
> > > > >
>
> > > > >
>
> > > > > >
>
> > > > > > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:
>
> > 00.0/resource0
>
> > > > to address: 0x7fff65bfc000
>
> > > > > > EAL: Error - exiting with code: 1
>
> > > > > >   Cause: Requested device 0000:02:00.0 cannot be used
>
> > > > > >
>
> > > > > > Regards,
>
> > > > > > Kai
>
> > > > > >
>
> > > > > > On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang <kay21s@gmail.com>
>
> > wrote:
>
> > > > > >
>
> > > > > > Command line:
>
> > > > > > primary:      sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
>
> > > > > > secondary: sudo ./secondary -l 4,5,6,7,8 -n 4
>
> > > > > > --proc-type=secondary
>
> > > > > >
>
> > > > > > The configurations are as follows:
>
> > > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA],     XL710 40GbE, bind
>
> > > > 02:00.0,    2048 x 4k huge page
>
> > > > > > 02:00.0 Ethernet controller: Intel Corporation Ethernet
>
> > > > > > Controller
>
> > > > XL710 for 40GbE QSFP+ (rev 02)   [<<- Only bind this one]
>
> > > > > > 02:00.1 Ethernet controller: Intel Corporation Ethernet
>
> > > > > > Controller
>
> > > > XL710 for 40GbE QSFP+ (rev 02)
>
> > > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit
>
> > > > > > Network
>
> > > > Connection (rev 03)
>
> > > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit
>
> > > > > > Network
>
> > > > Connection (rev 03)
>
> > > > > >         Socket 0
>
> > > > > > --------
>
> > > > > > Core 0  [0, 12]
>
> > > > > > Core 1  [1, 13]
>
> > > > > > Core 2  [2, 14]
>
> > > > > > Core 3  [3, 15]
>
> > > > > > Core 4  [4, 16]
>
> > > > > > Core 5  [5, 17]
>
> > > > > > Core 8  [6, 18]
>
> > > > > > Core 9  [7, 19]
>
> > > > > > Core 10 [8, 20]
>
> > > > > > Core 11 [9, 21]
>
> > > > > > Core 12 [10, 22]
>
> > > > > > Core 13 [11, 23]
>
> > > > > >
>
> > > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA],      No Port Bind,
>
> > 2048 x
>
> > > > 4k huge page
>
> > > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit
>
> > > > > > Network
>
> > > > Connection (rev 03)
>
> > > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit
>
> > > > > > Network
>
> > > > Connection (rev 03)
>
> > > > > >         Socket 0        Socket 1
>
> > > > > >         --------        --------
>
> > > > > > Core 0  [0, 20]         [10, 30]
>
> > > > > > Core 1  [1, 21]         [11, 31]
>
> > > > > > Core 2  [2, 22]         [12, 32]
>
> > > > > > Core 3  [3, 23]         [13, 33]
>
> > > > > > Core 4  [4, 24]         [14, 34]
>
> > > > > > Core 8  [5, 25]         [15, 35]
>
> > > > > > Core 9  [6, 26]         [16, 36]
>
> > > > > > Core 10 [7, 27]         [17, 37]
>
> > > > > > Core 11 [8, 28]         [18, 38]
>
> > > > > > Core 12 [9, 29]         [19, 39]
>
> > > > > >
>
> > > > > > Ah, as machine B does not have a 40GbE, I did not bind any NIC
>
> > > > > > and
>
> > run
>
> > > > my program with locally generated packets. But I am using other
>
> > > > DPDK features, such as memory sharing and message passing. Maybe
>
> > > > that is the reason it works correctly? I can only access machine B
>
> > > > remotely, so I
>
> > am
>
> > > > unable to install a NIC on it. I have another PC that is used as a
>
> > client
>
> > > > that only has four cores, which also cannot be used for verification...
>
> > > > > >
>
> > > > > > Regards,
>
> > > > > > Kai
>
> > > > > >
>
> > > > > >
>
> > > > > > On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <
>
> > keith.wiles@intel.com>
>
> > > > wrote:
>
> > > > > >
>
> > > > > > > On Mar 11, 2017, at 9:45 AM, Kai Zhang <kay21s@gmail.com> wrote:
>
> > > > > > >
>
> > > > > > > Hi Keith,
>
> > > > > > >
>
> > > > > > > Thank you for your reply.
>
> > > > > > >
>
> > > > > > > I have tested my program on two machines
>
> > > > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
>
> > > > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
>
> > > > > > >
>
> > > > > > > I am very sure that the primary process uses different cores
>
> > > > > > > with
>
> > > > the secondary process. The strange thing is that my program works
>
> > correctly
>
> > > > on machine B. But on machine A, the above issue happens with more
>
> > > > than
>
> > 4
>
> > > > cores assigned to the secondary process.
>
> > > > > > >
>
> > > > > > > I have tried to assign cores 1-5  to the secondary process
>
> > > > > > > and
>
> > also
>
> > > > tried other core assignment policies, but the error still happens
>
> > > > rte_eal_init() with more than 4 cores.
>
> > > > > >
>
> > > > > > It would be nice to see both command lines. I am not sure I
>
> > > > > > can
>
> > help
>
> > > > more all I can do is suggest some ideas to look at.
>
> > > > > >
>
> > > > > > Does machine B have the same number and type of NICs? Use
>
> > > > > > ‘lspci |
>
> > > > grep Ethernet’ to get a list of all Ethernet devices on both machines.
>
> > > > > >
>
> > > > > > What is the number of hugepages you have allocated for both
>
> > machines.
>
> > > > > >
>
> > > > > > Also look at the cpu_layout.py script to see why adding the
>
> > > > > > 5th
>
> > core
>
> > > > would be different on the two machines and try to make them the same.
>
> > > > > >
>
> > > > > > >
>
> > > > > > > Regards,
>
> > > > > > > Kai
>
> > > > > > >
>
> > > > > > > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <
>
> > > > keith.wiles@intel.com> wrote:
>
> > > > > > >
>
> > > > > > > > On Mar 10, 2017, at 9:35 PM, Kai Zhang <kay21s@gmail.com>
>
> > wrote:
>
> > > > > > > >
>
> > > > > > > > Hi, there
>
> > > > > > > >
>
> > > > > > > > I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS
>
> > > > > > > > 7.3.1611
>
> > with
>
> > > > Linux
>
> > > > > > > > kernel version 3.8.0-30.
>
> > > > > > > >
>
> > > > > > > > I have a master process and a secondary process. When I
>
> > > > > > > > run the
>
> > > > secondary
>
> > > > > > > > process with less than or equal to 4 cores, it works correctly.
>
> > > > Such as:
>
> > > > > > > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary sudo
>
> > > > > > > > ./program -c 0x0f -n 4 --proc-type=secondary
>
> > > > > > > >
>
> > > > > > > > However, there will be error in the rte_eal_init if I
>
> > > > > > > > assign
>
> > more
>
> > > > than 4
>
> > > > > > > > cores.
>
> > > > > > > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
>
> > > > > > > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
>
> > > > > > > >
>
> > > > > > > > EAL: Cannot mmap device resource file
>
> > > > > > > > /sys/bus/pci/devices/0000:02:00.0/resource0 to address:
>
> > > > 0x7fff65bfc000
>
> > > > > > > > EAL: Error - exiting with code: 1
>
> > > > > > > >  Cause: Requested device 0000:02:00.0 cannot be used
>
> > > > > > >
>
> > > > > > > I assume you have at least 8 cores. Have you tried -l 1-5 on
>
> > > > > > > the
>
> > > > secondary process.
>
> > > > > > >
>
> > > > > > > You did not show the primary process command line, but the
>
> > > > > > > if you
>
> > > > use 1-5 then you can only give primary process -l 6-7 or two
>
> > > > cores. It
>
> > is
>
> > > > always a reasonable thing is to leave core zero for linux to use.
>
> > > > > > >
>
> > > > > > > Also it could be you ran out of memory or hugepages you
>
> > allocated to
>
> > > > the system.
>
> > > > > > >
>
> > > > > > > >
>
> > > > > > > > Anyone knows why this happens?
>
> > > > > > > >
>
> > > > > > > > Thanks a lot,
>
> > > > > > > > Kai Zhang
>
> > > > > > >
>
> > > > > > > Regards,
>
> > > > > > > Keith
>
> > > > > > >
>
> > > > > > >
>
> > > > > >
>
> > > > > > Regards,
>
> > > > > > Keith
>
> > > > > >
>
> > > > > >
>
> > > > > >
>
> > > > >
>
> > > > > Regards,
>
> > > > > Keith
>
> > > >
>
> > > > Regards,
>
> > > > Keith
>
> > > >
>
> > > >
>
> >
>
>
>
>
>
>
Regards,
Keith


* Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
@ 2017-03-15 14:56 David Coen
  0 siblings, 0 replies; 16+ messages in thread
From: David Coen @ 2017-03-15 14:56 UTC (permalink / raw)
  To: users

Hi Kai, I agree with you.

 

I have much the same issue: a primary application and a secondary one that
sometimes runs with more than 4 cores.

I'm using DPDK 16.11 on RedHat 6.7.

 

So far I have worked around it this way:

- Disabling ASLR by adding these two lines to "/etc/sysctl.conf":

                # Disable Address Space Layout Randomization (ASLR) (needed by DPDK)
                kernel.randomize_va_space = 0

- Getting the virtual address of the first memory segment (the one with the
lowest address) returned by the function "rte_eal_get_physmem_layout()",
called from a "dummy" primary application used only to obtain this address.

- Passing that virtual address to the "real" primary application with the
"--base-virtaddr=" DPDK command-line option. When the secondary application
then starts, it maps everything correctly at the specified base address.
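
The address-selection step in the workaround above could be sketched roughly
as follows. This is only an illustration and not code from the thread:
"struct memseg_view" is a hypothetical stand-in for the fields of DPDK's
"struct rte_memseg" that matter here, and the helper names
"lowest_segment_addr" and "format_base_virtaddr" are made up for the example.
It assumes a 64-bit build and that unused segment slots have a NULL address.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the parts of DPDK's struct rte_memseg used here. */
struct memseg_view {
    void  *addr; /* virtual start address; NULL marks an unused slot (assumption) */
    size_t len;  /* segment length in bytes */
};

/* Return the lowest non-NULL segment start address, or NULL if there is none. */
static void *lowest_segment_addr(const struct memseg_view *segs, size_t n)
{
    void *lowest = NULL;

    for (size_t i = 0; i < n; i++) {
        if (segs[i].addr == NULL)
            continue; /* skip unused slots */
        if (lowest == NULL || (uintptr_t)segs[i].addr < (uintptr_t)lowest)
            lowest = segs[i].addr;
    }
    return lowest;
}

/* Write the EAL option string for the given address into buf.
 * Returns the value of snprintf(): the length needed, excluding the NUL. */
static int format_base_virtaddr(char *buf, size_t size, const void *addr)
{
    return snprintf(buf, size, "--base-virtaddr=0x%" PRIxPTR, (uintptr_t)addr);
}
```

In a real DPDK 16.11 "dummy" primary, the array would presumably come from
rte_eal_get_physmem_layout() after rte_eal_init() has succeeded, and the
formatted string would then be passed on the command line of the "real"
primary application.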

 

I have tested this workaround on several servers and it has worked every time.

I think there is some kind of limitation in the DPDK primary/secondary
initialization process that could be improved.

 

Regards,

David

 


end of thread, other threads:[~2017-03-15 17:02 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-11  3:35 [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file Kai Zhang
2017-03-11 14:52 ` Wiles, Keith
2017-03-11 15:45   ` Kai Zhang
2017-03-11 18:59     ` Wiles, Keith
2017-03-12  3:21       ` Kai Zhang
2017-03-12  3:29         ` Kai Zhang
2017-03-12 10:32           ` Wiles, Keith
2017-03-12 10:39             ` Kai Zhang
2017-03-12 18:55               ` Wiles, Keith
2017-03-12 19:24               ` Wiles, Keith
2017-03-12 23:44                 ` Kai Zhang
2017-03-13  9:58                   ` Van Haaren, Harry
2017-03-13 10:59                     ` Kai Zhang
2017-03-15 14:56 David Coen
2017-03-15 15:48 David Coen
2017-03-15 17:02 ` Kai Zhang
