* Issue setting up the DPDK development with non-privileged user
From: Boris Ouretskey @ 2022-08-31 14:10 UTC
To: users

Hi,

We are trying to set up a development environment for DPDK with a
non-privileged user. We are facing two issues which seem to be related
(but maybe not). The platform information appears at the end of the mail
(briefly: Linux guest VM on VirtualBox, Windows host).

1) The first issue is that after following the DPDK guide on running the
application as a non-privileged user and adding the capabilities as advised,
the ./dpdk-helloworld application still fails with a permission error.

-----------------
[user@dredd examples]$ sudo setcap cap_ipc_lock,cap_sys_admin+ep ./dpdk-helloworld
[user@dredd examples]$ ./dpdk-helloworld
EAL: Detected CPU lcores: 4
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket
EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
EAL: FATAL: Cannot use IOVA as 'PA' since physical addresses are not available
EAL: Cannot use IOVA as 'PA' since physical addresses are not available
PANIC in main():
Cannot init EAL
5: [./dpdk-helloworld(_start+0x2e) [0x58ce4e]]
4: [/lib64/libc.so.6(__libc_start_main+0xf3) [0x7fe4265fbcf3]]
3: [./dpdk-helloworld(main+0x42) [0x58cf87]]
2: [./dpdk-helloworld(__rte_panic+0xdb) [0xa01c59]]
1: [./dpdk-helloworld(rte_dump_stack+0x27) [0xa2fcac]]
Aborted
-----------------

After adding all capabilities the application seems to work when run from
the command line, but of course that defeats the goal of using the minimal
set of permissions needed.

-----------------
[user@dredd examples]$ sudo setcap all+ep ./dpdk-helloworld
[user@dredd examples]$ ./dpdk-helloworld
EAL: Detected CPU lcores: 4
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Using IOMMU type 8 (No-IOMMU)
EAL: Ignore mapping IO port bar(2)
EAL: Probe PCI driver: net_e1000_em (8086:100e) device: 0000:00:08.0 (socket 0)
-----------------

The first question is whether the DPDK guide is out of date and some
capabilities need to be added beyond those mentioned in the guide, or
whether we are doing something wrong.

2) The second issue is a weirder one. When I try to debug the application
with gdb (or with strace), the application behaves differently. This time
it seems to pass the permission check (regardless of whether the capability
set is "all" or the specific one), but it fails within the
rte_mem_virt2phy function on the following check.

        /*
         * the pfn (page frame number) are bits 0-54 (see
         * pagemap.txt in linux Documentation)
         */
        if ((page & 0x7fffffffffffffULL) == 0)
                return RTE_BAD_IOVA;

---------------------------------
(gdb) r
Starting program: /home/user/packages/dpdk/dpdk-22.03/build/examples/dpdk-helloworld --huge-dir /mnt/huge/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
EAL: Detected CPU lcores: 4
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
[New Thread 0x7fb88d7a9400 (LWP 12217)]
EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket
[New Thread 0x7fb88cfa8400 (LWP 12218)]
EAL: FATAL: Cannot use IOVA as 'PA' since physical addresses are not available
EAL: Cannot use IOVA as 'PA' since physical addresses are not available
PANIC in main():
Cannot init EAL
---------------------------------

The question is why the behaviour differs between running from the command
line and running under gdb, and what can cause EAL to fail the check above
while passing the permission check that fails when running from the
command line?

Thanks

===================================
The platform information
====================================

Linux Guest On Virtual Box (Nested Vt-x enabled)
----
DPDK Version = 22.03
---
[user@dredd examples]$ ../../usertools/dpdk-devbind.py --status

Network devices using DPDK-compatible driver
============================================
0000:00:08.0 '82540EM Gigabit Ethernet Controller 100e' drv=vfio-pci unused=e1000
0000:00:09.0 '82540EM Gigabit Ethernet Controller 100e' drv=vfio-pci unused=e1000

Network devices using kernel driver
===================================
0000:00:03.0 '82540EM Gigabit Ethernet Controller 100e' if=enp0s3 drv=e1000 unused=vfio-pci *Active*
---
[user@dredd examples]$ ../../usertools/dpdk-hugepages.py --show
Node Pages Size Total
0    10    2Mb  20Mb

Hugepages mounted on /dev/hugepages /mnt/huge
----
[user@dredd examples]$ uname -a
Linux dredd 4.18.0-348.el8.x86_64 #1 SMP Tue Nov 9 06:28:28 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
---
[user@dredd examples]$ cat /etc/os-release
NAME="AlmaLinux"
VERSION="8.5 (Arctic Sphynx)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.5"
PLATFORM_ID="platform:el8"
PRETTY_NAME="AlmaLinux 8.5 (Arctic Sphynx)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:almalinux:almalinux:8::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"
ALMALINUX_MANTISBT_PROJECT="AlmaLinux-8"
ALMALINUX_MANTISBT_PROJECT_VERSION="8.5"
-------
[user@dredd examples]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz
stepping : 2
cpu MHz : 2303.998
cache size : 16384 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi flexpriority vpid fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 4607.99
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz
stepping : 2
cpu MHz : 2303.998
cache size : 16384 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi flexpriority vpid fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 4607.99
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz
stepping : 2
cpu MHz : 2303.998
cache size : 16384 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi flexpriority vpid fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 4607.99
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz
stepping : 2
cpu MHz : 2303.998
cache size : 16384 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi flexpriority vpid fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 4607.99
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

* Re: Issue setting up the DPDK development with non-privileged user
From: Dmitry Kozlyuk @ 2022-08-31 16:01 UTC
To: Boris Ouretskey; +Cc: users

Hi Boris,

1. Tests on Ubuntu 20.04 with kernel 5.4.0-96-generic show that
CAP_DAC_OVERRIDE is missing from the list. Unfortunately, this capability
guards quite a lot.

2. If the process lacks privileges, it will see /proc/self/pagemap
zero-filled. I assume you understand that using physical addresses
effectively allows bypassing any OS restrictions.

Please tell me if CAP_DAC_OVERRIDE helps, and I will update the doc.

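The check quoted from rte_mem_virt2phy and the zero-filled behaviour described in this reply can be illustrated with a small Python sketch. It is not DPDK code: the function names are hypothetical, and the layout it assumes (PFN in bits 0-54, "page present" in bit 63) is the one documented in the kernel's pagemap.txt.

```python
import os
import struct

PFN_MASK = (1 << 55) - 1   # bits 0-54: page frame number
PRESENT_BIT = 1 << 63      # bit 63: page is present in RAM


def decode_pagemap_entry(entry, page_size=4096, offset_in_page=0):
    """Return the physical address for a 64-bit pagemap entry, or None.

    None is returned when the page is not present or when the PFN field
    is zero, which is what a process without sufficient privileges sees.
    """
    if not entry & PRESENT_BIT:
        return None
    pfn = entry & PFN_MASK
    if pfn == 0:
        return None
    return pfn * page_size + offset_in_page


def read_pagemap_entry(vaddr, pid="self"):
    """Read the raw pagemap entry covering virtual address vaddr (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * 8)   # one 8-byte entry per virtual page
        data = f.read(8)
    if len(data) != 8:
        raise OSError("short read from pagemap")
    return struct.unpack("=Q", data)[0]
```

With this model, a failed open of /proc/self/pagemap corresponds to the first log in the thread, while a successful open that yields entries with a zero PFN corresponds to the failure seen under gdb.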
* Re: Issue setting up the DPDK development with non-privileged user
From: Boris Ouretskey @ 2022-09-01 12:52 UTC
To: Dmitry Kozlyuk; +Cc: users

>
> Dmitry,
> Thanks a lot for the reply.
> 1. Things look better now, but some capabilities are still
> missing for the program to run. I am not a kernel
> programmer myself and asked on stackoverflow
> <https://stackoverflow.com/questions/73556145/understand-which-capability-exacty-lacks-to-complete-the-operation-in-linux>
> how we can deduce the exact capability that failed the
> check in the kernel. Otherwise the process of finding the exact set can be
> very irritating. Maybe someone here will have a better idea than guessing,
> for user space developers.
>
> -----
> [user1@dredd examples]$ sudo setcap cap_ipc_lock,cap_sys_admin,cap_dac_override+ep ./dpdk-helloworld
> [sudo] password for user1:
> [user1@dredd examples]$ ./dpdk-helloworld
> EAL: Detected CPU lcores: 4
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: VFIO support initialized
> EAL: Cannot open /dev/vfio/noiommu-0: Operation not permitted
> EAL: Failed to open VFIO group 0
> EAL: Requested device 0000:00:08.0 cannot be used
> EAL: Cannot open /dev/vfio/noiommu-1: Operation not permitted
> EAL: Failed to open VFIO group 1
> EAL: Requested device 0000:00:09.0 cannot be used
> TELEMETRY: No legacy callbacks, legacy socket not created
> hello from core 1
> hello from core 2
> hello from core 3
> hello from core 0
>
> 2. Thanks a lot for pointing out how it works. Regarding your second note,
> in my understanding, knowing physical addresses does not help any user
> process lacking the corresponding privileges. Because mapping and read
> permissions are enforced by the kernel & hardware, even knowing the physical
> memory address would not help a regular process read or update it unless
> the physical page was mapped by the kernel into the process's virtual
> address space with proper permissions.
>
> In addition, it turns out that if one would like to debug DPDK or any other
> executable using a special capability set, this set must be duplicated on
> gdb (at least that's how it worked for me), otherwise it spawns the debuggee
> with a reduced capability set (I guess by means of the bounding set). If someone
> using VSCODE remote connection debug than also
>
>
> Thanks again for the help

* Re: Issue setting up the DPDK development with non-privileged user
From: Dmitry Kozlyuk @ 2022-09-01 14:42 UTC
To: Boris Ouretskey; +Cc: users, Bruce Richardson, Burakov, Anatoly

2022-09-01 15:52 (UTC+0300), Boris Ouretskey:
> >
> > Dmitry,
> > Thanks a lot for the reply.
> > 1. Things look better now, but some capabilities are still
> > missing for the program to run. I am not a kernel
> > programmer myself and asked on stackoverflow
> > <https://stackoverflow.com/questions/73556145/understand-which-capability-exacty-lacks-to-complete-the-operation-in-linux>
> > how we can deduce the exact capability that failed the
> > check in the kernel. Otherwise the process of finding the exact set can be
> > very irritating. Maybe someone here will have a better idea than guessing,
> > for user space developers.

Theoretically, one can enumerate all capabilities, give all capabilities
except one to the binary, try to run it, and notice which capability removal
leads to a failure. However, `setcap "all=ep $capa-ep" ./binary`
did not give me the correct answer (why?), so I did it semi-manually.

I created a file with all capabilities listed, /tmp/caps.
The following command sets the capabilities from this file
that are not commented out using #:

    sudo setcap $(grep -v "^#" /tmp/caps | tr "\n" "," | sed -e 's/,$/+ep/') ./binary

Then I started commenting out parts of /tmp/caps (bisecting, actually)
until the error reproduced.

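The grep/tr/sed pipeline above can also be written as a short Python helper, which may be easier to drive from a bisection loop. This is only a sketch under assumptions: the function names are hypothetical, blank lines are skipped in addition to lines starting with #, and `apply_caps` simply shells out to the same `sudo setcap` command used in the thread.

```python
import subprocess


def caps_argument(text, flags="+ep"):
    """Build a setcap argument such as 'cap_a,cap_b+ep' from file contents.

    Lines starting with '#' are treated as commented out, as in the
    /tmp/caps file described above. Blank lines are ignored (an assumption).
    """
    names = [line.strip() for line in text.splitlines()
             if line.strip() and not line.startswith("#")]
    if not names:
        raise ValueError("no capabilities left enabled")
    return ",".join(names) + flags


def apply_caps(caps_file, binary):
    """Run 'sudo setcap <argument> <binary>' with the enabled capabilities."""
    with open(caps_file) as f:
        argument = caps_argument(f.read())
    subprocess.run(["sudo", "setcap", argument, binary], check=True)
```

For example, a file containing `cap_ipc_lock`, `#cap_sys_admin` and `cap_dac_override` on separate lines yields the argument `cap_ipc_lock,cap_dac_override+ep`.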
> > [user1@dredd examples]$ sudo setcap cap_ipc_lock,cap_sys_admin,cap_dac_override+ep ./dpdk-helloworld
> > [sudo] password for user1:
> > [user1@dredd examples]$ ./dpdk-helloworld
> > EAL: Detected CPU lcores: 4
> > EAL: Detected NUMA nodes: 1
> > EAL: Detected static linkage of DPDK
> > EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket
> > EAL: Selected IOVA mode 'PA'
> > EAL: VFIO support initialized
> > EAL: Cannot open /dev/vfio/noiommu-0: Operation not permitted
> > EAL: Failed to open VFIO group 0
> > EAL: Requested device 0000:00:08.0 cannot be used
> > EAL: Cannot open /dev/vfio/noiommu-1: Operation not permitted
> > EAL: Failed to open VFIO group 1
> > EAL: Requested device 0000:00:09.0 cannot be used
> > TELEMETRY: No legacy callbacks, legacy socket not created
> > hello from core 1
> > hello from core 2
> > hello from core 3
> > hello from core 0

Bruce, Anatoly, you know VFIO best; any suggestions?

> > 2. Thanks a lot for pointing out how it works. Regarding your second note,
> > in my understanding, knowing physical addresses does not help any user
> > process lacking the corresponding privileges. Because mapping and read
> > permissions are enforced by the kernel & hardware, even knowing the physical
> > memory address would not help a regular process read or update it unless
> > the physical page was mapped by the kernel into the process's virtual
> > address space with proper permissions.

An attacker can trick the HW into reading or writing at a specified physical
address. Memory protection is implemented in the MMU, and DMA bypasses the MMU.
I have no idea how practical this is for penetrating systems,
but even an unintended bug may crash the OS this way or corrupt data.

> > In addition, it turns out that if one would like to debug DPDK or any other
> > executable using a special capability set, this set must be duplicated on
> > gdb (at least that's how it worked for me), otherwise it spawns the debuggee
> > with a reduced capability set (I guess by means of the bounding set). If someone
> > using VSCODE remote connection debug than also

Thanks for sharing the experience.
It may be worth adding to the section about running without root privileges.

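One way to check the observation about gdb is to compare the effective capability mask of the process when started from the shell and when started under the debugger. Linux reports this mask as the CapEff line of /proc/<pid>/status. The sketch below decodes it for the capabilities named in this thread; the helper names are hypothetical, and the bit numbers are the ones defined in linux/capability.h.

```python
# Bit numbers from linux/capability.h, limited to capabilities named in this thread.
CAP_NUMBERS = {
    "cap_dac_override": 1,
    "cap_dac_read_search": 2,
    "cap_ipc_lock": 14,
    "cap_sys_rawio": 17,
    "cap_sys_admin": 21,
}


def missing_caps(cap_eff_hex, wanted):
    """Return the names in wanted whose bit is not set in the CapEff mask."""
    mask = int(cap_eff_hex, 16)
    return [name for name in wanted if not (mask >> CAP_NUMBERS[name]) & 1]


def cap_eff_of(pid="self"):
    """Read the CapEff hex string from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff line not found")
```

Calling `missing_caps(cap_eff_of(pid), [...])` on the debuggee's PID would list which of the expected capabilities the debugged process did not receive.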
* Re: Issue setting up the DPDK development with non-privileged user
From: Dmitry Kozlyuk @ 2022-09-01 19:26 UTC
To: Boris Ouretskey; +Cc: users, Bruce Richardson, Burakov, Anatoly

2022-09-01 17:42 (UTC+0300), Dmitry Kozlyuk:
> Theoretically, one can enumerate all capabilities, give all capabilities
> except one to the binary, try to run it, and notice which capability removal
> leads to a failure. However, `setcap "all=ep $capa-ep" ./binary`
> did not give me the correct answer (why?), so I did it semi-manually.

Aha! CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH are not orthogonal:
they both allow bypassing the file read permission check.
Removing either one alone therefore never causes a failure.

I have a working script here: https://github.com/PlushBeaver/ancap
In our case:

    ./ancap /work/_install/bin/dpdk-testpmd /bin/sh -c '/work/_install/bin/dpdk-testpmd -a 03:00.0 --iova-mode=pa --in-memory </dev/null >/dev/null 2>/dev/null'
    cap_sys_admin+ep
    cap_dac_read_search+ep
    NOTE: need cap_dac_override or cap_dac_read_search to bypass file read permission checks.

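The effect of two interchangeable capabilities on the "remove one at a time" approach can be shown with a toy model in Python. This is a sketch only: `toy_works` is a hypothetical stand-in for actually running the binary, modelled on the result reported above, and the search shown here is not necessarily how the ancap script is implemented.

```python
def toy_works(caps):
    """Toy model: needs cap_sys_admin plus either of the two DAC capabilities."""
    return "cap_sys_admin" in caps and (
        "cap_dac_override" in caps or "cap_dac_read_search" in caps)


def leave_one_out(all_caps, works):
    """Capabilities whose individual removal from the full set causes a failure."""
    return [c for c in all_caps if not works(set(all_caps) - {c})]


def greedy_minimal(all_caps, works):
    """Drop capabilities one by one, keeping every removal that still works."""
    kept = list(all_caps)
    for c in all_caps:
        trial = [k for k in kept if k != c]
        if works(set(trial)):
            kept = trial
    return kept


ALL_CAPS = ["cap_dac_override", "cap_dac_read_search",
            "cap_ipc_lock", "cap_sys_admin"]
```

In this model `leave_one_out` reports only cap_sys_admin, which is not sufficient on its own, because dropping either DAC capability leaves the other one in place. The cumulative variant keeps the second DAC capability once the first has been removed.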
* Re: Issue setting up the DPDK development with non-privileged user
From: Dmitry Kozlyuk @ 2022-09-02 14:31 UTC
To: Boris Ouretskey; +Cc: users

2022-09-01 22:26 (UTC+0300), Dmitry Kozlyuk:
> 2022-09-01 17:42 (UTC+0300), Dmitry Kozlyuk:
> > Theoretically, one can enumerate all capabilities, give all capabilities
> > except one to the binary, try to run it, and notice which capability removal
> > leads to a failure. However, `setcap "all=ep $capa-ep" ./binary`
> > did not give the correct answer to me (why?), so I did it semi-manually.
>
> Aha! CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH are not orthogonal:
> they both allow bypassing file read permission check.
>
> I have a working script here: ...

Apparently, a better alternative is already out there:

https://github.com/iovisor/bcc/blob/master/tools/capable_example.txt

* Re: Issue setting up the DPDK development with non-privileged user
From: Boris Ouretskey @ 2022-09-03 18:18 UTC
To: Dmitry Kozlyuk; +Cc: users

With the help of the bcc tools I figured out the following list of
capabilities needed to run the hello world application:

    sudo setcap cap_ipc_lock,cap_sys_admin,cap_dac_override,cap_dac_read_search,cap_sys_rawio+ep ./dpdk-helloworld

The BCC toolkit is full of useful utilities.

My 50 cents to finish the subject. The reason for zeroing out the mapping
for an unprivileged user is stated in the documentation, from
https://www.kernel.org/doc/Documentation/vm/pagemap.txt:

"Starting from 4.2 the PFN field is zeroed if the user does not have
CAP_SYS_ADMIN. Reason: information about PFNs helps in exploiting Rowhammer
vulnerability."

Thanks again for the help.

On Fri, Sep 2, 2022 at 5:31 PM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
wrote:

> 2022-09-01 22:26 (UTC+0300), Dmitry Kozlyuk:
> > 2022-09-01 17:42 (UTC+0300), Dmitry Kozlyuk:
> > > Theoretically, one can enumerate all capabilities, give all capabilities
> > > except one to the binary, try to run it, and notice which capability removal
> > > leads to a failure. However, `setcap "all=ep $capa-ep" ./binary`
> > > did not give the correct answer to me (why?), so I did it semi-manually.
> >
> > Aha! CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH are not orthogonal:
> > they both allow bypassing file read permission check.
> >
> > I have a working script here: ...
>
> Apparently, a better alternative is already out there:
>
> https://github.com/iovisor/bcc/blob/master/tools/capable_example.txt
>