From: Alejandro Lucero
Date: Tue, 30 Oct 2018 14:04:34 +0000
To: xueqin.lin@intel.com
Cc: lei.a.yao@intel.com, Thomas Monjalon, dev, "Xu, Qian Q", "Burakov, Anatoly", Ferruh Yigit, Qi Zhang
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
List-Id: DPDK patches and discussions

On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero <alejandro.lucero@netronome.com> wrote:

> On Tue, Oct 30, 2018 at 12:22 PM Lin, Xueqin wrote:
>
>> Some findings on some of our servers:
>>
>> Without "intel_iommu=on iommu=pt" in the /boot/grub2/grub.cfg file (and a
>> reboot to make the change effective):
>>
>> 18.11 rc1: testpmd and the secondary process are set up successfully.
>>
>> With "intel_iommu=on iommu=pt" added to the /boot/grub2/grub.cfg file (and a
>> reboot to make the change effective):
>>
>> 18.11 rc1: Fail to set up testpmd and the secondary process.
>>
>> 18.11 rc1 + dma_mask_fix patch: testpmd is set up successfully, but the
>> secondary process fails to set up.
>>
>> Maybe whether "intel_iommu=on iommu=pt" is enabled or not explains the gap
>> between our test results.
>>
>> Most of our team servers should enable the IOMMU for VT-d and vfio tests.
>>
>
> That makes sense, because the problem occurs when the IOVA mode is set inside
> drivers/bus/pci/linux/pci.c, and if there is no IOMMU, there is no call to
> rte_eal_check_dma_mask at all.
>
>> Best regards,
>>
>> Xueqin
>>
>> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>> *Sent:* Tuesday, October 30, 2018 6:38 PM
>> *To:* Lin, Xueqin
>> *Cc:* Yao, Lei A; Thomas Monjalon <thomas@monjalon.net>; dev; Xu, Qian Q;
>> Burakov, Anatoly; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z
>> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA
>> mask
>>
>> On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin wrote:
>>
>> Hi Lucero,
>>
>> No, we have reproduced the multi-process issues (including symmetric_mp,
>> simple_mp, hotplug_mp, multi-process unit test...) on most of our servers.
>>
>> It is also strange that 1~2 servers don't have the issue.
>>
>> Yes, you are right. I could execute it, but that was due to how this problem
>> is triggered.
>>
>> I think I can fix this and at the same time properly solve the initial
>> issue without any limitation like that potential race condition I
>> mentioned.
>>
>> I can give you a patch to try in a couple of hours.
>>
>
Hi Lin,

Can you try the patch attached?

Thanks

> Thanks
>>
>> Bind two NNT ports or FVL ports
>>
>> ./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4
>> --proc-id=1
>>
>> EAL: Detected 88 lcore(s)
>> EAL: Detected 2 NUMA nodes
>> EAL: Auto-detected process type: SECONDARY
>> [New Thread 0x7ffff6eda700 (LWP 90103)]
>> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
>> [New Thread 0x7ffff66d9700 (LWP 90104)]
>>
>> Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
>>
>> 0x00000000005566b5 in rte_fbarray_find_next_used ()
>>
>> (gdb) bt
>>
>> #0 0x00000000005566b5 in rte_fbarray_find_next_used ()
>> #1 0x000000000054da9c in rte_eal_check_dma_mask ()
>> #2 0x0000000000572ae7 in pci_one_device_iommu_support_va ()
>> #3 0x0000000000573988 in rte_pci_get_iommu_class ()
>> #4 0x000000000054f743 in rte_bus_get_iommu_class ()
>> #5 0x000000000053c123 in rte_eal_init ()
>> #6 0x000000000046be2b in main ()
>>
>> Best regards,
>>
>> Xueqin
>>
>> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>> *Sent:* Tuesday, October 30, 2018 5:41 PM
>> *To:* Lin, Xueqin
>> *Cc:* Yao, Lei A; Thomas Monjalon <thomas@monjalon.net>; dev; Xu, Qian Q;
>> Burakov, Anatoly; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z
>> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA
>> mask
>>
>> On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin wrote:
>>
>> Hi Lucero & Thomas,
>>
>> We found that the patch can't fix the multi-process cases.
>>
>> Hi,
>>
>> I think it is not specifically about multiprocess but about hotplug with
>> multiprocess, because I can execute symmetric_mp successfully with a
>> secondary process.
>>
>> Working on this as a priority.
>>
>> Thanks.
>>
>> Steps:
>>
>> 1. Set up the primary process successfully
>>
>> ./hotplug_mp --proc-type=auto
>>
>> 2. Fail to set up the secondary process
>>
>> ./hotplug_mp --proc-type=auto
>>
>> EAL: Detected 88 lcore(s)
>> EAL: Detected 2 NUMA nodes
>> EAL: Auto-detected process type: SECONDARY
>> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
>> Segmentation fault (core dumped)
>>
>> More information below:
>>
>> Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
>>
>> 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
>>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
>>
>> 264 for (idx = first; idx < msk->n_masks; idx++) {
>>
>> #0 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
>>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
>> #1 0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
>>     used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
>> #2 0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
>>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
>> #3 0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401,
>>     arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
>> #4 0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
>>     at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
>> #5 0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
>>     at /root/dpdk/drivers/bus/pci/linux/pci.c:593
>> #6 0x00000000005b9738 in pci_devices_iommu_support_va ()
>>     at /root/dpdk/drivers/bus/pci/linux/pci.c:626
>> #7 0x00000000005b97a7 in rte_pci_get_iommu_class ()
>>     at /root/dpdk/drivers/bus/pci/linux/pci.c:650
>> #8 0x000000000058f1ce in rte_bus_get_iommu_class ()
>>     at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
>> #9 0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
>>     at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
>> #10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
>>     at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28
>>
>> Best regards,
>>
>> Xueqin
>>
>> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>> *Sent:* Monday, October 29, 2018 9:41 PM
>> *To:* Yao, Lei A
>> *Cc:* Thomas Monjalon; dev;
>> Xu, Qian Q; Lin, Xueqin;
>> Burakov, Anatoly; Yigit, Ferruh <ferruh.yigit@intel.com>
>> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA
>> mask
>>
>> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A wrote:
>>
>> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>> *Sent:* Monday, October 29, 2018 8:56 PM
>> *To:* Thomas Monjalon
>> *Cc:* Yao, Lei A; dev; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin;
>> Burakov, Anatoly; Yigit, Ferruh <ferruh.yigit@intel.com>
>> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA
>> mask
>>
>> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon wrote:
>>
>> 29/10/2018 12:39, Alejandro Lucero:
>> > I got a patch that solves a bug when calling rte_eal_check_dma_mask using
>> > the mask instead of the maskbits. However, this does not solve the deadlock.
>>
>> The deadlock is a bigger concern I think.
>>
>> I think once the call to rte_eal_check_dma_mask uses the maskbits
>> instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the
>> deadlock.
>>
>> Yao, can you try with the attached patch?
>>
>> Hi, Lucero
>>
>> This patch can fix the issue on my side. Thanks a lot
>> for your quick action.
>>
>> Great!
>>
>> I will send an official patch with the changes.
>>
>> I have to say that I tested the patchset, but I think it was when
>> legacy_mem was still there and therefore the dynamic memory allocation code
>> was not used during memory initialization.
>>
>> There is something that concerns me though. Using
>> rte_memseg_walk_thread_unsafe could be a problem in some situations,
>> although those situations are unlikely.
>>
>> Usually, calling rte_eal_check_dma_mask happens during initialization.
>> Then it is safe to use the unsafe function for walking memsegs, but with
>> device hotplug and dynamic memory allocation, there exists a potential race
>> condition when the primary process is allocating more memory and
>> concurrently a device is hotplugged and a secondary process does the device
>> initialization. For now, this is just a problem with the NFP, and the
>> potential race condition window is really unlikely, but I will work on this
>> asap.
>>
>> BRs
>>
>> Lei
>>
>> > Interestingly, the problem looks like a compiler one. Calling
>> > rte_memseg_walk does not return when called inside rte_eal_check_dma_mask,
>> > but if you modify the call like this:
>> >
>> > - if (rte_memseg_walk(check_iova, &mask))
>> > + if (!rte_memseg_walk(check_iova, &mask))
>> >
>> > it works, although the value returned to the invoker changes, of course.
>> > But the point here is that the behaviour when calling rte_memseg_walk
>> > should be the same as before, and it is not.
>>
>> Anyway, the coding style requires saving the return value in a variable,
>> instead of nesting the call in an "if" condition.
>> And the "if" check should be explicitly != 0 because it is not a real
>> boolean.
>>
>> PS: please do not top post and avoid HTML emails, thanks
>>
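
Two points from the thread can be made concrete with a small sketch: the difference between maskbits (a bit count such as 48) and the mask derived from it, and the coding-style remark about saving the walk's return value and comparing it explicitly with 0. The code below is a minimal, self-contained illustration under stated assumptions, not the DPDK implementation: struct seg, seg_walk, check_iova and check_dma_mask are hypothetical stand-ins that only borrow the names used in the discussion, and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory segment (not DPDK's rte_memseg). */
struct seg {
	uint64_t iova; /* start address as seen by the device */
	uint64_t len;  /* length in bytes */
};

typedef int (*seg_walk_t)(const struct seg *s, void *arg);

/* Hypothetical walker: stops at the first non-zero callback result. */
static int
seg_walk(const struct seg *segs, size_t n, seg_walk_t func, void *arg)
{
	size_t i;
	int ret;

	assert(func != NULL);
	for (i = 0; i < n; i++) {
		ret = func(&segs[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback: returns 1 if the segment touches any bit set in *mask. */
static int
check_iova(const struct seg *s, void *arg)
{
	const uint64_t *mask = arg;
	uint64_t last = (s->len != 0) ? s->iova + s->len - 1 : s->iova;

	if (((s->iova | last) & *mask) != 0)
		return 1;
	return 0;
}

/*
 * Takes the number of addressable bits (maskbits), not a mask.
 * Returns 0 if every segment fits in maskbits bits, -1 otherwise.
 */
static int
check_dma_mask(const struct seg *segs, size_t n, uint8_t maskbits)
{
	uint64_t mask;
	int ret;

	if (maskbits == 0 || maskbits > 64)
		return -1;

	/* Bits at or above maskbits are forbidden; avoid a shift by 64. */
	mask = (maskbits == 64) ? 0 : ~((UINT64_C(1) << maskbits) - 1);

	/* Save the result in a variable and compare explicitly with 0. */
	ret = seg_walk(segs, n, check_iova, &mask);
	if (ret != 0)
		return -1;

	return 0;
}
```

With a segment starting at 0x100000000, check_dma_mask(segs, 1, 48) returns 0, while the same call with maskbits set to 32 returns -1. Passing an already computed mask where the bit count is expected would be a caller error of the kind discussed above, which is why the sketch keeps the conversion in one place.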