From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1538743527-8285-1-git-send-email-alejandro.lucero@netronome.com>
 <2737161.TvyDVilZt4@xps>
 <2DBBFF226F7CF64BAFCA79B681719D954502B94F@shsmsx102.ccr.corp.intel.com>
 <0D300480287911409D9FF92C1FA2A3355B442C48@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <0D300480287911409D9FF92C1FA2A3355B442C48@SHSMSX104.ccr.corp.intel.com>
From: Alejandro Lucero
Date: Tue, 30 Oct 2018 09:41:05 +0000
To: xueqin.lin@intel.com
Cc: lei.a.yao@intel.com, Thomas Monjalon, dev, "Xu, Qian Q",
 "Burakov, Anatoly", Ferruh Yigit, Qi Zhang
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
List-Id: DPDK patches and discussions

On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin wrote:

> Hi Lucero & Thomas,
>
> Find the patch can’t fix multi-process cases.

Hi,

I think it is not specifically about multiprocess but about hotplug with
multiprocess, because I can execute symmetric_mp successfully with a
secondary process.

Working on this as a priority.

Thanks.

> Steps:
>
> 1. Setup primary process successfully
>
> ./hotplug_mp --proc-type=auto
>
> 2. Fail to setup secondary process
>
> ./hotplug_mp --proc-type=auto
>
> EAL: Detected 88 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Auto-detected process type: SECONDARY
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
> Segmentation fault (core dumped)
>
> More information as below:
>
> Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
> 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
> 264             for (idx = first; idx < msk->n_masks; idx++) {
> #0  0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
> #1  0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
>     used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
> #2  0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
> #3  0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401,
>     arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
> #4  0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
>     at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
> #5  0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:593
> #6  0x00000000005b9738 in pci_devices_iommu_support_va ()
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:626
> #7  0x00000000005b97a7 in rte_pci_get_iommu_class ()
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:650
> #8  0x000000000058f1ce in rte_bus_get_iommu_class ()
>     at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
> #9  0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
>     at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
> #10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
>     at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28
>
> Best regards,
> Xueqin
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Monday, October 29, 2018 9:41 PM
> *To:* Yao, Lei A
> *Cc:* Thomas Monjalon; dev; Xu, Qian Q; Lin, Xueqin; Burakov, Anatoly;
> Yigit, Ferruh
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A wrote:
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Monday, October 29, 2018 8:56 PM
> *To:* Thomas Monjalon
> *Cc:* Yao, Lei A; dev; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin;
> Burakov, Anatoly; Yigit, Ferruh
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon wrote:
>
> 29/10/2018 12:39, Alejandro Lucero:
> > I got a patch that solves a bug when calling rte_eal_check_dma_mask
> > using the mask instead of the maskbits. However, this does not solve
> > the deadlock.
>
> The deadlock is a bigger concern I think.
>
> I think once the call to rte_eal_check_dma_mask uses the maskbits instead
> of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
>
> Yao, can you try with the attached patch?
>
> Hi, Lucero
>
> This patch can fix the issue at my side. Thanks a lot for your quick
> action.
>
> Great!
>
> I will send an official patch with the changes.
>
> I have to say that I tested the patchset, but I think it was when
> legacy_mem was still there, and therefore the dynamic memory allocation
> code was not used during memory initialization.
>
> There is something that concerns me though. Using
> rte_memseg_walk_thread_unsafe could be a problem in some situations,
> although those situations are unlikely.
>
> Usually, rte_eal_check_dma_mask is called during initialization. Then it
> is safe to use the unsafe function for walking memsegs. But with device
> hotplug and dynamic memory allocation, there is a potential race
> condition when the primary process is allocating more memory and,
> concurrently, a device is hotplugged and a secondary process does the
> device initialization. For now, this is just a problem with the NFP, and
> the potential race condition window is really unlikely, but I will work
> on this asap.
>
> BRs
> Lei
>
> > Interestingly, the problem looks like a compiler one. Calling
> > rte_memseg_walk does not return when calling inside
> > rte_eal_check_dma_mask, but if you modify the call like this:
> >
> > - if (rte_memseg_walk(check_iova, &mask))
> > + if (!rte_memseg_walk(check_iova, &mask))
> >
> > it works, although the value returned to the invoker changes, of course.
> > But the point here is that calling rte_memseg_walk should behave the
> > same as before, and it does not.
>
> Anyway, the coding style requires saving the return value in a variable,
> instead of nesting the call in an "if" condition.
> And the "if" check should be explicitly != 0 because it is not a real
> boolean.
>
> PS: please do not top post and avoid HTML emails, thanks
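
For reference, the two points discussed in the quoted thread (passing the number of mask bits rather than a precomputed mask, and storing the walk's return value before an explicit != 0 test) can be sketched in a few lines of standalone C. This is only an illustration under assumptions, not the DPDK implementation: struct seg, seg_walk() and check_dma_mask() are hypothetical stand-ins for the memseg list, rte_memseg_walk() and rte_eal_check_dma_mask(), and no locking is modelled, so it says nothing about the deadlock or the hotplug race.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory segment; not the DPDK rte_memseg. */
struct seg {
	uint64_t iova; /* first IO virtual address of the segment */
	uint64_t len;  /* length in bytes */
};

/* Callback type: return 0 to continue the walk, non-zero to stop it. */
typedef int (*seg_walk_t)(const struct seg *s, void *arg);

/* Hypothetical walker: returns 0 if every segment was visited, otherwise
 * the first non-zero value returned by the callback. */
static int
seg_walk(const struct seg *segs, size_t n, seg_walk_t func, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(&segs[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Returns 1 if the segment touches an address bit the device cannot use. */
static int
check_iova(const struct seg *s, void *arg)
{
	const uint64_t *mask = arg; /* bits that must stay clear */
	uint64_t last;

	if (s->len == 0)
		return 0;
	last = s->iova + s->len - 1;
	return ((s->iova | last) & *mask) != 0;
}

/* Takes the number of addressable bits (e.g. 40), not a mask.
 * Returns 0 if all segments fit, -1 otherwise. */
int
check_dma_mask(const struct seg *segs, size_t n, uint8_t maskbits)
{
	uint64_t mask;
	int ret;

	assert(maskbits > 0 && maskbits <= 64);

	/* Build the mask of forbidden bits; shifting by 64 is undefined. */
	mask = (maskbits == 64) ? 0 : ~((UINT64_C(1) << maskbits) - 1);

	/* Save the return value, then compare explicitly against 0. */
	ret = seg_walk(segs, n, check_iova, &mask);
	if (ret != 0)
		return -1;

	return 0;
}
```

With a caller that passes 40 for a 40-bit device, a segment ending above 1 << 40 makes check_dma_mask() return -1, while the same segment passes with 48 bits. Passing an already expanded mask where the bit count is expected would be silently truncated to 8 bits, which is the kind of mix-up described above.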