From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alejandro Lucero
Date: Mon, 29 Oct 2018 11:39:14 +0000
To: Thomas Monjalon
Cc: lei.a.yao@intel.com, dev, "Xu, Qian Q", xueqin.lin@intel.com, "Burakov, Anatoly", Ferruh Yigit
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
References: <1538743527-8285-1-git-send-email-alejandro.lucero@netronome.com> <1593678.TTmrtHRuFR@xps>
 <2DBBFF226F7CF64BAFCA79B681719D954502B7E1@shsmsx102.ccr.corp.intel.com> <1651382.pnTT7vZl36@xps>
List-Id: DPDK patches and discussions

I have a patch that fixes a bug where rte_eal_check_dma_mask was called
with the mask instead of the maskbits. However, it does not solve the
deadlock.

Interestingly, the problem looks like a compiler issue. The call to
rte_memseg_walk does not return when made inside rte_eal_check_dma_mask,
but if you modify the call like this:

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 12dcedf5c..69b26e464 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -462,7 +462,7 @@ rte_eal_check_dma_mask(uint8_t maskbits)
 	/* create dma mask */
 	mask = ~((1ULL << maskbits) - 1);
 
-	if (rte_memseg_walk(check_iova, &mask))
+	if (!rte_memseg_walk(check_iova, &mask))
 		/*
 		 * Dma mask precludes hugepage usage.
 		 * This device can not be used and we do not need to keep

it works, although the value returned to the caller changes, of course.
The point is that rte_memseg_walk itself should behave the same as before,
and it does not.

Anatoly, maybe you can see something I cannot.

On Mon, Oct 29, 2018 at 10:15 AM Alejandro Lucero <alejandro.lucero@netronome.com> wrote:

> Apologies. Forget my previous email. Just using the wrong repo.
>
> Looking at solving this asap.
>
> On Mon, Oct 29, 2018 at 10:11 AM Alejandro Lucero <alejandro.lucero@netronome.com> wrote:
>
>> I know what is going on.
>>
>> In patchset version 3 I forgot to remove some old code. Anatoly spotted
>> that and I was going to send another version to fix it. Before sending
>> the new version I saw the report about a problem with dma_mask, and I'm
>> afraid I did not send another version with the fix ...
>>
>> Yao, can you try with the next patch?
>>
>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
>> index ef656bbad..26adf46c0 100644
>> --- a/lib/librte_eal/common/eal_common_memory.c
>> +++ b/lib/librte_eal/common/eal_common_memory.c
>> @@ -458,10 +458,6 @@ rte_eal_check_dma_mask(uint8_t maskbits)
>>  		return -1;
>>  	}
>>  
>> -	/* keep the more restricted maskbit */
>> -	if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
>> -		mcfg->dma_maskbits = maskbits;
>> -
>>  	/* create dma mask */
>>  	mask = ~((1ULL << maskbits) - 1);
>>
>> On Mon, Oct 29, 2018 at 9:48 AM Thomas Monjalon wrote:
>>
>>> 29/10/2018 10:36, Yao, Lei A:
>>> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
>>> > > 29/10/2018 09:23, Yao, Lei A:
>>> > > > Hi, Lucero, Thomas
>>> > > >
>>> > > > This patch set will cause a deadlock during memory initialization.
>>> > > > rte_memseg_walk and try_expand_heap will both lock
>>> > > > &mcfg->memory_hotplug_lock, so a deadlock will occur.
>>> > > >
>>> > > > #0 rte_memseg_walk
>>> > > > #1 <-rte_eal_check_dma_mask
>>> > > > #2 <-alloc_pages_on_heap
>>> > > > #3 <-try_expand_heap_primary
>>> > > > #4 <-try_expand_heap
>>> > > >
>>> > > > Log as following:
>>> > > > EAL: TSC frequency is ~2494156 KHz
>>> > > > EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
>>> > > > [New Thread 0x7ffff5e0d700 (LWP 330350)]
>>> > > > EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
>>> > > > EAL: Trying to obtain current memory policy.
>>> > > > EAL: Setting policy MPOL_PREFERRED for socket 0
>>> > > > EAL: Restoring previous memory policy: 0
>>> > > >
>>> > > > Could you check this? A lot of test cases in our validation
>>> > > > team fail because of this. Thanks a lot!
>>> > >
>>> > > Can we just call rte_memseg_walk_thread_unsafe()?
>>> > >
>>> > > +Cc Anatoly
>>> >
>>> > Hi, Thomas
>>> >
>>> > I changed it to rte_memseg_walk_thread_unsafe(), but it still
>>> > does not work.
>>> >
>>> > EAL: Setting policy MPOL_PREFERRED for socket 0
>>> > EAL: Restoring previous memory policy: 0
>>> > EAL: memseg iova 140000000, len 40000000, out of range
>>> > EAL: using dma mask ffffffffffffffff
>>> > EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
>>> > EAL: Trying to obtain current memory policy.
>>> > EAL: Setting policy MPOL_PREFERRED for socket 1
>>> > EAL: Restoring previous memory policy: 0
>>> > EAL: memseg iova 1bc0000000, len 40000000, out of range
>>> > EAL: using dma mask ffffffffffffffff
>>> > EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
>>> > error allocating rte services array
>>> > EAL: FATAL: rte_service_init() failed
>>> > EAL: rte_service_init() failed
>>> > PANIC in main():
>>>
>>> I think it is showing there are at least 2 issues:
>>> 1/ deadlock
>>> 2/ allocation does not comply with mask check (out of range)
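
For reference, the mask check discussed in this thread can be sketched in
standalone C. This is only an illustration, not the DPDK code:
dma_mask_from_bits and segment_fits_mask are hypothetical names, the mask
expression is the one shown in the quoted diff, and the guard for
maskbits >= 64 is an added assumption, because shifting a 64-bit value by
64 or more bits is undefined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a mask whose set bits are the address bits
 * a device CANNOT drive. maskbits is the number of address bits the
 * device supports, as in "mask = ~((1ULL << maskbits) - 1)".
 */
uint64_t
dma_mask_from_bits(uint8_t maskbits)
{
	/* Assumption: 64 or more usable bits means no restriction. */
	if (maskbits >= 64)
		return 0;
	return ~((1ULL << maskbits) - 1);
}

/*
 * Hypothetical helper: return true when the segment [iova, iova + len)
 * is addressable, i.e. its highest byte address has no bit set in the
 * forbidden-bits mask.
 */
bool
segment_fits_mask(uint64_t iova, uint64_t len, uint64_t mask)
{
	assert(len > 0); /* an empty segment has no highest address */
	uint64_t last = iova + len - 1;
	return (last & mask) == 0;
}
```

Under these assumptions, the first segment in the quoted log (iova
140000000, len 40000000) ends at 17fffffff, which passes a 40-bit mask and
fails a 32-bit one. A mask of ffffffffffffffff, as printed in that log,
rejects every segment, and it is the value the expression produces for
maskbits == 0.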