From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <anatoly.burakov@intel.com>
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
 by dpdk.org (Postfix) with ESMTP id 552B22B95
 for <dev@dpdk.org>; Mon,  8 Apr 2019 15:47:37 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga004.jf.intel.com ([10.7.209.38])
 by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 08 Apr 2019 06:47:36 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.60,325,1549958400"; d="scan'208";a="289718913"
Received: from aburakov-mobl1.ger.corp.intel.com (HELO [10.252.26.147])
 ([10.252.26.147])
 by orsmga004.jf.intel.com with ESMTP; 08 Apr 2019 06:47:34 -0700
To: "Mohamed Mahmoud (mmahmoud)" <mmahmoud@cisco.com>,
 "Jim Holland (jimholla)" <jimholla@cisco.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "Min Tang (mtang2)" <mtang2@cisco.com>
References: <BN8PR11MB36348D6984E65C7AB40DF63BC8520@BN8PR11MB3634.namprd11.prod.outlook.com>
 <839ef3df-c0d3-4705-f510-be517ef8d99f@intel.com>
 <59EFA01F-D0BA-4372-A8D1-DC438BE810D1@cisco.com>
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Message-ID: <10192495-2735-ada9-acf6-5586eaab6370@intel.com>
Date: Mon, 8 Apr 2019 14:47:31 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101
 Thunderbird/60.6.1
MIME-Version: 1.0
In-Reply-To: <59EFA01F-D0BA-4372-A8D1-DC438BE810D1@cisco.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Subject: Re: [dpdk-dev] Secondary process crash in rte_eal_init
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Apr 2019 13:47:37 -0000

On 08-Apr-19 2:07 PM, Mohamed Mahmoud (mmahmoud) wrote:
> Hi Anatoly:
> 
> For the secondary process, all we are setting is --proc-type set to 
> secondary. Could you point me to your test? That might give me some 
> clues. Also, how can I generate a detailed crash log?
> 
> When I checked the init code, there are very few places that can cause 
> the panic:
> 
>     if (pipe(lcore_config[i].pipe_master2slave) < 0)
>         rte_panic("Cannot create pipe\n");
>     if (pipe(lcore_config[i].pipe_slave2master) < 0)
>         rte_panic("Cannot create pipe\n");
> 
>     lcore_config[i].state = WAIT;
> 
>     /* create a thread for each lcore */
>     ret = pthread_create(&lcore_config[i].thread_id, NULL,
>                          eal_thread_loop, NULL);
>     if (ret != 0)
>         rte_panic("Cannot create thread\n");

Hi,

That snippet is from the most recent DPDK version. The issue I was 
responding to was for version 17.08, which has quite a few more panics in 
its init code.

In order to diagnose the issue, I would need to know at least where the 
panic happens. That information is missing from both reports, so I cannot 
begin to guess what went wrong, let alone why.
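
To illustrate the kind of information that makes such a report diagnosable, 
here is a minimal sketch (not DPDK code) of an init step that records which 
call failed and why. The names describe_failure() and create_pipe_pair() are 
hypothetical and exist only for this example; pipe(), snprintf() and 
strerror() are the only real library calls used.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a DPDK API): write the call site, a message and
 * the errno text into buf, so the log says where and why init failed.
 */
static int
describe_failure(char *buf, size_t len, const char *func, int line,
		 const char *msg, int err)
{
	return snprintf(buf, len, "PANIC in %s():%d: %s: %s",
			func, line, msg, strerror(err));
}

/*
 * Hypothetical init step modelled on the quoted snippet: create the two
 * pipes. On failure, describe the failing call in buf and return -1.
 */
static int
create_pipe_pair(int m2s[2], int s2m[2], char *buf, size_t len)
{
	if (pipe(m2s) < 0) {
		describe_failure(buf, len, __func__, __LINE__,
				 "Cannot create master2slave pipe", errno);
		return -1;
	}
	if (pipe(s2m) < 0) {
		int err = errno; /* save before close() can overwrite it */

		close(m2s[0]);
		close(m2s[1]);
		describe_failure(buf, len, __func__, __LINE__,
				 "Cannot create slave2master pipe", err);
		return -1;
	}
	return 0;
}
```

A caller would print buf before aborting. With a message like that in the 
report, the two "Cannot create pipe" sites can be told apart, and the errno 
text hints at the cause.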

> 
> Thanks,
> 
> Mohamed
> 
> *From: *"Burakov, Anatoly" <anatoly.burakov@intel.com>
> *Date: *Monday, April 8, 2019 at 4:42 AM
> *To: *"Jim Holland (jimholla)" <jimholla@cisco.com>, "dev@dpdk.org" 
> <dev@dpdk.org>
> *Cc: *"Mohamed Mahmoud (mmahmoud)" <mmahmoud@cisco.com>, "Min Tang 
> (mtang2)" <mtang2@cisco.com>
> *Subject: *Re: [dpdk-dev] Secondary process crash in rte_eal_init
> 
> On 06-Apr-19 3:26 PM, Jim Holland (jimholla) wrote:
> 
>     Hi,
> 
>     We're seeing something similar to what is described in thread below.
>     Our product uses dpdk 17.08. Was there ever a resolution to Souvik's
>     issue?
> 
>     Thanks...Jim
> 
>     On 01/04/2015 06:00, Dey, Souvik wrote:
> 
>         Hi All,
> 
>         We have a single primary application with multiple secondary
>         applications launched on the same CPU core. When the system
>         boots up for the first time, the primary comes up followed by
>         the secondary processes, and everything works fine. But if I
>         later try to restart the secondary processes without
>         restarting the primary, the secondary process crashes with the
>         stack trace below. Any idea what is going wrong here?
> 
>         ============================================
>         Stack Trace of the Crash :
> 
>         (gdb) bt
>         #0 0x00007f4dad6e01b5 in raise () from /lib/libc.so.6
>         #1 0x00007f4dad6e2fc0 in abort () from /lib/libc.so.6
>         #2 0x0000000000402545 in __rte_panic ()
>         #3 0x00000000007353f4 in rte_eal_init ()
>         #4 0x0000000000403133 in main (argc=10, argv=0x7fff2c52fea8)
>         ============================================
> 
>         --
> 
>         Regards,
> 
>         Souvik
> 
>     Hi Dey,
> 
>     Could you try to reproduce the issue with one of the multi-process
>     examples from DPDK?
> 
>     Also you could provide some info such as command line options when
>     running both primary/secondary apps, dpdk version, etc...
> 
>     Sergio
> 
> The crash you are seeing is a panic, but I don't see *what* is
> panicking, so I cannot diagnose the issue from the stack trace alone.
> 
> In general, restarting secondary processes during the primary's runtime
> is a supported case, and is indeed the basis for one of our unit tests
> (granted, most secondary process spinups in our unit tests are supposed
> to fail, but some will pass), and I have at various points done
> "secondary process spam" tests firing up tens of thousands of
> secondaries as well. So, crashes on secondary initialization with valid
> parameters are certainly not normal behavior.
> 
> A more detailed crash log would be good to have.
> 
> -- 
> 
> Thanks,
> 
> Anatoly
> 


-- 
Thanks,
Anatoly
