From: "Wiles, Keith"
To: Matteo Lanzuisi
CC: Olivier Matz, "dev@dpdk.org"
Subject: Re: [dpdk-dev] Multi-thread mempool usage
Date: Mon, 20 Aug 2018 16:03:05 +0000

> On Aug 20, 2018, at 9:47 AM, Matteo Lanzuisi wrote:
>
> Hello Olivier,
>
> On 13/08/2018 23:54, Olivier Matz wrote:
>> Hello Matteo,
>>
>> On Mon, Aug 13, 2018 at 03:20:44PM +0200, Matteo Lanzuisi wrote:
>>> Any suggestion? Any idea about this behaviour?
>>>
>>> On 08/08/2018 11:56, Matteo Lanzuisi wrote:
>>>> Hi all,
>>>>
>>>> recently I began using the "dpdk-17.11-11.el7.x86_64" rpm (RedHat rpm) on
>>>> RedHat 7.5, kernel 3.10.0-862.6.3.el7.x86_64, as a porting of an
>>>> application from RH6 to RH7. On RH6 I used dpdk-2.2.0.
>>>>
>>>> This application is made up of one or more threads (each one on a
>>>> different logical core) reading packets from i40e interfaces.
>>>>
>>>> Each thread can call the following code lines when receiving a specific
>>>> packet:
>>>>
>>>> RTE_LCORE_FOREACH(lcore_id)
>>>> {
>>>>     result = rte_mempool_get(cea_main_lcore_conf[lcore_id].de_conf.cmd_pool,
>>>>         (VOID_P *) &new_work);    // mempools are created one for each logical core
>>>>     if (((uint64_t)(new_work)) < 0x7f0000000000)
>>>>         printf("Result %d, lcore di partenza %u, lcore di ricezione %u, pointer %p\n",
>>>>             result, rte_lcore_id(), lcore_id, new_work);
>>>>         // debug print; on my server it should never happen,
>>>>         // but with multi-thread it happens always on the last logical core!!!!
>> Here, checking the value of new_work looks wrong to me, before
>> ensuring that result == 0. At least, new_work should be set to
>> NULL before calling rte_mempool_get().
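To make Olivier's point concrete, the get pattern would look roughly like
this (an untested sketch written against your own identifiers, not something
I have compiled):

    void *new_work = NULL;    /* start from a known value */

    result = rte_mempool_get(cea_main_lcore_conf[lcore_id].de_conf.cmd_pool,
                             &new_work);
    if (result != 0) {
        /* pool is empty (-ENOENT): new_work is untouched, do not use it */
        continue;
    }
    /* only from here on is new_work guaranteed to point at a pool element */

That way a stale value left in new_work can never be mistaken for a valid
element.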
> I put the check after result == 0, and just before the rte_mempool_get()
> I set new_work to NULL, but nothing changed.
> The first time something goes wrong, the print is
>
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 635, pointer 0x880002
>
> Sorry for the Italian in the print :) it means that the application is
> sending a message from logical core 1 to logical core 2, it's the 635th
> time, the result is 0, and the pointer is 0x880002, while all the pointers
> before were 0x7ffxxxxxx.
> One strange thing is that this behaviour always happens from logical core 1
> to logical core 2 when the counter is 635!!! (Sending messages from 2 to 1,
> or 1 to 1, or 2 to 2, is all ok.)
> Another strange thing is that the pointers from counter 636 to 640 are NULL,
> and from 641 they begin to be good again... as you can see here below (I
> attached the result of a test without the "if" check on the value of
> new_work, and only for messages from lcore 1 to lcore 2):
>
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 627, pointer 0x7ffe8a261880
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 628, pointer 0x7ffe8a261900
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 629, pointer 0x7ffe8a261980
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 630, pointer 0x7ffe8a261a00
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 631, pointer 0x7ffe8a261a80
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 632, pointer 0x7ffe8a261b00
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 633, pointer 0x7ffe8a261b80
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 634, pointer 0x7ffe8a261c00
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 635, pointer 0x880002
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 636, pointer (nil)
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 637, pointer (nil)
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 638, pointer (nil)
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 639, pointer (nil)
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 640, pointer (nil)

This sure does seem like a memory overwrite problem, with maybe a memset(0)
in the mix as well. Have you tried using hardware breakpoints to catch the
0x880002 or the 0x00 being written into this range? (See the gdb sketch at
the end of this mail.)

> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 641, pointer 0x7ffe8a262b00
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 642, pointer 0x7ffe8a262b80
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 643, pointer 0x7ffe8a262d00
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 644, pointer 0x7ffe8a262d80
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 645, pointer 0x7ffe8a262e00
>
>>
>>>>     if (result == 0)
>>>>     {
>>>>         new_work->command = command;    // usage of the memory gotten from
>>>>             // the mempool... <<<<<- here is where the application crashes!!!!
>> Do you know why it crashes? Is it that new_work is NULL?
> The pointer is not NULL, but it is not sequential to the others (0x880002,
> as written before in this email). It seems to be in a memory zone that is
> not in the DPDK hugepages, or something similar.
> If I use this pointer, the application crashes.
>>
>> Can you check how the mempool is initialized? It should be in multi
>> consumer and, depending on your use case, single or multi producer.
> Here is the initialization of this mempool:
>
> cea_main_cmd_pool[i] = rte_mempool_create(pool_name,
>     (unsigned int) (ikco_cmd_buffers - 1),  // 65536 - 1 in this case
>     sizeof (CEA_DECODE_CMD_T),              // 24 bytes
>     0, 0,
>     rte_pktmbuf_pool_init, NULL,
>     rte_pktmbuf_init, NULL,
>     rte_socket_id(), 0);
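One thing I would double-check here: rte_pktmbuf_pool_init() and
rte_pktmbuf_init() are constructors meant for mbuf pools. rte_pktmbuf_init()
initializes every element as a struct rte_mbuf, which is far larger than your
24-byte CEA_DECODE_CMD_T, so it may well be writing past the end of each
element, which is exactly the kind of silent corruption you are describing.
For a pool of plain command structs I would expect the creation to look more
like this (a sketch based on your code above, not compiled):

    cea_main_cmd_pool[i] = rte_mempool_create(pool_name,
            (unsigned int)(ikco_cmd_buffers - 1),  /* 65536 - 1 elements */
            sizeof(CEA_DECODE_CMD_T),              /* element size, 24 bytes */
            0,                                     /* no per-lcore cache */
            0,                                     /* no pool private data */
            NULL, NULL,                            /* no pool constructor */
            NULL, NULL,                            /* no per-object constructor */
            rte_socket_id(),
            0);                                    /* flags */

Also note that flags == 0 gives you a multi-producer/multi-consumer pool,
which is what you want with several lcores doing gets and puts;
MEMPOOL_F_SP_PUT or MEMPOOL_F_SC_GET would only be safe if a single thread
does the corresponding operation.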
>>
>> Another thing that could be checked: at all the places where you
>> return your work object to the mempool, you should add a check
>> that it is not NULL. Or just enabling RTE_LIBRTE_MEMPOOL_DEBUG
>> could do the trick: it adds some additional checks when doing
>> mempool operations.
> I think I have already answered this point with the prints earlier in
> the email.
>
> What do you think about this behaviour?
>
> Regards,
> Matteo
>>
>>>>         result = rte_ring_enqueue(cea_main_lcore_conf[lcore_id].de_conf.cmd_ring,
>>>>             (VOID_P) new_work);   // enqueues the gotten buffer on the rings of all lcores
>>>>         // check on result value ...
>>>>     }
>>>>     else
>>>>     {
>>>>         // do something if result != 0 ...
>>>>     }
>>>> }
>>>>
>>>> This code worked perfectly (never had an issue) on dpdk-2.2.0, while if
>>>> I use more than one thread doing these operations on dpdk-17.11, it
>>>> happens that after some time the "new_work" pointer is not a good one,
>>>> and the application crashes when using that pointer.
>>>>
>>>> It seems that these lines cannot be used by more than one thread
>>>> simultaneously. I also tried many 2017 and 2018 dpdk versions, without
>>>> success.
>>>>
>>>> Is this code possible on the new dpdk versions? Or do I have to change my
>>>> application so that this code is called by just one lcore at a time?
>> Assuming the mempool is properly initialized, I don't see any reason
>> why it would not work. There have been a lot of changes in mempool between
>> dpdk-2.2.0 and dpdk-17.11, but this behavior should remain the same.
>>
>> If the comments above do not help to solve the issue, it could be helpful
>> to try to reproduce the issue in a minimal program, so we can help to
>> review it.
>>
>> Regards,
>> Olivier

Regards,
Keith
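P.S. On the hardware breakpoint idea above: gdb can stop on the exact store
that corrupts a location. A rough session sketch (ADDR is a placeholder for
the address where the bad pointer value shows up, which you would have to
work out first, e.g. the mempool ring slot or the element itself):

    (gdb) watch *(unsigned long *) ADDR    # hardware watchpoint on 8 bytes
    (gdb) continue
    ... gdb stops with a backtrace when anything writes to ADDR ...

x86 only has a few debug registers, so only a handful of hardware
watchpoints can be active at once, but one on the first location that goes
bad is usually enough to catch the offending write.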