From: "Wiles, Keith"
To: Matteo Lanzuisi
CC: Olivier Matz, "dev@dpdk.org"
Date: Tue, 21 Aug 2018 12:51:34 +0000
Subject: Re: [dpdk-dev] Multi-thread mempool usage

> On Aug 21, 2018, at 7:44 AM, Matteo Lanzuisi wrote:
>
> On 21/08/2018 14:17, Wiles, Keith wrote:
>>
>>> On Aug 21, 2018, at 7:01 AM, Matteo Lanzuisi wrote:
>>>
>>> Hi
>>>
>>> On 20/08/2018 18:03, Wiles, Keith wrote:
>>>>> On Aug 20, 2018, at 9:47 AM, Matteo Lanzuisi wrote:
>>>>>
>>>>> Hello Olivier,
>>>>>
>>>>> On 13/08/2018 23:54, Olivier Matz wrote:
>>>>>
>>>>>> Hello Matteo,
>>>>>>
>>>>>> On Mon, Aug 13, 2018 at 03:20:44PM +0200, Matteo Lanzuisi wrote:
>>>>>>
>>>>>>> Any suggestion? Any idea about this behaviour?
>>>>>>>
>>>>>>> On 08/08/2018 11:56, Matteo Lanzuisi wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> recently I began using the "dpdk-17.11-11.el7.x86_64" rpm (RedHat rpm) on
>>>>>>>> RedHat 7.5, kernel 3.10.0-862.6.3.el7.x86_64, as part of porting an
>>>>>>>> application from RH6 to RH7. On RH6 I used dpdk-2.2.0.
>>>>>>>>
>>>>>>>> This application is made up of one or more threads (each one on a
>>>>>>>> different logical core) reading packets from i40e interfaces.
>>>>>>>>
>>>>>>>> Each thread can call the following code lines when receiving a specific
>>>>>>>> packet:
>>>>>>>>
>>>>>>>> RTE_LCORE_FOREACH(lcore_id)
>>>>>>>> {
>>>>>>>>         result =
>>>>>>>> rte_mempool_get(cea_main_lcore_conf[lcore_id].de_conf.cmd_pool, (VOID_P
>>>>>>>> *) &new_work);        // mempools are created one for each logical core
>>>>>>>>         if (((uint64_t)(new_work)) < 0x7f0000000000)
>>>>>>>>             printf("Result %d, lcore di partenza %u, lcore di ricezione
>>>>>>>> %u, pointer %p\n", result, rte_lcore_id(), lcore_id, new_work);    //
>>>>>>>> debug print; on my server it should never happen, but with multi-thread
>>>>>>>> it always happens on the last logical core!!!!
>>>>>>>>
>>>>>> Here, checking the value of new_work looks wrong to me, before
>>>>>> ensuring that result == 0. At least, new_work should be set to
>>>>>> NULL before calling rte_mempool_get().
>>>>>>
>>>>> I put the check after result == 0, and just before the rte_mempool_get() I set new_work to NULL, but nothing changed.
>>>>> The first time something goes wrong the print is
>>>>>
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 635, pointer 0x880002
>>>>>
>>>>> Sorry for the Italian in the print :) it means that the application is sending a message from logical core 1 to logical core 2, it's the 635th time, the result is 0 and the pointer is 0x880002, while all pointers before were 0x7ffxxxxxx.
>>>>> One strange thing is that this behaviour always happens from logical core 1 to logical core 2 when the counter is 635!!! (Sending messages from 2 to 1 or 1 to 1 or 2 to 2 is all ok)
>>>>> Another strange thing is that the pointers from counter 636 to 640 are NULL, and from 641 they start being good again... as you can see here below (I attached the result of a test without the "if" check on the value of new_work, and only for messages from lcore 1 to lcore 2)
>>>>>
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 627, pointer 0x7ffe8a261880
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 628, pointer 0x7ffe8a261900
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 629, pointer 0x7ffe8a261980
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 630, pointer 0x7ffe8a261a00
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 631, pointer 0x7ffe8a261a80
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 632, pointer 0x7ffe8a261b00
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 633, pointer 0x7ffe8a261b80
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 634, pointer 0x7ffe8a261c00
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 635, pointer 0x880002
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 636, pointer (nil)
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 637, pointer (nil)
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 638, pointer (nil)
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 639, pointer (nil)
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 640, pointer (nil)
>>>>>
>>>> This sure does seem like a memory overwrite problem, with maybe a memset(0) in the mix as well. Have you tried using hardware breakpoints to catch the 0x880002 or 0x00 being written into this range?
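
To make Olivier's point above concrete, the get/check ordering would look roughly like the sketch below. It is untested and simplified: the struct, the function name and the error handling are stand-ins for the application's own types and per-lcore configuration, not the real code.

#include <rte_mempool.h>
#include <rte_ring.h>

struct cmd {                         /* stand-in for the application's CEA_DECODE_CMD_T */
        uint32_t command;
};

static int
send_work(struct rte_mempool *pool, struct rte_ring *ring, uint32_t command)
{
        struct cmd *new_work = NULL; /* NULL before the get, as suggested */
        int ret = rte_mempool_get(pool, (void **)&new_work);

        if (ret != 0)                /* never touch new_work unless the get succeeded */
                return ret;

        new_work->command = command;

        ret = rte_ring_enqueue(ring, new_work);
        if (ret != 0)                /* on enqueue failure, return the object to its pool */
                rte_mempool_put(pool, new_work);

        return ret;
}

The object is only dereferenced once rte_mempool_get() has returned 0, and it goes back to the pool if the ring enqueue fails, so no element is leaked or used while invalid.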
>>> I put some breakpoints and found this:
>>>
>>> 1 - using pointer 0x880002, the output is (the pointer falls in the middle of two rwlocks):
>>>
>>> (gdb) awatch *0x880002
>>> Hardware access (read/write) watchpoint 1: *0x880002
>>> (gdb) c
>>> Continuing.
>>> [New Thread 0x7fffeded5700 (LWP 19969)]
>>> [New Thread 0x7fffed6d4700 (LWP 19970)]
>>> [New Thread 0x7fffeced3700 (LWP 19971)]
>>> [New Thread 0x7fffec6d2700 (LWP 19972)]
>>> [New Thread 0x7fffebed1700 (LWP 19973)]
>>> [New Thread 0x7fffeb6d0700 (LWP 19974)]
>>> Hardware access (read/write) watchpoint 1: *0x880002
>>>
>>> Value = 0
>>> rte_rwlock_init (rwl=0x880000 )
>>>     at /usr/share/dpdk/x86_64-default-linuxapp-gcc/include/generic/rte_rwlock.h:81
>>> 81      }
>>> (gdb) c
>>> Continuing.
>>> Hardware access (read/write) watchpoint 1: *0x880002
>> These are most likely false positive hits and not the issue.
>>> Value = 0
>>> rte_rwlock_init (rwl=0x880004 )
>>>     at /usr/share/dpdk/x86_64-default-linuxapp-gcc/include/generic/rte_rwlock.h:81
>>> 81      }
>>> (gdb) c
>>> Continuing.
>>>
>>> 2 - when using pointers less than or equal to 0x7ffe8a261d64 (in the range of the mempool), gdb tells nothing about them (I don't use them, I just get them from the pool and then put them back in the pool);
>>>
>>> 3 - when using pointer 0x7ffe8a261d65 or greater, this is the output of gdb:
>>>
>>> (gdb) awatch *(int *)0x7ffe8a261d65
>>> Hardware access (read/write) watchpoint 1: *(int *)0x7ffe8a261d65
>>> (gdb) c
>>> Continuing.
>>> [New Thread 0x7fffeded5700 (LWP 17689)]
>>> [New Thread 0x7fffed6d4700 (LWP 17690)]
>>> [New Thread 0x7fffeced3700 (LWP 17691)]
>>> [New Thread 0x7fffec6d2700 (LWP 17692)]
>>> [New Thread 0x7fffebed1700 (LWP 17693)]
>>> [New Thread 0x7fffeb6d0700 (LWP 17694)]
>>> Hardware access (read/write) watchpoint 1: *(int *)0x7ffe8a261d65
>>>
>>> Value = 0
>>> 0x00007ffff3798c21 in mempool_add_elem (mp=mp@entry=0x7ffebfd8d6c0, obj=obj@entry=0x7ffe8a261d80,
>>>     iova=iova@entry=4465237376) at /usr/src/debug/dpdk-17.11/lib/librte_mempool/rte_mempool.c:140
>>> 140             STAILQ_INSERT_TAIL(&mp->elt_list, hdr, next);
>>> (gdb) where
>>> #0  0x00007ffff3798c21 in mempool_add_elem (mp=mp@entry=0x7ffebfd8d6c0, obj=obj@entry=0x7ffe8a261d80,
>>>     iova=iova@entry=4465237376) at /usr/src/debug/dpdk-17.11/lib/librte_mempool/rte_mempool.c:140
>>> #1  0x00007ffff37990f0 in rte_mempool_populate_iova (mp=0x7ffebfd8d6c0, vaddr=0x7ffe8a23d540 "",
>>>     iova=4465087808, len=8388480, free_cb=, opaque=)
>>>     at /usr/src/debug/dpdk-17.11/lib/librte_mempool/rte_mempool.c:424
>>> #2  0x00007ffff379967d in rte_mempool_populate_default (mp=mp@entry=0x7ffebfd8d6c0)
>>>     at /usr/src/debug/dpdk-17.11/lib/librte_mempool/rte_mempool.c:624
>>> #3  0x00007ffff3799e89 in rte_mempool_create (name=, n=,
>>>     elt_size=, cache_size=, private_data_size=,
>>>     mp_init=0x7ffff444e410 , mp_init_arg=0x0,
>>>     obj_init=0x7ffff444e330 , obj_init_arg=0x0, socket_id=0, flags=0)
>>>     at /usr/src/debug/dpdk-17.11/lib/librte_mempool/rte_mempool.c:952
>>> #4  0x0000000000548a52 in main (argc=16, argv=0x7fffffffe3c8)
>>>     at /root/gemini-cea-4.6.0/msrc/sys/com/linux-dpdk/cea-app/../../../../sys/com/linux-dpdk/cea-app/main.c:2360
>>> (gdb) c
>>> Continuing.
>>> Hardware access (read/write) watchpoint 1: *(int *)0x7ffe8a261d65
>> This seems to be just creating a pktmbuf pool. The STAILQ_INSERT_TAIL is just putting each element on the mempool's element list in DPDK.
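
As a side note, one way to tell whether an address such as 0x880002 (or the ones that trip the watchpoint) belongs to the pool's memory at all is to walk the pool's populated chunks with rte_mempool_mem_iter(). A rough sketch; the helper and struct names are made up for illustration and are not DPDK APIs:

#include <stdbool.h>
#include <stdio.h>
#include <rte_mempool.h>

struct range_check {
        const void *ptr;
        bool found;
};

/* Callback invoked once per memory chunk populated into the pool. */
static void
check_chunk(struct rte_mempool *mp, void *opaque,
            struct rte_mempool_memhdr *memhdr, unsigned int mem_idx)
{
        struct range_check *rc = opaque;
        char *start = memhdr->addr;
        const char *p = rc->ptr;

        (void)mp;
        printf("chunk %u: %p - %p\n", mem_idx, memhdr->addr,
               (void *)(start + memhdr->len));
        if (p >= start && p < start + memhdr->len)
                rc->found = true;
}

static bool
ptr_in_mempool(struct rte_mempool *mp, const void *ptr)
{
        struct range_check rc = { .ptr = ptr, .found = false };

        rte_mempool_mem_iter(mp, check_chunk, &rc);
        return rc.found;
}

If ptr_in_mempool() comes back false for 0x880002, that value was never a valid element of this pool, which would point at the pointer itself being overwritten rather than the pool handing out a bad object.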
>>
>>> Old value = 0
>>> New value = -402653184
>>> 0x00007ffff3798c24 in mempool_add_elem (mp=mp@entry=0x7ffebfd8d6c0, obj=obj@entry=0x7ffe8a261e00,
>>>     iova=iova@entry=4465237504) at /usr/src/debug/dpdk-17.11/lib/librte_mempool/rte_mempool.c:140
>>> 140             STAILQ_INSERT_TAIL(&mp->elt_list, hdr, next);
>>> (gdb) where
>>> #0  0x00007ffff3798c24 in mempool_add_elem (mp=mp@entry=0x7ffebfd8d6c0, obj=obj@entry=0x7ffe8a261e00,
>>>     iova=iova@entry=4465237504) at /usr/src/debug/dpdk-17.11/lib/librte_mempool/rte_mempool.c:140
>>> #1  0x00007ffff37990f0 in rte_mempool_populate_iova (mp=0x7ffebfd8d6c0, vaddr=0x7ffe8a23d540 "",
>>>     iova=4465087808, len=8388480, free_cb=, opaque=)
>>>     at /usr/src/debug/dpdk-17.11/lib/librte_mempool/rte_mempool.c:424
>>> #2  0x00007ffff379967d in rte_mempool_populate_default (mp=mp@entry=0x7ffebfd8d6c0)
>>>     at /usr/src/debug/dpdk-17.11/lib/librte_mempool/rte_mempool.c:624
>>> #3  0x00007ffff3799e89 in rte_mempool_create (name=, n=,
>>>     elt_size=, cache_size=, private_data_size=,
>>>     mp_init=0x7ffff444e410 , mp_init_arg=0x0,
>>>     obj_init=0x7ffff444e330 , obj_init_arg=0x0, socket_id=0, flags=0)
>>>     at /usr/src/debug/dpdk-17.11/lib/librte_mempool/rte_mempool.c:952
>>> #4  0x0000000000548a52 in main (argc=16, argv=0x7fffffffe3c8)
>>>     at /root/gemini-cea-4.6.0/msrc/sys/com/linux-dpdk/cea-app/../../../../sys/com/linux-dpdk/cea-app/main.c:2360
>>> (gdb) c
>>> Continuing.
>>>
>>> What do you think? Is it normal that mempool_add_elem is called only on certain pointers of the mempool?
>>> I attached the initialization of the mempool. Can this be wrong?
>> All mempools with a cache size will have two queues to put memory on: one is the per-lcore list, and that one is used as a fast access queue. When the cache becomes empty or has more entries than it was created with, it pushes the extra entries to the main list of mempool data.
> Why do you say "mempools with a cache size"? In my initialization this mempool has cache_size = 0

If you give a cache size then you will have a cache list per lcore; in your case you do not have a cache. BTW, not having a cache will affect performance a great deal.

>>
>> The only time that rwlock is touched is to get/put items on the main mempool.
>>
>> Just as a data point, have you tried this app on 18.08 yet? I do not see the problem yet, sorry.
> I'll try 18.08 and let you know
>>
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 641, pointer 0x7ffe8a262b00
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 642, pointer 0x7ffe8a262b80
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 643, pointer 0x7ffe8a262d00
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 644, pointer 0x7ffe8a262d80
>>>>> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 645, pointer 0x7ffe8a262e00
>>>>>
>>>>>
>>>>>>>>         if (result == 0)
>>>>>>>>         {
>>>>>>>>             new_work->command = command; // usage of the memory gotten
>>>>>>>> from the mempool... <<<<<- here is where the application crashes!!!!
>>>>>>>>
>>>>>> Do you know why it crashes? Is it that new_work is NULL?
>>>>>>
>>>>> The pointer is not NULL, but it is not sequential to the others (0x880002 as written before in this email). It seems to be in a memory zone not in the DPDK hugepages, or something similar.
>>>>> If I use this pointer, the application crashes.
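
For reference, going back to the cache discussion above: this is roughly what giving the pool a per-lcore cache looks like at creation time. It is only a sketch; the pool name, element count and element size are placeholders, and 256 is just an example cache size (it must not exceed RTE_MEMPOOL_CACHE_MAX_SIZE):

#include <rte_mempool.h>
#include <rte_lcore.h>

static struct rte_mempool *
create_cmd_pool_with_cache(void)
{
        return rte_mempool_create("cmd_pool_cached", /* placeholder name             */
                        65535,                       /* number of elements           */
                        24,                          /* element size in bytes        */
                        256,                         /* per-lcore cache size         */
                        0,                           /* private data size            */
                        NULL, NULL,                  /* pool constructor + arg       */
                        NULL, NULL,                  /* per-object constructor + arg */
                        rte_socket_id(),             /* NUMA socket                  */
                        0);                          /* flags: default MP/MC         */
}

Leaving flags at 0 keeps the default multi-producer/multi-consumer behaviour, which a pool touched from several lcores needs; the cache only reduces how often each lcore has to go through the shared pool backend.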
>>>>>
>>>>>> Can you check how the mempool is initialized? It should be in multi
>>>>>> consumer mode and, depending on your use case, single or multi producer.
>>>>>>
>>>>> Here is the initialization of this mempool
>>>>>
>>>>> cea_main_cmd_pool[i] = rte_mempool_create(pool_name,
>>>>>             (unsigned int) (ikco_cmd_buffers - 1), // 65536 - 1 in this case
>>>>>             sizeof (CEA_DECODE_CMD_T), // 24 bytes
>>>>>             0, 0,
>>>>>             rte_pktmbuf_pool_init, NULL,
>>>>>             rte_pktmbuf_init, NULL,
>>>>>             rte_socket_id(), 0);
>>>>>
>>>>>> Another thing that could be checked: at all the places where you
>>>>>> return your work object to the mempool, you should add a check
>>>>>> that it is not NULL. Or just enabling RTE_LIBRTE_MEMPOOL_DEBUG
>>>>>> could do the trick: it adds some additional checks when doing
>>>>>> mempool operations.
>>>>>>
>>>>> I think I have already answered this point with the prints earlier in the email.
>>>>>
>>>>> What do you think about this behaviour?
>>>>>
>>>>> Regards,
>>>>> Matteo
>>>>>
>>>>>>>>             result =
>>>>>>>> rte_ring_enqueue(cea_main_lcore_conf[lcore_id].de_conf.cmd_ring,
>>>>>>>> (VOID_P) new_work);    // enqueues the gotten buffer on the rings of all
>>>>>>>> lcores
>>>>>>>>             // check on result value ...
>>>>>>>>         }
>>>>>>>>         else
>>>>>>>>         {
>>>>>>>>             // do something if result != 0 ...
>>>>>>>>         }
>>>>>>>> }
>>>>>>>>
>>>>>>>> This code worked perfectly (never had an issue) on dpdk-2.2.0, while if
>>>>>>>> I use more than 1 thread doing these operations on dpdk-17.11 it happens
>>>>>>>> that after some time the "new_work" pointer is not a good one, and the
>>>>>>>> application crashes when using that pointer.
>>>>>>>>
>>>>>>>> It seems that these lines cannot be used by more than one thread
>>>>>>>> simultaneously. I also tried many 2017 and 2018 dpdk versions without
>>>>>>>> success.
>>>>>>>>
>>>>>>>> Is this code possible on the new dpdk versions? Or do I have to change my
>>>>>>>> application so that this code is called by just one lcore at a time?
>>>>>>>>
>>>>>> Assuming the mempool is properly initialized, I don't see any reason
>>>>>> why it would not work. There have been a lot of changes in mempool between
>>>>>> dpdk-2.2.0 and dpdk-17.11, but this behavior should remain the same.
>>>>>>
>>>>>> If the comments above do not help to solve the issue, it could be helpful
>>>>>> to try to reproduce the issue in a minimal program, so we can help to
>>>>>> review it.
>>>>>>
>>>>>> Regards,
>>>>>> Olivier
>>>>>>
>>>> Regards,
>>>> Keith
>>>>
>>> Regards,
>>>
>>> Matteo
>>>
>> Regards,
>> Keith
>>
> Regards,
> Matteo

Regards,
Keith