From: Matteo Lanzuisi
To: Olivier Matz
Cc: dev@dpdk.org
Date: Mon, 20 Aug 2018 16:47:58 +0200
Subject: Re: [dpdk-dev] Multi-thread mempool usage

Hello Olivier,

On 13/08/2018 23:54, Olivier Matz wrote:
> Hello Matteo,
>
> On Mon, Aug 13, 2018 at 03:20:44PM +0200, Matteo Lanzuisi wrote:
>> Any suggestion? any idea about this behaviour?
>>
>> On 08/08/2018 11:56, Matteo Lanzuisi wrote:
>>> Hi all,
>>>
>>> recently I began using "dpdk-17.11-11.el7.x86_64" rpm (RedHat rpm) on
>>> RedHat 7.5 kernel 3.10.0-862.6.3.el7.x86_64 as a porting of an
>>> application from RH6 to RH7. On RH6 I used dpdk-2.2.0.
>>>
>>> This application is made up by one or more threads (each one on a
>>> different logical core) reading packets from i40e interfaces.
>>>
>>> Each thread can call the following code lines when receiving a specific
>>> packet:
>>>
>>> RTE_LCORE_FOREACH(lcore_id)
>>> {
>>>         // mempools are created one for each logical core
>>>         result = rte_mempool_get(cea_main_lcore_conf[lcore_id].de_conf.cmd_pool,
>>>                                  (VOID_P *) &new_work);
>>>         // debug print, on my server it should never happen but with
>>>         // multi-thread happens always on the last logical core!!!!
>>>         if (((uint64_t)(new_work)) < 0x7f0000000000)
>>>             printf("Result %d, lcore di partenza %u, lcore di ricezione %u, pointer %p\n",
>>>                    result, rte_lcore_id(), lcore_id, new_work);
> Here, checking the value of new_work looks wrong to me, before
> ensuring that result == 0. At least, new_work should be set to
> NULL before calling rte_mempool_get().

I moved the check after result == 0 and set new_work to NULL just
before calling rte_mempool_get(), but nothing changed.
The first time something goes wrong, the print is:

Result 0, lcore di partenza 1, lcore di ricezione 2, counter 635, pointer 0x880002

Sorry for the Italian in the print :) "lcore di partenza" is the sending
lcore and "lcore di ricezione" is the receiving lcore. It means that the
application is sending a message from logical core 1 to logical core 2
for the 635th time; the result is 0 and the pointer is 0x880002, while
all previous pointers were 0x7ffxxxxxx.

One strange thing is that this always happens from logical core 1 to
logical core 2 when the counter is 635!!!
(Sending messages from 2 to 1, from 1 to 1, or from 2 to 2 all works
fine.)

Another strange thing is that the pointers from counter 636 to 640 are
NULL, and from 641 onwards they are good again, as you can see below.
(This is the output of a test without the "if" check on the value of
new_work, and only for messages from lcore 1 to lcore 2.)

Result 0, lcore di partenza 1, lcore di ricezione 2, counter 627, pointer 0x7ffe8a261880
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 628, pointer 0x7ffe8a261900
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 629, pointer 0x7ffe8a261980
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 630, pointer 0x7ffe8a261a00
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 631, pointer 0x7ffe8a261a80
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 632, pointer 0x7ffe8a261b00
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 633, pointer 0x7ffe8a261b80
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 634, pointer 0x7ffe8a261c00
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 635, pointer 0x880002
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 636, pointer (nil)
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 637, pointer (nil)
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 638, pointer (nil)
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 639, pointer (nil)
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 640, pointer (nil)
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 641, pointer 0x7ffe8a262b00
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 642, pointer 0x7ffe8a262b80
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 643, pointer 0x7ffe8a262d00
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 644, pointer 0x7ffe8a262d80
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 645, pointer 0x7ffe8a262e00

>
>>>         if (result == 0)
>>>         {
>>>             // usage of the memory gotten from the mempool...
>>>             // <<<<<- here is where the application crashes!!!!
>>>             new_work->command = command;
> Do you know why it crashes? Is it that new_work is NULL?

The pointer is not NULL, but it is not sequential to the others
(0x880002, as written earlier in this email). It seems to point to a
memory zone outside the DPDK hugepages, or something similar. If I use
this pointer, the application crashes.

>
> Can you check how the mempool is initialized? It should be in multi
> consumer and depending on your use case, single or multi producer.

Here is the initialization of this mempool:

cea_main_cmd_pool[i] = rte_mempool_create(pool_name,
            (unsigned int) (ikco_cmd_buffers - 1), // 65536 - 1 in this case
            sizeof (CEA_DECODE_CMD_T), // 24 bytes
            0, 0,
            rte_pktmbuf_pool_init, NULL,
            rte_pktmbuf_init, NULL,
            rte_socket_id(), 0);

>
> Another thing that could be checked: at all the places where you
> return your work object to the mempool, you should add a check
> that it is not NULL. Or just enabling RTE_LIBRTE_MEMPOOL_DEBUG
> could do the trick: it adds some additional checks when doing
> mempool operations.

I think the prints earlier in this email already answer this point.

What do you think about this behaviour?

Regards,
Matteo

>
>>>             // enqueues the gotten buffer on the rings of all lcores
>>>             result = rte_ring_enqueue(cea_main_lcore_conf[lcore_id].de_conf.cmd_ring,
>>>                                       (VOID_P) new_work);
>>>             // check on result value ...
>>>         }
>>>         else
>>>         {
>>>             // do something if result != 0 ...
>>>         }
>>> }
>>>
>>> This code worked perfectly (never had an issue) on dpdk-2.2.0, while if
>>> I use more than 1 thread doing these operations on dpdk-17.11 it happens
>>> that after some times the "new_work" pointer is not a good one, and the
>>> application crashes when using that pointer.
>>>
>>> It seems that these lines cannot be used by more than one thread
>>> simultaneously. I also used many 2017 and 2018 dpdk versions without
>>> success.
>>>
>>> Is this code possible on the new dpdk versions? Or have I to change my
>>> application so that this code is called just by one lcore at a time?
> Assuming the mempool is properly initialized, I don't see any reason
> why it would not work. There has been a lot of changes in mempool between
> dpdk-2.2.0 and dpdk-17.11, but this behavior should remain the same.
>
> If the comments above do not help to solve the issue, it could be helpful
> to try to reproduce the issue in a minimal program, so we can help to
> review it.
>
> Regards,
> Olivier
>
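
For reference, the check order discussed in this thread (set the pointer
to NULL before the get, test the return code before touching the pointer,
and refuse to return a NULL object to the pool) can be sketched as below.
This is only an illustration under stated assumptions: toy_pool, toy_cmd,
toy_pool_get, toy_pool_put and send_command are hypothetical stand-ins
written for this sketch and are not DPDK APIs. The only DPDK behaviour it
mirrors is that rte_mempool_get() returns 0 on success and a negative
value such as -ENOENT when no object is retrieved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_POOL_SIZE 4

/* Hypothetical command object, standing in for the application's type. */
struct toy_cmd {
    int command;
};

/* Hypothetical single-threaded pool: a stack of pointers to free objects. */
struct toy_pool {
    struct toy_cmd storage[TOY_POOL_SIZE];
    struct toy_cmd *free_list[TOY_POOL_SIZE];
    size_t free_count;
};

static void toy_pool_init(struct toy_pool *p)
{
    for (size_t i = 0; i < TOY_POOL_SIZE; i++)
        p->free_list[i] = &p->storage[i];
    p->free_count = TOY_POOL_SIZE;
}

/* Returns 0 and stores an object in *obj, or -ENOENT when the pool is
 * empty. On failure *obj is left untouched, so it must not be read. */
static int toy_pool_get(struct toy_pool *p, struct toy_cmd **obj)
{
    if (p->free_count == 0)
        return -ENOENT;
    *obj = p->free_list[--p->free_count];
    return 0;
}

/* Returning an object: reject NULL and overfull pools early. */
static void toy_pool_put(struct toy_pool *p, struct toy_cmd *obj)
{
    assert(obj != NULL);
    assert(p->free_count < TOY_POOL_SIZE);
    p->free_list[p->free_count++] = obj;
}

/* Caller pattern: the pointer starts as NULL and is only dereferenced
 * after the return code has been checked. */
static int send_command(struct toy_pool *p, int command, struct toy_cmd **out)
{
    struct toy_cmd *new_work = NULL;
    int result = toy_pool_get(p, &new_work);

    if (result != 0)
        return result;      /* new_work is not valid here */
    if (new_work == NULL)
        return -EFAULT;     /* defensive: success with no object */

    new_work->command = command;
    *out = new_work;
    return 0;
}
```

With this ordering, a failed get can never lead to a write through a stale
or uninitialized pointer. It would not by itself explain a get that
returns 0 together with a bad pointer, which is the case reported above.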