From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dispatch1-us1.ppe-hosted.com (dispatch1-us1.ppe-hosted.com [148.163.129.52]) by dpdk.org (Postfix) with ESMTP id B07C1548B for ; Thu, 19 Jul 2018 18:44:54 +0200 (CEST) X-Virus-Scanned: Proofpoint Essentials engine Received: from webmail.solarflare.com (uk.solarflare.com [193.34.186.16]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1-us3.ppe-hosted.com (Proofpoint Essentials ESMTP Server) with ESMTPS id 0FA2D600056; Thu, 19 Jul 2018 16:44:53 +0000 (UTC) Received: from [192.168.1.16] (85.187.13.33) by ukex01.SolarFlarecom.com (10.17.10.4) with Microsoft SMTP Server (TLS) id 15.0.1044.25; Thu, 19 Jul 2018 17:44:47 +0100 To: "Burakov, Anatoly" , Stephen Hemminger CC: "dev@dpdk.org" , Sergio Gonzalez Monroy References: <8bc76811-ac29-d7f2-e4c3-12b50fd44dba@solarflare.com> <58e5044c-3d13-9171-4168-b4d6b1d61927@intel.com> <20180718135817.66728c37@xeon-e3> <71478802-911e-676a-93a3-82c550cc0ee9@intel.com> From: Andrew Rybchenko Message-ID: Date: Thu, 19 Jul 2018 19:44:43 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <71478802-911e-676a-93a3-82c550cc0ee9@intel.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Originating-IP: [85.187.13.33] X-ClientProxiedBy: ocex03.SolarFlarecom.com (10.20.40.36) To ukex01.SolarFlarecom.com (10.17.10.4) X-TM-AS-Product-Ver: SMEX-11.0.0.1191-8.100.1062-23978.003 X-TM-AS-Result: No--18.610000-0.000000-31 X-TM-AS-User-Approved-Sender: Yes X-TM-AS-User-Blocked-Sender: No X-MDID: 1532018694-OU69wxsYfCV5 Subject: Re: [dpdk-dev] Memory allocated using rte_zmalloc() has non-zeros X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Jul 2018 16:44:55 -0000 On 19.07.2018 12:48, Burakov, Anatoly wrote: > On 19-Jul-18 10:01 AM, Burakov, Anatoly wrote: >> On 18-Jul-18 9:58 PM, Stephen Hemminger wrote: >>> On Wed, 18 Jul 2018 22:52:12 +0300 >>> Andrew Rybchenko wrote: >>> >>>> On 18.07.2018 20:18, Burakov, Anatoly wrote: >>>>> On 18-Jul-18 4:20 PM, Andrew Rybchenko wrote: >>>>>> Hi Anatoly, >>>>>> >>>>>> I'm investigating issue which finally comes to the fact that memory >>>>>> allocated using >>>>>> rte_zmalloc() has non zeros. >>>>>> >>>>>> If I add memset just after allocation, everything is perfect and >>>>>> works fine. >>>>>> >>>>>> I've found out that memset was removed from rte_zmalloc_socket() >>>>>> some >>>>>> time ago: >>>>>>   >>> >>>>>> commit b78c9175118f7d61022ddc5c62ce54a1bd73cea5 >>>>>> Author: Sergio Gonzalez Monroy >>>>>> Date:   Tue Jul 5 12:01:16 2016 +0100 >>>>>> >>>>>>       mem: do not zero out memory on zmalloc >>>>>> >>>>>>       Zeroing out memory on rte_zmalloc_socket is not required >>>>>> anymore >>>>>> since all >>>>>>       allocated memory is already zeroed. >>>>>> >>>>>>       Signed-off-by: Sergio Gonzalez Monroy >>>>>> >>>>>> <<< >>>>>> >>>>>> but may be something has changed now that made above statement >>>>>> false. >>>>>> >>>>>> I observe the problem when memory is reallocated. I.e. I configure 7 >>>>>> queues, >>>>>> start, stop, reconfigure to 3 queues, start. Memory is allocated on >>>>>> start and >>>>>> freed on stop, since we have less queues on the second start it is >>>>>> allocated >>>>>> Andrew Rybchenko >>>>>> in a different way and reuses previously allocated/freed memory. >>>>>> >>>>>> Do you have any ideas what could be wrong? >>>>>> >>>>>> Andrew. >>>>>> >>>>> >>>>> Hi Andrew, >>>>> >>>>> I will look into it first thing tomorrow. In general, we memset(0) on >>>>> free, and kernel gives us zeroed out pages initially, so the most >>>>> likely point of failure is that i'm not overwring some malloc headers >>>>> correctly on free. >>>> >>>> OK, at least now I know how it is supposed to work in theory. >>>> >>>> The following region was allocated  (the second number below is >>>> pointer >>>> plus size) >>>> ALLOC 0x7fffa3264080-0x7fffa32640b8 >>>> >>>> Not zerod address is 16 bytes before: >>>> (gdb) p/x ((uint64_t *)0x7fffa3264070)[0] >>>> $4 = 0x4000000002 >>>> (gdb) p/x ((uint64_t  *)0x7fffa3264070)[1] >>>> $5 = 0x80 >>>> >>>> then freed >>>> FREE 0x7fffa3264080-0x7fffa32640b8 >>>> >>>> but above values (gdb) are still the same >>>> then it is allocated as the part of bigger memory chunk >>>> ALLOC 0x7fffa3245b80-0x7fffa3265fd8 >>>> which should contain zeros, but above values are still the same. >>>> >>>> It is interesting that it looks like it was the first block freed >>>> on the >>>> port stop. I'm not 100% sure since I've put printouts to my allocation >>>> wrapper, not EAL. >>>> >>>> Many thanks, >>>> Andrew. >>> >>> memset here is what is supposed to clear the data. >>> >>> struct malloc_elem * >>> malloc_elem_free(struct malloc_elem *elem) >>> { >>>     void *ptr; >>>     size_t data_len; >>> >>>     ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN + elem->pad); >>>     data_len = elem->size - elem->pad - MALLOC_ELEM_OVERHEAD; >>> >>>     elem = malloc_elem_join_adjacent_free(elem); >>> >>>     malloc_elem_free_list_insert(elem); >>> >>>     elem->pad = 0; >>> >>>     /* decrease heap's count of allocated elements */ >>>     elem->heap->alloc_count--; >>> >>>     memset(ptr, 0, data_len); >>> >>> Maybe data_len is not correct either because of bug, or your >>> application clobbered >>> the malloc reserved regions  in the element. >>> >>> More likely, gcc is incorrectly optimizing this away. >>> >>> https://wiki.sei.cmu.edu/confluence/display/c/MSC06-C.+Beware+of+compiler+optimizations >>> >>> https://www.cryptologie.net/article/419/zeroing-memory-compiler-optimizations-and-memset_s/ >>> >>> >> >> I tend to be very wary of blaming the compiler without exhausting any >> other possibilities :) It used to work before without issues, so >> presumably whatever is happening, our memset works correctly. >> >> Andrew, you write: >> >> >> ALLOC 0x7fffa3264080-0x7fffa32640b8 >> Not zerod address is 16 bytes before: >> >> >> Of course the memory *before* your pointer would not be zero - it is >> preceded by a 64-byte malloc header, so what you're seeing is the >> malloc header data (which doesn't go away if you free it - it will go >> away only if it is merged with an adjacent free malloc element). So, >> i'm failing to see which problem you're describing, given that all >> memory regions that are supposedly not free lie outside of your >> malloc-allocated memory. I tried to highlight that non-zeroed bytes belong to malloc header of the previously allocated memory region. Later it becomes memory allocated region itself (significantly bigger, so merges happened): >>> then it is allocated as the part of bigger memory chunk ALLOC 0x7fffa3245b80-0x7fffa3265fd8 which should contain zeros, but above values are still the same. <<< >> However, after careful analysis, i can see that there is one >> possibility where memory is not zeroed on free - if the original >> malloc element was padded, and there aren't any more adjacent free >> elements, then newly allocated memory may contain old pad header. >> I'll submit a patch for you to try shortly. >> > > Patch: > > http://patches.dpdk.org/patch/43196/ Yes, the patch fixes the problem I've observed. At least it passes simple test which I used for debugging. I'll run more automated tests tonight. Many thanks, Andrew.