From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <arybchenko@solarflare.com>
Received: from dispatch1-us1.ppe-hosted.com (dispatch1-us1.ppe-hosted.com
 [148.163.129.52]) by dpdk.org (Postfix) with ESMTP id B07C1548B
 for <dev@dpdk.org>; Thu, 19 Jul 2018 18:44:54 +0200 (CEST)
X-Virus-Scanned: Proofpoint Essentials engine
Received: from webmail.solarflare.com (uk.solarflare.com [193.34.186.16])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1-us3.ppe-hosted.com (Proofpoint Essentials ESMTP Server) with ESMTPS id
 0FA2D600056; Thu, 19 Jul 2018 16:44:53 +0000 (UTC)
Received: from [192.168.1.16] (85.187.13.33) by ukex01.SolarFlarecom.com
 (10.17.10.4) with Microsoft SMTP Server (TLS) id 15.0.1044.25; Thu, 19 Jul
 2018 17:44:47 +0100
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>, Stephen Hemminger
 <stephen@networkplumber.org>
CC: "dev@dpdk.org" <dev@dpdk.org>, Sergio Gonzalez Monroy
 <sergio.gonzalez.monroy@intel.com>
References: <8bc76811-ac29-d7f2-e4c3-12b50fd44dba@solarflare.com>
 <58e5044c-3d13-9171-4168-b4d6b1d61927@intel.com>
 <a5f0915e-ccd1-9237-4337-3a0b0265c4cf@solarflare.com>
 <20180718135817.66728c37@xeon-e3>
 <e39616e7-175c-16b7-ef81-0b2d2ef3d78c@intel.com>
 <71478802-911e-676a-93a3-82c550cc0ee9@intel.com>
From: Andrew Rybchenko <arybchenko@solarflare.com>
Message-ID: <e37dae9d-5dbe-5c9a-958e-52e5e9146131@solarflare.com>
Date: Thu, 19 Jul 2018 19:44:43 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <71478802-911e-676a-93a3-82c550cc0ee9@intel.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
X-Originating-IP: [85.187.13.33]
X-ClientProxiedBy: ocex03.SolarFlarecom.com (10.20.40.36) To
 ukex01.SolarFlarecom.com (10.17.10.4)
X-TM-AS-Product-Ver: SMEX-11.0.0.1191-8.100.1062-23978.003
X-TM-AS-Result: No--18.610000-0.000000-31
X-TM-AS-User-Approved-Sender: Yes
X-TM-AS-User-Blocked-Sender: No
X-MDID: 1532018694-OU69wxsYfCV5
Subject: Re: [dpdk-dev] Memory allocated using rte_zmalloc() has non-zeros
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 19 Jul 2018 16:44:55 -0000

On 19.07.2018 12:48, Burakov, Anatoly wrote:
> On 19-Jul-18 10:01 AM, Burakov, Anatoly wrote:
>> On 18-Jul-18 9:58 PM, Stephen Hemminger wrote:
>>> On Wed, 18 Jul 2018 22:52:12 +0300
>>> Andrew Rybchenko <arybchenko@solarflare.com> wrote:
>>>
>>>> On 18.07.2018 20:18, Burakov, Anatoly wrote:
>>>>> On 18-Jul-18 4:20 PM, Andrew Rybchenko wrote:
>>>>>> Hi Anatoly,
>>>>>>
>>>>>> I'm investigating issue which finally comes to the fact that memory
>>>>>> allocated using
>>>>>> rte_zmalloc() has non zeros.
>>>>>>
>>>>>> If I add memset just after allocation, everything is perfect and
>>>>>> works fine.
>>>>>>
>>>>>> I've found out that memset was removed from rte_zmalloc_socket() 
>>>>>> some
>>>>>> time ago:
>>>>>>   >>>
>>>>>> commit b78c9175118f7d61022ddc5c62ce54a1bd73cea5
>>>>>> Author: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
>>>>>> Date:   Tue Jul 5 12:01:16 2016 +0100
>>>>>>
>>>>>>       mem: do not zero out memory on zmalloc
>>>>>>
>>>>>>       Zeroing out memory on rte_zmalloc_socket is not required 
>>>>>> anymore
>>>>>> since all
>>>>>>       allocated memory is already zeroed.
>>>>>>
>>>>>>       Signed-off-by: Sergio Gonzalez Monroy
>>>>>> <sergio.gonzalez.monroy@intel.com>
>>>>>> <<<
>>>>>>
>>>>>> but may be something has changed now that made above statement 
>>>>>> false.
>>>>>>
>>>>>> I observe the problem when memory is reallocated. I.e. I configure 7
>>>>>> queues,
>>>>>> start, stop, reconfigure to 3 queues, start. Memory is allocated on
>>>>>> start and
>>>>>> freed on stop, since we have less queues on the second start it is
>>>>>> allocated
>>>>>> Andrew Rybchenko <arybchenko@solarflare.com>
>>>>>> in a different way and reuses previously allocated/freed memory.
>>>>>>
>>>>>> Do you have any ideas what could be wrong?
>>>>>>
>>>>>> Andrew.
>>>>>>
>>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> I will look into it first thing tomorrow. In general, we memset(0) on
>>>>> free, and kernel gives us zeroed out pages initially, so the most
>>>>> likely point of failure is that i'm not overwring some malloc headers
>>>>> correctly on free.
>>>>
>>>> OK, at least now I know how it is supposed to work in theory.
>>>>
>>>> The following region was allocated  (the second number below is 
>>>> pointer
>>>> plus size)
>>>> ALLOC 0x7fffa3264080-0x7fffa32640b8
>>>>
>>>> Not zerod address is 16 bytes before:
>>>> (gdb) p/x ((uint64_t *)0x7fffa3264070)[0]
>>>> $4 = 0x4000000002
>>>> (gdb) p/x ((uint64_t  *)0x7fffa3264070)[1]
>>>> $5 = 0x80
>>>>
>>>> then freed
>>>> FREE 0x7fffa3264080-0x7fffa32640b8
>>>>
>>>> but above values (gdb) are still the same
>>>> then it is allocated as the part of bigger memory chunk
>>>> ALLOC 0x7fffa3245b80-0x7fffa3265fd8
>>>> which should contain zeros, but above values are still the same.
>>>>
>>>> It is interesting that it looks like it was the first block freed 
>>>> on the
>>>> port stop. I'm not 100% sure since I've put printouts to my allocation
>>>> wrapper, not EAL.
>>>>
>>>> Many thanks,
>>>> Andrew.
>>>
>>> memset here is what is supposed to clear the data.
>>>
>>> struct malloc_elem *
>>> malloc_elem_free(struct malloc_elem *elem)
>>> {
>>>     void *ptr;
>>>     size_t data_len;
>>>
>>>     ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN + elem->pad);
>>>     data_len = elem->size - elem->pad - MALLOC_ELEM_OVERHEAD;
>>>
>>>     elem = malloc_elem_join_adjacent_free(elem);
>>>
>>>     malloc_elem_free_list_insert(elem);
>>>
>>>     elem->pad = 0;
>>>
>>>     /* decrease heap's count of allocated elements */
>>>     elem->heap->alloc_count--;
>>>
>>>     memset(ptr, 0, data_len);
>>>
>>> Maybe data_len is not correct either because of bug, or your 
>>> application clobbered
>>> the malloc reserved regions  in the element.
>>>
>>> More likely, gcc is incorrectly optimizing this away.
>>>
>>> https://wiki.sei.cmu.edu/confluence/display/c/MSC06-C.+Beware+of+compiler+optimizations 
>>>
>>> https://www.cryptologie.net/article/419/zeroing-memory-compiler-optimizations-and-memset_s/ 
>>>
>>>
>>
>> I tend to be very wary of blaming the compiler without exhausting any 
>> other possibilities :) It used to work before without issues, so 
>> presumably whatever is happening, our memset works correctly.
>>
>> Andrew, you write:
>>
>> <snip>
>> ALLOC 0x7fffa3264080-0x7fffa32640b8
>> Not zerod address is 16 bytes before:
>> <snip>
>>
>> Of course the memory *before* your pointer would not be zero - it is 
>> preceded by a 64-byte malloc header, so what you're seeing is the 
>> malloc header data (which doesn't go away if you free it - it will go 
>> away only if it is merged with an adjacent free malloc element). So, 
>> i'm failing to see which problem you're describing, given that all 
>> memory regions that are supposedly not free lie outside of your 
>> malloc-allocated memory.

I tried to highlight that non-zeroed bytes belong to malloc header of 
the previously allocated memory region. Later it becomes memory 
allocated region itself (significantly bigger, so merges happened):
 >>>
then it is allocated as the part of bigger memory chunk
ALLOC 0x7fffa3245b80-0x7fffa3265fd8
which should contain zeros, but above values are still the same.
<<<

>> However, after careful analysis, i can see that there is one 
>> possibility where memory is not zeroed on free - if the original 
>> malloc element was padded, and there aren't any more adjacent free 
>> elements, then newly allocated memory may contain old pad header. 
>> I'll submit a patch for you to try shortly.
>>
>
> Patch:
>
> http://patches.dpdk.org/patch/43196/

Yes, the patch fixes the problem I've observed. At least it passes 
simple test which I used for debugging.
I'll run more automated tests tonight.

Many thanks,
Andrew.