Date: Mon, 29 Aug 2022 11:37:36 +0000
From: lic121
To: Dmitry Kozlyuk
Cc: dev@dpdk.org, lic121
Subject: Re: [PATCH] eal: zero out new added memory

On Mon, Aug 29, 2022 at 01:18:36AM +0000, lic121 wrote:
> On Sat, Aug 27, 2022 at 05:56:54PM +0300, Dmitry Kozlyuk wrote:
> > 2022-08-27 13:31 (UTC+0000), lic121:
> > > On Sat, Aug 27, 2022 at 12:57:50PM +0300, Dmitry Kozlyuk wrote:
> > > > 2022-08-27 09:25 (UTC+0000), chengtcli@qq.com:
> > > > > From: lic121
> > > > >
> > > > > When RTE_MALLOC_DEBUG is not configured, rte_zmalloc_socket() doesn't
> > > > > zero out allocated memory, because memory is zeroed out when freed
> > > > > in malloc_elem_free(). But it seems the initially allocated memory
> > > > > is not zeroed out as expected.
> > > > >
> > > > > This patch zeroes out initially allocated memory in
> > > > > malloc_heap_add_memory().
> > > > >
> > > > > With dpdk 20.11.5, "QLogic Corp. FastLinQ QL41000" probe triggers
> > > > > this problem.
> > > > > ```
> > > > > Stack trace of thread 412780:
> > > > > #0  0x0000000000e5fb99 ecore_int_igu_read_cam (dpdk-testpmd)
> > > > > #1  0x0000000000e4df54 ecore_get_hw_info (dpdk-testpmd)
> > > > > #2  0x0000000000e504aa ecore_hw_prepare (dpdk-testpmd)
> > > > > #3  0x0000000000e8a7ca qed_probe (dpdk-testpmd)
> > > > > #4  0x0000000000e83c59 qede_common_dev_init (dpdk-testpmd)
> > > > > #5  0x0000000000e84c8e qede_eth_dev_init (dpdk-testpmd)
> > > > > #6  0x00000000009dd5a7 rte_pci_probe_one_driver (dpdk-testpmd)
> > > > > #7  0x00000000009734e3 rte_bus_probe (dpdk-testpmd)
> > > > > #8  0x00000000009933bd rte_eal_init (dpdk-testpmd)
> > > > > #9  0x000000000041768f main (dpdk-testpmd)
> > > > > #10 0x00007f41a7001b17 __libc_start_main (libc.so.6)
> > > > > #11 0x000000000067e34a _start (dpdk-testpmd)
> > > > > ```
> > > > >
> > > > > Signed-off-by: lic121
> > > > > ---
> > > > >  lib/librte_eal/common/malloc_heap.c | 8 ++++++++
> > > > >  1 file changed, 8 insertions(+)
> > > > >
> > > > > diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
> > > > > index f4e20ea..1607401 100644
> > > > > --- a/lib/librte_eal/common/malloc_heap.c
> > > > > +++ b/lib/librte_eal/common/malloc_heap.c
> > > > > @@ -96,11 +96,19 @@
> > > > >  		void *start, size_t len)
> > > > >  {
> > > > >  	struct malloc_elem *elem = start;
> > > > > +	void *ptr;
> > > > > +	size_t data_len;
> > > > > +
> > > > >
> > > > >  	malloc_elem_init(elem, heap, msl, len, elem, len);
> > > > >
> > > > >  	malloc_elem_insert(elem);
> > > > >
> > > > > +	/* Zero out newly added memory. */
> > > > > +	ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN);
> > > > > +	data_len = elem->size - MALLOC_ELEM_OVERHEAD;
> > > > > +	memset(ptr, 0, data_len);
> > > > > +
> > > > >  	elem = malloc_elem_join_adjacent_free(elem);
> > > > >
> > > > >  	malloc_elem_free_list_insert(elem);
> > > >
> > > > Hi,
> > > >
> > > > The kernel ensures that the newly mapped memory is zeroed,
> > > > and DPDK ensures that files in hugetlbfs are not re-mapped.
> > > > What makes you think that it is not zeroed?
> > > > Were you able to catch [start; start+len) containing non-zero bytes
> > > > at the start of this function?
> > > > If so, is it system memory (not an external heap)?
> > > > If so, what is the CPU, kernel, any custom settings?
> > > >
> > > > Can it be the PMD or the app that uses rte_malloc instead of rte_zmalloc?
> > > >
> > > > This patch cannot be accepted as-is anyway:
> > > > 1. It zeroes memory even if the code was called not via rte_zmalloc().
> > > > 2. It leads to zeroing on both alloc and free, which is suboptimal.
> > >
> > > Hi Dmitry, thanks for the review.
> > >
> > > In rte_eth_dev_pci_allocate(), immediately after rte_zmalloc_socket()[1],
> > > I printed the content in gdb. It's not zero.
> > >
> > > print ((struct qede_dev *)(eth_dev->data->dev_private))->edev->p_iov_info
> > >
> > > cpu: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
> > > kernel: 4.19.90-2102
> > >
> > > [1]
> > > https://github.com/DPDK/dpdk/blob/v20.11/lib/librte_ethdev/rte_ethdev_pci.h#L91-L93
> >
> > Sorry, it seems that something is wrong with your debug.
> > Your link is for 20.11.0.
> > In 20.11.5 (apparently always) struct qede_dev::edev is not a pointer [2].
> > Even if it was, in zeroed memory it would be a NULL pointer,
> > reading a member would give a random value at NULL + some offset.
> > I suggest printing the content of the allocated memory with rte_hexdump().
> >
>
> Sorry, I didn't describe my debugging clearly.
> At first I debugged with version 20.11.0 and found that the memory
> returned by rte_zmalloc_socket() is dirty. Then I tried 20.11.5. I
> didn't debug on 20.11.5, but the behavior is the same (the NIC failed
> to be probed). So in the commit message I said v20.11.5 has the issue,
> but when describing my debugging I used the 20.11.0 URL.
>
> More debug info:
> 1. I reproduced the issue tens of times, and every time the printed
> variable had the same value.
> 2. After searching for malloc_heap_add_memory, I found 3 places that
> call this function to add memory: malloc_add_seg(),
> alloc_pages_on_heap() and malloc_heap_add_external_memory(). First, I
> zeroed out memory only in malloc_add_seg(); it didn't fix the issue.
> Then I zeroed out memory in malloc_heap_add_memory() to cover all 3
> cases; this time the NIC was probed successfully.
> 3. Once the NIC was probed, I rolled back my fix and tried to
> reproduce the issue, but I couldn't reproduce it anymore. So my guess
> is that the memory allocated when probing the qede NIC is at a fixed
> memory location, because every time in my debugging the printed
> variable had the same value. After I zeroed out the allocated memory
> once, I couldn't reproduce the issue anymore.
>
> > [2]:
> > http://git.dpdk.org/dpdk-stable/tree/drivers/net/qede/qede_ethdev.h?h=v20.11.5#n223

Today we probably hit the same issue with an Intel E810 NIC. The
behavior is that the E810 NIC can be probed on some hosts but not on
others. On the same host, one E810 may be probed while another one
can't be. After I applied this patch, the issue no longer occurs.
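
For reference, a minimal sketch of the check asked about earlier in the
thread, i.e. whether [start; start+len) already contains non-zero bytes
when it is handed to malloc_heap_add_memory(). It is plain C with no
DPDK dependency; first_nonzero_offset() and report_if_dirty() are
hypothetical helper names made up for this illustration, not DPDK APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a DPDK API): return the offset of the first
 * non-zero byte in [start, start + len), or -1 if every byte is zero.
 */
static long
first_nonzero_offset(const void *start, size_t len)
{
	const unsigned char *p = start;
	size_t i;

	assert(start != NULL || len == 0);
	for (i = 0; i < len; i++) {
		if (p[i] != 0)
			return (long)i;
	}
	return -1;
}

/*
 * Hypothetical reporting wrapper: log where the dirty data begins.
 * Returns 1 if the region contains a non-zero byte, 0 if it is clean.
 */
static int
report_if_dirty(const char *what, const void *start, size_t len)
{
	long off = first_nonzero_offset(start, len);

	if (off < 0)
		return 0;
	fprintf(stderr, "%s: first non-zero byte at offset %ld of %zu\n",
		what, off, len);
	return 1;
}
```

Calling report_if_dirty("malloc_heap_add_memory", start, len) as the
first statement of that function, before malloc_elem_init() writes the
element header, would show whether the memory is already dirty when it
is added to the heap. A dirty region could then be dumped with
rte_hexdump() as suggested above.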