From: David Marchand
Date: Mon, 29 Aug 2022 13:57:48 +0200
Subject: Re: [PATCH] eal: zero out new added memory
To: lic121
Cc: Dmitry Kozlyuk, dev
References: <20220827125750.291dd7d1@sovereign> <20220827175654.7a167eaf@sovereign>
List-Id: DPDK patches and discussions

On Mon, Aug 29, 2022 at 1:38 PM lic121 wrote:
>
> On Mon, Aug 29, 2022 at 01:18:36AM +0000, lic121 wrote:
> > On Sat, Aug 27, 2022 at 05:56:54PM +0300, Dmitry Kozlyuk wrote:
> > > 2022-08-27 13:31 (UTC+0000), lic121:
> > > > On Sat, Aug 27, 2022 at 12:57:50PM +0300, Dmitry Kozlyuk wrote:
> > > > > 2022-08-27 09:25 (UTC+0000), chengtcli@qq.com:
> > > > > > From: lic121
> > > > > >
> > > > > > When RTE_MALLOC_DEBUG is not configured, rte_zmalloc_socket() doesn't
> > > > > > zero out allocated memory, because memory is zeroed out when freed
> > > > > > in malloc_elem_free(). But it seems the initially allocated memory is
> > > > > > not zeroed out as expected.
> > > > > >
> > > > > > This patch zeroes out initially allocated memory in
> > > > > > malloc_heap_add_memory().
> > > > > >
> > > > > > With dpdk 20.11.5, "QLogic Corp. FastLinQ QL41000" probe triggers
> > > > > > this problem.
> > > > > > ```
> > > > > > Stack trace of thread 412780:
> > > > > > #0 0x0000000000e5fb99 ecore_int_igu_read_cam (dpdk-testpmd)
> > > > > > #1 0x0000000000e4df54 ecore_get_hw_info (dpdk-testpmd)
> > > > > > #2 0x0000000000e504aa ecore_hw_prepare (dpdk-testpmd)
> > > > > > #3 0x0000000000e8a7ca qed_probe (dpdk-testpmd)
> > > > > > #4 0x0000000000e83c59 qede_common_dev_init (dpdk-testpmd)
> > > > > > #5 0x0000000000e84c8e qede_eth_dev_init (dpdk-testpmd)
> > > > > > #6 0x00000000009dd5a7 rte_pci_probe_one_driver (dpdk-testpmd)
> > > > > > #7 0x00000000009734e3 rte_bus_probe (dpdk-testpmd)
> > > > > > #8 0x00000000009933bd rte_eal_init (dpdk-testpmd)
> > > > > > #9 0x000000000041768f main (dpdk-testpmd)
> > > > > > #10 0x00007f41a7001b17 __libc_start_main (libc.so.6)
> > > > > > #11 0x000000000067e34a _start (dpdk-testpmd)
> > > > > > ```
> > > > > >
> > > > > > Signed-off-by: lic121
> > > > > > ---
> > > > > >  lib/librte_eal/common/malloc_heap.c | 8 ++++++++
> > > > > >  1 file changed, 8 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
> > > > > > index f4e20ea..1607401 100644
> > > > > > --- a/lib/librte_eal/common/malloc_heap.c
> > > > > > +++ b/lib/librte_eal/common/malloc_heap.c
> > > > > > @@ -96,11 +96,19 @@
> > > > > >  		void *start, size_t len)
> > > > > >  {
> > > > > >  	struct malloc_elem *elem = start;
> > > > > > +	void *ptr;
> > > > > > +	size_t data_len;
> > > > > > +
> > > > > >
> > > > > >  	malloc_elem_init(elem, heap, msl, len, elem, len);
> > > > > >
> > > > > >  	malloc_elem_insert(elem);
> > > > > >
> > > > > > +	/* Zero out new added memory. */
> > > > > > +	ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN);
> > > > > > +	data_len = elem->size - MALLOC_ELEM_OVERHEAD;
> > > > > > +	memset(ptr, 0, data_len);
> > > > > > +
> > > > > >  	elem = malloc_elem_join_adjacent_free(elem);
> > > > > >
> > > > > >  	malloc_elem_free_list_insert(elem);
> > > > >
> > > > > Hi,
> > > > >
> > > > > The kernel ensures that the newly mapped memory is zeroed,
> > > > > and DPDK ensures that files in hugetlbfs are not re-mapped.
> > > > > What makes you think that it is not zeroed?
> > > > > Were you able to catch [start; start+len) containing non-zero bytes
> > > > > at the start of this function?
> > > > > If so, is it system memory (not an external heap)?
> > > > > If so, what is the CPU, kernel, any custom settings?
> > > > >
> > > > > Can it be the PMD or the app that uses rte_malloc instead of rte_zmalloc?
> > > > >
> > > > > This patch cannot be accepted as-is anyway:
> > > > > 1. It zeroes memory even if the code was called not via rte_zmalloc().
> > > > > 2. It leads to zeroing on both alloc and free, which is suboptimal.
> > > >
> > > > Hi Dmitry, thanks for the review.
> > > >
> > > > In rte_eth_dev_pci_allocate(), immediately after rte_zmalloc_socket()[1]
> > > > I printed
> > > > the content in gdb. It's not zero.
> > > >
> > > > print ((struct qede_dev *)(eth_dev->data->dev_private))->edev->p_iov_info
> > > >
> > > > cpu: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
> > > > kernel: 4.19.90-2102
> > > >
> > > > [1]
> > > > https://github.com/DPDK/dpdk/blob/v20.11/lib/librte_ethdev/rte_ethdev_pci.h#L91-L93
> > >
> > > Sorry, it seems that something is wrong with your debug.
> > > Your link is for 20.11.0.
> > > In 20.11.5 (apparently always) struct qede_dev::edev is not a pointer [2].
> > > Even if it was, in zeroed memory it would be a NULL pointer,
> > > reading a member would give a random value at NULL + some offset.
> > > I suggest to print content of the allocated memory with rte_hexdump().
> > >
> >
> > Sorry I didn't describe my debugging clearly. At first I debugged with version
> > 20.11.0, and I found that the rte_zmalloc_socket() memory is dirty. Then I
> > tried 20.11.5. I didn't debug on 20.11.5, but the behavior is the same (NIC
> > failed to be probed). So in the commit msg I said v20.11.5 has the
> > issue. But when I described my debugging I used the 20.11.0 url.
> >
> > More debug info:
> > 1. I reproduced the issue tens of times; every time the printed var
> > has the same value.
> > 2. After searching for malloc_heap_add_memory, I found that there are 3 places
> > that call this function to add memory: malloc_add_seg(),
> > alloc_pages_on_heap() and malloc_heap_add_external_memory(). First, I
> > zeroed out memory only for malloc_add_seg(); it didn't fix the issue. Then
> > I zeroed out memory in malloc_heap_add_memory() to cover all 3 cases; this
> > time the NIC is probed successfully.
> > 3. Once the NIC is probed, I rolled back my fix code and tried to reproduce the
> > issue. But I can't reproduce it anymore. So I guess: the memory allocated
> > when probing the qede NIC is at a fixed memory location, because every time in
> > my debugging the printed var has the same value. After I zeroed out the
> > allocated memory once, I can't reproduce the issue anymore.
> >
> > > [2]:
> > > http://git.dpdk.org/dpdk-stable/tree/drivers/net/qede/qede_ethdev.h?h=v20.11.5#n223
>
> Today we probably met the same issue with an Intel E810 NIC. The behavior is
> that the E810 NIC can be probed on some hosts, but can't on some others. On
> the same host, one E810 may be probed while the other one can't be.
> After I applied this patch, no such issue anymore.

Are you perhaps running your DPDK application from inside a container?
I remember tracking an issue which had to do with reusing a "dirty"
hugepage file (because SELinux prevented those files from being destroyed).

-- 
David Marchand
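
The check asked for earlier in this thread (does [start; start+len)
contain non-zero bytes on entry to malloc_heap_add_memory()?) can be
done with a small scan helper. Below is a minimal sketch in plain C.
The names first_nonzero_offset() and report_region() are hypothetical
and are not part of DPDK; the sketch only assumes that the caller has a
pointer and a length for the region to inspect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a DPDK API): return the offset of the first
 * non-zero byte in [start, start + len), or -1 if every byte is zero.
 */
static ptrdiff_t
first_nonzero_offset(const void *start, size_t len)
{
	const unsigned char *p = start;
	size_t i;

	assert(start != NULL || len == 0);
	for (i = 0; i < len; i++) {
		if (p[i] != 0)
			return (ptrdiff_t)i;
	}
	return -1;
}

/*
 * Hypothetical helper: print whether the region is clean.
 * Returns 1 if all bytes are zero, 0 otherwise.
 */
static int
report_region(const char *tag, const void *start, size_t len)
{
	ptrdiff_t off = first_nonzero_offset(start, len);

	if (off < 0) {
		printf("%s: %zu bytes, all zero\n", tag, len);
		return 1;
	}
	printf("%s: first non-zero byte at offset %td (value 0x%02x)\n",
	       tag, off, ((const unsigned char *)start)[off]);
	return 0;
}
```

Calling report_region("add_memory", start, len) before
malloc_elem_init() would show whether a newly added segment already
arrives dirty, which is what a reused hugepage file would look like.
The scan is linear in len, so it is meant for debugging only.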