From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 93FDAA0C4E; Thu, 21 Oct 2021 15:33:05 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 1F8EE411FE; Thu, 21 Oct 2021 15:33:05 +0200 (CEST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by mails.dpdk.org (Postfix) with ESMTP id 00A564118E for ; Thu, 21 Oct 2021 15:33:03 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1634823183; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=PRq94+4axUS48oFZWK6laRSJ2ZLU5e8uq4QY5MqwPwM=; b=cTCIX1T7GoKX5St3d5J7jg6wkVdRHfyXGnK5zi+jRG3pV0BZ9LZWlX/CbSjONWVj/B8CKI PGGUIAh3oakRUeeGS2KBNvbfJRF+edfsSJC7Cx8qG6/zzSMjPBj8ZhgbZxWFv0OPgGEbJC E9YhCHAfzyjoyBl2FwlWyv6RUbrxb5c= Received: from mail-lf1-f72.google.com (mail-lf1-f72.google.com [209.85.167.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-442-W-4C-B72NZGgEl3F6yJQpw-1; Thu, 21 Oct 2021 09:33:02 -0400 X-MC-Unique: W-4C-B72NZGgEl3F6yJQpw-1 Received: by mail-lf1-f72.google.com with SMTP id p19-20020a056512139300b003ff6dfea137so387920lfa.9 for ; Thu, 21 Oct 2021 06:33:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=PRq94+4axUS48oFZWK6laRSJ2ZLU5e8uq4QY5MqwPwM=; b=40HNReHYi/4k3yFU84bKg6ZJE/O8ZEJt9NtOVED5kmqKonvC3yhoQo7GZmlEntryFo b7X4e0mrwSe2k9O6sPkmX/BPXVHC6XY6Z9H0Wzje3/Ed1CWUIiw0YsCJyIJJ6LyNfewr HWPtVEqrIblxkHtCZ+Aiz3IhTPkvBkYyOXRdE1LLyrKSYKA/Ad8mWF1/xGYVEtjH4CCh 3nLm72WDsBqh6OORmGaOar6dYIsg81SQL+OI4DDX2T5Lwgy9ocjl8+x0DqXzC+12mly9 E45DNclfZN3EOW+Y7RPlNEjDGf4Uu+FkEcTcFOR2KH8P5D5PVkgwnOHSJc8utayGkBBN y5ng== X-Gm-Message-State: AOAM530d/o8EZVADqWWvml72tYgWUAgrxqKp657e8A4Tjvu7ZBaLXwuI m+Sx0KjcSoqnJlm7QzDltmN/qOYgYexSAkyDRNu7EH8sr6jB5Xq41ls0GokcNSnrCEB5YhiDBxY vNfPlPAEZ+OLGLCp5sZ8= X-Received: by 2002:a2e:9588:: with SMTP id w8mr5826435ljh.81.1634823180457; Thu, 21 Oct 2021 06:33:00 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwFT0iJ7QJnvPZ3EvBxcH5zESk/s9W7Kx2MHqvmySDZ197W6wlQGTNbk/3E+2KlFkK/B7+NEV/bT1k/DYR1jo8= X-Received: by 2002:a2e:9588:: with SMTP id w8mr5826403ljh.81.1634823180117; Thu, 21 Oct 2021 06:33:00 -0700 (PDT) MIME-Version: 1.0 References: <20210826145726.102081-1-hkalra@marvell.com> <20211018193707.123559-1-hkalra@marvell.com> <20211018193707.123559-3-hkalra@marvell.com> <20211018155654.0d3ffbed@hermes.local> <20211020183051.657b05c1@sovereign> <20211021153305.664c7216@sovereign> In-Reply-To: <20211021153305.664c7216@sovereign> From: David Marchand Date: Thu, 21 Oct 2021 15:32:48 +0200 Message-ID: To: Dmitry Kozlyuk , Harman Kalra Cc: Stephen Hemminger , Thomas Monjalon , "dev@dpdk.org" , Ray Kinsella Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dmarchan@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v3 2/7] eal/interrupts: implement get set APIs X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Thu, Oct 21, 2021 at 2:33 PM Dmitry Kozlyuk wrote: > > Hi All, > > > > I came across 2 issues introduced with auto detection mechanism. > > 1. In case of primary secondary model. Primary application is started which makes lots of allocations via > > rte_malloc* > > > > Secondary side: > > a. Secondary starts, in its "rte_eal_init()" it makes some allocation via rte_*, and in one of the allocation > > request for heap expand is made as current memseg got exhausted. (malloc_heap_alloc_on_heap_id ()-> > > alloc_more_mem_on_socket()->try_expand_heap()) > > b. A request to primary for heap expand is sent. Please note secondary holds the spinlock while making > > the request. (malloc_heap_alloc_on_heap_id ()->rte_spinlock_lock(&(heap->lock));) > > > > Primary side: > > a. Primary receives the request, install a new hugepage and setups up the heap (handle_alloc_request()) > > b. To inform all the secondaries about the new memseg, primary sends a sync notice where it sets up an > > alarm (rte_mp_request_async ()->mp_request_async()). > > c. Inside alarm setup API, we register an interrupt callback. > > d. Inside rte_intr_callback_register(), a new interrupt instance allocation is requested for "src->intr_handle" > > e. Since memory management is detected as up, inside "rte_intr_instance_alloc()", call to "rte_zmalloc" for > > allocating memory and further inside "malloc_heap_alloc_on_heap_id()", primary will experience a deadlock > > while taking up the spinlock because this spinlock is already hold by secondary. > > > > > > 2. "eal_flags_file_prefix_autotest" is failing because the spawned process by this tests are expected to cleanup > > their hugepage traces from respective directories (eg /dev/hugepage). > > a. Inside eal_cleanup, rte_free()->malloc_heap_free(), where element to be freed is added to the free list and > > checked if nearby elements can be joined together and form a big free chunk (malloc_elem_free()). > > b. If this free chunk is big enough than the hugepage size, respective hugepage can be uninstalled after making > > sure no allocation from this hugepage exists. (malloc_heap_free()->malloc_heap_free_pages()->eal_memalloc_free_seg()) > > > > But because of interrupt allocations made for pci intr handles (used for VFIO) and other driver specific interrupt > > handles are not cleaned up in "rte_eal_cleanup()", these hugepage files are not removed and test fails. > > Sad to hear. But it's a great and thorough analysis. > > > There could be more such issues, I think we should firstly fix the DPDK. > > 1. Memory management should be made independent and should be the first thing to come up in rte_eal_init() > > As I have explained, buses must be able to report IOVA requirement > at this point (`get_iommu_class()` bus method). > Either `scan()` must complete before that > or `get_iommu_class()` must be able to work before `scan()` is called. > > > 2. rte_eal_cleanup() should be exactly opposite to rte_eal_init(), just like bus_probe, we should have bus_remove > > to clean up all the memory allocations. > > Yes. For most buses it will be just "unplug each device". > In fact, EAL could do it with `unplug()`, but it is not mandatory. > > > > > Regarding this IRQ series, I would like to fall back to our original design i.e. rte_intr_instance_alloc() should take > > an argument whether its memory should be allocated using glibc malloc or rte_malloc*. > > Seems there's no other option to make it on time. - Sorry, my memory is too short, did we describe where we need to share rte_intr_handle objects? I spent some time looking at uses of rte_intr_handle objects. In many cases intr_handle objects are referenced in malloc() objects. The cases where rte_intr_handle are shared is in per device private bits in drivers. A intr_handle often contains fds. For them to be used in mp setups, there needs to be a big machinery with SCM_RIGHTS but I see only 3 drivers which actually reference this. So if intr_handle fds are accessed by multiple processes, their content probably makes no sense wrt fds. >From these two hints, I think we are going backwards, and the main usecase is that those rte_intr_instance objects are not used in mp. I even think they are never accessed from other processes. But I am not sure. - Seeing how time it short for rc1, I am ok with rte_intr_instance_alloc() taking a flag argument. And we can still go back on this API later. Can we agree on the flag name? rte_malloc() interest is that it makes objects shared for mp, so how about RTE_INTR_INSTANCE_F_SHARED ? -- David Marchand