From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 93FDAA0C4E;
	Thu, 21 Oct 2021 15:33:05 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 1F8EE411FE;
	Thu, 21 Oct 2021 15:33:05 +0200 (CEST)
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.133.124])
 by mails.dpdk.org (Postfix) with ESMTP id 00A564118E
 for <dev@dpdk.org>; Thu, 21 Oct 2021 15:33:03 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1634823183;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=PRq94+4axUS48oFZWK6laRSJ2ZLU5e8uq4QY5MqwPwM=;
 b=cTCIX1T7GoKX5St3d5J7jg6wkVdRHfyXGnK5zi+jRG3pV0BZ9LZWlX/CbSjONWVj/B8CKI
 PGGUIAh3oakRUeeGS2KBNvbfJRF+edfsSJC7Cx8qG6/zzSMjPBj8ZhgbZxWFv0OPgGEbJC
 E9YhCHAfzyjoyBl2FwlWyv6RUbrxb5c=
Received: from mail-lf1-f72.google.com (mail-lf1-f72.google.com
 [209.85.167.72]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-442-W-4C-B72NZGgEl3F6yJQpw-1; Thu, 21 Oct 2021 09:33:02 -0400
X-MC-Unique: W-4C-B72NZGgEl3F6yJQpw-1
Received: by mail-lf1-f72.google.com with SMTP id
 p19-20020a056512139300b003ff6dfea137so387920lfa.9
 for <dev@dpdk.org>; Thu, 21 Oct 2021 06:33:01 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=PRq94+4axUS48oFZWK6laRSJ2ZLU5e8uq4QY5MqwPwM=;
 b=40HNReHYi/4k3yFU84bKg6ZJE/O8ZEJt9NtOVED5kmqKonvC3yhoQo7GZmlEntryFo
 b7X4e0mrwSe2k9O6sPkmX/BPXVHC6XY6Z9H0Wzje3/Ed1CWUIiw0YsCJyIJJ6LyNfewr
 HWPtVEqrIblxkHtCZ+Aiz3IhTPkvBkYyOXRdE1LLyrKSYKA/Ad8mWF1/xGYVEtjH4CCh
 3nLm72WDsBqh6OORmGaOar6dYIsg81SQL+OI4DDX2T5Lwgy9ocjl8+x0DqXzC+12mly9
 E45DNclfZN3EOW+Y7RPlNEjDGf4Uu+FkEcTcFOR2KH8P5D5PVkgwnOHSJc8utayGkBBN
 y5ng==
X-Gm-Message-State: AOAM530d/o8EZVADqWWvml72tYgWUAgrxqKp657e8A4Tjvu7ZBaLXwuI
 m+Sx0KjcSoqnJlm7QzDltmN/qOYgYexSAkyDRNu7EH8sr6jB5Xq41ls0GokcNSnrCEB5YhiDBxY
 vNfPlPAEZ+OLGLCp5sZ8=
X-Received: by 2002:a2e:9588:: with SMTP id w8mr5826435ljh.81.1634823180457;
 Thu, 21 Oct 2021 06:33:00 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJwFT0iJ7QJnvPZ3EvBxcH5zESk/s9W7Kx2MHqvmySDZ197W6wlQGTNbk/3E+2KlFkK/B7+NEV/bT1k/DYR1jo8=
X-Received: by 2002:a2e:9588:: with SMTP id w8mr5826403ljh.81.1634823180117;
 Thu, 21 Oct 2021 06:33:00 -0700 (PDT)
MIME-Version: 1.0
References: <20210826145726.102081-1-hkalra@marvell.com>
 <20211018193707.123559-1-hkalra@marvell.com>
 <20211018193707.123559-3-hkalra@marvell.com>
 <20211018155654.0d3ffbed@hermes.local>
 <BL1PR18MB41970024F2181D6A1ADB7E46C5BD9@BL1PR18MB4197.namprd18.prod.outlook.com>
 <20211020183051.657b05c1@sovereign>
 <BN9PR18MB42048E137A3C13DD6F4FE991C5BF9@BN9PR18MB4204.namprd18.prod.outlook.com>
 <20211021153305.664c7216@sovereign>
In-Reply-To: <20211021153305.664c7216@sovereign>
From: David Marchand <david.marchand@redhat.com>
Date: Thu, 21 Oct 2021 15:32:48 +0200
Message-ID: <CAJFAV8xN8cWeviqQCHfHj_r5HBLo6+UBL+W_as+KzWTEKPoZpg@mail.gmail.com>
To: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>,
 Harman Kalra <hkalra@marvell.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
 Thomas Monjalon <thomas@monjalon.net>, 
 "dev@dpdk.org" <dev@dpdk.org>, Ray Kinsella <mdr@ashroe.eu>
Authentication-Results: relay.mimecast.com;
 auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dmarchan@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset="UTF-8"
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v3 2/7] eal/interrupts: implement
 get set APIs
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On Thu, Oct 21, 2021 at 2:33 PM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
> > Hi All,
> >
> > I came across 2 issues introduced with auto detection mechanism.
> > 1. In case of primary secondary model.  Primary application is started which makes lots of allocations via
> > rte_malloc*
> >
> >     Secondary side:
> >     a. Secondary starts, in its "rte_eal_init()" it makes some allocation via rte_*, and in one of the allocation
> > request for heap expand is made as current memseg got exhausted. (malloc_heap_alloc_on_heap_id ()->
> >    alloc_more_mem_on_socket()->try_expand_heap())
> >    b. A request to primary for heap expand is sent. Please note secondary holds the spinlock while making
> > the request. (malloc_heap_alloc_on_heap_id ()->rte_spinlock_lock(&(heap->lock));)
> >
> >    Primary side:
> >    a. Primary receives the request, install a new hugepage and setups up the heap (handle_alloc_request())
> >    b. To inform all the secondaries about the new memseg, primary sends a sync notice where it sets up an
> > alarm (rte_mp_request_async ()->mp_request_async()).
> >    c. Inside alarm setup API, we register an interrupt callback.
> >    d. Inside rte_intr_callback_register(), a new interrupt instance allocation is requested for "src->intr_handle"
> >    e. Since memory management is detected as up, inside "rte_intr_instance_alloc()", call to "rte_zmalloc" for
> > allocating memory and further inside "malloc_heap_alloc_on_heap_id()", primary will experience a deadlock
> > while taking up the spinlock because this spinlock is already hold by secondary.
> >
> >
> > 2. "eal_flags_file_prefix_autotest" is failing because the spawned process by this tests are expected to cleanup
> > their hugepage traces from respective directories (eg /dev/hugepage).
> > a. Inside eal_cleanup, rte_free()->malloc_heap_free(), where element to be freed is added to the free list and
> > checked if nearby elements can be joined together and form a big free chunk (malloc_elem_free()).
> > b. If this free chunk is big enough than the hugepage size, respective hugepage can be uninstalled after making
> > sure no allocation from this hugepage exists. (malloc_heap_free()->malloc_heap_free_pages()->eal_memalloc_free_seg())
> >
> > But because of interrupt allocations made for pci intr handles (used for VFIO) and other driver specific interrupt
> > handles are not cleaned up in "rte_eal_cleanup()", these hugepage files are not removed and test fails.
>
> Sad to hear. But it's a great and thorough analysis.
>
> > There could be more such issues, I think we should firstly fix the DPDK.
> > 1. Memory management should be made independent and should be the first thing to come up in rte_eal_init()
>
> As I have explained, buses must be able to report IOVA requirement
> at this point (`get_iommu_class()` bus method).
> Either `scan()` must complete before that
> or `get_iommu_class()` must be able to work before `scan()` is called.
>
> > 2. rte_eal_cleanup() should be exactly opposite to rte_eal_init(), just like bus_probe, we should have bus_remove
> > to clean up all the memory allocations.
>
> Yes. For most buses it will be just "unplug each device".
> In fact, EAL could do it with `unplug()`, but it is not mandatory.
>
> >
> > Regarding this IRQ series, I would like to fall back to our original design i.e. rte_intr_instance_alloc() should take
> > an argument whether its memory should be allocated using glibc malloc or rte_malloc*.
>
> Seems there's no other option to make it on time.

- Sorry, my memory is too short, did we describe where we need to
share rte_intr_handle objects?

I spent some time looking at uses of rte_intr_handle objects.

In many cases intr_handle objects are referenced in malloc() objects.
The cases where rte_intr_handle are shared is in per device private
bits in drivers.

A intr_handle often contains fds.
For them to be used in mp setups, there needs to be a big machinery
with SCM_RIGHTS but I see only 3 drivers which actually reference
this.
So if intr_handle fds are accessed by multiple processes, their
content probably makes no sense wrt fds.


>From these two hints, I think we are going backwards, and the main
usecase is that those rte_intr_instance objects are not used in mp.
I even think they are never accessed from other processes.
But I am not sure.


- Seeing how time it short for rc1, I am ok with
rte_intr_instance_alloc() taking a flag argument.
And we can still go back on this API later.

Can we agree on the flag name?
rte_malloc() interest is that it makes objects shared for mp, so how
about RTE_INTR_INSTANCE_F_SHARED ?


-- 
David Marchand