Date: Wed, 27 Jul 2022 15:30:55 +0300
From: Dmitry Kozlyuk
To: MOD
Cc: users@dpdk.org
Subject: Re: Mempool bigger than 1 page causes segmentation fault
Message-ID: <20220727153055.0907ea35@sovereign>
List-Id: DPDK usage discussions

2022-07-27 14:59 (UTC+0300), MOD:
> Hi All,
>
> My team and I have encountered a problem where allocation of a mempool
> larger than 1GB (== 1 Hugepage) fails.
> We are in a multi-process environment, and the `rte_mempool_create`
> happens in the secondary process.
>
> Sometimes the allocation succeeds but after some successes (for me
> specifically, two) the following occurs:
> the secondary process segfaults on `malloc_elem_can_hold`, inside a stack
> starting from `rte_mempool_create`.
>
> Restarting the secondary process does not work as it is stuck on `EAL:
> Probing VFIO support`, and restarting
> the main process is the only option.
>
> Has anyone had this problem, or knows any possible solution?
> Thanks!

Please tell us the DPDK version and attach the stack trace.
If possible, try rebuilding DPDK with RTE_MALLOC_DEBUG defined and,
if your DPDK version supports it, with AddressSanitizer enabled.
A segfault in a function that traverses the malloc element list
suggests that the heap may be corrupted, but that is only a guess.

Restarting the secondary process after a segfault is hardly viable:
at that point the shared memory may already be corrupted, and some lock
may have been taken and never released (which, BTW, is a possible reason
it gets stuck).