Message-ID: <1e082bfe-9b52-86f0-e7fa-279ef8feaf1a@yandex.ru>
Date: Sun, 3 Jul 2022 13:20:13 +0100
Subject: Re: [RFC PATCH v1] net/i40e: put mempool cache out of API
From: Konstantin Ananyev
To: Feifei Wang, Yuying Zhang, Beilei Xing, Ruifeng Wang
Cc: dev@dpdk.org, nd@arm.com, Honnappa Nagarahalli
References: <20220613055136.1949784-1-feifei.wang2@arm.com>
In-Reply-To: <20220613055136.1949784-1-feifei.wang2@arm.com>
List-Id: DPDK patches and discussions

> Refer to "i40e_tx_free_bufs_avx512", this patch puts the mempool cache
> out of the API to free buffers directly. There are two changes compared
> with the previous version:
> 1. change txep from "i40e_entry" to "i40e_vec_entry"
> 2. put the cache out of the "mempool_bulk" API to copy buffers into it directly
>
> Performance test with the l3fwd neon path:
> with this patch
> n1sdp: no performance change
> ampere-altra: +4.0%

Thanks for the RFC, appreciate your effort.

So, as I understand it, bypassing mempool put/get itself gives about a 7-10%
RX/TX speedup on ARM platforms, correct?

About the direct-rearm RX approach you propose: after another thought, it is
probably possible to re-arrange it in a way that helps avoid the related
negatives. The basic idea is as follows:

1. Make the RXQ sw_ring visible and accessible by 'attached' TX queues.
   Also make the sw_ring decoupled from the RXQ itself, i.e. when the RXQ is
   stopped or even destroyed, the related sw_ring may still exist (a
   ref-counter or RCU would probably be sufficient here).
   All that means we need a common layout/API for rxq_sw_ring, and PMDs that
   would like to support direct rearming will have to use/obey it.

2. Make RXQ sw_ring 'direct' rearming driven by the TXQ itself, i.e. at
   txq_free_bufs() try to store released mbufs inside the attached sw_ring
   directly. If there is no attached sw_ring, or not enough free space in it,
   continue with mempool_put() as usual.
Note that the actual arming of the HW RXDs still remains the responsibility
of the RX code-path:

rxq_rearm(rxq)
{
    ...
    - check whether there are N already-filled entries inside rxq_sw_ring;
      if not, populate them from the mempool (usual mempool_get()).
    - arm the related RXDs and mark these sw_ring entries as managed by HW.
    ...
}

So rxq_sw_ring will serve two purposes:
- track mbufs that are managed by HW (what it does now)
- act as a private (per-RXQ) mbuf cache

Now, if the TXQ is stopped while the RXQ is running, no extra synchronization
is required: the RXQ would just use mempool_get() to rearm its sw_ring itself.
If the RXQ is stopped while the TXQ is still running, the TXQ can continue to
populate the related sw_ring until it gets full, and then fall back to
mempool_put() as usual. Of course, this means that a user who wants to use
this feature should probably account for some extra mbufs for such a case;
alternatively, rxq_sw_ring could have an enable/disable flag to mitigate the
situation.

As another benefit, such an approach makes it possible to use several TXQs
(even from different devices) to rearm the same RXQ.

Have to say, I am still not sure that a 10% RX/TX improvement is worth
bypassing the mempool completely and introducing all this extra complexity
into the RX/TX path. But if we still decide to go ahead with direct rearming,
this re-arrangement should, I think, help keep things clear and avoid
introducing new limitations into existing functionality.

WDYT?

Konstantin