Subject: Re: [PATCH 3/7] net/bonding: change mbuf pool and ring allocation
From: "Min Hu (Connor)"
To: "Sanford, Robert", Robert Sanford, "dev@dpdk.org"
CC: "chas3@att.com"
Date: Tue, 21 Dec 2021 10:01:32 +0800
Message-ID: <6afa9788-b349-d2b9-c869-d258b4a6d46f@huawei.com>
In-Reply-To: <8C168DC6-124B-42B4-A766-11A6FA898256@akamai.com>
References: <1639592401-56845-1-git-send-email-rsanford@akamai.com>
 <1639592401-56845-4-git-send-email-rsanford@akamai.com>
 <3eb682a7-74db-5bc6-cbd0-7dbbc4177abd@huawei.com>
 <7CE0C72F-5CFD-4C75-8B03-5739A0339092@akamai.com>
 <8C168DC6-124B-42B4-A766-11A6FA898256@akamai.com>

Hi, Sanford,

On 2021/12/21 0:47, Sanford, Robert wrote:
> Hello Connor,
>
> Please see responses inline.
>
> On 12/17/21, 10:44 PM, "Min Hu (Connor)" wrote:
>
>>> When the number of used tx-descs (0..255) + number of mbufs in the
>>> cache (0..47) reaches 257, then allocation fails.
>>>
>>> If I understand the LACP tx-burst code correctly, it would be
>>> worse if nb_tx_queues > 1, because (assuming multiple tx-cores)
>>> any queue/lcore could xmit an LACPDU. Thus, up to nb_tx_queues *
>>> 47 mbufs could be cached, and not accessible from tx_machine().
>>>
>>> You would not see this problem if the app xmits other (non-LACP)
>>> mbufs on a regular basis, to expedite the clean-up of tx-descs,
>>> including LACPDU mbufs (unless nb_tx_queues tx-core caches
>>> could hold all LACPDU mbufs).
>>>
>> I think we do not see this problem only because the mempool can
>> offer many more mbufs than the cache size in the non-LACP case.
>>
>>> If we make the mempool's cache size 0, then allocation will not fail.
>> How about enlarging the mempool instead, e.g., up to 4096 mbufs?
>> I think that would also avoid this bug.
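To make the two options concrete, here is a rough sketch of how a
per-slave LACPDU pool could be created either way. The helper name,
pool name, element counts and cache size below are only placeholders
for illustration, not the driver's actual code:

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Placeholder helper, not taken from net/bonding. */
static struct rte_mempool *
create_lacpdu_pool(const char *name, uint16_t total_tx_desc,
                   int socket_id, int use_cache)
{
        /* Option A (cache size 0): every LACPDU mbuf released by
         * tx-desc clean-up goes straight back to the pool, so
         * tx_machine() can always allocate as long as not all
         * tx-descs are in flight. */
        unsigned int n = total_tx_desc;
        unsigned int cache_size = 0;

        if (use_cache) {
                /* Option B (keep a cache, enlarge the pool): make the
                 * pool large enough that tx-descs plus all per-lcore
                 * caches can never exhaust it, e.g. 4096 elements. */
                n = 4096;
                cache_size = 32;        /* placeholder cache size */
        }

        return rte_pktmbuf_pool_create(name, n, cache_size,
                                       0 /* priv_size */,
                                       RTE_MBUF_DEFAULT_BUF_SIZE,
                                       socket_id);
}

Whether 4096 is really enough in option B depends on nb_tx_queues and
the cache size, as discussed below.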
>>>
>>> A mempool cache for LACPDUs does not offer much additional speed:
>>> during alloc, the intr thread does not have default mempool caches
>> Why? As far as I know, every core has its own default mempool cache?
>
> These are private mbuf pools that we use *only* for LACPDUs, *one*
> mbuf per second, at most. (When the LACP link peer selects long
> timeouts, we get/put one mbuf every 30 seconds.)
>
> There is *NO* benefit for the consumer thread (interrupt thread
> executing tx_machine()) to have caches on per-slave LACPDU pools.
> The interrupt thread is a control thread, i.e., a non-EAL thread.
> Its lcore_id is LCORE_ID_ANY, so it has no "default cache" in any
> mempool.

Well, sorry, I forgot that the interrupt thread is a non-EAL thread.

>
> There is little or no benefit for active data-plane threads to have
> caches on per-slave LACPDU pools, because on each pool the producer
> thread puts back, at most, one mbuf per second. There is not much
> contention with the consumer (interrupt thread).
>
> I contend that caches are not necessary for these private LACPDU

I agree with you.

> mbuf pools, but just waste RAM and CPU cache. If we still insist
> on creating them *with* caches, then we should add at least
> (cache-size x 1.5 x nb-tx-queues) mbufs per pool.
>
>
>>> Q: Why reserve one additional slot in the rx and tx rings?
>>>
>>> A: rte_ring_create() requires the ring size N to be a power of 2,
>>> but it can only store N-1 items. Thus, if we want to store X items,
>> Hi Robert, could you explain this to me?
>> I cannot understand why it
>> "only stores N-1 items". I checked the source code; it says:
>> "The real usable ring size is *count-1* instead of *count* to
>> differentiate a free ring from an empty ring."
>> But I still cannot follow what that means.
>
> I believe there is a mistake in the ring comments (in 3 places).
> It would be better if they replaced "free" with "full":
> "... to differentiate a *full* ring from an empty ring."
>

Well, I still cannot understand it. If the ring size is N, it should
be able to store N items; why only N-1? I would appreciate an
explanation, thanks.

>
>>> we need to ask for (at least) X+1. The original code fails when the
>>> real desired size is a power of 2, because in such a case
>>> align32pow2 does not round up.
>>>
>>> For example, say we want a ring to hold 4:
>>>
>>> rte_ring_create(... rte_align32pow2(4) ...)
>>>
>>> rte_align32pow2(4) returns 4, and we end up with a ring that only
>>> stores 3 items.
>>>
>>> rte_ring_create(... rte_align32pow2(4+1) ...)
>>>
>>> rte_align32pow2(5) returns 8, and we end up with a ring that
>>> stores up to 7 items, more than we need, but acceptable.
>> To fix the bug, how about just setting the flag "RING_F_EXACT_SZ"?
>
> Yes, this is a good idea. I will look for examples or test code that
> use this flag.

Yes, if fixed, LGTM.

>
> --
> Regards,
> Robert Sanford
>
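P.S. For reference, a rough sketch of the two ring-sizing approaches
discussed above. The helper names and the nb_items parameter are
placeholders, not the driver's actual code:

#include <rte_common.h>
#include <rte_ring.h>

/* Placeholder helpers, not taken from net/bonding. */

static struct rte_ring *
create_ring_plus_one(const char *name, unsigned int nb_items,
                     int socket_id)
{
        /* Without RING_F_EXACT_SZ, a ring created with size N stores
         * at most N-1 items, so ask for at least nb_items + 1 and
         * round up to a power of two explicitly. */
        return rte_ring_create(name, rte_align32pow2(nb_items + 1),
                               socket_id, 0);
}

static struct rte_ring *
create_ring_exact(const char *name, unsigned int nb_items,
                  int socket_id)
{
        /* With RING_F_EXACT_SZ, the library rounds the internal size
         * up by itself and guarantees room for nb_items usable
         * entries. */
        return rte_ring_create(name, nb_items, socket_id,
                               RING_F_EXACT_SZ);
}

With the flag, asking for 4 entries really gives room for 4; without
it, rte_ring_create(..., 4, ...) leaves only 3 usable slots, which
matches the example above.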