From: Andrew Rybchenko
To: Olivier Matz
CC: Yongseok Koh
Date: Tue, 24 Apr 2018 21:21:00 +0300
Subject: Re: [dpdk-dev] ***Spam*** Re: [PATCH v4 1/2] mbuf: support attaching external buffer to mbuf
In-Reply-To: <20180424160244.bggifhilvadxcjb2@neon>
References: <20180310012532.15809-1-yskoh@mellanox.com> <20180424013854.33749-1-yskoh@mellanox.com> <934e714e-3cba-7f5d-9fcf-4f96611d758f@solarflare.com> <20180424160244.bggifhilvadxcjb2@neon>
List-Id: DPDK patches and discussions

On 04/24/2018 07:02 PM, Olivier Matz wrote:
> Hi Andrew, Yongseok,
>
> On Tue, Apr 24, 2018 at 03:28:33PM +0300, Andrew Rybchenko wrote:
>> On 04/24/2018 04:38 AM, Yongseok Koh wrote:
>>> This patch introduces a new way of attaching an external buffer to a mbuf.
>>>
>>> Attaching an external buffer is quite similar to mbuf indirection in
>>> replacing buffer addresses and length of a mbuf, but a few differences:
>>> - When an indirect mbuf is attached, refcnt of the direct mbuf would be
>>>   2 as long as the direct mbuf itself isn't freed after the attachment.
>>>   In such cases, the buffer area of a direct mbuf must be read-only. But
>>>   external buffer has its own refcnt and it starts from 1. Unless
>>>   multiple mbufs are attached to a mbuf having an external buffer, the
>>>   external buffer is writable.
>>> - There's no need to allocate buffer from a mempool. Any buffer can be
>>>   attached with appropriate free callback.
>>> - Smaller metadata is required to maintain shared data such as refcnt.
>> Really useful. Many thanks. See my notes below.
>>
>> It worries me that detach is more expensive than it really required since it
>> requires to restore mbuf as direct. If mbuf mempool is used for mbufs
>> as headers for external buffers only all these actions are absolutely
>> useless.
> I agree on the principle. And we have the same issue with indirect mbuf.
> Currently, the assumption is that a free mbuf (inside a mempool) is
> initialized as a direct mbuf. We can think about optimizations here,
> but I'm not sure it should be in this patchset.

I agree that it should be addressed separately.

> [...]
>
>>> @@ -688,14 +704,33 @@ rte_mbuf_to_baddr(struct rte_mbuf *md)
>>>  }
>>>  /**
>>> + * Returns TRUE if given mbuf is cloned by mbuf indirection, or FALSE
>>> + * otherwise.
>>> + *
>>> + * If a mbuf has its data in another mbuf and references it by mbuf
>>> + * indirection, this mbuf can be defined as a cloned mbuf.
>>> + */
>>> +#define RTE_MBUF_CLONED(mb) ((mb)->ol_flags & IND_ATTACHED_MBUF)
>>> +
>>> +/**
>>>   * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
>>>   */
>>> -#define RTE_MBUF_INDIRECT(mb) ((mb)->ol_flags & IND_ATTACHED_MBUF)
>>> +#define RTE_MBUF_INDIRECT(mb) RTE_MBUF_CLONED(mb)
>> It is still confusing that INDIRECT != !DIRECT.
>> May be we have no good options right now, but I'd suggest to at least
>> deprecate RTE_MBUF_INDIRECT() and completely remove it in the next release.
> Agree. I may have missed something, but is my previous suggestion
> not doable?
>
> - direct = embeds its own data (and indirect = !direct)
> - clone (or another name) = data is another mbuf
> - extbuf = data is in an external buffer

I guess the problem is that it changes INDIRECT semantics, since EXTBUF is
added as well. Strictly speaking, I think it is an API change.
Is it OK to make it without an announcement?

> Deprecating the macro is a good idea.
>
>>> + m->buf_addr = buf_addr;
>>> + m->buf_iova = buf_iova;
>>> +
>>> + if (shinfo == NULL) {
>>> +     shinfo = RTE_PTR_ALIGN_FLOOR(RTE_PTR_SUB(buf_end,
>>> +                 sizeof(*shinfo)), sizeof(uintptr_t));
>>> +     if ((void *)shinfo <= buf_addr)
>>> +         return NULL;
>>> +
>>> +     m->buf_len = RTE_PTR_DIFF(shinfo, buf_addr);
>>> + } else {
>>> +     m->buf_len = buf_len;
>>> + }
>>> +
>>> + m->data_len = 0;
>>> +
>>> + rte_pktmbuf_reset_headroom(m);
>> I would suggest to make data_off one more parameter.
>> If I have a buffer with data which I'd like to attach to an mbuf, I'd like
>> to control data_off.
> Another option is to set the headroom to 0.
> Because the after attaching the mbuf to an external buffer, we will
> still require to set the length.
>
> A user can do something like this:
>
> rte_pktmbuf_attach_extbuf(m, buf_va, buf_iova, buf_len, shinfo,
>                           free_cb, free_cb_arg);
> rte_pktmbuf_append(m, data_len + headroom);
> rte_pktmbuf_adj(m, headroom);
>
>>> + m->ol_flags |= EXT_ATTACHED_MBUF;
>>> + m->shinfo = shinfo;
>>> +
>>> + rte_mbuf_ext_refcnt_set(shinfo, 1);
>> Why is assignment used here? Cannot we attach extbuf already attached to
>> other mbuf?
> In rte_pktmbuf_attach(), this is true. That's not illogical to
> keep the same approach here. Maybe an assert could be added?
>
>> May be shinfo should be initialized only if it is not provided (shinfo ==
>> NULL on input)?
> I don't get why, can you explain please?

Maybe I misunderstand how it should look when one huge buffer is
partitioned. I thought that there should be only one shinfo per huge buffer,
to track when it is no longer used by any mbufs with extbuf.

The other option is to have a shinfo per small buf plus a reference counter
per huge buf (which is decremented when the small buf reference counter
becomes zero and the free callback is executed). I guess that is what is
assumed above. My fear is that there are too many reference counters:
 1. mbuf reference counter
 2. small buf reference counter
 3. huge buf reference counter

Maybe it is possible to use (1) for (2) as well?

Andrew.
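
For illustration only, the "one shinfo per huge buffer" variant described above could be sketched roughly as follows. This is a standalone model, not DPDK code: struct ext_shinfo, struct hdr, hdr_attach() and hdr_detach() are hypothetical stand-ins for the shared info and mbuf headers in the patch, the counter is non-atomic for brevity, and it starts at 0 and is incremented on every attach, which is an assumption and differs from the rte_mbuf_ext_refcnt_set(shinfo, 1) call quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared info: a single instance covers the whole huge buffer. */
struct ext_shinfo {
    void *base;                                  /* start of the huge buffer */
    void (*free_cb)(void *addr, void *opaque);   /* called on last detach */
    void *opaque;
    uint16_t refcnt;                             /* non-atomic for brevity */
};

/* Hypothetical header standing in for an mbuf used only as an extbuf header. */
struct hdr {
    void *buf_addr;
    uint16_t buf_len;
    struct ext_shinfo *shinfo;
};

static void shinfo_init(struct ext_shinfo *s, void *base,
                        void (*cb)(void *, void *), void *opaque)
{
    s->base = base;
    s->free_cb = cb;
    s->opaque = opaque;
    s->refcnt = 0;   /* assumption: incremented on attach, never assigned */
}

/* Attach a slice of the huge buffer; the shared counter is incremented. */
static void hdr_attach(struct hdr *h, void *addr, uint16_t len,
                       struct ext_shinfo *s)
{
    h->buf_addr = addr;
    h->buf_len = len;
    h->shinfo = s;
    s->refcnt++;
}

/* Detach a slice; the huge buffer is released only by the last user. */
static void hdr_detach(struct hdr *h)
{
    struct ext_shinfo *s = h->shinfo;

    h->buf_addr = NULL;
    h->buf_len = 0;
    h->shinfo = NULL;
    if (--s->refcnt == 0)
        s->free_cb(s->base, s->opaque);
}

/* Free callback for the demo: counts how many times it has been called. */
static void count_free(void *addr, void *opaque)
{
    (void)addr;
    ++*(int *)opaque;
}

/* Partition one 4 KiB buffer into four slices sharing a single shinfo. */
static int demo_partitioned_buffer(void)
{
    static uint8_t huge[4096];
    struct ext_shinfo shinfo;
    struct hdr h[4];
    int freed = 0;
    size_t i;

    shinfo_init(&shinfo, huge, count_free, &freed);
    for (i = 0; i < 4; i++)
        hdr_attach(&h[i], huge + i * 1024, 1024, &shinfo);
    for (i = 0; i < 3; i++)
        hdr_detach(&h[i]);
    assert(freed == 0);   /* one slice is still attached */
    hdr_detach(&h[3]);
    return freed;         /* number of times the free callback ran */
}
```

With this layout there are only two counters per data path (the header's own and the shared one for the huge buffer); whether that fits the intended use of the patch is exactly the open question above.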