From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail.droids-corp.org (zoll.droids-corp.org [94.23.50.67])
 by dpdk.org (Postfix) with ESMTP id 3A5BF49E2
 for ; Tue, 24 Apr 2018 21:15:47 +0200 (CEST)
Received: from lfbn-lil-1-700-92.w81-254.abo.wanadoo.fr ([81.254.37.92] helo=droids-corp.org)
 by mail.droids-corp.org with esmtpsa (TLS1.0:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.89) (envelope-from )
 id 1fB3Pm-0005xr-HB; Tue, 24 Apr 2018 21:15:48 +0200
Received: by droids-corp.org (sSMTP sendmail emulation);
 Tue, 24 Apr 2018 21:15:38 +0200
Date: Tue, 24 Apr 2018 21:15:38 +0200
From: Olivier Matz
To: Andrew Rybchenko
Cc: Yongseok Koh, wenzhuo.lu@intel.com, jingjing.wu@intel.com, dev@dpdk.org,
 konstantin.ananyev@intel.com, adrien.mazarguil@6wind.com,
 nelio.laranjeiro@6wind.com, Thomas Monjalon
Message-ID: <20180424191538.exjgzoif4odhndew@neon>
References: <20180310012532.15809-1-yskoh@mellanox.com>
 <20180424013854.33749-1-yskoh@mellanox.com>
 <934e714e-3cba-7f5d-9fcf-4f96611d758f@solarflare.com>
 <20180424160244.bggifhilvadxcjb2@neon>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: NeoMutt/20170113 (1.7.2)
Subject: Re: [dpdk-dev] [PATCH v4 1/2] mbuf: support attaching external buffer to mbuf
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
X-List-Received-Date: Tue, 24 Apr 2018 19:15:47 -0000

On Tue, Apr 24, 2018 at 09:21:00PM +0300, Andrew Rybchenko wrote:
> On 04/24/2018 07:02 PM, Olivier Matz wrote:
> > Hi Andrew, Yongseok,
> >
> > On Tue, Apr 24, 2018 at 03:28:33PM +0300, Andrew Rybchenko wrote:
> > > On 04/24/2018 04:38 AM, Yongseok Koh wrote:
> > > > This patch introduces a new way of attaching an external buffer to a mbuf.
> > > >
> > > > Attaching an external buffer is quite similar to mbuf indirection in
> > > > replacing buffer addresses and length of a mbuf, but a few differences:
> > > >   - When an indirect mbuf is attached, refcnt of the direct mbuf would be
> > > >     2 as long as the direct mbuf itself isn't freed after the attachment.
> > > >     In such cases, the buffer area of a direct mbuf must be read-only. But
> > > >     external buffer has its own refcnt and it starts from 1. Unless
> > > >     multiple mbufs are attached to a mbuf having an external buffer, the
> > > >     external buffer is writable.
> > > >   - There's no need to allocate buffer from a mempool. Any buffer can be
> > > >     attached with appropriate free callback.
> > > >   - Smaller metadata is required to maintain shared data such as refcnt.
> > > Really useful. Many thanks. See my notes below.
> > >
> > > It worries me that detach is more expensive than it really required since it
> > > requires to restore mbuf as direct. If mbuf mempool is used for mbufs
> > > as headers for external buffers only all these actions are absolutely
> > > useless.
> > I agree on the principle. And we have the same issue with indirect mbuf.
> > Currently, the assumption is that a free mbuf (inside a mempool) is
> > initialized as a direct mbuf. We can think about optimizations here,
> > but I'm not sure it should be in this patchset.
>
> I agree that it should be addressed separately.
>
> > [...]
> >
> > > > @@ -688,14 +704,33 @@ rte_mbuf_to_baddr(struct rte_mbuf *md)
> > > >  }
> > > >  /**
> > > > + * Returns TRUE if given mbuf is cloned by mbuf indirection, or FALSE
> > > > + * otherwise.
> > > > + *
> > > > + * If a mbuf has its data in another mbuf and references it by mbuf
> > > > + * indirection, this mbuf can be defined as a cloned mbuf.
> > > > + */
> > > > +#define RTE_MBUF_CLONED(mb)     ((mb)->ol_flags & IND_ATTACHED_MBUF)
> > > > +
> > > > +/**
> > > >   * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
> > > >   */
> > > > -#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)
> > > > +#define RTE_MBUF_INDIRECT(mb)   RTE_MBUF_CLONED(mb)
> > > It is still confusing that INDIRECT != !DIRECT.
> > > May be we have no good options right now, but I'd suggest to at least
> > > deprecate
> > > RTE_MBUF_INDIRECT() and completely remove it in the next release.
> > Agree. I may have missed something, but is my previous suggestion
> > not doable?
> >
> > - direct = embeds its own data      (and indirect = !direct)
> > - clone (or another name) = data is another mbuf
> > - extbuf = data is in an external buffer
>
> I guess the problem that it changes INDIRECT semantics since EXTBUF
> is added as well. I think strictly speaking it is an API change.
> Is it OK to make it without announcement?

In any case, there will be an ABI change, because an application
compiled for 18.02 will not be able to handle this new kind of
mbuf.

So unfortunately yes, I think this kind of change should first be
announced.

Thomas, what do you think?

> > Deprecating the macro is a good idea.
> >
> > > > +    m->buf_addr = buf_addr;
> > > > +    m->buf_iova = buf_iova;
> > > > +
> > > > +    if (shinfo == NULL) {
> > > > +        shinfo = RTE_PTR_ALIGN_FLOOR(RTE_PTR_SUB(buf_end,
> > > > +                    sizeof(*shinfo)), sizeof(uintptr_t));
> > > > +        if ((void *)shinfo <= buf_addr)
> > > > +            return NULL;
> > > > +
> > > > +        m->buf_len = RTE_PTR_DIFF(shinfo, buf_addr);
> > > > +    } else {
> > > > +        m->buf_len = buf_len;
> > > > +    }
> > > > +
> > > > +    m->data_len = 0;
> > > > +
> > > > +    rte_pktmbuf_reset_headroom(m);
> > > I would suggest to make data_off one more parameter.
> > > If I have a buffer with data which I'd like to attach to an mbuf, I'd like
> > > to control data_off.
> > Another option is to set the headroom to 0.
> > Because the after attaching the mbuf to an external buffer, we will
> > still require to set the length.
> >
> > A user can do something like this:
> >
> >     rte_pktmbuf_attach_extbuf(m, buf_va, buf_iova, buf_len, shinfo,
> >                               free_cb, free_cb_arg);
> >     rte_pktmbuf_append(m, data_len + headroom);
> >     rte_pktmbuf_adj(m, headroom);
> >
> > > > +    m->ol_flags |= EXT_ATTACHED_MBUF;
> > > > +    m->shinfo = shinfo;
> > > > +
> > > > +    rte_mbuf_ext_refcnt_set(shinfo, 1);
> > > Why is assignment used here? Cannot we attach extbuf already attached to
> > > other mbuf?
> > In rte_pktmbuf_attach(), this is true. That's not illogical to
> > keep the same approach here. Maybe an assert could be added?
> >
> > > May be shinfo should be initialized only if it is not provided (shinfo ==
> > > NULL on input)?
> > I don't get why, can you explain please?
>
> May be I misunderstand how it should look like when one huge buffer
> is partitioned. I thought that it should be only one shinfo per huge buffer
> to control when it is not used any more by any mbufs with extbuf.

OK, I got it. I think both approaches could make sense:
- one shinfo per huge buffer
- or one shinfo per mbuf, and use the callback to manage another refcnt
  (like what Yongseok described)

So I agree with your proposal: shinfo should be initialized by
the caller if it is != NULL, else it can be initialized by
rte_pktmbuf_attach_extbuf().

> Other option is to have shinfo per small buf plus reference counter
> per huge buf (which is decremented when small buf reference counter
> becomes zero and free callback is executed). I guess it is assumed above.
> My fear is that it is too much reference counters:
>  1. mbuf reference counter
>  2. small buf reference counter
>  3. huge buf reference counter
> May be it is possible use (1) for (2) as well?

I would prefer to have only 2 reference counters, one in the mbuf
and one in the shinfo.
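
To make the "one shinfo per huge buffer" case concrete, here is a minimal
standalone sketch of the rule proposed above: the attach function only
initializes shinfo when the caller passes NULL, and otherwise leaves a
caller-initialized shinfo untouched. Everything in it (struct ext_shinfo,
struct mini_mbuf, mini_attach_extbuf(), mini_detach_extbuf(), the plain
non-atomic counter) is a hypothetical stand-in and not the DPDK API. In
particular, having the caller bump the refcnt once per attachment is an
assumption of this sketch, not something settled in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the types discussed in this thread;
 * these are NOT the DPDK definitions. */
typedef void (*ext_free_cb_t)(void *addr, void *opaque);

struct ext_shinfo {
	ext_free_cb_t free_cb;
	void *opaque;
	unsigned int refcnt;	/* plain counter; a real one would be atomic */
};

struct mini_mbuf {
	void *buf_addr;
	size_t buf_len;
	struct ext_shinfo *shinfo;
};

/* Attach buf to m. If shinfo is NULL, carve it out of the buffer tail and
 * initialize it here (refcnt = 1). If shinfo is provided, assume the caller
 * already initialized it and do not touch its refcnt. */
static struct ext_shinfo *
mini_attach_extbuf(struct mini_mbuf *m, void *buf, size_t len,
		   struct ext_shinfo *shinfo, ext_free_cb_t cb, void *opaque)
{
	if (shinfo == NULL) {
		uintptr_t start = (uintptr_t)buf;
		uintptr_t p;

		if (len < sizeof(*shinfo))
			return NULL;
		/* Align the shinfo address down to pointer alignment. */
		p = (start + len - sizeof(*shinfo)) &
			~(uintptr_t)(sizeof(uintptr_t) - 1);
		if (p <= start)
			return NULL;

		shinfo = (struct ext_shinfo *)p;
		shinfo->free_cb = cb;
		shinfo->opaque = opaque;
		shinfo->refcnt = 1;
		m->buf_len = (size_t)(p - start);
	} else {
		m->buf_len = len;
	}
	m->buf_addr = buf;
	m->shinfo = shinfo;
	return shinfo;
}

/* Drop this mbuf's reference; run the free callback on the last one. */
static void
mini_detach_extbuf(struct mini_mbuf *m)
{
	struct ext_shinfo *s = m->shinfo;
	void *addr = m->buf_addr;

	m->buf_addr = NULL;
	m->buf_len = 0;
	m->shinfo = NULL;
	if (--s->refcnt == 0)
		s->free_cb(addr, s->opaque);
}

static void
count_free(void *addr, void *opaque)
{
	(void)addr;
	++*(int *)opaque;	/* opaque points to a call counter */
}

/* Two mbufs share one caller-initialized shinfo that covers one "huge"
 * buffer. Returns how many times the free callback ran. */
static int
demo_shared_shinfo(void)
{
	static char huge[4096];
	int frees = 0;
	struct ext_shinfo sh = { count_free, &frees, 0 };
	struct mini_mbuf a, b;

	mini_attach_extbuf(&a, huge, 2048, &sh, NULL, NULL);
	sh.refcnt++;		/* caller accounts for the first mbuf */
	mini_attach_extbuf(&b, huge + 2048, 2048, &sh, NULL, NULL);
	sh.refcnt++;		/* ... and for the second one */

	mini_detach_extbuf(&a);	/* refcnt 2 -> 1, no callback */
	mini_detach_extbuf(&b);	/* refcnt 1 -> 0, callback runs */
	return frees;
}
```

With this split there are only two counters in play, the mbuf's own refcnt
and the one in shinfo: the huge buffer is released exactly once, when the
last mbuf pointing into it is detached.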