From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dispatch1-us1.ppe-hosted.com (dispatch1-us1.ppe-hosted.com [67.231.154.164]) by dpdk.org (Postfix) with ESMTP id A2B327D00 for ; Thu, 26 Apr 2018 18:05:14 +0200 (CEST) X-Virus-Scanned: Proofpoint Essentials engine Received: from webmail.solarflare.com (uk.solarflare.com [193.34.186.16]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1-us1.ppe-hosted.com (Proofpoint Essentials ESMTP Server) with ESMTPS id 0189140006D; Thu, 26 Apr 2018 16:05:11 +0000 (UTC) Received: from [192.168.38.17] (84.52.114.114) by ukex01.SolarFlarecom.com (10.17.10.4) with Microsoft SMTP Server (TLS) id 15.0.1044.25; Thu, 26 Apr 2018 17:05:04 +0100 To: Yongseok Koh , , , CC: , , , , , References: <20180310012532.15809-1-yskoh@mellanox.com> <20180426011010.28078-1-yskoh@mellanox.com> From: Andrew Rybchenko Message-ID: <1a65f081-4c92-78d5-b00c-08e66fdef5c8@solarflare.com> Date: Thu, 26 Apr 2018 19:05:01 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: <20180426011010.28078-1-yskoh@mellanox.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB X-Originating-IP: [84.52.114.114] X-ClientProxiedBy: ocex03.SolarFlarecom.com (10.20.40.36) To ukex01.SolarFlarecom.com (10.17.10.4) X-TM-AS-Product-Ver: SMEX-11.0.0.1191-8.100.1062-23806.003 X-TM-AS-Result: No--21.722000-0.000000-31 X-TM-AS-User-Approved-Sender: Yes X-TM-AS-User-Blocked-Sender: No X-MDID: 1524758712-ToeWrfaoIhA7 Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: support attaching external buffer to mbuf X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Apr 2018 16:05:14 -0000 On 04/26/2018 04:10 AM, Yongseok Koh wrote: > This patch introduces a new way of attaching an external buffer to a mbuf. > > Attaching an external buffer is quite similar to mbuf indirection in > replacing buffer addresses and length of a mbuf, but a few differences: > - When an indirect mbuf is attached, refcnt of the direct mbuf would be > 2 as long as the direct mbuf itself isn't freed after the attachment. > In such cases, the buffer area of a direct mbuf must be read-only. But > external buffer has its own refcnt and it starts from 1. Unless > multiple mbufs are attached to a mbuf having an external buffer, the > external buffer is writable. > - There's no need to allocate buffer from a mempool. Any buffer can be > attached with appropriate free callback. > - Smaller metadata is required to maintain shared data such as refcnt. I'm still unsure how reference counters for external buffer work. Let's consider the following example:                         |<-mbuf1 buf_len-->|<-mbuf2 buf_len-->| +--+-----------+----+--------------+----+--------------+---+- - - |  global      |head|mbuf1 data    |head|mbuf2 data    | | |  shinfo=2    |room| |room|              |   | +--+-----------+----+--------------+----+--------------+---+- - -                ^ ^ +----------+   |      +----------+ | | mbuf1    +---+      | mbuf2    +-+ | refcnt=1 |          | refcnt=1 | +----------+          +----------+ I.e. we have big buffer which is sliced into many small data buffers referenced from mbufs. shinfo reference counter is used to control when big buffer may be freed. But what controls sharing of each block? headroom and following mbuf data (buf_len) is owned by corresponding mbuf and the mbuf owner can do everything with the space (prepend data, append data, modify etc). I.e. it is read-write in above terminology. What should happen if mbuf1 is cloned? Right now it will result in a new mbuf1a with reference counter 1 and incremented shinfo reference counter. And what does say that corresponding area is read-only now? It looks like nothing. As I understand it should be solved using per data area shinfo which free callback decrements big buffer reference counter. So, we have two reference counters per each mbuf with external buffer (plus reference counter per big buffer). Two reference counters sounds too much and it looks like mbuf-with-extbuf reference counter is not really used (since on clone/attach update shinfo refcnt). It is still two counters to check on free. Have you considered alternative approach to use mbuf refcnt as sharing indicator for extbuf data? However, in this case indirect referencing extbuf would logically look like: +----------+    +--------+     +--------+ | indirect +--->| extbuf +---->|  data  | |  mbuf    |    |  mbuf  |     |        | +----------+    +--------+     +--------+ It looks like it would allow to avoid two reference counters per data block as above. Personally I'm not sure which approach is better and would like to hear what you and other reviewers think about it. Some minor notes below as well. > Signed-off-by: Yongseok Koh > --- > > ** This patch can pass the mbuf_autotest. ** > > Submitting only non-mlx5 patches to meet deadline for RC1. mlx5 patches > will be submitted separately rebased on a differnet patchset which > accommodates new memory hotplug design to mlx PMDs. > > v6: > * rte_pktmbuf_attach_extbuf() doesn't take NULL shinfo. Instead, > rte_pktmbuf_ext_shinfo_init_helper() is added. > * bug fix in rte_pktmbuf_attach() - shinfo wasn't saved to mi. > * minor changes from review. > > v5: > * rte_pktmbuf_attach_extbuf() sets headroom to 0. > * if shinfo is provided when attaching, user should initialize it. > * minor changes from review. > > v4: > * rte_pktmbuf_attach_extbuf() takes new arguments - buf_iova and shinfo. > user can pass memory for shared data via shinfo argument. > * minor changes from review. > > v3: > * implement external buffer attachment instead of introducing buf_off for > mbuf indirection. > > lib/librte_mbuf/rte_mbuf.h | 335 +++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 306 insertions(+), 29 deletions(-) > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index 43aaa9c5f..0a6885281 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h [...] > +/** > + * Shared data at the end of an external buffer. > + */ > +struct rte_mbuf_ext_shared_info { > + rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */ > + void *fcb_opaque; /**< Free callback argument */ > + rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */ > +}; > + > /**< Maximum number of nb_segs allowed. */ > #define RTE_MBUF_MAX_NB_SEGS UINT16_MAX > > @@ -706,14 +728,34 @@ rte_mbuf_to_baddr(struct rte_mbuf *md) > } > > /** > + * Returns TRUE if given mbuf is cloned by mbuf indirection, or FALSE > + * otherwise. > + * > + * If a mbuf has its data in another mbuf and references it by mbuf > + * indirection, this mbuf can be defined as a cloned mbuf. > + */ > +#define RTE_MBUF_CLONED(mb) ((mb)->ol_flags & IND_ATTACHED_MBUF) > + > +/** > * Returns TRUE if given mbuf is indirect, or FALSE otherwise. > */ > -#define RTE_MBUF_INDIRECT(mb) ((mb)->ol_flags & IND_ATTACHED_MBUF) > +#define RTE_MBUF_INDIRECT(mb) RTE_MBUF_CLONED(mb) We have discussed that it would be good to deprecate RTE_MBUF_INDIRECT() since it is not !RTE_MBUF_DIREC(). Is it lost here or intentional (may be I've lost in the thread)? > /** Mbuf prefetch */ > #define RTE_MBUF_PREFETCH_TO_FREE(m) do { \ > if ((m) != NULL) \ > @@ -1213,11 +1307,157 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool, > } > > /** > + * Initialize shared data at the end of an external buffer before attaching > + * to a mbuf by ``rte_pktmbuf_attach_extbuf()``. This is not a mandatory > + * initialization but a helper function to simply spare a few bytes at the > + * end of the buffer for shared data. If shared data is allocated > + * separately, this should not be called but application has to properly > + * initialize the shared data according to its need. > + * > + * Free callback and its argument is saved and the refcnt is set to 1. > + * > + * @warning > + * buf_len must be adjusted to RTE_PTR_DIFF(shinfo, buf_addr) after this > + * initialization. For example, May be buf_len should be inout and it should be done by the function? Just a question since current approach looks fragile. > + * > + * struct rte_mbuf_ext_shared_info *shinfo = > + * rte_pktmbuf_ext_shinfo_init_helpfer(buf_addr, buf_len, > + * free_cb, fcb_arg); > + * buf_len = RTE_PTR_DIFF(shinfo, buf_addr); > + * rte_pktmbuf_attach_extbuf(m, buf_addr, buf_iova, buf_len, shinfo); > + * > + * @param buf_addr > + * The pointer to the external buffer. > + * @param buf_len > + * The size of the external buffer. buf_len must be larger than the size > + * of ``struct rte_mbuf_ext_shared_info`` and padding for alignment. If > + * not enough, this function will return NULL. > + * @param free_cb > + * Free callback function to call when the external buffer needs to be > + * freed. > + * @param fcb_opaque > + * Argument for the free callback function. > + * > + * @return > + * A pointer to the initialized shared data on success, return NULL > + * otherwise. > + */ > +static inline struct rte_mbuf_ext_shared_info * > +rte_pktmbuf_ext_shinfo_init_helper(void *buf_addr, uint16_t buf_len, > + rte_mbuf_extbuf_free_callback_t free_cb, void *fcb_opaque) > +{ > + struct rte_mbuf_ext_shared_info *shinfo; > + void *buf_end = RTE_PTR_ADD(buf_addr, buf_len); > + > + shinfo = RTE_PTR_ALIGN_FLOOR(RTE_PTR_SUB(buf_end, > + sizeof(*shinfo)), sizeof(uintptr_t)); > + if ((void *)shinfo <= buf_addr) > + return NULL; > + > + rte_mbuf_ext_refcnt_set(shinfo, 1); > + > + shinfo->free_cb = free_cb; > + shinfo->fcb_opaque = fcb_opaque; Just a nit, but I'd suggest to initialize in the same order as in the struct. (if there is no reasons why reference counter should be initialized first) > + > + /* buf_len must be adjusted to RTE_PTR_DIFF(shinfo, buf_addr) */ > + return shinfo; > +} [...]