DPDK patches and discussions
 help / color / mirror / Atom feed
From: Andrew Rybchenko <arybchenko@solarflare.com>
To: Yongseok Koh <yskoh@mellanox.com>, <wenzhuo.lu@intel.com>,
	<jingjing.wu@intel.com>, <olivier.matz@6wind.com>
Cc: <dev@dpdk.org>, <konstantin.ananyev@intel.com>,
	<stephen@networkplumber.org>, <thomas@monjalon.net>,
	<adrien.mazarguil@6wind.com>, <nelio.laranjeiro@6wind.com>
Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: support attaching external buffer to mbuf
Date: Thu, 26 Apr 2018 19:05:01 +0300	[thread overview]
Message-ID: <1a65f081-4c92-78d5-b00c-08e66fdef5c8@solarflare.com> (raw)
In-Reply-To: <20180426011010.28078-1-yskoh@mellanox.com>

On 04/26/2018 04:10 AM, Yongseok Koh wrote:
> This patch introduces a new way of attaching an external buffer to a mbuf.
>
> Attaching an external buffer is quite similar to mbuf indirection in
> replacing buffer addresses and length of a mbuf, but a few differences:
>    - When an indirect mbuf is attached, refcnt of the direct mbuf would be
>      2 as long as the direct mbuf itself isn't freed after the attachment.
>      In such cases, the buffer area of a direct mbuf must be read-only. But
>      external buffer has its own refcnt and it starts from 1. Unless
>      multiple mbufs are attached to a mbuf having an external buffer, the
>      external buffer is writable.
>    - There's no need to allocate buffer from a mempool. Any buffer can be
>      attached with appropriate free callback.
>    - Smaller metadata is required to maintain shared data such as refcnt.

I'm still unsure how reference counters for external buffer work.

Let's consider the following example:

                         |<-mbuf1 buf_len-->|<-mbuf2 buf_len-->|
+--+-----------+----+--------------+----+--------------+---+- - -
|  global      |head|mbuf1 data    |head|mbuf2 data    | |
|  shinfo=2    |room| |room|              |   |
+--+-----------+----+--------------+----+--------------+---+- - -
                ^ ^
+----------+   |      +----------+ |
| mbuf1    +---+      | mbuf2    +-+
| refcnt=1 |          | refcnt=1 |
+----------+          +----------+

I.e. we have big buffer which is sliced into many small data
buffers referenced from mbufs.

shinfo reference counter is used to control when big buffer
may be freed. But what controls sharing of each block?

headroom and following mbuf data (buf_len) is owned by
corresponding mbuf and the mbuf owner can do everything
with the space (prepend data, append data, modify etc).
I.e. it is read-write in above terminology.

What should happen if mbuf1 is cloned? Right now it will result
in a new mbuf1a with reference counter 1 and incremented shinfo
reference counter. And what does say that corresponding area
is read-only now? It looks like nothing.

As I understand it should be solved using per data area shinfo
which free callback decrements big buffer reference counter.

So, we have two reference counters per each mbuf with external
buffer (plus reference counter per big buffer).
Two reference counters sounds too much and it looks like
mbuf-with-extbuf reference counter is not really used
(since on clone/attach update shinfo refcnt).
It is still two counters to check on free.

Have you considered alternative approach to use mbuf refcnt
as sharing indicator for extbuf data? However, in this case
indirect referencing extbuf would logically look like:

+----------+    +--------+     +--------+
| indirect +--->| extbuf +---->|  data  |
|  mbuf    |    |  mbuf  |     |        |
+----------+    +--------+     +--------+

It looks like it would allow to avoid two reference counters
per data block as above. Personally I'm not sure which approach
is better and would like to hear what you and other reviewers
think about it.

Some minor notes below as well.

> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> ---
>
> ** This patch can pass the mbuf_autotest. **
>
> Submitting only non-mlx5 patches to meet deadline for RC1. mlx5 patches
> will be submitted separately rebased on a differnet patchset which
> accommodates new memory hotplug design to mlx PMDs.
>
> v6:
> * rte_pktmbuf_attach_extbuf() doesn't take NULL shinfo. Instead,
>    rte_pktmbuf_ext_shinfo_init_helper() is added.
> * bug fix in rte_pktmbuf_attach() - shinfo wasn't saved to mi.
> * minor changes from review.
>
> v5:
> * rte_pktmbuf_attach_extbuf() sets headroom to 0.
> * if shinfo is provided when attaching, user should initialize it.
> * minor changes from review.
>
> v4:
> * rte_pktmbuf_attach_extbuf() takes new arguments - buf_iova and shinfo.
>    user can pass memory for shared data via shinfo argument.
> * minor changes from review.
>
> v3:
> * implement external buffer attachment instead of introducing buf_off for
>    mbuf indirection.
>
>   lib/librte_mbuf/rte_mbuf.h | 335 +++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 306 insertions(+), 29 deletions(-)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 43aaa9c5f..0a6885281 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h

[...]

> +/**
> + * Shared data at the end of an external buffer.
> + */
> +struct rte_mbuf_ext_shared_info {
> +	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
> +	void *fcb_opaque;                        /**< Free callback argument */
> +	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> +};
> +
>   /**< Maximum number of nb_segs allowed. */
>   #define RTE_MBUF_MAX_NB_SEGS	UINT16_MAX
>   
> @@ -706,14 +728,34 @@ rte_mbuf_to_baddr(struct rte_mbuf *md)
>   }
>   
>   /**
> + * Returns TRUE if given mbuf is cloned by mbuf indirection, or FALSE
> + * otherwise.
> + *
> + * If a mbuf has its data in another mbuf and references it by mbuf
> + * indirection, this mbuf can be defined as a cloned mbuf.
> + */
> +#define RTE_MBUF_CLONED(mb)     ((mb)->ol_flags & IND_ATTACHED_MBUF)
> +
> +/**
>    * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
>    */
> -#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)
> +#define RTE_MBUF_INDIRECT(mb)   RTE_MBUF_CLONED(mb)

We have discussed that it would be good to deprecate RTE_MBUF_INDIRECT()
since it is not !RTE_MBUF_DIREC(). Is it lost here or intentional (may 
be I've lost
in the thread)?

>   /** Mbuf prefetch */
>   #define RTE_MBUF_PREFETCH_TO_FREE(m) do {       \
>   	if ((m) != NULL)                        \
> @@ -1213,11 +1307,157 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
>   }
>   
>   /**
> + * Initialize shared data at the end of an external buffer before attaching
> + * to a mbuf by ``rte_pktmbuf_attach_extbuf()``. This is not a mandatory
> + * initialization but a helper function to simply spare a few bytes at the
> + * end of the buffer for shared data. If shared data is allocated
> + * separately, this should not be called but application has to properly
> + * initialize the shared data according to its need.
> + *
> + * Free callback and its argument is saved and the refcnt is set to 1.
> + *
> + * @warning
> + * buf_len must be adjusted to RTE_PTR_DIFF(shinfo, buf_addr) after this
> + * initialization. For example,

May be buf_len should be inout and it should be done by the function?
Just a question since current approach looks fragile.

> + *
> + *   struct rte_mbuf_ext_shared_info *shinfo =
> + *          rte_pktmbuf_ext_shinfo_init_helpfer(buf_addr, buf_len,
> + *                                              free_cb, fcb_arg);
> + *   buf_len = RTE_PTR_DIFF(shinfo, buf_addr);
> + *   rte_pktmbuf_attach_extbuf(m, buf_addr, buf_iova, buf_len, shinfo);
> + *
> + * @param buf_addr
> + *   The pointer to the external buffer.
> + * @param buf_len
> + *   The size of the external buffer. buf_len must be larger than the size
> + *   of ``struct rte_mbuf_ext_shared_info`` and padding for alignment. If
> + *   not enough, this function will return NULL.
> + * @param free_cb
> + *   Free callback function to call when the external buffer needs to be
> + *   freed.
> + * @param fcb_opaque
> + *   Argument for the free callback function.
> + *
> + * @return
> + *   A pointer to the initialized shared data on success, return NULL
> + *   otherwise.
> + */
> +static inline struct rte_mbuf_ext_shared_info *
> +rte_pktmbuf_ext_shinfo_init_helper(void *buf_addr, uint16_t buf_len,
> +	rte_mbuf_extbuf_free_callback_t free_cb, void *fcb_opaque)
> +{
> +	struct rte_mbuf_ext_shared_info *shinfo;
> +	void *buf_end = RTE_PTR_ADD(buf_addr, buf_len);
> +
> +	shinfo = RTE_PTR_ALIGN_FLOOR(RTE_PTR_SUB(buf_end,
> +				sizeof(*shinfo)), sizeof(uintptr_t));
> +	if ((void *)shinfo <= buf_addr)
> +		return NULL;
> +
> +	rte_mbuf_ext_refcnt_set(shinfo, 1);
> +
> +	shinfo->free_cb = free_cb;
> +	shinfo->fcb_opaque = fcb_opaque;

Just a nit, but I'd suggest to initialize in the same order as in the 
struct.
(if there is no reasons why reference counter should be initialized first)

> +
> +	/* buf_len must be adjusted to RTE_PTR_DIFF(shinfo, buf_addr) */
> +	return shinfo;
> +}

[...]

  parent reply	other threads:[~2018-04-26 16:05 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-10  1:25 [dpdk-dev] [PATCH v1 0/6] net/mlx5: add Multi-Packet Rx support Yongseok Koh
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 1/6] mbuf: add buffer offset field for flexible indirection Yongseok Koh
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 2/6] net/mlx5: separate filling Rx flags Yongseok Koh
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 3/6] net/mlx5: add a function to rdma-core glue Yongseok Koh
2018-03-12  9:13   ` Nélio Laranjeiro
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 4/6] net/mlx5: add Multi-Packet Rx support Yongseok Koh
2018-03-12  9:20   ` Nélio Laranjeiro
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 5/6] net/mlx5: release Tx queue resource earlier than Rx Yongseok Koh
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 6/6] app/testpmd: conserve mbuf indirection flag Yongseok Koh
2018-04-02 18:50 ` [dpdk-dev] [PATCH v2 0/6] net/mlx5: add Multi-Packet Rx support Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 1/6] mbuf: add buffer offset field for flexible indirection Yongseok Koh
2018-04-03  8:26     ` Olivier Matz
2018-04-04  0:12       ` Yongseok Koh
2018-04-09 16:04         ` Olivier Matz
2018-04-10  1:59           ` Yongseok Koh
2018-04-11  0:25             ` Ananyev, Konstantin
2018-04-11  5:33               ` Yongseok Koh
2018-04-11 11:39                 ` Ananyev, Konstantin
2018-04-11 14:02                   ` Andrew Rybchenko
2018-04-11 17:18                     ` Yongseok Koh
2018-04-11 17:08                   ` Yongseok Koh
2018-04-12 16:34                     ` Ananyev, Konstantin
2018-04-12 18:58                       ` Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 2/6] net/mlx5: separate filling Rx flags Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 3/6] net/mlx5: add a function to rdma-core glue Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 4/6] net/mlx5: add Multi-Packet Rx support Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 5/6] net/mlx5: release Tx queue resource earlier than Rx Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 6/6] app/testpmd: conserve mbuf indirection flag Yongseok Koh
2018-04-19  1:11 ` [dpdk-dev] [PATCH v3 1/2] mbuf: support attaching external buffer to mbuf Yongseok Koh
2018-04-19  1:11   ` [dpdk-dev] [PATCH v3 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-23 11:53   ` [dpdk-dev] [PATCH v3 1/2] mbuf: support attaching external buffer to mbuf Ananyev, Konstantin
2018-04-24  2:04     ` Yongseok Koh
2018-04-25 13:16       ` Ananyev, Konstantin
2018-04-25 16:44         ` Yongseok Koh
2018-04-25 18:05           ` Ananyev, Konstantin
2018-04-23 16:18   ` Olivier Matz
2018-04-24  1:29     ` Yongseok Koh
2018-04-24 15:36       ` Olivier Matz
2018-04-24  1:38 ` [dpdk-dev] [PATCH v4 " Yongseok Koh
2018-04-24  1:38   ` [dpdk-dev] [PATCH v4 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-24  5:01   ` [dpdk-dev] [PATCH v4 1/2] mbuf: support attaching external buffer to mbuf Stephen Hemminger
2018-04-24 11:47     ` Yongseok Koh
2018-04-24 12:28   ` Andrew Rybchenko
2018-04-24 16:02     ` Olivier Matz
2018-04-24 18:21       ` [dpdk-dev] ***Spam*** " Andrew Rybchenko
2018-04-24 19:15         ` [dpdk-dev] " Olivier Matz
2018-04-24 20:22           ` Thomas Monjalon
2018-04-24 21:53             ` Yongseok Koh
2018-04-24 22:15               ` Thomas Monjalon
2018-04-25  8:21               ` Olivier Matz
2018-04-25 15:06             ` Stephen Hemminger
2018-04-24 23:34           ` Yongseok Koh
2018-04-25 14:45             ` Andrew Rybchenko
2018-04-25 17:40               ` Yongseok Koh
2018-04-25  8:28       ` Olivier Matz
2018-04-25  9:08         ` Yongseok Koh
2018-04-25  9:19           ` Yongseok Koh
2018-04-25 20:00             ` Olivier Matz
2018-04-25 22:54               ` Yongseok Koh
2018-04-24 22:30     ` Yongseok Koh
2018-04-25  2:53 ` [dpdk-dev] [PATCH v5 " Yongseok Koh
2018-04-25  2:53   ` [dpdk-dev] [PATCH v5 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-25 13:31   ` [dpdk-dev] [PATCH v5 1/2] mbuf: support attaching external buffer to mbuf Ananyev, Konstantin
2018-04-25 17:06     ` Yongseok Koh
2018-04-25 17:23       ` Ananyev, Konstantin
2018-04-25 18:02         ` Yongseok Koh
2018-04-25 18:22           ` Yongseok Koh
2018-04-25 18:30             ` Yongseok Koh
2018-04-26  1:10 ` [dpdk-dev] [PATCH v6 " Yongseok Koh
2018-04-26  1:10   ` [dpdk-dev] [PATCH v6 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-26 11:39   ` [dpdk-dev] [PATCH v6 1/2] mbuf: support attaching external buffer to mbuf Ananyev, Konstantin
2018-04-26 16:05   ` Andrew Rybchenko [this message]
2018-04-26 16:10     ` Thomas Monjalon
2018-04-26 19:42       ` Olivier Matz
2018-04-26 19:58         ` Thomas Monjalon
2018-04-26 20:07           ` Olivier Matz
2018-04-26 20:24             ` Thomas Monjalon
2018-04-26 17:18     ` Yongseok Koh
2018-04-26 19:45       ` Olivier Matz
2018-04-27  0:01 ` [dpdk-dev] [PATCH v7 " Yongseok Koh
2018-04-27  0:01   ` [dpdk-dev] [PATCH v7 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-27  8:00     ` Andrew Rybchenko
2018-04-27  7:22   ` [dpdk-dev] [PATCH v7 1/2] mbuf: support attaching external buffer to mbuf Andrew Rybchenko
2018-04-27 17:22 ` [dpdk-dev] [PATCH v8 " Yongseok Koh
2018-04-27 17:22   ` [dpdk-dev] [PATCH v8 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-27 18:09   ` [dpdk-dev] [PATCH v8 1/2] mbuf: support attaching external buffer to mbuf Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1a65f081-4c92-78d5-b00c-08e66fdef5c8@solarflare.com \
    --to=arybchenko@solarflare.com \
    --cc=adrien.mazarguil@6wind.com \
    --cc=dev@dpdk.org \
    --cc=jingjing.wu@intel.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=nelio.laranjeiro@6wind.com \
    --cc=olivier.matz@6wind.com \
    --cc=stephen@networkplumber.org \
    --cc=thomas@monjalon.net \
    --cc=wenzhuo.lu@intel.com \
    --cc=yskoh@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).