From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id C39FAF90 for ; Mon, 23 Apr 2018 13:53:08 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 23 Apr 2018 04:53:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,318,1520924400"; d="scan'208";a="35449942" Received: from irsmsx105.ger.corp.intel.com ([163.33.3.28]) by orsmga007.jf.intel.com with ESMTP; 23 Apr 2018 04:53:06 -0700 Received: from irsmsx156.ger.corp.intel.com (10.108.20.68) by irsmsx105.ger.corp.intel.com (163.33.3.28) with Microsoft SMTP Server (TLS) id 14.3.319.2; Mon, 23 Apr 2018 12:53:05 +0100 Received: from irsmsx102.ger.corp.intel.com ([169.254.2.164]) by IRSMSX156.ger.corp.intel.com ([169.254.3.229]) with mapi id 14.03.0319.002; Mon, 23 Apr 2018 12:53:05 +0100 From: "Ananyev, Konstantin" To: Yongseok Koh , "Lu, Wenzhuo" , "Wu, Jingjing" , "olivier.matz@6wind.com" CC: "dev@dpdk.org" , "adrien.mazarguil@6wind.com" , "nelio.laranjeiro@6wind.com" Thread-Topic: [PATCH v3 1/2] mbuf: support attaching external buffer to mbuf Thread-Index: AQHT13tb7EdOoqZeqEi/dR5l7YKFH6QOO30Q Date: Mon, 23 Apr 2018 11:53:04 +0000 Message-ID: <2601191342CEEE43887BDE71AB977258AE91994F@IRSMSX102.ger.corp.intel.com> References: <20180310012532.15809-1-yskoh@mellanox.com> <20180419011105.9694-1-yskoh@mellanox.com> In-Reply-To: <20180419011105.9694-1-yskoh@mellanox.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiOGRiMDZmODktN2QxMi00OWMzLTlhOTQtNzY2MTViNTJhN2Y4IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6Imc0dnFWNTFSMUYzVjAzZ3hQQXNJaWU2WG1xM2JRclRpd2dyZTFOSmZnU2M9In0= x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.0.200.100 dlp-reaction: no-action x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v3 1/2] mbuf: support attaching external buffer to mbuf X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Apr 2018 11:53:09 -0000 >=20 > This patch introduces a new way of attaching an external buffer to a mbuf= . >=20 > Attaching an external buffer is quite similar to mbuf indirection in > replacing buffer addresses and length of a mbuf, but a few differences: > - As refcnt of a direct mbuf is at least 2, the buffer area of a direct > mbuf must be read-only. But external buffer has its own refcnt and it > starts from 1. Unless multiple mbufs are attached to a mbuf having an > external buffer, the external buffer is writable. > - There's no need to allocate buffer from a mempool. Any buffer can be > attached with appropriate free callback. > - Smaller metadata is required to maintain shared data such as refcnt. >=20 > Signed-off-by: Yongseok Koh > --- >=20 > Submitting only non-mlx5 patches to meet deadline for RC1. mlx5 patches > will be submitted separately rebased on a differnet patchset which > accommodates new memory hotplug design to mlx PMDs. >=20 > v3: > * implement external buffer attachment instead of introducing buf_off for > mbuf indirection. >=20 > lib/librte_mbuf/rte_mbuf.h | 276 +++++++++++++++++++++++++++++++++++++++= +----- > 1 file changed, 248 insertions(+), 28 deletions(-) >=20 > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index 06eceba37..e64160c81 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -326,7 +326,7 @@ extern "C" { > PKT_TX_MACSEC | \ > PKT_TX_SEC_OFFLOAD) >=20 > -#define __RESERVED (1ULL << 61) /**< reserved for future mbuf = use */ > +#define EXT_ATTACHED_MBUF (1ULL << 61) /**< Mbuf having external buff= er */ >=20 > #define IND_ATTACHED_MBUF (1ULL << 62) /**< Indirect attached mbuf */ >=20 > @@ -568,6 +568,24 @@ struct rte_mbuf { >=20 > } __rte_cache_aligned; >=20 > +/** > + * Function typedef of callback to free externally attached buffer. > + */ > +typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque= ); > + > +/** > + * Shared data at the end of an external buffer. > + */ > +struct rte_mbuf_ext_shared_info { > + rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */ > + void *fcb_opaque; /**< Free callback argument */ > + RTE_STD_C11 > + union { > + rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */ > + uint16_t refcnt; /**< Non-atomically accessed refcnt */ > + }; > +}; > + > /**< Maximum number of nb_segs allowed. */ > #define RTE_MBUF_MAX_NB_SEGS UINT16_MAX >=20 > @@ -693,9 +711,14 @@ rte_mbuf_to_baddr(struct rte_mbuf *md) > #define RTE_MBUF_INDIRECT(mb) ((mb)->ol_flags & IND_ATTACHED_MBUF) >=20 > /** > + * Returns TRUE if given mbuf has external buffer, or FALSE otherwise. > + */ > +#define RTE_MBUF_HAS_EXTBUF(mb) ((mb)->ol_flags & EXT_ATTACHED_MBUF) > + > +/** > * Returns TRUE if given mbuf is direct, or FALSE otherwise. > */ > -#define RTE_MBUF_DIRECT(mb) (!RTE_MBUF_INDIRECT(mb)) > +#define RTE_MBUF_DIRECT(mb) (!RTE_MBUF_INDIRECT(mb) && !RTE_MBUF_HAS_EXT= BUF(mb)) As a nit: RTE_MBUF_DIRECT(mb) (((mb)->ol_flags & (IND_ATTACHED_MBUF | EXT_ATTACHED_M= BUF)) =3D=3D 0) >=20 > /** > * Private data in case of pktmbuf pool. > @@ -821,6 +844,58 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new= _value) >=20 > #endif /* RTE_MBUF_REFCNT_ATOMIC */ >=20 > +/** > + * Reads the refcnt of an external buffer. > + * > + * @param shinfo > + * Shared data of the external buffer. > + * @return > + * Reference count number. > + */ > +static inline uint16_t > +rte_extbuf_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo) > +{ > + return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic)); > +} > + > +/** > + * Set refcnt of an external buffer. > + * > + * @param shinfo > + * Shared data of the external buffer. > + * @param new_value > + * Value set > + */ > +static inline void > +rte_extbuf_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo, > + uint16_t new_value) > +{ > + rte_atomic16_set(&shinfo->refcnt_atomic, new_value); > +} > + > +/** > + * Add given value to refcnt of an external buffer and return its new > + * value. > + * > + * @param shinfo > + * Shared data of the external buffer. > + * @param value > + * Value to add/subtract > + * @return > + * Updated value > + */ > +static inline uint16_t > +rte_extbuf_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo, > + int16_t value) > +{ > + if (likely(rte_extbuf_refcnt_read(shinfo) =3D=3D 1)) { > + rte_extbuf_refcnt_set(shinfo, 1 + value); > + return 1 + value; > + } > + > + return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value)= ; > +} > + > /** Mbuf prefetch */ > #define RTE_MBUF_PREFETCH_TO_FREE(m) do { \ > if ((m) !=3D NULL) \ > @@ -1195,11 +1270,120 @@ static inline int rte_pktmbuf_alloc_bulk(struct = rte_mempool *pool, > } >=20 > /** > + * Return shared data of external buffer of a mbuf. > + * > + * @param m > + * The pointer to the mbuf. > + * @return > + * The address of the shared data. > + */ > +static inline struct rte_mbuf_ext_shared_info * > +rte_mbuf_ext_shinfo(struct rte_mbuf *m) > +{ > + return (struct rte_mbuf_ext_shared_info *) > + RTE_PTR_ADD(m->buf_addr, m->buf_len); > +} > + > +/** > + * Attach an external buffer to a mbuf. > + * > + * User-managed anonymous buffer can be attached to an mbuf. When attach= ing > + * it, corresponding free callback function and its argument should be > + * provided. This callback function will be called once all the mbufs ar= e > + * detached from the buffer. > + * > + * More mbufs can be attached to the same external buffer by > + * ``rte_pktmbuf_attach()`` once the external buffer has been attached b= y > + * this API. > + * > + * Detachment can be done by either ``rte_pktmbuf_detach_extbuf()`` or > + * ``rte_pktmbuf_detach()``. > + * > + * A few bytes in the trailer of the provided buffer will be dedicated f= or > + * shared data (``struct rte_mbuf_ext_shared_info``) to store refcnt, > + * callback function and so on. The shared data can be referenced by > + * ``rte_mbuf_ext_shinfo()`` > + * > + * Attaching an external buffer is quite similar to mbuf indirection in > + * replacing buffer addresses and length of a mbuf, but a few difference= s: > + * - As refcnt of a direct mbuf is at least 2, the buffer area of a dire= ct > + * mbuf must be read-only. But external buffer has its own refcnt and = it > + * starts from 1. Unless multiple mbufs are attached to a mbuf having = an > + * external buffer, the external buffer is writable. > + * - There's no need to allocate buffer from a mempool. Any buffer can b= e > + * attached with appropriate free callback. > + * - Smaller metadata is required to maintain shared data such as refcnt= . > + * > + * @warning > + * @b EXPERIMENTAL: This API may change without prior notice. > + * Once external buffer is enabled by allowing experimental API, > + * ``RTE_MBUF_DIRECT()`` and ``RTE_MBUF_INDIRECT()`` are no longer > + * exclusive. A mbuf can be consiered direct if it is neither indirect n= or > + * having external buffer. > + * > + * @param m > + * The pointer to the mbuf. > + * @param buf_addr > + * The pointer to the external buffer we're attaching to. > + * @param buf_len > + * The size of the external buffer we're attaching to. This must be la= rger > + * than the size of ``struct rte_mbuf_ext_shared_info`` and padding fo= r > + * alignment. If not enough, this function will return NULL. > + * @param free_cb > + * Free callback function to call when the external buffer needs to be= freed. > + * @param fcb_opaque > + * Argument for the free callback function. > + * @return > + * A pointer to the new start of the data on success, return NULL othe= rwise. > + */ > +static inline char * __rte_experimental > +rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr, > + uint16_t buf_len, rte_mbuf_extbuf_free_callback_t free_cb, > + void *fcb_opaque) > +{ > + void *buf_end =3D RTE_PTR_ADD(buf_addr, buf_len); > + struct rte_mbuf_ext_shared_info *shinfo; > + > + shinfo =3D RTE_PTR_ALIGN_FLOOR(RTE_PTR_SUB(buf_end, sizeof(*shinfo)), > + sizeof(uintptr_t)); > + > + if ((void *)shinfo <=3D buf_addr) > + return NULL; > + > + m->buf_addr =3D buf_addr; > + m->buf_iova =3D rte_mempool_virt2iova(buf_addr); That wouldn't work for arbitrary extern buffer. Only for the one that is an element in some other mempool. For arbitrary external buffer - callee has to provide PA for it plus guaran= tee that it's VA would be locked down. >>From other side - if your intention is just to use only elements of other m= empools - No need to have free_cb(). mempool_put should do. > + m->buf_len =3D RTE_PTR_DIFF(shinfo, buf_addr); > + m->data_len =3D 0; > + > + rte_pktmbuf_reset_headroom(m); > + m->ol_flags |=3D EXT_ATTACHED_MBUF; > + > + rte_extbuf_refcnt_set(shinfo, 1); > + shinfo->free_cb =3D free_cb; > + shinfo->fcb_opaque =3D fcb_opaque; > + > + return (char *)m->buf_addr + m->data_off; > +} > + > +/** > + * Detach the external buffer attached to a mbuf, same as > + * ``rte_pktmbuf_detach()`` > + * > + * @param m > + * The mbuf having external buffer. > + */ > +#define rte_pktmbuf_detach_extbuf(m) rte_pktmbuf_detach(m) > + > +/** > * Attach packet mbuf to another packet mbuf. > * > - * After attachment we refer the mbuf we attached as 'indirect', > - * while mbuf we attached to as 'direct'. > - * The direct mbuf's reference counter is incremented. > + * If the mbuf we are attaching to isn't a direct buffer and is attached= to > + * an external buffer, the mbuf being attached will be attached to the > + * external buffer instead of mbuf indirection. > + * > + * Otherwise, the mbuf will be indirectly attached. After attachment we > + * refer the mbuf we attached as 'indirect', while mbuf we attached to a= s > + * 'direct'. The direct mbuf's reference counter is incremented. > * > * Right now, not supported: > * - attachment for already indirect mbuf (e.g. - mi has to be direct). > @@ -1213,19 +1397,18 @@ static inline int rte_pktmbuf_alloc_bulk(struct r= te_mempool *pool, > */ > static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mb= uf *m) > { > - struct rte_mbuf *md; > - > RTE_ASSERT(RTE_MBUF_DIRECT(mi) && > rte_mbuf_refcnt_read(mi) =3D=3D 1); >=20 > - /* if m is not direct, get the mbuf that embeds the data */ > - if (RTE_MBUF_DIRECT(m)) > - md =3D m; > - else > - md =3D rte_mbuf_from_indirect(m); > + if (RTE_MBUF_HAS_EXTBUF(m)) { > + rte_extbuf_refcnt_update(rte_mbuf_ext_shinfo(m), 1); > + mi->ol_flags =3D m->ol_flags; > + } else { > + rte_mbuf_refcnt_update(rte_mbuf_from_indirect(m), 1); > + mi->priv_size =3D m->priv_size; > + mi->ol_flags =3D m->ol_flags | IND_ATTACHED_MBUF; > + } >=20 > - rte_mbuf_refcnt_update(md, 1); > - mi->priv_size =3D m->priv_size; > mi->buf_iova =3D m->buf_iova; > mi->buf_addr =3D m->buf_addr; > mi->buf_len =3D m->buf_len; > @@ -1241,7 +1424,6 @@ static inline void rte_pktmbuf_attach(struct rte_mb= uf *mi, struct rte_mbuf *m) > mi->next =3D NULL; > mi->pkt_len =3D mi->data_len; > mi->nb_segs =3D 1; > - mi->ol_flags =3D m->ol_flags | IND_ATTACHED_MBUF; > mi->packet_type =3D m->packet_type; > mi->timestamp =3D m->timestamp; >=20 > @@ -1250,12 +1432,53 @@ static inline void rte_pktmbuf_attach(struct rte_= mbuf *mi, struct rte_mbuf *m) > } >=20 > /** > - * Detach an indirect packet mbuf. > + * @internal used by rte_pktmbuf_detach(). > + * > + * Decrement the reference counter of the external buffer. When the > + * reference counter becomes 0, the buffer is freed by pre-registered > + * callback. > + */ > +static inline void > +__rte_pktmbuf_free_extbuf(struct rte_mbuf *m) > +{ > + struct rte_mbuf_ext_shared_info *shinfo; > + > + RTE_ASSERT(RTE_MBUF_HAS_EXTBUF(m)); > + > + shinfo =3D rte_mbuf_ext_shinfo(m); > + > + if (rte_extbuf_refcnt_update(shinfo, -1) =3D=3D 0) > + shinfo->free_cb(m->buf_addr, shinfo->fcb_opaque); I understand the reason but extra function call for each external mbuf - se= ems quite expensive. Wonder is it possible to group them somehow and amortize the cost? > +} > + > +/** > + * @internal used by rte_pktmbuf_detach(). > * > + * Decrement the direct mbuf's reference counter. When the reference > + * counter becomes 0, the direct mbuf is freed. > + */ > +static inline void > +__rte_pktmbuf_free_direct(struct rte_mbuf *m) > +{ > + struct rte_mbuf *md =3D rte_mbuf_from_indirect(m); > + > + RTE_ASSERT(RTE_MBUF_INDIRECT(m)); > + > + if (rte_mbuf_refcnt_update(md, -1) =3D=3D 0) { > + md->next =3D NULL; > + md->nb_segs =3D 1; > + rte_mbuf_refcnt_set(md, 1); > + rte_mbuf_raw_free(md); > + } > +} > + > +/** > + * Detach a packet mbuf from external buffer or direct buffer. > + * > + * - decrement refcnt and free the external/direct buffer if refcnt > + * becomes zero. > * - restore original mbuf address and length values. > * - reset pktmbuf data and data_len to their default values. > - * - decrement the direct mbuf's reference counter. When the > - * reference counter becomes 0, the direct mbuf is freed. > * > * All other fields of the given packet mbuf will be left intact. > * > @@ -1264,10 +1487,14 @@ static inline void rte_pktmbuf_attach(struct rte_= mbuf *mi, struct rte_mbuf *m) > */ > static inline void rte_pktmbuf_detach(struct rte_mbuf *m) > { > - struct rte_mbuf *md =3D rte_mbuf_from_indirect(m); > struct rte_mempool *mp =3D m->pool; > uint32_t mbuf_size, buf_len, priv_size; >=20 > + if (RTE_MBUF_HAS_EXTBUF(m)) > + __rte_pktmbuf_free_extbuf(m); > + else > + __rte_pktmbuf_free_direct(m); > + > priv_size =3D rte_pktmbuf_priv_size(mp); > mbuf_size =3D sizeof(struct rte_mbuf) + priv_size; > buf_len =3D rte_pktmbuf_data_room_size(mp); > @@ -1279,13 +1506,6 @@ static inline void rte_pktmbuf_detach(struct rte_m= buf *m) > rte_pktmbuf_reset_headroom(m); > m->data_len =3D 0; > m->ol_flags =3D 0; > - > - if (rte_mbuf_refcnt_update(md, -1) =3D=3D 0) { > - md->next =3D NULL; > - md->nb_segs =3D 1; > - rte_mbuf_refcnt_set(md, 1); > - rte_mbuf_raw_free(md); > - } > } >=20 > /** > @@ -1309,7 +1529,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m) >=20 > if (likely(rte_mbuf_refcnt_read(m) =3D=3D 1)) { >=20 > - if (RTE_MBUF_INDIRECT(m)) > + if (!RTE_MBUF_DIRECT(m)) > rte_pktmbuf_detach(m); >=20 > if (m->next !=3D NULL) { > @@ -1321,7 +1541,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m) >=20 > } else if (__rte_mbuf_refcnt_update(m, -1) =3D=3D 0) { >=20 > - if (RTE_MBUF_INDIRECT(m)) > + if (!RTE_MBUF_DIRECT(m)) > rte_pktmbuf_detach(m); >=20 > if (m->next !=3D NULL) { > -- > 2.11.0