From: "Ananyev, Konstantin"
To: Olivier Matz
CC: "dev@dpdk.org", Thomas Monjalon, "Wang, Haiyue", Stephen Hemminger, Andrew Rybchenko, "Wiles, Keith", Jerin Jacob Kollanukkaran
Date: Thu, 17 Oct 2019 11:58:52 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725801A8C6A308@IRSMSX104.ger.corp.intel.com>
In-Reply-To: <20191017075434.dk4flyktbbe3lxxd@platinum>
Subject: Re: [dpdk-dev] [PATCH] mbuf: support dynamic fields and flags

Hi Olivier,

> > > Many features require to store data inside the mbuf. As the room in mbuf
> > > structure is limited, it is not possible to have a field for each
> > > feature. Also, changing fields in the mbuf structure can break the API
> > > or ABI.
> > >
> > > This commit addresses these issues, by enabling the dynamic registration
> > > of fields or flags:
> > >
> > > - a dynamic field is a named area in the rte_mbuf structure, with a
> > >   given size (>= 1 byte) and alignment constraint.
> > > - a dynamic flag is a named bit in the rte_mbuf structure.
> > >
> > > The typical use case is a PMD that registers space for an offload
> > > feature, when the application requests to enable this feature. As
> > > the space in mbuf is limited, the space should only be reserved if it
> > > is going to be used (i.e when the application explicitly asks for it).
> > >
> > > The registration can be done at any moment, but it is not possible
> > > to unregister fields or flags for now.
> >
> > Looks ok to me in general.
> > Some comments/suggestions inline.
> > Konstantin
> >
> > >
> > > Signed-off-by: Olivier Matz
> > > Acked-by: Thomas Monjalon
> > > ---
> > >
> > > rfc -> v1
> > >
> > > * Rebase on top of master
> > > * Change registration API to use a structure instead of
> > >   variables, getting rid of #defines (Stephen's comment)
> > > * Update flag registration to use a similar API as fields.
> > > * Change max name length from 32 to 64 (sugg. by Thomas)
> > > * Enhance API documentation (Haiyue's and Andrew's comments)
> > > * Add a debug log at registration
> > > * Add some words in release note
> > > * Did some performance tests (sugg. by Andrew):
> > >   On my platform, reading a dynamic field takes ~3 cycles more
> > >   than a static field, and ~2 cycles more for writing.
> > >
> > >  app/test/test_mbuf.c                   | 114 ++++++-
> > >  doc/guides/rel_notes/release_19_11.rst |   7 +
> > >  lib/librte_mbuf/Makefile               |   2 +
> > >  lib/librte_mbuf/meson.build            |   6 +-
> > >  lib/librte_mbuf/rte_mbuf.h             |  25 +-
> > >  lib/librte_mbuf/rte_mbuf_dyn.c         | 408 +++++++++++++++++++++++++
> > >  lib/librte_mbuf/rte_mbuf_dyn.h         | 163 ++++++++++
> > >  lib/librte_mbuf/rte_mbuf_version.map   |   4 +
> > >  8 files changed, 724 insertions(+), 5 deletions(-)
> > >  create mode 100644 lib/librte_mbuf/rte_mbuf_dyn.c
> > >  create mode 100644 lib/librte_mbuf/rte_mbuf_dyn.h
> > >
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -198,9 +198,12 @@ extern "C" {
> > >  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
> > >  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL << 22))
> > >
> > > -/* add new RX flags here */
> > > +/* add new RX flags here, don't forget to update PKT_FIRST_FREE */
> > >
> > > -/* add new TX flags here */
> > > +#define PKT_FIRST_FREE (1ULL << 23)
> > > +#define PKT_LAST_FREE (1ULL << 39)
> > > +
> > > +/* add new TX flags here, don't forget to update PKT_LAST_FREE */
> > >
> > >  /**
> > >   * Indicate that the metadata field in the mbuf is in use.
> > > @@ -738,6 +741,8 @@ struct rte_mbuf {
> > >  	 */
> > >  	struct rte_mbuf_ext_shared_info *shinfo;
> > >
> > > +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
> > > +	uint64_t dynfield2; /**< Reserved for dynamic fields. */
> >
> > Wonder why just not one field:
> > 	union {
> > 		uint8_t u8[16];
> > 		...
> > 		uint64_t u64[2];
> > 	} dyn_field1;
> > ?
> > Probably would be a bit handy, to refer, register, etc. no?
>
> I didn't find any place where we need an access through u8, so I
> just changed it into uint64_t dynfield1[2].

My thought was - if you'll have all dynamic stuff as one field (uint64_t dyn_field[2]),
then you wouldn't need any cycles at register() at all.
But up to you.

>
>
> >
> > >  } __rte_cache_aligned;
> > >
> > >  /**
> > > @@ -1684,6 +1689,21 @@ rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr,
> > >   */
> > >  #define rte_pktmbuf_detach_extbuf(m) rte_pktmbuf_detach(m)
> > >
> > > +/**
> > > + * Copy dynamic fields from m_src to m_dst.
> > > + *
> > > + * @param m_dst
> > > + *   The destination mbuf.
> > > + * @param m_src
> > > + *   The source mbuf.
> > > + */
> > > +static inline void
> > > +rte_mbuf_dynfield_copy(struct rte_mbuf *m_dst, const struct rte_mbuf *m_src)
> > > +{
> > > +	m_dst->dynfield1 = m_src->dynfield1;
> > > +	m_dst->dynfield2 = m_src->dynfield2;
> > > +}
> > > +
> > >  /**
> > >   * Attach packet mbuf to another packet mbuf.
> > >   *
> > > @@ -1732,6 +1752,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
> > >  	mi->vlan_tci_outer = m->vlan_tci_outer;
> > >  	mi->tx_offload = m->tx_offload;
> > >  	mi->hash = m->hash;
> > > +	rte_mbuf_dynfield_copy(mi, m);
> > >
> > >  	mi->next = NULL;
> > >  	mi->pkt_len = mi->data_len;
> > > diff --git a/lib/librte_mbuf/rte_mbuf_dyn.c b/lib/librte_mbuf/rte_mbuf_dyn.c
> > > new file mode 100644
> > > index 000000000..13b8742d0
> > > --- /dev/null
> > > +++ b/lib/librte_mbuf/rte_mbuf_dyn.c
> > > @@ -0,0 +1,408 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright 2019 6WIND S.A.
> > > + */
> > > +
> > > +#include
> > > +
> > > +#include
> > > +#include
> > > +#include
> > > +#include
> > > +#include
> > > +#include
> > > +#include
> > > +#include
> > > +#include
> > > +
> > > +#define RTE_MBUF_DYN_MZNAME "rte_mbuf_dyn"
> > > +
> > > +struct mbuf_dynfield_elt {
> > > +	TAILQ_ENTRY(mbuf_dynfield_elt) next;
> > > +	struct rte_mbuf_dynfield params;
> > > +	int offset;
> >
> > Why not 'size_t offset', to avoid any explicit conversions, etc?
>
> Fixed
>
>
> > > +};
> > > +TAILQ_HEAD(mbuf_dynfield_list, rte_tailq_entry);
> > > +
> > > +static struct rte_tailq_elem mbuf_dynfield_tailq = {
> > > +	.name = "RTE_MBUF_DYNFIELD",
> > > +};
> > > +EAL_REGISTER_TAILQ(mbuf_dynfield_tailq);
> > > +
> > > +struct mbuf_dynflag_elt {
> > > +	TAILQ_ENTRY(mbuf_dynflag_elt) next;
> > > +	struct rte_mbuf_dynflag params;
> > > +	int bitnum;
> > > +};
> > > +TAILQ_HEAD(mbuf_dynflag_list, rte_tailq_entry);
> > > +
> > > +static struct rte_tailq_elem mbuf_dynflag_tailq = {
> > > +	.name = "RTE_MBUF_DYNFLAG",
> > > +};
> > > +EAL_REGISTER_TAILQ(mbuf_dynflag_tailq);
> > > +
> > > +struct mbuf_dyn_shm {
> > > +	/** For each mbuf byte, free_space[i] == 1 if space is free. */
> > > +	uint8_t free_space[sizeof(struct rte_mbuf)];
> > > +	/** Bitfield of available flags. */
> > > +	uint64_t free_flags;
> > > +};
> > > +static struct mbuf_dyn_shm *shm;
> > > +
> > > +/* allocate and initialize the shared memory */
> > > +static int
> > > +init_shared_mem(void)
> > > +{
> > > +	const struct rte_memzone *mz;
> > > +	uint64_t mask;
> > > +
> > > +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > +		mz = rte_memzone_reserve_aligned(RTE_MBUF_DYN_MZNAME,
> > > +						sizeof(struct mbuf_dyn_shm),
> > > +						SOCKET_ID_ANY, 0,
> > > +						RTE_CACHE_LINE_SIZE);
> > > +	} else {
> > > +		mz = rte_memzone_lookup(RTE_MBUF_DYN_MZNAME);
> > > +	}
> > > +	if (mz == NULL)
> > > +		return -1;
> > > +
> > > +	shm = mz->addr;
> > > +
> > > +#define mark_free(field)						\
> > > +	memset(&shm->free_space[offsetof(struct rte_mbuf, field)],	\
> > > +		0xff, sizeof(((struct rte_mbuf *)0)->field))
> >
> > I think you can avoid defining/undefining macros here by something like that:
> >
> > static const struct {
> > 	size_t offset;
> > 	size_t size;
> > } dyn_syms[] = {
> > 	[0] = {.offset = offsetof(struct rte_mbuf, dynfield1), sizeof(((struct rte_mbuf *)0)->dynfield1)},
> > 	[1] = {.offset = offsetof(struct rte_mbuf, dynfield2), sizeof(((struct rte_mbuf *)0)->dynfield2)},
> > };
> > ...
> >
> > for (i = 0; i != RTE_DIM(dyn_syms); i++)
> > 	memset(shm->free_space + dyn_syms[i].offset, UINT8_MAX, dyn_syms[i].size);
> >
>
> I tried it, but the following lines are too long
>   [0] = {offsetof(struct rte_mbuf, dynfield1), sizeof(((struct rte_mbuf *)0)->dynfield1)},
>   [1] = {offsetof(struct rte_mbuf, dynfield2), sizeof(((struct rte_mbuf *)0)->dynfield2)},
> To make them shorter, we can use a macro... but... wait :)

Guess what, you can put offset and size on different lines :)
[0] = {
	.offset = offsetof(struct rte_mbuf, dynfield1),
	.size = sizeof(((struct rte_mbuf *)0)->dynfield1),
},
....
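
For illustration only, here is a standalone sketch of the table-driven form discussed above, written so it compiles without DPDK. The names struct fake_mbuf, DIM() and init_free_space() are hypothetical stand-ins (for struct rte_mbuf, RTE_DIM() and the init code in the patch); this is not the code that went into v2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_mbuf; only the layout idea matters. */
struct fake_mbuf {
	uint64_t other[4];	/* fields that are never handed out dynamically */
	uint64_t dynfield1;
	uint64_t dynfield2;
};

/* Local equivalent of RTE_DIM(), so the sketch has no DPDK dependency. */
#define DIM(a) (sizeof(a) / sizeof((a)[0]))

/* One entry per area reserved for dynamic fields. */
static const struct {
	size_t offset;
	size_t size;
} dyn_syms[] = {
	[0] = {
		.offset = offsetof(struct fake_mbuf, dynfield1),
		.size = sizeof(((struct fake_mbuf *)0)->dynfield1),
	},
	[1] = {
		.offset = offsetof(struct fake_mbuf, dynfield2),
		.size = sizeof(((struct fake_mbuf *)0)->dynfield2),
	},
};

/* Mark the bytes listed in dyn_syms as free (UINT8_MAX), all others as used (0). */
static void
init_free_space(uint8_t *free_space)
{
	size_t i;

	memset(free_space, 0, sizeof(struct fake_mbuf));
	for (i = 0; i != DIM(dyn_syms); i++)
		memset(free_space + dyn_syms[i].offset, UINT8_MAX,
			dyn_syms[i].size);
}
```

With this layout, adding a third reserved area would only mean adding one more table entry; no macro has to be defined and undefined around the init code.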
>
> > > +
> > > +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > +		/* init free_space, keep it sync'd with
> > > +		 * rte_mbuf_dynfield_copy().
> > > +		 */
> > > +		memset(shm, 0, sizeof(*shm));
> > > +		mark_free(dynfield1);
> > > +		mark_free(dynfield2);
> > > +
> > > +		/* init free_flags */
> > > +		for (mask = PKT_FIRST_FREE; mask <= PKT_LAST_FREE; mask <<= 1)
> > > +			shm->free_flags |= mask;
> > > +	}
> > > +#undef mark_free
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* check if this offset can be used */
> > > +static int
> > > +check_offset(size_t offset, size_t size, size_t align, unsigned int flags)
> > > +{
> > > +	size_t i;
> > > +
> > > +	(void)flags;
> >
> >
> > We have RTE_SET_USED() for such cases...
> > Though as it is an internal function probably better not to introduce
> > unused parameters at all.
>
> I removed the flag parameter as you suggested.
>
>
> > > +
> > > +	if ((offset & (align - 1)) != 0)
> > > +		return -1;
> > > +	if (offset + size > sizeof(struct rte_mbuf))
> > > +		return -1;
> > > +
> > > +	for (i = 0; i < size; i++) {
> > > +		if (!shm->free_space[i + offset])
> > > +			return -1;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* assume tailq is locked */
> > > +static struct mbuf_dynfield_elt *
> > > +__mbuf_dynfield_lookup(const char *name)
> > > +{
> > > +	struct mbuf_dynfield_list *mbuf_dynfield_list;
> > > +	struct mbuf_dynfield_elt *mbuf_dynfield;
> > > +	struct rte_tailq_entry *te;
> > > +
> > > +	mbuf_dynfield_list = RTE_TAILQ_CAST(
> > > +		mbuf_dynfield_tailq.head, mbuf_dynfield_list);
> > > +
> > > +	TAILQ_FOREACH(te, mbuf_dynfield_list, next) {
> > > +		mbuf_dynfield = (struct mbuf_dynfield_elt *)te->data;
> > > +		if (strcmp(name, mbuf_dynfield->params.name) == 0)
> > > +			break;
> > > +	}
> > > +
> > > +	if (te == NULL) {
> > > +		rte_errno = ENOENT;
> > > +		return NULL;
> > > +	}
> > > +
> > > +	return mbuf_dynfield;
> > > +}
> > > +
> > > +int
> > > +rte_mbuf_dynfield_lookup(const char *name, struct rte_mbuf_dynfield *params)
> > > +{
> > > +	struct mbuf_dynfield_elt *mbuf_dynfield;
> > > +
> > > +	if (shm == NULL) {
> > > +		rte_errno = ENOENT;
> > > +		return -1;
> > > +	}
> > > +
> > > +	rte_mcfg_tailq_read_lock();
> > > +	mbuf_dynfield = __mbuf_dynfield_lookup(name);
> > > +	rte_mcfg_tailq_read_unlock();
> > > +
> > > +	if (mbuf_dynfield == NULL) {
> > > +		rte_errno = ENOENT;
> > > +		return -1;
> > > +	}
> > > +
> > > +	if (params != NULL)
> > > +		memcpy(params, &mbuf_dynfield->params, sizeof(*params));
> > > +
> > > +	return mbuf_dynfield->offset;
> > > +}
> > > +
> > > +static int mbuf_dynfield_cmp(const struct rte_mbuf_dynfield *params1,
> > > +		const struct rte_mbuf_dynfield *params2)
> > > +{
> > > +	if (strcmp(params1->name, params2->name))
> > > +		return -1;
> > > +	if (params1->size != params2->size)
> > > +		return -1;
> > > +	if (params1->align != params2->align)
> > > +		return -1;
> > > +	if (params1->flags != params2->flags)
> > > +		return -1;
> > > +	return 0;
> > > +}
> > > +
> > > +int
> > > +rte_mbuf_dynfield_register(const struct rte_mbuf_dynfield *params)
> >
> > What I meant at user-space - if we can also have another function that would allow
> > user to specify required offset for dynfield explicitly, then user can define it as constant
> > value and let compiler do optimization work and hopefully generate faster code to access
> > this field.
> > Something like that:
> >
> > int rte_mbuf_dynfield_register_offset(const struct rte_mbuf_dynfield *params, size_t offset);
> >
> > #define RTE_MBUF_DYNFIELD_OFFSET(fld, off) (offsetof(struct rte_mbuf, fld) + (off))
> >
> > And then somewhere in user code:
> >
> > /* to let say reserve first 4B in dynfield1*/
> > #define MBUF_DYNFIELD_A RTE_MBUF_DYNFIELD_OFFSET(dynfield1, 0)
> > ...
> > params.name = RTE_STR(MBUF_DYNFIELD_A);
> > params.size = sizeof(uint32_t);
> > params.align = sizeof(uint32_t);
> > ret = rte_mbuf_dynfield_register_offset(&params, MBUF_DYNFIELD_A);
> > if (ret != MBUF_DYNFIELD_A) {
> > 	/* handle it somehow, probably just terminate gracefully... */
> > }
> > ...
> >
> > /* to let say reserve last 2B in dynfield2*/
> > #define MBUF_DYNFIELD_B RTE_MBUF_DYNFIELD_OFFSET(dynfield2, 6)
> > ...
> > params.name = RTE_STR(MBUF_DYNFIELD_B);
> > params.size = sizeof(uint16_t);
> > params.align = sizeof(uint16_t);
> > ret = rte_mbuf_dynfield_register_offset(&params, MBUF_DYNFIELD_B);
> >
> > After that user can use constant offsets MBUF_DYNFIELD_A/ MBUF_DYNFIELD_B
> > to access these fields.
> > Same thoughts for DYNFLAG.
>
> I added the feature in v2.
>
>
> > > +	struct mbuf_dynfield_list *mbuf_dynfield_list;
> > > +	struct mbuf_dynfield_elt *mbuf_dynfield = NULL;
> > > +	struct rte_tailq_entry *te = NULL;
> > > +	int offset, ret;
> >
> > size_t offset
> > to avoid explicit conversions, etc.?
> >
>
> Fixed.
>
>
> > > +	size_t i;
> > > +
> > > +	if (shm == NULL && init_shared_mem() < 0)
> > > +		goto fail;
> >
> > As I understand, here you allocate/initialize your shm without any lock protection,
> > though later you protect it via rte_mcfg_tailq_write_lock().
> > That seems a bit flakey to me.
> > Why not to store information about free dynfield bytes inside mbuf_dynfield_tailq?
> > Let say at init() create and add an entry into that list with some reserved name.
> > Then at register - grab mcfg_tailq_write_lock and do lookup
> > for such entry and then read/update it as needed.
> > It would help to avoid racing problem, plus you wouldn't need to
> > allocate/lookup for memzone.
>
> I don't quite like the idea of having a special entry with a different type
> in an element list. Although it is simpler from a locking perspective, it is
> less obvious for the developer.
>
> Also, I changed the way a zone is reserved to return the one that has the
> least impact on the next reservation, and I feel it is easier to implement with
> the shared memory.
>
> So, I just moved the init_shared_mem() inside the rte_mcfg_tailq_write_lock(),
> it should do the job.

Yep, that should work too, I think.

>
>
> > > +	if (params->size >= sizeof(struct rte_mbuf)) {
> > > +		rte_errno = EINVAL;
> > > +		goto fail;
> > > +	}
> > > +	if (!rte_is_power_of_2(params->align)) {
> > > +		rte_errno = EINVAL;
> > > +		goto fail;
> > > +	}
> > > +	if (params->flags != 0) {
> > > +		rte_errno = EINVAL;
> > > +		goto fail;
> > > +	}
> > > +
> > > +	rte_mcfg_tailq_write_lock();
> > > +
> >
> > I think it probably would be cleaner and easier to read/maintain, if you'll put actual
> > code under lock protection into a separate function - as you did for __mbuf_dynfield_lookup().
>
> Yes, I did that, it should be clearer now.
>
>
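
To make the two points agreed above concrete (lazy init moved under the write lock, and the locked body split into its own function), here is a minimal standalone sketch. Everything in it is an assumption made for illustration: the C11 atomic_flag spinlock stands in for rte_mcfg_tailq_write_lock()/rte_mcfg_tailq_write_unlock(), AREA_SIZE, dynfield_register() and dynfield_register_locked() are hypothetical names, and the search is plain first-fit rather than the "least impact" placement mentioned for v2.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AREA_SIZE 16	/* hypothetical: two 8-byte dynamic areas */

/* Stand-in for the tailq write lock (assumption, not the DPDK lock). */
static atomic_flag reg_lock = ATOMIC_FLAG_INIT;
static uint8_t free_space[AREA_SIZE];	/* non-zero means the byte is free */
static int initialized;

static void
reg_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&reg_lock))
		;	/* spin until the flag is released */
}

static void
reg_lock_release(void)
{
	atomic_flag_clear(&reg_lock);
}

/* Caller must hold reg_lock. Returns the reserved offset, or -ENOSPC. */
static int
dynfield_register_locked(size_t size, size_t align)
{
	size_t offset, i;

	if (!initialized) {
		/* lazy init happens under the lock, so two callers cannot race */
		memset(free_space, UINT8_MAX, sizeof(free_space));
		initialized = 1;
	}

	/* first-fit search over aligned offsets */
	for (offset = 0; offset + size <= AREA_SIZE; offset += align) {
		for (i = 0; i < size && free_space[offset + i]; i++)
			;
		if (i == size) {
			memset(free_space + offset, 0, size);
			return (int)offset;
		}
	}
	return -ENOSPC;
}

/* Entry point: validate arguments without the lock, then run the locked body. */
static int
dynfield_register(size_t size, size_t align)
{
	int ret;

	if (size == 0 || size > AREA_SIZE ||
			align == 0 || (align & (align - 1)) != 0)
		return -EINVAL;

	reg_lock_acquire();
	ret = dynfield_register_locked(size, align);
	reg_lock_release();
	return ret;
}
```

Keeping a single acquire/release pair in the wrapper means the locked function can return from any branch without each error path having to remember to unlock.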