From: "Ananyev, Konstantin"
To: Olivier Matz
CC: "dev@dpdk.org", "akhil.goyal@nxp.com"
Date: Sat, 30 Mar 2019 14:20:31 +0000
Subject: Re: [dpdk-dev] [PATCH v4 1/9] mbuf: new function to generate raw Tx offload value
Message-ID: <2601191342CEEE43887BDE71AB97725801365622BB@irsmsx105.ger.corp.intel.com>
In-Reply-To: <20190329125427.hdwevmm4wwl73tlj@platinum>

Hi Olivier,

> > Operations to set/update bit-fields often cause compilers
> > to generate suboptimal code.
> > To help avoid such situation for tx_offload fields:
> > introduce new enum for tx_offload bit-fields lengths and offsets,
> > and new function to generate raw tx_offload value.
> >
> > Signed-off-by: Konstantin Ananyev
> > Acked-by: Akhil Goyal
>
> I understand the need. Out of curiosity, do you have any performance
> numbers to share?

On my board (SKX):
for micro-benchmark (doing nothing but setting tx_offload for 1M mbufs in a loop)
the difference is more than 150% - from ~55 cycles to ~20 cycles per iteration.
For ipsec-secgw - ~3% improvement for tunneled outbound packets.

>
> Few cosmetic questions below.
>
> > ---
> >  lib/librte_mbuf/rte_mbuf.h | 79 ++++++++++++++++++++++++++++++++++----
> >  1 file changed, 72 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index d961ccaf6..0b197e8ce 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -479,6 +479,31 @@ struct rte_mbuf_sched {
> >  	uint16_t reserved; /**< Reserved. */
> >  }; /**< Hierarchical scheduler */
> >
> > +/**
> > + * enum for the tx_offload bit-fields lenghts and offsets.
> > + * defines the layout of rte_mbuf tx_offload field.
> > + */
> > +enum {
> > +	RTE_MBUF_L2_LEN_BITS = 7,
> > +	RTE_MBUF_L3_LEN_BITS = 9,
> > +	RTE_MBUF_L4_LEN_BITS = 8,
> > +	RTE_MBUF_TSO_SEGSZ_BITS = 16,
> > +	RTE_MBUF_OUTL3_LEN_BITS = 9,
> > +	RTE_MBUF_OUTL2_LEN_BITS = 7,
> > +	RTE_MBUF_L2_LEN_OFS = 0,
> > +	RTE_MBUF_L3_LEN_OFS = RTE_MBUF_L2_LEN_OFS + RTE_MBUF_L2_LEN_BITS,
> > +	RTE_MBUF_L4_LEN_OFS = RTE_MBUF_L3_LEN_OFS + RTE_MBUF_L3_LEN_BITS,
> > +	RTE_MBUF_TSO_SEGSZ_OFS = RTE_MBUF_L4_LEN_OFS + RTE_MBUF_L4_LEN_BITS,
> > +	RTE_MBUF_OUTL3_LEN_OFS =
> > +		RTE_MBUF_TSO_SEGSZ_OFS + RTE_MBUF_TSO_SEGSZ_BITS,
> > +	RTE_MBUF_OUTL2_LEN_OFS =
> > +		RTE_MBUF_OUTL3_LEN_OFS + RTE_MBUF_OUTL3_LEN_BITS,
> > +	RTE_MBUF_TXOFLD_UNUSED_OFS =
> > +		RTE_MBUF_OUTL2_LEN_OFS + RTE_MBUF_OUTL2_LEN_BITS,
> > +	RTE_MBUF_TXOFLD_UNUSED_BITS =
> > +		sizeof(uint64_t) * CHAR_BIT - RTE_MBUF_TXOFLD_UNUSED_OFS,
> > +};
> > +
>
> What is the advantage of defining an enum instead of #defines?

No big difference here, just looks nicer to me.

>
> In any case, I wonder if it wouldn't be clearer to change the order like
> this:
>
> enum {
> 	RTE_MBUF_L2_LEN_OFS = 0,
> 	RTE_MBUF_L2_LEN_BITS = 7,
> 	RTE_MBUF_L3_LEN_OFS = RTE_MBUF_L2_LEN_OFS + RTE_MBUF_L2_LEN_BITS,
> 	RTE_MBUF_L3_LEN_BITS = 9,
> 	RTE_MBUF_L4_LEN_OFS = RTE_MBUF_L3_LEN_OFS + RTE_MBUF_L3_LEN_BITS,
> 	RTE_MBUF_L4_LEN_BITS = 8,
> 	...

NP, can do this way.
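
For illustration only, the reordered layout could look roughly like the sketch below: each offset is the previous offset plus the previous width. This is a standalone copy of the values from the quoted patch, not the contents of rte_mbuf.h, and the static_assert checks are an assumption added here to show that the named fields end at bit 56 and leave 8 bits unused.

```c
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>  /* CHAR_BIT */
#include <stdint.h>

/*
 * Standalone sketch of the tx_offload layout in "offset, then width" order.
 * Values are copied from the quoted patch; nothing here comes from rte_mbuf.h.
 */
enum {
	RTE_MBUF_L2_LEN_OFS = 0,
	RTE_MBUF_L2_LEN_BITS = 7,
	RTE_MBUF_L3_LEN_OFS = RTE_MBUF_L2_LEN_OFS + RTE_MBUF_L2_LEN_BITS,
	RTE_MBUF_L3_LEN_BITS = 9,
	RTE_MBUF_L4_LEN_OFS = RTE_MBUF_L3_LEN_OFS + RTE_MBUF_L3_LEN_BITS,
	RTE_MBUF_L4_LEN_BITS = 8,
	RTE_MBUF_TSO_SEGSZ_OFS = RTE_MBUF_L4_LEN_OFS + RTE_MBUF_L4_LEN_BITS,
	RTE_MBUF_TSO_SEGSZ_BITS = 16,
	RTE_MBUF_OUTL3_LEN_OFS =
		RTE_MBUF_TSO_SEGSZ_OFS + RTE_MBUF_TSO_SEGSZ_BITS,
	RTE_MBUF_OUTL3_LEN_BITS = 9,
	RTE_MBUF_OUTL2_LEN_OFS =
		RTE_MBUF_OUTL3_LEN_OFS + RTE_MBUF_OUTL3_LEN_BITS,
	RTE_MBUF_OUTL2_LEN_BITS = 7,
	RTE_MBUF_TXOFLD_UNUSED_OFS =
		RTE_MBUF_OUTL2_LEN_OFS + RTE_MBUF_OUTL2_LEN_BITS,
	RTE_MBUF_TXOFLD_UNUSED_BITS =
		sizeof(uint64_t) * CHAR_BIT - RTE_MBUF_TXOFLD_UNUSED_OFS,
};

/* Illustrative compile-time checks (assumption: not part of the patch). */
static_assert(RTE_MBUF_TXOFLD_UNUSED_OFS == 56,
	"named fields should occupy the low 56 bits");
static_assert(RTE_MBUF_TXOFLD_UNUSED_BITS == 8,
	"8 bits of the 64-bit word should remain unused");
```

Keeping each width next to its offset makes it easier to see that the chain of sums has no gaps or overlaps.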
>
>
> >  /**
> >   * The generic rte_mbuf, containing a packet mbuf.
> >   */
> > @@ -640,19 +665,24 @@ struct rte_mbuf {
> >  		uint64_t tx_offload; /**< combined for easy fetch */
> >  		__extension__
> >  		struct {
> > -			uint64_t l2_len:7;
> > +			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
> >  			/**< L2 (MAC) Header Length for non-tunneling pkt.
> >  			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
> >  			 */
> > -			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
> > -			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
> > -			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> > +			uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
> > +			/**< L3 (IP) Header Length. */
> > +			uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
> > +			/**< L4 (TCP/UDP) Header Length. */
> > +			uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
> > +			/**< TCP TSO segment size */
> >
> >  			/* fields for TX offloading of tunnels */
> > -			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
> > -			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
> > +			uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
> > +			/**< Outer L3 (IP) Hdr Length. */
> > +			uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
> > +			/**< Outer L2 (MAC) Hdr Length. */
> >
> > -			/* uint64_t unused:8; */
> > +			/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
> >  		};
> >  	};
> >
> > @@ -2243,6 +2273,41 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
> >  	return 0;
> >  }
> >
> > +/*
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change without prior notice.
> > + *
> > + * For given input values generate raw tx_offload value.
> > + * @param il2
> > + *   l2_len value.
> > + * @param il3
> > + *   l3_len value.
> > + * @param il4
> > + *   l4_len value.
> > + * @param tso
> > + *   tso_segsz value.
> > + * @param ol3
> > + *   outer_l3_len value.
> > + * @param ol2
> > + *   outer_l2_len value.
> > + * @param unused
> > + *   unused value.
> > + * @return
> > + *   raw tx_offload value.
> > + */
> > +static __rte_always_inline uint64_t
> > +rte_mbuf_tx_offload(uint64_t il2, uint64_t il3, uint64_t il4, uint64_t tso,
> > +	uint64_t ol3, uint64_t ol2, uint64_t unused)
> > +{
> > +	return il2 << RTE_MBUF_L2_LEN_OFS |
> > +		il3 << RTE_MBUF_L3_LEN_OFS |
> > +		il4 << RTE_MBUF_L4_LEN_OFS |
> > +		tso << RTE_MBUF_TSO_SEGSZ_OFS |
> > +		ol3 << RTE_MBUF_OUTL3_LEN_OFS |
> > +		ol2 << RTE_MBUF_OUTL2_LEN_OFS |
> > +		unused << RTE_MBUF_TXOFLD_UNUSED_OFS;
> > +}
> > +
> >  /**
>
>
> From what I see, the problem is quite similar to what was done with
> rte_mbuf_sched_set() recently. So I wondered if it was possible to
> declare a structure like this:
>
> 	struct rte_mbuf_ol_len {
> 		uint64_t l2_len:7;
> 		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
> 		uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
> 		...
> 	}
>
> And have the set function like this:
>
> 	m->l = (struct rte_mbuf_ol_len) {
> 		.l2_len = l2_len,
> 		.l3_len = l3_len,
> 		.l4_len = l4_len,
> 		...
>
> This would avoid the definition of the offsets and bits, but I didn't
> find any way to declare these fields as anonymous in the mbuf structure.
> Did you tried that way too?

I thought about such an approach, but as you said above, it would change
the unnamed struct into a named one, which, as I understand, means API breakage.
So I don't think the hassle would be worth the benefit.
Also, the generated code wouldn't be totally identical - that approach would
generate a few extra 'AND' instructions.
Konstantin
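
For illustration, a minimal standalone sketch of the shift-and-OR approach from the patch. The EX_* constants, ex_tx_offload() and ex_field() are hypothetical stand-ins so the snippet builds without DPDK headers; the offsets and widths mirror the values in the quoted enum. Note that, like the function in the patch, ex_tx_offload() does not mask its inputs, so the caller is assumed to pass values that fit each field; this is presumably where the bit-field variant would spend its extra 'AND' instructions.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical stand-ins for the RTE_MBUF_*_OFS / *_BITS values in the patch. */
enum {
	EX_L2_OFS = 0,   EX_L2_BITS = 7,
	EX_L3_OFS = 7,   EX_L3_BITS = 9,
	EX_L4_OFS = 16,  EX_L4_BITS = 8,
	EX_TSO_OFS = 24, EX_TSO_BITS = 16,
	EX_OL3_OFS = 40, EX_OL3_BITS = 9,
	EX_OL2_OFS = 49, EX_OL2_BITS = 7,
	EX_UNUSED_OFS = 56,
};

static_assert(EX_OL2_OFS + EX_OL2_BITS == EX_UNUSED_OFS,
	"last named field must end where the unused bits begin");

/*
 * Build the raw 64-bit value with shifts and ORs only.
 * No masking is done: an out-of-range input spills into the next field.
 */
static inline uint64_t
ex_tx_offload(uint64_t il2, uint64_t il3, uint64_t il4, uint64_t tso,
	uint64_t ol3, uint64_t ol2, uint64_t unused)
{
	return (il2 << EX_L2_OFS) |
		(il3 << EX_L3_OFS) |
		(il4 << EX_L4_OFS) |
		(tso << EX_TSO_OFS) |
		(ol3 << EX_OL3_OFS) |
		(ol2 << EX_OL2_OFS) |
		(unused << EX_UNUSED_OFS);
}

/* Extract one field from a raw value; assumes bits < 64. */
static inline uint64_t
ex_field(uint64_t raw, unsigned int ofs, unsigned int bits)
{
	return (raw >> ofs) & ((UINT64_C(1) << bits) - 1);
}
```

A caller would compute the value once, e.g. ex_tx_offload(14, 20, 20, 1448, 0, 0, 0) for an Ethernet/IPv4/TCP packet with a 1448-byte TSO segment size, and store it with a single 64-bit write instead of six separate bit-field updates.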