From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 3432E11C5 for ; Thu, 22 Sep 2016 11:36:21 +0200 (CEST) Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP; 22 Sep 2016 02:36:20 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.30,378,1470726000"; d="scan'208";a="1044095633" Received: from irsmsx106.ger.corp.intel.com ([163.33.3.31]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2016 02:36:19 -0700 Received: from irsmsx156.ger.corp.intel.com (10.108.20.68) by IRSMSX106.ger.corp.intel.com (163.33.3.31) with Microsoft SMTP Server (TLS) id 14.3.248.2; Thu, 22 Sep 2016 10:36:16 +0100 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.196]) by IRSMSX156.ger.corp.intel.com ([169.254.3.80]) with mapi id 14.03.0248.002; Thu, 22 Sep 2016 10:36:16 +0100 From: "Ananyev, Konstantin" To: Jerin Jacob CC: "Kulasek, TomaszX" , "dev@dpdk.org" Thread-Topic: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation Thread-Index: AQHSDQSdbAMPwzKA5ECqoX7z1xdg36CAxzYwgAAjHICAAApzgIAAQB+ggAJk+oCAAbH68A== Date: Thu, 22 Sep 2016 09:36:15 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772583F0BA159@irsmsx105.ger.corp.intel.com> References: <1472228578-6980-1-git-send-email-tomaszx.kulasek@intel.com> <1473691487-10032-1-git-send-email-tomaszx.kulasek@intel.com> <1473691487-10032-2-git-send-email-tomaszx.kulasek@intel.com> <2601191342CEEE43887BDE71AB9772583F0B583F@irsmsx105.ger.corp.intel.com> <3042915272161B4EB253DA4D77EB373A14F1A294@IRSMSX102.ger.corp.intel.com> <20160919160630.GA18610@localhost.localdomain> <2601191342CEEE43887BDE71AB9772583F0B8F33@irsmsx105.ger.corp.intel.com> <20160921082956.GA5358@localhost.localdomain> In-Reply-To: <20160921082956.GA5358@localhost.localdomain> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Sep 2016 09:36:21 -0000 Hi Jerin, > > > > Hi Jerin, >=20 > Hi Konstantin, >=20 > > > > > > > > > > > > > > > > [...] > > > > > > > > > > + > > > > > > +#ifdef RTE_ETHDEV_TX_PREP > > > > > > > > > > Sorry for being a bit late on that discussion, but what the > > > > > point of having that config macro (RTE_ETHDEV_TX_PREP ) at all? > > > > > As I can see right now, if driver doesn't setup tx_pkt_prep, > > > > > then nb_pkts would be return anyway... > > > > > > > > > > BTW, there is my another question - should it be that way? > > > > > Shouldn't we return 0 (and set rte_errno=3DENOTSUP) here if > > > > > dev->tx_pk_prep =3D=3D NULL? > > > > > > > > > > > > > It's an answer to the Jerin's request discussed here: > > > > http://dpdk.org/ml/archives/dev/2016-September/046437.html > > > > > > > > When driver doesn't support tx_prep, default behavior is "we don't > > > > know requirements, so we have nothing to do here". It will > > > > simplify > > > application logic and improve performance for these drivers, I think.= Catching this error with every burst may be problematic. > > > > > > > > As for RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the same > > > > thread, I still don't think It's the best solution of the problem > > > described by him. I have added it here for further discussion. > > > > > > > > Jerin, have you something to add? > > > > > > Nothing very specific to add here. I think, I have tried to share > > > the rational in, http://dpdk.org/ml/archives/dev/2016- > > > September/046437.html > > > > > > > Ok, not sure I am fully understand your intention here. > > I think I understand why you propose rte_eth_tx_prep() to do: > > if (!dev->tx_pkt_prep) > > return nb_pkts; > > > > That allows drivers to NOOP the tx_prep functionality without paying > > the price for callback invocation. >=20 > In true sense, returning the nb_pkts makes it functional NOP as well(i.e = The PMD does not have any limitation on Tx side, so all > packets are _good_(no preparation is required)) >=20 >=20 > > What I don't understand, why with that in place we also need a NOOP > > for the whole rte_eth_tx_prep(): > > +static inline uint16_t > > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_= unused, > > + struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts) { > > + return nb_pkts; > > +} > > + > > +#endif > > > > What are you trying to save here: just reading ' dev->tx_pkt_prep'? > > If so, then it seems not that performance critical for me. > > Something else? >=20 > The proposed scheme can make it as true NOOP from compiler perspective to= o if a target decided to do that, I have checked the > instruction generation with arm Assembly, a non true compiler NOOP has fo= llowing instructions overhead at minimum. >=20 > # 1 load > # 1 mov > if (!dev->tx_pkt_prep) > return nb_pkts; Yep. >=20 > # compile can't predict this function needs be executed or not so > # pressure on register allocation and mostly likely it call for > # some stack push and pop based load on outer function(as it is an > # inline function) Well, I suppose compiler wouldn't try to fill function argument registers b= efore the branch above.=20 >=20 > return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id], tx_pkts, nb_p= kts); >=20 > # 1 branch > if (unlikely(nb_prep < nb_rx)) { > # bunch of code not executed, but pressure on i cache > int i; > for (i =3D nb_prep; i < nb_rx; i++) > rte_pktmbuf_free(pkts_burst[i]); > } >=20 > From a server target(IA or high-end armv8) with external PCIe based syste= m makes sense to have RTE_ETHDEV_TX_PREP option > enabled(which is the case in proposed patch) but the low end arm platform= s with > - limited cores > - less i cache > - IPC =3D=3D 1 > - running around 1GHz > - most importantly, _integrated_ nic controller with no external PCIE > support > does not make much sense to waste the cycles/time for it. > cycle saved is cycle earned. Ok, so it is all to save one memory de-refrence and a comparison plus branc= h. Do you have aby estimation how badly it would hit low-end cpu performance? The reason I am asking: obviously I would prefer to avoid to introduce new = build config option (that's the common dpdk coding practice these days) unless it is really imp= ortant. =20 >=20 > Since DPDK compilation is based on _target_, I really don't see any issue= with this approach nor it does not hurt anything on server > target. > So, IMO, It should be upto the target to decide what works better for the= target. >=20 > Jerin >=20 > > From my point of view NOOP on the driver level is more than enough. > > Again I would prefer to introduce new config option, if possible. > > > > Konstantin > >