Date: Mon, 5 Oct 2015 18:12:08 +0530
From: Rahul Lakkireddy
To: "Ananyev, Konstantin"
Cc: dev@dpdk.org, Felix Marti, Nirranjan Kirubaharan, Kumar A S
Message-ID: <20151005124205.GA24533@scalar.blr.asicdesigners.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725836AA36CF@irsmsx105.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH 1/6] cxgbe: Optimize forwarding performance for 40G

Hi Konstantin,

On Monday, October 10/05/15, 2015 at 04:46:40 -0700, Ananyev, Konstantin wrote:
>
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Rahul Lakkireddy
> > Sent: Monday, October 05, 2015 11:06 AM
> > To: Aaron Conole
> > Cc: dev@dpdk.org; Felix Marti; Kumar A S; Nirranjan Kirubaharan
> > Subject: Re: [dpdk-dev] [PATCH 1/6] cxgbe: Optimize forwarding performance for 40G
> >
> > Hi Aaron,
> >
> > On Friday, October 10/02/15, 2015 at 14:48:28 -0700, Aaron Conole wrote:
> > > Hi Rahul,
> > >
> > > Rahul Lakkireddy writes:
> > >
> > > > Update sge initialization with respect to free-list manager configuration
> > > > and ingress arbiter. Also update refill logic to refill mbufs only after
> > > > a certain threshold for rx. Optimize tx packet prefetch and free.
> > > <>
> > > >  	for (i = 0; i < sd->coalesce.idx; i++) {
> > > > -		rte_pktmbuf_free(sd->coalesce.mbuf[i]);
> > > > +		struct rte_mbuf *tmp = sd->coalesce.mbuf[i];
> > > > +
> > > > +		do {
> > > > +			struct rte_mbuf *next = tmp->next;
> > > > +
> > > > +			rte_pktmbuf_free_seg(tmp);
> > > > +			tmp = next;
> > > > +		} while (tmp);
> > > >  		sd->coalesce.mbuf[i] = NULL;
> > > Pardon my ignorance here, but rte_pktmbuf_free does this work. I can't
> > > actually see much difference between your rewrite of this block, and
> > > the implementation of rte_pktmbuf_free() (apart from moving your branch
> > > to the end of the function). Did your microbenchmarking really show this
> > > as an improvement?
> > >
> > > Thanks for your time,
> > > Aaron
> >
> > rte_pktmbuf_free calls rte_mbuf_sanity_check which does a lot of
> > checks.
>
> Only when RTE_LIBRTE_MBUF_DEBUG is enabled in your config.
> By default it is switched off.

Right, I clearly missed this. I am running with the default config, btw.

>
> > This additional check seems redundant for single segment
> > packets since rte_pktmbuf_free_seg also performs rte_mbuf_sanity_check.
> >
> > Several PMDs already prefer to use rte_pktmbuf_free_seg directly over
> > rte_pktmbuf_free as it is faster.
>
> Other PMDs use rte_pktmbuf_free_seg() as each TD has an associated
> with it segment.
> So as HW is done with the TD, SW frees associated segment.
> In your case I don't see any point in re-implementing rte_pktmbuf_free() manually,
> and I don't think it would be any faster.
>
> Konstantin

As I mentioned below, I am clearly seeing a difference of 1 Mpps, and
1 Mpps is not a small difference IMHO.

When running l3fwd with 8 queues, I also collected a perf report. When
using rte_pktmbuf_free, I see that it eats up around 6% of the cpu, as
in the perf top report below:
--------------------
32.00%  l3fwd  [.] cxgbe_poll
22.25%  l3fwd  [.] t4_eth_xmit
20.30%  l3fwd  [.] main_loop
 6.77%  l3fwd  [.] rte_pktmbuf_free
 4.86%  l3fwd  [.] refill_fl_usembufs
 2.00%  l3fwd  [.] write_sgl
.....
--------------------

Whereas, when using rte_pktmbuf_free_seg directly, I don't see the
above problem. The perf top report now comes out as:
-------------------
33.36%  l3fwd  [.] cxgbe_poll
32.69%  l3fwd  [.] t4_eth_xmit
19.05%  l3fwd  [.] main_loop
 5.21%  l3fwd  [.] refill_fl_usembufs
 2.40%  l3fwd  [.] write_sgl
....
-------------------

I obviously missed the debug flag for rte_mbuf_sanity_check. However,
there is still a clear difference of 1 Mpps. I don't know whether it is
the change from the while construct used in rte_pktmbuf_free to the
do..while construct I used that is making the difference.

> >
> > The forwarding perf. improvement with only this particular block is
> > around 1 Mpps for 64B packets when using l3fwd with 8 queues.
> >
> > Thanks,
> > Rahul
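
For reference, a minimal sketch of the two free paths being compared in
this thread. This is paraphrased from memory of the DPDK 2.1-era
rte_mbuf.h rather than copied from it, and the helper names
free_like_rte_pktmbuf_free / free_like_cxgbe_patch are made up purely
for illustration:

#include <rte_mbuf.h>

/* Rough shape of the generic free path: walk the segment chain and
 * free each segment. rte_pktmbuf_free() also runs a sanity check up
 * front, but that check compiles to a no-op unless RTE_LIBRTE_MBUF_DEBUG
 * is enabled in the build config. */
static inline void free_like_rte_pktmbuf_free(struct rte_mbuf *m)
{
	struct rte_mbuf *m_next;

	while (m != NULL) {		/* extra NULL test on entry */
		m_next = m->next;
		rte_pktmbuf_free_seg(m);
		m = m_next;
	}
}

/* Shape of the loop used in the cxgbe patch: the caller already knows
 * sd->coalesce.mbuf[i] is non-NULL, so a do..while skips the first test. */
static inline void free_like_cxgbe_patch(struct rte_mbuf *m)
{
	do {
		struct rte_mbuf *next = m->next;

		rte_pktmbuf_free_seg(m);
		m = next;
	} while (m != NULL);
}

With the debug flag off, the only structural difference between the two
loops is that initial NULL check in the while form.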