From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rahul.lakkireddy@chelsio.com>
Received: from stargate3.asicdesigners.com (unknown [67.207.115.98])
 by dpdk.org (Postfix) with ESMTP id 516BE593A
 for <dev@dpdk.org>; Wed,  7 Oct 2015 17:27:22 +0200 (CEST)
Received: from localhost (scalar.blr.asicdesigners.com [10.193.185.94])
 by stargate3.asicdesigners.com (8.13.8/8.13.8) with ESMTP id t97FRITC010848;
 Wed, 7 Oct 2015 08:27:19 -0700
Date: Wed, 7 Oct 2015 20:57:22 +0530
From: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Message-ID: <20151007152721.GA2689@scalar.blr.asicdesigners.com>
References: <cover.1443704150.git.rahul.lakkireddy@chelsio.com>
 <cover.1443704150.git.rahul.lakkireddy@chelsio.com>
 <318fc8559675b1157e7f049a6a955a6a2059bac7.1443704150.git.rahul.lakkireddy@chelsio.com>
 <f7tlhbl54cz.fsf@aconole.bos.csb>
 <20151005100620.GA2487@scalar.blr.asicdesigners.com>
 <2601191342CEEE43887BDE71AB97725836AA36CF@irsmsx105.ger.corp.intel.com>
 <20151005124205.GA24533@scalar.blr.asicdesigners.com>
 <2601191342CEEE43887BDE71AB97725836AA37E2@irsmsx105.ger.corp.intel.com>
 <20151005150729.GA8809@scalar.blr.asicdesigners.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151005150729.GA8809@scalar.blr.asicdesigners.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Cc: "dev@dpdk.org" <dev@dpdk.org>, Felix Marti <felix@chelsio.com>,
 Nirranjan Kirubaharan <nirranjan@chelsio.com>, Kumar A S <kumaras@chelsio.com>
Subject: Re: [dpdk-dev] [PATCH 1/6] cxgbe: Optimize forwarding performance
 for 40G
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Oct 2015 15:27:22 -0000

On Monday, October 05, 2015 at 20:37:31 +0530, Rahul Lakkireddy wrote:
> On Monday, October 05, 2015 at 07:09:27 -0700, Ananyev, Konstantin wrote:
> > Hi Rahul,
> 
> [...]
> 
> > > > > This additional check seems redundant for single segment
> > > > > packets since rte_pktmbuf_free_seg also performs rte_mbuf_sanity_check.
> > > > >
> > > > > Several PMDs already prefer to use rte_pktmbuf_free_seg directly over
> > > > > rte_pktmbuf_free as it is faster.
> > > >
> > > > Other PMDs use rte_pktmbuf_free_seg() as each TD has a segment
> > > > associated with it. So once HW is done with the TD, SW frees the
> > > > associated segment.
> > > > In your case I don't see any point in re-implementing rte_pktmbuf_free() manually,
> > > > and I don't think it would be any faster.
> > > >
> > > > Konstantin
> > > 
> > > As I mentioned below, I am clearly seeing a difference of 1 Mpps. And 1
> > > Mpps is not a small difference IMHO.
> > 
> > Agree with you here - it is a significant difference.
> > 
> > > 
> > > When running l3fwd with 8 queues, I also collected a perf report.
> > > When using rte_pktmbuf_free, I see that it eats up around 6% cpu as
> > > below in perf top report:-
> > > --------------------
> > > 32.00%  l3fwd                        [.] cxgbe_poll
> > > 22.25%  l3fwd                        [.] t4_eth_xmit
> > > 20.30%  l3fwd                        [.] main_loop
> > >  6.77%  l3fwd                        [.] rte_pktmbuf_free
> > >  4.86%  l3fwd                        [.] refill_fl_usembufs
> > >  2.00%  l3fwd                        [.] write_sgl
> > > .....
> > > --------------------
> > > 
> > > While, when using rte_pktmbuf_free_seg directly, I don't see the
> > > above problem. The perf top report now reads:-
> > > -------------------
> > > 33.36%  l3fwd                        [.] cxgbe_poll
> > > 32.69%  l3fwd                        [.] t4_eth_xmit
> > > 19.05%  l3fwd                        [.] main_loop
> > >  5.21%  l3fwd                        [.] refill_fl_usembufs
> > >  2.40%  l3fwd                        [.] write_sgl
> > > ....
> > > -------------------
> > 
> > I don't think these 6% disappeared anywhere.
> > As I can see, now t4_eth_xmit() increased by roughly same amount
> > (you still have same job to do).
> 
> Right.
> 
> > To me it looks like in that case compiler didn't really inline rte_pktmbuf_free().
> > Wonder can you add 'always_inline' attribute to the  rte_pktmbuf_free(),
> > and see would it make any difference?
> > 
> > Konstantin 
> 
> I will try out above and update further.
> 

Tried always_inline and saw no difference in performance on RHEL 6.4
with gcc 4.4.7, but was still seeing the 1 Mpps improvement with the
above block.

I've moved to the latest RHEL 7.1 with gcc 4.8.3 and tried both
always_inline and the above block; neither makes any difference in
performance there.

Will drop this block and submit a v2.

Thanks for the review Aaron and Konstantin.

Thanks,
Rahul