From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vladyslav Buslov
To: "Ananyev, Konstantin", "Wu, Jingjing", "Yigit, Ferruh", "Zhang, Helin"
CC: "dev@dpdk.org"
Thread-Topic: [dpdk-dev] [PATCH] net/i40e: add additional prefetch instructions for bulk rx
Date: Tue, 11 Oct 2016 09:24:37 +0000
References: <20160714172719.17502-1-vladyslav.buslov@harmonicinc.com> <20160714172719.17502-2-vladyslav.buslov@harmonicinc.com> <18156776-3658-a97d-3fbc-19c1a820a04d@intel.com> <9BB6961774997848B5B42BEC655768F80E277DFC@SHSMSX103.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB9772583F0C0408@irsmsx105.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772583F0C0408@irsmsx105.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH] net/i40e: add additional prefetch instructions for bulk rx
List-Id: patches and discussions about DPDK

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Tuesday, October 11, 2016 11:51 AM
> To: Vladyslav Buslov; Wu, Jingjing; Yigit, Ferruh; Zhang, Helin
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] net/i40e: add additional prefetch
> instructions for bulk rx
>
> Hi Vladislav,
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladyslav Buslov
> > Sent: Monday, October 10, 2016 6:06 PM
> > To: Wu, Jingjing; Yigit, Ferruh; Zhang, Helin
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] net/i40e: add additional prefetch
> > instructions for bulk rx
> >
> > > -----Original Message-----
> > > From: Wu, Jingjing [mailto:jingjing.wu@intel.com]
> > > Sent: Monday, October 10, 2016 4:26 PM
> > > To: Yigit, Ferruh; Vladyslav Buslov; Zhang, Helin
> > > Cc: dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH] net/i40e: add additional prefetch
> > > instructions for bulk rx
> > >
> > > > -----Original Message-----
> > > > From: Yigit, Ferruh
> > > > Sent: Wednesday, September 14, 2016 9:25 PM
> > > > To: Vladyslav Buslov; Zhang, Helin; Wu, Jingjing
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH] net/i40e: add additional prefetch
> > > > instructions for bulk rx
> > > >
> > > > On 7/14/2016 6:27 PM, Vladyslav Buslov wrote:
> > > > > Added prefetch of first packet payload cacheline in i40e_rx_scan_hw_ring
> > > > > Added prefetch of second mbuf cacheline in i40e_rx_alloc_bufs
> > > > >
> > > > > Signed-off-by: Vladyslav Buslov
> > > > >
> > > > > ---
> > > > >  drivers/net/i40e/i40e_rxtx.c | 7 +++++--
> > > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > > > > b/drivers/net/i40e/i40e_rxtx.c
> > > > > index d3cfb98..e493fb4 100644
> > > > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > > > @@ -1003,6 +1003,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > > > >  		/* Translate descriptor info to mbuf parameters */
> > > > >  		for (j = 0; j < nb_dd; j++) {
> > > > >  			mb = rxep[j].mbuf;
> > > > > +			rte_prefetch0(RTE_PTR_ADD(mb->buf_addr, RTE_PKTMBUF_HEADROOM));
> > >
> > > Why prefetch here? I think if the application needs to deal with the
> > > packet, it is more suitable to put it in the application.
> > >
> > > > >  			qword1 = rte_le_to_cpu_64(\
> > > > >  				rxdp[j].wb.qword1.status_error_len);
> > > > >  			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> > > > > @@ -1086,9 +1087,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
> > > > >
> > > > >  	rxdp = &rxq->rx_ring[alloc_idx];
> > > > >  	for (i = 0; i < rxq->rx_free_thresh; i++) {
> > > > > -		if (likely(i < (rxq->rx_free_thresh - 1)))
> > > > > +		if (likely(i < (rxq->rx_free_thresh - 1))) {
> > > > >  			/* Prefetch next mbuf */
> > > > > -			rte_prefetch0(rxep[i + 1].mbuf);
> > > > > +			rte_prefetch0(&rxep[i + 1].mbuf->cacheline0);
> > > > > +			rte_prefetch0(&rxep[i + 1].mbuf->cacheline1);
>
> I think there are rte_mbuf_prefetch_part1/part2 defined in rte_mbuf.h,
> specially for that case.

Thanks for pointing that out.
I'll submit a new patch if you decide to move forward with this development.

>
> > > > > +		}
> > > Agree with this change. But when I tested it with testpmd in iofwd
> > > mode, no performance increase was observed, only a minor decrease.
> > > Can you share with us in which scenario it benefits performance?
> > >
> > > Thanks
> > > Jingjing
> >
> > Hello Jingjing,
> >
> > Thanks for the code review.
> >
> > My use case: we have a simple distributor thread that receives packets
> > from a port and distributes them among worker threads according to a
> > VLAN and MAC address hash.
> >
> > While working on performance optimization we determined that most of
> > this thread's CPU usage is inside DPDK.
> > As an optimization we decided to switch to the rx burst alloc function,
> > however that caused additional performance degradation compared to
> > scatter rx mode.
> > In the profiler the two major culprits were:
> > 1. Access to the packet's Eth header in application code. (cache miss)
> > 2. Setting the next packet descriptor field to NULL in the DPDK
> > i40e_rx_alloc_bufs code. (this field is in the second mbuf cache line,
> > which was not prefetched)
>
> I wonder what will happen if we remove any prefetches here?
> Would it make things better or worse (and by how much)?

In our case it causes a few per cent PPS degradation on the next=NULL
assignment, but it seems that Jingjing's test doesn't confirm it.

>
> > After applying my fixes, performance improved compared to scatter rx
> > mode.
> >
> > I assumed that the prefetch of the first cache line of packet data
> > belongs in DPDK because it is done in scatter rx mode (in
> > i40e_recv_scattered_pkts).
> > It can be moved to the application side, but IMO it is better to be
> > consistent across all rx modes.
>
> I would agree with Jingjing here, probably the PMD should avoid
> prefetching the packet's data.

Actually I can see some valid use cases where it is beneficial to have this
prefetch in the driver.
In our sw distributor case it is trivial to just prefetch the next packet on
each iteration because packets are processed one by one.
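To make that concrete, our distributor loop looks roughly like the sketch
below (simplified; BURST_SIZE, NB_WORKERS, worker_rings and the plain
jhash-on-MAC are stand-ins for our actual code, which also folds the VLAN
tag into the hash):

#include <stdint.h>

#include <rte_ethdev.h>
#include <rte_ether.h>
#include <rte_jhash.h>
#include <rte_mbuf.h>
#include <rte_prefetch.h>
#include <rte_ring.h>

#define BURST_SIZE 32
#define NB_WORKERS 4

/* Rings towards the worker threads, created at init time. */
extern struct rte_ring *worker_rings[NB_WORKERS];

static void
distribute_burst(uint8_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx, i;

	nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);

	for (i = 0; i < nb_rx; i++) {
		struct ether_hdr *eth;
		uint32_t hash;

		/* Hide the header cache miss of packet i+1 behind the
		 * hash/enqueue work for packet i. */
		if (i + 1 < nb_rx)
			rte_prefetch0(rte_pktmbuf_mtod(pkts[i + 1], void *));

		eth = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
		hash = rte_jhash(eth->d_addr.addr_bytes, ETHER_ADDR_LEN, 0);

		/* Error handling elided: a full ring would leak the mbuf. */
		rte_ring_enqueue(worker_rings[hash % NB_WORKERS], pkts[i]);
	}
}

This only works because a single thread touches every packet in order.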
However, when we move this functionality to hw by means of
RSS/vfunction/Flow Director (our long term goal), worker threads will
receive packets directly from the rx queues of the NIC.
The first operation of a worker thread is to perform a bulk lookup in a hash
table by destination MAC. This will cause a cache miss on accessing each eth
header and can't be easily mitigated in application code.
I assume it is a ubiquitous use case for DPDK.

Regards,
Vladyslav
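P.S. Regarding rte_mbuf_prefetch_part1/part2: if we go ahead with a v2, the
i40e_rx_alloc_bufs hunk would become roughly the following (an untested
sketch on top of the patch above):

-		if (likely(i < (rxq->rx_free_thresh - 1))) {
-			/* Prefetch next mbuf */
-			rte_prefetch0(&rxep[i + 1].mbuf->cacheline0);
-			rte_prefetch0(&rxep[i + 1].mbuf->cacheline1);
-		}
+		if (likely(i < (rxq->rx_free_thresh - 1))) {
+			/* Prefetch both cache lines of the next mbuf */
+			rte_mbuf_prefetch_part1(rxep[i + 1].mbuf);
+			rte_mbuf_prefetch_part2(rxep[i + 1].mbuf);
+		}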