From: "Xiaoping Yan (NSB)"
To: users@dpdk.org
Date: Thu, 9 Feb 2023 03:58:56 +0000
Subject: cache miss increases when change rx descriptor from 512 to 2048

Hi experts,

 

I ran a throughput test of my DPDK application using the same software and test case; the only difference between the runs was the number of Rx/Tx descriptors:

Rx/Tx descriptors 512: 3.2 Mpps

Rx/Tx descriptors 2048: 3.0 Mpps

According to perf, the 2048-descriptor case has more cache misses and lower IPC (instructions per cycle).

Perf for 512 Rx descriptors:

    114289237792  cpu-cycles
    365408402395  instructions      #  3.20 insn per cycle
     74186289932  branches
        36020793  branch-misses     #  0.05% of all branches
      1298741388  bus-cycles
         3413460  cache-misses      #  0.723% of all cache refs
       472363654  cache-references

Perf for 2048 Rx descriptors:

     57038451185  cpu-cycles
    173805485573  instructions      #  3.05 insn per cycle
     35289607389  branches
        15418885  branch-misses     #  0.04% of all branches
       648164239  bus-cycles
        13170596  cache-misses      #  1.702% of all cache refs
       773765263  cache-references

 

My understanding is that the larger Rx ring somehow causes more cache misses, which lowers IPC and therefore throughput.

 

Has anyone observed similar results?

Any ideas on how to mitigate (or investigate further) the impact? We want to use 2048 descriptors to better tolerate jitter and bursts.

Any comments are welcome.

 

Thank you.

 

Br, Xiaoping

 
