From: "Jayakumar, Muthurajan"
To: "Liang, Cunming", "dev@dpdk.org"
Date: Thu, 11 Dec 2014 02:54:57 +0000
Message-ID: <5D695A7F6F10504DBD9B9187395A21797D185E40@ORSMSX112.amr.corp.intel.com>
In-Reply-To: <1418263490-21088-1-git-send-email-cunming.liang@intel.com>
Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore

Steve,

Great write-up.
Nice explanation of 1) per-lcore numbering and 2) multi-producer/consumer enqueue/dequeue.

Thanks

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming Liang
Sent: Wednesday, December 10, 2014 6:05 PM
To: dev@dpdk.org
Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore

Scope & Usage Scenario
======================

DPDK usually pins one pthread per core to avoid task-switch overhead. This gains a lot of performance, but it is not efficient in all cases: dedicating a whole core to a lightweight workload may be too expensive. It is a reasonable demand to run multiple pthreads per core, with each thread sharing the CPU at an assigned weight.

In fact, nothing prevents a user from creating normal pthreads and using cgroups to control the CPU shares. One purpose of this patchset is to close the gaps that keep normal pthreads from using more of the DPDK libraries. In addition, it demonstrates a performance gain from proactively yielding in the idle loop of packet I/O, and it provides several 'rte_pthread_*' APIs to make life easier.

Changes to DPDK libraries
=========================

Some DPDK libraries must run in the DPDK environment.

# rte_mempool

The rte_mempool documentation states that a thread not created by the EAL must not use mempools. The root cause is the per-lcore cache inside the mempool: for a non-EAL thread, 'rte_lcore_id()' does not return a valid value.

The patchset changes this slightly. The index into the mempool cache is no longer an lcore_id; instead it is a linear number handed out by an allocator. Legacy EAL per-lcore threads are assigned a unique linear id during creation. A normal pthread that wants to use rte_mempool must request a linear id explicitly. The mempool cache thus becomes per-thread, with the linear id identifying the thread.
However, there is another problem: rte_mempool is not preemptible. The problem actually comes from rte_ring, so the two are discussed together in the next section.

# rte_ring

rte_ring supports multi-producer enqueue and multi-consumer dequeue, but it is not preemptible. There was a conversation about this before:
http://dpdk.org/ml/archives/dev/2013-November/000714.html

Say two pthreads run on the same core and enqueue onto the same rte_ring. If the 1st pthread is preempted by the 2nd after it has already modified prod.head, the 2nd pthread will spin until the 1st is scheduled again, wasting time. Worse still, if the 2nd pthread has strictly higher priority, the 1st may never get to run at all.

This does not mean rte_ring cannot be used; the situations in which multiple pthreads on the same core may use it just need to be narrowed down:
- It CAN be used in any single-producer or single-consumer situation.
- It MAY be used by multi-producer/consumer pthreads whose scheduling policies are all SCHED_OTHER (CFS). Users SHOULD be aware of the performance penalty before using it.
- It MUST NOT be used by multi-producer/consumer pthreads when any of their scheduling policies is SCHED_FIFO or SCHED_RR.

Performance
===========

Introducing task switching loses some performance. From the packet I/O perspective, some of it can be regained by improving the effective I/O rate: when a pthread idles on an empty rx queue, it should proactively yield. Rx can also be slowed down for a short while to take more advantage of bulk receiving in the next loop. In practice, increasing the rx ring size also helps improve the overall throughput.

Cgroup Control
==============

Here is a simple example: four pthreads do packet I/O on the same core, and we expect a CPU share ratio of 1:1:2:4.
> mkdir /sys/fs/cgroup/cpu/dpdk
> mkdir /sys/fs/cgroup/cpu/dpdk/thread0
> mkdir /sys/fs/cgroup/cpu/dpdk/thread1
> mkdir /sys/fs/cgroup/cpu/dpdk/thread2
> mkdir /sys/fs/cgroup/cpu/dpdk/thread3
> cd /sys/fs/cgroup/cpu/dpdk
> echo 256 > thread0/cpu.shares
> echo 256 > thread1/cpu.shares
> echo 512 > thread2/cpu.shares
> echo 1024 > thread3/cpu.shares

-END-

Any comments are welcome.

Thanks

Cunming Liang (7):
  eal: add linear thread id as pthread-local variable
  mempool: use linear-tid as mempool cache index
  ring: use linear-tid as ring debug stats index
  eal: add simple API for multi-pthread
  testpmd: support multi-pthread mode
  sample: add new sample for multi-pthread
  eal: macro for cpuset w/ or w/o CPU_ALLOC

 app/test-pmd/cmdline.c                    |  41 +++++
 app/test-pmd/testpmd.c                    |  84 ++++++++-
 app/test-pmd/testpmd.h                    |   1 +
 config/common_linuxapp                    |   1 +
 examples/multi-pthread/Makefile           |  57 ++++++
 examples/multi-pthread/main.c             | 232 ++++++++++++++++++++++++
 examples/multi-pthread/main.h             |  46 +++++
 lib/librte_eal/common/include/rte_eal.h   |  15 ++
 lib/librte_eal/common/include/rte_lcore.h |  12 ++
 lib/librte_eal/linuxapp/eal/eal_thread.c  | 282 +++++++++++++++++++++++++++---
 lib/librte_mempool/rte_mempool.h          |  22 +--
 lib/librte_ring/rte_ring.h                |   6 +-
 12 files changed, 755 insertions(+), 44 deletions(-)
 create mode 100644 examples/multi-pthread/Makefile
 create mode 100644 examples/multi-pthread/main.c
 create mode 100644 examples/multi-pthread/main.h

--
1.8.1.4