From: Cunming Liang
To: dev@dpdk.org
Date: Thu, 11 Dec 2014 10:04:43 +0800
Message-Id: <1418263490-21088-1-git-send-email-cunming.liang@intel.com>
Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

Scope & Usage Scenario
========================

DPDK usually pins one pthread per core to avoid task-switch overhead.
This gains a lot of performance, but it is not efficient in all cases.
In some cases it may be too expensive to dedicate a whole core to a
lightweight workload. It is a reasonable demand to have multiple
threads per core, with each thread sharing the CPU according to an
assigned weight.

In fact, nothing prevents a user from creating normal pthreads and
using cgroups to control the CPU share.
One purpose of this patchset is to close the gaps that prevent more
DPDK libraries from being used in normal pthreads. In addition, it
demonstrates the performance gain from a proactive 'yield' when
idle-looping in packet IO. It also provides several 'rte_pthread_*'
APIs to make life easier.


Changes to DPDK libraries
==========================

Some DPDK libraries must run in the DPDK environment.

# rte_mempool

The rte_mempool documentation states that a thread not created by EAL
must not use mempools. The root cause is that the mempool uses a
per-lcore cache internally, and 'rte_lcore_id()' will not return a
correct value in such a thread.

The patchset changes this a little. The mempool cache index is no
longer the lcore_id; instead, it is a linear number generated by the
allocator. Each legacy EAL per-lcore thread applies for a unique linear
id during creation. A normal pthread that wants to use rte_mempool must
apply for a linear id explicitly. The mempool cache thus becomes
effectively per-thread, and the linear id serves as a linear thread id.

However, there is another problem: rte_mempool is not preemptable. The
problem comes from rte_ring, so both are discussed together in the next
section.

# rte_ring

rte_ring supports multi-producer enqueue and multi-consumer dequeue,
but it is not preemptable. This was discussed before:
http://dpdk.org/ml/archives/dev/2013-November/000714.html

Say two pthreads running on the same core enqueue on the same rte_ring.
If the 1st pthread is preempted by the 2nd pthread after it has already
modified prod.head, the 2nd pthread will spin until the 1st one is
scheduled again, which wastes time. It is even worse if the 2nd pthread
has strictly higher priority.

That does not mean rte_ring cannot be used. We just need to narrow down
the situations in which it is used by multiple pthreads on the same
core.
- It CAN be used for any single-producer or single-consumer situation.
- It MAY be used by multi-producer/consumer pthreads whose scheduling
  policies are all SCHED_OTHER (cfs). Users SHOULD be aware of the
  performance penalty before using it.
- It MUST NOT be used by multi-producer/consumer pthreads when any of
  their scheduling policies is SCHED_FIFO or SCHED_RR.


Performance
==============

Introducing task switching costs performance. From the packet IO
perspective, we can gain some of it back by improving the effective IO
rate. When a pthread idle-loops on an empty rx queue, it should
proactively yield. We can also slow down rx for a short while to take
more advantage of bulk receiving in the next loop. In practice,
increasing the rx ring size also helps to improve the overall
throughput.


Cgroup Control
================

Here is a simple example: four pthreads do packet IO on the same core,
and we expect a CPU share ratio of 1:1:2:4.
> mkdir /sys/fs/cgroup/cpu/dpdk
> mkdir /sys/fs/cgroup/cpu/dpdk/thread0
> mkdir /sys/fs/cgroup/cpu/dpdk/thread1
> mkdir /sys/fs/cgroup/cpu/dpdk/thread2
> mkdir /sys/fs/cgroup/cpu/dpdk/thread3
> cd /sys/fs/cgroup/cpu/dpdk
> echo 256 > thread0/cpu.shares
> echo 256 > thread1/cpu.shares
> echo 512 > thread2/cpu.shares
> echo 1024 > thread3/cpu.shares

-END-

Any comments are welcome.
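
As a rough illustration of the proactive yield described under
Performance, the idle loop could look something like the sketch below.
This is not code from the patchset: poll_with_yield(), rx_burst_fn and
the toy queue are hypothetical names made up for the example, and a
real application would call its PMD receive function instead of
toy_rx_burst().

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for a burst receive call: returns how many
 * packets were stored in 'pkts' (0 when the queue is empty). */
typedef unsigned (*rx_burst_fn)(void *queue, void **pkts, unsigned max);

/* Poll 'queue' for 'iterations' rounds. Whenever a round returns no
 * packets, yield the CPU so another pthread sharing the core can run.
 * Returns the total number of packets received and, if 'yields' is
 * non-NULL, stores the number of yields there. */
static unsigned long
poll_with_yield(rx_burst_fn rx, void *queue, unsigned iterations,
                unsigned long *yields)
{
    void *pkts[32];
    unsigned long total = 0, nb_yield = 0;
    unsigned i;

    for (i = 0; i < iterations; i++) {
        unsigned n = rx(queue, pkts, 32);

        if (n == 0) {
            sched_yield();      /* idle: give the core away */
            nb_yield++;
            continue;
        }
        total += n;             /* real code would process pkts[0..n-1] */
    }
    if (yields != NULL)
        *yields = nb_yield;
    return total;
}

/* Toy queue for demonstration only: every third poll delivers a burst
 * of 4 packets, all other polls find the queue empty. */
struct toy_queue {
    unsigned polls;
};

static unsigned
toy_rx_burst(void *queue, void **pkts, unsigned max)
{
    struct toy_queue *q = queue;
    unsigned i, n = (++q->polls % 3 == 0) ? 4 : 0;

    if (n > max)
        n = max;
    for (i = 0; i < n; i++)
        pkts[i] = NULL;         /* placeholder packet pointers */
    return n;
}
```

Calling poll_with_yield(toy_rx_burst, &q, 9, &yields) on a
zero-initialized struct toy_queue receives 12 packets and yields 6
times, since only every third poll returns a burst.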
Thanks

Cunming Liang (7):
  eal: add linear thread id as pthread-local variable
  mempool: use linear-tid as mempool cache index
  ring: use linear-tid as ring debug stats index
  eal: add simple API for multi-pthread
  testpmd: support multi-pthread mode
  sample: add new sample for multi-pthread
  eal: macro for cpuset w/ or w/o CPU_ALLOC

 app/test-pmd/cmdline.c                    |  41 +++++
 app/test-pmd/testpmd.c                    |  84 ++++++++-
 app/test-pmd/testpmd.h                    |   1 +
 config/common_linuxapp                    |   1 +
 examples/multi-pthread/Makefile           |  57 ++++++
 examples/multi-pthread/main.c             | 232 ++++++++++++++++++++++++
 examples/multi-pthread/main.h             |  46 +++++
 lib/librte_eal/common/include/rte_eal.h   |  15 ++
 lib/librte_eal/common/include/rte_lcore.h |  12 ++
 lib/librte_eal/linuxapp/eal/eal_thread.c  | 282 +++++++++++++++++++++++++++---
 lib/librte_mempool/rte_mempool.h          |  22 +--
 lib/librte_ring/rte_ring.h                |   6 +-
 12 files changed, 755 insertions(+), 44 deletions(-)
 create mode 100644 examples/multi-pthread/Makefile
 create mode 100644 examples/multi-pthread/main.c
 create mode 100644 examples/multi-pthread/main.h

--
1.8.1.4
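
For reference, the linear thread id introduced by the first patch in
the list above could be pictured roughly as below. This is a minimal
sketch under assumptions, not the patchset's implementation:
linear_tid_apply(), cache_self(), MAX_LINEAR_TID and struct toy_cache
are hypothetical names, ids are never recycled, and C11 atomics with
_Thread_local stand in for whatever mechanism EAL actually uses.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_LINEAR_TID     128     /* hypothetical upper bound */
#define LINEAR_TID_INVALID (-1)

/* Hypothetical per-thread cache, indexed by linear thread id instead
 * of lcore_id. */
struct toy_cache {
    unsigned len;
};

static struct toy_cache caches[MAX_LINEAR_TID];
static atomic_int next_linear_tid;      /* next unassigned id, starts at 0 */
static _Thread_local int linear_tid = LINEAR_TID_INVALID;

/* Apply for a linear id for the calling thread. Repeated calls from
 * the same thread return the same id. Returns LINEAR_TID_INVALID once
 * all ids have been handed out. */
static int
linear_tid_apply(void)
{
    if (linear_tid == LINEAR_TID_INVALID) {
        int id = atomic_fetch_add(&next_linear_tid, 1);

        if (id >= MAX_LINEAR_TID)
            return LINEAR_TID_INVALID;
        linear_tid = id;
    }
    return linear_tid;
}

/* Return the calling thread's cache, or NULL if the thread has not
 * applied for a linear id yet. */
static struct toy_cache *
cache_self(void)
{
    if (linear_tid == LINEAR_TID_INVALID)
        return NULL;
    return &caches[linear_tid];
}
```

In this sketch a normal pthread calls linear_tid_apply() once before
touching its cache, which mirrors the explicit application step
described for non-EAL threads. Until then cache_self() returns NULL.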