DPDK patches and discussions
From: "Jayakumar, Muthurajan" <muthurajan.jayakumar@intel.com>
To: "Liang, Cunming" <cunming.liang@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
Date: Thu, 11 Dec 2014 02:54:57 +0000	[thread overview]
Message-ID: <5D695A7F6F10504DBD9B9187395A21797D185E40@ORSMSX112.amr.corp.intel.com> (raw)
In-Reply-To: <1418263490-21088-1-git-send-email-cunming.liang@intel.com>


Great write up.
Nice explanation of 1) per-lcore numbering and 2) multi-producer/consumer enqueue/dequeue.


-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming Liang
Sent: Wednesday, December 10, 2014 6:05 PM
To: dev@dpdk.org
Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

Scope & Usage Scenario

DPDK usually pins one pthread per core to avoid task-switch overhead. That buys a lot of performance, but it is not efficient in every case. Sometimes it is too expensive to dedicate a whole core to a lightweight workload. It is a reasonable demand to run multiple pthreads per core, with each thread sharing the CPU according to an assigned weight.

In fact, nothing prevents a user from creating normal pthreads and using cgroups to control the CPU shares. One purpose of this patchset is to close the gaps so that more DPDK libraries can be used from a normal pthread. In addition, it demonstrates a performance gain from proactively 'yield'ing in the idle loop of packet I/O, and it provides several 'rte_pthread_*' APIs to make life easier.

Changes to DPDK libraries

Some DPDK libraries currently assume they run inside the DPDK environment, i.e. in EAL-created threads.

# rte_mempool

The rte_mempool documentation states that a thread not created by the EAL must not use mempools. The root cause is the per-lcore cache inside the mempool: in a non-EAL thread, 'rte_lcore_id()' will not return a correct value.

The patchset changes this slightly. The index into the mempool cache is no longer an lcore_id; instead it is a linear number produced by an allocator.
Each legacy EAL per-lcore thread acquires a unique linear id during creation. A normal pthread that wants to use rte_mempool must request a linear id explicitly. The mempool cache thus becomes per-thread rather than per-lcore, with the linear id serving as the linear thread id.
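A minimal sketch of the linear thread-id allocation described above. This is not the actual EAL code; `MAX_LINEAR_TID`, `linear_tid_assign()` and the variable names are illustrative assumptions, but the shape (an atomic counter plus a thread-local slot) matches the idea of a per-thread cache index that works for EAL and normal pthreads alike:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical sketch of a linear thread-id allocator; the real patch
 * lives inside the EAL and differs in detail. */
#define MAX_LINEAR_TID 64

static atomic_uint next_tid;              /* monotonically increasing allocator */
static _Thread_local int linear_tid = -1; /* this thread's mempool-cache index */

/* Assign a unique linear id to the calling thread: EAL threads would do
 * this at creation time, normal pthreads call it explicitly before using
 * rte_mempool. Returns the id, or -1 when the table is exhausted. */
static int linear_tid_assign(void)
{
	if (linear_tid < 0) {
		unsigned int id = atomic_fetch_add(&next_tid, 1);
		if (id >= MAX_LINEAR_TID)
			return -1;
		linear_tid = (int)id;
	}
	return linear_tid;   /* repeated calls keep the same id */
}
```

The mempool cache array would then be indexed by this value instead of by 'rte_lcore_id()'.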

However, there is another problem: rte_mempool is not preemptible. The problem actually comes from rte_ring, so it is discussed together in the next section.

# rte_ring

rte_ring supports multi-producer enqueue and multi-consumer dequeue, but it is not preemptible. There have been conversations about this on the list before.

Suppose two pthreads run on the same core and enqueue to the same rte_ring. If the first pthread is preempted by the second after it has already moved prod.head, the second pthread will spin until the first is scheduled again, wasting time. It is even worse if the second pthread has strictly higher priority.

That does not mean the ring cannot be used; the situations in which multiple pthreads on the same core may use it just need to be narrowed down.
- It CAN be used in any single-producer or single-consumer situation.
- It MAY be used by multi-producer/consumer pthreads whose scheduling policies are all SCHED_OTHER (CFS). Users SHOULD be aware of the performance penalty before using it.
- It MUST NOT be used by multi-producer/consumer pthreads when some of their scheduling policies are SCHED_FIFO or SCHED_RR.
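To make the preemption hazard concrete, here is a toy version of the multi-producer enqueue protocol. It is an illustrative sketch, not the actual rte_ring code (`toy_ring` and its fields are made-up names, and the ring-full check is omitted); the point is the three-step reserve/write/publish sequence, where step 3 spins if the owner of an earlier reservation has been preempted on the same core:

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define RING_SIZE 8
#define RING_MASK (RING_SIZE - 1)

struct toy_ring {
	atomic_uint prod_head;   /* next slot to reserve */
	atomic_uint prod_tail;   /* last published slot */
	void *slots[RING_SIZE];
};

static int toy_ring_mp_enqueue(struct toy_ring *r, void *obj)
{
	unsigned int head, next;

	/* Step 1: reserve a slot by CAS on prod.head. */
	do {
		head = atomic_load(&r->prod_head);
		next = head + 1;
	} while (!atomic_compare_exchange_weak(&r->prod_head, &head, next));

	/* Step 2: write the object into the reserved slot. */
	r->slots[head & RING_MASK] = obj;

	/* Step 3: wait for earlier producers to publish, then publish ours.
	 * This is the spin that stalls every later producer on the core
	 * when the thread holding an earlier reservation is preempted. */
	while (atomic_load(&r->prod_tail) != head)
		sched_yield();
	atomic_store(&r->prod_tail, next);
	return 0;
}
```

With SCHED_FIFO/SCHED_RR, the spinning thread can starve the preempted one outright, which is why the list above forbids that combination.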


Introducing task switching costs some performance. From the packet I/O perspective, some of it can be regained by improving the effective I/O rate. When a pthread idles on an empty rx queue, it should proactively yield. We can also slow down rx for a short while to take more advantage of bulk receiving in the next loop. In practice, increasing the rx ring size also helps improve the overall throughput.
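The proactive-yield idle loop might look like the sketch below. This is an assumption about the shape of the loop, not code from the patchset; `fake_rx_burst()` is a stand-in for the real `rte_eth_rx_burst()` so the sketch is self-contained:

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>

#define BURST_SIZE 32

static unsigned long idle_yields;  /* counts empty polls, for illustration */

/* Stand-in for rte_eth_rx_burst(); always reports an empty queue here. */
static uint16_t fake_rx_burst(uint16_t port, uint16_t queue,
			      void **pkts, uint16_t n)
{
	(void)port; (void)queue; (void)pkts; (void)n;
	return 0;
}

static void rx_loop_iteration(uint16_t port, uint16_t queue)
{
	void *pkts[BURST_SIZE];
	uint16_t nb = fake_rx_burst(port, queue, pkts, BURST_SIZE);

	if (nb == 0) {
		/* Empty poll: give the CPU to sibling pthreads on this
		 * core, so the next poll is likelier to find a full burst. */
		idle_yields++;
		sched_yield();
		return;
	}
	/* ... process nb packets ... */
}
```

Yielding only on empty polls keeps the fast path untouched while letting co-scheduled threads make progress, which is where the bulk-receive advantage in the next loop comes from.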

Cgroup Control

Here is a simple example: four pthreads do packet I/O on the same core, and we want their CPU shares in the ratio 1:1:2:4.
> mkdir /sys/fs/cgroup/cpu/dpdk
> mkdir /sys/fs/cgroup/cpu/dpdk/thread0
> mkdir /sys/fs/cgroup/cpu/dpdk/thread1
> mkdir /sys/fs/cgroup/cpu/dpdk/thread2
> mkdir /sys/fs/cgroup/cpu/dpdk/thread3
> cd /sys/fs/cgroup/cpu/dpdk
> echo 256 > thread0/cpu.shares
> echo 256 > thread1/cpu.shares
> echo 512 > thread2/cpu.shares
> echo 1024 > thread3/cpu.shares


Any comments are welcome.



Cunming Liang (7):
  eal: add linear thread id as pthread-local variable
  mempool: use linear-tid as mempool cache index
  ring: use linear-tid as ring debug stats index
  eal: add simple API for multi-pthread
  testpmd: support multi-pthread mode
  sample: add new sample for multi-pthread
  eal: macro for cpuset w/ or w/o CPU_ALLOC

 app/test-pmd/cmdline.c                    |  41 +++++
 app/test-pmd/testpmd.c                    |  84 ++++++++-
 app/test-pmd/testpmd.h                    |   1 +
 config/common_linuxapp                    |   1 +
 examples/multi-pthread/Makefile           |  57 ++++++
 examples/multi-pthread/main.c             | 232 ++++++++++++++++++++++++
 examples/multi-pthread/main.h             |  46 +++++
 lib/librte_eal/common/include/rte_eal.h   |  15 ++
 lib/librte_eal/common/include/rte_lcore.h |  12 ++
 lib/librte_eal/linuxapp/eal/eal_thread.c  | 282 +++++++++++++++++++++++++++---
 lib/librte_mempool/rte_mempool.h          |  22 +--
 lib/librte_ring/rte_ring.h                |   6 +-
 12 files changed, 755 insertions(+), 44 deletions(-)
 create mode 100644 examples/multi-pthread/Makefile
 create mode 100644 examples/multi-pthread/main.c
 create mode 100644 examples/multi-pthread/main.h


Thread overview: 37+ messages
2014-12-11  2:04 Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable Cunming Liang
2014-12-16  7:00   ` Qiu, Michael
2014-12-22 19:02   ` Ananyev, Konstantin
2014-12-23  9:56     ` Liang, Cunming
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 2/7] mempool: use linear-tid as mempool cache index Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 3/7] ring: use linear-tid as ring debug stats index Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 4/7] eal: add simple API for multi-pthread Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 5/7] testpmd: support multi-pthread mode Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 6/7] sample: add new sample for multi-pthread Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 7/7] eal: macro for cpuset w/ or w/o CPU_ALLOC Cunming Liang
2014-12-11  2:54 ` Jayakumar, Muthurajan [this message]
2014-12-11  9:56 ` [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore Walukiewicz, Miroslaw
2014-12-12  5:44   ` Liang, Cunming
2014-12-15 11:10     ` Walukiewicz, Miroslaw
2014-12-15 11:53       ` Liang, Cunming
2014-12-18 12:20         ` Walukiewicz, Miroslaw
2014-12-18 14:32           ` Bruce Richardson
2014-12-18 15:11             ` Olivier MATZ
2014-12-18 16:04               ` Bruce Richardson
2014-12-18 16:15           ` Stephen Hemminger
2014-12-19  1:28           ` Liang, Cunming
2014-12-19 10:03             ` Bruce Richardson
2014-12-22  1:51               ` Liang, Cunming
2014-12-22  9:46                 ` Bruce Richardson
2014-12-22 10:01                   ` Walukiewicz, Miroslaw
2014-12-23  9:45                     ` Liang, Cunming
2014-12-22 18:28                   ` Stephen Hemminger
2014-12-23  9:19                     ` Walukiewicz, Miroslaw
2014-12-23  9:23                       ` Bruce Richardson
2014-12-23  9:51                     ` Liang, Cunming
2015-01-08 17:05                       ` Ananyev, Konstantin
2015-01-08 17:23                         ` Richardson, Bruce
2015-01-09  9:51                           ` Liang, Cunming
2015-01-09  9:40                         ` Liang, Cunming
2015-01-09 11:52                           ` Ananyev, Konstantin
2015-01-09  9:45                         ` Liang, Cunming
