DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore
@ 2014-12-11  2:04 Cunming Liang
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable Cunming Liang
                   ` (8 more replies)
  0 siblings, 9 replies; 37+ messages in thread
From: Cunming Liang @ 2014-12-11  2:04 UTC (permalink / raw)
  To: dev


Scope & Usage Scenario 
========================  

DPDK usually pins one pthread per core to avoid the overhead of task
switching. This helps performance a lot, but it is not efficient in all
cases. Sometimes it is too expensive to dedicate a whole core to a
lightweight workload. It is a reasonable demand to run multiple threads
per core, with each thread getting an assigned share of the CPU.

In fact, nothing prevents a user from creating normal pthreads and using
cgroups to control the CPU shares. One purpose of this patchset is to close
the gaps that prevent more DPDK libraries from being used in such normal
pthreads. In addition, it demonstrates the performance gained by proactively
yielding in the packet I/O idle loop. It also provides several
'rte_pthread_*' APIs to make life easier.


Changes to DPDK libraries
==========================

Some DPDK libraries must run in the DPDK environment.

# rte_mempool

The rte_mempool documentation states that a thread not created by the EAL
must not use mempools. The root cause is that the mempool keeps a per-lcore
cache internally, and 'rte_lcore_id()' does not return a correct value in
such a thread.

The patchset changes this a little. The index into the mempool cache is no
longer the lcore_id; instead it is a linear number handed out by an
allocator. Legacy EAL per-lcore threads obtain a unique linear id
automatically during creation, while a normal pthread that wants to use
rte_mempool must request a linear id explicitly. The mempool cache thereby
becomes a per-thread resource, indexed by this linear thread id. A usage
sketch is shown below.
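
Below is a minimal sketch (not part of the patches) of how a normal, non-EAL
pthread could use a mempool once it owns a linear tid. It assumes the
rte_pthread_prepare()/rte_pthread_cleanup() helpers added later in this
series; 'mp' stands for any mempool the application has created.

    #include <rte_eal.h>
    #include <rte_mempool.h>

    static void *worker(void *arg)
    {
        struct rte_mempool *mp = arg;   /* mempool created by the app */
        void *obj;

        if (rte_pthread_prepare() < 0)  /* acquire a linear tid */
            return NULL;

        if (rte_mempool_get(mp, &obj) == 0) {
            /* the per-thread cache is indexed by the linear tid,
             * not by the lcore id */
            rte_mempool_put(mp, obj);
        }

        rte_pthread_cleanup();          /* release the linear tid */
        return NULL;
    }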

However, there's another problem: rte_mempool is not preemptable. The
problem comes from rte_ring, so the two are discussed together in the next
section.

# rte_ring

rte_ring supports multi-producer enqueue and multi-consumer dequeue, but it
is not preemptable. There was a conversation about this before:
http://dpdk.org/ml/archives/dev/2013-November/000714.html

Let's say two pthreads running on the same core enqueue to the same
rte_ring. If the 1st pthread is preempted by the 2nd pthread after it has
already modified prod.head, the 2nd pthread will spin until the 1st one is
scheduled again, which wastes time. It gets even worse if the 2nd pthread
has strictly higher priority.

But that doesn't mean we can't use it. We just need to narrow down the
situations in which it is used by multiple pthreads on the same core.
- It CAN be used in any single-producer or single-consumer situation (a
sketch of this pattern follows the list).
- It MAY be used by multi-producer/consumer pthreads whose scheduling
policies are all SCHED_OTHER (CFS). Users SHOULD be aware of the
performance penalty before using it.
- It MUST NOT be used by multi-producer/consumer pthreads when some of
their scheduling policies are SCHED_FIFO or SCHED_RR.
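
A minimal sketch (not from the patches) of the single-producer/
single-consumer pattern on a shared core; the ring name and helper
functions are illustrative only:

    #include <errno.h>
    #include <sched.h>
    #include <rte_lcore.h>
    #include <rte_ring.h>

    static struct rte_ring *r;

    static int setup_ring(void)
    {
        /* SP/SC flags: no multi-producer/consumer head update can be
         * stalled by preemption on the shared core */
        r = rte_ring_create("spsc_ring", 1024, rte_socket_id(),
                            RING_F_SP_ENQ | RING_F_SC_DEQ);
        return (r == NULL) ? -1 : 0;
    }

    static void *producer(void *obj)
    {
        /* only this pthread ever enqueues */
        while (rte_ring_sp_enqueue(r, obj) == -ENOBUFS)
            sched_yield();      /* ring full, let the consumer run */
        return NULL;
    }

    static void *consumer(void *arg)
    {
        void *obj;

        /* only this pthread ever dequeues */
        while (rte_ring_sc_dequeue(r, &obj) != 0)
            sched_yield();      /* ring empty, let the producer run */
        return arg;
    }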


Performance
==============

Performance is lost by introducing task switching. From the packet I/O
perspective, we can gain some of it back by improving the effective I/O
rate. When a pthread is idle-looping on an empty rx queue, it should
proactively yield. We can also slow down rx for a short while to take more
advantage of bulk receiving in the next loop. In practice, increasing the rx
ring size also helps to improve the overall throughput. A sketch of such an
idle loop is shown below.
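
The idle-loop policy could look roughly like the following (the thresholds
and the helper name are illustrative; the multi-pthread sample in patch 6
uses the same idea):

    #include <sched.h>
    #include <unistd.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE   32
    #define YIELD_THRESH (BURST_SIZE / 2)   /* illustrative threshold */
    #define SLEEP_USEC   100                /* illustrative back-off */

    static void rx_idle_loop(uint8_t port)  /* hypothetical helper */
    {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t nb_rx;

        for (;;) {
            nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
            if (nb_rx == 0) {
                usleep(SLEEP_USEC);     /* empty queue: give up the CPU */
                continue;
            }
            if (nb_rx <= YIELD_THRESH)
                sched_yield();          /* light load: share the core */
            /* ... forward or free the nb_rx packets ... */
        }
    }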


Cgroup Control
================

Here's a simple example: four pthreads do packet I/O on the same core, and
we expect their CPU shares to be in the ratio 1:1:2:4.
> mkdir /sys/fs/cgroup/cpu/dpdk
> mkdir /sys/fs/cgroup/cpu/dpdk/thread0
> mkdir /sys/fs/cgroup/cpu/dpdk/thread1
> mkdir /sys/fs/cgroup/cpu/dpdk/thread2
> mkdir /sys/fs/cgroup/cpu/dpdk/thread3
> cd /sys/fs/cgroup/cpu/dpdk
> echo 256 > thread0/cpu.shares
> echo 256 > thread1/cpu.shares
> echo 512 > thread2/cpu.shares
> echo 1024 > thread3/cpu.shares
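
The remaining step (omitted above) is to attach each pthread to its cgroup
by writing its kernel thread id (e.g. the value returned by gettid(), or
taken from /proc/<pid>/task/) into the corresponding 'tasks' file, for
example:
> echo <tid_of_thread0> > thread0/tasks
> echo <tid_of_thread1> > thread1/tasks
> echo <tid_of_thread2> > thread2/tasks
> echo <tid_of_thread3> > thread3/tasks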


-END-

Any comments are welcome.

Thanks

Cunming Liang (7):
  eal: add linear thread id as pthread-local variable
  mempool: use linear-tid as mempool cache index
  ring: use linear-tid as ring debug stats index
  eal: add simple API for multi-pthread
  testpmd: support multi-pthread mode
  sample: add new sample for multi-pthread
  eal: macro for cpuset w/ or w/o CPU_ALLOC

 app/test-pmd/cmdline.c                    |  41 +++++
 app/test-pmd/testpmd.c                    |  84 ++++++++-
 app/test-pmd/testpmd.h                    |   1 +
 config/common_linuxapp                    |   1 +
 examples/multi-pthread/Makefile           |  57 ++++++
 examples/multi-pthread/main.c             | 232 ++++++++++++++++++++++++
 examples/multi-pthread/main.h             |  46 +++++
 lib/librte_eal/common/include/rte_eal.h   |  15 ++
 lib/librte_eal/common/include/rte_lcore.h |  12 ++
 lib/librte_eal/linuxapp/eal/eal_thread.c  | 282 +++++++++++++++++++++++++++---
 lib/librte_mempool/rte_mempool.h          |  22 +--
 lib/librte_ring/rte_ring.h                |   6 +-
 12 files changed, 755 insertions(+), 44 deletions(-)
 create mode 100644 examples/multi-pthread/Makefile
 create mode 100644 examples/multi-pthread/main.c
 create mode 100644 examples/multi-pthread/main.h

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable
  2014-12-11  2:04 [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Cunming Liang
@ 2014-12-11  2:04 ` Cunming Liang
  2014-12-16  7:00   ` Qiu, Michael
  2014-12-22 19:02   ` Ananyev, Konstantin
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 2/7] mempool: use linear-tid as mempool cache index Cunming Liang
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 37+ messages in thread
From: Cunming Liang @ 2014-12-11  2:04 UTC (permalink / raw)
  To: dev


Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 lib/librte_eal/common/include/rte_eal.h   |   5 ++
 lib/librte_eal/common/include/rte_lcore.h |  12 ++++
 lib/librte_eal/linuxapp/eal/eal_thread.c  | 115 ++++++++++++++++++++++++++++--
 3 files changed, 126 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index f4ecd2e..2640167 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -262,6 +262,11 @@ rte_set_application_usage_hook( rte_usage_hook_t usage_func );
  */
 int rte_eal_has_hugepages(void);
 
+#ifndef RTE_MAX_THREAD
+#define RTE_MAX_THREAD                RTE_MAX_LCORE
+#endif
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index 49b2c03..cd83d47 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -73,6 +73,7 @@ struct lcore_config {
 extern struct lcore_config lcore_config[RTE_MAX_LCORE];
 
 RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */
+RTE_DECLARE_PER_LCORE(unsigned, _thread_id); /**< Per thread "linear tid". */
 
 /**
  * Return the ID of the execution unit we are running on.
@@ -86,6 +87,17 @@ rte_lcore_id(void)
 }
 
 /**
+ * Return the linear thread ID of the execution unit we are running on.
+ * @return
+ *   linear thread ID
+ */
+static inline unsigned long
+rte_linear_thread_id(void)
+{
+	return RTE_PER_LCORE(_thread_id);
+}
+
+/**
  * Get the id of the master lcore
  *
  * @return
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 80a985f..52478d6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -39,6 +39,7 @@
 #include <pthread.h>
 #include <sched.h>
 #include <sys/queue.h>
+#include <string.h>
 
 #include <rte_debug.h>
 #include <rte_atomic.h>
@@ -51,12 +52,19 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
+#include <rte_spinlock.h>
+#include <rte_common.h>
 
 #include "eal_private.h"
 #include "eal_thread.h"
 
+#define LINEAR_THREAD_ID_POOL        "THREAD_ID_POOL"
+
 RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
 
+/* define linear thread id as thread-local variables */
+RTE_DEFINE_PER_LCORE(unsigned, _thread_id);
+
 /*
  * Send a message to a slave lcore identified by slave_id to call a
  * function f with argument arg. Once the execution is done, the
@@ -94,12 +102,13 @@ rte_eal_remote_launch(int (*f)(void *), void *arg, unsigned slave_id)
 	return 0;
 }
 
+
 /* set affinity for current thread */
 static int
-eal_thread_set_affinity(void)
+__eal_thread_set_affinity(pthread_t thread, unsigned lcore)
 {
+
 	int s;
-	pthread_t thread;
 
 /*
  * According to the section VERSIONS of the CPU_ALLOC man page:
@@ -126,9 +135,8 @@ eal_thread_set_affinity(void)
 
 	size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
 	CPU_ZERO_S(size, cpusetp);
-	CPU_SET_S(rte_lcore_id(), size, cpusetp);
+	CPU_SET_S(lcore, size, cpusetp);
 
-	thread = pthread_self();
 	s = pthread_setaffinity_np(thread, size, cpusetp);
 	if (s != 0) {
 		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
@@ -140,9 +148,8 @@ eal_thread_set_affinity(void)
 #else /* CPU_ALLOC */
 	cpu_set_t cpuset;
 	CPU_ZERO( &cpuset );
-	CPU_SET( rte_lcore_id(), &cpuset );
+	CPU_SET(lcore, &cpuset );
 
-	thread = pthread_self();
 	s = pthread_setaffinity_np(thread, sizeof( cpuset ), &cpuset);
 	if (s != 0) {
 		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
@@ -152,6 +159,15 @@ eal_thread_set_affinity(void)
 	return 0;
 }
 
+/* set affinity for current thread */
+static int
+eal_thread_set_affinity(void)
+{
+	pthread_t thread = pthread_self();
+
+	return __eal_thread_set_affinity(thread, rte_lcore_id());
+}
+
 void eal_thread_init_master(unsigned lcore_id)
 {
 	/* set the lcore ID in per-lcore memory area */
@@ -162,6 +178,87 @@ void eal_thread_init_master(unsigned lcore_id)
 		rte_panic("cannot set affinity\n");
 }
 
+/* linear thread id control block */
+struct eal_thread_cb {
+	rte_spinlock_t lock;
+	uint64_t nb_bucket;
+	uint64_t bitmap[0];
+};
+
+static struct eal_thread_cb *
+__create_tid_pool(void)
+{
+	const struct rte_memzone *mz;
+	struct eal_thread_cb *pcb;
+	uint64_t sz;
+	uint64_t nb_bucket;
+
+	nb_bucket = RTE_ALIGN_CEIL(RTE_MAX_THREAD, 64) / 64;
+	sz = sizeof(*pcb) + nb_bucket * sizeof(uint64_t);
+	mz = rte_memzone_reserve(LINEAR_THREAD_ID_POOL,
+				 sz, rte_socket_id(), 0);
+	if (mz == NULL)
+		rte_panic("Cannot allocate linear thread ID pool\n");
+
+	pcb = mz->addr;
+	rte_spinlock_init(&pcb->lock);
+	pcb->nb_bucket = nb_bucket;
+	memset(pcb->bitmap, 0, nb_bucket * sizeof(uint64_t));
+
+	return pcb;
+}
+
+static int
+__get_linear_tid(uint64_t *tid)
+{
+	const struct rte_memzone *mz;
+	struct eal_thread_cb *pcb;
+	uint64_t i;
+	uint8_t shift = 0;
+
+	mz = rte_memzone_lookup(LINEAR_THREAD_ID_POOL);
+	if (mz != NULL)
+		pcb = mz->addr;
+	else
+		pcb = __create_tid_pool();
+
+	rte_spinlock_lock(&pcb->lock);
+	for (i = 0; i < pcb->nb_bucket; i++) {
+		if (pcb->bitmap[i] == (uint64_t)-1)
+			continue;
+		shift = 0; 
+		while (pcb->bitmap[i] & (1UL << shift))
+			shift ++;
+		pcb->bitmap[i] |= (1UL << shift);
+		break;
+	}
+	rte_spinlock_unlock(&pcb->lock);
+
+	if (i == pcb->nb_bucket)
+		return -1;
+
+	*tid = i * 64 + shift;
+	return 0;
+}
+
+static void __rte_unused
+__put_linear_tid(uint64_t tid)
+{
+	const struct rte_memzone *mz;
+	struct eal_thread_cb *pcb;
+	uint8_t shift;
+
+	mz = rte_memzone_lookup(LINEAR_THREAD_ID_POOL);
+	if (!mz)
+		return;
+
+	pcb = mz->addr;
+	rte_spinlock_lock(&pcb->lock);
+	shift = tid & 0x3F;
+	pcb->bitmap[tid / 64] &= ~(1UL << shift);
+	rte_spinlock_unlock(&pcb->lock);	
+}
+
 /* main loop of threads */
 __attribute__((noreturn)) void *
 eal_thread_loop(__attribute__((unused)) void *arg)
@@ -169,6 +266,7 @@ eal_thread_loop(__attribute__((unused)) void *arg)
 	char c;
 	int n, ret;
 	unsigned lcore_id;
+	unsigned long ltid = 0;
 	pthread_t thread_id;
 	int m2s, s2m;
 
@@ -191,6 +289,11 @@ eal_thread_loop(__attribute__((unused)) void *arg)
 	/* set the lcore ID in per-lcore memory area */
 	RTE_PER_LCORE(_lcore_id) = lcore_id;
 
+	/* set the linear thread ID in per-lcore memory area */
+	if (__get_linear_tid(&ltid) < 0)
+		rte_panic("cannot get cache slot id\n");
+	RTE_PER_LCORE(_thread_id) = ltid;
+
 	/* set CPU affinity */
 	if (eal_thread_set_affinity() < 0)
 		rte_panic("cannot set affinity\n");
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [dpdk-dev] [RFC PATCH 2/7] mempool: use linear-tid as mempool cache index
  2014-12-11  2:04 [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Cunming Liang
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable Cunming Liang
@ 2014-12-11  2:04 ` Cunming Liang
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 3/7] ring: use linear-tid as ring debug stats index Cunming Liang
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Cunming Liang @ 2014-12-11  2:04 UTC (permalink / raw)
  To: dev


Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 lib/librte_mempool/rte_mempool.h | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 3314651..bf4117b 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -159,13 +159,13 @@ struct rte_mempool {
 	unsigned private_data_size;      /**< Size of private data. */
 
 #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
-	/** Per-lcore local cache. */
-	struct rte_mempool_cache local_cache[RTE_MAX_LCORE];
+	/** Per-lthread local cache. */
+	struct rte_mempool_cache local_cache[RTE_MAX_THREAD];
 #endif
 
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-	/** Per-lcore statistics. */
-	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE];
+	/** Per-lthread statistics. */
+	struct rte_mempool_debug_stats stats[RTE_MAX_THREAD];
 #endif
 
 	/* Address translation support, starts from next cache line. */
@@ -199,9 +199,9 @@ struct rte_mempool {
  */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
 #define __MEMPOOL_STAT_ADD(mp, name, n) do {			\
-		unsigned __lcore_id = rte_lcore_id();		\
-		mp->stats[__lcore_id].name##_objs += n;		\
-		mp->stats[__lcore_id].name##_bulk += 1;		\
+		unsigned __thread_id = rte_linear_thread_id();	\
+		mp->stats[__thread_id].name##_objs += n;		\
+		mp->stats[__thread_id].name##_bulk += 1;		\
 	} while(0)
 #else
 #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
@@ -758,7 +758,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
 	struct rte_mempool_cache *cache;
 	uint32_t index;
 	void **cache_objs;
-	unsigned lcore_id = rte_lcore_id();
+	unsigned tid = rte_linear_thread_id();
 	uint32_t cache_size = mp->cache_size;
 	uint32_t flushthresh = mp->cache_flushthresh;
 #endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
@@ -775,7 +775,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
 	if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE))
 		goto ring_enqueue;
 
-	cache = &mp->local_cache[lcore_id];
+	cache = &mp->local_cache[tid];
 	cache_objs = &cache->objs[cache->len];
 
 	/*
@@ -948,14 +948,14 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
 	struct rte_mempool_cache *cache;
 	uint32_t index, len;
 	void **cache_objs;
-	unsigned lcore_id = rte_lcore_id();
+	unsigned tid = rte_linear_thread_id();
 	uint32_t cache_size = mp->cache_size;
 
 	/* cache is not enabled or single consumer */
 	if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
 		goto ring_dequeue;
 
-	cache = &mp->local_cache[lcore_id];
+	cache = &mp->local_cache[tid];
 	cache_objs = cache->objs;
 
 	/* Can this be satisfied from the cache? */
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [dpdk-dev] [RFC PATCH 3/7] ring: use linear-tid as ring debug stats index
  2014-12-11  2:04 [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Cunming Liang
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable Cunming Liang
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 2/7] mempool: use linear-tid as mempool cache index Cunming Liang
@ 2014-12-11  2:04 ` Cunming Liang
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 4/7] eal: add simple API for multi-pthread Cunming Liang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Cunming Liang @ 2014-12-11  2:04 UTC (permalink / raw)
  To: dev


Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 lib/librte_ring/rte_ring.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 3920830..c038a4f 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -189,9 +189,9 @@ struct rte_ring {
  */
 #ifdef RTE_LIBRTE_RING_DEBUG
 #define __RING_STAT_ADD(r, name, n) do {		\
-		unsigned __lcore_id = rte_lcore_id();	\
-		r->stats[__lcore_id].name##_objs += n;	\
-		r->stats[__lcore_id].name##_bulk += 1;	\
+		unsigned __thread_id = rte_linear_thread_id(); \
+		r->stats[__thread_id].name##_objs += n;	\
+		r->stats[__thread_id].name##_bulk += 1;	\
 	} while(0)
 #else
 #define __RING_STAT_ADD(r, name, n) do {} while(0)
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [dpdk-dev] [RFC PATCH 4/7] eal: add simple API for multi-pthread
  2014-12-11  2:04 [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Cunming Liang
                   ` (2 preceding siblings ...)
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 3/7] ring: use linear-tid as ring debug stats index Cunming Liang
@ 2014-12-11  2:04 ` Cunming Liang
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 5/7] testpmd: support multi-pthread mode Cunming Liang
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Cunming Liang @ 2014-12-11  2:04 UTC (permalink / raw)
  To: dev


Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 config/common_linuxapp                   |   1 +
 lib/librte_eal/common/include/rte_eal.h  |  10 +++
 lib/librte_eal/linuxapp/eal/eal_thread.c | 105 ++++++++++++++++++++++++++++++-
 3 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2f9643b..4800b31 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -89,6 +89,7 @@ CONFIG_RTE_LIBNAME="intel_dpdk"
 #
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
+CONFIG_RTE_MAX_THREAD=256
 CONFIG_RTE_MAX_NUMA_NODES=8
 CONFIG_RTE_MAX_MEMSEG=256
 CONFIG_RTE_MAX_MEMZONE=2560
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 2640167..0bce5f3 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -41,6 +41,7 @@
  */
 
 #include <stdint.h>
+#include <pthread.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -266,6 +267,15 @@ int rte_eal_has_hugepages(void);
 #define RTE_MAX_THREAD                RTE_MAX_LCORE
 #endif
 
+int rte_pthread_create(pthread_t *tid, void *(*work)(void *), void *arg);
+
+int rte_pthread_assign_lcore(pthread_t thread, unsigned lcore);
+
+int rte_pthread_assign_cpuset(pthread_t thread, unsigned lcore[], unsigned num);
+
+int rte_pthread_prepare(void);
+
+void rte_pthread_cleanup(void);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 52478d6..a584e3b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -241,7 +241,7 @@ __get_linear_tid(uint64_t *tid)
 	return 0;
 }
 
-static void __rte_unused
+static void
 __put_linear_tid(uint64_t tid)
 {
 	const struct rte_memzone *mz;
@@ -334,3 +334,106 @@ eal_thread_loop(__attribute__((unused)) void *arg)
 	/* pthread_exit(NULL); */
 	/* return NULL; */
 }
+
+int
+rte_pthread_assign_lcore(pthread_t thread, unsigned lcore)
+{
+	if (!rte_lcore_is_enabled(lcore))
+		return -1;
+
+	if (__eal_thread_set_affinity(thread, lcore) < 0)
+		return -1;
+
+	return 0;
+}
+
+int
+rte_pthread_assign_cpuset(pthread_t thread, unsigned lcore[], unsigned num)
+{
+	int s;
+	unsigned i;
+
+#if defined(CPU_ALLOC)
+	size_t size;
+	cpu_set_t *cpusetp;
+
+	cpusetp = CPU_ALLOC(RTE_MAX_LCORE);
+	if (cpusetp == NULL) {
+		RTE_LOG(ERR, EAL, "CPU_ALLOC failed\n");
+		return -1;
+	}
+
+	size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
+	CPU_ZERO_S(size, cpusetp);
+
+	for (i = 0; i < num; i++) {
+		if (!rte_lcore_is_enabled(lcore[i])) {
+			RTE_LOG(ERR, EAL, "lcore %u not enabled\n", lcore[i]);
+			CPU_FREE(cpusetp);
+			return -1;
+		}
+
+		CPU_SET_S(lcore[i], size, cpusetp);
+	}
+	s = pthread_setaffinity_np(thread, size, cpusetp);
+	if (s != 0) {
+		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
+		CPU_FREE(cpusetp);
+		return -1;
+	}
+
+	CPU_FREE(cpusetp);
+#else /* CPU_ALLOC */
+	cpu_set_t cpuset;
+	CPU_ZERO(&cpuset);
+
+	for (i = 0; i < num; i++) {
+		if (!rte_lcore_is_enabled(lcore[i])) {
+			RTE_LOG(ERR, EAL, "lcore %u not enabled\n", lcore[i]);
+			return -1;
+		}
+		CPU_SET(lcore[i], &cpuset);
+	}
+
+	s = pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset);
+	if (s != 0) {
+		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
+		return -1;
+	}
+#endif
+
+	return 0;
+}
+
+int
+rte_pthread_prepare(void)
+{
+	unsigned long ltid;
+	if (__get_linear_tid(&ltid) < 0)
+		return -1;
+	RTE_PER_LCORE(_thread_id) = ltid;
+}
+
+void
+rte_pthread_cleanup(void)
+{
+	__put_linear_tid(RTE_PER_LCORE(_thread_id));
+}
+
+int
+rte_pthread_create(pthread_t *tid, void *(*work)(void *), void *arg)
+{
+	int ret;
+
+	if (tid == NULL || work == NULL)
+		return -1;
+
+	ret = pthread_create(tid, NULL, work, arg);
+	if (ret != 0)
+		return -1;
+
+	if (__eal_thread_set_affinity(*tid, rte_lcore_id()) < 0)
+		rte_panic("cannot set affinity\n");
+
+	return 0;
+}
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [dpdk-dev] [RFC PATCH 5/7] testpmd: support multi-pthread mode
  2014-12-11  2:04 [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Cunming Liang
                   ` (3 preceding siblings ...)
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 4/7] eal: add simple API for multi-pthread Cunming Liang
@ 2014-12-11  2:04 ` Cunming Liang
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 6/7] sample: add new sample for multi-pthread Cunming Liang
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Cunming Liang @ 2014-12-11  2:04 UTC (permalink / raw)
  To: dev


Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 app/test-pmd/cmdline.c | 41 ++++++++++++++++++++++++
 app/test-pmd/testpmd.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h |  1 +
 3 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 882a5a2..9c2322c 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -8697,6 +8697,45 @@ cmdline_parse_inst_t cmd_set_flow_director_flex_payload = {
 	},
 };
 
+/* *** SET SP/MP *** */
+struct cmd_set_mp_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t mp;
+	cmdline_fixed_string_t mode;
+};
+
+static void cmd_set_mp_parsed(void *parsed_result,
+			      __attribute__((unused)) struct cmdline *cl,
+			      __attribute__((unused)) void *data)
+{
+	struct cmd_set_mp_result *res = parsed_result;
+
+	if (!strcmp(res->mode, "on"))
+		set_multi_thread(1);
+	else
+		set_multi_thread(0);
+}
+
+cmdline_parse_token_string_t cmd_setmp_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_mp_result, set, "set");
+cmdline_parse_token_string_t cmd_setmp_mp =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_mp_result, mp, "mp");
+cmdline_parse_token_string_t cmd_setmp_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_mp_result, mode,
+				 "on#off");
+
+cmdline_parse_inst_t cmd_set_mp = {
+	.f = cmd_set_mp_parsed,
+	.data = (void *)1,
+	.help_str = "set mp on|off: turn on/off multi-thread per lcore",
+	.tokens = {
+		(void *)&cmd_setmp_set,
+		(void *)&cmd_setmp_mp,
+		(void *)&cmd_setmp_mode,
+		NULL,
+	},
+};
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -8836,6 +8875,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_flush_flow_director,
 	(cmdline_parse_inst_t *)&cmd_set_flow_director_flex_mask,
 	(cmdline_parse_inst_t *)&cmd_set_flow_director_flex_payload,
+	(cmdline_parse_inst_t *)&cmd_set_mp,
 	NULL,
 };
 
@@ -8906,3 +8946,4 @@ bypass_is_supported(portid_t port_id)
 	}
 }
 #endif
+
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 8c69756..7ff9d0c 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -943,7 +943,7 @@ flush_fwd_rx_queues(void)
 }
 
 static void
-run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
+run_pkt_fwd_on_lcore_sp(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
 {
 	struct fwd_stream **fsm;
 	streamid_t nb_fs;
@@ -957,6 +957,70 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
 	} while (! fc->stopped);
 }
 
+struct work_arg {
+	struct fwd_lcore *fc;
+	struct fwd_stream *fs;
+	packet_fwd_t pkt_fwd;
+};
+
+static void* work(void *arg)
+{
+	struct work_arg *warg = (struct work_arg *)arg;
+	struct fwd_stream *fs = warg->fs;
+	struct fwd_lcore *fc = warg->fc;
+	packet_fwd_t pkt_fwd = warg->pkt_fwd;
+
+	do {
+		(*pkt_fwd)(fs);
+	} while (! fc->stopped);
+
+	return NULL;
+}
+
+static void
+run_pkt_fwd_on_lcore_mp(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
+{
+	struct fwd_stream **fsm;
+	streamid_t nb_fs;
+	streamid_t sm_id;
+	streamid_t i;
+	struct work_arg *work_arg = NULL;
+	pthread_t *tids = NULL;
+
+	fsm = &fwd_streams[fc->stream_idx];
+	nb_fs = fc->stream_nb;
+	tids = calloc(nb_fs, sizeof(*tids));
+	if (!tids)
+		goto exit;
+
+	work_arg = calloc(nb_fs, sizeof(*work_arg));
+	if (!work_arg)
+		goto exit;
+
+	for (sm_id = 0; sm_id < nb_fs; sm_id++) {
+		work_arg[sm_id].pkt_fwd = pkt_fwd;
+		work_arg[sm_id].fc = fc;
+		work_arg[sm_id].fs = fsm[sm_id];
+		if (rte_pthread_create(&tids[sm_id], work,
+				       &work_arg[sm_id]) < 0) {
+			printf("create pthread fail on stream %u\n",
+				sm_id);
+			break;
+		}
+	}
+
+	for (i = 0; i < sm_id; i++)
+		(void)pthread_join(tids[i], NULL);
+
+exit:
+	free(tids);
+	free(work_arg);
+}
+
+void
+(*run_pkt_fwd_on_lcore)(struct fwd_lcore *fc, packet_fwd_t pkt_fwd) = 
+	run_pkt_fwd_on_lcore_sp;
+
 static int
 start_pkt_forward_on_core(void *fwd_arg)
 {
@@ -965,6 +1029,24 @@ start_pkt_forward_on_core(void *fwd_arg)
 	return 0;
 }
 
+int set_multi_thread(int on)
+{
+	unsigned int lc_id;
+
+	for (lc_id = 0; lc_id < cur_fwd_config.nb_fwd_lcores; lc_id++)
+		if (fwd_lcores[lc_id]->stopped != 1) {
+			printf("Make sure stop forwarding first\n");
+			return -1;
+		}
+
+	if (on)
+		run_pkt_fwd_on_lcore = run_pkt_fwd_on_lcore_mp;
+	else
+		run_pkt_fwd_on_lcore = run_pkt_fwd_on_lcore_sp;
+
+	return 0;
+}
+
 /*
  * Run the TXONLY packet forwarding engine to send a single burst of packets.
  * Used to start communication flows in network loopback test configurations.
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f8b0740..3658e0f 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -558,6 +558,7 @@ void get_flex_filter(uint8_t port_id, uint16_t index);
 int port_id_is_invalid(portid_t port_id);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+int set_multi_thread(int on);
 
 /*
  * Work-around of a compilation error with ICC on invocations of the
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [dpdk-dev] [RFC PATCH 6/7] sample: add new sample for multi-pthread
  2014-12-11  2:04 [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Cunming Liang
                   ` (4 preceding siblings ...)
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 5/7] testpmd: support multi-pthread mode Cunming Liang
@ 2014-12-11  2:04 ` Cunming Liang
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 7/7] eal: macro for cpuset w/ or w/o CPU_ALLOC Cunming Liang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Cunming Liang @ 2014-12-11  2:04 UTC (permalink / raw)
  To: dev


Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 examples/multi-pthread/Makefile |  57 ++++++++++
 examples/multi-pthread/main.c   | 232 ++++++++++++++++++++++++++++++++++++++++
 examples/multi-pthread/main.h   |  46 ++++++++
 3 files changed, 335 insertions(+)
 create mode 100644 examples/multi-pthread/Makefile
 create mode 100644 examples/multi-pthread/main.c
 create mode 100644 examples/multi-pthread/main.h

diff --git a/examples/multi-pthread/Makefile b/examples/multi-pthread/Makefile
new file mode 100644
index 0000000..6d75212
--- /dev/null
+++ b/examples/multi-pthread/Makefile
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overriden by command line or environment
+RTE_TARGET ?= x86_64-default-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = multi-pthread
+
+# all source are stored in SRCS-y
+SRCS-y := main.c
+
+CFLAGS += -D_GNU_SOURCE # $(WERROR_FLAGS)
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_main.o += -Wno-return-type
+endif
+
+EXTRA_CFLAGS += -O3 -g -Wfatal-errors
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/multi-pthread/main.c b/examples/multi-pthread/main.c
new file mode 100644
index 0000000..10cb4ad
--- /dev/null
+++ b/examples/multi-pthread/main.c
@@ -0,0 +1,232 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <inttypes.h>
+#include <unistd.h>
+#include <rte_eal.h>
+#include <rte_ethdev.h>
+#include <rte_cycles.h>
+#include <rte_lcore.h>
+#include <rte_mbuf.h>
+#include "main.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+
+#include <pthread.h>
+#include <sched.h>
+
+#define RX_RING_SIZE 512
+#define TX_RING_SIZE 512
+
+#define TX_Q_FLAGS (ETH_TXQ_FLAGS_NOMULTSEGS | ETH_TXQ_FLAGS_NOVLANOFFL |\
+	ETH_TXQ_FLAGS_NOXSUMSCTP | ETH_TXQ_FLAGS_NOXSUMUDP | \
+	ETH_TXQ_FLAGS_NOXSUMTCP)
+
+#define NUM_MBUFS 8191
+#define MBUF_SIZE (1600 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
+#define MBUF_CACHE_SIZE 250
+#define BURST_SIZE 32
+
+static struct rte_eth_conf port_conf_default = {
+	.rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN, },
+};
+
+/*
+ * Initialises a given port using global settings and with the rx buffers
+ * coming from the mbuf_pool passed as parameter
+ */
+static inline int
+port_init(uint8_t port, struct rte_mempool *mbuf_pool)
+{
+	struct rte_eth_conf port_conf = port_conf_default;
+	const uint16_t rxRings = 1, txRings = 1;
+	struct rte_eth_dev_info info;
+	int retval;
+	uint16_t q;
+
+	if (port >= rte_eth_dev_count())
+		return -1;
+
+	retval = rte_eth_dev_configure(port, rxRings, txRings, &port_conf);
+	if (retval != 0)
+		return retval;
+
+	rte_eth_dev_info_get(port, &info);
+
+	for (q = 0; q < rxRings; q++) {
+		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+				rte_eth_dev_socket_id(port),
+				&info.default_rxconf, mbuf_pool);
+		if (retval < 0)
+			return retval;
+	}
+
+	/* override default TX queue flags to disable offloads for faster TX */
+	info.default_txconf.txq_flags = TX_Q_FLAGS;
+	for (q = 0; q < txRings; q++) {
+		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+				rte_eth_dev_socket_id(port),
+				&info.default_txconf);
+		if (retval < 0)
+			return retval;
+	}
+
+	retval  = rte_eth_dev_start(port);
+	if (retval < 0)
+		return retval;
+
+	struct ether_addr addr;
+	rte_eth_macaddr_get(port, &addr);
+	printf("Port %u MAC: %02"PRIx8" %02"PRIx8" %02"PRIx8
+			" %02"PRIx8" %02"PRIx8" %02"PRIx8"\n",
+			(unsigned)port,
+			addr.addr_bytes[0], addr.addr_bytes[1],
+			addr.addr_bytes[2], addr.addr_bytes[3],
+			addr.addr_bytes[4], addr.addr_bytes[5]);
+
+	rte_eth_promiscuous_enable(port);
+
+	return 0;
+}
+
+int pthread_yield(void);
+static void*
+work(void *arg)
+{
+	struct rte_mbuf *bufs[BURST_SIZE];
+	uint16_t nb_rx, nb_tx;
+	uint64_t port = (uint64_t)arg;
+	uint16_t YIELD_THRESH = BURST_SIZE / 2;
+	uint16_t SLEEP_USEC = 100;
+
+	if (rte_pthread_prepare() < 0)
+		return NULL;
+
+	while (1) {
+		nb_rx = rte_eth_rx_burst(port, 0,
+					 bufs, BURST_SIZE);
+		if (unlikely(nb_rx == 0)){
+			usleep(SLEEP_USEC);
+			continue;
+		} else if (unlikely(nb_rx <= YIELD_THRESH))
+			pthread_yield();
+
+		nb_tx = rte_eth_tx_burst(port ^ 1, 0,
+					 bufs, nb_rx);
+		if (unlikely(nb_tx < nb_rx)) {
+			uint16_t buf;
+			for (buf = nb_tx; buf < nb_rx; buf++)
+				rte_pktmbuf_free(bufs[buf]);
+		}
+	};
+	rte_pthread_cleanup();
+}
+
+/*
+ * Main thread that does the work, reading from INPUT_PORT
+ * and writing to OUTPUT_PORT
+ */
+static void
+lcore_main(void)
+{
+	const uint64_t nb_ports = rte_eth_dev_count();
+	uint64_t port, i;
+	pthread_t tid[RTE_MAX_ETHPORTS];
+
+	memset(tid, 0, sizeof(tid));
+
+	for (port = 0; port < nb_ports; port++)
+		if (rte_eth_dev_socket_id(port) > 0 &&
+				rte_eth_dev_socket_id(port) !=
+						(int)rte_socket_id())
+			printf("WARNING, port %"PRIu64" is on remote "
+			       "NUMA node to polling thread.\n\t"
+			       "Performance will not be optimal.\n", port);
+
+	printf("\nCore %u forwarding packets. [Ctrl+C to quit]\n",
+	       rte_lcore_id());
+
+	for (port = 0; port < nb_ports; port++) {
+		if (rte_pthread_create(&tid[port], work,
+				       (void *)port) != 0)
+			break;
+	}
+
+	for (i = 0; i < port; i++)
+		(void)pthread_join(tid[i], NULL);
+}
+
+/* Main function, does initialisation and calls the per-lcore functions */
+int
+MAIN(int argc, char *argv[])
+{
+	struct rte_mempool *mbuf_pool;
+	unsigned nb_ports;
+	uint8_t portid;
+
+	/* init EAL */
+	int ret = rte_eal_init(argc, argv);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
+	argc -= ret;
+	argv += ret;
+
+	nb_ports = rte_eth_dev_count();
+	if (nb_ports < 2 || (nb_ports & 1))
+		rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
+
+	mbuf_pool = rte_mempool_create("MBUF_POOL", NUM_MBUFS * nb_ports,
+				       MBUF_SIZE, MBUF_CACHE_SIZE,
+				       sizeof(struct rte_pktmbuf_pool_private),
+				       rte_pktmbuf_pool_init, NULL,
+				       rte_pktmbuf_init, NULL,
+				       rte_socket_id(), 0);
+	if (mbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
+
+	/* initialize all ports */
+	for (portid = 0; portid < nb_ports; portid++)
+		if (port_init(portid, mbuf_pool) != 0)
+			rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8"\n",
+					portid);
+
+	if (rte_lcore_count() > 1)
+		printf("\nWARNING: Coremask too big - App uses only 1 lcore\n");
+
+	/* call lcore_main on master core only */
+	lcore_main();
+	return 0;
+}
diff --git a/examples/multi-pthread/main.h b/examples/multi-pthread/main.h
new file mode 100644
index 0000000..2682d15
--- /dev/null
+++ b/examples/multi-pthread/main.h
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MAIN_H_
+#define _MAIN_H_
+
+
+#ifdef RTE_EXEC_ENV_BAREMETAL
+#define MAIN _main
+#else
+#define MAIN main
+#endif
+
+int MAIN(int argc, char *argv[]);
+
+#endif /* ifndef _MAIN_H_ */
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [dpdk-dev] [RFC PATCH 7/7] eal: macro for cpuset w/ or w/o CPU_ALLOC
  2014-12-11  2:04 [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Cunming Liang
                   ` (5 preceding siblings ...)
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 6/7] sample: add new sample for multi-pthread Cunming Liang
@ 2014-12-11  2:04 ` Cunming Liang
  2014-12-11  2:54 ` [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Jayakumar, Muthurajan
  2014-12-11  9:56 ` Walukiewicz, Miroslaw
  8 siblings, 0 replies; 37+ messages in thread
From: Cunming Liang @ 2014-12-11  2:04 UTC (permalink / raw)
  To: dev


Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_thread.c | 144 +++++++++++++++++--------------
 1 file changed, 81 insertions(+), 63 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c
index a584e3b..05cebe4 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -103,13 +103,6 @@ rte_eal_remote_launch(int (*f)(void *), void *arg, unsigned slave_id)
 }
 
 
-/* set affinity for current thread */
-static int
-__eal_thread_set_affinity(pthread_t thread, unsigned lcore)
-{
-
-	int s;
-
 /*
  * According to the section VERSIONS of the CPU_ALLOC man page:
  *
@@ -124,38 +117,62 @@ __eal_thread_set_affinity(pthread_t thread, unsigned lcore)
  * first appeared in glibc 2.7.
  */
 #if defined(CPU_ALLOC)
+#define INIT_CPUSET(size, cpusetp)				\
+	do {							\
+		cpusetp = CPU_ALLOC(RTE_MAX_LCORE);			\
+		if (cpusetp == NULL)					\
+			rte_panic("CPU_ALLOC failed\n");		\
+									\
+		size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);			\
+		CPU_ZERO_S(size, cpusetp);				\
+	} while(0)
+
+#define CLEAN_CPUSET(cpusetp)			\
+	do {					\
+		CPU_FREE(cpusetp);		\
+	} while(0)
+
+#define SET_CPUSET(lcore, size, cpusetp) 	\
+	CPU_SET_S(lcore, size, cpusetp)
+
+#else /* CPU_ALLOC */
+
+#define INIT_CPUSET(size, cpusetp)			\
+	do {						\
+	    cpu_set_t cpuset;				\
+	    cpusetp = &cpuset;				\
+	    size = sizeof(cpuset);			\
+	    CPU_ZERO(&cpuset);				\
+	} while(0)
+
+#define CLEAN_CPUSET(cpusetp)
+
+#define SET_CPUSET(lcore, size, cpusetp)	\
+	CPU_SET(lcore, cpusetp);
+
+#endif
+
+
+/* set affinity for current thread */
+static int
+__eal_thread_set_affinity(pthread_t thread, unsigned lcore)
+{
+	int s;
 	size_t size;
 	cpu_set_t *cpusetp;
 
-	cpusetp = CPU_ALLOC(RTE_MAX_LCORE);
-	if (cpusetp == NULL) {
-		RTE_LOG(ERR, EAL, "CPU_ALLOC failed\n");
-		return -1;
-	}
-
-	size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
-	CPU_ZERO_S(size, cpusetp);
-	CPU_SET_S(lcore, size, cpusetp);
+	INIT_CPUSET(size, cpusetp);
+	SET_CPUSET(lcore, size, cpusetp);
 
 	s = pthread_setaffinity_np(thread, size, cpusetp);
 	if (s != 0) {
 		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
-		CPU_FREE(cpusetp);
+		CLEAN_CPUSET(cpusetp);
 		return -1;
 	}
 
-	CPU_FREE(cpusetp);
-#else /* CPU_ALLOC */
-	cpu_set_t cpuset;
-	CPU_ZERO( &cpuset );
-	CPU_SET(lcore, &cpuset );
+	CLEAN_CPUSET(cpusetp);
 
-	s = pthread_setaffinity_np(thread, sizeof( cpuset ), &cpuset);
-	if (s != 0) {
-		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
-		return -1;
-	}
-#endif
 	return 0;
 }
 
@@ -248,6 +265,9 @@ __put_linear_tid(uint64_t tid)
 	struct eal_thread_cb *pcb;
 	uint8_t shift;
 
+	if (tid >= RTE_MAX_THREAD)
+		return;
+
 	mz = rte_memzone_lookup(LINEAR_THREAD_ID_POOL);
 	if (!mz)
 		return;
@@ -352,55 +372,28 @@ rte_pthread_assign_cpuset(pthread_t thread, unsigned lcore[], unsigned num)
 {
 	int s;
 	unsigned i;
-
-#if defined(CPU_ALLOC)
 	size_t size;
 	cpu_set_t *cpusetp;
 
-	cpusetp = CPU_ALLOC(RTE_MAX_LCORE);
-	if (cpusetp == NULL) {
-		RTE_LOG(ERR, EAL, "CPU_ALLOC failed\n");
-		return -1;
-	}
-
-	size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
-	CPU_ZERO_S(size, cpusetp);
+	INIT_CPUSET(size, cpusetp);
 
 	for (i = 0; i < num; i++) {
 		if (!rte_lcore_is_enabled(lcore[i])) {
 			RTE_LOG(ERR, EAL, "lcore %u not enabled\n", lcore[i]);
-			CPU_FREE(cpusetp);
+			CLEAN_CPUSET(cpusetp);
 			return -1;
 		}
 
-		CPU_SET_S(lcore[i], size, cpusetp);
+		SET_CPUSET(lcore[i], size, cpusetp);
 	}
 	s = pthread_setaffinity_np(thread, size, cpusetp);
 	if (s != 0) {
 		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
-		CPU_FREE(cpusetp);
+		CLEAN_CPUSET(cpusetp);
 		return -1;
 	}
 
-	CPU_FREE(cpusetp);
-#else /* CPU_ALLOC */
-	cpu_set_t cpuset;
-	CPU_ZERO(&cpuset);
-
-	for (i = 0; i < num; i++) {
-		if (!rte_lcore_is_enabled(lcore[i])) {
-			RTE_LOG(ERR, EAL, "lcore %u not enabled\n", lcore[i]);
-			return -1;
-		}
-		CPU_SET(lcore[i], &cpuset);
-	}
-
-	s = pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset);
-	if (s != 0) {
-		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
-		return -1;
-	}
-#endif
+	CLEAN_CPUSET(cpusetp);
 
 	return 0;
 }
@@ -409,9 +402,20 @@ int
 rte_pthread_prepare(void)
 {
 	unsigned long ltid;
+	unsigned lcore;
+
 	if (__get_linear_tid(&ltid) < 0)
 		return -1;
+
 	RTE_PER_LCORE(_thread_id) = ltid;
+
+	lcore = sched_getcpu();
+	if (!rte_lcore_is_enabled(lcore))
+		RTE_LOG(WARNING, EAL, "lcore %u is not enabled\n", lcore);
+	else
+		RTE_PER_LCORE(_lcore_id) = lcore;
+	
+	return 0;
 }
 
 void
@@ -424,16 +428,30 @@ int
 rte_pthread_create(pthread_t *tid, void *(*work)(void *), void *arg)
 {
 	int ret;
+	pthread_attr_t attr;
+	size_t size;
+	cpu_set_t *cpusetp;
+	pthread_attr_t *pattr = NULL;
 
 	if (tid == NULL || work == NULL)
 		return -1;
 
-	ret = pthread_create(tid, NULL, work, arg);
+	INIT_CPUSET(size, cpusetp);
+
+	SET_CPUSET(rte_lcore_id(), size, cpusetp);
+
+	pthread_attr_init(&attr);
+	if (!pthread_attr_setaffinity_np(&attr, size, cpusetp))
+		pattr = &attr;
+
+	CLEAN_CPUSET(cpusetp);
+
+	ret = pthread_create(tid, pattr, work, arg);
 	if (ret != 0)
 		return -1;
 
-	if (__eal_thread_set_affinity(*tid, rte_lcore_id()) < 0)
-		rte_panic("cannot set affinity\n");
+	pthread_attr_destroy(&attr);
 
 	return 0;
 }
+
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore
  2014-12-11  2:04 [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Cunming Liang
                   ` (6 preceding siblings ...)
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 7/7] eal: macro for cpuset w/ or w/o CPU_ALLOC Cunming Liang
@ 2014-12-11  2:54 ` Jayakumar, Muthurajan
  2014-12-11  9:56 ` Walukiewicz, Miroslaw
  8 siblings, 0 replies; 37+ messages in thread
From: Jayakumar, Muthurajan @ 2014-12-11  2:54 UTC (permalink / raw)
  To: Liang, Cunming, dev

Steve, 

Great write-up.
Nice explanation of 1) the per-lcore numbering and 2) the multi-producer/consumer enqueue/dequeue behaviour.

Thanks,

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming Liang
Sent: Wednesday, December 10, 2014 6:05 PM
To: dev@dpdk.org
Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore


Scope & Usage Scenario
========================  

DPDK usually pins one pthread per core to avoid the overhead of task switching. This helps performance a lot, but it is not efficient in all cases. Sometimes it is too expensive to dedicate a whole core to a lightweight workload. It is a reasonable demand to run multiple threads per core, with each thread getting an assigned share of the CPU.

In fact, nothing prevents a user from creating normal pthreads and using cgroups to control the CPU shares. One purpose of this patchset is to close the gaps that prevent more DPDK libraries from being used in such normal pthreads. In addition, it demonstrates the performance gained by proactively yielding in the packet I/O idle loop. It also provides several 'rte_pthread_*' APIs to make life easier.


Changes to DPDK libraries
==========================

Some DPDK libraries must run in the DPDK environment.

# rte_mempool

The rte_mempool documentation states that a thread not created by the EAL must not use mempools. The root cause is that the mempool keeps a per-lcore cache internally, and 'rte_lcore_id()' does not return a correct value in such a thread.

The patchset changes this a little. The index into the mempool cache is no longer the lcore_id; instead it is a linear number handed out by an allocator. Legacy EAL per-lcore threads obtain a unique linear id automatically during creation, while a normal pthread that wants to use rte_mempool must request a linear id explicitly. The mempool cache thereby becomes a per-thread resource, indexed by this linear thread id.

However, there's another problem: rte_mempool is not preemptable. The problem comes from rte_ring, so the two are discussed together in the next section.

# rte_ring

rte_ring supports multi-producer enqueue and multi-consumer dequeue, but it is not preemptable. There was a conversation about this before.
http://dpdk.org/ml/archives/dev/2013-November/000714.html

Let's say two pthreads running on the same core enqueue to the same rte_ring. If the 1st pthread is preempted by the 2nd pthread after it has already modified prod.head, the 2nd pthread will spin until the 1st one is scheduled again, which wastes time. It gets even worse if the 2nd pthread has strictly higher priority.

But that doesn't mean we can't use it. We just need to narrow down the situations in which it is used by multiple pthreads on the same core.
- It CAN be used in any single-producer or single-consumer situation.
- It MAY be used by multi-producer/consumer pthreads whose scheduling policies are all SCHED_OTHER (CFS). Users SHOULD be aware of the performance penalty before using it.
- It MUST NOT be used by multi-producer/consumer pthreads when some of their scheduling policies are SCHED_FIFO or SCHED_RR.


Performance
==============

Performance is lost by introducing task switching. From the packet I/O perspective, we can gain some of it back by improving the effective I/O rate. When a pthread is idle-looping on an empty rx queue, it should proactively yield. We can also slow down rx for a short while to take more advantage of bulk receiving in the next loop. In practice, increasing the rx ring size also helps to improve the overall throughput.


Cgroup Control
================

Here's a simple example: four pthreads do packet I/O on the same core, and we expect their CPU shares to be in the ratio 1:1:2:4.
> mkdir /sys/fs/cgroup/cpu/dpdk
> mkdir /sys/fs/cgroup/cpu/dpdk/thread0
> mkdir /sys/fs/cgroup/cpu/dpdk/thread1
> mkdir /sys/fs/cgroup/cpu/dpdk/thread2
> mkdir /sys/fs/cgroup/cpu/dpdk/thread3
> cd /sys/fs/cgroup/cpu/dpdk
> echo 256 > thread0/cpu.shares
> echo 256 > thread1/cpu.shares
> echo 512 > thread2/cpu.shares
> echo 1024 > thread3/cpu.shares


-END-

Any comments are welcome.

Thanks

Cunming Liang (7):
  eal: add linear thread id as pthread-local variable
  mempool: use linear-tid as mempool cache index
  ring: use linear-tid as ring debug stats index
  eal: add simple API for multi-pthread
  testpmd: support multi-pthread mode
  sample: add new sample for multi-pthread
  eal: macro for cpuset w/ or w/o CPU_ALLOC

 app/test-pmd/cmdline.c                    |  41 +++++
 app/test-pmd/testpmd.c                    |  84 ++++++++-
 app/test-pmd/testpmd.h                    |   1 +
 config/common_linuxapp                    |   1 +
 examples/multi-pthread/Makefile           |  57 ++++++
 examples/multi-pthread/main.c             | 232 ++++++++++++++++++++++++
 examples/multi-pthread/main.h             |  46 +++++
 lib/librte_eal/common/include/rte_eal.h   |  15 ++
 lib/librte_eal/common/include/rte_lcore.h |  12 ++
 lib/librte_eal/linuxapp/eal/eal_thread.c  | 282 +++++++++++++++++++++++++++---
 lib/librte_mempool/rte_mempool.h          |  22 +--
 lib/librte_ring/rte_ring.h                |   6 +-
 12 files changed, 755 insertions(+), 44 deletions(-)
 create mode 100644 examples/multi-pthread/Makefile
 create mode 100644 examples/multi-pthread/main.c
 create mode 100644 examples/multi-pthread/main.h

--
1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore
  2014-12-11  2:04 [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Cunming Liang
                   ` (7 preceding siblings ...)
  2014-12-11  2:54 ` [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore Jayakumar, Muthurajan
@ 2014-12-11  9:56 ` Walukiewicz, Miroslaw
  2014-12-12  5:44   ` Liang, Cunming
  8 siblings, 1 reply; 37+ messages in thread
From: Walukiewicz, Miroslaw @ 2014-12-11  9:56 UTC (permalink / raw)
  To: Liang, Cunming, dev

Thank you, Cunming, for the explanation.

What about DPDK timers? They also depend on rte_lcore_id() to avoid spinlocks. 

Mirek

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming Liang
> Sent: Thursday, December 11, 2014 3:05 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore
> 
> 
> Scope & Usage Scenario
> ========================
> 
> DPDK usually pins one pthread per core to avoid the overhead of task
> switching. This helps performance a lot, but it is not efficient in all
> cases. Sometimes it is too expensive to dedicate a whole core to a
> lightweight workload. It is a reasonable demand to run multiple threads
> per core, with each thread getting an assigned share of the CPU.
> 
> In fact, nothing prevents a user from creating normal pthreads and using
> cgroups to control the CPU shares. One purpose of this patchset is to
> close the gaps that prevent more DPDK libraries from being used in such
> normal pthreads. In addition, it demonstrates the performance gained by
> proactively yielding in the packet I/O idle loop. It also provides
> several 'rte_pthread_*' APIs to make life easier.
> 
> 
> Changes to DPDK libraries
> ==========================
> 
> Some DPDK libraries must run in the DPDK environment.
> 
> # rte_mempool
> 
> The rte_mempool documentation states that a thread not created by the
> EAL must not use mempools. The root cause is that the mempool keeps a
> per-lcore cache internally, and 'rte_lcore_id()' does not return a
> correct value in such a thread.
> 
> The patchset changes this a little. The index into the mempool cache is
> no longer the lcore_id; instead it is a linear number handed out by an
> allocator. Legacy EAL per-lcore threads obtain a unique linear id
> automatically during creation, while a normal pthread that wants to use
> rte_mempool must request a linear id explicitly. The mempool cache
> thereby becomes a per-thread resource, indexed by this linear thread id.
> 
> However, there's another problem: rte_mempool is not preemptable. The
> problem comes from rte_ring, so the two are discussed together in the
> next section.
> 
> # rte_ring
> 
> rte_ring supports multi-producer enqueue and multi-consumer dequeue,
> but it is not preemptable. There was a conversation about this before.
> http://dpdk.org/ml/archives/dev/2013-November/000714.html
> 
> Let's say two pthreads running on the same core enqueue to the same
> rte_ring. If the 1st pthread is preempted by the 2nd pthread after it
> has already modified prod.head, the 2nd pthread will spin until the 1st
> one is scheduled again, which wastes time. It gets even worse if the
> 2nd pthread has strictly higher priority.
> 
> But that doesn't mean we can't use it. We just need to narrow down the
> situations in which it is used by multiple pthreads on the same core.
> - It CAN be used in any single-producer or single-consumer situation.
> - It MAY be used by multi-producer/consumer pthreads whose scheduling
> policies are all SCHED_OTHER (CFS). Users SHOULD be aware of the
> performance penalty before using it.
> - It MUST NOT be used by multi-producer/consumer pthreads when some of
> their scheduling policies are SCHED_FIFO or SCHED_RR.
> 
> 
> Performance
> ==============
> 
> Performance is lost by introducing task switching. From the packet I/O
> perspective, we can gain some of it back by improving the effective I/O
> rate. When a pthread is idle-looping on an empty rx queue, it should
> proactively yield. We can also slow down rx for a short while to take
> more advantage of bulk receiving in the next loop. In practice,
> increasing the rx ring size also helps to improve the overall
> throughput.
> 
> 
> Cgroup Control
> ================
> 
> Here's a simple example: four pthreads do packet I/O on the same core,
> and we expect their CPU shares to be in the ratio 1:1:2:4.
> > mkdir /sys/fs/cgroup/cpu/dpdk
> > mkdir /sys/fs/cgroup/cpu/dpdk/thread0
> > mkdir /sys/fs/cgroup/cpu/dpdk/thread1
> > mkdir /sys/fs/cgroup/cpu/dpdk/thread2
> > mkdir /sys/fs/cgroup/cpu/dpdk/thread3
> > cd /sys/fs/cgroup/cpu/dpdk
> > echo 256 > thread0/cpu.shares
> > echo 256 > thread1/cpu.shares
> > echo 512 > thread2/cpu.shares
> > echo 1024 > thread3/cpu.shares
> 
> 
> -END-
> 
> Any comments are welcome.
> 
> Thanks
> 
> *** BLURB HERE ***
> 
> Cunming Liang (7):
>   eal: add linear thread id as pthread-local variable
>   mempool: use linear-tid as mempool cache index
>   ring: use linear-tid as ring debug stats index
>   eal: add simple API for multi-pthread
>   testpmd: support multi-pthread mode
>   sample: add new sample for multi-pthread
>   eal: macro for cpuset w/ or w/o CPU_ALLOC
> 
>  app/test-pmd/cmdline.c                    |  41 +++++
>  app/test-pmd/testpmd.c                    |  84 ++++++++-
>  app/test-pmd/testpmd.h                    |   1 +
>  config/common_linuxapp                    |   1 +
>  examples/multi-pthread/Makefile           |  57 ++++++
>  examples/multi-pthread/main.c             | 232 ++++++++++++++++++++++++
>  examples/multi-pthread/main.h             |  46 +++++
>  lib/librte_eal/common/include/rte_eal.h   |  15 ++
>  lib/librte_eal/common/include/rte_lcore.h |  12 ++
>  lib/librte_eal/linuxapp/eal/eal_thread.c  | 282
> +++++++++++++++++++++++++++---
>  lib/librte_mempool/rte_mempool.h          |  22 +--
>  lib/librte_ring/rte_ring.h                |   6 +-
>  12 files changed, 755 insertions(+), 44 deletions(-)
>  create mode 100644 examples/multi-pthread/Makefile
>  create mode 100644 examples/multi-pthread/main.c
>  create mode 100644 examples/multi-pthread/main.h
> 
> --
> 1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-11  9:56 ` Walukiewicz, Miroslaw
@ 2014-12-12  5:44   ` Liang, Cunming
  2014-12-15 11:10     ` Walukiewicz, Miroslaw
  0 siblings, 1 reply; 37+ messages in thread
From: Liang, Cunming @ 2014-12-12  5:44 UTC (permalink / raw)
  To: Walukiewicz, Miroslaw, dev

Thanks Mirek. That's a good point which wasn't mentioned in the cover letter.
For 'rte_timer', I only expect it to be used within the 'legacy per-lcore' pthreads.
I'd appreciate it if you could give me some cases that it can't fit.
If 'rte_timer' has to be used in multi-pthread mode, there are some prerequisites and limitations.
1. Make sure the thread-local variable 'lcore_id' is set correctly (e.g. do the pthread init via rte_pthread_prepare).
2. As 'rte_timer' is not preemptable, when using rte_timer_manage/reset in multi-pthread mode, make sure the callers are not on the same core.
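
A minimal sketch of what that might look like (illustration only, not code from the patchset; rte_pthread_prepare() is assumed to be the RFC helper that sets the per-thread id):

/*
 * Sketch only: rte_pthread_prepare() is assumed to set the per-thread
 * 'lcore_id'/linear tid (point 1). Only this thread's core runs
 * rte_timer_manage()/rte_timer_reset() here (point 2).
 */
#include <rte_lcore.h>
#include <rte_cycles.h>
#include <rte_timer.h>

static struct rte_timer tim;

static void timer_cb(struct rte_timer *t, void *arg)
{
	(void)t;
	(void)arg;
	/* periodic work goes here */
}

static void *timer_pthread(void *arg)
{
	(void)arg;
	rte_pthread_prepare();                        /* point 1 */

	rte_timer_init(&tim);
	rte_timer_reset(&tim, rte_get_timer_hz(),     /* fire once per second */
			PERIODICAL, rte_lcore_id(), timer_cb, NULL);

	for (;;)
		rte_timer_manage();                   /* point 2 */
	return NULL;
}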

-Cunming

> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Thursday, December 11, 2014 5:57 PM
> To: Liang, Cunming; dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> Thank you Cunming for explanation.
> 
> What about DPDK timers? They also depend on rte_lcore_id() to avoid spinlocks.
> 
> Mirek
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming Liang
> > Sent: Thursday, December 11, 2014 3:05 AM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> >
> > [...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-12  5:44   ` Liang, Cunming
@ 2014-12-15 11:10     ` Walukiewicz, Miroslaw
  2014-12-15 11:53       ` Liang, Cunming
  0 siblings, 1 reply; 37+ messages in thread
From: Walukiewicz, Miroslaw @ 2014-12-15 11:10 UTC (permalink / raw)
  To: Liang, Cunming, dev

Hi Cunming, 

The timers could be used by any application/library started as a standard pthread.
Each pthread needs to have some identifier assigned, the same way as you are doing it for mempools (the rte_linear_thread_id and rte_lcore_id are good examples).

I made a series of patches extending the rte_timer API to work with such an identifier, while keeping the existing API working as well.

I will send them soon.

Mirek


> -----Original Message-----
> From: Liang, Cunming
> Sent: Friday, December 12, 2014 6:45 AM
> To: Walukiewicz, Miroslaw; dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> Thanks Mirek. That's a good point which wasn't mentioned in cover letter.
> For 'rte_timer', I only expect it be used within the 'legacy per-lcore' pthread.
> I'm appreciate if you can give me some cases which can't use it to fit.
> In case have to use 'rte_timer' in multi-pthread, there are some
> prerequisites and limitations.
> 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do pthread
> init by rte_pthread_prepare)
> 2. As 'rte_timer' is not preemptable, when using rte_timer_manager/reset in
> multi-pthread, make sure they're not on the same core.
> 
> -Cunming
> 
> > -----Original Message-----
> > From: Walukiewicz, Miroslaw
> > Sent: Thursday, December 11, 2014 5:57 PM
> > To: Liang, Cunming; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > Thank you Cunming for explanation.
> >
> > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> spinlocks.
> >
> > Mirek
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming Liang
> > > Sent: Thursday, December 11, 2014 3:05 AM
> > > To: dev@dpdk.org
> > > Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > >
> > > [...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-15 11:10     ` Walukiewicz, Miroslaw
@ 2014-12-15 11:53       ` Liang, Cunming
  2014-12-18 12:20         ` Walukiewicz, Miroslaw
  0 siblings, 1 reply; 37+ messages in thread
From: Liang, Cunming @ 2014-12-15 11:53 UTC (permalink / raw)
  To: Walukiewicz, Miroslaw, dev

Hi Mirek,

That sounds great.
Looking forward to it.

-Cunming

> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Monday, December 15, 2014 7:11 PM
> To: Liang, Cunming; dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> Hi Cunming,
> 
> The timers could be used by any application/library started as a standard
> pthread.
> Each pthread needs to have assigned some identifier same way as you are doing
> it for mempools (the rte_linear_thread_id and rte_lcore_id are good examples)
> 
> I made series of patches extending the rte timers API to use with such kind of
> identifier keeping existing API working also.
> 
> I will send it soon.
> 
> Mirek
> 
> 
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Friday, December 12, 2014 6:45 AM
> > To: Walukiewicz, Miroslaw; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > Thanks Mirek. That's a good point which wasn't mentioned in cover letter.
> > For 'rte_timer', I only expect it be used within the 'legacy per-lcore' pthread.
> > I'm appreciate if you can give me some cases which can't use it to fit.
> > In case have to use 'rte_timer' in multi-pthread, there are some
> > prerequisites and limitations.
> > 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do pthread
> > init by rte_pthread_prepare)
> > 2. As 'rte_timer' is not preemptable, when using rte_timer_manager/reset in
> > multi-pthread, make sure they're not on the same core.
> >
> > -Cunming
> >
> > > -----Original Message-----
> > > From: Walukiewicz, Miroslaw
> > > Sent: Thursday, December 11, 2014 5:57 PM
> > > To: Liang, Cunming; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Thank you Cunming for explanation.
> > >
> > > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> > spinlocks.
> > >
> > > Mirek
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming Liang
> > > > Sent: Thursday, December 11, 2014 3:05 AM
> > > > To: dev@dpdk.org
> > > > Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > >
> > > > [...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable Cunming Liang
@ 2014-12-16  7:00   ` Qiu, Michael
  2014-12-22 19:02   ` Ananyev, Konstantin
  1 sibling, 0 replies; 37+ messages in thread
From: Qiu, Michael @ 2014-12-16  7:00 UTC (permalink / raw)
  To: Liang, Cunming, dev

On 12/11/2014 10:05 AM, Cunming Liang wrote:
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> ---
>  lib/librte_eal/common/include/rte_eal.h   |   5 ++
>  lib/librte_eal/common/include/rte_lcore.h |  12 ++++
>  lib/librte_eal/linuxapp/eal/eal_thread.c  | 115 ++++++++++++++++++++++++++++--
>  3 files changed, 126 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> index f4ecd2e..2640167 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -262,6 +262,11 @@ rte_set_application_usage_hook( rte_usage_hook_t usage_func );
>   */
>  int rte_eal_has_hugepages(void);
>  
> +#ifndef RTE_MAX_THREAD
> +#define RTE_MAX_THREAD                RTE_MAX_LCORE
> +#endif
> +
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
> index 49b2c03..cd83d47 100644
> --- a/lib/librte_eal/common/include/rte_lcore.h
> +++ b/lib/librte_eal/common/include/rte_lcore.h
> @@ -73,6 +73,7 @@ struct lcore_config {
>  extern struct lcore_config lcore_config[RTE_MAX_LCORE];
>  
>  RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */
> +RTE_DECLARE_PER_LCORE(unsigned, _thread_id); /**< Per thread "linear tid". */
>  
>  /**
>   * Return the ID of the execution unit we are running on.
> @@ -86,6 +87,17 @@ rte_lcore_id(void)
>  }
>  
>  /**
> + * Return the linear thread ID of the cache unit we are running on.
> + * @return
> + *   core ID
> + */
> +static inline unsigned long
> +rte_linear_thread_id(void)
> +{
> +	return RTE_PER_LCORE(_thread_id);
> +}
> +
> +/**
>   * Get the id of the master lcore
>   *
>   * @return
> diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c
> index 80a985f..52478d6 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_thread.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
> @@ -39,6 +39,7 @@
>  #include <pthread.h>
>  #include <sched.h>
>  #include <sys/queue.h>
> +#include <string.h>
>  
>  #include <rte_debug.h>
>  #include <rte_atomic.h>
> @@ -51,12 +52,19 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> +#include <rte_spinlock.h>
> +#include <rte_common.h>
>  
>  #include "eal_private.h"
>  #include "eal_thread.h"
>  
> +#define LINEAR_THREAD_ID_POOL        "THREAD_ID_POOL"
> +
>  RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
>  
> +/* define linear thread id as thread-local variables */
> +RTE_DEFINE_PER_LCORE(unsigned, _thread_id);
> +
>  /*
>   * Send a message to a slave lcore identified by slave_id to call a
>   * function f with argument arg. Once the execution is done, the
> @@ -94,12 +102,13 @@ rte_eal_remote_launch(int (*f)(void *), void *arg, unsigned slave_id)
>  	return 0;
>  }
>  
> +
>  /* set affinity for current thread */
>  static int
> -eal_thread_set_affinity(void)
> +__eal_thread_set_affinity(pthread_t thread, unsigned lcore)
>  {
> +
>  	int s;
> -	pthread_t thread;
>  
>  /*
>   * According to the section VERSIONS of the CPU_ALLOC man page:
> @@ -126,9 +135,8 @@ eal_thread_set_affinity(void)
>  
>  	size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
>  	CPU_ZERO_S(size, cpusetp);
> -	CPU_SET_S(rte_lcore_id(), size, cpusetp);
> +	CPU_SET_S(lcore, size, cpusetp);
>  
> -	thread = pthread_self();
>  	s = pthread_setaffinity_np(thread, size, cpusetp);
>  	if (s != 0) {
>  		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
> @@ -140,9 +148,8 @@ eal_thread_set_affinity(void)
>  #else /* CPU_ALLOC */
>  	cpu_set_t cpuset;
>  	CPU_ZERO( &cpuset );
> -	CPU_SET( rte_lcore_id(), &cpuset );
> +	CPU_SET(lcore, &cpuset );
>  
> -	thread = pthread_self();
>  	s = pthread_setaffinity_np(thread, sizeof( cpuset ), &cpuset);
>  	if (s != 0) {
>  		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
> @@ -152,6 +159,15 @@ eal_thread_set_affinity(void)
>  	return 0;
>  }
>  
> +/* set affinity for current thread */
> +static int
> +eal_thread_set_affinity(void)
> +{
> +	pthread_t thread = pthread_self();
> +
> +	return __eal_thread_set_affinity(thread, rte_lcore_id());
> +}
> +
>  void eal_thread_init_master(unsigned lcore_id)
>  {
>  	/* set the lcore ID in per-lcore memory area */
> @@ -162,6 +178,87 @@ void eal_thread_init_master(unsigned lcore_id)
>  		rte_panic("cannot set affinity\n");
>  }
>  
> +/* linear thread id control block */
> +struct eal_thread_cb {
> +	rte_spinlock_t lock;
> +	uint64_t nb_bucket;
> +	uint64_t bitmap[0];
> +};
> +

Can this struct be declared in header files?
 
> +static struct eal_thread_cb *
> +__create_tid_pool(void)
> +{
> +	const struct rte_memzone *mz;
> +	struct eal_thread_cb *pcb;
> +	uint64_t sz;
> +	uint64_t nb_bucket;
> +
> +	nb_bucket = RTE_ALIGN_CEIL(RTE_MAX_THREAD, 64) / 64;

Is it better to replace the division with a right shift?

nb_bucket = RTE_ALIGN_CEIL(RTE_MAX_THREAD, 64) >> 6;


> +	sz = sizeof(*pcb) + nb_bucket * sizeof(uint64_t);
> +	mz = rte_memzone_reserve(LINEAR_THREAD_ID_POOL,
> +				 sz, rte_socket_id(), 0);
> +	if (mz == NULL)
> +		rte_panic("Cannot allocate linear thread ID pool\n");
> +
> +	pcb = mz->addr;
> +	rte_spinlock_init(&pcb->lock);
> +	pcb->nb_bucket = nb_bucket;
> +	memset(pcb->bitmap, 0, nb_bucket * sizeof(uint64_t));
> +
> +	return pcb;
> +}
> +
> +static int
> +__get_linear_tid(uint64_t *tid)
> +{
> +	const struct rte_memzone *mz;
> +	struct eal_thread_cb *pcb;
> +	uint64_t i;
> +	uint8_t shift = 0;
> +
> +	mz = rte_memzone_lookup(LINEAR_THREAD_ID_POOL);
> +	if (mz != NULL)
> +		pcb = mz->addr;
> +	else
> +		pcb = __create_tid_pool();
> +
> +	rte_spinlock_lock(&pcb->lock);
> +	for (i = 0; i < pcb->nb_bucket; i++) {
> +		if (pcb->bitmap[i] == (uint64_t)-1)

For a bitmap with all bits set, it is better to use ~0 (or ~(uint64_t)0) instead
of (uint64_t)-1.
 
> +			continue;
> +		shift = 0; 
> +		while (pcb->bitmap[i] & (1UL << shift))
> +			shift ++;
> +		pcb->bitmap[i] |= (1UL << shift);
> +		break;
> +	}
> +	rte_spinlock_unlock(&pcb->lock);
> +
> +	if (i == pcb->nb_bucket)
> +		return -1;
> +
> +	*tid = i * 64 + shift;
> +	return 0;
> +}
> +
> +static void __rte_unused
> +__put_linear_tid(uint64_t tid)
> +{
> +	const struct rte_memzone *mz;
> +	struct eal_thread_cb *pcb;
> +	uint8_t shift;
> +
> +	mz = rte_memzone_lookup(LINEAR_THREAD_ID_POOL);
> +	if (!mz)
> +		return;
> +
> +	pcb = mz->addr;
> +	rte_spinlock_lock(&pcb->lock);
> +	shift = tid & 0x3F;
> +	pcb->bitmap[tid / 64] &= ~(1UL << shift);

tid >> 6


> +	rte_spinlock_unlock(&pcb->lock);	
> +}
> +
>  /* main loop of threads */
>  __attribute__((noreturn)) void *
>  eal_thread_loop(__attribute__((unused)) void *arg)
> @@ -169,6 +266,7 @@ eal_thread_loop(__attribute__((unused)) void *arg)
>  	char c;
>  	int n, ret;
>  	unsigned lcore_id;
> +	unsigned long ltid = 0;
>  	pthread_t thread_id;
>  	int m2s, s2m;
>  
> @@ -191,6 +289,11 @@ eal_thread_loop(__attribute__((unused)) void *arg)
>  	/* set the lcore ID in per-lcore memory area */
>  	RTE_PER_LCORE(_lcore_id) = lcore_id;
>  
> +	/* set the linear thread ID in per-lcore memory area */
> +	if (__get_linear_tid(&ltid) < 0)
> +		rte_panic("cannot get cache slot id\n");
> +	RTE_PER_LCORE(_thread_id) = ltid;
> +
>  	/* set CPU affinity */
>  	if (eal_thread_set_affinity() < 0)
>  		rte_panic("cannot set affinity\n");


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-15 11:53       ` Liang, Cunming
@ 2014-12-18 12:20         ` Walukiewicz, Miroslaw
  2014-12-18 14:32           ` Bruce Richardson
                             ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Walukiewicz, Miroslaw @ 2014-12-18 12:20 UTC (permalink / raw)
  To: Liang, Cunming, dev

I have another question regarding your patch.

Could we extend the values returned by rte_lcore_id() and set them per thread (a DPDK lcore is really a pthread, just started on a specific core) instead of creating a separate linear thread id?

The patch would be much simpler and would work the same way. The only change would be assigning a new rte_lcore_id value when rte_pthread_create() is called.

The value __lcore_id already has the __thread attribute, which means it is valid not only per CPU core but also per thread.

The mempools, timers and statistics would work without any modifications in that environment.

I do not see any reason why old legacy DPDK applications would not work in that model.
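
A minimal sketch of that model (illustration only: this rte_pthread_create() is hypothetical, and the id would come from some allocator in a real patch):

/*
 * Sketch only: an extra pthread reuses the per-thread __lcore_id TLS
 * variable instead of a separate linear tid. rte_pthread_create() is a
 * hypothetical helper, not an existing EAL API.
 */
#include <stdlib.h>
#include <pthread.h>
#include <rte_lcore.h>
#include <rte_per_lcore.h>

struct pthread_ctx {
	unsigned lcore_id;        /* id assigned to this extra pthread */
	void *(*fn)(void *);
	void *arg;
};

static void *pthread_entry(void *p)
{
	struct pthread_ctx *ctx = p;

	/* same TLS variable the EAL per-lcore threads use */
	RTE_PER_LCORE(_lcore_id) = ctx->lcore_id;
	return ctx->fn(ctx->arg);
}

static int
rte_pthread_create(pthread_t *tid, unsigned lcore_id,
		   void *(*fn)(void *), void *arg)
{
	struct pthread_ctx *ctx = malloc(sizeof(*ctx));

	if (ctx == NULL)
		return -1;
	ctx->lcore_id = lcore_id; /* would come from a free-slot allocator */
	ctx->fn = fn;
	ctx->arg = arg;
	return pthread_create(tid, NULL, pthread_entry, ctx);
}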

Mirek

> -----Original Message-----
> From: Liang, Cunming
> Sent: Monday, December 15, 2014 12:53 PM
> To: Walukiewicz, Miroslaw; dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> Hi Mirek,
> 
> That sounds great.
> Looking forward to it.
> 
> -Cunming
> 
> > -----Original Message-----
> > From: Walukiewicz, Miroslaw
> > Sent: Monday, December 15, 2014 7:11 PM
> > To: Liang, Cunming; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > Hi Cunming,
> >
> > The timers could be used by any application/library started as a standard
> > pthread.
> > Each pthread needs to have assigned some identifier same way as you are
> doing
> > it for mempools (the rte_linear_thread_id and rte_lcore_id are good
> examples)
> >
> > I made series of patches extending the rte timers API to use with such kind
> of
> > identifier keeping existing API working also.
> >
> > I will send it soon.
> >
> > Mirek
> >
> >
> > > -----Original Message-----
> > > From: Liang, Cunming
> > > Sent: Friday, December 12, 2014 6:45 AM
> > > To: Walukiewicz, Miroslaw; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Thanks Mirek. That's a good point which wasn't mentioned in cover
> letter.
> > > For 'rte_timer', I only expect it be used within the 'legacy per-lcore'
> pthread.
> > > I'm appreciate if you can give me some cases which can't use it to fit.
> > > In case have to use 'rte_timer' in multi-pthread, there are some
> > > prerequisites and limitations.
> > > 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do
> pthread
> > > init by rte_pthread_prepare)
> > > 2. As 'rte_timer' is not preemptable, when using
> rte_timer_manager/reset in
> > > multi-pthread, make sure they're not on the same core.
> > >
> > > -Cunming
> > >
> > > > -----Original Message-----
> > > > From: Walukiewicz, Miroslaw
> > > > Sent: Thursday, December 11, 2014 5:57 PM
> > > > To: Liang, Cunming; dev@dpdk.org
> > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per
> lcore
> > > >
> > > > Thank you Cunming for explanation.
> > > >
> > > > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> > > spinlocks.
> > > >
> > > > Mirek
> > > >
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming
> Liang
> > > > > Sent: Thursday, December 11, 2014 3:05 AM
> > > > > To: dev@dpdk.org
> > > > > Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > > >
> > > > >
> > > > > [...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-18 12:20         ` Walukiewicz, Miroslaw
@ 2014-12-18 14:32           ` Bruce Richardson
  2014-12-18 15:11             ` Olivier MATZ
  2014-12-18 16:15           ` Stephen Hemminger
  2014-12-19  1:28           ` Liang, Cunming
  2 siblings, 1 reply; 37+ messages in thread
From: Bruce Richardson @ 2014-12-18 14:32 UTC (permalink / raw)
  To: Walukiewicz, Miroslaw; +Cc: dev

On Thu, Dec 18, 2014 at 12:20:07PM +0000, Walukiewicz, Miroslaw wrote:
> I have another question regarding your patch.
> 
>  Could we extend values returned by rte_lcore_id() to set them per thread (really the DPDK lcore is a pthread but started on specific core) instead of creating linear thread id. 
> 
> The patch would be much simpler and will work same way. The only change would be extending rte_lcore_id when rte_pthread_create() is called. 
> 
> The value __lcore_id has really an attribute __thread that means it is valid not only per CPU core but also per thread.
> 
> The mempools, timers, statistics would work without any modifications in that environment.
> 
>  I do not see any reason why old legacy DPDK applications would not work in that model. 
> 
> Mirek

Definite +1 here. 

/Bruce

> 
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Monday, December 15, 2014 12:53 PM
> > To: Walukiewicz, Miroslaw; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > 
> > Hi Mirek,
> > 
> > That sounds great.
> > Looking forward to it.
> > 
> > -Cunming
> > 
> > > -----Original Message-----
> > > From: Walukiewicz, Miroslaw
> > > Sent: Monday, December 15, 2014 7:11 PM
> > > To: Liang, Cunming; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Hi Cunming,
> > >
> > > The timers could be used by any application/library started as a standard
> > > pthread.
> > > Each pthread needs to have assigned some identifier same way as you are
> > doing
> > > it for mempools (the rte_linear_thread_id and rte_lcore_id are good
> > examples)
> > >
> > > I made series of patches extending the rte timers API to use with such kind
> > of
> > > identifier keeping existing API working also.
> > >
> > > I will send it soon.
> > >
> > > Mirek
> > >
> > >
> > > > -----Original Message-----
> > > > From: Liang, Cunming
> > > > Sent: Friday, December 12, 2014 6:45 AM
> > > > To: Walukiewicz, Miroslaw; dev@dpdk.org
> > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > > Thanks Mirek. That's a good point which wasn't mentioned in cover
> > letter.
> > > > For 'rte_timer', I only expect it be used within the 'legacy per-lcore'
> > pthread.
> > > > I'm appreciate if you can give me some cases which can't use it to fit.
> > > > In case have to use 'rte_timer' in multi-pthread, there are some
> > > > prerequisites and limitations.
> > > > 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do
> > pthread
> > > > init by rte_pthread_prepare)
> > > > 2. As 'rte_timer' is not preemptable, when using
> > rte_timer_manager/reset in
> > > > multi-pthread, make sure they're not on the same core.
> > > >
> > > > -Cunming
> > > >
> > > > > -----Original Message-----
> > > > > From: Walukiewicz, Miroslaw
> > > > > Sent: Thursday, December 11, 2014 5:57 PM
> > > > > To: Liang, Cunming; dev@dpdk.org
> > > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per
> > lcore
> > > > >
> > > > > Thank you Cunming for explanation.
> > > > >
> > > > > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> > > > spinlocks.
> > > > >
> > > > > Mirek
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming
> > Liang
> > > > > > Sent: Thursday, December 11, 2014 3:05 AM
> > > > > > To: dev@dpdk.org
> > > > > > Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > > > >
> > > > > >
> > > > > > [...]
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-18 14:32           ` Bruce Richardson
@ 2014-12-18 15:11             ` Olivier MATZ
  2014-12-18 16:04               ` Bruce Richardson
  0 siblings, 1 reply; 37+ messages in thread
From: Olivier MATZ @ 2014-12-18 15:11 UTC (permalink / raw)
  To: Bruce Richardson, Walukiewicz, Miroslaw; +Cc: dev

Hi,

On 12/18/2014 03:32 PM, Bruce Richardson wrote:
> On Thu, Dec 18, 2014 at 12:20:07PM +0000, Walukiewicz, Miroslaw wrote:
>> I have another question regarding your patch.
>>
>>  Could we extend values returned by rte_lcore_id() to set them per thread (really the DPDK lcore is a pthread but started on specific core) instead of creating linear thread id. 
>>
>> The patch would be much simpler and will work same way. The only change would be extending rte_lcore_id when rte_pthread_create() is called. 
>>
>> The value __lcore_id has really an attribute __thread that means it is valid not only per CPU core but also per thread.
>>
>> The mempools, timers, statistics would work without any modifications in that environment.
>>
>>  I do not see any reason why old legacy DPDK applications would not work in that model. 
>>
>> Mirek
> 
> Definite +1 here. 

One remark though: it looks like the rte_rings (and therefore the
rte_mempools) are designed with the assumption that the execution
units are alone on their cores.

As explained in [1], there is a risk that a pthread is interrupted
by the kernel at a bad moment. Another thread can then be left
blocked, spinning on a variable and waiting for its value to change.

The same could also occur with spinlocks, which are not designed
to wake up another pthread when the lock is held (as pthread locks are).

And finally, having several pthreads per core implies that the
application should be designed with large queues: if a pthread is
not scheduled for 10 ms, that represents 100K packets at 10M PPS.
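
A quick back-of-the-envelope check of that figure (standalone example, values taken from the sentence above):

/* 10 Mpps ingress and a 10 ms scheduling gap, as in the example above. */
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
	uint64_t pps = 10 * 1000 * 1000;           /* packets per second */
	uint64_t stall_us = 10 * 1000;             /* 10 ms, in microseconds */
	uint64_t backlog = pps * stall_us / 1000000;

	printf("packets arriving during the stall: %" PRIu64 "\n", backlog);
	return 0;
}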

I don't say it's impossible to do it, but I think it's not so
simple :)

Regards,
Olivier

[1] http://dpdk.org/ml/archives/dev/2013-November/000714.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-18 15:11             ` Olivier MATZ
@ 2014-12-18 16:04               ` Bruce Richardson
  0 siblings, 0 replies; 37+ messages in thread
From: Bruce Richardson @ 2014-12-18 16:04 UTC (permalink / raw)
  To: Olivier MATZ; +Cc: dev

On Thu, Dec 18, 2014 at 04:11:12PM +0100, Olivier MATZ wrote:
> Hi,
> 
> On 12/18/2014 03:32 PM, Bruce Richardson wrote:
> > On Thu, Dec 18, 2014 at 12:20:07PM +0000, Walukiewicz, Miroslaw wrote:
> >> I have another question regarding your patch.
> >>
> >>  Could we extend values returned by rte_lcore_id() to set them per thread (really the DPDK lcore is a pthread but started on specific core) instead of creating linear thread id. 
> >>
> >> The patch would be much simpler and will work same way. The only change would be extending rte_lcore_id when rte_pthread_create() is called. 
> >>
> >> The value __lcore_id has really an attribute __thread that means it is valid not only per CPU core but also per thread.
> >>
> >> The mempools, timers, statistics would work without any modifications in that environment.
> >>
> >>  I do not see any reason why old legacy DPDK applications would not work in that model. 
> >>
> >> Mirek
> > 
> > Definite +1 here. 
> 
> One remark though: it looks that the rte_rings (and therefore the
> rte_mempools) are designed with the assumption that the execution
> units are alone on their cores.
> 
> As explained in [1], there is a risk that a pthread is interrupted
> by the kernel at a bad moment. Therefore another thread can be
> blocked, spinning on a variable to change its value.
> 
> The same could also occurs with spinlocks which are not designed
> to wakeup another pthread when the lock is held (like pthread_locks).
> 
> And finally, having several pthreads per core implies that the
> application should be designed with large queues: if a pthread is
> not scheduled during 10ms, it represents 100K packets at 10M PPS.
> 
> I don't say it's impossible to do it, but I think it's not so
> simple :)
> 
> Regards,
> Olivier
> 
> [1] http://dpdk.org/ml/archives/dev/2013-November/000714.html

Yes, completely agree, but because there are so many potential pitfalls, I
think we should take small steps without making major changes. Hence my agreement
with Mirek's suggestion as the simplest way forward that gives us good possibilities
and flexibility without major work.

BTW: For the multi-thread per core case, I think we'll need to look at threads
making extensive use of yields to help force the context switches at proper times.
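
A minimal sketch of that yield-on-idle loop, assuming a burst size of 32
and plain sched_yield() (the actual patch set may do this differently):

#include <sched.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static void
rx_loop(uint8_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *bufs[32];
	uint16_t nb, i;

	for (;;) {
		nb = rte_eth_rx_burst(port_id, queue_id, bufs, 32);
		if (nb == 0) {
			/* idle: give the CPU to other pthreads on this core */
			sched_yield();
			continue;
		}
		for (i = 0; i < nb; i++)
			rte_pktmbuf_free(bufs[i]);	/* stand-in for real work */
	}
}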

/Bruce

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-18 12:20         ` Walukiewicz, Miroslaw
  2014-12-18 14:32           ` Bruce Richardson
@ 2014-12-18 16:15           ` Stephen Hemminger
  2014-12-19  1:28           ` Liang, Cunming
  2 siblings, 0 replies; 37+ messages in thread
From: Stephen Hemminger @ 2014-12-18 16:15 UTC (permalink / raw)
  To: Walukiewicz, Miroslaw; +Cc: dev

On Thu, 18 Dec 2014 12:20:07 +0000
"Walukiewicz, Miroslaw" <Miroslaw.Walukiewicz@intel.com> wrote:

>  Could we extend values returned by rte_lcore_id() to set them per thread (really the DPDK lcore is a pthread but started on specific core) instead of creating linear thread id.

The linear thread id is very useful for having per-core statistics tables.
This is done in lots of places to avoid cache thrashing.
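
As an illustration, a minimal sketch of that pattern using the linear tid
from patch 1/7 (RTE_MAX_THREAD and rte_linear_thread_id() are the names
proposed there):

#include <stdint.h>
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_lcore.h>

struct thread_stats {
	uint64_t rx_pkts;
	uint64_t tx_pkts;
} __rte_cache_aligned;		/* one cache line per thread, no false sharing */

static struct thread_stats stats[RTE_MAX_THREAD];

static inline void
stats_add_rx(uint16_t n)
{
	stats[rte_linear_thread_id()].rx_pkts += n;
}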

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-18 12:20         ` Walukiewicz, Miroslaw
  2014-12-18 14:32           ` Bruce Richardson
  2014-12-18 16:15           ` Stephen Hemminger
@ 2014-12-19  1:28           ` Liang, Cunming
  2014-12-19 10:03             ` Bruce Richardson
  2 siblings, 1 reply; 37+ messages in thread
From: Liang, Cunming @ 2014-12-19  1:28 UTC (permalink / raw)
  To: Walukiewicz, Miroslaw, dev



> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Thursday, December 18, 2014 8:20 PM
> To: Liang, Cunming; dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> I have another question regarding your patch.
> 
>  Could we extend values returned by rte_lcore_id() to set them per thread (really
> the DPDK lcore is a pthread but started on specific core) instead of creating linear
> thread id.
[Liang, Cunming] As you said, __lcore_id is already per thread.
Semantically, though, it stands for the logical cpu id.
When multiple threads run on the same lcore, they should all get the same value back from rte_lcore_id().
The effect is the same as 'sched_getcpu()', but at a lower cost.
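A minimal illustration of that equivalence, assuming a thread affinitized
by EAL (sched_getcpu() needs _GNU_SOURCE on Linux):

#define _GNU_SOURCE
#include <sched.h>
#include <rte_lcore.h>

/* For a thread pinned by EAL, both calls report the same core number,
 * but rte_lcore_id() is a plain TLS read while sched_getcpu() goes
 * through the C library. */
static inline int
running_on_reported_core(void)
{
	return (int)rte_lcore_id() == sched_getcpu();
}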
> 
> The patch would be much simpler and will work same way. The only change
> would be extending rte_lcore_id when rte_pthread_create() is called.
[Liang, Cunming] I did consider having rte_lcore_id() return a unique id per pthread rather than adding a new API.
But then the name lcore no longer identifies a cpu id, which may impact existing applications that rely on its exact meaning.
What do you think?
> 
> The value __lcore_id has really an attribute __thread that means it is valid not
> only per CPU core but also per thread.
> 
> The mempools, timers, statistics would work without any modifications in that
> environment.
> 
>  I do not see any reason why old legacy DPDK applications would not work in that
> model.
> 
> Mirek
> 
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Monday, December 15, 2014 12:53 PM
> > To: Walukiewicz, Miroslaw; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > Hi Mirek,
> >
> > That sounds great.
> > Looking forward to it.
> >
> > -Cunming
> >
> > > -----Original Message-----
> > > From: Walukiewicz, Miroslaw
> > > Sent: Monday, December 15, 2014 7:11 PM
> > > To: Liang, Cunming; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Hi Cunming,
> > >
> > > The timers could be used by any application/library started as a standard
> > > pthread.
> > > Each pthread needs to have assigned some identifier same way as you are
> > doing
> > > it for mempools (the rte_linear_thread_id and rte_lcore_id are good
> > examples)
> > >
> > > I made series of patches extending the rte timers API to use with such kind
> > of
> > > identifier keeping existing API working also.
> > >
> > > I will send it soon.
> > >
> > > Mirek
> > >
> > >
> > > > -----Original Message-----
> > > > From: Liang, Cunming
> > > > Sent: Friday, December 12, 2014 6:45 AM
> > > > To: Walukiewicz, Miroslaw; dev@dpdk.org
> > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > > Thanks Mirek. That's a good point which wasn't mentioned in cover
> > letter.
> > > > For 'rte_timer', I only expect it be used within the 'legacy per-lcore'
> > pthread.
> > > > I'm appreciate if you can give me some cases which can't use it to fit.
> > > > In case have to use 'rte_timer' in multi-pthread, there are some
> > > > prerequisites and limitations.
> > > > 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do
> > pthread
> > > > init by rte_pthread_prepare)
> > > > 2. As 'rte_timer' is not preemptable, when using
> > rte_timer_manager/reset in
> > > > multi-pthread, make sure they're not on the same core.
> > > >
> > > > -Cunming
> > > >
> > > > > -----Original Message-----
> > > > > From: Walukiewicz, Miroslaw
> > > > > Sent: Thursday, December 11, 2014 5:57 PM
> > > > > To: Liang, Cunming; dev@dpdk.org
> > > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per
> > lcore
> > > > >
> > > > > Thank you Cunming for explanation.
> > > > >
> > > > > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> > > > spinlocks.
> > > > >
> > > > > Mirek
> > > > >
> > > > > > [original RFC cover letter, patch list and diffstat quoted in full; snipped as a duplicate of the start of this thread]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-19  1:28           ` Liang, Cunming
@ 2014-12-19 10:03             ` Bruce Richardson
  2014-12-22  1:51               ` Liang, Cunming
  0 siblings, 1 reply; 37+ messages in thread
From: Bruce Richardson @ 2014-12-19 10:03 UTC (permalink / raw)
  To: Liang, Cunming; +Cc: dev

On Fri, Dec 19, 2014 at 01:28:47AM +0000, Liang, Cunming wrote:
> 
> 
> > -----Original Message-----
> > From: Walukiewicz, Miroslaw
> > Sent: Thursday, December 18, 2014 8:20 PM
> > To: Liang, Cunming; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > 
> > I have another question regarding your patch.
> > 
> >  Could we extend values returned by rte_lcore_id() to set them per thread (really
> > the DPDK lcore is a pthread but started on specific core) instead of creating linear
> > thread id.
> [Liang, Cunming] As you said, __lcore_id is already per thread. 
> Per the semantic meaning, it stands for logic cpu id. 
> When multi-thread running on the same lcore, they should get the same value return by rte_lcore_id().
> The same effective like 'schedu_getcpu()', but less using cost.
> > 
> > The patch would be much simpler and will work same way. The only change
> > would be extending rte_lcore_id when rte_pthread_create() is called.
> [Liang, Cunming] I ever think about it which using rte_lcore_id() to get unique id per pthread rather than have a new API.
> But the name lcore actually no longer identify for cpu id. It may impact all existing user application who use the exact meaning of it.
> How do you think ?
> > 

I'm conflicted on this one. However, I think far more applications would be broken
to start having to use thread_id in place of an lcore_id than would be broken
by having the lcore_id no longer actually correspond to a core.
I'm actually struggling to come up with a large number of scenarios where it's
important to an app to determine the cpu it's running on, compared to the large
number of cases where you need to have a data-structure per thread. In DPDK libs
alone, you see this assumption that lcore_id == thread_id a large number of times.

Despite the slight logical inconsistency, I think it's better to avoid introducing
a thread-id and continue having lcore_id representing a unique thread.

/Bruce

> > The value __lcore_id has really an attribute __thread that means it is valid not
> > only per CPU core but also per thread.
> > 
> > The mempools, timers, statistics would work without any modifications in that
> > environment.
> > 
> >  I do not see any reason why old legacy DPDK applications would not work in that
> > model.
> > 
> > Mirek
> > 
> > > -----Original Message-----
> > > From: Liang, Cunming
> > > Sent: Monday, December 15, 2014 12:53 PM
> > > To: Walukiewicz, Miroslaw; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Hi Mirek,
> > >
> > > That sounds great.
> > > Looking forward to it.
> > >
> > > -Cunming
> > >
> > > > -----Original Message-----
> > > > From: Walukiewicz, Miroslaw
> > > > Sent: Monday, December 15, 2014 7:11 PM
> > > > To: Liang, Cunming; dev@dpdk.org
> > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > > Hi Cunming,
> > > >
> > > > The timers could be used by any application/library started as a standard
> > > > pthread.
> > > > Each pthread needs to have assigned some identifier same way as you are
> > > doing
> > > > it for mempools (the rte_linear_thread_id and rte_lcore_id are good
> > > examples)
> > > >
> > > > I made series of patches extending the rte timers API to use with such kind
> > > of
> > > > identifier keeping existing API working also.
> > > >
> > > > I will send it soon.
> > > >
> > > > Mirek
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Liang, Cunming
> > > > > Sent: Friday, December 12, 2014 6:45 AM
> > > > > To: Walukiewicz, Miroslaw; dev@dpdk.org
> > > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > > >
> > > > > Thanks Mirek. That's a good point which wasn't mentioned in cover
> > > letter.
> > > > > For 'rte_timer', I only expect it be used within the 'legacy per-lcore'
> > > pthread.
> > > > > I'm appreciate if you can give me some cases which can't use it to fit.
> > > > > In case have to use 'rte_timer' in multi-pthread, there are some
> > > > > prerequisites and limitations.
> > > > > 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do
> > > pthread
> > > > > init by rte_pthread_prepare)
> > > > > 2. As 'rte_timer' is not preemptable, when using
> > > rte_timer_manager/reset in
> > > > > multi-pthread, make sure they're not on the same core.
> > > > >
> > > > > -Cunming
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Walukiewicz, Miroslaw
> > > > > > Sent: Thursday, December 11, 2014 5:57 PM
> > > > > > To: Liang, Cunming; dev@dpdk.org
> > > > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per
> > > lcore
> > > > > >
> > > > > > Thank you Cunming for explanation.
> > > > > >
> > > > > > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> > > > > spinlocks.
> > > > > >
> > > > > > Mirek
> > > > > >
> > > > > > > [original RFC cover letter, patch list and diffstat quoted in full; snipped as a duplicate of the start of this thread]
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-19 10:03             ` Bruce Richardson
@ 2014-12-22  1:51               ` Liang, Cunming
  2014-12-22  9:46                 ` Bruce Richardson
  0 siblings, 1 reply; 37+ messages in thread
From: Liang, Cunming @ 2014-12-22  1:51 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev

...
> I'm conflicted on this one. However, I think far more applications would be
> broken
> to start having to use thread_id in place of an lcore_id than would be broken
> by having the lcore_id no longer actually correspond to a core.
> I'm actually struggling to come up with a large number of scenarios where it's
> important to an app to determine the cpu it's running on, compared to the large
> number of cases where you need to have a data-structure per thread. In DPDK
> libs
> alone, you see this assumption that lcore_id == thread_id a large number of
> times.
> 
> Despite the slight logical inconsistency, I think it's better to avoid introducing
> a thread-id and continue having lcore_id representing a unique thread.
> 
> /Bruce

Ok, I understand it.
These are the implicit meanings if lcore_id is used to represent a unique thread:
1). When lcore_id is less than RTE_MAX_LCORE, it still represents the logical core id.
2). When lcore_id is greater than or equal to RTE_MAX_LCORE, it represents a unique id for a thread.
3). Most of the APIs (except rte_lcore_id()) in rte_lcore.h should only be used in case 1).
4). rte_lcore_id() can be used in case 2), but the return value no longer represents a logical core id (see the sketch below).

If most of us feel this is acceptable, I'll prepare the RFC v2 based on this conclusion.
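
A minimal sketch of those rules (this reflects the option under discussion,
not merged behaviour):

#include <rte_lcore.h>

static inline int
id_is_logical_core(unsigned id)
{
	/* case 1): id < RTE_MAX_LCORE  -> a real logical core id,
	 *          the other rte_lcore.h APIs may be used with it.
	 * case 2): id >= RTE_MAX_LCORE -> only a unique thread id,
	 *          must not be treated as a cpu number. */
	return id < RTE_MAX_LCORE;
}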

/Cunming

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-22  1:51               ` Liang, Cunming
@ 2014-12-22  9:46                 ` Bruce Richardson
  2014-12-22 10:01                   ` Walukiewicz, Miroslaw
  2014-12-22 18:28                   ` Stephen Hemminger
  0 siblings, 2 replies; 37+ messages in thread
From: Bruce Richardson @ 2014-12-22  9:46 UTC (permalink / raw)
  To: Liang, Cunming; +Cc: dev

On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> ...
> > I'm conflicted on this one. However, I think far more applications would be
> > broken
> > to start having to use thread_id in place of an lcore_id than would be broken
> > by having the lcore_id no longer actually correspond to a core.
> > I'm actually struggling to come up with a large number of scenarios where it's
> > important to an app to determine the cpu it's running on, compared to the large
> > number of cases where you need to have a data-structure per thread. In DPDK
> > libs
> > alone, you see this assumption that lcore_id == thread_id a large number of
> > times.
> > 
> > Despite the slight logical inconsistency, I think it's better to avoid introducing
> > a thread-id and continue having lcore_id representing a unique thread.
> > 
> > /Bruce
> 
> Ok, I understand it. 
> I list the implicit meaning if using lcore_id representing the unique thread.
> 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical core id.
> 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an unique id for thread.
> 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used only in CASE 1)
> 4). rte_lcore_id() can be used in CASE 2), but the return value no matter represent a logical core id.
> 
> If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this conclusion.
> 
> /Cunming

Sorry, I don't like that suggestion either, as having lcore_id values greater
than RTE_MAX_LCORE is terrible: how will people know how to dimension arrays
to be indexed by lcore id? Given the choice, if we are not going to just use
lcore_id as a generic thread id, which is always between 0 and RTE_MAX_LCORE,
we can look to define a new thread_id variable to hold that. However, it should
have a bounded range.
From an ease-of-porting perspective, I still think that the simplest option is to
use the existing lcore_id and accept the fact that it's now a thread id rather
than an actual physical lcore. Question is, would that cause us lots of issues
in the future?
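
For illustration, the dimensioning problem in its simplest form (a sketch,
not code from the patch):

#include <stdint.h>
#include <rte_lcore.h>

static uint64_t pkts_per_lcore[RTE_MAX_LCORE];

static inline void
count_pkt(void)
{
	unsigned id = rte_lcore_id();

	/* any id >= RTE_MAX_LCORE would index past the end of the table,
	 * which is why the id used here must have a bounded range */
	if (id < RTE_MAX_LCORE)
		pkts_per_lcore[id]++;
}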

/Bruce

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-22  9:46                 ` Bruce Richardson
@ 2014-12-22 10:01                   ` Walukiewicz, Miroslaw
  2014-12-23  9:45                     ` Liang, Cunming
  2014-12-22 18:28                   ` Stephen Hemminger
  1 sibling, 1 reply; 37+ messages in thread
From: Walukiewicz, Miroslaw @ 2014-12-22 10:01 UTC (permalink / raw)
  To: Richardson, Bruce, Liang, Cunming; +Cc: dev

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Monday, December 22, 2014 10:46 AM
> To: Liang, Cunming
> Cc: Walukiewicz, Miroslaw; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > ...
> > > I'm conflicted on this one. However, I think far more applications would
> be
> > > broken
> > > to start having to use thread_id in place of an lcore_id than would be
> broken
> > > by having the lcore_id no longer actually correspond to a core.
> > > I'm actually struggling to come up with a large number of scenarios where
> it's
> > > important to an app to determine the cpu it's running on, compared to
> the large
> > > number of cases where you need to have a data-structure per thread. In
> DPDK
> > > libs
> > > alone, you see this assumption that lcore_id == thread_id a large number
> of
> > > times.
> > >
> > > Despite the slight logical inconsistency, I think it's better to avoid
> introducing
> > > a thread-id and continue having lcore_id representing a unique thread.
> > >
> > > /Bruce
> >
> > Ok, I understand it.
> > I list the implicit meaning if using lcore_id representing the unique thread.
> > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> core id.
> > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> unique id for thread.
> > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used
> only in CASE 1)
> > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> represent a logical core id.
> >
> > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this
> conclusion.
> >
> > /Cunming
> 
> Sorry, I don't like that suggestion either, as having lcore_id values greater
> than RTE_MAX_LCORE is terrible, as how will people know how to dimension
> arrays
> to be indexes by lcore id? Given the choice, if we are not going to just use
> lcore_id as a generic thread id, which is always between 0 and
> RTE_MAX_LCORE
> we can look to define a new thread_id variable to hold that. However, it
> should
> have a bounded range.
> From an ease-of-porting perspective, I still think that the simplest option is to
> use the existing lcore_id and accept the fact that it's now a thread id rather
> than an actual physical lcore. Question is, is would that cause us lots of issues
> in the future?
> 
I would prefer keeping RTE_MAX_LCORE as Bruce suggests and
determining the HW core based on the following condition, if we really have to know this:

int num_cores_online = count of cores present in the core mask provided by the cmdline parameter

rte_lcore_id() < num_cores_online -> physical core (pthread first started on the core)

rte_lcore_id() >= num_cores_online -> pthread created by rte_pthread_create
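
A minimal sketch of that convention; num_cores_online and the helper name
are hypothetical, and rte_pthread_create() is the API proposed in this RFC:

#include <rte_lcore.h>

/* filled in once at startup from the -c/--coremask EAL parameter */
static unsigned num_cores_online;

static inline int
is_legacy_lcore_thread(void)
{
	/* below the online-core count: pthread first started on a core;
	 * at or above it: pthread created later via rte_pthread_create() */
	return rte_lcore_id() < num_cores_online;
}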

Mirek

> /Bruce

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-22  9:46                 ` Bruce Richardson
  2014-12-22 10:01                   ` Walukiewicz, Miroslaw
@ 2014-12-22 18:28                   ` Stephen Hemminger
  2014-12-23  9:19                     ` Walukiewicz, Miroslaw
  2014-12-23  9:51                     ` Liang, Cunming
  1 sibling, 2 replies; 37+ messages in thread
From: Stephen Hemminger @ 2014-12-22 18:28 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Mon, 22 Dec 2014 09:46:03 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > ...
> > > I'm conflicted on this one. However, I think far more applications would be
> > > broken
> > > to start having to use thread_id in place of an lcore_id than would be broken
> > > by having the lcore_id no longer actually correspond to a core.
> > > I'm actually struggling to come up with a large number of scenarios where it's
> > > important to an app to determine the cpu it's running on, compared to the large
> > > number of cases where you need to have a data-structure per thread. In DPDK
> > > libs
> > > alone, you see this assumption that lcore_id == thread_id a large number of
> > > times.
> > > 
> > > Despite the slight logical inconsistency, I think it's better to avoid introducing
> > > a thread-id and continue having lcore_id representing a unique thread.
> > > 
> > > /Bruce
> > 
> > Ok, I understand it. 
> > I list the implicit meaning if using lcore_id representing the unique thread.
> > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical core id.
> > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an unique id for thread.
> > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used only in CASE 1)
> > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter represent a logical core id.
> > 
> > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this conclusion.
> > 
> > /Cunming
> 
> Sorry, I don't like that suggestion either, as having lcore_id values greater
> than RTE_MAX_LCORE is terrible, as how will people know how to dimension arrays
> to be indexes by lcore id? Given the choice, if we are not going to just use
> lcore_id as a generic thread id, which is always between 0 and RTE_MAX_LCORE
> we can look to define a new thread_id variable to hold that. However, it should
> have a bounded range.
> From an ease-of-porting perspective, I still think that the simplest option is to
> use the existing lcore_id and accept the fact that it's now a thread id rather
> than an actual physical lcore. Question is, is would that cause us lots of issues
> in the future?
> 
> /Bruce

The current rte_lcore_id() has a different meaning than a thread id. Your proposal will
break code that uses lcore_id to do per-cpu statistics, as well as the lcore_config
code in the samples.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable
  2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable Cunming Liang
  2014-12-16  7:00   ` Qiu, Michael
@ 2014-12-22 19:02   ` Ananyev, Konstantin
  2014-12-23  9:56     ` Liang, Cunming
  1 sibling, 1 reply; 37+ messages in thread
From: Ananyev, Konstantin @ 2014-12-22 19:02 UTC (permalink / raw)
  To: Liang, Cunming, dev

Hi Steve,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming Liang
> Sent: Thursday, December 11, 2014 2:05 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable
> 
> 
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> ---
>  lib/librte_eal/common/include/rte_eal.h   |   5 ++
>  lib/librte_eal/common/include/rte_lcore.h |  12 ++++
>  lib/librte_eal/linuxapp/eal/eal_thread.c  | 115 ++++++++++++++++++++++++++++--
>  3 files changed, 126 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> index f4ecd2e..2640167 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -262,6 +262,11 @@ rte_set_application_usage_hook( rte_usage_hook_t usage_func );
>   */
>  int rte_eal_has_hugepages(void);
> 
> +#ifndef RTE_MAX_THREAD
> +#define RTE_MAX_THREAD                RTE_MAX_LCORE
> +#endif
> +
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
> index 49b2c03..cd83d47 100644
> --- a/lib/librte_eal/common/include/rte_lcore.h
> +++ b/lib/librte_eal/common/include/rte_lcore.h
> @@ -73,6 +73,7 @@ struct lcore_config {
>  extern struct lcore_config lcore_config[RTE_MAX_LCORE];
> 
>  RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */
> +RTE_DECLARE_PER_LCORE(unsigned, _thread_id); /**< Per thread "linear tid". */
> 
>  /**
>   * Return the ID of the execution unit we are running on.
> @@ -86,6 +87,17 @@ rte_lcore_id(void)
>  }
> 
>  /**
> + * Return the linear thread ID of the cache unit we are running on.
> + * @return
> + *   core ID
> + */
> +static inline unsigned long
> +rte_linear_thread_id(void)
> +{
> +	return RTE_PER_LCORE(_thread_id);
> +}
> +
> +/**
>   * Get the id of the master lcore
>   *
>   * @return
> diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c
> index 80a985f..52478d6 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_thread.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
> @@ -39,6 +39,7 @@
>  #include <pthread.h>
>  #include <sched.h>
>  #include <sys/queue.h>
> +#include <string.h>
> 
>  #include <rte_debug.h>
>  #include <rte_atomic.h>
> @@ -51,12 +52,19 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> +#include <rte_spinlock.h>
> +#include <rte_common.h>
> 
>  #include "eal_private.h"
>  #include "eal_thread.h"
> 
> +#define LINEAR_THREAD_ID_POOL        "THREAD_ID_POOL"
> +
>  RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
> 
> +/* define linear thread id as thread-local variables */
> +RTE_DEFINE_PER_LCORE(unsigned, _thread_id);
> +
>  /*
>   * Send a message to a slave lcore identified by slave_id to call a
>   * function f with argument arg. Once the execution is done, the
> @@ -94,12 +102,13 @@ rte_eal_remote_launch(int (*f)(void *), void *arg, unsigned slave_id)
>  	return 0;
>  }
> 
> +
>  /* set affinity for current thread */
>  static int
> -eal_thread_set_affinity(void)
> +__eal_thread_set_affinity(pthread_t thread, unsigned lcore)
>  {
> +
>  	int s;
> -	pthread_t thread;
> 
>  /*
>   * According to the section VERSIONS of the CPU_ALLOC man page:
> @@ -126,9 +135,8 @@ eal_thread_set_affinity(void)
> 
>  	size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
>  	CPU_ZERO_S(size, cpusetp);
> -	CPU_SET_S(rte_lcore_id(), size, cpusetp);
> +	CPU_SET_S(lcore, size, cpusetp);
> 
> -	thread = pthread_self();
>  	s = pthread_setaffinity_np(thread, size, cpusetp);
>  	if (s != 0) {
>  		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
> @@ -140,9 +148,8 @@ eal_thread_set_affinity(void)
>  #else /* CPU_ALLOC */
>  	cpu_set_t cpuset;
>  	CPU_ZERO( &cpuset );
> -	CPU_SET( rte_lcore_id(), &cpuset );
> +	CPU_SET(lcore, &cpuset );
> 
> -	thread = pthread_self();
>  	s = pthread_setaffinity_np(thread, sizeof( cpuset ), &cpuset);
>  	if (s != 0) {
>  		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
> @@ -152,6 +159,15 @@ eal_thread_set_affinity(void)
>  	return 0;
>  }
> 
> +/* set affinity for current thread */
> +static int
> +eal_thread_set_affinity(void)
> +{
> +	pthread_t thread = pthread_self();
> +
> +	return __eal_thread_set_affinity(thread, rte_lcore_id());
> +}
> +
>  void eal_thread_init_master(unsigned lcore_id)
>  {
>  	/* set the lcore ID in per-lcore memory area */
> @@ -162,6 +178,87 @@ void eal_thread_init_master(unsigned lcore_id)
>  		rte_panic("cannot set affinity\n");
>  }
> 
> +/* linear thread id control block */
> +struct eal_thread_cb {
> +	rte_spinlock_t lock;
> +	uint64_t nb_bucket;
> +	uint64_t bitmap[0];
> +};
> +
> +static struct eal_thread_cb *
> +__create_tid_pool(void)
> +{
> +	const struct rte_memzone *mz;
> +	struct eal_thread_cb *pcb;
> +	uint64_t sz;
> +	uint64_t nb_bucket;
> +
> +	nb_bucket = RTE_ALIGN_CEIL(RTE_MAX_THREAD, 64) / 64;
> +	sz = sizeof(*pcb) + nb_bucket * sizeof(uint64_t);
> +	mz = rte_memzone_reserve(LINEAR_THREAD_ID_POOL,
> +				 sz, rte_socket_id(), 0);
> +	if (mz == NULL)
> +		rte_panic("Cannot allocate linear thread ID pool\n");
> +
> +	pcb = mz->addr;
> +	rte_spinlock_init(&pcb->lock);
> +	pcb->nb_bucket = nb_bucket;
> +	memset(pcb->bitmap, 0, nb_bucket * sizeof(uint64_t));
> +
> +	return pcb;
> +}
> +
> +static int
> +__get_linear_tid(uint64_t *tid)
> +{
> +	const struct rte_memzone *mz;
> +	struct eal_thread_cb *pcb;
> +	uint64_t i;
> +	uint8_t shift = 0;
> +
> +	mz = rte_memzone_lookup(LINEAR_THREAD_ID_POOL);
> +	if (mz != NULL)
> +		pcb = mz->addr;
> +	else
> +		pcb = __create_tid_pool();


As I understand it, __get_linear_tid() could be called concurrently from different threads?
If so, then I think we can have a race condition here between memzone_lookup and memzone_create.
Probably the easiest way to avoid it is to make sure that __create_tid_pool() is called at startup,
when the app is still single-threaded and the secondary processes are still waiting for the primary.
Something like: create rte_eal_tid_init() and call it somewhere in rte_eal_init(), before rte_eal_mcfg_complete().
Konstantin
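
A minimal sketch of that suggestion, meant to sit in eal_thread.c next to
__create_tid_pool(); rte_eal_tid_init() is the proposed name, and the rest
is an assumption about how it would be wired up:

int
rte_eal_tid_init(void)
{
	/* runs in the primary process while it is still single-threaded,
	 * so the memzone exists before any __get_linear_tid() lookup and
	 * the lookup/create race noted above cannot happen */
	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
		__create_tid_pool();	/* panics on failure, as in the patch */
	return 0;
}

/* called from rte_eal_init(), before rte_eal_mcfg_complete() */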

> +
> +	rte_spinlock_lock(&pcb->lock);
> +	for (i = 0; i < pcb->nb_bucket; i++) {
> +		if (pcb->bitmap[i] == (uint64_t)-1)
> +			continue;
> +		shift = 0;
> +		while (pcb->bitmap[i] & (1UL << shift))
> +			shift ++;
> +		pcb->bitmap[i] |= (1UL << shift);
> +		break;
> +	}
> +	rte_spinlock_unlock(&pcb->lock);
> +
> +	if (i == pcb->nb_bucket)
> +		return -1;
> +
> +	*tid = i * 64 + shift;
> +	return 0;
> +}
> +
> +static void __rte_unused
> +__put_linear_tid(uint64_t tid)
> +{
> +	const struct rte_memzone *mz;
> +	struct eal_thread_cb *pcb;
> +	uint8_t shift;
> +
> +	mz = rte_memzone_lookup(LINEAR_THREAD_ID_POOL);
> +	if (!mz)
> +		return;
> +
> +	pcb = mz->addr;
> +	rte_spinlock_lock(&pcb->lock);
> +	shift = tid & 0x3F;
> +	pcb->bitmap[tid / 64] &= ~(1UL << shift);
> +	rte_spinlock_unlock(&pcb->lock);
> +}
> +
>  /* main loop of threads */
>  __attribute__((noreturn)) void *
>  eal_thread_loop(__attribute__((unused)) void *arg)
> @@ -169,6 +266,7 @@ eal_thread_loop(__attribute__((unused)) void *arg)
>  	char c;
>  	int n, ret;
>  	unsigned lcore_id;
> +	unsigned long ltid = 0;
>  	pthread_t thread_id;
>  	int m2s, s2m;
> 
> @@ -191,6 +289,11 @@ eal_thread_loop(__attribute__((unused)) void *arg)
>  	/* set the lcore ID in per-lcore memory area */
>  	RTE_PER_LCORE(_lcore_id) = lcore_id;
> 
> +	/* set the linear thread ID in per-lcore memory area */
> +	if (__get_linear_tid(&ltid) < 0)
> +		rte_panic("cannot get cache slot id\n");
> +	RTE_PER_LCORE(_thread_id) = ltid;
> +
>  	/* set CPU affinity */
>  	if (eal_thread_set_affinity() < 0)
>  		rte_panic("cannot set affinity\n");
> --
> 1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-22 18:28                   ` Stephen Hemminger
@ 2014-12-23  9:19                     ` Walukiewicz, Miroslaw
  2014-12-23  9:23                       ` Bruce Richardson
  2014-12-23  9:51                     ` Liang, Cunming
  1 sibling, 1 reply; 37+ messages in thread
From: Walukiewicz, Miroslaw @ 2014-12-23  9:19 UTC (permalink / raw)
  To: Stephen Hemminger, Richardson, Bruce; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
> Hemminger
> Sent: Monday, December 22, 2014 7:29 PM
> To: Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> On Mon, 22 Dec 2014 09:46:03 +0000
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > ...
> > > > I'm conflicted on this one. However, I think far more applications would
> be
> > > > broken
> > > > to start having to use thread_id in place of an lcore_id than would be
> broken
> > > > by having the lcore_id no longer actually correspond to a core.
> > > > I'm actually struggling to come up with a large number of scenarios
> where it's
> > > > important to an app to determine the cpu it's running on, compared to
> the large
> > > > number of cases where you need to have a data-structure per thread.
> In DPDK
> > > > libs
> > > > alone, you see this assumption that lcore_id == thread_id a large
> number of
> > > > times.
> > > >
> > > > Despite the slight logical inconsistency, I think it's better to avoid
> introducing
> > > > a thread-id and continue having lcore_id representing a unique thread.
> > > >
> > > > /Bruce
> > >
> > > Ok, I understand it.
> > > I list the implicit meaning if using lcore_id representing the unique thread.
> > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> core id.
> > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> unique id for thread.
> > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used
> only in CASE 1)
> > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> represent a logical core id.
> > >
> > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this
> conclusion.
> > >
> > > /Cunming
> >
> > Sorry, I don't like that suggestion either, as having lcore_id values greater
> > than RTE_MAX_LCORE is terrible, as how will people know how to
> dimension arrays
> > to be indexes by lcore id? Given the choice, if we are not going to just use
> > lcore_id as a generic thread id, which is always between 0 and
> RTE_MAX_LCORE
> > we can look to define a new thread_id variable to hold that. However, it
> should
> > have a bounded range.
> > From an ease-of-porting perspective, I still think that the simplest option is
> to
> > use the existing lcore_id and accept the fact that it's now a thread id rather
> > than an actual physical lcore. Question is, is would that cause us lots of
> issues
> > in the future?
> >
> > /Bruce
> 
> The current rte_lcore_id() has different meaning the thread. Your proposal
> will
> break code that uses lcore_id to do per-cpu statistics and the lcore_config
> code in the samples.
> q
It depends on the application context and on how the application treats the value returned by rte_lcore_id(). When the number of threads does not exceed the number of cores (let's say an old-fashioned DPDK application), things like per-cpu statistics will keep working correctly.

When we treat threads on cores as ordinary threads, as we do by introducing the special function rte_pthread_create(), the meaning of rte_lcore_id() changes to indicate a thread number, which is correct under the new assumptions and the new application model.

I do not want to limit the DPDK design to per-cpu applications only. There are many more application models that could be supported using DPDK;
the current per-cpu approach is only a subset of the possible applications.

Maybe we should introduce something like CONFIG_RTE_PTHREAD_ENABLE to change the meaning of rte_lcore_id() and add the rte_pthread_create() family (a sketch of that idea follows below).
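
A rough sketch of how such a switch could look; CONFIG_RTE_PTHREAD_ENABLE,
RTE_PTHREAD_ENABLE and the wrapper name are hypothetical (in the actual
proposal the switch would sit inside rte_lcore_id() itself), and _thread_id
is the per-thread variable from patch 1/7:

#include <rte_per_lcore.h>
#include <rte_lcore.h>

static inline unsigned
rte_app_thread_id(void)
{
#ifdef RTE_PTHREAD_ENABLE
	/* multi-pthread model: one id per thread */
	return RTE_PER_LCORE(_thread_id);
#else
	/* classic model: id of the logical core the thread is pinned to */
	return RTE_PER_LCORE(_lcore_id);
#endif
}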

Mirek

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-23  9:19                     ` Walukiewicz, Miroslaw
@ 2014-12-23  9:23                       ` Bruce Richardson
  0 siblings, 0 replies; 37+ messages in thread
From: Bruce Richardson @ 2014-12-23  9:23 UTC (permalink / raw)
  To: Walukiewicz, Miroslaw; +Cc: dev

On Tue, Dec 23, 2014 at 09:19:54AM +0000, Walukiewicz, Miroslaw wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
> > Hemminger
> > Sent: Monday, December 22, 2014 7:29 PM
> > To: Richardson, Bruce
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > 
> > On Mon, 22 Dec 2014 09:46:03 +0000
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > 
> > > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > > ...
> > > > > I'm conflicted on this one. However, I think far more applications would
> > be
> > > > > broken
> > > > > to start having to use thread_id in place of an lcore_id than would be
> > broken
> > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > I'm actually struggling to come up with a large number of scenarios
> > where it's
> > > > > important to an app to determine the cpu it's running on, compared to
> > the large
> > > > > number of cases where you need to have a data-structure per thread.
> > In DPDK
> > > > > libs
> > > > > alone, you see this assumption that lcore_id == thread_id a large
> > number of
> > > > > times.
> > > > >
> > > > > Despite the slight logical inconsistency, I think it's better to avoid
> > introducing
> > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > >
> > > > > /Bruce
> > > >
> > > > Ok, I understand it.
> > > > I list the implicit meaning if using lcore_id representing the unique thread.
> > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> > core id.
> > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> > unique id for thread.
> > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used
> > only in CASE 1)
> > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> > represent a logical core id.
> > > >
> > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this
> > conclusion.
> > > >
> > > > /Cunming
> > >
> > > Sorry, I don't like that suggestion either, as having lcore_id values greater
> > > than RTE_MAX_LCORE is terrible, as how will people know how to
> > dimension arrays
> > > to be indexes by lcore id? Given the choice, if we are not going to just use
> > > lcore_id as a generic thread id, which is always between 0 and
> > RTE_MAX_LCORE
> > > we can look to define a new thread_id variable to hold that. However, it
> > should
> > > have a bounded range.
> > > From an ease-of-porting perspective, I still think that the simplest option is
> > to
> > > use the existing lcore_id and accept the fact that it's now a thread id rather
> > > than an actual physical lcore. Question is, is would that cause us lots of
> > issues
> > > in the future?
> > >
> > > /Bruce
> > 
> > The current rte_lcore_id() has different meaning the thread. Your proposal
> > will
> > break code that uses lcore_id to do per-cpu statistics and the lcore_config
> > code in the samples.
> > q
> It depends on application context and how application treats rte_lcore_id() core. When number of the threads will not exceed the number of cores (let's say old-fashioned DPDK application) all stuff like per-cpu statistics will work correctly. 
> 
> When we treat threads on cores as ordinary threads as we introducing the special function rte_pthread_create() - the meaning of rte_lcore_id() changes to indicate 
>  thread number what is correct under new assumptions and new application model.
> 
> I do not  want to limit DPDK design  to only per-cpu application. There is much more application models that could be supported using DPDK. 
> Current per-cpu approach is only a subset of the possible applications.
> 
> Maybe we should indicate something like CONFIG_RTE_PTHREAD_ENABLE to change a meaning of rte_lcore_id() and introducing rte_pthread_create() family. 
> 
> Mirek
> 
From the discussion it does look to me like we do need a separate thread id
value, separate from the core id. Unfortunately that means that in many (most?) places in libs
and examples where we use lcore_id right now, we will probably need to use the new
thread id. :-(

/Bruce

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-22 10:01                   ` Walukiewicz, Miroslaw
@ 2014-12-23  9:45                     ` Liang, Cunming
  0 siblings, 0 replies; 37+ messages in thread
From: Liang, Cunming @ 2014-12-23  9:45 UTC (permalink / raw)
  To: Walukiewicz, Miroslaw, Richardson, Bruce; +Cc: dev



> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Monday, December 22, 2014 6:02 PM
> To: Richardson, Bruce; Liang, Cunming
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Monday, December 22, 2014 10:46 AM
> > To: Liang, Cunming
> > Cc: Walukiewicz, Miroslaw; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > ...
> > > > I'm conflicted on this one. However, I think far more applications would
> > be
> > > > broken
> > > > to start having to use thread_id in place of an lcore_id than would be
> > broken
> > > > by having the lcore_id no longer actually correspond to a core.
> > > > I'm actually struggling to come up with a large number of scenarios where
> > it's
> > > > important to an app to determine the cpu it's running on, compared to
> > the large
> > > > number of cases where you need to have a data-structure per thread. In
> > DPDK
> > > > libs
> > > > alone, you see this assumption that lcore_id == thread_id a large number
> > of
> > > > times.
> > > >
> > > > Despite the slight logical inconsistency, I think it's better to avoid
> > introducing
> > > > a thread-id and continue having lcore_id representing a unique thread.
> > > >
> > > > /Bruce
> > >
> > > Ok, I understand it.
> > > I list the implicit meaning if using lcore_id representing the unique thread.
> > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> > core id.
> > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> > unique id for thread.
> > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used
> > only in CASE 1)
> > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> > represent a logical core id.
> > >
> > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this
> > conclusion.
> > >
> > > /Cunming
> >
> > Sorry, I don't like that suggestion either, as having lcore_id values greater
> > than RTE_MAX_LCORE is terrible, as how will people know how to dimension
> > arrays
> > to be indexes by lcore id? 
[Liang, Cunming] For dimensioning arrays we would have RTE_MAX_THREAD_ID.
Since lcore id would no longer mean a logical core, why keep using RTE_MAX_LCORE as the dimension?
In my original idea I don't expect to change lcore_config; RTE_MAX_LCORE is only used to identify a legal logical core id.
So nothing changes when id < RTE_MAX_LCORE, while id >= RTE_MAX_LCORE makes the lcore APIs fail.
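For illustration, the dimensioning difference would look roughly like this (RTE_MAX_THREAD_ID and rte_linear_thread_id() are names proposed in this RFC/thread, not existing DPDK APIs):

static uint64_t thread_stats[RTE_MAX_THREAD_ID];  /* proposed per-thread bound */
static uint64_t core_stats[RTE_MAX_LCORE];        /* existing per-core bound */

static void
count_packet(void)
{
        thread_stats[rte_linear_thread_id()]++;   /* valid for any registered pthread */
        if (rte_lcore_id() < RTE_MAX_LCORE)
                core_stats[rte_lcore_id()]++;     /* EAL lcore threads only */
}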

> > Given the choice, if we are not going to just use
> > lcore_id as a generic thread id, which is always between 0 and
> > RTE_MAX_LCORE
> > we can look to define a new thread_id variable to hold that. However, it
> > should
> > have a bounded range.
[Liang, Cunming] Agreed; if we merge the lcore id with the linear thread id, we still need an RTE_MAX_THREAD_ID bound anyway.
> > From an ease-of-porting perspective, I still think that the simplest option is to
> > use the existing lcore_id and accept the fact that it's now a thread id rather
> > than an actual physical lcore. 
[Liang, Cunming] I'm not sure: do you mean extending lcore_config into a per-thread context instead of a per-lcore one?
If we accept that lcore_id is now a thread id, how do we decide whether a physical lcore is in the core mask or not?
> > Question is, is would that cause us lots of issues
> > in the future?
[Liang, Cunming] Personally I don't like this way, where lcore id sometimes stands for a logical core id and sometimes for a thread id.
The benefit seems to be avoiding a trivial change, but it actually changes the meaning of the API and its implementation.
The linear thread id I propose is new, but we can control and estimate such a limited change and where it happens.
> >
> I would prefer keeping the RTE_MAX_LCORES as Bruce suggests and
> determine the HW core on base of following condition if we really have to know
> this.
> 
> int num_cores_online = count of cores encountered in the core mask provided by
> cmdline parameter
[Liang, Cunming] In this way, with a core mask of 0xf0, num_cores_online will be 4.
The rte_lcore_id() values for the logical cores will then be 0, 1, 2, 3 instead of 4, 5, 6, 7.
That is probably fine if we give up the original meaning of lcore_id and turn it into a unique thread id.
But I don't think a dynamic num_cores_online is a good idea.
If one day we plan to support lcore hot plug, num_cores_online will change on the fly,
and a newly plugged core could end up with an id already occupied by some thread.
> 
> Rte_lcore_id() < num_cores_online -> physical core (pthread first started on the
> core)
> 
> Rte_lcore_id() >= num_cores_online -> pthread created by rte_pthread_create
> 
> Mirek
> 
> > /Bruce
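(For illustration, the split Mirek describes could be expressed as below; num_cores_online and rte_pthread_create() are proposals from this thread, not existing DPDK APIs.)

if (rte_lcore_id() < num_cores_online) {
        /* EAL thread started on, and pinned to, a physical core from the core mask */
} else {
        /* extra pthread created later through rte_pthread_create() */
}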

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-22 18:28                   ` Stephen Hemminger
  2014-12-23  9:19                     ` Walukiewicz, Miroslaw
@ 2014-12-23  9:51                     ` Liang, Cunming
  2015-01-08 17:05                       ` Ananyev, Konstantin
  1 sibling, 1 reply; 37+ messages in thread
From: Liang, Cunming @ 2014-12-23  9:51 UTC (permalink / raw)
  To: Stephen Hemminger, Richardson, Bruce; +Cc: dev



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, December 23, 2014 2:29 AM
> To: Richardson, Bruce
> Cc: Liang, Cunming; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> On Mon, 22 Dec 2014 09:46:03 +0000
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > ...
> > > > I'm conflicted on this one. However, I think far more applications would be
> > > > broken
> > > > to start having to use thread_id in place of an lcore_id than would be
> broken
> > > > by having the lcore_id no longer actually correspond to a core.
> > > > I'm actually struggling to come up with a large number of scenarios where
> it's
> > > > important to an app to determine the cpu it's running on, compared to the
> large
> > > > number of cases where you need to have a data-structure per thread. In
> DPDK
> > > > libs
> > > > alone, you see this assumption that lcore_id == thread_id a large number
> of
> > > > times.
> > > >
> > > > Despite the slight logical inconsistency, I think it's better to avoid
> introducing
> > > > a thread-id and continue having lcore_id representing a unique thread.
> > > >
> > > > /Bruce
> > >
> > > Ok, I understand it.
> > > I list the implicit meaning if using lcore_id representing the unique thread.
> > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> core id.
> > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an unique
> id for thread.
> > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used only
> in CASE 1)
> > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> represent a logical core id.
> > >
> > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this
> conclusion.
> > >
> > > /Cunming
> >
> > Sorry, I don't like that suggestion either, as having lcore_id values greater
> > than RTE_MAX_LCORE is terrible, as how will people know how to dimension
> arrays
> > to be indexes by lcore id? Given the choice, if we are not going to just use
> > lcore_id as a generic thread id, which is always between 0 and
> RTE_MAX_LCORE
> > we can look to define a new thread_id variable to hold that. However, it should
> > have a bounded range.
> > From an ease-of-porting perspective, I still think that the simplest option is to
> > use the existing lcore_id and accept the fact that it's now a thread id rather
> > than an actual physical lcore. Question is, is would that cause us lots of issues
> > in the future?
> >
> > /Bruce
> 
> The current rte_lcore_id() has different meaning the thread. Your proposal will
> break code that uses lcore_id to do per-cpu statistics and the lcore_config
> code in the samples.
> q
[Liang, Cunming] +1. 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable
  2014-12-22 19:02   ` Ananyev, Konstantin
@ 2014-12-23  9:56     ` Liang, Cunming
  0 siblings, 0 replies; 37+ messages in thread
From: Liang, Cunming @ 2014-12-23  9:56 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

Thanks Konstantin, it makes sense.

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, December 23, 2014 3:02 AM
> To: Liang, Cunming; dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local
> variable
> 
> Hi Steve,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cunming Liang
> > Sent: Thursday, December 11, 2014 2:05 AM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local
> variable
> >
> >
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > ---
> >  lib/librte_eal/common/include/rte_eal.h   |   5 ++
> >  lib/librte_eal/common/include/rte_lcore.h |  12 ++++
> >  lib/librte_eal/linuxapp/eal/eal_thread.c  | 115
> ++++++++++++++++++++++++++++--
> >  3 files changed, 126 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/rte_eal.h
> b/lib/librte_eal/common/include/rte_eal.h
> > index f4ecd2e..2640167 100644
> > --- a/lib/librte_eal/common/include/rte_eal.h
> > +++ b/lib/librte_eal/common/include/rte_eal.h
> > @@ -262,6 +262,11 @@ rte_set_application_usage_hook( rte_usage_hook_t
> usage_func );
> >   */
> >  int rte_eal_has_hugepages(void);
> >
> > +#ifndef RTE_MAX_THREAD
> > +#define RTE_MAX_THREAD                RTE_MAX_LCORE
> > +#endif
> > +
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_eal/common/include/rte_lcore.h
> b/lib/librte_eal/common/include/rte_lcore.h
> > index 49b2c03..cd83d47 100644
> > --- a/lib/librte_eal/common/include/rte_lcore.h
> > +++ b/lib/librte_eal/common/include/rte_lcore.h
> > @@ -73,6 +73,7 @@ struct lcore_config {
> >  extern struct lcore_config lcore_config[RTE_MAX_LCORE];
> >
> >  RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */
> > +RTE_DECLARE_PER_LCORE(unsigned, _thread_id); /**< Per thread "linear tid".
> */
> >
> >  /**
> >   * Return the ID of the execution unit we are running on.
> > @@ -86,6 +87,17 @@ rte_lcore_id(void)
> >  }
> >
> >  /**
> > + * Return the linear thread ID of the cache unit we are running on.
> > + * @return
> > + *   core ID
> > + */
> > +static inline unsigned long
> > +rte_linear_thread_id(void)
> > +{
> > +	return RTE_PER_LCORE(_thread_id);
> > +}
> > +
> > +/**
> >   * Get the id of the master lcore
> >   *
> >   * @return
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c
> b/lib/librte_eal/linuxapp/eal/eal_thread.c
> > index 80a985f..52478d6 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_thread.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
> > @@ -39,6 +39,7 @@
> >  #include <pthread.h>
> >  #include <sched.h>
> >  #include <sys/queue.h>
> > +#include <string.h>
> >
> >  #include <rte_debug.h>
> >  #include <rte_atomic.h>
> > @@ -51,12 +52,19 @@
> >  #include <rte_eal.h>
> >  #include <rte_per_lcore.h>
> >  #include <rte_lcore.h>
> > +#include <rte_spinlock.h>
> > +#include <rte_common.h>
> >
> >  #include "eal_private.h"
> >  #include "eal_thread.h"
> >
> > +#define LINEAR_THREAD_ID_POOL        "THREAD_ID_POOL"
> > +
> >  RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
> >
> > +/* define linear thread id as thread-local variables */
> > +RTE_DEFINE_PER_LCORE(unsigned, _thread_id);
> > +
> >  /*
> >   * Send a message to a slave lcore identified by slave_id to call a
> >   * function f with argument arg. Once the execution is done, the
> > @@ -94,12 +102,13 @@ rte_eal_remote_launch(int (*f)(void *), void *arg,
> unsigned slave_id)
> >  	return 0;
> >  }
> >
> > +
> >  /* set affinity for current thread */
> >  static int
> > -eal_thread_set_affinity(void)
> > +__eal_thread_set_affinity(pthread_t thread, unsigned lcore)
> >  {
> > +
> >  	int s;
> > -	pthread_t thread;
> >
> >  /*
> >   * According to the section VERSIONS of the CPU_ALLOC man page:
> > @@ -126,9 +135,8 @@ eal_thread_set_affinity(void)
> >
> >  	size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
> >  	CPU_ZERO_S(size, cpusetp);
> > -	CPU_SET_S(rte_lcore_id(), size, cpusetp);
> > +	CPU_SET_S(lcore, size, cpusetp);
> >
> > -	thread = pthread_self();
> >  	s = pthread_setaffinity_np(thread, size, cpusetp);
> >  	if (s != 0) {
> >  		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
> > @@ -140,9 +148,8 @@ eal_thread_set_affinity(void)
> >  #else /* CPU_ALLOC */
> >  	cpu_set_t cpuset;
> >  	CPU_ZERO( &cpuset );
> > -	CPU_SET( rte_lcore_id(), &cpuset );
> > +	CPU_SET(lcore, &cpuset );
> >
> > -	thread = pthread_self();
> >  	s = pthread_setaffinity_np(thread, sizeof( cpuset ), &cpuset);
> >  	if (s != 0) {
> >  		RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
> > @@ -152,6 +159,15 @@ eal_thread_set_affinity(void)
> >  	return 0;
> >  }
> >
> > +/* set affinity for current thread */
> > +static int
> > +eal_thread_set_affinity(void)
> > +{
> > +	pthread_t thread = pthread_self();
> > +
> > +	return __eal_thread_set_affinity(thread, rte_lcore_id());
> > +}
> > +
> >  void eal_thread_init_master(unsigned lcore_id)
> >  {
> >  	/* set the lcore ID in per-lcore memory area */
> > @@ -162,6 +178,87 @@ void eal_thread_init_master(unsigned lcore_id)
> >  		rte_panic("cannot set affinity\n");
> >  }
> >
> > +/* linear thread id control block */
> > +struct eal_thread_cb {
> > +	rte_spinlock_t lock;
> > +	uint64_t nb_bucket;
> > +	uint64_t bitmap[0];
> > +};
> > +
> > +static struct eal_thread_cb *
> > +__create_tid_pool(void)
> > +{
> > +	const struct rte_memzone *mz;
> > +	struct eal_thread_cb *pcb;
> > +	uint64_t sz;
> > +	uint64_t nb_bucket;
> > +
> > +	nb_bucket = RTE_ALIGN_CEIL(RTE_MAX_THREAD, 64) / 64;
> > +	sz = sizeof(*pcb) + nb_bucket * sizeof(uint64_t);
> > +	mz = rte_memzone_reserve(LINEAR_THREAD_ID_POOL,
> > +				 sz, rte_socket_id(), 0);
> > +	if (mz == NULL)
> > +		rte_panic("Cannot allocate linear thread ID pool\n");
> > +
> > +	pcb = mz->addr;
> > +	rte_spinlock_init(&pcb->lock);
> > +	pcb->nb_bucket = nb_bucket;
> > +	memset(pcb->bitmap, 0, nb_bucket * sizeof(uint64_t));
> > +
> > +	return pcb;
> > +}
> > +
> > +static int
> > +__get_linear_tid(uint64_t *tid)
> > +{
> > +	const struct rte_memzone *mz;
> > +	struct eal_thread_cb *pcb;
> > +	uint64_t i;
> > +	uint8_t shift = 0;
> > +
> > +	mz = rte_memzone_lookup(LINEAR_THREAD_ID_POOL);
> > +	if (mz != NULL)
> > +		pcb = mz->addr;
> > +	else
> > +		pcb = __create_tid_pool();
> 
> 
> As I understand, __get_linear_tid() could be call concurrently from different
> threads?
> If so, then I think we can have a race conditions here with
> memzone_lookup/memzone_create.
> Probably the easiest way to avoid it - make sure that __create_tid_pool() will be
> called at startup,
> when app is still single-threaded and secondary processes are still waiting for
> primary.
> Something like create: rte_eal_tid_init() and call it  somewhere in rte_eal_init(),
> before rte_eal_mcfg_complete().
> Konstantin
> 
> > +
> > +	rte_spinlock_lock(&pcb->lock);
> > +	for (i = 0; i < pcb->nb_bucket; i++) {
> > +		if (pcb->bitmap[i] == (uint64_t)-1)
> > +			continue;
> > +		shift = 0;
> > +		while (pcb->bitmap[i] & (1UL << shift))
> > +			shift ++;
> > +		pcb->bitmap[i] |= (1UL << shift);
> > +		break;
> > +	}
> > +	rte_spinlock_unlock(&pcb->lock);
> > +
> > +	if (i == pcb->nb_bucket)
> > +		return -1;
> > +
> > +	*tid = i * 64 + shift;
> > +	return 0;
> > +}
> > +
> > +static void __rte_unused
> > +__put_linear_tid(uint64_t tid)
> > +{
> > +	const struct rte_memzone *mz;
> > +	struct eal_thread_cb *pcb;
> > +	uint8_t shift;
> > +
> > +	mz = rte_memzone_lookup(LINEAR_THREAD_ID_POOL);
> > +	if (!mz)
> > +		return;
> > +
> > +	pcb = mz->addr;
> > +	rte_spinlock_lock(&pcb->lock);
> > +	shift = tid & 0x3F;
> > +	pcb->bitmap[tid / 64] &= ~(1UL << shift);
> > +	rte_spinlock_unlock(&pcb->lock);
> > +}
> > +
> >  /* main loop of threads */
> >  __attribute__((noreturn)) void *
> >  eal_thread_loop(__attribute__((unused)) void *arg)
> > @@ -169,6 +266,7 @@ eal_thread_loop(__attribute__((unused)) void *arg)
> >  	char c;
> >  	int n, ret;
> >  	unsigned lcore_id;
> > +	unsigned long ltid = 0;
> >  	pthread_t thread_id;
> >  	int m2s, s2m;
> >
> > @@ -191,6 +289,11 @@ eal_thread_loop(__attribute__((unused)) void *arg)
> >  	/* set the lcore ID in per-lcore memory area */
> >  	RTE_PER_LCORE(_lcore_id) = lcore_id;
> >
> > +	/* set the linear thread ID in per-lcore memory area */
> > +	if (__get_linear_tid(&ltid) < 0)
> > +		rte_panic("cannot get cache slot id\n");
> > +	RTE_PER_LCORE(_thread_id) = ltid;
> > +
> >  	/* set CPU affinity */
> >  	if (eal_thread_set_affinity() < 0)
> >  		rte_panic("cannot set affinity\n");
> > --
> > 1.8.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2014-12-23  9:51                     ` Liang, Cunming
@ 2015-01-08 17:05                       ` Ananyev, Konstantin
  2015-01-08 17:23                         ` Richardson, Bruce
                                           ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Ananyev, Konstantin @ 2015-01-08 17:05 UTC (permalink / raw)
  To: Liang, Cunming, Stephen Hemminger, Richardson, Bruce; +Cc: dev


Hi Steve,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Liang, Cunming
> Sent: Tuesday, December 23, 2014 9:52 AM
> To: Stephen Hemminger; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> 
> 
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Tuesday, December 23, 2014 2:29 AM
> > To: Richardson, Bruce
> > Cc: Liang, Cunming; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > On Mon, 22 Dec 2014 09:46:03 +0000
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >
> > > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > > ...
> > > > > I'm conflicted on this one. However, I think far more applications would be
> > > > > broken
> > > > > to start having to use thread_id in place of an lcore_id than would be
> > broken
> > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > I'm actually struggling to come up with a large number of scenarios where
> > it's
> > > > > important to an app to determine the cpu it's running on, compared to the
> > large
> > > > > number of cases where you need to have a data-structure per thread. In
> > DPDK
> > > > > libs
> > > > > alone, you see this assumption that lcore_id == thread_id a large number
> > of
> > > > > times.
> > > > >
> > > > > Despite the slight logical inconsistency, I think it's better to avoid
> > introducing
> > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > >
> > > > > /Bruce
> > > >
> > > > Ok, I understand it.
> > > > I list the implicit meaning if using lcore_id representing the unique thread.
> > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> > core id.
> > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an unique
> > id for thread.
> > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used only
> > in CASE 1)
> > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> > represent a logical core id.
> > > >
> > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this
> > conclusion.
> > > >
> > > > /Cunming
> > >
> > > Sorry, I don't like that suggestion either, as having lcore_id values greater
> > > than RTE_MAX_LCORE is terrible, as how will people know how to dimension
> > arrays
> > > to be indexes by lcore id? Given the choice, if we are not going to just use
> > > lcore_id as a generic thread id, which is always between 0 and
> > RTE_MAX_LCORE
> > > we can look to define a new thread_id variable to hold that. However, it should
> > > have a bounded range.
> > > From an ease-of-porting perspective, I still think that the simplest option is to
> > > use the existing lcore_id and accept the fact that it's now a thread id rather
> > > than an actual physical lcore. Question is, is would that cause us lots of issues
> > > in the future?
> > >
> > > /Bruce
> >
> > The current rte_lcore_id() has different meaning the thread. Your proposal will
> > break code that uses lcore_id to do per-cpu statistics and the lcore_config
> > code in the samples.
> > q
> [Liang, Cunming] +1.

Few more thoughts on that subject:

Actually one more place in the lib, where lcore_id is used (and it should be unique):
rte_spinlock_recursive_lock() / rte_spinlock_recursive_trylock().
So if we are going to replace lcore_id with thread_id as the unique thread index, then these functions
have to be updated too.

About maintaining our own unique thread_id inside shared memory (_get_linear_tid()/_put_linear_tid()).
There is one thing that worries me with that approach:
In case of abnormal process termination, TIDs used by that process will remain 'reserved'
and there is no way to know which TIDs were used by the terminated process.
So with the DPDK multi-process model there could be a situation where,
after a secondary process terminates abnormally, it wouldn't be possible to restart it -
we would simply run out of 'free' TIDs. 
 
Which makes me think there is probably no need to introduce a new globally unique 'thread_id'?
Maybe lcore_id alone is enough?
As Mirek and Bruce suggested, we can treat it as a sort of 'unique thread id' inside EAL.
Or as a 'virtual' core id that can run on a set of physical cpus, where the subsets for different 'virtual' cores may intersect.
Then basically we can keep the legacy behaviour with '-c <lcores_mask>', where each
lcore_id matches one to one with a physical cpu, and introduce a new option, something like:
--lcores='(<lcore_set1>)=(<phys_cpu_set1>),...,(<lcore_setN>)=(<phys_cpu_setN>)'.
So, let's say, --lcores='(0-7)=(0,2-4),(10)=(7),(8)=(all)' would mean:
create 10 EAL threads, bind the threads with lcore_id=[0-7] to cpuset <0,2,3,4>,
bind the thread with lcore_id=10 to cpu 7, and allow lcore_id=8 to run on any cpu in the system.
Of course '-c' and '--lcores' would be mutually exclusive, and we will need to update rte_lcore_to_socket_id()
and introduce rte_lcore_(set|get)_affinity().
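For illustration only, the new calls could be declared along these lines (a sketch of the proposal above, not existing DPDK APIs; the exact signatures are my assumption):

#define _GNU_SOURCE
#include <sched.h>

/* bind / query the cpu set an lcore (EAL thread) is allowed to run on */
int rte_lcore_set_affinity(unsigned lcore_id, const cpu_set_t *cpuset);
int rte_lcore_get_affinity(unsigned lcore_id, cpu_set_t *cpuset);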

Does it make sense to you?

BTW, one more thing: while we are on it  - it is probably a good time to do something with our interrupt thread?
It is a bit strange that we can't use rte_pktmbuf_free() or  rte_spinlock_recursive_lock() from our own interrupt/alarm handlers

Konstantin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2015-01-08 17:05                       ` Ananyev, Konstantin
@ 2015-01-08 17:23                         ` Richardson, Bruce
  2015-01-09  9:51                           ` Liang, Cunming
  2015-01-09  9:40                         ` Liang, Cunming
  2015-01-09  9:45                         ` Liang, Cunming
  2 siblings, 1 reply; 37+ messages in thread
From: Richardson, Bruce @ 2015-01-08 17:23 UTC (permalink / raw)
  To: Ananyev, Konstantin, Liang, Cunming, Stephen Hemminger; +Cc: dev

My opinion on this is that the lcore_id is rarely (if ever) used to find the actual core a thread is being run on. Instead it is used 99% of the time as a unique array index per thread, and therefore we can keep that usage by just assigning a valid lcore_id to any extra threads created. The suggestion to get/set affinities on top of that seems a good one to me also.

/Bruce
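(For illustration, the usage pattern referred to above; the array name is made up.)

/* per-thread state, dimensioned by RTE_MAX_LCORE and indexed by lcore_id */
static uint64_t rx_pkts[RTE_MAX_LCORE];

static void
count_rx(void)
{
        rx_pkts[rte_lcore_id()]++;   /* works as long as every thread gets a valid, unique lcore_id */
}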

-----Original Message-----
From: Ananyev, Konstantin 
Sent: Thursday, January 8, 2015 5:06 PM
To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore


Hi Steve,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Liang, Cunming
> Sent: Tuesday, December 23, 2014 9:52 AM
> To: Stephen Hemminger; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per 
> lcore
> 
> 
> 
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Tuesday, December 23, 2014 2:29 AM
> > To: Richardson, Bruce
> > Cc: Liang, Cunming; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per 
> > lcore
> >
> > On Mon, 22 Dec 2014 09:46:03 +0000
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >
> > > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > > ...
> > > > > I'm conflicted on this one. However, I think far more 
> > > > > applications would be broken to start having to use thread_id 
> > > > > in place of an lcore_id than would be
> > broken
> > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > I'm actually struggling to come up with a large number of 
> > > > > scenarios where
> > it's
> > > > > important to an app to determine the cpu it's running on, 
> > > > > compared to the
> > large
> > > > > number of cases where you need to have a data-structure per 
> > > > > thread. In
> > DPDK
> > > > > libs
> > > > > alone, you see this assumption that lcore_id == thread_id a 
> > > > > large number
> > of
> > > > > times.
> > > > >
> > > > > Despite the slight logical inconsistency, I think it's better 
> > > > > to avoid
> > introducing
> > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > >
> > > > > /Bruce
> > > >
> > > > Ok, I understand it.
> > > > I list the implicit meaning if using lcore_id representing the unique thread.
> > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents 
> > > > the logical
> > core id.
> > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents 
> > > > an unique
> > id for thread.
> > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest 
> > > > to be used only
> > in CASE 1)
> > > > 4). rte_lcore_id() can be used in CASE 2), but the return value 
> > > > no matter
> > represent a logical core id.
> > > >
> > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 
> > > > base on this
> > conclusion.
> > > >
> > > > /Cunming
> > >
> > > Sorry, I don't like that suggestion either, as having lcore_id 
> > > values greater than RTE_MAX_LCORE is terrible, as how will people 
> > > know how to dimension
> > arrays
> > > to be indexes by lcore id? Given the choice, if we are not going 
> > > to just use lcore_id as a generic thread id, which is always 
> > > between 0 and
> > RTE_MAX_LCORE
> > > we can look to define a new thread_id variable to hold that. 
> > > However, it should have a bounded range.
> > > From an ease-of-porting perspective, I still think that the 
> > > simplest option is to use the existing lcore_id and accept the 
> > > fact that it's now a thread id rather than an actual physical 
> > > lcore. Question is, is would that cause us lots of issues in the future?
> > >
> > > /Bruce
> >
> > The current rte_lcore_id() has different meaning the thread. Your 
> > proposal will break code that uses lcore_id to do per-cpu statistics 
> > and the lcore_config code in the samples.
> > q
> [Liang, Cunming] +1.

Few more thoughts on that subject:

Actually one more place in the lib, where lcore_id is used (and it should be unique):
rte_spinlock_recursive_lock() / rte_spinlock_recursive_trylock().
So if we going to replace lcore_id with thread_id as uniques thread index, then these functions have to be updated too.

About maintaining our own unique thread_id inside shared memory (_get_linear_tid()/_put_linear_tid()).
There is one thing that worries me with that approach:
In case of abnormal process termination, TIDs used by that process will remain 'reserved'
and there is no way to know which TIDs were used by terminated process.
So there could be a situation with DPDK multi-process model, when after secondary process abnormal termination, It wouldn't be possible to restart it - we just run out of 'free' TIDs. 
 
Which makes me think probably there is no need to introduce new globally unique 'thread_id'?
Might be just lcore_id is enough?  
As Mirek and Bruce suggested we can treat it a sort of 'unique thread id' inside EAL.
Or as 'virtual' core id that can run on set of physical cpus, and these subsets for different 'virtual' cores can intersect.
Then basically we can keep legacy behaviour with '-c <lcores_mask>,' where each lcore_id matches one to one  with physical cpu, and introduce new one, something like:
--lcores='(<lcore_set1>)=(<phys_cpu_set1>),..(<lcore_setN)=(<phys_cpu_setN>)'.
So let say: --lcores=(0-7)=(0,2-4),(10)=(7),(8)=(all)' would mean:
Create 10 EAL threads, bind threads with clore_id=[0-7] to cpuset: <0,2,3,4>, 
thread  with lcore_id=10 is binded to  cpu 7, and allow to run lcore_id=8 on any cpu in the system.    
Of course '-c' and '-lcores' would be mutually exclusive, and we will need to update  rte_lcore_to_socket_id() and introduce: rte_lcore_(set|get)_affinity().

Does it make sense to you?

BTW, one more thing: while we are on it  - it is probably a good time to do something with our interrupt thread?
It is a bit strange that we can't use rte_pktmbuf_free() or  rte_spinlock_recursive_lock() from our own interrupt/alarm handlers

Konstantin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2015-01-08 17:05                       ` Ananyev, Konstantin
  2015-01-08 17:23                         ` Richardson, Bruce
@ 2015-01-09  9:40                         ` Liang, Cunming
  2015-01-09 11:52                           ` Ananyev, Konstantin
  2015-01-09  9:45                         ` Liang, Cunming
  2 siblings, 1 reply; 37+ messages in thread
From: Liang, Cunming @ 2015-01-09  9:40 UTC (permalink / raw)
  To: Ananyev, Konstantin, Stephen Hemminger, Richardson, Bruce; +Cc: dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, January 09, 2015 1:06 AM
> To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> 
> Hi Steve,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Liang, Cunming
> > Sent: Tuesday, December 23, 2014 9:52 AM
> > To: Stephen Hemminger; Richardson, Bruce
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> >
> >
> > > -----Original Message-----
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Tuesday, December 23, 2014 2:29 AM
> > > To: Richardson, Bruce
> > > Cc: Liang, Cunming; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > On Mon, 22 Dec 2014 09:46:03 +0000
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >
> > > > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > > > ...
> > > > > > I'm conflicted on this one. However, I think far more applications would
> be
> > > > > > broken
> > > > > > to start having to use thread_id in place of an lcore_id than would be
> > > broken
> > > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > > I'm actually struggling to come up with a large number of scenarios
> where
> > > it's
> > > > > > important to an app to determine the cpu it's running on, compared to
> the
> > > large
> > > > > > number of cases where you need to have a data-structure per thread.
> In
> > > DPDK
> > > > > > libs
> > > > > > alone, you see this assumption that lcore_id == thread_id a large
> number
> > > of
> > > > > > times.
> > > > > >
> > > > > > Despite the slight logical inconsistency, I think it's better to avoid
> > > introducing
> > > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > > >
> > > > > > /Bruce
> > > > >
> > > > > Ok, I understand it.
> > > > > I list the implicit meaning if using lcore_id representing the unique thread.
> > > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> > > core id.
> > > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> unique
> > > id for thread.
> > > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used
> only
> > > in CASE 1)
> > > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> > > represent a logical core id.
> > > > >
> > > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this
> > > conclusion.
> > > > >
> > > > > /Cunming
> > > >
> > > > Sorry, I don't like that suggestion either, as having lcore_id values greater
> > > > than RTE_MAX_LCORE is terrible, as how will people know how to
> dimension
> > > arrays
> > > > to be indexes by lcore id? Given the choice, if we are not going to just use
> > > > lcore_id as a generic thread id, which is always between 0 and
> > > RTE_MAX_LCORE
> > > > we can look to define a new thread_id variable to hold that. However, it
> should
> > > > have a bounded range.
> > > > From an ease-of-porting perspective, I still think that the simplest option is
> to
> > > > use the existing lcore_id and accept the fact that it's now a thread id rather
> > > > than an actual physical lcore. Question is, is would that cause us lots of
> issues
> > > > in the future?
> > > >
> > > > /Bruce
> > >
> > > The current rte_lcore_id() has different meaning the thread. Your proposal
> will
> > > break code that uses lcore_id to do per-cpu statistics and the lcore_config
> > > code in the samples.
> > > q
> > [Liang, Cunming] +1.
> 
> Few more thoughts on that subject:
> 
> Actually one more place in the lib, where lcore_id is used (and it should be
> unique):
> rte_spinlock_recursive_lock() / rte_spinlock_recursive_trylock().
> So if we going to replace lcore_id with thread_id as uniques thread index, then
> these functions
> have to be updated too.
[Liang, Cunming] You're right; if we decide to use thread_id, we have to check and replace
rte_lcore_id()/RTE_PER_LCORE(_lcore_id) in every affected place.
Now I'm buying the proposal to keep using rte_lcore_id() to return the
unique id. Meanwhile I think it's necessary to also have the real cpu id,
which is helpful for NUMA socket checking.
I will provide a new API, rte_curr_cpu(), to return the cpu the thread is currently
running on, whether or not that cpu is in the core mask.
So the socket info stored in lcore_config is still useful for choosing the local socket.
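(A minimal sketch of what rte_curr_cpu() could do; the name is the one proposed above, while the sched_getcpu()-based body is only my assumption.)

#define _GNU_SOURCE
#include <sched.h>

/* cpu the calling thread is running on right now (a snapshot - the thread may migrate) */
static inline int
rte_curr_cpu(void)
{
        return sched_getcpu();
}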
> 
> About maintaining our own unique thread_id inside shared memory
> (_get_linear_tid()/_put_linear_tid()).
> There is one thing that worries me with that approach:
> In case of abnormal process termination, TIDs used by that process will remain
> 'reserved'
> and there is no way to know which TIDs were used by terminated process.
> So there could be a situation with DPDK multi-process model,
> when after secondary process abnormal termination, It wouldn't be possible to
> restart it -
> we just run out of 'free' TIDs.
[Liang, Cunming] That's a good point, I think. It applies not only to the thread id but
to all dynamically allocated resources (e.g. memzone, mempool):
we have no garbage collection or heartbeat to handle an abnormal exit of a secondary process.

> 
> Which makes me think probably there is no need to introduce new globally
> unique 'thread_id'?
> Might be just lcore_id is enough?
> As Mirek and Bruce suggested we can treat it a sort of 'unique thread id' inside
> EAL.
[Liang, Cunming] I think we'd better have two ids: one for the 'unique thread id' and one for the real cpu id,
whatever they end up being named (lcore_id/thread_id/cpu_id, etc.).
We need the cpu id to check/get the NUMA info.
A pthread may migrate from one core to another, so the thread's 'socket id' may change;
the per-cpu socket info is already available in lcore_config.

> Or as 'virtual' core id that can run on set of physical cpus, and these subsets for
> different 'virtual' cores can intersect.
> Then basically we can keep legacy behaviour with '-c <lcores_mask>,' where each
> lcore_id matches one to one  with physical cpu, and introduce new one,
> something like:
> --
> lcores='(<lcore_set1>)=(<phys_cpu_set1>),..(<lcore_setN)=(<phys_cpu_setN>)'.
> So let say: --lcores=(0-7)=(0,2-4),(10)=(7),(8)=(all)' would mean:
> Create 10 EAL threads, bind threads with clore_id=[0-7] to cpuset: <0,2,3,4>,
> thread  with lcore_id=10 is binded to  cpu 7, and allow to run lcore_id=8 on any
> cpu in the system.
> Of course '-c' and '-lcores' would be mutually exclusive, and we will need to
> update  rte_lcore_to_socket_id()
> and introduce: rte_lcore_(set|get)_affinity().
> 
> Does it make sense to you?
[Liang, Cunming] If lcore_id is assigned on the command line, the user has to handle
the conflict between '-c' and '--lcores'.
In that case, if lcore_id 0~10 is occupied, does the core-masked thread start from 11?
And consider an application that creates a new pthread at runtime:
since no lcore id for that new thread was mentioned on the command line, it still falls back to dynamic allocation.
I mean that at startup the user may have no idea how many pthreads they will run.

'rte_pthread_assign_lcore' does the same things as 'rte_lcore_(set|get)_affinity()'.
If we keep using lcore_id, I like the name you proposed.

I'll send my code update on next Monday.

> 
> BTW, one more thing: while we are on it  - it is probably a good time to do
> something with our interrupt thread?
> It is a bit strange that we can't use rte_pktmbuf_free() or
> rte_spinlock_recursive_lock() from our own interrupt/alarm handlers
> 
> Konstantin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2015-01-08 17:05                       ` Ananyev, Konstantin
  2015-01-08 17:23                         ` Richardson, Bruce
  2015-01-09  9:40                         ` Liang, Cunming
@ 2015-01-09  9:45                         ` Liang, Cunming
  2 siblings, 0 replies; 37+ messages in thread
From: Liang, Cunming @ 2015-01-09  9:45 UTC (permalink / raw)
  To: Ananyev, Konstantin, Stephen Hemminger, Richardson, Bruce; +Cc: dev

> 
> BTW, one more thing: while we are on it  - it is probably a good time to do
> something with our interrupt thread?
> It is a bit strange that we can't use rte_pktmbuf_free() or
> rte_spinlock_recursive_lock() from our own interrupt/alarm handlers
> 
> Konstantin
[Liang, Cunming] I'll think about it.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2015-01-08 17:23                         ` Richardson, Bruce
@ 2015-01-09  9:51                           ` Liang, Cunming
  0 siblings, 0 replies; 37+ messages in thread
From: Liang, Cunming @ 2015-01-09  9:51 UTC (permalink / raw)
  To: Richardson, Bruce, Ananyev, Konstantin, Stephen Hemminger; +Cc: dev

I see. Will update soon.
Thanks for all the comments.

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Friday, January 09, 2015 1:24 AM
> To: Ananyev, Konstantin; Liang, Cunming; Stephen Hemminger
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> My opinion on this is that the lcore_id is rarely (if ever) used to find the actual
> core a thread is being run on. Instead it is used 99% of the time as a unique array
> index per thread, and therefore that we can keep that usage by just assigning a
> valid lcore_id to any extra threads created. The suggestion to get/set affinities on
> top of that seems a good one to me also.
> 
> /Bruce
> 
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, January 8, 2015 5:06 PM
> To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> 
> Hi Steve,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Liang, Cunming
> > Sent: Tuesday, December 23, 2014 9:52 AM
> > To: Stephen Hemminger; Richardson, Bruce
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per
> > lcore
> >
> >
> >
> > > -----Original Message-----
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Tuesday, December 23, 2014 2:29 AM
> > > To: Richardson, Bruce
> > > Cc: Liang, Cunming; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per
> > > lcore
> > >
> > > On Mon, 22 Dec 2014 09:46:03 +0000
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >
> > > > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > > > ...
> > > > > > I'm conflicted on this one. However, I think far more
> > > > > > applications would be broken to start having to use thread_id
> > > > > > in place of an lcore_id than would be
> > > broken
> > > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > > I'm actually struggling to come up with a large number of
> > > > > > scenarios where
> > > it's
> > > > > > important to an app to determine the cpu it's running on,
> > > > > > compared to the
> > > large
> > > > > > number of cases where you need to have a data-structure per
> > > > > > thread. In
> > > DPDK
> > > > > > libs
> > > > > > alone, you see this assumption that lcore_id == thread_id a
> > > > > > large number
> > > of
> > > > > > times.
> > > > > >
> > > > > > Despite the slight logical inconsistency, I think it's better
> > > > > > to avoid
> > > introducing
> > > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > > >
> > > > > > /Bruce
> > > > >
> > > > > Ok, I understand it.
> > > > > I list the implicit meaning if using lcore_id representing the unique thread.
> > > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents
> > > > > the logical
> > > core id.
> > > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents
> > > > > an unique
> > > id for thread.
> > > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest
> > > > > to be used only
> > > in CASE 1)
> > > > > 4). rte_lcore_id() can be used in CASE 2), but the return value
> > > > > no matter
> > > represent a logical core id.
> > > > >
> > > > > If most of us feel it's acceptable, I'll prepare for the RFC v2
> > > > > base on this
> > > conclusion.
> > > > >
> > > > > /Cunming
> > > >
> > > > Sorry, I don't like that suggestion either, as having lcore_id
> > > > values greater than RTE_MAX_LCORE is terrible, as how will people
> > > > know how to dimension
> > > arrays
> > > > to be indexes by lcore id? Given the choice, if we are not going
> > > > to just use lcore_id as a generic thread id, which is always
> > > > between 0 and
> > > RTE_MAX_LCORE
> > > > we can look to define a new thread_id variable to hold that.
> > > > However, it should have a bounded range.
> > > > From an ease-of-porting perspective, I still think that the
> > > > simplest option is to use the existing lcore_id and accept the
> > > > fact that it's now a thread id rather than an actual physical
> > > > lcore. Question is, is would that cause us lots of issues in the future?
> > > >
> > > > /Bruce
> > >
> > > The current rte_lcore_id() has different meaning the thread. Your
> > > proposal will break code that uses lcore_id to do per-cpu statistics
> > > and the lcore_config code in the samples.
> > > q
> > [Liang, Cunming] +1.
> 
> Few more thoughts on that subject:
> 
> Actually one more place in the lib, where lcore_id is used (and it should be
> unique):
> rte_spinlock_recursive_lock() / rte_spinlock_recursive_trylock().
> So if we going to replace lcore_id with thread_id as uniques thread index, then
> these functions have to be updated too.
> 
> About maintaining our own unique thread_id inside shared memory
> (_get_linear_tid()/_put_linear_tid()).
> There is one thing that worries me with that approach:
> In case of abnormal process termination, TIDs used by that process will remain
> 'reserved'
> and there is no way to know which TIDs were used by terminated process.
> So there could be a situation with DPDK multi-process model, when after
> secondary process abnormal termination, It wouldn't be possible to restart it - we
> just run out of 'free' TIDs.
> 
> Which makes me think probably there is no need to introduce new globally
> unique 'thread_id'?
> Might be just lcore_id is enough?
> As Mirek and Bruce suggested we can treat it a sort of 'unique thread id' inside
> EAL.
> Or as 'virtual' core id that can run on set of physical cpus, and these subsets for
> different 'virtual' cores can intersect.
> Then basically we can keep legacy behaviour with '-c <lcores_mask>,' where each
> lcore_id matches one to one  with physical cpu, and introduce new one,
> something like:
> --
> lcores='(<lcore_set1>)=(<phys_cpu_set1>),..(<lcore_setN)=(<phys_cpu_setN>)'.
> So let say: --lcores=(0-7)=(0,2-4),(10)=(7),(8)=(all)' would mean:
> Create 10 EAL threads, bind threads with clore_id=[0-7] to cpuset: <0,2,3,4>,
> thread  with lcore_id=10 is binded to  cpu 7, and allow to run lcore_id=8 on any
> cpu in the system.
> Of course '-c' and '-lcores' would be mutually exclusive, and we will need to
> update  rte_lcore_to_socket_id() and introduce: rte_lcore_(set|get)_affinity().
> 
> Does it make sense to you?
> 
> BTW, one more thing: while we are on it  - it is probably a good time to do
> something with our interrupt thread?
> It is a bit strange that we can't use rte_pktmbuf_free() or
> rte_spinlock_recursive_lock() from our own interrupt/alarm handlers
> 
> Konstantin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
  2015-01-09  9:40                         ` Liang, Cunming
@ 2015-01-09 11:52                           ` Ananyev, Konstantin
  0 siblings, 0 replies; 37+ messages in thread
From: Ananyev, Konstantin @ 2015-01-09 11:52 UTC (permalink / raw)
  To: Liang, Cunming, Stephen Hemminger, Richardson, Bruce; +Cc: dev



> -----Original Message-----
> From: Liang, Cunming
> Sent: Friday, January 09, 2015 9:41 AM
> To: Ananyev, Konstantin; Stephen Hemminger; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Friday, January 09, 2015 1:06 AM
> > To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> >
> > Hi Steve,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Liang, Cunming
> > > Sent: Tuesday, December 23, 2014 9:52 AM
> > > To: Stephen Hemminger; Richardson, Bruce
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > Sent: Tuesday, December 23, 2014 2:29 AM
> > > > To: Richardson, Bruce
> > > > Cc: Liang, Cunming; dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > > On Mon, 22 Dec 2014 09:46:03 +0000
> > > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > >
> > > > > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > > > > ...
> > > > > > > I'm conflicted on this one. However, I think far more applications would
> > be
> > > > > > > broken
> > > > > > > to start having to use thread_id in place of an lcore_id than would be
> > > > broken
> > > > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > > > I'm actually struggling to come up with a large number of scenarios
> > where
> > > > it's
> > > > > > > important to an app to determine the cpu it's running on, compared to
> > the
> > > > large
> > > > > > > number of cases where you need to have a data-structure per thread.
> > In
> > > > DPDK
> > > > > > > libs
> > > > > > > alone, you see this assumption that lcore_id == thread_id a large
> > number
> > > > of
> > > > > > > times.
> > > > > > >
> > > > > > > Despite the slight logical inconsistency, I think it's better to avoid
> > > > introducing
> > > > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > > > >
> > > > > > > /Bruce
> > > > > >
> > > > > > Ok, I understand it.
> > > > > > I list the implicit meaning if using lcore_id representing the unique thread.
> > > > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> > > > core id.
> > > > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> > unique
> > > > id for thread.
> > > > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used
> > only
> > > > in CASE 1)
> > > > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> > > > represent a logical core id.
> > > > > >
> > > > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this
> > > > conclusion.
> > > > > >
> > > > > > /Cunming
> > > > >
> > > > > Sorry, I don't like that suggestion either, as having lcore_id values greater
> > > > > than RTE_MAX_LCORE is terrible, as how will people know how to
> > dimension
> > > > arrays
> > > > > to be indexes by lcore id? Given the choice, if we are not going to just use
> > > > > lcore_id as a generic thread id, which is always between 0 and
> > > > RTE_MAX_LCORE
> > > > > we can look to define a new thread_id variable to hold that. However, it
> > should
> > > > > have a bounded range.
> > > > > From an ease-of-porting perspective, I still think that the simplest option is
> > to
> > > > > use the existing lcore_id and accept the fact that it's now a thread id rather
> > > > > than an actual physical lcore. Question is, is would that cause us lots of
> > issues
> > > > > in the future?
> > > > >
> > > > > /Bruce
> > > >
> > > > The current rte_lcore_id() has different meaning the thread. Your proposal
> > will
> > > > break code that uses lcore_id to do per-cpu statistics and the lcore_config
> > > > code in the samples.
> > > > q
> > > [Liang, Cunming] +1.
> >
> > Few more thoughts on that subject:
> >
> > Actually one more place in the lib, where lcore_id is used (and it should be
> > unique):
> > rte_spinlock_recursive_lock() / rte_spinlock_recursive_trylock().
> > So if we going to replace lcore_id with thread_id as uniques thread index, then
> > these functions
> > have to be updated too.
> [Liang, Cunming] You're right, if deciding to use thread_id, we have to check and replace
> rte_lcore_id()/RTE_PER_LCORE(_lcore_id) on all the impact place.
> Now I'm buying the proposal to keep using rte_lcore_id() to return the
> unique id. Meanwhile I think it's necessary to have real cpu id.
> It's helpful in NUMA socket checking.
> I will provide new API rte_curr_cpu() to return the runtime cpu no matter
> the thread running in coremasked or non-coremasked cpu.
> So the socket info stored in lcore_config still useful  to choose the local socket.
> >
> > About maintaining our own unique thread_id inside shared memory
> > (_get_linear_tid()/_put_linear_tid()).
> > There is one thing that worries me with that approach:
> > In case of abnormal process termination, TIDs used by that process will remain
> > 'reserved'
> > and there is no way to know which TIDs were used by terminated process.
> > So there could be a situation with DPDK multi-process model,
> > when after secondary process abnormal termination, It wouldn't be possible to
> > restart it -
> > we just run out of 'free' TIDs.
> [Liang, Cunming] That's a good point I think. I think it's not only for thread id but
> for all the dynamic allocated resource (e.g. memzone, mempool).
> we haven't a garbage collection or heartbeat to process the secondary abnormal exit.

Of course some dynamically allocated memory could be left unclaimed in that case.
But right now, at least you can restart the child process.
What I am saying is that we are probably better off avoiding managing our own TIDs dynamically at all.

> 
> >
> > Which makes me think probably there is no need to introduce new globally
> > unique 'thread_id'?
> > Might be just lcore_id is enough?
> > As Mirek and Bruce suggested we can treat it a sort of 'unique thread id' inside
> > EAL.
> [Liang, Cunming] I think we'd better have two, one for 'unique thread id', one for real cpu id.
> No matter which of them are named lcore_id/thread_id/cpu_id and etc.

As I understand it, the goal is to be able to run multiple EAL threads on multiple physical cpus.
So each thread could run on multiple cpus, i.e. there would be no one-to-one match
between lcore_id (thread_id) and cpu_id.
That's why I think we need to:
Introduce rte_lcore_get_affinity(lcore_id) - it would return the cpuset for the given lcore.
Update rte_lcore_to_socket_id(lcore_id) - it would check whether all cpus that the lcore is eligible
to run on belong to the same socket.
If yes, that socket_id is returned; if not, SOCKET_ID_ANY.
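(A sketch of that check, assuming the proposed rte_lcore_get_affinity() and the per-cpu socket info already kept in lcore_config; the helper name and the legacy 1:1 lcore/cpu mapping used for the lookup are illustrative only.)

static int
lcore_socket_id(unsigned lcore_id)
{
        cpu_set_t set;
        unsigned cpu;
        int socket = SOCKET_ID_ANY;

        if (rte_lcore_get_affinity(lcore_id, &set) < 0)   /* proposed API */
                return SOCKET_ID_ANY;
        for (cpu = 0; cpu < RTE_MAX_LCORE; cpu++) {
                if (!CPU_ISSET(cpu, &set))
                        continue;
                if (socket == SOCKET_ID_ANY)
                        socket = (int)lcore_config[cpu].socket_id;
                else if (socket != (int)lcore_config[cpu].socket_id)
                        return SOCKET_ID_ANY;   /* cpuset spans more than one socket */
        }
        return socket;
}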

> For cpu id, we need to check/get the NUMA info.
> Pthread may migrate from one core to another, the thread 'socket id' may change,
> The per cpu socket info we have them in lcore_config.
> 
> > Or as 'virtual' core id that can run on set of physical cpus, and these subsets for
> > different 'virtual' cores can intersect.
> > Then basically we can keep legacy behaviour with '-c <lcores_mask>,' where each
> > lcore_id matches one to one  with physical cpu, and introduce new one,
> > something like:
> > --
> > lcores='(<lcore_set1>)=(<phys_cpu_set1>),..(<lcore_setN)=(<phys_cpu_setN>)'.
> > So let say: --lcores=(0-7)=(0,2-4),(10)=(7),(8)=(all)' would mean:
> > Create 10 EAL threads, bind threads with clore_id=[0-7] to cpuset: <0,2,3,4>,
> > thread  with lcore_id=10 is binded to  cpu 7, and allow to run lcore_id=8 on any
> > cpu in the system.
> > Of course '-c' and '-lcores' would be mutually exclusive, and we will need to
> > update  rte_lcore_to_socket_id()
> > and introduce: rte_lcore_(set|get)_affinity().
> >
> > Does it make sense to you?
> [Liang, Cunming] If assign lcore_id during the command line, user have to handle
> the conflict for '-c' and '--lcores'.
> In this cases, if lcore_id 0~10 is occupied, the coremasked thread start from 11 ?

As I said above: " Of course '-c' and '-lcores' would be mutually exclusive".

> In case, application create a new pthread during the runtime.
> As there's no lcore id belongs to the new thread mentioned in the command line, it then still back to dynamic allocate.
> I means on the startup, user may have no idea of how much pthread they will run.

I think you are mixing 2 different tasks here:
1. Allow EAL threads (lcores) to run on a set of physical cpus (not just one), where these subsets for
different lcores can intersect.
2. Allow dynamically created threads to call EAL functions (rte_mempool, rte_recursive_lock, rte_timer, etc).

My understanding was that our goal here is task #1.
For #1 - I think what I proposed above is enough.

Though, if our goal is #2 - it is a different story.
In that case, I think we shouldn't manage a unique TID ourselves.
We are not an OS, and at the app level it would be quite complicated to implement it
in a robust way with the current DPDK multi-process model.
Another thing - with the proposed implementation we are still limiting the number of allowed threads:
instead of RTE_MAX_LCORE we just introduce RTE_MAX_THREADS.
So all the problems with rte_mempool caches and rte_timers will remain. 

If we really need #2, then what we can probably do instead:

1. Rely on the OS unique TID (Linux gettid()).
2. Assign __lcore_id = -1 by default, and set it to the proper value only for EAL (lcore) threads.
3. Revise all usages of __lcore_id inside the lib and, for each case:
  A) either change it to use the system-wide unique TID (rte_spinlock_recursive)
  B) or update the code so it can handle the situation where __lcore_id == -1
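(A minimal sketch of points 1 and 2; eal_gettid() is a made-up helper name, and since glibc has no gettid() wrapper a raw syscall is assumed.)

#include <unistd.h>
#include <sys/syscall.h>

/* system-wide unique thread id provided by the kernel */
static inline pid_t
eal_gettid(void)
{
        return (pid_t)syscall(SYS_gettid);
}

/* non-EAL threads keep the reserved value; only EAL threads get a real lcore_id */
RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = LCORE_ID_ANY;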

As far as I can see, right now the following code inside the RTE libs uses rte_lcore_id():

1.	lib/librte_eal/common/eal_common_log.c
Uses rte_lcore_id() return value as index in static log_cur_msg[].

2.	lib/librte_eal/common/include/generic/rte_spinlock.h
Uses rte_lcore_id() return value as rte_spinlock_recursive.user.
Value -1 (LCORE_ID_ANY) is reserved to mark lock as unused.

3.	lib/librte_mempool/rte_mempool.h
Uses rte_lcore_id() return value as index in rte_mempool.local_cache[] and  inside rte_mempool.stats[].

4.	lib/librte_timer/rte_timer.c
Uses rte_lcore_id() return value as index in static struct priv_timer priv_timer[].
Also uses it as the 16-bit owner field inside union rte_timer_status. 
Again -1 is reserved value for RTE_TIMER_NO_OWNER.

5.	lib/librte_eal/common/include/rte_lcore.h
Inside rte_socket_id(), uses the rte_lcore_id() return value as an index into lcore_config[].

6.	lib/librte_ring/rte_ring.h
Uses the rte_lcore_id() return value as an index into rte_ring.stats[].

Case 2 is A), so I think we can use the gettid() return value instead of the __lcore_id value here.
All the other cases look like B) to me.

The easiest thing (at least as a first step) is just to add a check that __lcore_id < MAX_LCORE_ID and handle each case accordingly (cases 3 and 5 are sketched after this list):
case 3: avoid mempool caching if __lcore_id >= MAX_LCORE_ID.
case 4: allow timers to be set up only for EAL (lcore) threads (__lcore_id < MAX_LCORE_ID).
E.g. a dynamically created thread will be able to start/stop a timer for an lcore thread,
but it will not be allowed to set up a timer for itself or for another non-lcore thread.
rte_timer_manage() in a non-lcore thread would simply do nothing and return straight away.
case 5: just return SOCKET_ID_ANY if __lcore_id >= MAX_LCORE_ID.
case 6: avoid the stats[] update if __lcore_id >= MAX_LCORE_ID.
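
Roughly what those checks could look like for cases 3 and 5; the names and
types below are simplified stand-ins for the DPDK ones (RTE_MAX_LCORE,
SOCKET_ID_ANY, lcore_config, the rte_mempool per-lcore cache), not the real
structures:

#include <stddef.h>

#define MAX_LCORE     128           /* stand-in for RTE_MAX_LCORE */
#define SOCKET_ID_ANY (-1)          /* same meaning as the DPDK constant */

/* as in the earlier sketch: -1 for any thread not created by EAL */
static __thread unsigned __lcore_id = (unsigned)-1;

struct lcore_cfg { int socket_id; };
static struct lcore_cfg lcore_cfg[MAX_LCORE];

/* case 5: rte_socket_id()-style lookup */
static int caller_socket_id(void)
{
    if (__lcore_id >= MAX_LCORE)
        return SOCKET_ID_ANY;           /* dynamically created thread */
    return lcore_cfg[__lcore_id].socket_id;
}

/* case 3: per-lcore mempool cache selection */
struct cache { unsigned len; void *objs[512]; };
struct pool  { unsigned cache_size; struct cache local_cache[MAX_LCORE]; };

static struct cache *pool_cache(struct pool *p)
{
    if (p->cache_size == 0 || __lcore_id >= MAX_LCORE)
        return NULL;                    /* no cache: go straight to the ring */
    return &p->local_cache[__lcore_id];
}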

That way the user can dynamically create as many threads as they want and should still be able to use EAL functions inside them.
Of course, for that the problem Olivier mentioned with thread pre-emption in the middle of a ring enqueue/dequeue
(http://dpdk.org/ml/archives/dev/2014-December/010342.html)
needs to be fixed somehow (the sketch below shows where the spinning happens).
Otherwise performance might be really poor.
Though I suppose that needs to be done for task #1 anyway.
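
For reference, a stripped-down version of the multi-producer head/tail update
that makes pre-emption so painful; C11 atomics are used here for brevity, while
the real rte_ring has its own compare-and-set and additionally handles
wrap-around, bulk sizes and watermarks:

#include <stdatomic.h>

struct mp_ring {
    _Atomic unsigned prod_head;
    _Atomic unsigned prod_tail;
};

static void enqueue_one(struct mp_ring *r)
{
    unsigned old_head, new_head;

    /* step 1: reserve a slot by moving prod_head */
    do {
        old_head = atomic_load(&r->prod_head);
        new_head = old_head + 1;
    } while (!atomic_compare_exchange_weak(&r->prod_head,
                                           &old_head, new_head));

    /* ... the object would be written into slot old_head here ... */

    /* step 2: wait for earlier producers to publish their slots.
     * If the thread owning an earlier slot is pre-empted between
     * step 1 and step 3, every later producer spins right here
     * until that thread gets scheduled again. */
    while (atomic_load(&r->prod_tail) != old_head)
        ;   /* spin */

    /* step 3: publish our slot */
    atomic_store(&r->prod_tail, new_head);
}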

Konstantin

> 
> 'rte_pthread_assign_lcore' does the same thing as 'rte_lcore_(set|get)_affinity()'.
> If we keep using lcore_id, I like the name you proposed.
> 
> I'll send my code update on next Monday.
> 
> >
> > BTW, one more thing: while we are on it, it is probably a good time to do
> > something about our interrupt thread?
> > It is a bit strange that we can't use rte_pktmbuf_free() or
> > rte_spinlock_recursive_lock() from our own interrupt/alarm handlers.
> >
> > Konstantin



Thread overview: 37+ messages
2014-12-11  2:04 [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 1/7] eal: add linear thread id as pthread-local variable Cunming Liang
2014-12-16  7:00   ` Qiu, Michael
2014-12-22 19:02   ` Ananyev, Konstantin
2014-12-23  9:56     ` Liang, Cunming
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 2/7] mempool: use linear-tid as mempool cache index Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 3/7] ring: use linear-tid as ring debug stats index Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 4/7] eal: add simple API for multi-pthread Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 5/7] testpmd: support multi-pthread mode Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 6/7] sample: add new sample for multi-pthread Cunming Liang
2014-12-11  2:04 ` [dpdk-dev] [RFC PATCH 7/7] eal: macro for cpuset w/ or w/o CPU_ALLOC Cunming Liang
2014-12-11  2:54 ` [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore Jayakumar, Muthurajan
2014-12-11  9:56 ` Walukiewicz, Miroslaw
2014-12-12  5:44   ` Liang, Cunming
2014-12-15 11:10     ` Walukiewicz, Miroslaw
2014-12-15 11:53       ` Liang, Cunming
2014-12-18 12:20         ` Walukiewicz, Miroslaw
2014-12-18 14:32           ` Bruce Richardson
2014-12-18 15:11             ` Olivier MATZ
2014-12-18 16:04               ` Bruce Richardson
2014-12-18 16:15           ` Stephen Hemminger
2014-12-19  1:28           ` Liang, Cunming
2014-12-19 10:03             ` Bruce Richardson
2014-12-22  1:51               ` Liang, Cunming
2014-12-22  9:46                 ` Bruce Richardson
2014-12-22 10:01                   ` Walukiewicz, Miroslaw
2014-12-23  9:45                     ` Liang, Cunming
2014-12-22 18:28                   ` Stephen Hemminger
2014-12-23  9:19                     ` Walukiewicz, Miroslaw
2014-12-23  9:23                       ` Bruce Richardson
2014-12-23  9:51                     ` Liang, Cunming
2015-01-08 17:05                       ` Ananyev, Konstantin
2015-01-08 17:23                         ` Richardson, Bruce
2015-01-09  9:51                           ` Liang, Cunming
2015-01-09  9:40                         ` Liang, Cunming
2015-01-09 11:52                           ` Ananyev, Konstantin
2015-01-09  9:45                         ` Liang, Cunming
