DPDK patches and discussions
From: Huichao Cai <chcchc88@163.com>
To: konstantin.v.ananyev@yandex.ru
Cc: dev@dpdk.org, honnappa.nagarahalli@arm.com, thomas@monjalon.net
Subject: [PATCH v2] ring: add the second version of the RTS interface
Date: Tue, 14 Jan 2025 20:55:50 +0800
Message-ID: <20250114125550.1932-1-chcchc88@163.com>
In-Reply-To: <20250105151345.3314-1-chcchc88@163.com>

Hi Konstantin, thank you very much for your question!

I have modified the __rte_ring_rts_v2_update_tail function (see the bottom of this message), and it now works properly
with your test command in my local environment (KVM). The local environment parameters are as follows:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-7

I have briefly looked at the code of the SORING patch. My patch's tail-update logic is similar to
that of the __rte_soring_stage_finalize function: update the tail as soon as possible.

Tail update logic explanation:
Suppose three enqueues (or dequeues) are in progress at the same time, and they complete
in the order first, second, third.
RTS: the tail is updated only after the third one completes.
RTS_V2: the tail is updated as soon as each one completes.

I have run the tests multiple times and found that neither RTS nor RTS_V2 consistently
wins; each has its own strengths and weaknesses, as the two test results below show.
In test 1, RTS has the lower cycles/obj in both cases (PRCS: 114.62 vs 131.32; AVG: 254.88
vs 263.62), while in test 2 RTS_V2 does (PRCS: 150.35 vs 215.13; AVG: 304.96 vs 337.58).
So I'm not sure whether this patch truly improves performance; maybe it is useful in certain scenarios?

Here are the results of two stress-test runs comparing RTS and RTS_V2:
=================test 1=================
[root@localhost ~]# echo ring_stress_autotest  | /opt/build-dpdk-release/app/dpdk-test --lcores "(2-7)@(3-5)" -n 8 --no-pci --no-huge
EAL: Detected CPU lcores: 8
EAL: Detected NUMA nodes: 1
EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
APP: HPET is not enabled, using TSC as default timer
RTE>>ring_stress_autotest
TEST-CASE MT_RTS MT-WRK_ENQ_DEQ-MST_NONE-PRCS START
lcore_stat_dump(AGGREGATE)={
	nb_cycle=168020391844(60007282.80 usec),
	DEQ+ENQ={
		nb_call=87964158,
		nb_obj=3122632083,
		nb_cycle=357901041584,
		obj/call(avg): 35.50
		cycles/obj(avg): 114.62
		cycles/call(avg): 4068.71
		max cycles/call=226802256(81000.81 usec),
		min cycles/call=288(0.10 usec),
	},
};
TEST-CASE MT_RTS MT-WRK_ENQ_DEQ-MST_NONE-PRCS OK
TEST-CASE MT_RTS MT-WRK_ENQ_DEQ-MST_NONE-AVG START
lcore_stat_dump(AGGREGATE)={
	nb_cycle=168039915096(60014255.39 usec),
	DEQ+ENQ={
		nb_call=92846537,
		nb_obj=3296030996,
		nb_cycle=840090079114,
		obj/call(avg): 35.50
		cycles/obj(avg): 254.88
		cycles/call(avg): 9048.16
	},
};
TEST-CASE MT_RTS MT-WRK_ENQ_DEQ-MST_NONE-AVG OK
TEST-CASE MT_RTS_V2 MT-WRK_ENQ_DEQ-MST_NONE-PRCS START
lcore_stat_dump(AGGREGATE)={
	nb_cycle=168006214342(60002219.41 usec),
	DEQ+ENQ={
		nb_call=83543881,
		nb_obj=2965835220,
		nb_cycle=389465266530,
		obj/call(avg): 35.50
		cycles/obj(avg): 131.32
		cycles/call(avg): 4661.80
		max cycles/call=123210780(44003.85 usec),
		min cycles/call=298(0.11 usec),
	},
};
TEST-CASE MT_RTS_V2 MT-WRK_ENQ_DEQ-MST_NONE-PRCS OK
TEST-CASE MT_RTS_V2 MT-WRK_ENQ_DEQ-MST_NONE-AVG START
lcore_stat_dump(AGGREGATE)={
	nb_cycle=168000036710(60000013.11 usec),
	DEQ+ENQ={
		nb_call=89759571,
		nb_obj=3186412623,
		nb_cycle=839986422120,
		obj/call(avg): 35.50
		cycles/obj(avg): 263.62
		cycles/call(avg): 9358.18
	},
};
TEST-CASE MT_RTS_V2 MT-WRK_ENQ_DEQ-MST_NONE-AVG OK
Number of tests:	4
Success:	4
Failed:	0
Test OK

=================test 2=================
[root@localhost ~]# echo ring_stress_autotest  | /opt/build-dpdk-release/app/dpdk-test --lcores "(2-7)@(3-5)" -n 8 --no-pci --no-huge
EAL: Detected CPU lcores: 8
EAL: Detected NUMA nodes: 1
EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
APP: HPET is not enabled, using TSC as default timer
RTE>>ring_stress_autotest
TEST-CASE MT_RTS MT-WRK_ENQ_DEQ-MST_NONE-PRCS START
lcore_stat_dump(AGGREGATE)={
	nb_cycle=168011911986(60004254.28 usec),
	DEQ+ENQ={
		nb_call=47315418,
		nb_obj=1679700058,
		nb_cycle=361351406016,
		obj/call(avg): 35.50
		cycles/obj(avg): 215.13
		cycles/call(avg): 7637.08
		max cycles/call=114663660(40951.31 usec),
		min cycles/call=286(0.10 usec),
	},
};
TEST-CASE MT_RTS MT-WRK_ENQ_DEQ-MST_NONE-PRCS OK
TEST-CASE MT_RTS MT-WRK_ENQ_DEQ-MST_NONE-AVG START
lcore_stat_dump(AGGREGATE)={
	nb_cycle=168039811194(60014218.28 usec),
	DEQ+ENQ={
		nb_call=70103600,
		nb_obj=2488627393,
		nb_cycle=840101179096,
		obj/call(avg): 35.50
		cycles/obj(avg): 337.58
		cycles/call(avg): 11983.71
	},
};
TEST-CASE MT_RTS MT-WRK_ENQ_DEQ-MST_NONE-AVG OK
TEST-CASE MT_RTS_V2 MT-WRK_ENQ_DEQ-MST_NONE-PRCS START
lcore_stat_dump(AGGREGATE)={
	nb_cycle=168000022894(60000008.18 usec),
	DEQ+ENQ={
		nb_call=72380924,
		nb_obj=2569422396,
		nb_cycle=386306567792,
		obj/call(avg): 35.50
		cycles/obj(avg): 150.35
		cycles/call(avg): 5337.13
		max cycles/call=226802852(81001.02 usec),
		min cycles/call=328(0.12 usec),
	},
};
TEST-CASE MT_RTS_V2 MT-WRK_ENQ_DEQ-MST_NONE-PRCS OK
TEST-CASE MT_RTS_V2 MT-WRK_ENQ_DEQ-MST_NONE-AVG START
lcore_stat_dump(AGGREGATE)={
	nb_cycle=168000052432(60000018.73 usec),
	DEQ+ENQ={
		nb_call=77585568,
		nb_obj=2754266203,
		nb_cycle=839935549688,
		obj/call(avg): 35.50
		cycles/obj(avg): 304.96
		cycles/call(avg): 10825.93
	},
};
TEST-CASE MT_RTS_V2 MT-WRK_ENQ_DEQ-MST_NONE-AVG OK
Number of tests:	4
Success:	4
Failed:	0
Test OK

==========The modified function is as follows:=========
static __rte_always_inline void
__rte_ring_rts_v2_update_tail(struct rte_ring_rts_headtail *ht,
	uint32_t old_tail, uint32_t num, uint32_t mask)
{
	union __rte_ring_rts_poscnt ot, nt;
	uint32_t expect_num = 0;

	ot.val.cnt = 0;
	ot.val.pos = old_tail;

	/*
	 * If the tail equals the start position of the current enqueue/dequeue,
	 * update the tail with the new value and keep advancing it until the
	 * cache entry at the new tail is 0. Otherwise, write the num of the
	 * current enqueue/dequeue to the cache so that another enqueue/dequeue
	 * can apply it later.
	 */

	nt.raw = rte_atomic_load_explicit(&ht->tail.raw, rte_memory_order_acquire);
	if (ot.val.pos != nt.val.pos) {
		/*
		 * Write the num of the current enqueue/dequeue to the
		 * corresponding cache entry.
		 */
		if (rte_atomic_compare_exchange_strong_explicit(
				&ht->rts_cache[ot.val.pos & mask].num, &expect_num, num,
				rte_memory_order_release, rte_memory_order_acquire))
			return;

		/*
		 * The cache entry is not 0: another enqueue/dequeue has stopped
		 * updating the tail at this position, so this enqueue/dequeue
		 * continues the update from here.
		 */
		rte_atomic_store_explicit(&ht->tail.raw, ot.raw, rte_memory_order_release);
	}

	/* Reset the corresponding cache entry to 0 for next use. */
	rte_atomic_store_explicit(&ht->rts_cache[ot.val.pos & mask].num,
		0, rte_memory_order_release);

	nt.val.pos = ot.val.pos + num;

	/*
	 * Keep advancing the tail until the cache entry at the new tail is 0.
	 * From here on, the current enqueue/dequeue may be updating the tail
	 * on behalf of other enqueues/dequeues.
	 */
	while (1) {
		num = 0;
		/*
		 * If the entry is still 0, write mask into it as a marker that
		 * this updater stops here, publish the new tail and return.
		 */
		if (rte_atomic_compare_exchange_strong_explicit(
				&ht->rts_cache[nt.val.pos & mask].num, &num, mask,
				rte_memory_order_release, rte_memory_order_acquire)) {
			/*
			 * CAS rather than a plain store: the owner of nt.val.pos
			 * may already have moved the tail past nt.
			 */
			rte_atomic_compare_exchange_strong_explicit(&ht->tail.raw,
				(uint64_t *)(uintptr_t)&ot.raw, nt.raw,
				rte_memory_order_release, rte_memory_order_acquire);
			return;
		}

		/* num now holds the count cached by another enqueue/dequeue. */
		rte_atomic_store_explicit(&ht->rts_cache[nt.val.pos & mask].num,
			0, rte_memory_order_release);

		/* Now it is safe to update the tail. */
		rte_atomic_store_explicit(&ht->tail.raw, nt.raw, rte_memory_order_release);

		ot.val.pos = nt.val.pos;
		nt.val.pos += num;
	}
}

