DPDK patches and discussions
* [RFC] service: extend service function call statistics
@ 2024-01-25 19:15 Mattias Rönnblom
  2024-01-25 23:19 ` Morten Brørup
  2024-08-09 20:25 ` [PATCH] " Mattias Rönnblom
  0 siblings, 2 replies; 18+ messages in thread
From: Mattias Rönnblom @ 2024-01-25 19:15 UTC (permalink / raw)
  To: dev; +Cc: hofors, harry.van.haaren, Stefan Sundkvist, Mattias Rönnblom

Add two new per-service counters.

RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
invocations where no work was performed.

RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
resulting in an error.

The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
counting all invocations, regardless of return value).

The new statistics may be useful for both debugging and profiling
(e.g., calculate the average per-call processing latency for non-idle
service calls).
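
For example, an application could compute that average along the
following lines (a minimal sketch, not part of this patch;
avg_busy_cycles() is a made-up helper):

#include <rte_service.h>

/* Average number of TSC cycles per non-idle call of a service.
 * Returns 0 if no non-idle calls have occurred. Error handling is
 * omitted for brevity.
 */
static uint64_t
avg_busy_cycles(uint32_t service_id)
{
	uint64_t cycles = 0, calls = 0, idle_calls = 0;

	rte_service_attr_get(service_id, RTE_SERVICE_ATTR_CYCLES, &cycles);
	rte_service_attr_get(service_id, RTE_SERVICE_ATTR_CALL_COUNT, &calls);
	rte_service_attr_get(service_id, RTE_SERVICE_ATTR_IDLE_CALL_COUNT,
			     &idle_calls);

	/* RTE_SERVICE_ATTR_CYCLES only covers non-idle calls. */
	uint64_t busy_calls = calls - idle_calls;

	return busy_calls > 0 ? cycles / busy_calls : 0;
}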

Service core tests are extended to cover the new counters, and
coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.

The documentation for the CYCLES attributes is updated to reflect
their actual semantics.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_service_cores.c | 70 ++++++++++++++++++++++++++---------
 lib/eal/common/rte_service.c  | 70 ++++++++++++++++++++++++++++-------
 lib/eal/include/rte_service.h | 24 ++++++++++--
 3 files changed, 131 insertions(+), 33 deletions(-)

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index c12d52d8f1..13a01b195f 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -3,11 +3,12 @@
  */
 
 #include <rte_common.h>
+#include <rte_cycles.h>
 #include <rte_hexdump.h>
-#include <rte_mbuf.h>
 #include <rte_malloc.h>
+#include <rte_mbuf.h>
 #include <rte_memcpy.h>
-#include <rte_cycles.h>
+#include <rte_random.h>
 
 #include <rte_service.h>
 #include <rte_service_component.h>
@@ -16,8 +17,10 @@
 
 /* used as the service core ID */
 static uint32_t slcore_id;
-/* used as timestamp to detect if a service core is running */
-static uint64_t service_tick;
+/* track service call count */
+static uint64_t service_calls;
+static uint64_t service_idle_calls;
+static uint64_t service_error_calls;
 /* used as a flag to check if a function was run */
 static uint32_t service_remote_launch_flag;
 
@@ -46,9 +49,21 @@ testsuite_teardown(void)
 static int32_t dummy_cb(void *args)
 {
 	RTE_SET_USED(args);
-	service_tick++;
+
+	service_calls++;
+
 	rte_delay_ms(SERVICE_DELAY);
+
+	switch (rte_rand_max(3)) {
+	case 0:
+		return 0;
+	case 1:
+		service_idle_calls++;
+		return -EAGAIN;
+	default:
+		service_error_calls++;
+		return -ENOENT;
+	}
-	return 0;
 }
 
 static int32_t dummy_mt_unsafe_cb(void *args)
@@ -121,6 +136,10 @@ unregister_all(void)
 	rte_service_lcore_reset_all();
 	rte_eal_mp_wait_lcore();
 
+	service_calls = 0;
+	service_idle_calls = 0;
+	service_error_calls = 0;
+
 	return TEST_SUCCESS;
 }
 
@@ -295,12 +314,19 @@ service_attr_get(void)
 			"Valid attr_get() call didn't return success");
 	TEST_ASSERT_EQUAL(0, attr_value,
 			"attr_get() call didn't set correct cycles (zero)");
-	/* check correct call count */
+	/* check correct call counts */
 	const int attr_calls = RTE_SERVICE_ATTR_CALL_COUNT;
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
-	TEST_ASSERT_EQUAL(0, attr_value,
-			"attr_get() call didn't get call count (zero)");
+	TEST_ASSERT_EQUAL(0, attr_value, "Call count was not zero");
+	const int attr_idle_calls = RTE_SERVICE_ATTR_IDLE_CALL_COUNT;
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_idle_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Idle call count was not zero");
+	const int attr_error_calls = RTE_SERVICE_ATTR_ERROR_CALL_COUNT;
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_error_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Error call count was not zero");
 
 	/* Call service to increment cycle count */
 	TEST_ASSERT_EQUAL(0, rte_service_lcore_add(slcore_id),
@@ -331,8 +357,13 @@ service_attr_get(void)
 
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
-	TEST_ASSERT_EQUAL(1, (attr_value > 0),
-			"attr_get() call didn't get call count (zero)");
+	TEST_ASSERT_EQUAL(service_calls, attr_value, "Unexpected call count");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_idle_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(service_idle_calls, attr_value, "Unexpected idle call count");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_error_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(service_error_calls, attr_value, "Unexpected error call count");
 
 	TEST_ASSERT_EQUAL(0, rte_service_attr_reset_all(id),
 			"Valid attr_reset_all() return success");
@@ -341,11 +372,16 @@ service_attr_get(void)
 			"Valid attr_get() call didn't return success");
 	TEST_ASSERT_EQUAL(0, attr_value,
 			"attr_get() call didn't set correct cycles (zero)");
-	/* ensure call count > zero */
+	/* ensure call counts are zero */
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
-	TEST_ASSERT_EQUAL(0, (attr_value > 0),
-			"attr_get() call didn't get call count (zero)");
+	TEST_ASSERT_EQUAL(0, attr_value, "Call count was not reset");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_idle_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Idle call count was not reset");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_error_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Error call count was not reset");
 
 	return unregister_all();
 }
@@ -533,10 +569,10 @@ service_lcore_en_dis_able(void)
 static int
 service_lcore_running_check(void)
 {
-	uint64_t tick = service_tick;
+	uint64_t calls = service_calls;
 	rte_delay_ms(SERVICE_DELAY * 100);
-	/* if (tick != service_tick) we know the lcore as polled the service */
-	return tick != service_tick;
+	bool service_polled = calls != service_calls;
+	return service_polled;
 }
 
 static int
diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index d959c91459..9438d35c50 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -57,6 +57,8 @@ struct rte_service_spec_impl {
 
 struct service_stats {
 	RTE_ATOMIC(uint64_t) calls;
+	RTE_ATOMIC(uint64_t) idle_calls;
+	RTE_ATOMIC(uint64_t) error_calls;
 	RTE_ATOMIC(uint64_t) cycles;
 };
 
@@ -369,6 +371,16 @@ rte_service_runstate_get(uint32_t id)
 
 }
 
+static void
+service_counter_add(uint64_t *counter, uint64_t value)
+{
+	/* The lcore service worker thread is the only writer, and
+	 * thus only a non-atomic load and an atomic store is needed,
+	 * and not the more expensive atomic add.
+	 */
+	rte_atomic_store_explicit(counter, *counter + value, rte_memory_order_relaxed);
+}
+
 static inline void
 service_runner_do_callback(struct rte_service_spec_impl *s,
 			   struct core_state *cs, uint32_t service_idx)
@@ -380,27 +392,23 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
 		uint64_t start = rte_rdtsc();
 		int rc = s->spec.callback(userdata);
 
-		/* The lcore service worker thread is the only writer,
-		 * and thus only a non-atomic load and an atomic store
-		 * is needed, and not the more expensive atomic
-		 * add.
-		 */
 		struct service_stats *service_stats =
 			&cs->service_stats[service_idx];
 
+		service_counter_add(&service_stats->calls, 1);
+
+		if (rc == -EAGAIN)
+			service_counter_add(&service_stats->idle_calls, 1);
+		else if (rc != 0)
+			service_counter_add(&service_stats->error_calls, 1);
+
 		if (likely(rc != -EAGAIN)) {
 			uint64_t end = rte_rdtsc();
 			uint64_t cycles = end - start;
 
-			rte_atomic_store_explicit(&cs->cycles, cs->cycles + cycles,
-				rte_memory_order_relaxed);
-			rte_atomic_store_explicit(&service_stats->cycles,
-				service_stats->cycles + cycles,
-				rte_memory_order_relaxed);
+			service_counter_add(&cs->cycles, cycles);
+			service_counter_add(&service_stats->cycles, cycles);
 		}
-
-		rte_atomic_store_explicit(&service_stats->calls,
-			service_stats->calls + 1, rte_memory_order_relaxed);
 	} else {
 		s->spec.callback(userdata);
 	}
@@ -867,6 +875,24 @@ lcore_attr_get_service_calls(uint32_t service_id, unsigned int lcore)
 		rte_memory_order_relaxed);
 }
 
+static uint64_t
+lcore_attr_get_service_idle_calls(uint32_t service_id, unsigned int lcore)
+{
+	struct core_state *cs = &lcore_states[lcore];
+
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].idle_calls,
+		rte_memory_order_relaxed);
+}
+
+static uint64_t
+lcore_attr_get_service_error_calls(uint32_t service_id, unsigned int lcore)
+{
+	struct core_state *cs = &lcore_states[lcore];
+
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].error_calls,
+		rte_memory_order_relaxed);
+}
+
 static uint64_t
 lcore_attr_get_service_cycles(uint32_t service_id, unsigned int lcore)
 {
@@ -899,6 +925,18 @@ attr_get_service_calls(uint32_t service_id)
 	return attr_get(service_id, lcore_attr_get_service_calls);
 }
 
+static uint64_t
+attr_get_service_idle_calls(uint32_t service_id)
+{
+	return attr_get(service_id, lcore_attr_get_service_idle_calls);
+}
+
+static uint64_t
+attr_get_service_error_calls(uint32_t service_id)
+{
+	return attr_get(service_id, lcore_attr_get_service_error_calls);
+}
+
 static uint64_t
 attr_get_service_cycles(uint32_t service_id)
 {
@@ -918,6 +956,12 @@ rte_service_attr_get(uint32_t id, uint32_t attr_id, uint64_t *attr_value)
 	case RTE_SERVICE_ATTR_CALL_COUNT:
 		*attr_value = attr_get_service_calls(id);
 		return 0;
+	case RTE_SERVICE_ATTR_IDLE_CALL_COUNT:
+		*attr_value = attr_get_service_idle_calls(id);
+		return 0;
+	case RTE_SERVICE_ATTR_ERROR_CALL_COUNT:
+		*attr_value = attr_get_service_error_calls(id);
+		return 0;
 	case RTE_SERVICE_ATTR_CYCLES:
 		*attr_value = attr_get_service_cycles(id);
 		return 0;
diff --git a/lib/eal/include/rte_service.h b/lib/eal/include/rte_service.h
index e49a7a877e..89057df585 100644
--- a/lib/eal/include/rte_service.h
+++ b/lib/eal/include/rte_service.h
@@ -374,15 +374,32 @@ int32_t rte_service_lcore_count_services(uint32_t lcore);
 int32_t rte_service_dump(FILE *f, uint32_t id);
 
 /**
- * Returns the number of cycles that this service has consumed
+ * Returns the number of cycles that this service has consumed. Only
+ * cycles spent in non-idle calls (i.e., calls not returning -EAGAIN)
+ * count.
  */
 #define RTE_SERVICE_ATTR_CYCLES 0
 
 /**
- * Returns the count of invocations of this service function
+ * Returns the total number of invocations of this service function
+ * (regardless of return value).
  */
 #define RTE_SERVICE_ATTR_CALL_COUNT 1
 
+/**
+ * Returns the number of invocations of this service function where the
+ * service reported having not performed any useful work (i.e.,
+ * returned -EAGAIN).
+ */
+#define RTE_SERVICE_ATTR_IDLE_CALL_COUNT 2
+
+/**
+ * Returns the number of invocations of this service function where the
+ * service reported an error (i.e., the return value was neither 0 nor
+ * -EAGAIN).
+ */
+#define RTE_SERVICE_ATTR_ERROR_CALL_COUNT 3
+
 /**
  * Get an attribute from a service.
  *
@@ -408,7 +425,8 @@ int32_t rte_service_attr_reset_all(uint32_t id);
 
 /**
  * Returns the total number of cycles that the lcore has spent on
- * running services.
+ * running services. Only non-idle calls (i.e., calls not returning
+ * -EAGAIN) count toward this total.
  */
 #define RTE_SERVICE_LCORE_ATTR_CYCLES 1
 
-- 
2.34.1



* RE: [RFC] service: extend service function call statistics
  2024-01-25 19:15 [RFC] service: extend service function call statistics Mattias Rönnblom
@ 2024-01-25 23:19 ` Morten Brørup
  2024-01-26  8:28   ` Mattias Rönnblom
  2024-08-09 20:25 ` [PATCH] " Mattias Rönnblom
  1 sibling, 1 reply; 18+ messages in thread
From: Morten Brørup @ 2024-01-25 23:19 UTC (permalink / raw)
  To: Mattias Rönnblom, dev; +Cc: hofors, harry.van.haaren, Stefan Sundkvist

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Thursday, 25 January 2024 20.15
> 
> Add two new per-service counters.
> 
> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
> invocations where no work was performed.
> 
> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
> resulting in an error.
> 
> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
> counting all invocations, regardless of return value).
> 
> The new statistics may be useful for both debugging and profiling
> (e.g., calculate the average per-call processing latency for non-idle
> service calls).
> 
> Service core tests are extended to cover the new counters, and
> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.

OK to all of the above. Good stuff.

> 
> The documentation for the CYCLES attributes is updated to reflect
> their actual semantics.

If this is intended behavior, then updating the documentation seems appropriate - I would even go so far as considering it a bug fix.

However, quite a few cycles may be consumed by a service before it can conclude that it had no work to do. Shouldn't that be considered time spent by the service? I.e. should the code be fixed instead of the documentation?

Alternatively, keep the behavior (for backwards compatibility) and fix the documentation, as this patch does, and add an IDLE_CYCLES counter for time spent in idle calls.
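
For illustration, such a counter could reuse the patch's service_counter_add() helper (a hypothetical sketch: the idle_cycles field and any matching IDLE_CYCLES attribute do not exist in this patch, and the cycles would then have to be measured for idle calls too):

static void
account_call_cycles(struct service_stats *stats, int rc, uint64_t cycles)
{
	if (rc == -EAGAIN) /* idle call: account the cycles separately */
		service_counter_add(&stats->idle_cycles, cycles);
	else
		service_counter_add(&stats->cycles, cycles);
}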

PS: We're not using DPDK service cores in our applications, so I'm not familiar with the details. We are using something somewhat similar (but homegrown), also for profiling and power management purposes, and my feedback is based on my experience with our own variant of service cores.

Either way:

Acked-by: Morten Brørup <mb@smartsharesystems.com>



* Re: [RFC] service: extend service function call statistics
  2024-01-25 23:19 ` Morten Brørup
@ 2024-01-26  8:28   ` Mattias Rönnblom
  2024-01-26 10:07     ` Morten Brørup
  0 siblings, 1 reply; 18+ messages in thread
From: Mattias Rönnblom @ 2024-01-26  8:28 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: harry.van.haaren, Stefan Sundkvist

On 2024-01-26 00:19, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Thursday, 25 January 2024 20.15
>>
>> Add two new per-service counters.
>>
>> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
>> invocations where no work was performed.
>>
>> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
>> resulting in an error.
>>
>> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
>> counting all invocations, regardless of return value).
>>
>> The new statistics may be useful for both debugging and profiling
>> (e.g., calculate the average per-call processing latency for non-idle
>> service calls).
>>
>> Service core tests are extended to cover the new counters, and
>> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
> 
> OK to all of the above. Good stuff.
> 
>>
>> The documentation for the CYCLES attributes is updated to reflect
>> their actual semantics.
> 
> If this is intended behavior, then updating the documentation seems appropriate - I would even go so far as considering it a bug fix.
> 
> However, quite a few cycles may be consumed by a service before it can conclude that it had no work to do. Shouldn't that be considered time spent by the service? I.e. should the code be fixed instead of the documentation?
> 

Generally, polling for new work in the service is cheap, in my 
experience. But there's nothing in principle that prevents the situation 
you describe from occurring. You could add an "IDLE_CYCLES" counter in 
case that would ever be a real-world problem.

That wouldn't be a fix, but rather just returning to the old, subtly 
broken, (pre-22.11?) semantics.

Have a look at 809bd24 to see the rationale for the change. There's an 
example in 4689c57.

The cause of this ambiguity is due to the fact that the platform/OS 
(i.e., DPDK) doesn't know which service to "wake up" (which in turn is 
the result of the platform not being able to track input sources, 
like NIC RX queues or timer wheels) and thus must ask every service to 
check if it has something to do.
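
So a typical service function ends up with a check-then-work shape, 
where -EAGAIN signals an idle poll. A minimal sketch (all names made 
up for illustration):

static int32_t
my_service_run(void *userdata)
{
	struct my_service_ctx *ctx = userdata;
	unsigned int n = my_poll_input(ctx); /* e.g., peek at an RX queue */

	if (n == 0)
		return -EAGAIN; /* idle call: cheap, but not free */

	my_process(ctx, n);

	return 0;
}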

> Alternatively, keep the behavior (for backwards compatibility) and fix the documentation, as this patch does, and add an IDLE_CYCLES counter for time spent in idle calls.
> 
> PS: We're not using DPDK service cores in our applications, so I'm not familiar with the details. We are using something somewhat similar (but homegrown), also for profiling and power management purposes, and my feedback is based on my experience with our own variant of service cores.
> 

When are you making the switch to service cores? :)

> Either way:
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 

Thanks for the review.


* RE: [RFC] service: extend service function call statistics
  2024-01-26  8:28   ` Mattias Rönnblom
@ 2024-01-26 10:07     ` Morten Brørup
  2024-01-27 19:31       ` Mattias Rönnblom
  0 siblings, 1 reply; 18+ messages in thread
From: Morten Brørup @ 2024-01-26 10:07 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev
  Cc: harry.van.haaren, Stefan Sundkvist

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Friday, 26 January 2024 09.28
> 
> On 2024-01-26 00:19, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Thursday, 25 January 2024 20.15
> >>
> >> Add two new per-service counters.
> >>
> >> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service
> function
> >> invocations where no work was performed.
> >>
> >> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
> >> resulting in an error.
> >>
> >> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
> >> counting all invocations, regardless of return value).
> >>
> >> The new statistics may be useful for both debugging and profiling
> >> (e.g., calculate the average per-call processing latency for non-
> idle
> >> service calls).
> >>
> >> Service core tests are extended to cover the new counters, and
> >> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
> >
> > OK to all of the above. Good stuff.
> >
> >>
> >> The documentation for the CYCLES attributes is updated to reflect
> >> their actual semantics.
> >
> > If this is intended behavior, then updating the documentation seems
> appropriate - I would even go so far as considering it a bug fix.
> >
> > However, quite a few cycles may be consumed by a service before it
> can conclude that it had no work to do. Shouldn't that be considered
> time spent by the service? I.e. should the code be fixed instead of the
> documentation?
> >
> 
> Generally, polling for new work in the service is cheap, in my
> experience. But there's nothing in principle that prevents the
> situation
> your describe from occurring. You could add an "IDLE_CYCLES" counter in
> case that would ever be a real-world problem.
> 
> That wouldn't be a fix, but rather just returning to the old, subtly
> broken, (pre-22.11?) semantics.
> 
> Have a look at 809bd24 to see the rationale for the change. There's an
> example in 4689c57.
> 
> The cause of this ambiguity is due to the fact that the platform/OS
> (i.e., DPDK) doesn't know which service to "wake up" (which in turn is
> the result of the platform not being able to track input sources,
> like NIC RX queues or timer wheels) and thus must ask every service to
> check if it has something to do.

OK. Makes good sense.
So definitely fix the documentation, not the code. :-)

> 
> > Alternatively, keep the behavior (for backwards compatibility) and
> fix the documentation, as this patch does, and add an IDLE_CYCLES
> counter for time spent in idle calls.
> >
> > PS: We're not using DPDK service cores in our applications, so I'm
> not familiar with the details. We are using something somewhat similar
> (but homegrown), also for profiling and power management purposes, and
> my feedback is based on my experience with our own variant of service
> cores.
> >
> 
> When are you making the switch to service cores? :)

Our own "service cores" implementation has some slightly different properties, which we are still experimenting with.

E.g. in addition to the special return value "idle (no work to do)", we also have a special return value for "incomplete (more work urgently pending)" when a service processed a full burst and still has more work pending in its input queue.

We are also considering returning a value to inform what time it needs to be called again. This concept is only an idea, and we haven't started experimenting with it yet.
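
Expressed as code, our convention is roughly the following (a hypothetical sketch with made-up names, not a DPDK API):

enum work_status {
	WORK_IDLE,       /* nothing to do */
	WORK_DONE,       /* input queue fully drained */
	WORK_INCOMPLETE, /* full burst processed; more work pending */
};

/* The "call me again" idea: the callback could also report by when
 * (e.g., in TSC cycles) it needs to be invoked again.
 */
typedef enum work_status (*work_fn)(void *ctx, uint64_t *deadline);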


From a high level perspective, the service cores library is much like an operating system's CPU scheduler, although based on voluntary time sharing. Many algorithms and many parameters can be considered. It can also tie into power management and prioritization of different tasks.

> 
> > Either way:
> >
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >
> 
> Thanks for the review.



* Re: [RFC] service: extend service function call statistics
  2024-01-26 10:07     ` Morten Brørup
@ 2024-01-27 19:31       ` Mattias Rönnblom
  2024-01-29 12:50         ` Van Haaren, Harry
  0 siblings, 1 reply; 18+ messages in thread
From: Mattias Rönnblom @ 2024-01-27 19:31 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: harry.van.haaren, Stefan Sundkvist

On 2024-01-26 11:07, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Friday, 26 January 2024 09.28
>>
>> On 2024-01-26 00:19, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>>>> Sent: Thursday, 25 January 2024 20.15
>>>>
>>>> Add two new per-service counters.
>>>>
>>>> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service
>> function
>>>> invocations where no work was performed.
>>>>
>>>> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
>>>> resulting in an error.
>>>>
>>>> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
>>>> counting all invocations, regardless of return value).
>>>>
>>>> The new statistics may be useful for both debugging and profiling
>>>> (e.g., calculate the average per-call processing latency for non-
>> idle
>>>> service calls).
>>>>
>>>> Service core tests are extended to cover the new counters, and
>>>> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
>>>
>>> OK to all of the above. Good stuff.
>>>
>>>>
>>>> The documentation for the CYCLES attributes is updated to reflect
>>>> their actual semantics.
>>>
>>> If this is intended behavior, then updating the documentation seems
>> appropriate - I would even go so far as considering it a bug fix.
>>>
>>> However, quite a few cycles may be consumed by a service before it
>> can conclude that it had no work to do. Shouldn't that be considered
>> time spent by the service? I.e. should the code be fixed instead of the
>> documentation?
>>>
>>
>> Generally, polling for new work in the service is cheap, in my
>> experience. But there's nothing in principle that prevents the
>> situation
>> you describe from occurring. You could add an "IDLE_CYCLES" counter in
>> case that would ever be a real-world problem.
>>
>> That wouldn't be a fix, but rather just returning to the old, subtly
>> broken, (pre-22.11?) semantics.
>>
>> Have a look at 809bd24 to see the rationale for the change. There's an
>> example in 4689c57.
>>
>> The cause of this ambiguity is due to the fact that the platform/OS
>> (i.e., DPDK) doesn't know which service to "wake up" (which in turn is
>> the result of the platform not being able to track input sources,
>> like NIC RX queues or timer wheels) and thus must ask every service to
>> check if it has something to do.
> 
> OK. Makes good sense.
> So definitely fix the documentation, not the code. :-)
> 
>>
>>> Alternatively, keep the behavior (for backwards compatibility) and
>> fix the documentation, as this patch does, and add an IDLE_CYCLES
>> counter for time spent in idle calls.
>>>
>>> PS: We're not using DPDK service cores in our applications, so I'm
>> not familiar with the details. We are using something somewhat similar
>> (but homegrown), also for profiling and power management purposes, and
>> my feedback is based on my experience with our own variant of service
>> cores.
>>>
>>
>> When are you making the switch to service cores? :)
> 
> Our own "service cores" implementation has some slightly different properties, which we are still experimenting with.
> 
> E.g. in addition to the special return value "idle (no work to do)", we also have a special return value for "incomplete (more work urgently pending)" when a service processed a full burst and still has more work pending in its input queue.
> 
> We are also considering returning a value to inform what time it needs to be called again. This concept is only an idea, and we haven't started experimenting with it yet.
> 
> 
>  From a high level perspective, the service cores library is much like an operating system's CPU scheduler, although based on voluntary time sharing. Many algorithms and many parameters can be considered. It can also tie into power management and prioritization of different tasks.
> 

Service cores in their current form are more like a primitive kernel 
bottom half implementation.

The primary work scheduler in DPDK is Eventdev, but even with Eventdev 
you want some way to fit in non-event kinds of processing as well, and here 
service cores serve a role.

rte_service.c is a good place for power management. I agree some means 
for the services to convey what latency (e.g., sleep time) is acceptable 
is needed. Returning a time per service function invocation would be a 
flexible way to do so. Could also just be a per-lcore or global 
configuration, set by the application.

>>
>>> Either way:
>>>
>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>>
>>
>> Thanks for the review.
> 


* RE: [RFC] service: extend service function call statistics
  2024-01-27 19:31       ` Mattias Rönnblom
@ 2024-01-29 12:50         ` Van Haaren, Harry
  2024-01-30  7:16           ` Mattias Rönnblom
  0 siblings, 1 reply; 18+ messages in thread
From: Van Haaren, Harry @ 2024-01-29 12:50 UTC (permalink / raw)
  To: Mattias Rönnblom, Morten Brørup, Mattias Rönnblom, dev
  Cc: Stefan Sundkvist

> -----Original Message-----
> From: Mattias Rönnblom <hofors@lysator.liu.se>
> Sent: Saturday, January 27, 2024 7:32 PM
> To: Morten Brørup <mb@smartsharesystems.com>; Mattias Rönnblom
> <mattias.ronnblom@ericsson.com>; dev@dpdk.org
> Cc: Van Haaren, Harry <harry.van.haaren@intel.com>; Stefan Sundkvist
> <stefan.sundkvist@ericsson.com>
> Subject: Re: [RFC] service: extend service function call statistics

Hi Mattias,

Thanks for the patch, looks good to me! Some responses to discussion below;

Acked-by: Harry van Haaren <harry.van.haaren@intel.com>

> On 2024-01-26 11:07, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Friday, 26 January 2024 09.28
> >>
> >> On 2024-01-26 00:19, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >>>> Sent: Thursday, 25 January 2024 20.15
> >>>>
> >>>> Add two new per-service counters.
> >>>>
> >>>> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service
> >> function
> >>>> invocations where no work was performed.
> >>>>
> >>>> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
> >>>> resulting in an error.
> >>>>
> >>>> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
> >>>> counting all invocations, regardless of return value).
> >>>>
> >>>> The new statistics may be useful for both debugging and profiling
> >>>> (e.g., calculate the average per-call processing latency for non-
> >> idle
> >>>> service calls).
> >>>>
> >>>> Service core tests are extended to cover the new counters, and
> >>>> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
> >>>
> >>> OK to all of the above. Good stuff.
> >>>
> >>>>
> >>>> The documentation for the CYCLES attributes is updated to reflect
> >>>> their actual semantics.
> >>>
> >>> If this is intended behavior, then updating the documentation seems
> >> appropriate - I would even go so far as considering it a bug fix.
> >>>
> >>> However, quite a few cycles may be consumed by a service before it
> >> can conclude that it had no work to do. Shouldn't that be considered
> >> time spent by the service? I.e. should the code be fixed instead of the
> >> documentation?
> >>>
> >>
> >> Generally, polling for new work in the service is cheap, in my
> >> experience. But there's nothing in principle that prevents the
> >> situation
> >> you describe from occurring. You could add an "IDLE_CYCLES" counter in
> >> case that would ever be a real-world problem.
> >>
> >> That wouldn't be a fix, but rather just returning to the old, subtly
> >> broken, (pre-22.11?) semantics.
> >>
> >> Have a look at 809bd24 to see the rationale for the change. There's an
> >> example in 4689c57.
> >>
> >> The cause of this ambiguity is due to the fact that the platform/OS
> >> (i.e., DPDK) doesn't know which service to "wake up" (which in turn is
> >> the result of the platform not being able to track input sources,
> >> like NIC RX queues or timer wheels) and thus must ask every service to
> >> check if it has something to do.
> >
> > OK. Makes good sense.
> > So definitely fix the documentation, not the code. :-)

Agreed.

> >>
> >>> Alternatively, keep the behavior (for backwards compatibility) and
> >> fix the documentation, as this patch does, and add an IDLE_CYCLES
> >> counter for time spent in idle calls.
> >>>
> >>> PS: We're not using DPDK service cores in our applications, so I'm
> >> not familiar with the details. We are using something somewhat similar
> >> (but homegrown), also for profiling and power management purposes, and
> >> my feedback is based on my experience with our own variant of service
> >> cores.
> >>>
> >>
> >> When are you making the switch to service cores? :)
> >
> > Our own "service cores" implementation has some slightly different properties,
> which we are still experimenting with.
> >
> > E.g. in addition to the special return value "idle (no work to do)", we also have a
> special return value for "incomplete (more work urgently pending)" when a service
> processed a full burst and still has more work pending in its input queue.
> >
> > We are also considering returning a value to inform what time it needs to be called
> again. This concept is only an idea, and we haven't started experimenting with it yet.
> >
> >
> >  From a high level perspective, the service cores library is much like an operating
> system's CPU scheduler, although based on voluntary time sharing. Many algorithms
> and many parameters can be considered. It can also tie into power management and
> prioritization of different tasks.

This was brought up as a concern when initially adding it to DPDK; the scope of service-cores
is quite easily going to change from "way to run a dataplane function on a core, to abstract
away the different environments that DPDK runs" to "userspace scheduler with bells-and-whistles".

The reason service-cores was built was to allow applications run with HW & SW eventdev PMDs,
and not have to handle the different threading requirements at the application level. This goal is
achieved by the current service-cores infrastructure.


> Service cores in their current form is more like a primitive kernel
> bottom half implementation.
> 
> The primary work scheduler in DPDK is Eventdev, but even with Eventdev
> you want some way to fit non-event kind of processing as well, and here
> services cores serves a role.
> 
> rte_service.c is a good place for power management. I agree some means
> for the services to convey what latency (e.g., sleep time) is acceptable
> is needed. Returning a time per service function invocation would be a
> flexible way to do so. Could also just be a per-lcore, or global
> configuration, set by the application.

While I agree it is potentially a place to implement power management, sleeping etc... to do so expands
the scope of service-cores significantly; not something that we should do without considerable thought.


* Re: [RFC] service: extend service function call statistics
  2024-01-29 12:50         ` Van Haaren, Harry
@ 2024-01-30  7:16           ` Mattias Rönnblom
  0 siblings, 0 replies; 18+ messages in thread
From: Mattias Rönnblom @ 2024-01-30  7:16 UTC (permalink / raw)
  To: Van Haaren, Harry, Morten Brørup, Mattias Rönnblom, dev
  Cc: Stefan Sundkvist

On 2024-01-29 13:50, Van Haaren, Harry wrote:
>> -----Original Message-----
>> From: Mattias Rönnblom <hofors@lysator.liu.se>
>> Sent: Saturday, January 27, 2024 7:32 PM
>> To: Morten Brørup <mb@smartsharesystems.com>; Mattias Rönnblom
>> <mattias.ronnblom@ericsson.com>; dev@dpdk.org
>> Cc: Van Haaren, Harry <harry.van.haaren@intel.com>; Stefan Sundkvist
>> <stefan.sundkvist@ericsson.com>
>> Subject: Re: [RFC] service: extend service function call statistics
> 
> Hi Mattias,
> 
> Thanks for the patch, looks good to me! Some responses to discussion below;
> 
> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
> 
>> On 2024-01-26 11:07, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>>>> Sent: Friday, 26 January 2024 09.28
>>>>
>>>> On 2024-01-26 00:19, Morten Brørup wrote:
>>>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>>>>>> Sent: Thursday, 25 January 2024 20.15
>>>>>>
>>>>>> Add two new per-service counters.
>>>>>>
>>>>>> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service
>>>> function
>>>>>> invocations where no work was performed.
>>>>>>
>>>>>> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
>>>>>> resulting in an error.
>>>>>>
>>>>>> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
>>>>>> counting all invocations, regardless of return value).
>>>>>>
>>>>>> The new statistics may be useful for both debugging and profiling
>>>>>> (e.g., calculate the average per-call processing latency for non-
>>>> idle
>>>>>> service calls).
>>>>>>
>>>>>> Service core tests are extended to cover the new counters, and
>>>>>> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
>>>>>
>>>>> OK to all of the above. Good stuff.
>>>>>
>>>>>>
>>>>>> The documentation for the CYCLES attributes is updated to reflect
>>>>>> their actual semantics.
>>>>>
>>>>> If this is intended behavior, then updating the documentation seems
>>>> appropriate - I would even go so far as considering it a bug fix.
>>>>>
>>>>> However, quite a few cycles may be consumed by a service before it
>>>> can conclude that it had no work to do. Shouldn't that be considered
>>>> time spent by the service? I.e. should the code be fixed instead of the
>>>> documentation?
>>>>>
>>>>
>>>> Generally, polling for new work in the service is cheap, in my
>>>> experience. But there's nothing in principle that prevents the
>>>> situation
>>>> you describe from occurring. You could add an "IDLE_CYCLES" counter in
>>>> case that would ever be a real-world problem.
>>>>
>>>> That wouldn't be a fix, but rather just returning to the old, subtly
>>>> broken, (pre-22.11?) semantics.
>>>>
>>>> Have a look at 809bd24 to see the rationale for the change. There's an
>>>> example in 4689c57.
>>>>
>>>> The cause of this ambiguity is due to the fact that the platform/OS
>>>> (i.e., DPDK) doesn't know which service to "wake up" (which in turn is
>>>> the result of the platform not being able to track input sources,
>>>> like NIC RX queues or timer wheels) and thus must ask every service to
>>>> check if it has something to do.
>>>
>>> OK. Makes good sense.
>>> So definitely fix the documentation, not the code. :-)
> 
> Agreed.
> 
>>>>
>>>>> Alternatively, keep the behavior (for backwards compatibility) and
>>>> fix the documentation, as this patch does, and add an IDLE_CYCLES
>>>> counter for time spent in idle calls.
>>>>>
>>>>> PS: We're not using DPDK service cores in our applications, so I'm
>>>> not familiar with the details. We are using something somewhat similar
>>>> (but homegrown), also for profiling and power management purposes, and
>>>> my feedback is based on my experience with our own variant of service
>>>> cores.
>>>>>
>>>>
>>>> When are you making the switch to service cores? :)
>>>
>>> Our own "service cores" implementation has some slightly different properties,
>> which we are still experimenting with.
>>>
>>> E.g. in addition to the special return value "idle (no work to do)", we also have a
>> special return value for "incomplete (more work urgently pending)" when a service
>> processed a full burst and still has more work pending in its input queue.
>>>
>>> We are also considering returning a value to inform what time it needs to be called
>> again. This concept is only an idea, and we haven't started experimenting with it yet.
>>>
>>>
>>>   From a high level perspective, the service cores library is much like an operating
>> system's CPU scheduler, although based on voluntary time sharing. Many algorithms
>> and many parameters can be considered. It can also tie into power management and
>> prioritization of different tasks.
> 
> This was brought up as a concern when initially adding it to DPDK; the scope of service-cores
> is quite easily going to change from "way to run a dataplane function on a core, to abstract
> away the different environments that DPDK runs" to "userspace scheduler with bells-and-whistles".
> 

Yes, and in a hypothetical DPDK without Eventdev or competing deferred 
work mechanisms that would have been a natural evolution.

> The reason service-cores was built was to allow applications to run with HW & SW eventdev PMDs,
> and not have to handle the different threading requirements at the application level. This goal is
> achieved by the current service-cores infrastructure.
> 

It's a generic and public API. I'm not sure it matters much what 
purpose it was initially built for. There's no "PMD use only" label on it, 
which is a good thing.

> 
>> Service cores in their current form are more like a primitive kernel
>> bottom half implementation.
>>
>> The primary work scheduler in DPDK is Eventdev, but even with Eventdev
>> you want some way to fit in non-event kinds of processing as well, and here
>> service cores serve a role.
>>
>> rte_service.c is a good place for power management. I agree some means
>> for the services to convey what latency (e.g., sleep time) is acceptable
>> is needed. Returning a time per service function invocation would be a
>> flexible way to do so. Could also just be a per-lcore or global
>> configuration, set by the application.
> 
> While I agree it is potentially a place to implement power management, sleeping etc... to do so expands
> the scope of service-cores significantly; not something that we should do without considerable thought.

Doesn't have to be significant in terms of lines of code to develop and 
maintain, and the actual logic could (and probably should) sit 
elsewhere. Such a feature could start off as just an lcore-is-idle hook 
to the application, which would decide what to do, if anything.
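
Something along these lines, say (a hypothetical API; all names are 
made up):

/* Called by the service runner when every service mapped to the
 * lcore reported -EAGAIN, leaving any sleep or power decision to the
 * application.
 */
typedef void (*rte_service_idle_hook_t)(unsigned int lcore_id, void *cb_arg);

int
rte_service_lcore_idle_hook_register(unsigned int lcore_id,
				     rte_service_idle_hook_t hook,
				     void *cb_arg);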


* [PATCH] service: extend service function call statistics
  2024-01-25 19:15 [RFC] service: extend service function call statistics Mattias Rönnblom
  2024-01-25 23:19 ` Morten Brørup
@ 2024-08-09 20:25 ` Mattias Rönnblom
  2024-09-09 19:11   ` [PATCH v2] " Mattias Rönnblom
  1 sibling, 1 reply; 18+ messages in thread
From: Mattias Rönnblom @ 2024-08-09 20:25 UTC (permalink / raw)
  To: dev; +Cc: hofors, harry.van.haaren, Stefan Sundkvist, Mattias Rönnblom

Add two new per-service counters.

RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
invocations where no work was performed.

RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
resulting in an error.

The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
counting all invocations, regardless of return value).

The new statistics may be useful for both debugging and profiling
(e.g., calculate the average per-call processing latency for non-idle
service calls).

Service core tests are extended to cover the new counters, and
coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.

The documentation for the CYCLES attributes is updated to reflect
their actual semantics.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

--

PATCH:
 * Update release notes.
---
 app/test/test_service_cores.c          | 70 +++++++++++++++++++-------
 doc/guides/rel_notes/release_24_11.rst | 11 ++++
 lib/eal/common/rte_service.c           | 70 +++++++++++++++++++++-----
 lib/eal/include/rte_service.h          | 24 +++++++--
 4 files changed, 142 insertions(+), 33 deletions(-)

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index 010ab82893..7ae6bbb179 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -3,11 +3,12 @@
  */
 
 #include <rte_common.h>
+#include <rte_cycles.h>
 #include <rte_hexdump.h>
-#include <rte_mbuf.h>
 #include <rte_malloc.h>
+#include <rte_mbuf.h>
 #include <rte_memcpy.h>
-#include <rte_cycles.h>
+#include <rte_random.h>
 
 #include <rte_service.h>
 #include <rte_service_component.h>
@@ -16,8 +17,10 @@
 
 /* used as the service core ID */
 static uint32_t slcore_id;
-/* used as timestamp to detect if a service core is running */
-static uint64_t service_tick;
+/* track service call count */
+static uint64_t service_calls;
+static uint64_t service_idle_calls;
+static uint64_t service_error_calls;
 /* used as a flag to check if a function was run */
 static uint32_t service_remote_launch_flag;
 
@@ -46,9 +49,21 @@ testsuite_teardown(void)
 static int32_t dummy_cb(void *args)
 {
 	RTE_SET_USED(args);
-	service_tick++;
+
+	service_calls++;
+
 	rte_delay_ms(SERVICE_DELAY);
+
+	switch (rte_rand_max(3)) {
+	case 0:
+		return 0;
+	case 1:
+		service_idle_calls++;
+		return -EAGAIN;
+	default:
+		service_error_calls++;
+		return -ENOENT;
+	}
-	return 0;
 }
 
 static int32_t dummy_mt_unsafe_cb(void *args)
@@ -121,6 +136,10 @@ unregister_all(void)
 	rte_service_lcore_reset_all();
 	rte_eal_mp_wait_lcore();
 
+	service_calls = 0;
+	service_idle_calls = 0;
+	service_error_calls = 0;
+
 	return TEST_SUCCESS;
 }
 
@@ -295,12 +314,19 @@ service_attr_get(void)
 			"Valid attr_get() call didn't return success");
 	TEST_ASSERT_EQUAL(0, attr_value,
 			"attr_get() call didn't set correct cycles (zero)");
-	/* check correct call count */
+	/* check correct call counts */
 	const int attr_calls = RTE_SERVICE_ATTR_CALL_COUNT;
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
-	TEST_ASSERT_EQUAL(0, attr_value,
-			"attr_get() call didn't get call count (zero)");
+	TEST_ASSERT_EQUAL(0, attr_value, "Call count was not zero");
+	const int attr_idle_calls = RTE_SERVICE_ATTR_IDLE_CALL_COUNT;
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_idle_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Idle call count was not zero");
+	const int attr_error_calls = RTE_SERVICE_ATTR_ERROR_CALL_COUNT;
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_error_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Error call count was not zero");
 
 	/* Call service to increment cycle count */
 	TEST_ASSERT_EQUAL(0, rte_service_lcore_add(slcore_id),
@@ -331,8 +357,13 @@ service_attr_get(void)
 
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
-	TEST_ASSERT_EQUAL(1, (attr_value > 0),
-			"attr_get() call didn't get call count (zero)");
+	TEST_ASSERT_EQUAL(service_calls, attr_value, "Unexpected call count");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_idle_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(service_idle_calls, attr_value, "Unexpected idle call count");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_error_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(service_error_calls, attr_value, "Unexpected error call count");
 
 	TEST_ASSERT_EQUAL(0, rte_service_attr_reset_all(id),
 			"Valid attr_reset_all() return success");
@@ -341,11 +372,16 @@ service_attr_get(void)
 			"Valid attr_get() call didn't return success");
 	TEST_ASSERT_EQUAL(0, attr_value,
 			"attr_get() call didn't set correct cycles (zero)");
-	/* ensure call count > zero */
+	/* ensure call counts are zero */
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
-	TEST_ASSERT_EQUAL(0, (attr_value > 0),
-			"attr_get() call didn't get call count (zero)");
+	TEST_ASSERT_EQUAL(0, attr_value, "Call count was not reset");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_idle_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Idle call count was not reset");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_error_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Error call count was not reset");
 
 	return unregister_all();
 }
@@ -533,10 +569,10 @@ service_lcore_en_dis_able(void)
 static int
 service_lcore_running_check(void)
 {
-	uint64_t tick = service_tick;
+	uint64_t calls = service_calls;
 	rte_delay_ms(SERVICE_DELAY * 100);
-	/* if (tick != service_tick) we know the lcore as polled the service */
-	return tick != service_tick;
+	bool service_polled = calls != service_calls;
+	return service_polled;
 }
 
 static int
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..eb3a1070e4 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -55,6 +55,17 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Extended service cores statistics.**
+
+  Two new per-service counters are added to the service cores framework.
+
+  `RTE_SERVICE_ATTR_IDLE_CALL_COUNT` tracks the number of service function
+  invocations where no actual work was performed.
+
+  `RTE_SERVICE_ATTR_ERROR_CALL_COUNT` tracks the number of invocations
+  resulting in an error.
+
+  The new statistics are useful for debugging and profiling.
 
 Removed Items
 -------------
diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index 56379930b6..ab671daa6d 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -57,6 +57,8 @@ struct __rte_cache_aligned rte_service_spec_impl {
 
 struct service_stats {
 	RTE_ATOMIC(uint64_t) calls;
+	RTE_ATOMIC(uint64_t) idle_calls;
+	RTE_ATOMIC(uint64_t) error_calls;
 	RTE_ATOMIC(uint64_t) cycles;
 };
 
@@ -369,6 +371,16 @@ rte_service_runstate_get(uint32_t id)
 
 }
 
+static void
+service_counter_add(uint64_t *counter, uint64_t value)
+{
+	/* The lcore service worker thread is the only writer, and
+	 * thus only a non-atomic load and an atomic store is needed,
+	 * and not the more expensive atomic add.
+	 */
+	rte_atomic_store_explicit(counter, *counter + value, rte_memory_order_relaxed);
+}
+
 static inline void
 service_runner_do_callback(struct rte_service_spec_impl *s,
 			   struct core_state *cs, uint32_t service_idx)
@@ -380,27 +392,23 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
 		uint64_t start = rte_rdtsc();
 		int rc = s->spec.callback(userdata);
 
-		/* The lcore service worker thread is the only writer,
-		 * and thus only a non-atomic load and an atomic store
-		 * is needed, and not the more expensive atomic
-		 * add.
-		 */
 		struct service_stats *service_stats =
 			&cs->service_stats[service_idx];
 
+		service_counter_add(&service_stats->calls, 1);
+
+		if (rc == -EAGAIN)
+			service_counter_add(&service_stats->idle_calls, 1);
+		else if (rc != 0)
+			service_counter_add(&service_stats->error_calls, 1);
+
 		if (likely(rc != -EAGAIN)) {
 			uint64_t end = rte_rdtsc();
 			uint64_t cycles = end - start;
 
-			rte_atomic_store_explicit(&cs->cycles, cs->cycles + cycles,
-				rte_memory_order_relaxed);
-			rte_atomic_store_explicit(&service_stats->cycles,
-				service_stats->cycles + cycles,
-				rte_memory_order_relaxed);
+			service_counter_add(&cs->cycles, cycles);
+			service_counter_add(&service_stats->cycles, cycles);
 		}
-
-		rte_atomic_store_explicit(&service_stats->calls,
-			service_stats->calls + 1, rte_memory_order_relaxed);
 	} else {
 		s->spec.callback(userdata);
 	}
@@ -867,6 +875,24 @@ lcore_attr_get_service_calls(uint32_t service_id, unsigned int lcore)
 		rte_memory_order_relaxed);
 }
 
+static uint64_t
+lcore_attr_get_service_idle_calls(uint32_t service_id, unsigned int lcore)
+{
+	struct core_state *cs = &lcore_states[lcore];
+
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].idle_calls,
+		rte_memory_order_relaxed);
+}
+
+static uint64_t
+lcore_attr_get_service_error_calls(uint32_t service_id, unsigned int lcore)
+{
+	struct core_state *cs = &lcore_states[lcore];
+
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].error_calls,
+		rte_memory_order_relaxed);
+}
+
 static uint64_t
 lcore_attr_get_service_cycles(uint32_t service_id, unsigned int lcore)
 {
@@ -899,6 +925,18 @@ attr_get_service_calls(uint32_t service_id)
 	return attr_get(service_id, lcore_attr_get_service_calls);
 }
 
+static uint64_t
+attr_get_service_idle_calls(uint32_t service_id)
+{
+	return attr_get(service_id, lcore_attr_get_service_idle_calls);
+}
+
+static uint64_t
+attr_get_service_error_calls(uint32_t service_id)
+{
+	return attr_get(service_id, lcore_attr_get_service_error_calls);
+}
+
 static uint64_t
 attr_get_service_cycles(uint32_t service_id)
 {
@@ -918,6 +956,12 @@ rte_service_attr_get(uint32_t id, uint32_t attr_id, uint64_t *attr_value)
 	case RTE_SERVICE_ATTR_CALL_COUNT:
 		*attr_value = attr_get_service_calls(id);
 		return 0;
+	case RTE_SERVICE_ATTR_IDLE_CALL_COUNT:
+		*attr_value = attr_get_service_idle_calls(id);
+		return 0;
+	case RTE_SERVICE_ATTR_ERROR_CALL_COUNT:
+		*attr_value = attr_get_service_error_calls(id);
+		return 0;
 	case RTE_SERVICE_ATTR_CYCLES:
 		*attr_value = attr_get_service_cycles(id);
 		return 0;
diff --git a/lib/eal/include/rte_service.h b/lib/eal/include/rte_service.h
index e49a7a877e..89057df585 100644
--- a/lib/eal/include/rte_service.h
+++ b/lib/eal/include/rte_service.h
@@ -374,15 +374,32 @@ int32_t rte_service_lcore_count_services(uint32_t lcore);
 int32_t rte_service_dump(FILE *f, uint32_t id);
 
 /**
- * Returns the number of cycles that this service has consumed
+ * Returns the number of cycles that this service has consumed. Only
+ * cycles spent in non-idle calls (i.e., calls not returning -EAGAIN)
+ * count.
  */
 #define RTE_SERVICE_ATTR_CYCLES 0
 
 /**
- * Returns the count of invocations of this service function
+ * Returns the total number of invocations of this service function
+ * (regardless of return value).
  */
 #define RTE_SERVICE_ATTR_CALL_COUNT 1
 
+/**
+ * Returns the number of invocations of this service function where the
+ * service reported having not performed any useful work (i.e.,
+ * returned -EAGAIN).
+ */
+#define RTE_SERVICE_ATTR_IDLE_CALL_COUNT 2
+
+/**
+ * Returns the number of invocations of this service function where the
+ * service reported an error (i.e., the return value was neither 0 nor
+ * -EAGAIN).
+ */
+#define RTE_SERVICE_ATTR_ERROR_CALL_COUNT 3
+
 /**
  * Get an attribute from a service.
  *
@@ -408,7 +425,8 @@ int32_t rte_service_attr_reset_all(uint32_t id);
 
 /**
  * Returns the total number of cycles that the lcore has spent on
- * running services.
+ * running services. Only non-idle calls (i.e., calls not returning
+ * -EAGAIN) count toward this total.
  */
 #define RTE_SERVICE_LCORE_ATTR_CYCLES 1
 
-- 
2.34.1



* [PATCH v2] service: extend service function call statistics
  2024-08-09 20:25 ` [PATCH] " Mattias Rönnblom
@ 2024-09-09 19:11   ` Mattias Rönnblom
  2024-09-10  1:19     ` fengchengwen
                       ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Mattias Rönnblom @ 2024-09-09 19:11 UTC (permalink / raw)
  To: dev; +Cc: hofors, harry.van.haaren, Stefan Sundkvist, Mattias Rönnblom

Add two new per-service counters.

RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
invocations where no work was performed.

RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
resulting in an error.

The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
counting all invocations, regardless of return value).

The new statistics may be useful for both debugging and profiling
(e.g., calculate the average per-call processing latency for non-idle
service calls).

Service core tests are extended to cover the new counters, and
coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.

The documentation for the CYCLES attributes is updated to reflect
their actual semantics.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

--

PATCH v2:
 * Fix build issue for enable_stdatomic=true.

PATCH:
 * Update release notes.
---
 app/test/test_service_cores.c          | 70 ++++++++++++++++++------
 doc/guides/rel_notes/release_24_11.rst | 11 ++++
 lib/eal/common/rte_service.c           | 75 +++++++++++++++++++++-----
 lib/eal/include/rte_service.h          | 24 +++++++--
 4 files changed, 147 insertions(+), 33 deletions(-)

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index 010ab82893..7ae6bbb179 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -3,11 +3,12 @@
  */
 
 #include <rte_common.h>
+#include <rte_cycles.h>
 #include <rte_hexdump.h>
-#include <rte_mbuf.h>
 #include <rte_malloc.h>
+#include <rte_mbuf.h>
 #include <rte_memcpy.h>
-#include <rte_cycles.h>
+#include <rte_random.h>
 
 #include <rte_service.h>
 #include <rte_service_component.h>
@@ -16,8 +17,10 @@
 
 /* used as the service core ID */
 static uint32_t slcore_id;
-/* used as timestamp to detect if a service core is running */
-static uint64_t service_tick;
+/* track service call count */
+static uint64_t service_calls;
+static uint64_t service_idle_calls;
+static uint64_t service_error_calls;
 /* used as a flag to check if a function was run */
 static uint32_t service_remote_launch_flag;
 
@@ -46,9 +49,21 @@ testsuite_teardown(void)
 static int32_t dummy_cb(void *args)
 {
 	RTE_SET_USED(args);
-	service_tick++;
+
+	service_calls++;
+
 	rte_delay_ms(SERVICE_DELAY);
+
+	switch (rte_rand_max(3)) {
+	case 0:
+		return 0;
+	case 1:
+		service_idle_calls++;
+		return -EAGAIN;
+	default:
+		service_error_calls++;
+		return -ENOENT;
+	}
-	return 0;
 }
 
 static int32_t dummy_mt_unsafe_cb(void *args)
@@ -121,6 +136,10 @@ unregister_all(void)
 	rte_service_lcore_reset_all();
 	rte_eal_mp_wait_lcore();
 
+	service_calls = 0;
+	service_idle_calls = 0;
+	service_error_calls = 0;
+
 	return TEST_SUCCESS;
 }
 
@@ -295,12 +314,19 @@ service_attr_get(void)
 			"Valid attr_get() call didn't return success");
 	TEST_ASSERT_EQUAL(0, attr_value,
 			"attr_get() call didn't set correct cycles (zero)");
-	/* check correct call count */
+	/* check correct call counts */
 	const int attr_calls = RTE_SERVICE_ATTR_CALL_COUNT;
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
-	TEST_ASSERT_EQUAL(0, attr_value,
-			"attr_get() call didn't get call count (zero)");
+	TEST_ASSERT_EQUAL(0, attr_value, "Call count was not zero");
+	const int attr_idle_calls = RTE_SERVICE_ATTR_IDLE_CALL_COUNT;
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_idle_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Idle call count was not zero");
+	const int attr_error_calls = RTE_SERVICE_ATTR_ERROR_CALL_COUNT;
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_error_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Error call count was not zero");
 
 	/* Call service to increment cycle count */
 	TEST_ASSERT_EQUAL(0, rte_service_lcore_add(slcore_id),
@@ -331,8 +357,13 @@ service_attr_get(void)
 
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
-	TEST_ASSERT_EQUAL(1, (attr_value > 0),
-			"attr_get() call didn't get call count (zero)");
+	TEST_ASSERT_EQUAL(service_calls, attr_value, "Unexpected call count");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_idle_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(service_idle_calls, attr_value, "Unexpected idle call count");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_error_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(service_error_calls, attr_value, "Unexpected error call count");
 
 	TEST_ASSERT_EQUAL(0, rte_service_attr_reset_all(id),
 			"Valid attr_reset_all() return success");
@@ -341,11 +372,16 @@ service_attr_get(void)
 			"Valid attr_get() call didn't return success");
 	TEST_ASSERT_EQUAL(0, attr_value,
 			"attr_get() call didn't set correct cycles (zero)");
-	/* ensure call count > zero */
+	/* ensure call counts are zero */
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
-	TEST_ASSERT_EQUAL(0, (attr_value > 0),
-			"attr_get() call didn't get call count (zero)");
+	TEST_ASSERT_EQUAL(0, attr_value, "Call count was not reset");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_idle_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Idle call count was not reset");
+	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_error_calls, &attr_value),
+			"Valid attr_get() call didn't return success");
+	TEST_ASSERT_EQUAL(0, attr_value, "Error call count was not reset");
 
 	return unregister_all();
 }
@@ -533,10 +569,10 @@ service_lcore_en_dis_able(void)
 static int
 service_lcore_running_check(void)
 {
-	uint64_t tick = service_tick;
+	uint64_t calls = service_calls;
 	rte_delay_ms(SERVICE_DELAY * 100);
-	/* if (tick != service_tick) we know the lcore as polled the service */
-	return tick != service_tick;
+	bool service_polled = calls != service_calls;
+	return service_polled;
 }
 
 static int
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..eb3a1070e4 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -55,6 +55,17 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Extended service cores statistics.**
+
+  Two new per-service counters are added to the service cores framework.
+
+  `RTE_SERVICE_ATTR_IDLE_CALL_COUNT` tracks the number of service function
+  invocations where no actual work was performed.
+
+  `RTE_SERVICE_ATTR_ERROR_CALL_COUNT` tracks the number of invocations
+  resulting in an error.
+
+  The new statistics are useful for debugging and profiling.
 
 Removed Items
 -------------
diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index 56379930b6..a38c594ce4 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -57,6 +57,8 @@ struct __rte_cache_aligned rte_service_spec_impl {
 
 struct service_stats {
 	RTE_ATOMIC(uint64_t) calls;
+	RTE_ATOMIC(uint64_t) idle_calls;
+	RTE_ATOMIC(uint64_t) error_calls;
 	RTE_ATOMIC(uint64_t) cycles;
 };
 
@@ -369,6 +371,21 @@ rte_service_runstate_get(uint32_t id)
 
 }
 
+static void
+service_counter_add(RTE_ATOMIC(uint64_t) *counter, uint64_t operand)
+{
+	/* The lcore service worker thread is the only writer, and
+	 * thus only a non-atomic load and an atomic store is needed,
+	 * and not the more expensive atomic add.
+	 */
+	uint64_t value;
+
+	value = rte_atomic_load_explicit(counter, rte_memory_order_relaxed);
+
+	rte_atomic_store_explicit(counter, value + operand,
+				  rte_memory_order_relaxed);
+}
+
 static inline void
 service_runner_do_callback(struct rte_service_spec_impl *s,
 			   struct core_state *cs, uint32_t service_idx)
@@ -380,27 +397,23 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
 		uint64_t start = rte_rdtsc();
 		int rc = s->spec.callback(userdata);
 
-		/* The lcore service worker thread is the only writer,
-		 * and thus only a non-atomic load and an atomic store
-		 * is needed, and not the more expensive atomic
-		 * add.
-		 */
 		struct service_stats *service_stats =
 			&cs->service_stats[service_idx];
 
+		service_counter_add(&service_stats->calls, 1);
+
+		if (rc == -EAGAIN)
+			service_counter_add(&service_stats->idle_calls, 1);
+		else if (rc != 0)
+			service_counter_add(&service_stats->error_calls, 1);
+
 		if (likely(rc != -EAGAIN)) {
 			uint64_t end = rte_rdtsc();
 			uint64_t cycles = end - start;
 
-			rte_atomic_store_explicit(&cs->cycles, cs->cycles + cycles,
-				rte_memory_order_relaxed);
-			rte_atomic_store_explicit(&service_stats->cycles,
-				service_stats->cycles + cycles,
-				rte_memory_order_relaxed);
+			service_counter_add(&cs->cycles, cycles);
+			service_counter_add(&service_stats->cycles, cycles);
 		}
-
-		rte_atomic_store_explicit(&service_stats->calls,
-			service_stats->calls + 1, rte_memory_order_relaxed);
 	} else {
 		s->spec.callback(userdata);
 	}
@@ -867,6 +880,24 @@ lcore_attr_get_service_calls(uint32_t service_id, unsigned int lcore)
 		rte_memory_order_relaxed);
 }
 
+static uint64_t
+lcore_attr_get_service_idle_calls(uint32_t service_id, unsigned int lcore)
+{
+	struct core_state *cs = &lcore_states[lcore];
+
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].idle_calls,
+		rte_memory_order_relaxed);
+}
+
+static uint64_t
+lcore_attr_get_service_error_calls(uint32_t service_id, unsigned int lcore)
+{
+	struct core_state *cs = &lcore_states[lcore];
+
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].error_calls,
+		rte_memory_order_relaxed);
+}
+
 static uint64_t
 lcore_attr_get_service_cycles(uint32_t service_id, unsigned int lcore)
 {
@@ -899,6 +930,18 @@ attr_get_service_calls(uint32_t service_id)
 	return attr_get(service_id, lcore_attr_get_service_calls);
 }
 
+static uint64_t
+attr_get_service_idle_calls(uint32_t service_id)
+{
+	return attr_get(service_id, lcore_attr_get_service_idle_calls);
+}
+
+static uint64_t
+attr_get_service_error_calls(uint32_t service_id)
+{
+	return attr_get(service_id, lcore_attr_get_service_error_calls);
+}
+
 static uint64_t
 attr_get_service_cycles(uint32_t service_id)
 {
@@ -918,6 +961,12 @@ rte_service_attr_get(uint32_t id, uint32_t attr_id, uint64_t *attr_value)
 	case RTE_SERVICE_ATTR_CALL_COUNT:
 		*attr_value = attr_get_service_calls(id);
 		return 0;
+	case RTE_SERVICE_ATTR_IDLE_CALL_COUNT:
+		*attr_value = attr_get_service_idle_calls(id);
+		return 0;
+	case RTE_SERVICE_ATTR_ERROR_CALL_COUNT:
+		*attr_value = attr_get_service_error_calls(id);
+		return 0;
 	case RTE_SERVICE_ATTR_CYCLES:
 		*attr_value = attr_get_service_cycles(id);
 		return 0;
diff --git a/lib/eal/include/rte_service.h b/lib/eal/include/rte_service.h
index e49a7a877e..89057df585 100644
--- a/lib/eal/include/rte_service.h
+++ b/lib/eal/include/rte_service.h
@@ -374,15 +374,32 @@ int32_t rte_service_lcore_count_services(uint32_t lcore);
 int32_t rte_service_dump(FILE *f, uint32_t id);
 
 /**
- * Returns the number of cycles that this service has consumed
+ * Returns the number of cycles that this service has consumed. Only
+ * cycles spent in non-idle calls (i.e., calls not returning -EAGAIN)
+ * count.
  */
 #define RTE_SERVICE_ATTR_CYCLES 0
 
 /**
- * Returns the count of invocations of this service function
+ * Returns the total number of invocations of this service function
+ * (regardless of return value).
  */
 #define RTE_SERVICE_ATTR_CALL_COUNT 1
 
+/**
+ * Returns the number of invocations of this service function where the
+ * service reported having not performed any useful work (i.e.,
+ * returned -EAGAIN).
+ */
+#define RTE_SERVICE_ATTR_IDLE_CALL_COUNT 2
+
+/**
+ * Returns the number of invocations of this service function where the
+ * service reported an error (i.e., the return value was neither 0 nor
+ * -EAGAIN).
+ */
+#define RTE_SERVICE_ATTR_ERROR_CALL_COUNT 3
+
 /**
  * Get an attribute from a service.
  *
@@ -408,7 +425,8 @@ int32_t rte_service_attr_reset_all(uint32_t id);
 
 /**
  * Returns the total number of cycles that the lcore has spent on
- * running services.
+ * running services. Only non-idle calls (i.e., calls not returning
+ * -EAGAIN) count toward this total.
  */
 #define RTE_SERVICE_LCORE_ATTR_CYCLES 1
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] service: extend service function call statistics
  2024-09-09 19:11   ` [PATCH v2] " Mattias Rönnblom
@ 2024-09-10  1:19     ` fengchengwen
  2024-09-10  7:19       ` Mattias Rönnblom
  2024-09-12  9:18     ` Van Haaren, Harry
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: fengchengwen @ 2024-09-10  1:19 UTC (permalink / raw)
  To: Mattias Rönnblom, dev; +Cc: hofors, harry.van.haaren, Stefan Sundkvist

Acked-by: Chengwen Feng <fengchengwen@huawei.com>

On 2024/9/10 3:11, Mattias Rönnblom wrote:
> Add two new per-service counters.
> 
> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
> invocations where no work was performed.
> 
> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
> resulting in an error.
> 
> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
> counting all invocations, regardless of return value).
> 
> The new statistics may be useful for both debugging and profiling
> (e.g., calculate the average per-call processing latency for non-idle
> service calls).
> 
> Service core tests are extended to cover the new counters, and
> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
> 
> The documentation for the CYCLES attributes is updated to reflect
> their actual semantics.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] service: extend service function call statistics
  2024-09-10  1:19     ` fengchengwen
@ 2024-09-10  7:19       ` Mattias Rönnblom
  0 siblings, 0 replies; 18+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  7:19 UTC (permalink / raw)
  To: fengchengwen, Mattias Rönnblom, dev
  Cc: harry.van.haaren, Stefan Sundkvist

On 2024-09-10 03:19, fengchengwen wrote:
> Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> 

Thanks.

Harry, could we have a maintainer opinion on this patch?

> On 2024/9/10 3:11, Mattias Rönnblom wrote:
>> Add two new per-service counters.
>>
>> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
>> invocations where no work was performed.
>>
>> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
>> resulting in an error.
>>
>> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
>> counting all invocations, regardless of return value).
>>
>> The new statistics may be useful for both debugging and profiling
>> (e.g., calculate the average per-call processing latency for non-idle
>> service calls).
>>
>> Service core tests are extended to cover the new counters, and
>> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
>>
>> The documentation for the CYCLES attributes is updated to reflect
>> their actual semantics.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] service: extend service function call statistics
  2024-09-09 19:11   ` [PATCH v2] " Mattias Rönnblom
  2024-09-10  1:19     ` fengchengwen
@ 2024-09-12  9:18     ` Van Haaren, Harry
  2024-10-01 15:24     ` David Marchand
  2024-10-02 18:26     ` David Marchand
  3 siblings, 0 replies; 18+ messages in thread
From: Van Haaren, Harry @ 2024-09-12  9:18 UTC (permalink / raw)
  To: Mattias Rönnblom, dev; +Cc: hofors, Stefan Sundkvist

> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Sent: Monday, September 9, 2024 8:11 PM
> To: dev@dpdk.org <dev@dpdk.org>
> Cc: hofors@lysator.liu.se <hofors@lysator.liu.se>; Van Haaren, Harry <harry.van.haaren@intel.com>; Stefan Sundkvist <stefan.sundkvist@ericsson.com>; Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Subject: [PATCH v2] service: extend service function call statistics
>
> Add two new per-service counters.
>
> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
> invocations where no work was performed.
>
> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
> resulting in an error.
>
> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
> counting all invocations, regardless of return value).
>
> The new statistics may be useful for both debugging and profiling
> (e.g., calculate the average per-call processing latency for non-idle
> service calls).
>
> Service core tests are extended to cover the new counters, and
> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
>
> The documentation for the CYCLES attributes is updated to reflect
> their actual semantics.
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

Thanks for the patch; Nice that these extra counters help in understanding what
a specific service is doing (e.g. idle, errors) in the context of the (existing) call count.

Test coverage and updates all look good to me:

Acked-by: Harry van Haaren <harry.van.haaren@intel.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] service: extend service function call statistics
  2024-09-09 19:11   ` [PATCH v2] " Mattias Rönnblom
  2024-09-10  1:19     ` fengchengwen
  2024-09-12  9:18     ` Van Haaren, Harry
@ 2024-10-01 15:24     ` David Marchand
  2024-10-02 18:26     ` David Marchand
  3 siblings, 0 replies; 18+ messages in thread
From: David Marchand @ 2024-10-01 15:24 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, hofors, harry.van.haaren, Stefan Sundkvist, Chengwen Feng

On Mon, Sep 9, 2024 at 9:28 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> Add two new per-service counters.
>
> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
> invocations where no work was performed.
>
> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
> resulting in an error.
>
> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
> counting all invocations, regardless of return value).
>
> The new statistics may be useful for both debugging and profiling
> (e.g., calculate the average per-call processing latency for non-idle
> service calls).
>
> Service core tests are extended to cover the new counters, and
> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
>
> The documentation for the CYCLES attributes is updated to reflect
> their actual semantics.
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>


Applied, thanks.

-- 
David Marchand


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] service: extend service function call statistics
  2024-09-09 19:11   ` [PATCH v2] " Mattias Rönnblom
                       ` (2 preceding siblings ...)
  2024-10-01 15:24     ` David Marchand
@ 2024-10-02 18:26     ` David Marchand
  2024-10-02 18:56       ` Mattias Rönnblom
  3 siblings, 1 reply; 18+ messages in thread
From: David Marchand @ 2024-10-02 18:26 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, hofors, harry.van.haaren, Stefan Sundkvist

Hello Mattias,

On Mon, Sep 9, 2024 at 9:28 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
> @@ -46,9 +49,21 @@ testsuite_teardown(void)
>  static int32_t dummy_cb(void *args)
>  {
>         RTE_SET_USED(args);
> -       service_tick++;
> +
> +       service_calls++;
> +
> +       switch (rte_rand_max(3)) {
> +       case 0:
> +               return 0;
> +       case 1:
> +               service_idle_calls++;
> +               return -EAGAIN;
> +       default:
> +               service_error_calls++;
> +               return -ENOENT;
> +       }
> +
>         rte_delay_ms(SERVICE_DELAY);
> -       return 0;
>  }
>
>  static int32_t dummy_mt_unsafe_cb(void *args)

Coverity flagged this patch with issue #445158.
rte_delay_ms() is now unreachable.

I suppose this delay is not that important for the unit test and we
can remove it, but as I am not sure I'll let you have a look and send
a fix.

Thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] service: extend service function call statistics
  2024-10-02 18:26     ` David Marchand
@ 2024-10-02 18:56       ` Mattias Rönnblom
  2024-10-02 19:08         ` David Marchand
  0 siblings, 1 reply; 18+ messages in thread
From: Mattias Rönnblom @ 2024-10-02 18:56 UTC (permalink / raw)
  To: David Marchand, Mattias Rönnblom
  Cc: dev, harry.van.haaren, Stefan Sundkvist

On 2024-10-02 20:26, David Marchand wrote:
> Hello Mattias,
> 
> On Mon, Sep 9, 2024 at 9:28 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>> @@ -46,9 +49,21 @@ testsuite_teardown(void)
>>   static int32_t dummy_cb(void *args)
>>   {
>>          RTE_SET_USED(args);
>> -       service_tick++;
>> +
>> +       service_calls++;
>> +
>> +       switch (rte_rand_max(3)) {
>> +       case 0:
>> +               return 0;
>> +       case 1:
>> +               service_idle_calls++;
>> +               return -EAGAIN;
>> +       default:
>> +               service_error_calls++;
>> +               return -ENOENT;
>> +       }
>> +
>>          rte_delay_ms(SERVICE_DELAY);
>> -       return 0;
>>   }
>>
>>   static int32_t dummy_mt_unsafe_cb(void *args)
> 
> Coverity flagged this patch with issue #445158.
> rte_delay_ms() is now unreachable.
> 
> I suppose this delay is not that important for the unit test and we
> can remove it, but as I am not sure I'll let you have a look and send
> a fix.
> 

It works without it I think, but I would keep it, and add it to the 
"case 0" branch.

Let me know if you want a v2.


> Thanks.
> 
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] service: extend service function call statistics
  2024-10-02 18:56       ` Mattias Rönnblom
@ 2024-10-02 19:08         ` David Marchand
  2024-10-02 19:43           ` Mattias Rönnblom
  0 siblings, 1 reply; 18+ messages in thread
From: David Marchand @ 2024-10-02 19:08 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Mattias Rönnblom, dev, harry.van.haaren, Stefan Sundkvist

On Wed, Oct 2, 2024 at 8:57 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> > Coverity flagged this patch with issue #445158.
> > rte_delay_ms() is now unreachable.
> >
> > I suppose this delay is not that important for the unit test and we
> > can remove it, but as I am not sure I'll let you have a look and send
> > a fix.
> >
>
> It works without it I think, but I would keep it, and add it to the
> "case 0" branch.
>
> Let me know if you want a v2.

Unfortunately, Coverity is run only for merged stuff.
So the report we got is for the merged patch in the main repo.

Please send a fix.

-- 
David Marchand


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] service: extend service function call statistics
  2024-10-02 19:08         ` David Marchand
@ 2024-10-02 19:43           ` Mattias Rönnblom
  2024-10-03  9:53             ` Van Haaren, Harry
  0 siblings, 1 reply; 18+ messages in thread
From: Mattias Rönnblom @ 2024-10-02 19:43 UTC (permalink / raw)
  To: David Marchand
  Cc: Mattias Rönnblom, dev, harry.van.haaren, Stefan Sundkvist

On 2024-10-02 21:08, David Marchand wrote:
> On Wed, Oct 2, 2024 at 8:57 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>>> Coverity flagged this patch with issue #445158.
>>> rte_delay_ms() is now unreachable.
>>>
>>> I suppose this delay is not that important for the unit test and we
>>> can remove it, but as I am not sure I'll let you have a look and send
>>> a fix.
>>>
>>
>> It works without it I think, but I would keep it, and add it to the
>> "case 0" branch.
>>
>> Let me know if you want a v2.
> 
> Unfortunately, Coverity is run only for merged stuff.
> So the report we got is for the merged patch in the main repo.
> 
> Please send a fix.
> 

Done.

I decided I could just as well reintroduce the old behavior. You could 
refactor away all this sleeping business I think, but that is for 
another day.
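
Roughly, the fix looks like the below (just a sketch; refer to the
posted fix patch for the exact change):

static int32_t dummy_cb(void *args)
{
	RTE_SET_USED(args);

	service_calls++;

	/* delay on every invocation, like before the counter rework */
	rte_delay_ms(SERVICE_DELAY);

	switch (rte_rand_max(3)) {
	case 0:
		return 0;
	case 1:
		service_idle_calls++;
		return -EAGAIN;
	default:
		service_error_calls++;
		return -ENOENT;
	}
}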


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] service: extend service function call statistics
  2024-10-02 19:43           ` Mattias Rönnblom
@ 2024-10-03  9:53             ` Van Haaren, Harry
  0 siblings, 0 replies; 18+ messages in thread
From: Van Haaren, Harry @ 2024-10-03  9:53 UTC (permalink / raw)
  To: Mattias Rönnblom, Marchand, David
  Cc: Mattias Rönnblom, dev, Stefan Sundkvist, Ruane, Sean

> From: Mattias Rönnblom <hofors@lysator.liu.se>
> Sent: Wednesday, October 2, 2024 8:43 PM
> To: Marchand, David <david.marchand@redhat.com>
> Cc: Mattias Rönnblom <mattias.ronnblom@ericsson.com>; dev@dpdk.org <dev@dpdk.org>; Van Haaren, Harry <harry.van.haaren@intel.com>; Stefan Sundkvist <stefan.sundkvist@ericsson.com>
> Subject: Re: [PATCH v2] service: extend service function call statistics
>
> On 2024-10-02 21:08, David Marchand wrote:
> > On Wed, Oct 2, 2024 at 8:57 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> >>> Coverity flagged this patch with issue #445158.
> >>> rte_delay_ms() is now unreachable.
> >>>
> >>> I suppose this delay is not that important for the unit test and we
> >>> can remove it, but as I am not sure I'll let you have a look and send
> >>> a fix.
> >>>
> >>
> >> It works without it I think, but I would keep it, and add it to the
> >> "case 0" branch.
> >>
> >> Let me know if you want a v2.
> >
> > Unfortunately, Coverity is run only for merged stuff.
> > So the report we got is for the merged patch in the main repo.
> >
> > Please send a fix.
> >
>
> Done.
>
> I decided I could just as well reintroduce the old behavior. You could
> refactor away all this sleeping business I think, but that is for
> another day.

Thanks for sending the patch, Mattias. Coverity flagged me too; I was intending
to have a look with my colleague Sean here today, so instead we'll review your
patch shortly.

The delays were introduced so thread spawns in the CI have some time to take
effect, and to (intentionally) introduce some jitter, so that we cover the cases
where one thread checks poll counts and correct behaviour before the other; this
is useful since service cores are inherently racy in terms of things running in
parallel, hence the delay in the dummy service callback.
All this to say, let's leave the rte_delay_ms() in.

Expect a response (hopefully Tested/Acked-by) soon! Regards -Harry

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-10-03  9:53 UTC | newest]

Thread overview: 18+ messages
2024-01-25 19:15 [RFC] service: extend service function call statistics Mattias Rönnblom
2024-01-25 23:19 ` Morten Brørup
2024-01-26  8:28   ` Mattias Rönnblom
2024-01-26 10:07     ` Morten Brørup
2024-01-27 19:31       ` Mattias Rönnblom
2024-01-29 12:50         ` Van Haaren, Harry
2024-01-30  7:16           ` Mattias Rönnblom
2024-08-09 20:25 ` [PATCH] " Mattias Rönnblom
2024-09-09 19:11   ` [PATCH v2] " Mattias Rönnblom
2024-09-10  1:19     ` fengchengwen
2024-09-10  7:19       ` Mattias Rönnblom
2024-09-12  9:18     ` Van Haaren, Harry
2024-10-01 15:24     ` David Marchand
2024-10-02 18:26     ` David Marchand
2024-10-02 18:56       ` Mattias Rönnblom
2024-10-02 19:08         ` David Marchand
2024-10-02 19:43           ` Mattias Rönnblom
2024-10-03  9:53             ` Van Haaren, Harry
