DPDK patches and discussions
* [dpdk-dev] [Bug 60] rte_event_port_unlink() causes subsequent events to end up in wrong port
@ 2018-06-04  7:21 bugzilla
  2018-06-04  8:20 ` Jerin Jacob
  0 siblings, 1 reply; 6+ messages in thread
From: bugzilla @ 2018-06-04  7:21 UTC (permalink / raw)
  To: dev

https://dpdk.org/tracker/show_bug.cgi?id=60

            Bug ID: 60
           Summary: rte_event_port_unlink() causes subsequent events to
                    end up in wrong port
           Product: DPDK
           Version: 17.11
          Hardware: x86
                OS: Linux
            Status: CONFIRMED
          Severity: major
          Priority: Normal
         Component: eventdev
          Assignee: dev@dpdk.org
          Reporter: matias.elo@nokia.com
  Target Milestone: ---

Created attachment 8
  --> https://dpdk.org/tracker/attachment.cgi?id=8&action=edit
Test application

I'm seeing some unexpected(?) behavior when calling rte_event_port_unlink()
with the SW eventdev driver (DPDK 17.11.2/18.02.1,
RTE_EVENT_MAX_QUEUES_PER_DEV=255). After rte_event_port_unlink() is called,
enqueued events may end up either back in the unlinked port or in port
zero.

Scenario:

- Run SW eventdev on a service core
- Start eventdev with e.g. 16 ports. Each core will have a dedicated port.
- Create 1 atomic queue and link all active ports to it (some ports may not
be linked).
- Allocate some events and enqueue them to the created queue
- Next, each worker core does a number of scheduling rounds concurrently.
E.g.

uint64_t rx_events = 0;
while (rx_events < SCHED_ROUNDS) {
        num_deq = rte_event_dequeue_burst(dev_id, port_id, ev, 1, 0);

        if (num_deq) {
                rx_events++;
                /* Re-enqueue the dequeued event to the same queue */
                rte_event_enqueue_burst(dev_id, port_id, ev, 1);
        }
}

- This works fine but problems occur when doing cleanup after the first
loop finishes on some core.
E.g.

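/* Unlink this port from all queues (NULL selects every linked queue) */
rte_event_port_unlink(dev_id, port_id, NULL, 0);

/* Drain events already scheduled to this port and re-enqueue them */
while (1) {
        num_deq = rte_event_dequeue_burst(dev_id, port_id, ev, 1, 0);

        if (num_deq == 0)
                break;

        rte_event_enqueue_burst(dev_id, port_id, ev, 1);
}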

- The events enqueued in the cleanup loop will randomly end up either back in
the same port (which has already been unlinked) or in port zero, which is not
used (port_id is mapped from rte_lcore_id).

As far as I understand the eventdev API, an eventdev port shouldn't have to be
linked to the target queue for enqueue to work properly.

I've attached a simple test application for reproducing this issue.
# sudo ./eventdev --vdev event_sw0 -s 0x2

Below is an example rte_event_dev_dump() output when processing events with two
cores (ports 2 and 3). The rest of the ports are not linked at all, but events
still end up in port zero, stalling the system.


Regards,
Matias

EventDev todo-fix-name: ports 16, qids 1
        rx   908342
        drop 0
        tx   908342
        sched calls: 42577156
        sched cq/qid call: 43120490
        sched no IQ enq: 42122057
        sched no CQ enq: 42122064
        inflight 32, credits: 4064
  Port 0 
        rx   0  drop 0  tx   2  inflight 2
        Max New: 1024   Avg cycles PP: 0        Credits: 0
        Receive burst distribution:
                0:-nan% 
        rx ring used:    0      free: 4096
        cq ring used:    2      free:   14
  Port 1 
        rx   0  drop 0  tx   0  inflight 0
        Max New: 1024   Avg cycles PP: 0        Credits: 0
        Receive burst distribution:
                0:-nan% 
        rx ring used:    0      free: 4096
        cq ring used:    0      free:   16
  Port 2 
        rx   524292     drop 0  tx   524290     inflight 0
        Max New: 1024   Avg cycles PP: 190      Credits: 30
        Receive burst distribution:
                0:98% 1-4:1.82% 
        rx ring used:    0      free: 4096
        cq ring used:    0      free:   16
  Port 3 
        rx   384050     drop 0  tx   384050     inflight 0
        Max New: 1024   Avg cycles PP: 191      Credits: 0
        Receive burst distribution:
                0:100% 1-4:0.04% 
        rx ring used:    0      free: 4096
        cq ring used:    0      free:   16
...
  Port 15 
        rx   0  drop 0  tx   0  inflight 0
        Max New: 1024   Avg cycles PP: 0        Credits: 0
        Receive burst distribution:
                0:-nan% 
        rx ring used:    0      free: 4096
        cq ring used:    0      free:   16
  Queue 0 (Atomic)
        rx   908342     drop 0  tx   908342
        Per Port Stats:
          Port 0: Pkts: 2       Flows: 1
          Port 1: Pkts: 0       Flows: 0
          Port 2: Pkts: 524290  Flows: 0
          Port 3: Pkts: 384050  Flows: 0
          Port 4: Pkts: 0       Flows: 0
          Port 5: Pkts: 0       Flows: 0
          Port 6: Pkts: 0       Flows: 0
          Port 7: Pkts: 0       Flows: 0
          Port 8: Pkts: 0       Flows: 0
          Port 9: Pkts: 0       Flows: 0
          Port 10: Pkts: 0      Flows: 0
          Port 11: Pkts: 0      Flows: 0
          Port 12: Pkts: 0      Flows: 0
          Port 13: Pkts: 0      Flows: 0
          Port 14: Pkts: 0      Flows: 0
          Port 15: Pkts: 0      Flows: 0
        -- iqs empty --



* Re: [dpdk-dev] [Bug 60] rte_event_port_unlink() causes subsequent events to end up in wrong port
  2018-06-04  7:21 [dpdk-dev] [Bug 60] rte_event_port_unlink() causes subsequent events to end up in wrong port bugzilla
@ 2018-06-04  8:20 ` Jerin Jacob
  2018-06-05 16:43   ` Van Haaren, Harry
  0 siblings, 1 reply; 6+ messages in thread
From: Jerin Jacob @ 2018-06-04  8:20 UTC (permalink / raw)
  To: bugzilla
  Cc: dev, harry.van.haaren, liang.j.ma, hemant.agrawal, sunil.kori,
	nipun.gupta

> As far as I understand the eventdev API, an eventdev port shouldn't have to be
> linked to the target queue for enqueue to work properly.

That is a grey area in the spec. The octeontx driver works the way you
described. I am not sure about the SW driver (CC:
harry.van.haaren@intel.com). If there is no performance impact on any of
the drivers and it is doable for all HW and SW implementations, then we can
do it that way (CC: all PMD maintainers).

Unrelated to this question: are you planning to use rte_event_port_unlink() in the fast path?
Would rte_event_dev_stop() work for you if it is in the slow path?


* Re: [dpdk-dev] [Bug 60] rte_event_port_unlink() causes subsequent events to end up in wrong port
  2018-06-04  8:20 ` Jerin Jacob
@ 2018-06-05 16:43   ` Van Haaren, Harry
  0 siblings, 0 replies; 6+ messages in thread
From: Van Haaren, Harry @ 2018-06-05 16:43 UTC (permalink / raw)
  To: Jerin Jacob, bugzilla
  Cc: dev, Ma, Liang J, hemant.agrawal, sunil.kori, nipun.gupta

> > As far as I understand the eventdev API, an eventdev port shouldn't have
> > to be linked to the target queue for enqueue to work properly.
>
> That is a grey area in the spec. The octeontx driver works the way you
> described. I am not sure about the SW driver (CC:
> harry.van.haaren@intel.com). If there is no performance impact on any of
> the drivers and it is doable for all HW and SW implementations, then we can
> do it that way (CC: all PMD maintainers).
>
> Unrelated to this question: are you planning to use rte_event_port_unlink()
> in the fast path?
> Would rte_event_dev_stop() work for you if it is in the slow path?


Hi Matias,

Thanks for opening this. From memory, sw_port_unlink() does attempt to handle that correctly.

Having a quick look: we scan the queue's port map for the port to unlink. If we find the queue->port combination, we copy the last link in the array to the found position and reduce the number of mapped queues by one (i.e. we keep the array contiguous from 0 to num_mapped_queues).

The appropriate rte_smp_wmb() is in place to avoid race conditions between threads there.
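
As an illustration of the unlink step described above, here is a minimal stand-alone C model. It is only a sketch and not the driver code: `struct qid_model`, `model_unlink()` and `MAX_CQS` are hypothetical names, and the field names `cq_map` / `cq_num_mapped_cqs` are borrowed from the scheduler code for readability. The memory barrier is only noted in a comment, since the model is single-threaded.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CQS 16 /* hypothetical capacity for this model */

/* Simplified stand-in for a queue's link state; not the driver's struct. */
struct qid_model {
	uint8_t cq_map[MAX_CQS];    /* linked ports, contiguous from index 0 */
	uint32_t cq_num_mapped_cqs; /* number of valid entries in cq_map */
};

/* Remove port_id from the map. Returns 1 if a link was removed, else 0. */
static int
model_unlink(struct qid_model *q, uint8_t port_id)
{
	uint32_t j;

	assert(q->cq_num_mapped_cqs <= MAX_CQS);

	for (j = 0; j < q->cq_num_mapped_cqs; j++) {
		if (q->cq_map[j] != port_id)
			continue;
		/* Overwrite the found slot with the last valid entry. */
		q->cq_map[j] = q->cq_map[q->cq_num_mapped_cqs - 1];
		/* A real multi-threaded implementation needs a write
		 * barrier around this update; omitted in this model. */
		q->cq_num_mapped_cqs--;
		return 1;
	}
	return 0;
}
```

Starting from a map of {2, 3, 5} with three valid entries, unlinking port 2 leaves {5, 3} as the valid range. Note that in this model the slot just past the valid range still holds its old value (5), so any reader that indexes beyond cq_num_mapped_cqs would see a stale port.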

I think this should handle the unlink case you mention; however, perhaps you have identified a genuine bug. If you have more info, or a sample config / app that easily demonstrates the issue, that would help us reproduce and debug it.

Unfortunately I will be away until next week, but I will check up on this thread once I'm back in the office.

Regards, -Harry


* Re: [dpdk-dev] [Bug 60] rte_event_port_unlink() causes subsequent events to end up in wrong port
  2018-06-19  9:20 Elo, Matias (Nokia - FI/Espoo)
@ 2018-06-26 13:35 ` Maxim Uvarov
  0 siblings, 0 replies; 6+ messages in thread
From: Maxim Uvarov @ 2018-06-26 13:35 UTC (permalink / raw)
  To: Elo, Matias (Nokia - FI/Espoo), harry.van.haaren; +Cc: dev, jerin.jacob

Hello,

is there any progress on this?


Thank you,
Maxim.



* Re: [dpdk-dev] [Bug 60] rte_event_port_unlink() causes subsequent events to end up in wrong port
@ 2018-06-19  9:20 Elo, Matias (Nokia - FI/Espoo)
  0 siblings, 0 replies; 6+ messages in thread
From: Elo, Matias (Nokia - FI/Espoo) @ 2018-06-19  9:20 UTC (permalink / raw)
  To: jerin.jacob; +Cc: dev

> Unrelated to this question: are you planning to use rte_event_port_unlink() in the fast path?
> Would rte_event_dev_stop() work for you if it is in the slow path?

Hi Jerin,

Sorry for missing your question earlier. We need rte_event_port_link() /
rte_event_port_unlink() for load balancing, so calling rte_event_dev_stop()
isn't an option.

-Matias


* Re: [dpdk-dev] [Bug 60] rte_event_port_unlink() causes subsequent events to end up in wrong port
@ 2018-06-19  9:20 Elo, Matias (Nokia - FI/Espoo)
  2018-06-26 13:35 ` Maxim Uvarov
  0 siblings, 1 reply; 6+ messages in thread
From: Elo, Matias (Nokia - FI/Espoo) @ 2018-06-19  9:20 UTC (permalink / raw)
  To: harry.van.haaren; +Cc: dev, jerin.jacob

> I think this should handle the unlink case you mention, however perhaps you have identified a genuine bug. If you have more info or a sample config / app that easily demonstrates the issue that would help reproduce/debug here? 


Hi Harry,

The bug report includes a simple test application that demonstrates the issue. I've done some further digging, and the following simple patch seems to fix the issue of events ending up in the wrong ports.


diff --git a/drivers/event/sw/sw_evdev_scheduler.c b/drivers/event/sw/sw_evdev_scheduler.c
index 8a2c9d4f9..57298345d 100644
--- a/drivers/event/sw/sw_evdev_scheduler.c
+++ b/drivers/event/sw/sw_evdev_scheduler.c
@@ -79,9 +79,11 @@ sw_schedule_atomic_to_cq(struct sw_evdev *sw, struct sw_qid * const qid,
 		int cq = fid->cq;
 
 		if (cq < 0) {
-			uint32_t cq_idx = qid->cq_next_tx++;
-			if (qid->cq_next_tx == qid->cq_num_mapped_cqs)
+			uint32_t cq_idx;
+			if (qid->cq_next_tx >= qid->cq_num_mapped_cqs)
 				qid->cq_next_tx = 0;
+			cq_idx = qid->cq_next_tx++;
+
 			cq = qid->cq_map[cq_idx];
 
 			/* find least used */
@@ -168,9 +170,11 @@ sw_schedule_parallel_to_cq(struct sw_evdev *sw, struct sw_qid * const qid,
 		do {
 			if (++cq_check_count > qid->cq_num_mapped_cqs)
 				goto exit;
-			cq = qid->cq_map[cq_idx];
-			if (++cq_idx == qid->cq_num_mapped_cqs)
+
+			if (cq_idx >= qid->cq_num_mapped_cqs)
 				cq_idx = 0;
+			cq = qid->cq_map[cq_idx++];
+
 		} while (rte_event_ring_free_count(
 				sw->ports[cq].cq_worker_ring) == 0 ||
 				sw->ports[cq].inflights == SW_PORT_HIST_LIST);
@@ -251,6 +255,9 @@ sw_schedule_qid_to_cq(struct sw_evdev *sw)
 		if (iq_num >= SW_IQS_MAX)
 			continue;
 
+		if (qid->cq_num_mapped_cqs == 0)
+			continue;
+
 		uint32_t pkts_done = 0;
 		uint32_t count = iq_ring_count(qid->iq[iq_num]);


However, events from atomic/ordered queues may still get stuck when unlinking (they are scheduled back to the unlinked port). In the case of atomic queues, the problem seems to be related to the (struct sw_fid_t *)fid->cq fields being invalid. With ordered queues, events get stuck in the reorder buffer.
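
To show why the index handling changed by the patch matters, here is a small stand-alone C model of the round-robin cursor. It is a sketch under assumptions, not the scheduler itself: `struct qid_model`, `next_idx_old()`, `next_idx_new()` and `MAX_CQS` are hypothetical names, and only the cursor logic from the removed and added lines of the first hunk is reproduced.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CQS 16 /* hypothetical capacity for this model */

/* Simplified stand-in for a queue's link state; not the driver's struct. */
struct qid_model {
	uint8_t cq_map[MAX_CQS];    /* linked ports, contiguous from index 0 */
	uint32_t cq_num_mapped_cqs; /* number of valid entries in cq_map */
	uint32_t cq_next_tx;        /* round-robin cursor into cq_map */
};

/* Pre-patch ordering: use the cursor, advance it, wrap only on equality. */
static uint32_t
next_idx_old(struct qid_model *q)
{
	uint32_t cq_idx = q->cq_next_tx++;

	if (q->cq_next_tx == q->cq_num_mapped_cqs)
		q->cq_next_tx = 0;
	return cq_idx;
}

/* Patched ordering: clamp the cursor first, then use and advance it. */
static uint32_t
next_idx_new(struct qid_model *q)
{
	uint32_t cq_idx;

	if (q->cq_next_tx >= q->cq_num_mapped_cqs)
		q->cq_next_tx = 0;
	cq_idx = q->cq_next_tx++;
	return cq_idx;
}
```

In this model, take a map of {2, 3} with the cursor at 1, then shrink cq_num_mapped_cqs to 1 as an unlink would. The old ordering returns index 1 and then 2, 3, ... because the equality test is never hit again, so it reads entries outside the valid range (a stale port, or a zero-initialised slot that would select port zero). The patched ordering resets the cursor and keeps returning index 0. This matches the symptoms in the report, though it is only a model and not a trace of the driver.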

-Matias


end of thread, other threads:[~2018-06-26 13:35 UTC | newest]
