Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK

DPDK usage discussions

* Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
@ 2025-02-14  8:43 NAGENDRA BALAGANI
  2025-02-14  9:22 ` Van Haaren, Harry
  0 siblings, 1 reply; 10+ messages in thread
From: NAGENDRA BALAGANI @ 2025-02-14  8:43 UTC (permalink / raw)
  To: users

Hi Team,

We are facing a race condition in our DPDK application where one thread is reading packets from a queue using rte_eth_rx_burst(), while another thread is attempting to stop the device using rte_eth_dev_stop(). This is causing instability, as the reading thread may still be accessing queues while the device is being stopped.

Could you please suggest the best way to mitigate this race condition without impacting fast path performance? We want to ensure safe synchronization while maintaining high throughput.

Looking forward to your insights.

Regards,
Nagendra

* Re: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
  2025-02-14  8:43 Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK NAGENDRA BALAGANI
@ 2025-02-14  9:22 ` Van Haaren, Harry
  2025-02-14 16:31   ` Stephen Hemminger
  2025-02-17 18:57   ` Changchun Zhang
  0 siblings, 2 replies; 10+ messages in thread
From: Van Haaren, Harry @ 2025-02-14  9:22 UTC (permalink / raw)
  To: NAGENDRA BALAGANI, users

> From: NAGENDRA BALAGANI <nagendra.balagani@oracle.com>
> Sent: Friday, February 14, 2025 8:43 AM
> To: users@dpdk.org <users@dpdk.org>
> Subject: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
>
> Hi Team,

Hi Nagendra,

> We are facing a race condition in our DPDK application where one thread is reading packets from a queue using rte_eth_rx_burst(), while another thread is attempting to stop the device using rte_eth_dev_stop(). This is causing instability, as the reading thread may still be accessing queues while the device is being stopped.

This is as expected - it is not valid to stop a device while other cores are using it.

> Could you please suggest the best way to mitigate this race condition without impacting fast path performance? We want to ensure safe synchronization while maintaining high throughput.

There are many implementations possible, but the end result of them all is "ensure that the dataplane core is NOT polling a device that is stopping".

1) One implementation is using a "force_quit" boolean value (see dpdk/examples/l2fwd/main.c for example). This approach changes the lcore's "while (1)" polling loop into a "while (!force_quit)" loop. (Note there is some nuance around the "volatile" keyword for the boolean, to ensure it is reloaded on each iteration, but that's off topic.)

2) Another, more flexible/powerful implementation could be some form of message passing. For example, imagine the dataplane thread and the control plane (stopping ethdev) thread are capable of communicating by sending an "event" to each other. When a "stop polling" event is received by the dataplane thread, it disables polling of just that eth device/queue and responds with a "stopped polling" reply. On receiving the "stopped polling" event, the thread that wants to stop the eth device can now safely do so.

Both of these implementations have essentially no datapath performance impact:
1) a single boolean value check (shared state cache-line, likely in the core's cache) per iteration of polling of the app is super lightweight
2) an "event ringbuffer" check (when empty, also shared-state, likely in cache) per iteration is also super light.
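
Approach 1) can be sketched in portable C11 as below. This is a minimal, self-contained model rather than DPDK code: POSIX threads stand in for lcores, and the hypothetical stub_rx_burst()/stub_dev_stop() functions stand in for rte_eth_rx_burst()/rte_eth_dev_stop(). It also assumes a C11 atomic_bool in place of the "volatile" flag, which sidesteps the volatile nuance.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device model so the sketch runs without DPDK. */
static atomic_bool dev_stopped;
static atomic_ulong polls_total;
static atomic_ulong polls_after_stop;   /* the race we must avoid: stays 0 */

static void stub_rx_burst(void)         /* stands in for rte_eth_rx_burst() */
{
    if (atomic_load(&dev_stopped))
        atomic_fetch_add(&polls_after_stop, 1);
    atomic_fetch_add(&polls_total, 1);
}

static void stub_dev_stop(void)         /* stands in for rte_eth_dev_stop() */
{
    atomic_store(&dev_stopped, true);
}

/* C11 atomic instead of "volatile bool": every iteration performs a real
 * load, and the acquire/release pair orders it against the control thread. */
static atomic_bool force_quit;

static void *lcore_main(void *arg)
{
    (void)arg;
    /* was: while (1) */
    while (!atomic_load_explicit(&force_quit, memory_order_acquire))
        stub_rx_burst();
    return NULL;
}

/* Control path: raise the flag, wait until the worker has left its loop,
 * and only then stop the device. Returns the number of unsafe polls. */
static unsigned long run_force_quit_demo(void)
{
    pthread_t worker;

    if (pthread_create(&worker, NULL, lcore_main, NULL) != 0)
        return (unsigned long)-1;
    while (atomic_load(&polls_total) == 0)
        ;                               /* let the worker poll at least once */

    atomic_store_explicit(&force_quit, true, memory_order_release);
    pthread_join(worker, NULL);         /* cf. rte_eal_wait_lcore() */
    stub_dev_stop();                    /* safe: nobody is polling any more */
    return atomic_load(&polls_after_stop);
}
```

In a real DPDK application the control side would typically wait with rte_eal_wait_lcore() (or rte_eal_mp_wait_lcore() for all workers) before calling rte_eth_dev_stop(); the ordering that matters is the same: set the flag, wait, then stop.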

General notes on the above:
There's even an option to only check the boolean/event-ringbuffer once every N iterations: this will cause even less overhead, but will increase the latency of event action/reply on the datapath thread. As almost always, it depends on what's important for your use-case!
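
The "check once every N iterations" variant might look like the sketch below. CHECK_INTERVAL and stub_poll() are hypothetical names, and the stop request is raised from inside the stub so the example is deterministic on a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CHECK_INTERVAL 16u   /* must be a power of two */

static atomic_bool stop_requested;
static unsigned poll_calls;

/* Hypothetical stand-in for rte_eth_rx_burst(). To keep the sketch
 * single-threaded, it also plays the control thread: the stop request
 * is raised during the 5th poll. */
static void stub_poll(void)
{
    if (++poll_calls == 5)
        atomic_store_explicit(&stop_requested, true, memory_order_release);
}

/* Polls until a stop is noticed; the flag is read only once every
 * CHECK_INTERVAL iterations. Returns the total number of polls made. */
static unsigned run_batched_check_loop(void)
{
    atomic_store(&stop_requested, false);
    poll_calls = 0;

    for (unsigned iter = 0;; iter++) {
        if ((iter & (CHECK_INTERVAL - 1)) == 0 &&
            atomic_load_explicit(&stop_requested, memory_order_acquire))
            break;
        stub_poll();
    }
    return poll_calls;
}
```

With CHECK_INTERVAL at 16, the stop raised during poll 5 is only noticed at iteration 16, so the loop makes 16 polls in total: 11 more than a per-iteration check would. That extra delay is the latency cost traded for fewer flag reads.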

The main difference between implementations 1) and 2) above can be captured by this phrase: "Do not communicate by sharing memory; instead, share memory by communicating.", which I read in the Rust docs here: https://doc.rust-lang.org/book/ch16-02-message-passing.html. 1) literally shares memory (both threads access the force_quit value directly). 2) focuses on communicating, which enables avoiding the race condition in a more powerful/elegant way (and is future-proof too: it allows adding new event types cleanly, which the force_quit bool value does not). I like this design mentality, as it is a good/high-performance/scalable way of interacting between threads, and scales to future needs too: so I recommend approach 2.
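
A minimal sketch of approach 2), assuming one dataplane thread and one control thread. A DPDK application would more likely use an rte_ring (rte_ring_enqueue()/rte_ring_dequeue()) as the event channel; the small single-producer/single-consumer ring here is a hand-written stand-in so the example builds without DPDK, and the event names and stub_rx_burst() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NB_QUEUES 2
#define RING_SIZE 8                      /* must be a power of two */

/* Hypothetical event types exchanged between control and dataplane. */
enum ev_type { EV_STOP_POLLING, EV_STOPPED_POLLING, EV_EXIT };
struct event { enum ev_type type; unsigned queue_id; };

/* Minimal single-producer/single-consumer ring (stand-in for rte_ring). */
struct spsc_ring {
    struct event slots[RING_SIZE];
    atomic_size_t head;                  /* advanced by the producer */
    atomic_size_t tail;                  /* advanced by the consumer */
};

static bool ring_push(struct spsc_ring *r, struct event ev)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    if (head - atomic_load_explicit(&r->tail, memory_order_acquire) == RING_SIZE)
        return false;                    /* full */
    r->slots[head & (RING_SIZE - 1)] = ev;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

static bool ring_pop(struct spsc_ring *r, struct event *ev)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    if (atomic_load_explicit(&r->head, memory_order_acquire) == tail)
        return false;                    /* empty: the cheap common case */
    *ev = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

static struct spsc_ring to_worker, to_control;
static atomic_bool queue_stopped[NB_QUEUES];
static atomic_ulong unsafe_polls;        /* polls of a stopped queue: stays 0 */

static void stub_rx_burst(unsigned q)    /* hypothetical rte_eth_rx_burst() */
{
    if (atomic_load(&queue_stopped[q]))
        atomic_fetch_add(&unsafe_polls, 1);
}

static void *worker(void *arg)
{
    bool polling[NB_QUEUES] = { true, true };
    struct event ev;

    (void)arg;
    for (;;) {
        if (ring_pop(&to_worker, &ev)) { /* one ring check per iteration */
            if (ev.type == EV_EXIT)
                return NULL;
            if (ev.type == EV_STOP_POLLING) {
                polling[ev.queue_id] = false;   /* stop first... */
                struct event ack = { EV_STOPPED_POLLING, ev.queue_id };
                while (!ring_push(&to_control, ack))
                    ;                           /* ...then reply */
            }
        }
        for (unsigned q = 0; q < NB_QUEUES; q++)
            if (polling[q])
                stub_rx_burst(q);
    }
}

/* Control path: send the request, wait for the reply, then stop queue q. */
static unsigned long run_message_passing_demo(unsigned q)
{
    struct event stop = { EV_STOP_POLLING, q }, quit = { EV_EXIT, 0 }, reply;
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return (unsigned long)-1;
    while (!ring_push(&to_worker, stop))
        ;
    while (!ring_pop(&to_control, &reply))
        ;                                /* wait for "stopped polling" */
    assert(reply.type == EV_STOPPED_POLLING && reply.queue_id == q);
    atomic_store(&queue_stopped[q], true); /* cf. rte_eth_dev_rx_queue_stop() */

    while (!ring_push(&to_worker, quit))
        ;
    pthread_join(tid, NULL);
    return atomic_load(&unsafe_polls);
}
```

Because the worker stops polling the queue before it sends the reply, and the control thread acts only after receiving it, no poll of that queue can overlap the stop. Adding another request type is one more enum value and branch, which is the extensibility argued for in the message.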

> Looking forward to your insights.
>
> Regards,
> Nagendra

Regards, -Harry

* Re: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
  2025-02-14  9:22 ` Van Haaren, Harry
@ 2025-02-14 16:31   ` Stephen Hemminger
  2025-02-17  9:32     ` [External] : " NAGENDRA BALAGANI
  2025-02-17 18:57   ` Changchun Zhang
  1 sibling, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2025-02-14 16:31 UTC (permalink / raw)
  To: Van Haaren, Harry; +Cc: NAGENDRA BALAGANI, users

On Fri, 14 Feb 2025 09:22:59 +0000
"Van Haaren, Harry" <harry.van.haaren@intel.com> wrote:

> <snip>

One other solution is to reserve the main lcore for control operations.
In a couple of projects we had the main lcore spawn the workers, then sleep on epoll to handle control
requests from another source (a Unix domain socket). When the stop request came in, the main thread would
set a flag (atomic variable) and wait for the worker lcores to finish. Then it would do the stop and
other maintenance operations. It worked out much cleaner than doing control in the workers.
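
The arrangement described above, with the main thread reserved for control, could be sketched as follows. Assumptions: pthreads stand in for worker lcores, a pipe read with poll() stands in for the epoll-driven Unix domain socket, the text command "stop" is a hypothetical protocol, and port_stopped marks where rte_eth_dev_stop() would be called.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

#define NB_WORKERS 3

static atomic_bool quit;            /* written only by the control thread */
static atomic_int workers_running;  /* workers still inside their loop */
static bool port_stopped;           /* touched only by the control thread */

static void *worker_main(void *arg)
{
    (void)arg;
    atomic_fetch_add(&workers_running, 1);
    while (!atomic_load_explicit(&quit, memory_order_acquire))
        ;                           /* rx_burst / process / tx_burst go here */
    atomic_fetch_sub(&workers_running, 1);
    return NULL;
}

/* Runs on the main thread: spawns the workers, sleeps in poll() until a
 * request arrives on ctrl_fd, and on "stop" quiesces every worker before
 * touching the port. */
static int run_control_loop(int ctrl_fd)
{
    pthread_t tid[NB_WORKERS];
    int started = 0, rc = 0;
    char buf[16];

    for (; started < NB_WORKERS; started++)
        if (pthread_create(&tid[started], NULL, worker_main, NULL) != 0) {
            rc = -1;
            goto shutdown;
        }

    for (;;) {
        struct pollfd pfd = { .fd = ctrl_fd, .events = POLLIN };
        ssize_t n;

        if (poll(&pfd, 1, -1) < 0) {
            rc = -1;
            break;
        }
        n = read(ctrl_fd, buf, sizeof(buf) - 1);
        if (n <= 0) {               /* error or peer closed */
            rc = -1;
            break;
        }
        buf[n] = '\0';
        if (strncmp(buf, "stop", 4) == 0)
            break;                  /* other requests would be handled here */
    }

shutdown:
    atomic_store_explicit(&quit, true, memory_order_release);
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL); /* cf. rte_eal_mp_wait_lcore() */
    if (rc == 0)
        port_stopped = true;        /* cf. rte_eth_dev_stop(port_id) */
    return rc;
}
```

The workers never touch port control state; the only shared value they read is the quit flag, and the port is stopped only after every worker has been joined.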

* RE: [External] : Re: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
  2025-02-14 16:31   ` Stephen Hemminger
@ 2025-02-17  9:32     ` NAGENDRA BALAGANI
  0 siblings, 0 replies; 10+ messages in thread
From: NAGENDRA BALAGANI @ 2025-02-17  9:32 UTC (permalink / raw)
  To: Stephen Hemminger, Van Haaren, Harry; +Cc: users

Hi Harry and Stephen,

Thank you for the response. Your insights were really helpful. I will apply the required changes and verify.
Thanks once again for your time and valuable assistance.

Regards,
Nagendra

* Re: [External] : Re: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
  2025-02-14  9:22 ` Van Haaren, Harry
  2025-02-14 16:31   ` Stephen Hemminger
@ 2025-02-17 18:57   ` Changchun Zhang
  2025-02-17 19:06     ` Stephen Hemminger
  1 sibling, 1 reply; 10+ messages in thread
From: Changchun Zhang @ 2025-02-17 18:57 UTC (permalink / raw)
  To: Van Haaren, Harry, NAGENDRA BALAGANI, users

Hi Harry,

Can we call rte_eth_dev_rx_queue_stop() on an rx queue while the fast path is still polling that queue? The sequence on the control and fast path cores would look like this:
Control path:
rte_eth_dev_rx_queue_stop(rx_queue_id);
...waiting for draining of rx_queue...
rte_eth_dev_stop()
....

Fast path:
Keep calling rte_eth_rx_burst()
(I expect it to return 0 once the queue is drained and stopped.)

Thanks,

Changchun

* Re: [External] : Re: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
  2025-02-17 18:57   ` Changchun Zhang
@ 2025-02-17 19:06     ` Stephen Hemminger
  2025-02-17 19:14       ` Changchun Zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2025-02-17 19:06 UTC (permalink / raw)
  To: Changchun Zhang; +Cc: Van Haaren, Harry, NAGENDRA BALAGANI, users

On Mon, 17 Feb 2025 18:57:00 +0000
Changchun Zhang <changchun.zhang@oracle.com> wrote:

> Hi Harry,
> 
> Can we call rte_eth_dev_rx_queue_stop() on an rx queue while the fast path is still polling that queue? The sequence on the control and fast path cores would look like this:
> Control path:
> rte_eth_dev_rx_queue_stop(rx_queue_id);
> ...waiting for draining of rx_queue...
> rte_eth_dev_stop()
> ....
> 
> Fast path:
> Keep calling rte_eth_rx_burst()
> (I expect it to return 0 once the queue is drained and stopped.)
> 

No.
The application must not call rx_burst while a stop is in progress.
The rx_burst function is a fast path with no additional checks and is intentionally not thread safe.
You need to coordinate queue management inside the application.

* Re: [External] : Re: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
  2025-02-17 19:06     ` Stephen Hemminger
@ 2025-02-17 19:14       ` Changchun Zhang
  2025-02-17 19:20         ` Changchun Zhang
  2025-02-18  9:32         ` Van Haaren, Harry
  0 siblings, 2 replies; 10+ messages in thread
From: Changchun Zhang @ 2025-02-17 19:14 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Van Haaren, Harry, NAGENDRA BALAGANI, users

Okay, so the issue here is still rte_eth_dev_stop(), not rte_eth_dev_rx_queue_stop(), right? I mean, as long as rte_eth_dev_stop() is not called on the control path, is it safe to call rte_eth_dev_rx_queue_stop()/rte_eth_dev_rx_queue_start() on the control path while the fast path keeps calling rte_eth_rx_burst()?

Thanks,

Changchun

* Re: [External] : Re: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
  2025-02-17 19:14       ` Changchun Zhang
@ 2025-02-17 19:20         ` Changchun Zhang
  2025-02-17 20:56           ` Stephen Hemminger
  2025-02-18  9:32         ` Van Haaren, Harry
  1 sibling, 1 reply; 10+ messages in thread
From: Changchun Zhang @ 2025-02-17 19:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Van Haaren, Harry, NAGENDRA BALAGANI, users

I assume the answer is no, but I just wanted to confirm.

Thanks,

Changchun

* Re: [External] : Re: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
  2025-02-17 19:20         ` Changchun Zhang
@ 2025-02-17 20:56           ` Stephen Hemminger
  0 siblings, 0 replies; 10+ messages in thread
From: Stephen Hemminger @ 2025-02-17 20:56 UTC (permalink / raw)
  To: Changchun Zhang; +Cc: Van Haaren, Harry, NAGENDRA BALAGANI, users

On Mon, 17 Feb 2025 19:20:10 +0000
Changchun Zhang <changchun.zhang@oracle.com> wrote:

> I assume this answer is No, but just wanted to be confirmed.

It should be OK to call rx_burst on an unrelated queue.
Just don't start/stop a queue while the application is polling that queue.
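
To illustrate restarting one queue while an unrelated queue stays in use, here is a hedged sketch of a per-queue handshake. The three-state qstate protocol is hypothetical (it is not a DPDK API), pthreads stand in for lcores, and the in_maintenance window marks where rte_eth_dev_rx_queue_stop()/rte_eth_dev_rx_queue_start() would run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NB_QUEUES 2

/* Hypothetical per-queue handshake states. */
enum { Q_RUNNING, Q_PAUSE_REQUESTED, Q_PAUSED };

static _Atomic int qstate[NB_QUEUES];         /* zero-initialized: Q_RUNNING */
static atomic_bool in_maintenance[NB_QUEUES]; /* queue stop/start window */
static atomic_ulong polls[NB_QUEUES];
static atomic_ulong unsafe_polls;             /* polls during maintenance */
static atomic_bool quit;

static void stub_rx_burst(unsigned q)         /* hypothetical rx_burst */
{
    if (atomic_load(&in_maintenance[q]))
        atomic_fetch_add(&unsafe_polls, 1);
    atomic_fetch_add(&polls[q], 1);
}

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&quit)) {
        for (unsigned q = 0; q < NB_QUEUES; q++) {
            int s = atomic_load(&qstate[q]);

            if (s == Q_PAUSE_REQUESTED)
                atomic_store(&qstate[q], Q_PAUSED); /* ack: q no longer polled */
            if (s != Q_RUNNING)
                continue;                           /* skip only this queue */
            stub_rx_burst(q);
        }
    }
    return NULL;
}

/* Control path: quiesce queue q, run the maintenance window, resume q. */
static void restart_queue(unsigned q)
{
    unsigned other = (q + 1) % NB_QUEUES;
    unsigned long before;

    atomic_store(&qstate[q], Q_PAUSE_REQUESTED);
    while (atomic_load(&qstate[q]) != Q_PAUSED)
        ;                                    /* wait for the worker's ack */

    atomic_store(&in_maintenance[q], true);  /* cf. rte_eth_dev_rx_queue_stop() */
    before = atomic_load(&polls[other]);
    while (atomic_load(&polls[other]) == before)
        ;                                    /* the unrelated queue keeps running */
    atomic_store(&in_maintenance[q], false); /* cf. rte_eth_dev_rx_queue_start() */

    atomic_store(&qstate[q], Q_RUNNING);
}

static unsigned long run_queue_restart_demo(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return (unsigned long)-1;
    restart_queue(1);
    atomic_store(&quit, true);
    pthread_join(tid, NULL);
    return atomic_load(&unsafe_polls);
}
```

Only the control thread writes Q_RUNNING and Q_PAUSE_REQUESTED, and only the worker writes Q_PAUSED, so a state can never be overwritten out of order.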


* Re: [External] : Re: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
  2025-02-17 19:14       ` Changchun Zhang
  2025-02-17 19:20         ` Changchun Zhang
@ 2025-02-18  9:32         ` Van Haaren, Harry
  1 sibling, 0 replies; 10+ messages in thread
From: Van Haaren, Harry @ 2025-02-18  9:32 UTC (permalink / raw)
  To: Zhang, Changchun, Stephen Hemminger; +Cc: NAGENDRA BALAGANI, users

Side note: please "reply inline" in plain text when sending to mailing lists; it allows future readers to see the context of your question when reading the reply.

> From: Changchun Zhang <changchun.zhang@oracle.com>
> Sent: Monday, February 17, 2025 7:14 PM
> To: Stephen Hemminger <stephen@networkplumber.org>
> Cc: Van Haaren, Harry <harry.van.haaren@intel.com>; NAGENDRA BALAGANI <nagendra.balagani@oracle.com>; users@dpdk.org <users@dpdk.org>
> Subject: Re: [External] : Re: Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK
>
> Okay, so here the issue is still rte_eth_dev_stop(), but not rte_eth_dev_rx_queue_stop(), right? I mean, as long as not calling rte_eth_dev_stop() on control path, is it safe to call rte_eth_dev_rx_queue_stop/rte_eth_dev_rx_queue_start on control path while fast path keeps calling rte_eth_rx_burst()?

I see Stephen has already directly answered your question - but perhaps the below is interesting to expand beyond the exact question being asked.

Instead of answering specific questions like above, allow me to explain the "mental model" around how I think about ports/queues and threads using them.
- Each Queue (rx or tx, doesn't matter) is an entity that can be polled by a dataplane thread. No control plane actions (e.g. start/stop/reset) may be active if the queue is polled!
- Each Queue depends on a port (that it logically belongs to). That port must be started for the queue to be usable from the dataplane thread.
- Each Queue is an individual object: different dataplane threads can poll different queues without any "interference" (allowing one queue to restart, while another STAYS in use)

For specific use-cases, you can now logic out what is required:
- Stop Queue: the dataplane thread MUST NOT poll the queue while another thread is operating on that same queue.
- Stop Port: All queues must be stopped before the port can be stopped. Therefore, all dataplane threads must stop using the queues associated with the port.
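
The rules above can also be written down as plain predicates that an application might check (for example in debug builds) before acting. This is an illustrative model with hypothetical names, not a DPDK API: struct port_model is bookkeeping the application would have to maintain itself.

```c
#include <assert.h>
#include <stdbool.h>

#define NB_QUEUES 4

/* Hypothetical application-side bookkeeping for one port. */
struct port_model {
    bool started;                   /* port state (rte_eth_dev_start/stop) */
    bool queue_polled[NB_QUEUES];   /* a dataplane thread is polling queue q */
    bool queue_ctrl_op[NB_QUEUES];  /* a start/stop/reset is active on queue q */
};

/* A queue is usable only if its port is started and no control op is active. */
static bool may_poll(const struct port_model *p, unsigned q)
{
    return p->started && !p->queue_ctrl_op[q];
}

/* "Stop Queue": a control op on q may begin only once nobody polls q. */
static bool may_begin_queue_op(const struct port_model *p, unsigned q)
{
    return !p->queue_polled[q];
}

/* "Stop Port": every dataplane thread must have stopped using every queue. */
static bool may_stop_port(const struct port_model *p)
{
    for (unsigned q = 0; q < NB_QUEUES; q++)
        if (p->queue_polled[q])
            return false;
    return true;
}
```

For example, with queue 0 polled and queue 1 idle, may_begin_queue_op() allows work on queue 1 but not on queue 0, and may_stop_port() refuses until queue 0 is released too.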

The DPDK docs e.g. rte_eth_dev_start: https://doc.dpdk.org/api/rte__ethdev_8h.html#afdc834c1c52e9fb512301990468ca7c2 do have some statements around expected usage: "On success, all basic functions exported by the Ethernet API (link status, receive/transmit, and so on) can be invoked." Perhaps the documentation can be improved; if you're willing to contribute, this would likely be appreciated by the next developer learning the correct usage of these APIs.

As folks on the list may know, I've been using the Rust language for a number of years, and it has some very nice properties around encoding these types of details at the API layer, and putting compile-time (or runtime - depends on implementation) enforcement on these API constraints. This was presented at DPDK Userspace '23, with the Rust + DPDK Ethdev part of the talk being relevant to the above "mental model": this timestamp (6 minutes in) is the start of that section: https://youtu.be/lb6xn2xQ-NQ?t=359

> Thanks,
> Changchun

Regards! -Harry

<snip previous discussions; see https://mails.dpdk.org/archives/users/2025-February/thread.html#8122>

Thread overview: 10+ messages
2025-02-14  8:43 Query Regarding Race Condition Between Packet Reception and Device Stop in DPDK NAGENDRA BALAGANI
2025-02-14  9:22 ` Van Haaren, Harry
2025-02-14 16:31   ` Stephen Hemminger
2025-02-17  9:32     ` [External] : " NAGENDRA BALAGANI
2025-02-17 18:57   ` Changchun Zhang
2025-02-17 19:06     ` Stephen Hemminger
2025-02-17 19:14       ` Changchun Zhang
2025-02-17 19:20         ` Changchun Zhang
2025-02-17 20:56           ` Stephen Hemminger
2025-02-18  9:32         ` Van Haaren, Harry
