DPDK patches and discussions
 help / color / mirror / Atom feed
* [PATCH] app/testpmd: remove invalid ports when other process detach
@ 2022-03-02  2:33 Chengwen Feng
  2022-03-02  8:26 ` Thomas Monjalon
  0 siblings, 1 reply; 8+ messages in thread
From: Chengwen Feng @ 2022-03-02  2:33 UTC (permalink / raw)
  To: thomas, ferruh.yigit; +Cc: dev, xiaoyun.li, aman.deep.singh, yuying.zhang

Start main and secondary process:
./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
	--num-procs=2 --proc-id=0
./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
	--num-procs=2 --proc-id=1
Execute following command in main process:
	port stop 0
	port detach 0
Execute following command in secondary process:
	set fwd mac
	start
The secondary process will display:
	Invalid port_id=0
	telcore 19 called rx_pkt_burst for not ready port 0
	stpmd> 8: [/lib64/libc.so.6(+0xdf600) [0xffff9e1dc600]]
	7: [/lib64/libpthread.so.0(+0x7c48) [0xffff9e28ac48]]
	6: [/usr/app/testpmd(eal_thread_loop+0x2c4) [0xb23574]]
	5: [/usr/app/testpmd() [0x9c21d8]]
	4: [/usr/app/testpmd() [0x9c2108]]
	3: [/usr/app/testpmd() [0x9b6cf0]]
	2: [/usr/app/testpmd() [0xad8620]]
	1: [/usr/app/testpmd(rte_dump_stack+0x20) [0xb1a130]]

The root cause it that the secondary process has not removed invalid
ports when it processes RTE_ETH_EVENT_DESTROY event.

This patch adds a delay remove invalid ports invoke when process the
RTE_ETH_EVENT_DESTROY event.

Note: There will be two invoke of removing invalid ports in main
process, one is trigger by user command, another is trigger by
RTE_ETH_EVENT_DESTROY event. This patch keeps it unchanged to ensure
that the correct number of ports is displayed after detaching
successfully.

Fixes: 85c6571c9103 ("app/testpmd: reset port status on close notification")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 app/test-pmd/testpmd.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe2ce19f99..a6a2533806 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3534,6 +3534,13 @@ rmv_port_callback(void *arg)
 		start_packet_forwarding(0);
 }
 
+static void
+remove_invalid_ports_callback(void *arg)
+{
+	RTE_SET_USED(arg);
+	remove_invalid_ports();
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -3569,6 +3576,10 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	case RTE_ETH_EVENT_DESTROY:
 		ports[port_id].port_status = RTE_PORT_CLOSED;
 		printf("Port %u is closed\n", port_id);
+		if (rte_eal_alarm_set(100000, remove_invalid_ports_callback,
+				(void *)(intptr_t)port_id))
+			fprintf(stderr,
+				"Could not set up deferred device released\n");
 		break;
 	default:
 		break;
-- 
2.33.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] app/testpmd: remove invalid ports when other process detach
  2022-03-02  2:33 [PATCH] app/testpmd: remove invalid ports when other process detach Chengwen Feng
@ 2022-03-02  8:26 ` Thomas Monjalon
  2022-03-02  8:36   ` fengchengwen
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Monjalon @ 2022-03-02  8:26 UTC (permalink / raw)
  To: Chengwen Feng
  Cc: ferruh.yigit, dev, xiaoyun.li, aman.deep.singh, yuying.zhang

02/03/2022 03:33, Chengwen Feng:
> Start main and secondary process:
> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
> 	--num-procs=2 --proc-id=0
> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
> 	--num-procs=2 --proc-id=1
> Execute following command in main process:
> 	port stop 0
> 	port detach 0
> Execute following command in secondary process:
> 	set fwd mac
> 	start
> The secondary process will display:
> 	Invalid port_id=0
> 	telcore 19 called rx_pkt_burst for not ready port 0
> 	stpmd> 8: [/lib64/libc.so.6(+0xdf600) [0xffff9e1dc600]]
> 	7: [/lib64/libpthread.so.0(+0x7c48) [0xffff9e28ac48]]
> 	6: [/usr/app/testpmd(eal_thread_loop+0x2c4) [0xb23574]]
> 	5: [/usr/app/testpmd() [0x9c21d8]]
> 	4: [/usr/app/testpmd() [0x9c2108]]
> 	3: [/usr/app/testpmd() [0x9b6cf0]]
> 	2: [/usr/app/testpmd() [0xad8620]]
> 	1: [/usr/app/testpmd(rte_dump_stack+0x20) [0xb1a130]]
> 
> The root cause it that the secondary process has not removed invalid
> ports when it processes RTE_ETH_EVENT_DESTROY event.

Why the ports are not removed?

> This patch adds a delay remove invalid ports invoke when process the
> RTE_ETH_EVENT_DESTROY event.

Why do we need this delay?

[...]
> +static void
> +remove_invalid_ports_callback(void *arg)
> +{
> +	RTE_SET_USED(arg);
> +	remove_invalid_ports();
> +}
[...]
>  	case RTE_ETH_EVENT_DESTROY:
>  		ports[port_id].port_status = RTE_PORT_CLOSED;
>  		printf("Port %u is closed\n", port_id);
> +		if (rte_eal_alarm_set(100000, remove_invalid_ports_callback,
> +				(void *)(intptr_t)port_id))
> +			fprintf(stderr,
> +				"Could not set up deferred device released\n");
>  		break;




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] app/testpmd: remove invalid ports when other process detach
  2022-03-02  8:26 ` Thomas Monjalon
@ 2022-03-02  8:36   ` fengchengwen
  2022-04-11  2:05     ` fengchengwen
  2022-05-20 15:05     ` Ferruh Yigit
  0 siblings, 2 replies; 8+ messages in thread
From: fengchengwen @ 2022-03-02  8:36 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: ferruh.yigit, dev, xiaoyun.li, aman.deep.singh, yuying.zhang

On 2022/3/2 16:26, Thomas Monjalon wrote:
> 02/03/2022 03:33, Chengwen Feng:
>> Start main and secondary process:
>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>> 	--num-procs=2 --proc-id=0
>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>> 	--num-procs=2 --proc-id=1
>> Execute following command in main process:
>> 	port stop 0
>> 	port detach 0
>> Execute following command in secondary process:
>> 	set fwd mac
>> 	start
>> The secondary process will display:
>> 	Invalid port_id=0
>> 	telcore 19 called rx_pkt_burst for not ready port 0
>> 	stpmd> 8: [/lib64/libc.so.6(+0xdf600) [0xffff9e1dc600]]
>> 	7: [/lib64/libpthread.so.0(+0x7c48) [0xffff9e28ac48]]
>> 	6: [/usr/app/testpmd(eal_thread_loop+0x2c4) [0xb23574]]
>> 	5: [/usr/app/testpmd() [0x9c21d8]]
>> 	4: [/usr/app/testpmd() [0x9c2108]]
>> 	3: [/usr/app/testpmd() [0x9b6cf0]]
>> 	2: [/usr/app/testpmd() [0xad8620]]
>> 	1: [/usr/app/testpmd(rte_dump_stack+0x20) [0xb1a130]]
>>
>> The root cause it that the secondary process has not removed invalid
>> ports when it processes RTE_ETH_EVENT_DESTROY event.
> 
> Why the ports are not removed?

Testpmd register function eth_event_callback to deal with DESTROY event,
currently it only assign ports[port_id].port_status with RTE_PORT_CLOSED, it
doesn't update other global variables like nb_ports.

> 
>> This patch adds a delay remove invalid ports invoke when process the
>> RTE_ETH_EVENT_DESTROY event.
> 
> Why do we need this delay?

The remove_invalid_ports will scan rte_eth_devices[], when process the DESTROY
event, the rte_eth_devices[x] still valid, so here we should a delay logic.

> 
> [...]
>> +static void
>> +remove_invalid_ports_callback(void *arg)
>> +{
>> +	RTE_SET_USED(arg);
>> +	remove_invalid_ports();
>> +}
> [...]
>>  	case RTE_ETH_EVENT_DESTROY:
>>  		ports[port_id].port_status = RTE_PORT_CLOSED;
>>  		printf("Port %u is closed\n", port_id);
>> +		if (rte_eal_alarm_set(100000, remove_invalid_ports_callback,
>> +				(void *)(intptr_t)port_id))
>> +			fprintf(stderr,
>> +				"Could not set up deferred device released\n");
>>  		break;
> 
> 
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] app/testpmd: remove invalid ports when other process detach
  2022-03-02  8:36   ` fengchengwen
@ 2022-04-11  2:05     ` fengchengwen
  2022-05-20 15:05     ` Ferruh Yigit
  1 sibling, 0 replies; 8+ messages in thread
From: fengchengwen @ 2022-04-11  2:05 UTC (permalink / raw)
  To: Thomas Monjalon, ferruh.yigit
  Cc: dev, xiaoyun.li, aman.deep.singh, yuying.zhang

Hi Ferruh,

  Could you please review this patch, I notice the patch is delegated to you.

Thanks

On 2022/3/2 16:36, fengchengwen wrote:
> On 2022/3/2 16:26, Thomas Monjalon wrote:
>> 02/03/2022 03:33, Chengwen Feng:
>>> Start main and secondary process:
>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>> 	--num-procs=2 --proc-id=0
>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>> 	--num-procs=2 --proc-id=1
>>> Execute following command in main process:
>>> 	port stop 0
>>> 	port detach 0
>>> Execute following command in secondary process:
>>> 	set fwd mac
>>> 	start
>>> The secondary process will display:
>>> 	Invalid port_id=0
>>> 	telcore 19 called rx_pkt_burst for not ready port 0
>>> 	stpmd> 8: [/lib64/libc.so.6(+0xdf600) [0xffff9e1dc600]]
>>> 	7: [/lib64/libpthread.so.0(+0x7c48) [0xffff9e28ac48]]
>>> 	6: [/usr/app/testpmd(eal_thread_loop+0x2c4) [0xb23574]]
>>> 	5: [/usr/app/testpmd() [0x9c21d8]]
>>> 	4: [/usr/app/testpmd() [0x9c2108]]
>>> 	3: [/usr/app/testpmd() [0x9b6cf0]]
>>> 	2: [/usr/app/testpmd() [0xad8620]]
>>> 	1: [/usr/app/testpmd(rte_dump_stack+0x20) [0xb1a130]]
>>>
>>> The root cause it that the secondary process has not removed invalid
>>> ports when it processes RTE_ETH_EVENT_DESTROY event.
>>
>> Why the ports are not removed?
> 
> Testpmd register function eth_event_callback to deal with DESTROY event,
> currently it only assign ports[port_id].port_status with RTE_PORT_CLOSED, it
> doesn't update other global variables like nb_ports.
> 
>>
>>> This patch adds a delay remove invalid ports invoke when process the
>>> RTE_ETH_EVENT_DESTROY event.
>>
>> Why do we need this delay?
> 
> The remove_invalid_ports will scan rte_eth_devices[], when process the DESTROY
> event, the rte_eth_devices[x] still valid, so here we should a delay logic.
> 
>>
>> [...]
>>> +static void
>>> +remove_invalid_ports_callback(void *arg)
>>> +{
>>> +	RTE_SET_USED(arg);
>>> +	remove_invalid_ports();
>>> +}
>> [...]
>>>  	case RTE_ETH_EVENT_DESTROY:
>>>  		ports[port_id].port_status = RTE_PORT_CLOSED;
>>>  		printf("Port %u is closed\n", port_id);
>>> +		if (rte_eal_alarm_set(100000, remove_invalid_ports_callback,
>>> +				(void *)(intptr_t)port_id))
>>> +			fprintf(stderr,
>>> +				"Could not set up deferred device released\n");
>>>  		break;
>>
>>
>>
>>
>> .
>>
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] app/testpmd: remove invalid ports when other process detach
  2022-03-02  8:36   ` fengchengwen
  2022-04-11  2:05     ` fengchengwen
@ 2022-05-20 15:05     ` Ferruh Yigit
  2022-05-20 15:14       ` Ferruh Yigit
  2022-05-21 10:00       ` fengchengwen
  1 sibling, 2 replies; 8+ messages in thread
From: Ferruh Yigit @ 2022-05-20 15:05 UTC (permalink / raw)
  To: fengchengwen, Thomas Monjalon
  Cc: dev, xiaoyun.li, aman.deep.singh, yuying.zhang, Andrew Rybchenko

On 3/2/2022 8:36 AM, fengchengwen wrote:
> On 2022/3/2 16:26, Thomas Monjalon wrote:
>> 02/03/2022 03:33, Chengwen Feng:
>>> Start main and secondary process:
>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>> 	--num-procs=2 --proc-id=0
>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>> 	--num-procs=2 --proc-id=1
>>> Execute following command in main process:
>>> 	port stop 0
>>> 	port detach 0
>>> Execute following command in secondary process:
>>> 	set fwd mac
>>> 	start
>>> The secondary process will display:
>>> 	Invalid port_id=0
>>> 	telcore 19 called rx_pkt_burst for not ready port 0
>>> 	stpmd> 8: [/lib64/libc.so.6(+0xdf600) [0xffff9e1dc600]]
>>> 	7: [/lib64/libpthread.so.0(+0x7c48) [0xffff9e28ac48]]
>>> 	6: [/usr/app/testpmd(eal_thread_loop+0x2c4) [0xb23574]]
>>> 	5: [/usr/app/testpmd() [0x9c21d8]]
>>> 	4: [/usr/app/testpmd() [0x9c2108]]
>>> 	3: [/usr/app/testpmd() [0x9b6cf0]]
>>> 	2: [/usr/app/testpmd() [0xad8620]]
>>> 	1: [/usr/app/testpmd(rte_dump_stack+0x20) [0xb1a130]]
>>>
>>> The root cause it that the secondary process has not removed invalid
>>> ports when it processes RTE_ETH_EVENT_DESTROY event.
>>
>> Why the ports are not removed?
> 

It is referring to application (testpmd) level state, ethdev port is 
removed.

Above mentioned problem is valid since testpmd secondary process support 
is added.
When primary hot remove a device, it updates relevant application state 
too, but for secondary process ethdev removes device without secondary 
process updating its state.

I agree that using ethdev events is appropriate for this case, since 
action originated from ethdev for secondary process, need a way to 
notify application about it.
Another option can be checking and removing invalid ports in secondary 
before each forwarding start, but since we already have an event for 
destroy, using it looks good to me.


> Testpmd register function eth_event_callback to deal with DESTROY event,
> currently it only assign ports[port_id].port_status with RTE_PORT_CLOSED, it
> doesn't update other global variables like nb_ports.
> 
>>
>>> This patch adds a delay remove invalid ports invoke when process the
>>> RTE_ETH_EVENT_DESTROY event.
>>
>> Why do we need this delay?
> 
> The remove_invalid_ports will scan rte_eth_devices[], when process the DESTROY
> event, the rte_eth_devices[x] still valid, so here we should a delay logic.
> 

There is a dependency problem in the DESTROY event.

We need some ethdev information to be able to deliver the event, so 
ethdev can't be completely destroyed until DESTROY event is processed. 
Which means when application received DESTROY event, ethdev is not fully 
destroyed yet.

Except from above, 'remove_invalid_ports()' is called twice for primary 
process (as mentioned in commit log), this timer can be helping these 
two calls not conflict, but I am not sure if we can rely on a time for this.
Since the problem is mainly for secondary process, assuming control 
commands like detaching a device will be called by primary process, so 
@Feng what do you think about adding a secondary process check to in the 
'remove_invalid_ports_callback()' call? So 'remove_invalid_ports()' is 
called only once for primary process.

>>
>> [...]
>>> +static void
>>> +remove_invalid_ports_callback(void *arg)
>>> +{
>>> +	RTE_SET_USED(arg);
>>> +	remove_invalid_ports();
>>> +}
>> [...]
>>>   	case RTE_ETH_EVENT_DESTROY:
>>>   		ports[port_id].port_status = RTE_PORT_CLOSED;
>>>   		printf("Port %u is closed\n", port_id);
>>> +		if (rte_eal_alarm_set(100000, remove_invalid_ports_callback,
>>> +				(void *)(intptr_t)port_id))
>>> +			fprintf(stderr,
>>> +				"Could not set up deferred device released\n");
>>>   		break;
>>
>>
>>
>>
>> .
>>
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] app/testpmd: remove invalid ports when other process detach
  2022-05-20 15:05     ` Ferruh Yigit
@ 2022-05-20 15:14       ` Ferruh Yigit
  2022-05-21 10:00       ` fengchengwen
  1 sibling, 0 replies; 8+ messages in thread
From: Ferruh Yigit @ 2022-05-20 15:14 UTC (permalink / raw)
  To: fengchengwen, yuying.zhang, aman.deep.singh
  Cc: dev, xiaoyun.li, Andrew Rybchenko, Thomas Monjalon

On 5/20/2022 4:05 PM, Ferruh Yigit wrote:
> [CAUTION: External Email]
> 
> On 3/2/2022 8:36 AM, fengchengwen wrote:
>> On 2022/3/2 16:26, Thomas Monjalon wrote:
>>> 02/03/2022 03:33, Chengwen Feng:
>>>> Start main and secondary process:
>>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>>>     --num-procs=2 --proc-id=0
>>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>>>     --num-procs=2 --proc-id=1
>>>> Execute following command in main process:
>>>>     port stop 0
>>>>     port detach 0
>>>> Execute following command in secondary process:
>>>>     set fwd mac
>>>>     start
>>>> The secondary process will display:
>>>>     Invalid port_id=0
>>>>     telcore 19 called rx_pkt_burst for not ready port 0
>>>>     stpmd> 8: [/lib64/libc.so.6(+0xdf600) [0xffff9e1dc600]]
>>>>     7: [/lib64/libpthread.so.0(+0x7c48) [0xffff9e28ac48]]
>>>>     6: [/usr/app/testpmd(eal_thread_loop+0x2c4) [0xb23574]]
>>>>     5: [/usr/app/testpmd() [0x9c21d8]]
>>>>     4: [/usr/app/testpmd() [0x9c2108]]
>>>>     3: [/usr/app/testpmd() [0x9b6cf0]]
>>>>     2: [/usr/app/testpmd() [0xad8620]]
>>>>     1: [/usr/app/testpmd(rte_dump_stack+0x20) [0xb1a130]]
>>>>
>>>> The root cause it that the secondary process has not removed invalid
>>>> ports when it processes RTE_ETH_EVENT_DESTROY event.
>>>
>>> Why the ports are not removed?
>>
> 
> It is referring to application (testpmd) level state, ethdev port is
> removed.
> 
> Above mentioned problem is valid since testpmd secondary process support
> is added.
> When primary hot remove a device, it updates relevant application state
> too, but for secondary process ethdev removes device without secondary
> process updating its state.
> 
> I agree that using ethdev events is appropriate for this case, since
> action originated from ethdev for secondary process, need a way to
> notify application about it.
> Another option can be checking and removing invalid ports in secondary
> before each forwarding start, but since we already have an event for
> destroy, using it looks good to me.
> 

Closing port in the primary has the same problem, secondary process 
state is not updated and crashes.
A high level design decision can be to reduce the application level 
states and rely on ethdev states more, to reduce/remove this ethdev and 
application state sync requirement,
@Aman, @Yuying, what do you think about this high level, long term goal?

(Like why there is a testpmd 'RTE_PORT_CLOSED' state, should be in ethdev?)

((BTW, why testpmd defined 'RTE_PORT_CLOSED' has 'RTE_' prefix, @Aman 
can you do a cleanup for it, 'RTE_' -> 'TESTPMD_' ?))


> 
>> Testpmd register function eth_event_callback to deal with DESTROY event,
>> currently it only assign ports[port_id].port_status with 
>> RTE_PORT_CLOSED, it
>> doesn't update other global variables like nb_ports.
>>
>>>
>>>> This patch adds a delay remove invalid ports invoke when process the
>>>> RTE_ETH_EVENT_DESTROY event.
>>>
>>> Why do we need this delay?
>>
>> The remove_invalid_ports will scan rte_eth_devices[], when process the 
>> DESTROY
>> event, the rte_eth_devices[x] still valid, so here we should a delay 
>> logic.
>>
> 
> There is a dependency problem in the DESTROY event.
> 
> We need some ethdev information to be able to deliver the event, so
> ethdev can't be completely destroyed until DESTROY event is processed.
> Which means when application received DESTROY event, ethdev is not fully
> destroyed yet.
> 
> Except from above, 'remove_invalid_ports()' is called twice for primary
> process (as mentioned in commit log), this timer can be helping these
> two calls not conflict, but I am not sure if we can rely on a time for 
> this.
> Since the problem is mainly for secondary process, assuming control
> commands like detaching a device will be called by primary process, so
> @Feng what do you think about adding a secondary process check to in the
> 'remove_invalid_ports_callback()' call? So 'remove_invalid_ports()' is
> called only once for primary process.
> 
>>>
>>> [...]
>>>> +static void
>>>> +remove_invalid_ports_callback(void *arg)
>>>> +{
>>>> +   RTE_SET_USED(arg);
>>>> +   remove_invalid_ports();
>>>> +}
>>> [...]
>>>>     case RTE_ETH_EVENT_DESTROY:
>>>>             ports[port_id].port_status = RTE_PORT_CLOSED;
>>>>             printf("Port %u is closed\n", port_id);
>>>> +           if (rte_eal_alarm_set(100000, 
>>>> remove_invalid_ports_callback,
>>>> +                           (void *)(intptr_t)port_id))
>>>> +                   fprintf(stderr,
>>>> +                           "Could not set up deferred device 
>>>> released\n");
>>>>             break;
>>>
>>>
>>>
>>>
>>> .
>>>
>>
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] app/testpmd: remove invalid ports when other process detach
  2022-05-20 15:05     ` Ferruh Yigit
  2022-05-20 15:14       ` Ferruh Yigit
@ 2022-05-21 10:00       ` fengchengwen
  2022-05-23  8:43         ` Ferruh Yigit
  1 sibling, 1 reply; 8+ messages in thread
From: fengchengwen @ 2022-05-21 10:00 UTC (permalink / raw)
  To: Ferruh Yigit, Thomas Monjalon
  Cc: dev, xiaoyun.li, aman.deep.singh, yuying.zhang, Andrew Rybchenko

Hi Ferruh,

On 2022/5/20 23:05, Ferruh Yigit wrote:
> On 3/2/2022 8:36 AM, fengchengwen wrote:
>> On 2022/3/2 16:26, Thomas Monjalon wrote:
>>> 02/03/2022 03:33, Chengwen Feng:
>>>> Start main and secondary process:
>>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>>>     --num-procs=2 --proc-id=0
>>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>>>     --num-procs=2 --proc-id=1
>>>> Execute following command in main process:
>>>>     port stop 0
>>>>     port detach 0
>>>> Execute following command in secondary process:
>>>>     set fwd mac
>>>>     start
>>>> The secondary process will display:
>>>>     Invalid port_id=0
>>>>     telcore 19 called rx_pkt_burst for not ready port 0
>>>>     stpmd> 8: [/lib64/libc.so.6(+0xdf600) [0xffff9e1dc600]]
>>>>     7: [/lib64/libpthread.so.0(+0x7c48) [0xffff9e28ac48]]
>>>>     6: [/usr/app/testpmd(eal_thread_loop+0x2c4) [0xb23574]]
>>>>     5: [/usr/app/testpmd() [0x9c21d8]]
>>>>     4: [/usr/app/testpmd() [0x9c2108]]
>>>>     3: [/usr/app/testpmd() [0x9b6cf0]]
>>>>     2: [/usr/app/testpmd() [0xad8620]]
>>>>     1: [/usr/app/testpmd(rte_dump_stack+0x20) [0xb1a130]]
>>>>
>>>> The root cause it that the secondary process has not removed invalid
>>>> ports when it processes RTE_ETH_EVENT_DESTROY event.
>>>
>>> Why the ports are not removed?
>>
> 
> It is referring to application (testpmd) level state, ethdev port is removed.
> 
> Above mentioned problem is valid since testpmd secondary process support is added.
> When primary hot remove a device, it updates relevant application state too, but for secondary process ethdev removes device without secondary process updating its state.
> 
> I agree that using ethdev events is appropriate for this case, since action originated from ethdev for secondary process, need a way to notify application about it.
> Another option can be checking and removing invalid ports in secondary before each forwarding start, but since we already have an event for destroy, using it looks good to me.
> 
> 
>> Testpmd register function eth_event_callback to deal with DESTROY event,
>> currently it only assign ports[port_id].port_status with RTE_PORT_CLOSED, it
>> doesn't update other global variables like nb_ports.
>>
>>>
>>>> This patch adds a delay remove invalid ports invoke when process the
>>>> RTE_ETH_EVENT_DESTROY event.
>>>
>>> Why do we need this delay?
>>
>> The remove_invalid_ports will scan rte_eth_devices[], when process the DESTROY
>> event, the rte_eth_devices[x] still valid, so here we should a delay logic.
>>
> 
> There is a dependency problem in the DESTROY event.
> 
> We need some ethdev information to be able to deliver the event, so ethdev can't be completely destroyed until DESTROY event is processed. Which means when application received DESTROY event, ethdev is not fully destroyed yet.
> 
> Except from above, 'remove_invalid_ports()' is called twice for primary process (as mentioned in commit log), this timer can be helping these two calls not conflict, but I am not sure if we can rely on a time for this.
> Since the problem is mainly for secondary process, assuming control commands like detaching a device will be called by primary process, so @Feng what do you think about adding a secondary process check to in the 'remove_invalid_ports_callback()' call? So 'remove_invalid_ports()' is called only once for primary process.

If we execute following command in secondary process:
     port stop 0
     port detach 0
And execute following command in main process:
     set fwd mac
     start
The main process also show the same error.

So I think we should keep it.

> 
>>>
>>> [...]
>>>> +static void
>>>> +remove_invalid_ports_callback(void *arg)
>>>> +{
>>>> +    RTE_SET_USED(arg);
>>>> +    remove_invalid_ports();
>>>> +}
>>> [...]
>>>>       case RTE_ETH_EVENT_DESTROY:
>>>>           ports[port_id].port_status = RTE_PORT_CLOSED;
>>>>           printf("Port %u is closed\n", port_id);
>>>> +        if (rte_eal_alarm_set(100000, remove_invalid_ports_callback,
>>>> +                (void *)(intptr_t)port_id))
>>>> +            fprintf(stderr,
>>>> +                "Could not set up deferred device released\n");
>>>>           break;
>>>
>>>
>>>
>>>
>>> .
>>>
>>
> 
> 
> .


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] app/testpmd: remove invalid ports when other process detach
  2022-05-21 10:00       ` fengchengwen
@ 2022-05-23  8:43         ` Ferruh Yigit
  0 siblings, 0 replies; 8+ messages in thread
From: Ferruh Yigit @ 2022-05-23  8:43 UTC (permalink / raw)
  To: fengchengwen, Thomas Monjalon
  Cc: dev, xiaoyun.li, aman.deep.singh, yuying.zhang, Andrew Rybchenko

On 5/21/2022 11:00 AM, fengchengwen wrote:
> [CAUTION: External Email]
> 
> Hi Ferruh,
> 
> On 2022/5/20 23:05, Ferruh Yigit wrote:
>> On 3/2/2022 8:36 AM, fengchengwen wrote:
>>> On 2022/3/2 16:26, Thomas Monjalon wrote:
>>>> 02/03/2022 03:33, Chengwen Feng:
>>>>> Start main and secondary process:
>>>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>>>>      --num-procs=2 --proc-id=0
>>>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>>>>      --num-procs=2 --proc-id=1
>>>>> Execute following command in main process:
>>>>>      port stop 0
>>>>>      port detach 0
>>>>> Execute following command in secondary process:
>>>>>      set fwd mac
>>>>>      start
>>>>> The secondary process will display:
>>>>>      Invalid port_id=0
>>>>>      telcore 19 called rx_pkt_burst for not ready port 0
>>>>>      stpmd> 8: [/lib64/libc.so.6(+0xdf600) [0xffff9e1dc600]]
>>>>>      7: [/lib64/libpthread.so.0(+0x7c48) [0xffff9e28ac48]]
>>>>>      6: [/usr/app/testpmd(eal_thread_loop+0x2c4) [0xb23574]]
>>>>>      5: [/usr/app/testpmd() [0x9c21d8]]
>>>>>      4: [/usr/app/testpmd() [0x9c2108]]
>>>>>      3: [/usr/app/testpmd() [0x9b6cf0]]
>>>>>      2: [/usr/app/testpmd() [0xad8620]]
>>>>>      1: [/usr/app/testpmd(rte_dump_stack+0x20) [0xb1a130]]
>>>>>
>>>>> The root cause it that the secondary process has not removed invalid
>>>>> ports when it processes RTE_ETH_EVENT_DESTROY event.
>>>>
>>>> Why the ports are not removed?
>>>
>>
>> It is referring to application (testpmd) level state, ethdev port is removed.
>>
>> Above mentioned problem is valid since testpmd secondary process support is added.
>> When primary hot remove a device, it updates relevant application state too, but for secondary process ethdev removes device without secondary process updating its state.
>>
>> I agree that using ethdev events is appropriate for this case, since action originated from ethdev for secondary process, need a way to notify application about it.
>> Another option can be checking and removing invalid ports in secondary before each forwarding start, but since we already have an event for destroy, using it looks good to me.
>>
>>
>>> Testpmd register function eth_event_callback to deal with DESTROY event,
>>> currently it only assign ports[port_id].port_status with RTE_PORT_CLOSED, it
>>> doesn't update other global variables like nb_ports.
>>>
>>>>
>>>>> This patch adds a delay remove invalid ports invoke when process the
>>>>> RTE_ETH_EVENT_DESTROY event.
>>>>
>>>> Why do we need this delay?
>>>
>>> The remove_invalid_ports will scan rte_eth_devices[], when process the DESTROY
>>> event, the rte_eth_devices[x] still valid, so here we should a delay logic.
>>>
>>
>> There is a dependency problem in the DESTROY event.
>>
>> We need some ethdev information to be able to deliver the event, so ethdev can't be completely destroyed until DESTROY event is processed. Which means when application received DESTROY event, ethdev is not fully destroyed yet.
>>
>> Except from above, 'remove_invalid_ports()' is called twice for primary process (as mentioned in commit log), this timer can be helping these two calls not conflict, but I am not sure if we can rely on a time for this.
>> Since the problem is mainly for secondary process, assuming control commands like detaching a device will be called by primary process, so @Feng what do you think about adding a secondary process check to in the 'remove_invalid_ports_callback()' call? So 'remove_invalid_ports()' is called only once for primary process.
> 
> If we execute following command in secondary process:
>       port stop 0
>       port detach 0
> And execute following command in main process:
>       set fwd mac
>       start
> The main process also show the same error.
> 

I am aware you will get same error in that case, but how common is this 
to remove a device from secondary process?

Also as mentioned above, what happens the 'remove_invalid_ports()' 
called both from remove path and event path at the same time?

> So I think we should keep it.
> 
>>
>>>>
>>>> [...]
>>>>> +static void
>>>>> +remove_invalid_ports_callback(void *arg)
>>>>> +{
>>>>> +    RTE_SET_USED(arg);
>>>>> +    remove_invalid_ports();
>>>>> +}
>>>> [...]
>>>>>        case RTE_ETH_EVENT_DESTROY:
>>>>>            ports[port_id].port_status = RTE_PORT_CLOSED;
>>>>>            printf("Port %u is closed\n", port_id);
>>>>> +        if (rte_eal_alarm_set(100000, remove_invalid_ports_callback,
>>>>> +                (void *)(intptr_t)port_id))
>>>>> +            fprintf(stderr,
>>>>> +                "Could not set up deferred device released\n");
>>>>>            break;
>>>>
>>>>
>>>>
>>>>
>>>> .
>>>>
>>>
>>
>>
>> .
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-05-23  8:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-02  2:33 [PATCH] app/testpmd: remove invalid ports when other process detach Chengwen Feng
2022-03-02  8:26 ` Thomas Monjalon
2022-03-02  8:36   ` fengchengwen
2022-04-11  2:05     ` fengchengwen
2022-05-20 15:05     ` Ferruh Yigit
2022-05-20 15:14       ` Ferruh Yigit
2022-05-21 10:00       ` fengchengwen
2022-05-23  8:43         ` Ferruh Yigit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).