DPDK patches and discussions
 help / color / mirror / Atom feed
* Crash in tap pmd when using more than 8 rx queues
@ 2024-09-05 12:55 Edwin Brossette
  2024-09-06 11:16 ` Ferruh Yigit
  0 siblings, 1 reply; 6+ messages in thread
From: Edwin Brossette @ 2024-09-05 12:55 UTC (permalink / raw)
  To: dev
  Cc: Olivier Matz, Didier Pallard, Laurent Hardy, kparameshwar, ferruh.yigit

[-- Attachment #1: Type: text/plain, Size: 2332 bytes --]

Hello,

I have recently stumbled into an issue with my DPDK-based application
running the failsafe pmd. This pmd uses a tap device, with which my
application fails to start if more than 8 rx queues are used. This issue
appears to be related to this patch:
https://git.dpdk.org/dpdk/commit/?id=c36ce7099c2187926cd62cff7ebd479823554929

I have seen in the documentation that there was a limitation to 8 max
queues shared when using a tap device shared between multiple processes.
However, my application uses a single primary process, with no secondary
process, but it appears that I am still running into this limitation.

Now if we look at this small chunk of code:

memset(&msg, 0, sizeof(msg));
strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
strlcpy(request_param->port_name, dev->data->name,
sizeof(request_param->port_name));
msg.len_param = sizeof(*request_param);
for (i = 0; i < dev->data->nb_tx_queues; i++) {
    msg.fds[fd_iterator++] = process_private->txq_fds[i];
    msg.num_fds++;
    request_param->txq_count++;
}
for (i = 0; i < dev->data->nb_rx_queues; i++) {
    msg.fds[fd_iterator++] = process_private->rxq_fds[i];
    msg.num_fds++;
    request_param->rxq_count++;
}
(Note that I am not using the latest DPDK version, but stable v23.11.1. But
I believe the issue is still present on latest.)

There are no checks on the maximum value i can take in the for loops. Since
the size of msg.fds is limited by the maximum of 8 queues shared between
process because of the IPC API, there is a potential buffer overflow which
can happen here.

See the struct declaration:
struct rte_mp_msg {
     char name[RTE_MP_MAX_NAME_LEN];
     int len_param;
     int num_fds;
     uint8_t param[RTE_MP_MAX_PARAM_LEN];
     int fds[RTE_MP_MAX_FD_NUM];
};

This means that if the number of queues used is more than 8, the program
will crash. This is what happens on my end as I get the following log:
*** stack smashing detected ***: terminated

Reverting the commit mentionned above fixes my issue. Also setting a check
like this works for me:

if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM)
     return -1;

I've made the changes on my local branch to fix my issue. This mail is just
to bring attention on this problem.
Thank you in advance for considering it.

Regards,
Edwin Brossette.

[-- Attachment #2: Type: text/html, Size: 2780 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Crash in tap pmd when using more than 8 rx queues
  2024-09-05 12:55 Crash in tap pmd when using more than 8 rx queues Edwin Brossette
@ 2024-09-06 11:16 ` Ferruh Yigit
  2024-09-06 14:04   ` Edwin Brossette
  2024-09-10 16:58   ` Stephen Hemminger
  0 siblings, 2 replies; 6+ messages in thread
From: Ferruh Yigit @ 2024-09-06 11:16 UTC (permalink / raw)
  To: Edwin Brossette, dev
  Cc: Olivier Matz, Didier Pallard, Laurent Hardy, kparameshwar,
	ferruh.yigit, Stephen Hemminger

On 9/5/2024 1:55 PM, Edwin Brossette wrote:
> Hello,
> 
> I have recently stumbled into an issue with my DPDK-based application
> running the failsafe pmd. This pmd uses a tap device, with which my
> application fails to start if more than 8 rx queues are used. This issue
> appears to be related to this patch:
> https://git.dpdk.org/dpdk/commit/?
> id=c36ce7099c2187926cd62cff7ebd479823554929 <https://git.dpdk.org/dpdk/
> commit/?id=c36ce7099c2187926cd62cff7ebd479823554929>
> 
> I have seen in the documentation that there was a limitation to 8 max
> queues shared when using a tap device shared between multiple processes.
> However, my application uses a single primary process, with no secondary
> process, but it appears that I am still running into this limitation.
> 
> Now if we look at this small chunk of code:
> 
> memset(&msg, 0, sizeof(msg));
> strlcpy(msg.name <http://msg.name>, TAP_MP_REQ_START_RXTX,
> sizeof(msg.name <http://msg.name>));
> strlcpy(request_param->port_name, dev->data->name, sizeof(request_param-
>>port_name));
> msg.len_param = sizeof(*request_param);
> for (i = 0; i < dev->data->nb_tx_queues; i++) {
>     msg.fds[fd_iterator++] = process_private->txq_fds[i];
>     msg.num_fds++;
>     request_param->txq_count++;
> }
> for (i = 0; i < dev->data->nb_rx_queues; i++) {
>     msg.fds[fd_iterator++] = process_private->rxq_fds[i];
>     msg.num_fds++;
>     request_param->rxq_count++;
> }
> (Note that I am not using the latest DPDK version, but stable v23.11.1.
> But I believe the issue is still present on latest.)
> 
> There are no checks on the maximum value i can take in the for loops.
> Since the size of msg.fds is limited by the maximum of 8 queues shared
> between process because of the IPC API, there is a potential buffer
> overflow which can happen here.
> 
> See the struct declaration:
> struct rte_mp_msg {
>      char name[RTE_MP_MAX_NAME_LEN];
>      int len_param;
>      int num_fds;
>      uint8_t param[RTE_MP_MAX_PARAM_LEN];
>      int fds[RTE_MP_MAX_FD_NUM];
> };
> 
> This means that if the number of queues used is more than 8, the program
> will crash. This is what happens on my end as I get the following log:
> *** stack smashing detected ***: terminated
> 
> Reverting the commit mentionned above fixes my issue. Also setting a
> check like this works for me:
> 
> if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM)
>      return -1;
> 
> I've made the changes on my local branch to fix my issue. This mail is
> just to bring attention on this problem.
> Thank you in advance for considering it.
> 

Hi Edwin,

Thanks for the report, I confirm issue is valid, although that code
changed a little (to increase 8 limit) [3].

And in this release Stephen put another patch [1] to increase the limit
even more, but irrelevant from the limit, tap code needs to be fixed.

To fix:
1. We need to add "nb_rx_queues > RTE_MP_MAX_FD_NUM" check you
mentioned, to not blindly update the 'msg.fds[]'
2. We should prevent this to be a limit for tap PMD when there is only
primary process, this seems was oversight in our end.


Can you work on the issue or just reporting it?
Can you please report the bug in Bugzilla [2], to record the issue?



[1]
https://patches.dpdk.org/project/dpdk/patch/20240905162018.74301-1-stephen@networkplumber.org/

[2]
https://bugs.dpdk.org/

[3]
https://git.dpdk.org/dpdk/commit/?id=72ab1dc1598e


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Crash in tap pmd when using more than 8 rx queues
  2024-09-06 11:16 ` Ferruh Yigit
@ 2024-09-06 14:04   ` Edwin Brossette
  2024-09-06 14:14     ` Ferruh Yigit
  2024-09-10 16:58   ` Stephen Hemminger
  1 sibling, 1 reply; 6+ messages in thread
From: Edwin Brossette @ 2024-09-06 14:04 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: dev, Olivier Matz, Didier Pallard, Laurent Hardy, Stephen Hemminger

[-- Attachment #1: Type: text/plain, Size: 4055 bytes --]

Hello,

I created a Bugzilla PR, just as you requested:
https://bugs.dpdk.org/show_bug.cgi?id=1536

As for the bug resolution, I have other matters to attend to and I'm afraid
I cannot spend more time on this issue, so I was only planning to report it.

Regards,
Edwin Brossette.

On Fri, Sep 6, 2024 at 1:16 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 9/5/2024 1:55 PM, Edwin Brossette wrote:
> > Hello,
> >
> > I have recently stumbled into an issue with my DPDK-based application
> > running the failsafe pmd. This pmd uses a tap device, with which my
> > application fails to start if more than 8 rx queues are used. This issue
> > appears to be related to this patch:
> > https://git.dpdk.org/dpdk/commit/?
> > id=c36ce7099c2187926cd62cff7ebd479823554929 <https://git.dpdk.org/dpdk/
> > commit/?id=c36ce7099c2187926cd62cff7ebd479823554929>
> >
> > I have seen in the documentation that there was a limitation to 8 max
> > queues shared when using a tap device shared between multiple processes.
> > However, my application uses a single primary process, with no secondary
> > process, but it appears that I am still running into this limitation.
> >
> > Now if we look at this small chunk of code:
> >
> > memset(&msg, 0, sizeof(msg));
> > strlcpy(msg.name <http://msg.name>, TAP_MP_REQ_START_RXTX,
> > sizeof(msg.name <http://msg.name>));
> > strlcpy(request_param->port_name, dev->data->name, sizeof(request_param-
> >>port_name));
> > msg.len_param = sizeof(*request_param);
> > for (i = 0; i < dev->data->nb_tx_queues; i++) {
> >     msg.fds[fd_iterator++] = process_private->txq_fds[i];
> >     msg.num_fds++;
> >     request_param->txq_count++;
> > }
> > for (i = 0; i < dev->data->nb_rx_queues; i++) {
> >     msg.fds[fd_iterator++] = process_private->rxq_fds[i];
> >     msg.num_fds++;
> >     request_param->rxq_count++;
> > }
> > (Note that I am not using the latest DPDK version, but stable v23.11.1.
> > But I believe the issue is still present on latest.)
> >
> > There are no checks on the maximum value i can take in the for loops.
> > Since the size of msg.fds is limited by the maximum of 8 queues shared
> > between process because of the IPC API, there is a potential buffer
> > overflow which can happen here.
> >
> > See the struct declaration:
> > struct rte_mp_msg {
> >      char name[RTE_MP_MAX_NAME_LEN];
> >      int len_param;
> >      int num_fds;
> >      uint8_t param[RTE_MP_MAX_PARAM_LEN];
> >      int fds[RTE_MP_MAX_FD_NUM];
> > };
> >
> > This means that if the number of queues used is more than 8, the program
> > will crash. This is what happens on my end as I get the following log:
> > *** stack smashing detected ***: terminated
> >
> > Reverting the commit mentionned above fixes my issue. Also setting a
> > check like this works for me:
> >
> > if (dev->data->nb_tx_queues + dev->data->nb_rx_queues >
> RTE_MP_MAX_FD_NUM)
> >      return -1;
> >
> > I've made the changes on my local branch to fix my issue. This mail is
> > just to bring attention on this problem.
> > Thank you in advance for considering it.
> >
>
> Hi Edwin,
>
> Thanks for the report, I confirm issue is valid, although that code
> changed a little (to increase 8 limit) [3].
>
> And in this release Stephen put another patch [1] to increase the limit
> even more, but irrelevant from the limit, tap code needs to be fixed.
>
> To fix:
> 1. We need to add "nb_rx_queues > RTE_MP_MAX_FD_NUM" check you
> mentioned, to not blindly update the 'msg.fds[]'
> 2. We should prevent this to be a limit for tap PMD when there is only
> primary process, this seems was oversight in our end.
>
>
> Can you work on the issue or just reporting it?
> Can you please report the bug in Bugzilla [2], to record the issue?
>
>
>
> [1]
>
> https://patches.dpdk.org/project/dpdk/patch/20240905162018.74301-1-stephen@networkplumber.org/
>
> [2]
> https://bugs.dpdk.org/
>
> [3]
> https://git.dpdk.org/dpdk/commit/?id=72ab1dc1598e
>
>

[-- Attachment #2: Type: text/html, Size: 5737 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Crash in tap pmd when using more than 8 rx queues
  2024-09-06 14:04   ` Edwin Brossette
@ 2024-09-06 14:14     ` Ferruh Yigit
  0 siblings, 0 replies; 6+ messages in thread
From: Ferruh Yigit @ 2024-09-06 14:14 UTC (permalink / raw)
  To: Edwin Brossette
  Cc: dev, Olivier Matz, Didier Pallard, Laurent Hardy,
	Stephen Hemminger, Kumara Parameshwaran

On 9/6/2024 3:04 PM, Edwin Brossette wrote:
> Hello,
> 
> I created a Bugzilla PR, just as you requested:
> https://bugs.dpdk.org/show_bug.cgi?id=1536 <https://bugs.dpdk.org/
> show_bug.cgi?id=1536>
> 
> As for the bug resolution, I have other matters to attend to and I'm
> afraid I cannot spend more time on this issue, so I was only planning to
> report it.
>

Fair enough, thanks for the report.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Crash in tap pmd when using more than 8 rx queues
  2024-09-06 11:16 ` Ferruh Yigit
  2024-09-06 14:04   ` Edwin Brossette
@ 2024-09-10 16:58   ` Stephen Hemminger
  2024-09-10 17:25     ` Ferruh Yigit
  1 sibling, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2024-09-10 16:58 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Edwin Brossette, dev, Olivier Matz, Didier Pallard,
	Laurent Hardy, kparameshwar, ferruh.yigit

On Fri, 6 Sep 2024 12:16:47 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 9/5/2024 1:55 PM, Edwin Brossette wrote:
> > Hello,
> > 
> > I have recently stumbled into an issue with my DPDK-based application
> > running the failsafe pmd. This pmd uses a tap device, with which my
> > application fails to start if more than 8 rx queues are used. This issue
> > appears to be related to this patch:
> > https://git.dpdk.org/dpdk/commit/?
> > id=c36ce7099c2187926cd62cff7ebd479823554929 <https://git.dpdk.org/dpdk/  
> > commit/?id=c36ce7099c2187926cd62cff7ebd479823554929>  
> > 
> > I have seen in the documentation that there was a limitation to 8 max
> > queues shared when using a tap device shared between multiple processes.
> > However, my application uses a single primary process, with no secondary
> > process, but it appears that I am still running into this limitation.
> > 
> > Now if we look at this small chunk of code:
> > 
> > memset(&msg, 0, sizeof(msg));
> > strlcpy(msg.name <http://msg.name>, TAP_MP_REQ_START_RXTX,
> > sizeof(msg.name <http://msg.name>));
> > strlcpy(request_param->port_name, dev->data->name, sizeof(request_param-  
> >>port_name));  
> > msg.len_param = sizeof(*request_param);
> > for (i = 0; i < dev->data->nb_tx_queues; i++) {
> >     msg.fds[fd_iterator++] = process_private->txq_fds[i];
> >     msg.num_fds++;
> >     request_param->txq_count++;
> > }
> > for (i = 0; i < dev->data->nb_rx_queues; i++) {
> >     msg.fds[fd_iterator++] = process_private->rxq_fds[i];
> >     msg.num_fds++;
> >     request_param->rxq_count++;
> > }
> > (Note that I am not using the latest DPDK version, but stable v23.11.1.
> > But I believe the issue is still present on latest.)
> > 
> > There are no checks on the maximum value i can take in the for loops.
> > Since the size of msg.fds is limited by the maximum of 8 queues shared
> > between process because of the IPC API, there is a potential buffer
> > overflow which can happen here.
> > 
> > See the struct declaration:
> > struct rte_mp_msg {
> >      char name[RTE_MP_MAX_NAME_LEN];
> >      int len_param;
> >      int num_fds;
> >      uint8_t param[RTE_MP_MAX_PARAM_LEN];
> >      int fds[RTE_MP_MAX_FD_NUM];
> > };
> > 
> > This means that if the number of queues used is more than 8, the program
> > will crash. This is what happens on my end as I get the following log:
> > *** stack smashing detected ***: terminated
> > 
> > Reverting the commit mentionned above fixes my issue. Also setting a
> > check like this works for me:
> > 
> > if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM)
> >      return -1;
> > 
> > I've made the changes on my local branch to fix my issue. This mail is
> > just to bring attention on this problem.
> > Thank you in advance for considering it.
> >   
> 
> Hi Edwin,
> 
> Thanks for the report, I confirm issue is valid, although that code
> changed a little (to increase 8 limit) [3].
> 
> And in this release Stephen put another patch [1] to increase the limit
> even more, but irrelevant from the limit, tap code needs to be fixed.
> 
> To fix:
> 1. We need to add "nb_rx_queues > RTE_MP_MAX_FD_NUM" check you
> mentioned, to not blindly update the 'msg.fds[]'
> 2. We should prevent this to be a limit for tap PMD when there is only
> primary process, this seems was oversight in our end.
> 

It is not clear what the error handling should be if the user requests
10 queues but RTE_MP_MAX_FD_NUM is 8. Ideally, it should work if no secondary
process is used. But there is no good way to know that in the driver.

That is why it is best to just set TAP max queues to be less than
or equal to RTE_MP_MAX_FD_NUM, and enforce that with a static assertion
at compile time.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Crash in tap pmd when using more than 8 rx queues
  2024-09-10 16:58   ` Stephen Hemminger
@ 2024-09-10 17:25     ` Ferruh Yigit
  0 siblings, 0 replies; 6+ messages in thread
From: Ferruh Yigit @ 2024-09-10 17:25 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Edwin Brossette, dev, Olivier Matz, Didier Pallard,
	Laurent Hardy, kparameshwar, ferruh.yigit

On 9/10/2024 5:58 PM, Stephen Hemminger wrote:
> On Fri, 6 Sep 2024 12:16:47 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
>> On 9/5/2024 1:55 PM, Edwin Brossette wrote:
>>> Hello,
>>>
>>> I have recently stumbled into an issue with my DPDK-based application
>>> running the failsafe pmd. This pmd uses a tap device, with which my
>>> application fails to start if more than 8 rx queues are used. This issue
>>> appears to be related to this patch:
>>> https://git.dpdk.org/dpdk/commit/?
>>> id=c36ce7099c2187926cd62cff7ebd479823554929 <https://git.dpdk.org/dpdk/  
>>> commit/?id=c36ce7099c2187926cd62cff7ebd479823554929>  
>>>
>>> I have seen in the documentation that there was a limitation to 8 max
>>> queues shared when using a tap device shared between multiple processes.
>>> However, my application uses a single primary process, with no secondary
>>> process, but it appears that I am still running into this limitation.
>>>
>>> Now if we look at this small chunk of code:
>>>
>>> memset(&msg, 0, sizeof(msg));
>>> strlcpy(msg.name <http://msg.name>, TAP_MP_REQ_START_RXTX,
>>> sizeof(msg.name <http://msg.name>));
>>> strlcpy(request_param->port_name, dev->data->name, sizeof(request_param-  
>>>> port_name));  
>>> msg.len_param = sizeof(*request_param);
>>> for (i = 0; i < dev->data->nb_tx_queues; i++) {
>>>     msg.fds[fd_iterator++] = process_private->txq_fds[i];
>>>     msg.num_fds++;
>>>     request_param->txq_count++;
>>> }
>>> for (i = 0; i < dev->data->nb_rx_queues; i++) {
>>>     msg.fds[fd_iterator++] = process_private->rxq_fds[i];
>>>     msg.num_fds++;
>>>     request_param->rxq_count++;
>>> }
>>> (Note that I am not using the latest DPDK version, but stable v23.11.1.
>>> But I believe the issue is still present on latest.)
>>>
>>> There are no checks on the maximum value i can take in the for loops.
>>> Since the size of msg.fds is limited by the maximum of 8 queues shared
>>> between process because of the IPC API, there is a potential buffer
>>> overflow which can happen here.
>>>
>>> See the struct declaration:
>>> struct rte_mp_msg {
>>>      char name[RTE_MP_MAX_NAME_LEN];
>>>      int len_param;
>>>      int num_fds;
>>>      uint8_t param[RTE_MP_MAX_PARAM_LEN];
>>>      int fds[RTE_MP_MAX_FD_NUM];
>>> };
>>>
>>> This means that if the number of queues used is more than 8, the program
>>> will crash. This is what happens on my end as I get the following log:
>>> *** stack smashing detected ***: terminated
>>>
>>> Reverting the commit mentionned above fixes my issue. Also setting a
>>> check like this works for me:
>>>
>>> if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM)
>>>      return -1;
>>>
>>> I've made the changes on my local branch to fix my issue. This mail is
>>> just to bring attention on this problem.
>>> Thank you in advance for considering it.
>>>   
>>
>> Hi Edwin,
>>
>> Thanks for the report, I confirm issue is valid, although that code
>> changed a little (to increase 8 limit) [3].
>>
>> And in this release Stephen put another patch [1] to increase the limit
>> even more, but irrelevant from the limit, tap code needs to be fixed.
>>
>> To fix:
>> 1. We need to add "nb_rx_queues > RTE_MP_MAX_FD_NUM" check you
>> mentioned, to not blindly update the 'msg.fds[]'
>> 2. We should prevent this to be a limit for tap PMD when there is only
>> primary process, this seems was oversight in our end.
>>
> 
> It is not clear what the error handling should be if the user requests
> 10 queues but RTE_MP_MAX_FD_NUM is 8. Ideally, it should work if no secondary
> process is used. But there is no good way to know that in the driver.
> 
> That is why it is best to just set TAP max queues to be less than
> or equal to RTE_MP_MAX_FD_NUM, and enforce that with a static assertion
> at compile time.
>

We can limit queue number to RTE_MP_MAX_FD_NUM when there is secondary
process, but when there is only primary process this restriction is
artificial, and I think we should try to remove it.

One solution can be to provide a devarg if tap pmd will be used only in
primary process, and in this case it skips all code required for
secondary process support, this removes the queue number limit.
Also this can be backward compatible, without devarg default behavior
can be secondary support and queue number is limited, with devarg limit
removed etc..

By spending some time on it probably we can come with even better
solution. Like improving MP socket communication to remove this
restriction completely independent from RTE_MP_MAX_FD_NUM size...






^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-09-10 17:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-05 12:55 Crash in tap pmd when using more than 8 rx queues Edwin Brossette
2024-09-06 11:16 ` Ferruh Yigit
2024-09-06 14:04   ` Edwin Brossette
2024-09-06 14:14     ` Ferruh Yigit
2024-09-10 16:58   ` Stephen Hemminger
2024-09-10 17:25     ` Ferruh Yigit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).