From: Ferruh Yigit <ferruh.yigit@amd.com>
To: "You, KaisenX" <kaisenx.you@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>,
David Marchand <david.marchand@redhat.com>
Cc: "stable@dpdk.org" <stable@dpdk.org>,
"Yang, Qiming" <qiming.yang@intel.com>,
"Zhou, YidingX" <yidingx.zhou@intel.com>,
"Wu, Jingjing" <jingjing.wu@intel.com>,
"Xing, Beilei" <beilei.xing@intel.com>,
"Zhang, Qi Z" <qi.z.zhang@intel.com>,
Luca Boccassi <bluca@debian.org>,
"Mcnamara, John" <john.mcnamara@intel.com>,
Kevin Traynor <ktraynor@redhat.com>
Subject: Re: [PATCH] net/iavf:fix slow memory allocation
Date: Wed, 21 Dec 2022 13:48:58 +0000
Message-ID: <acd61d39-8a18-055f-3ea1-7a25088601df@amd.com>
In-Reply-To: <SJ0PR11MB6765839AAA1D619724D69908E1EA9@SJ0PR11MB6765.namprd11.prod.outlook.com>
On 12/20/2022 6:52 AM, You, KaisenX wrote:
>
>
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: 2022年12月13日 21:28
>> To: You, KaisenX <kaisenx.you@intel.com>; dev@dpdk.org; Burakov,
>> Anatoly <anatoly.burakov@intel.com>; David Marchand
>> <david.marchand@redhat.com>
>> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
>> <yidingx.zhou@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
>> Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Luca
>> Boccassi <bluca@debian.org>; Mcnamara, John
>> <john.mcnamara@intel.com>; Kevin Traynor <ktraynor@redhat.com>
>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
>>
>> On 12/13/2022 9:35 AM, Ferruh Yigit wrote:
>>> On 12/13/2022 7:52 AM, You, KaisenX wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>>>> Sent: 2022年12月8日 23:04
>>>>> To: You, KaisenX <kaisenx.you@intel.com>; dev@dpdk.org; Burakov,
>>>>> Anatoly <anatoly.burakov@intel.com>; David Marchand
>>>>> <david.marchand@redhat.com>
>>>>> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou,
>>>>> YidingX <yidingx.zhou@intel.com>; Wu, Jingjing
>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
>>>>> Zhang, Qi Z <qi.z.zhang@intel.com>; Luca Boccassi
>>>>> <bluca@debian.org>; Mcnamara, John <john.mcnamara@intel.com>;
>>>>> Kevin Traynor <ktraynor@redhat.com>
>>>>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
>>>>>
>>>>> On 11/17/2022 6:57 AM, Kaisen You wrote:
>>>>>> In some cases, DPDK does not allocate hugepage heap memory on some
>>>>>> sockets because of the user-supplied core list (e.g. -l 40-79, so
>>>>>> socket 0 has no memory).
>>>>>> When the interrupt thread runs on a core of such a socket, each
>>>>>> allocation/release executes a whole set of heap allocation/release
>>>>>> operations, resulting in poor performance.
>>>>>> Instead, we call malloc() to get memory from the system's heap space
>>>>>> to fix this problem.
>>>>>>
>>>>>
>>>>> Hi Kaisen,
>>>>>
>>>>> Using libc malloc can improve performance for this case, but I would
>>>>> like to understand the root cause of the problem.
>>>>>
>>>>>
>>>>> As far as I can see, interrupt callbacks are run by the interrupt
>>>>> thread ("eal-intr-thread"), and the interrupt thread is created by
>>>>> the 'rte_ctrl_thread_create()' API.
>>>>>
>>>>> The 'rte_ctrl_thread_create()' comment mentions that the CPU affinity
>>>>> is "retrieved at the time 'rte_eal_init()' was called".
>>>>>
>>>>> And 'rte_eal_init()' is run on the main lcore, which is the first
>>>>> lcore in the core list (unless defined otherwise with --main-lcore).
>>>>>
>>>>> So, the interrupts should be running on a core that has hugepages
>>>>> allocated for it, am I missing something here?
>>>>>
>>>>>
>>>> Thanks for your comments. Let me try to explain the root cause here:
>>>> no memory pool is created for the socket of the CPU that
>>>> eal_intr_thread runs on.
>>>> That results in frequent memory creation/destruction afterwards.
>>>>
>>>> When testpmd is started, a core list parameter (e.g. -l 40-79) is set.
>>>> Different OSes have different topologies. Some OSes like SUSE create a
>>>> memory pool for only one CPU socket, while other systems create pools
>>>> for two.
>>>> That is why the problem appears on some OSes but not on others.
>>>
>>>
>>> It is the testpmd application that decides which socket to allocate
>>> memory from, right? This is nothing specific to the OS.
>>>
>>> As far as I remember, testpmd's logic is to allocate from the sockets
>>> whose cores are used (provided with the -l parameter), and from the
>>> socket that the device is attached to.
>>>
>>> So, in a dual-socket system, if all used cores are in socket 1 and the
>>> NIC is in socket 1, no memory is allocated for socket 0. This is done
>>> to optimize memory consumption.
>>>
>>>
>>> Can you please confirm that the problem you are observing is because
>>> the interrupt handler is running on a CPU whose socket has no memory
>>> allocated?
>>>
>>> In that case, what I don't understand is why the interrupt thread is
>>> not running on the main lcore, which should be the first core in the
>>> list; for the "-l 40-79" example it should be lcore 40.
>>> In your case, does the interrupt handler run on core 0, or on an
>>> arbitrary core?
>>> If so, can you please confirm whether providing the core list as
>>> "-l 0,40-79" fixes the issue?
>>>
> First of all, sorry for the late reply.
> I can confirm that the problem I observed occurs because the interrupt
> handler is running on a CPU whose socket has no memory allocated.
>
> In my case, the interrupt handler is running on core 0.
> When I provide "-l 0,40-79" as a startup parameter, the issue is resolved.
>
> I also want to correct my previous statement that this problem only
> occurs on SUSE: it occurs on any OS as long as the startup core list
> covers only node 1.
>
>>>
>>>>>
>>>>>
>>>>> And what about using the 'rte_malloc_socket()' API (instead of
>>>>> rte_malloc), which takes 'socket' as a parameter, and providing the
>>>>> socket that the device is on to this API? Is it possible to test
>>>>> this?
>>>>>
>>>>>
>>>> As for the reason rte_malloc_socket() is not used: I thought
>>>> rte_malloc_socket() could solve the problem too, and the appropriate
>>>> parameter should be the socket_id for which the memory pool was
>>>> created at DPDK initialization. Assuming that the socket_id of the
>>>> initially allocated memory is 1, first let eal_intr_thread determine
>>>> whether it is on that socket, then record this socket_id in
>>>> eal_intr_thread and pass it to the iavf event thread. But there seems
>>>> to be no way to pass this parameter to the iavf_dev_event_post()
>>>> function. That is why rte_malloc_socket() is not used.
>>>>
>>>
>>> I was thinking the socket id of the device could be used, but that
>>> won't help if the core the interrupt handler runs on is in a different
>>> socket.
>>> And I also don't know if there is a way to get the socket that the
>>> interrupt thread is on. @David may be able to help.
>>>
>>> So the question is why the interrupt thread is not running on the main
>>> lcore.
>>>
>>
>> OK, after some discussion with David, what I was missing is that
>> 'rte_ctrl_thread_create()' does NOT run the thread on the main lcore;
>> it can run on any core except the data plane cores.
>>
>> The driver's "iavf-event-thread" (iavf_dev_event_handle()) and the
>> interrupt thread (and so the driver interrupt callback
>> iavf_dev_event_post()) can run on any core, which makes this hard to
>> manage.
>> And it seems it is not possible to control where the interrupt thread
>> runs.
>>
>> One option is to allocate hugepages for all sockets, but this requires
>> user involvement and can't happen transparently.
>>
>> Another option is to control where "iavf-event-thread" runs, e.g. by
>> creating the thread with 'rte_thread_create()' and providing an
>> attribute to run it on the main lcore
>> (rte_lcore_cpuset(rte_get_main_lcore()))?
>>
>> Can you please test the option above?
>>
>>
> The first option can solve this issue, but to borrow your earlier words,
> "in a dual-socket system, if all used cores are in socket 1 and the NIC
> is in socket 1, no memory is allocated for socket 0. This is done to
> optimize memory consumption."
> I think it would be unreasonable to do so.
>
> About the other option: in the 'rte_eal_intr_init' function, after the
> thread is created, I set the thread affinity for eal-intr-thread, but it
> does not solve this issue.
Hi Kaisen,

There are two threads involved.
The first one is the interrupt thread, "eal-intr-thread", created by
'rte_eal_intr_init()'.
The second one is the iavf event handler, "iavf-event-thread", created by
'iavf_dev_event_handler_init()'.

The first one is triggered by an interrupt and puts a message on a list;
the second one consumes the message from the list and processes it.

So I assume the two threads being on different sockets, or memory being
allocated on a different socket than the cores they run on, causes the
performance issue.

Did you test with the second thread, "iavf-event-thread", affinitized to
the main core (by creating the thread with the 'rte_thread_create()' API)?
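
Something like the following minimal sketch is what I have in mind. It is
untested and only illustrative: the wrapper name is made up (not existing
driver code), and it assumes iavf_dev_event_handle() is adapted to the
'rte_thread_func' signature (uint32_t return):

#include <rte_lcore.h>
#include <rte_thread.h>

/* Illustrative sketch: create the iavf event handler thread with its
 * affinity set to the main lcore's CPU set. */
static int
iavf_event_thread_create_on_main(rte_thread_t *tid,
				 rte_thread_func handler, void *arg)
{
	rte_thread_attr_t attr;
	rte_cpuset_t cpuset = rte_lcore_cpuset(rte_get_main_lcore());

	if (rte_thread_attr_init(&attr) != 0)
		return -1;
	if (rte_thread_attr_set_affinity(&attr, &cpuset) != 0)
		return -1;

	/* 'handler' would be iavf_dev_event_handle() adapted to return
	 * uint32_t, as 'rte_thread_func' requires. */
	return rte_thread_create(tid, &attr, handler, arg);
}

If pinning "iavf-event-thread" this way makes the slow allocation go
away, that would confirm the socket mismatch theory.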
>>
>>>> Let me know if there is anything else unclear.
>>>>>
>>>>>> Fixes: cb5c1b91f76f ("net/iavf: add thread for event callbacks")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Kaisen You <kaisenx.you@intel.com>
>>>>>> ---
>>>>>> drivers/net/iavf/iavf_vchnl.c | 8 +++-----
>>>>>> 1 file changed, 3 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c
>>>>>> index f92daf97f2..a05791fe48 100644
>>>>>> --- a/drivers/net/iavf/iavf_vchnl.c
>>>>>> +++ b/drivers/net/iavf/iavf_vchnl.c
>>>>>> @@ -36,7 +36,6 @@ struct iavf_event_element {
>>>>>> struct rte_eth_dev *dev;
>>>>>> enum rte_eth_event_type event;
>>>>>> void *param;
>>>>>> - size_t param_alloc_size;
>>>>>> uint8_t param_alloc_data[0];
>>>>>> };
>>>>>>
>>>>>> @@ -80,7 +79,7 @@ iavf_dev_event_handle(void *param __rte_unused)
>>>>>> TAILQ_FOREACH_SAFE(pos, &pending, next, save_next) {
>>>>>> TAILQ_REMOVE(&pending, pos, next);
>>>>>> rte_eth_dev_callback_process(pos->dev, pos->event, pos->param);
>>>>>> - rte_free(pos);
>>>>>> + free(pos);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> @@ -94,14 +93,13 @@ iavf_dev_event_post(struct rte_eth_dev *dev,
>>>>>> {
>>>>>> struct iavf_event_handler *handler = &event_handler;
>>>>>> char notify_byte;
>>>>>> - struct iavf_event_element *elem = rte_malloc(NULL, sizeof(*elem) + param_alloc_size, 0);
>>>>>> + struct iavf_event_element *elem = malloc(sizeof(*elem) + param_alloc_size);
>>>>>> if (!elem)
>>>>>> return;
>>>>>>
>>>>>> elem->dev = dev;
>>>>>> elem->event = event;
>>>>>> elem->param = param;
>>>>>> - elem->param_alloc_size = param_alloc_size;
>>>>>> if (param && param_alloc_size) {
>>>>>> rte_memcpy(elem->param_alloc_data, param, param_alloc_size);
>>>>>> elem->param = elem->param_alloc_data;
>>>>>> @@ -165,7 +163,7 @@ iavf_dev_event_handler_fini(void)
>>>>>> struct iavf_event_element *pos, *save_next;
>>>>>> TAILQ_FOREACH_SAFE(pos, &handler->pending, next, save_next) {
>>>>>> TAILQ_REMOVE(&handler->pending, pos, next);
>>>>>> - rte_free(pos);
>>>>>> + free(pos);
>>>>>> }
>>>>>> }
>>>>>>
>>>>
>>>
>