From: "You, KaisenX" <kaisenx.you@intel.com>
To: Ferruh Yigit <ferruh.yigit@amd.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>,
	David Marchand <david.marchand@redhat.com>
Cc: "stable@dpdk.org" <stable@dpdk.org>,
	"Yang, Qiming" <qiming.yang@intel.com>,
	"Zhou, YidingX" <yidingx.zhou@intel.com>,
	"Wu, Jingjing" <jingjing.wu@intel.com>,
	"Xing, Beilei" <beilei.xing@intel.com>,
	"Zhang, Qi Z" <qi.z.zhang@intel.com>,
	Luca Boccassi <bluca@debian.org>,
	"Mcnamara, John" <john.mcnamara@intel.com>,
	Kevin Traynor <ktraynor@redhat.com>
Subject: RE: [PATCH] net/iavf:fix slow memory allocation
Date: Tue, 20 Dec 2022 06:52:13 +0000
Message-ID: <SJ0PR11MB6765839AAA1D619724D69908E1EA9@SJ0PR11MB6765.namprd11.prod.outlook.com>
In-Reply-To: <3ad04278-59c0-0c60-5c8c-9e57f33bb0de@amd.com>



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: December 13, 2022 21:28
> To: You, KaisenX <kaisenx.you@intel.com>; dev@dpdk.org; Burakov,
> Anatoly <anatoly.burakov@intel.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
> <yidingx.zhou@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Luca
> Boccassi <bluca@debian.org>; Mcnamara, John
> <john.mcnamara@intel.com>; Kevin Traynor <ktraynor@redhat.com>
> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
> 
> On 12/13/2022 9:35 AM, Ferruh Yigit wrote:
> > On 12/13/2022 7:52 AM, You, KaisenX wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >>> Sent: December 8, 2022 23:04
> >>> To: You, KaisenX <kaisenx.you@intel.com>; dev@dpdk.org; Burakov,
> >>> Anatoly <anatoly.burakov@intel.com>; David Marchand
> >>> <david.marchand@redhat.com>
> >>> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou,
> >>> YidingX <yidingx.zhou@intel.com>; Wu, Jingjing
> >>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
> >>> Zhang, Qi Z <qi.z.zhang@intel.com>; Luca Boccassi
> >>> <bluca@debian.org>; Mcnamara, John <john.mcnamara@intel.com>;
> >>> Kevin Traynor <ktraynor@redhat.com>
> >>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
> >>>
> >>> On 11/17/2022 6:57 AM, Kaisen You wrote:
> >>>> In some cases, the DPDK does not allocate hugepage heap memory to some
> >>>> sockets due to the user setting parameters (e.g. -l 40-79, SOCKET 0
> >>>> has no memory).
> >>>> When the interrupt thread runs on the corresponding core of this
> >>>> socket, each allocation/release will execute a whole set of heap
> >>>> allocation/release operations, resulting in poor performance.
> >>>> Instead we call malloc() to get memory from the system's heap space
> >>>> to fix this problem.
> >>>>
> >>>
> >>> Hi Kaisen,
> >>>
> >>> Using libc malloc can improve performance for this case, but I would
> >>> like to understand root cause of the problem.
> >>>
> >>>
> >>> As far as I can see, interrupt callbacks are run by interrupt thread
> >>> ("eal-intr-thread"), and interrupt thread is created by the
> >>> 'rte_ctrl_thread_create()' API.
> >>>
> >>> 'rte_ctrl_thread_create()' comment mentions that "CPU affinity
> >>> retrieved at the time 'rte_eal_init()' was called,"
> >>>
> >>> And 'rte_eal_init()' is run on main lcore, which is the first lcore
> >>> in the core list (unless otherwise defined with --main-lcore).
> >>>
> >>> So, the interrupts should be running on a core that has hugepages
> >>> allocated for it, am I missing something here?
> >>>
> >>>
> >> Thanks for your comments. Let me try to explain the root cause here:
> >> no memory pool is created for the CPU socket that eal_intr_thread runs on.
> >> As a result, memory is repeatedly created and destroyed on each allocation.
> >>
> >> When testpmd is started, the core list parameter (e.g. -l 40-79) is set.
> >> Different OSes have different topologies. Some, like SUSE, only create a
> >> memory pool for one CPU socket, while other systems create one for two.
> >> That is why the problem only occurs on some OSes.
> >
> >
> > It is the testpmd application that decides which socket to allocate
> > memory from, right? This is nothing specific to the OS.
> >
> > As far as I remember, testpmd logic is to allocate from the sockets
> > whose cores are used (provided with -l parameter), and allocate from
> > the socket that the device is attached to.
> >
> > So, in a dual socket system, if all used cores are in socket 1 and the
> > NIC is in socket 1, no memory is allocated for socket 0. This is to
> > optimize memory consumption.
> >
> >
> > Can you please confirm that the problem you are observing is because
> > interrupt handler is running on a CPU, which doesn't have memory
> > allocated for its socket?
> >
> > In this case what I don't understand is why the interrupt handler is not
> > running on main lcore, which should be the first core in the list; for the
> > "-l 40-79" sample it should be lcore 40.
> > For your case, is the interrupt handler run on core 0? Or any arbitrary core?
> > If so, can you please confirm whether providing the core list as
> > "-l 0,40-79" fixes the issue?
> >
First of all, sorry for the late reply.
I can confirm that the problem I observed is because the interrupt handler is
running on a CPU that doesn't have memory allocated for its socket.

In my case, the interrupt handler is running on core 0.
When I provided "-l 0,40-79" as the startup parameter, the issue was resolved.

I also need to correct my previous statement that this problem only occurs on
the SUSE system. It occurs on any OS as long as the cores given in the
startup parameters are all on node 1.
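
To make the failing case concrete, here is a small standalone sketch in plain C (no DPDK calls). The 2-socket, 40-cores-per-socket topology and all helper names are assumptions for illustration only, and it uses the simplification that a socket gets heap memory only when the core list covers it (the NIC's socket is ignored here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed topology, for illustration only: 2 sockets with 40 cores each
 * (cores 0-39 on socket 0, cores 40-79 on socket 1). */
#define CORES_PER_SOCKET 40
#define NUM_SOCKETS 2

/* One inclusive entry of a core list, e.g. "40-79" or "0" (first == last). */
struct core_range {
	int first;
	int last;
};

/* Hypothetical helper: socket of a core in the assumed topology. */
static int core_to_socket(int core)
{
	assert(core >= 0 && core < CORES_PER_SOCKET * NUM_SOCKETS);
	return core / CORES_PER_SOCKET;
}

/* Does any core in the list sit on the given socket? */
static bool list_covers_socket(const struct core_range *list, size_t n,
			       int socket)
{
	for (size_t i = 0; i < n; i++)
		for (int core = list[i].first; core <= list[i].last; core++)
			if (core_to_socket(core) == socket)
				return true;
	return false;
}

/* Simplifying assumption: a socket gets heap memory only if the core list
 * covers it, so a thread has a local heap only when its socket is covered. */
static bool thread_has_local_heap(const struct core_range *list, size_t n,
				  int thread_core)
{
	return list_covers_socket(list, n, core_to_socket(thread_core));
}
```

With the list {40-79} and the thread on core 0, thread_has_local_heap() returns false. Adding core 0 to the list makes it return true, which matches what I saw with "-l 0,40-79".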

> >
> >>>
> >>>
> >>> And what about using 'rte_malloc_socket()' API (instead of
> >>> rte_malloc), which gets 'socket' as parameter, and provide the
> >>> socket that the device is on as parameter to this API? Is it possible
> >>> to test this?
> >>>
> >>>
> >> As to the reason for not using rte_malloc_socket(): I thought
> >> rte_malloc_socket() could solve the problem too. The appropriate
> >> parameter would be the socket_id for which the memory pool was created
> >> during DPDK initialization. Assuming that the socket_id of the initially
> >> allocated memory is 1, first let the eal_intr_thread
> >> determine if it is on that socket_id, then record this socket_id in
> >> the eal_intr_thread and pass it to the iavf_event_thread. But there
> >> seems to be no way to link this parameter to the iavf_dev_event_post()
> >> function. That is why rte_malloc_socket() is not used.
> >>
> >
> > I was thinking socket id of device can be used, but that won't help if
> > the core that interrupt handler runs is in different socket.
> > And I also don't know if there is a way to get socket that interrupt
> > thread is on. @David may help perhaps.
> >
> > So question is why interrupt thread is not running on main lcore.
> >
> 
> OK after some talk with David, what I am missing is 'rte_ctrl_thread_create()'
> does NOT run on main lcore, it can run on any core except data plane cores.
> 
> Driver "iavf-event-thread" thread (iavf_dev_event_handle()) and interrupt
> thread (so driver interrupt callback iavf_dev_event_post()) can run on any
> core, making it hard to manage.
> And it seems it is not possible to control where the interrupt thread runs.
> 
> One option can be allocating hugepages for all sockets, but this requires user
> involvement, and can't happen transparently.
> 
> Other option can be to control where "iavf-event-thread" run, like using
> 'rte_thread_create()' to create thread and provide attribute to run it on main
> lcore (rte_lcore_cpuset(rte_get_main_lcore()))?
> 
> Can you please test above option?
> 
> 
The first option can solve this issue, but to borrow from what you said earlier,
"in a dual socket system, if all used cores are in socket 1 and the NIC is in socket 1,
 no memory is allocated for socket 0.  This is to optimize memory consumption."
So I think it is unreasonable to allocate hugepages for all sockets.

About the other option: in the rte_eal_intr_init() function, after the thread is created,
I set the thread affinity for eal-intr-thread, but it does not solve this issue.
> 
> >> Let me know if there is anything else unclear.
> >>>
> >>>> Fixes: cb5c1b91f76f ("net/iavf: add thread for event callbacks")
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Signed-off-by: Kaisen You <kaisenx.you@intel.com>
> >>>> ---
> >>>>  drivers/net/iavf/iavf_vchnl.c | 8 +++-----
> >>>>  1 file changed, 3 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c
> >>>> index f92daf97f2..a05791fe48 100644
> >>>> --- a/drivers/net/iavf/iavf_vchnl.c
> >>>> +++ b/drivers/net/iavf/iavf_vchnl.c
> >>>> @@ -36,7 +36,6 @@ struct iavf_event_element {
> >>>>  	struct rte_eth_dev *dev;
> >>>>  	enum rte_eth_event_type event;
> >>>>  	void *param;
> >>>> -	size_t param_alloc_size;
> >>>>  	uint8_t param_alloc_data[0];
> >>>>  };
> >>>>
> >>>> @@ -80,7 +79,7 @@ iavf_dev_event_handle(void *param __rte_unused)
> >>>>  		TAILQ_FOREACH_SAFE(pos, &pending, next, save_next) {
> >>>>  			TAILQ_REMOVE(&pending, pos, next);
> >>>>  			rte_eth_dev_callback_process(pos->dev, pos->event, pos->param);
> >>>> -			rte_free(pos);
> >>>> +			free(pos);
> >>>>  		}
> >>>>  	}
> >>>>
> >>>> @@ -94,14 +93,13 @@ iavf_dev_event_post(struct rte_eth_dev *dev,
> >>>>  {
> >>>>  	struct iavf_event_handler *handler = &event_handler;
> >>>>  	char notify_byte;
> >>>> -	struct iavf_event_element *elem = rte_malloc(NULL, sizeof(*elem) + param_alloc_size, 0);
> >>>> +	struct iavf_event_element *elem = malloc(sizeof(*elem) + param_alloc_size);
> >>>>  	if (!elem)
> >>>>  		return;
> >>>>
> >>>>  	elem->dev = dev;
> >>>>  	elem->event = event;
> >>>>  	elem->param = param;
> >>>> -	elem->param_alloc_size = param_alloc_size;
> >>>>  	if (param && param_alloc_size) {
> >>>>  		rte_memcpy(elem->param_alloc_data, param, param_alloc_size);
> >>>>  		elem->param = elem->param_alloc_data;
> >>>> @@ -165,7 +163,7 @@ iavf_dev_event_handler_fini(void)
> >>>>  	struct iavf_event_element *pos, *save_next;
> >>>>  	TAILQ_FOREACH_SAFE(pos, &handler->pending, next, save_next) {
> >>>>  		TAILQ_REMOVE(&handler->pending, pos, next);
> >>>> -		rte_free(pos);
> >>>> +		free(pos);
> >>>>  	}
> >>>>  }
> >>>>
> >>
> >
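
For reference, the allocation pattern used in the quoted patch can be tried outside DPDK with the simplified sketch below. The struct is a stand-in, not the driver's definition: the dev, event type and TAILQ fields are reduced to a plain int, memcpy() replaces rte_memcpy(), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct iavf_event_element: the DPDK-specific
 * fields are replaced by a plain int for illustration. */
struct event_element {
	int event;
	void *param;
	unsigned char param_alloc_data[]; /* C99 flexible array member */
};

/* Allocate the element and its parameter copy in one libc malloc() call,
 * so no hugepage heap of any socket is involved. */
static struct event_element *
event_element_new(int event, void *param, size_t param_alloc_size)
{
	struct event_element *elem = malloc(sizeof(*elem) + param_alloc_size);

	if (elem == NULL)
		return NULL;

	elem->event = event;
	elem->param = param;
	if (param != NULL && param_alloc_size != 0) {
		/* Keep a private copy so the caller's buffer may go away. */
		memcpy(elem->param_alloc_data, param, param_alloc_size);
		elem->param = elem->param_alloc_data;
	}
	return elem;
}

/* True when the parameter lives inside the element itself, so a single
 * free(elem) releases it as well. */
static bool event_element_owns_param(const struct event_element *elem)
{
	assert(elem != NULL);
	return elem->param == (const void *)elem->param_alloc_data;
}

static void event_element_free(struct event_element *elem)
{
	free(elem);
}
```

Because the parameter copy sits in the same block as the element, one free() in the handler thread releases both, which is why the patch can drop the param_alloc_size field.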

