* [dpdk-dev] time taken for allocation of mempool.
@ 2019-11-13 5:07 Venumadhav Josyula
2019-11-13 5:12 ` Venumadhav Josyula
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-13 5:07 UTC (permalink / raw)
To: users, dev; +Cc: Venumadhav Josyula
Hi,

We are using 'rte_mempool_create' for allocation of flow memory. This has
been in place for a while. We just migrated to dpdk-18.11 from dpdk-17.05.
Here is the problem statement.

Problem statement:
In the new dpdk (18.11), 'rte_mempool_create' takes approximately 4.4 sec
for allocation, compared to the older dpdk (17.05). We have some 8-9
mempools for our entire product. We do upfront allocation for all of them
(i.e. when the dpdk application is coming up). Our application follows a
run-to-completion model.

Questions:
i) Is this acceptable / has anybody else seen such a thing?
ii) What has changed between the two dpdk versions (18.11 vs 17.05) from a
memory perspective?

Any pointers are welcome.

Thanks & regards,
Venu
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-13 5:07 [dpdk-dev] time taken for allocation of mempool Venumadhav Josyula
@ 2019-11-13 5:12 ` Venumadhav Josyula
2019-11-13 8:32 ` Olivier Matz
2019-11-13 9:19 ` Bruce Richardson
2019-11-18 16:45 ` Venumadhav Josyula
2 siblings, 1 reply; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-13 5:12 UTC (permalink / raw)
To: users, dev; +Cc: Venumadhav Josyula
Hi,

A few more points:

Operating system: CentOS 7.6
Logging mechanism: syslog

We logged via syslog before the call and again after the call.

Thanks & regards,
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-13 5:12 ` Venumadhav Josyula
@ 2019-11-13 8:32 ` Olivier Matz
2019-11-13 9:11 ` Venumadhav Josyula
0 siblings, 1 reply; 18+ messages in thread
From: Olivier Matz @ 2019-11-13 8:32 UTC (permalink / raw)
To: Venumadhav Josyula; +Cc: users, dev, Venumadhav Josyula
Hi Venu,
Could you give some more details about your use case? (hugepage size,
number of objects, object size, additional mempool flags, ...)

Did you manage to reproduce it in a small test example? We could do some
profiling to investigate.

Thanks for the feedback.
Olivier
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-13 8:32 ` Olivier Matz
@ 2019-11-13 9:11 ` Venumadhav Josyula
2019-11-13 9:30 ` Olivier Matz
0 siblings, 1 reply; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-13 9:11 UTC (permalink / raw)
To: Olivier Matz; +Cc: users, dev, Venumadhav Josyula
Hi Olivier,

> Could you give some more details about your use case? (hugepage size,
> number of objects, object size, additional mempool flags, ...)

Ours is a telecom product; we support multiple RATs. Let us take the
example of a 4G case where we act as a GTP-U proxy.

- Hugepage size: 2 MB
- rte_mempool_create in-parameters:
  { name="gtpu-mem",
    n=1500000,
    elt_size=224,
    cache_size=0,
    private_data_size=0,
    mp_init=NULL,
    mp_init_arg=NULL,
    obj_init=NULL,
    obj_init_arg=NULL,
    socket_id=rte_socket_id(),
    flags=MEMPOOL_F_SP_PUT }

> Did you manage to reproduce it in a small test example? We could do some
> profiling to investigate.

Not yet, but I would love to try that. Are there examples?
Thanks,
Regards,
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-13 5:07 [dpdk-dev] time taken for allocation of mempool Venumadhav Josyula
2019-11-13 5:12 ` Venumadhav Josyula
@ 2019-11-13 9:19 ` Bruce Richardson
2019-11-13 17:26 ` Burakov, Anatoly
2019-11-18 16:45 ` Venumadhav Josyula
2 siblings, 1 reply; 18+ messages in thread
From: Bruce Richardson @ 2019-11-13 9:19 UTC (permalink / raw)
To: Venumadhav Josyula; +Cc: users, dev, Venumadhav Josyula
Hi,

From 17.05 to 18.11 there was a change in the default memory model for
DPDK. In 17.05, all DPDK memory was allocated statically upfront and then
used for the memory pools. With 18.11, no large blocks of memory are
allocated at init time; instead, memory is requested from the kernel as the
app needs it. This makes the initial startup of an app faster, but the
allocation of new objects such as mempools slower, and it could be this
that you are seeing.

Some things to try:
1. Use the "--socket-mem" EAL flag to do an upfront allocation of memory
for use by your memory pools and see if it improves things.
2. Try using the "--legacy-mem" flag to revert to the old memory model.
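In command-line form, the two suggestions would look roughly like this (a
sketch only; "your-app", the core list and the memory amounts are
placeholders, and the right amounts depend on your mempool sizes):

```shell
# 1) Preallocate 2 GB on socket 0 up front, so mempool creation does not
#    have to request pages from the kernel on the fly:
./your-app -l 0-3 --socket-mem 2048,0 -- <application args>

# 2) Or revert entirely to the pre-18.11 static memory model:
./your-app -l 0-3 --legacy-mem -- <application args>
```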
Regards,
/Bruce
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-13 9:11 ` Venumadhav Josyula
@ 2019-11-13 9:30 ` Olivier Matz
0 siblings, 0 replies; 18+ messages in thread
From: Olivier Matz @ 2019-11-13 9:30 UTC (permalink / raw)
To: Venumadhav Josyula; +Cc: users, dev, Venumadhav Josyula
Hi Venu,
OK, those are quite big mempools (roughly 300 MB each), but I don't think
it should take that much time.

I suspect that using 1G hugepages could help, in case it is related to the
memory allocator.
> *> Did you manage to reproduce it in a small test example? We could do some
> profiling to investigate.*
>
> No I would love to try that ? Are there examples ?
The simplest way for me is to hack the unit tests. Add this code (not
tested) at the beginning of test_mempool.c:test_mempool():
	int i;

	for (i = 0; i < 100; i++) {
		struct rte_mempool *mp;

		mp = rte_mempool_create("test", 1500000, 224, 0, 0,
					NULL, NULL, NULL, NULL,
					SOCKET_ID_ANY, MEMPOOL_F_SP_PUT);
		if (mp == NULL) {
			printf("rte_mempool_create() failed\n");
			return -1;
		}
		rte_mempool_free(mp);
	}
	return 0;
Then, you can launch the test application and run your test with
"mempool_autotest". I suggest compiling with EXTRA_CFLAGS="-g", so you
can run "perf top" (https://perf.wiki.kernel.org/index.php/Main_Page) to
see where the time is spent. By using "perf record" / "perf report"
with options, you can also analyze the call stack.
Please share your results, especially comparison between 17.05 and 18.11.
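For the 18.11 make-based build, that workflow might look like the
following (the build target and paths are assumptions; adjust them for
your environment):

```shell
# Build DPDK and its unit-test app with debug symbols:
make T=x86_64-native-linuxapp-gcc install EXTRA_CFLAGS="-g"

# Profile the test app while it runs; type "mempool_autotest"
# at the RTE>> prompt:
perf record -g ./x86_64-native-linuxapp-gcc/app/test -l 0-1

# Inspect the call stacks afterwards:
perf report
```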
Thanks,
Olivier
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-13 9:19 ` Bruce Richardson
@ 2019-11-13 17:26 ` Burakov, Anatoly
2019-11-13 21:01 ` Venumadhav Josyula
2019-11-14 8:12 ` Venumadhav Josyula
0 siblings, 2 replies; 18+ messages in thread
From: Burakov, Anatoly @ 2019-11-13 17:26 UTC (permalink / raw)
To: Bruce Richardson, Venumadhav Josyula; +Cc: users, dev, Venumadhav Josyula
I would also add to this the fact that the mempool will, by default,
attempt to allocate IOVA-contiguous memory, with a fallback to
non-IOVA-contiguous memory whenever getting IOVA-contiguous memory isn't
possible.

If you are running in IOVA-as-PA mode (as would be the case if you are
using the igb_uio kernel driver), then, since it is now impossible to
preallocate large PA-contiguous chunks in advance, what will likely happen
is that the mempool will try to allocate IOVA-contiguous memory, fail, and
retry with non-IOVA-contiguous memory (essentially allocating memory
twice). For large mempools (or a large number of mempools) that can take a
bit of time.
The obvious workaround is using VFIO and IOVA as VA mode. This will
cause the allocator to be able to get IOVA-contiguous memory at the
outset, and allocation will complete faster.
The other two alternatives, already suggested in this thread by Bruce
and Olivier, are:
1) use bigger page sizes (such as 1G)
2) use legacy mode (and lose out on all of the benefits provided by the
new memory model)
The recommended solution is to use VFIO/IOMMU, and IOVA as VA mode.
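A minimal sketch of that recommendation (the PCI address and application
name are placeholders; note that an explicit --iova-mode EAL option only
appeared in releases after 18.11, where the mode is otherwise inferred
from the bound driver):

```shell
# Bind the NIC to vfio-pci so the IOMMU is used:
modprobe vfio-pci
./usertools/dpdk-devbind.py --bind=vfio-pci 0000:03:00.0

# With vfio-pci and a working IOMMU, EAL can pick IOVA-as-VA; newer
# releases also accept an explicit request:
./your-app -l 0-3 --iova-mode va -- <application args>
```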
--
Thanks,
Anatoly
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-13 17:26 ` Burakov, Anatoly
@ 2019-11-13 21:01 ` Venumadhav Josyula
2019-11-14 9:44 ` Burakov, Anatoly
2019-11-14 8:12 ` Venumadhav Josyula
1 sibling, 1 reply; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-13 21:01 UTC (permalink / raw)
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula
Hi Anatoly,

Without specifying the --iova-mode option, is iova-mode=pa the default?

Thanks,
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-13 17:26 ` Burakov, Anatoly
2019-11-13 21:01 ` Venumadhav Josyula
@ 2019-11-14 8:12 ` Venumadhav Josyula
2019-11-14 9:49 ` Burakov, Anatoly
1 sibling, 1 reply; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-14 8:12 UTC (permalink / raw)
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula
Hi Olivier, Bruce,

- We were using the --socket-mem EAL flag.
- We wanted to avoid going back to legacy mode.
- We also wanted to avoid 1G hugepages.

Thanks for your inputs.

Hi Anatoly,

We were using VFIO with IOMMU, but by default it is iova-mode=pa. After
changing to iova-mode=va via the EAL, mempool allocation times came down
drastically, from ~4.4 sec to 0.165254 sec.

Thanks and regards,
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-13 21:01 ` Venumadhav Josyula
@ 2019-11-14 9:44 ` Burakov, Anatoly
2019-11-14 9:50 ` Venumadhav Josyula
0 siblings, 1 reply; 18+ messages in thread
From: Burakov, Anatoly @ 2019-11-14 9:44 UTC (permalink / raw)
To: Venumadhav Josyula; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula
On 13-Nov-19 9:01 PM, Venumadhav Josyula wrote:
> Hi Anatoly,
>
> Without specifying the --iova-mode option, is iova-mode=pa the default?
>
> Thanks
> Venu
>
In 18.11, there is a very specific set of circumstances that will
default to IOVA as VA mode. Future releases have become more aggressive,
to the point of IOVA as VA mode being the default unless asked
otherwise. So yes, it is highly likely that in your case, IOVA as PA is
picked as the default.
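Since 18.11 infers the IOVA mode from the devices and their kernel
drivers, a quick way to see what your NICs are bound to is the devbind
script (path relative to the DPDK source tree):

```shell
# Devices bound to vfio-pci (with a working IOMMU) allow IOVA-as-VA;
# igb_uio or uio_pci_generic force IOVA-as-PA:
./usertools/dpdk-devbind.py --status
```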
--
Thanks,
Anatoly
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-14 8:12 ` Venumadhav Josyula
@ 2019-11-14 9:49 ` Burakov, Anatoly
2019-11-14 9:53 ` Venumadhav Josyula
0 siblings, 1 reply; 18+ messages in thread
From: Burakov, Anatoly @ 2019-11-14 9:49 UTC (permalink / raw)
To: Venumadhav Josyula; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula
On 14-Nov-19 8:12 AM, Venumadhav Josyula wrote:
> Hi Olivier, Bruce,
>
> * We were using the --socket-mem EAL flag.
> * We wanted to avoid going back to legacy mode.
> * We also wanted to avoid 1G hugepages.
>
> Thanks for your inputs.
>
> Hi Anatoly,
>
> We were using VFIO with IOMMU, but by default it is iova-mode=pa. After
> changing to iova-mode=va via the EAL, mempool allocation times came down
> drastically, from ~4.4 sec to 0.165254 sec.
>
> Thanks and regards
> Venu
That's great to hear.
As a final note, --socket-mem is no longer necessary, because 18.11 will
allocate memory as needed. It is, however, still advisable to use it if
you could end up in a situation where the runtime allocation might fail
(such as when other applications are running on your system and DPDK has
to compete for hugepage memory).

I would also suggest using --limit-mem if you want to limit the maximum
amount of memory DPDK will be able to allocate. This will make DPDK behave
similarly to older releases in that it will not attempt to allocate more
memory than you allow it.
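Combined with the earlier --socket-mem advice, that might look like the
following (a sketch; in released EAL versions the per-socket cap flag is
spelled --socket-limit, so verify the spelling against your version's EAL
options):

```shell
# Preallocate 1 GB on socket 0 and cap DPDK's total allocation there
# at 2 GB (flag spellings vary by release; check with --help):
./your-app -l 0-3 --socket-mem 1024,0 --socket-limit 2048,0 -- <application args>
```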
--
Thanks,
Anatoly
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-14 9:44 ` Burakov, Anatoly
@ 2019-11-14 9:50 ` Venumadhav Josyula
2019-11-14 9:57 ` Burakov, Anatoly
0 siblings, 1 reply; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-14 9:50 UTC (permalink / raw)
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula
Hi Anatoly,

Thanks for the quick response. We want to understand whether there will be
performance implications of iova-mode=va, specifically in terms of the
following:

- cache misses
- branch misses, etc.
- VA-to-physical-address translation when a packet is received

Thanks and regards,
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-14 9:49 ` Burakov, Anatoly
@ 2019-11-14 9:53 ` Venumadhav Josyula
0 siblings, 0 replies; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-14 9:53 UTC (permalink / raw)
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula
Hi Anatoly,
> I would also suggest using --limit-mem if you desire to limit the
> maximum amount of memory DPDK will be able to allocate.
We are already using that.
Thanks and regards,
Venu
> > possible.
> >
> > If you are running in IOVA as PA mode (such as would be the case if
> you
> > are using igb_uio kernel driver), then, since it is now impossible to
> > preallocate large PA-contiguous chunks in advance, what will likely
> > happen in this case is, mempool will try to allocate IOVA-contiguous
> > memory, fail and retry with non-IOVA contiguous memory (essentially
> > allocating memory twice). For large mempools (or large number of
> > mempools) that can take a bit of time.
> >
> > The obvious workaround is using VFIO and IOVA as VA mode. This will
> > cause the allocator to be able to get IOVA-contiguous memory at the
> > outset, and allocation will complete faster.
> >
> > The other two alternatives, already suggested in this thread by Bruce
> > and Olivier, are:
> >
> > 1) use bigger page sizes (such as 1G)
> > 2) use legacy mode (and lose out on all of the benefits provided by
> the
> > new memory model)
> >
> > The recommended solution is to use VFIO/IOMMU, and IOVA as VA mode.
> >
> > --
> > Thanks,
> > Anatoly
> >
>
>
> --
> Thanks,
> Anatoly
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-14 9:50 ` Venumadhav Josyula
@ 2019-11-14 9:57 ` Burakov, Anatoly
2019-11-18 16:43 ` Venumadhav Josyula
0 siblings, 1 reply; 18+ messages in thread
From: Burakov, Anatoly @ 2019-11-14 9:57 UTC (permalink / raw)
To: Venumadhav Josyula; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula
On 14-Nov-19 9:50 AM, Venumadhav Josyula wrote:
> Hi Anatoly,
>
> Thanks for quick response. We want to understand, if there will be
> performance implications because of iova-mode being va. We want to
> understand, specifically in terms following
>
> * cache misses
> * Branch misses etc
> * translation of va addr -> phy addr when packet is receieved
>
There will be no impact whatsoever. You mentioned that you were already
using VFIO, so you were already making use of IOMMU*. Cache/branch
misses are independent of IOVA layout, and translations are done by the
hardware (in either IOVA as PA or IOVA as VA case - IOMMU doesn't care
what you program it with, it still does the translation, even if it's a
1:1 IOVA-to-PA mapping), so there is nothing that can cause degradation.
In fact, under some circumstances, using IOVA as VA mode can be used to
get performance /gains/, because the code can take advantage of the fact
that there are large IOVA-contiguous segments and no page-by-page
allocations. Some drivers (IIRC octeontx mempool?) even refuse to work
in IOVA as PA mode due to huge overheads of page-by-page buffer offset
tracking.
TL;DR you'll be fine :)
* Using an IOMMU can /theoretically/ affect performance due to hardware
IOVA->PA translation and IOTLB cache misses. In practice, I have never
been able to observe /any/ effect whatsoever on performance when using
IOMMU vs. without using IOMMU, so this appears to not be a concern /in
practice/.
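As a minimal sketch of the recommended setup (the application name and core list are illustrative, and the explicit flag assumes a release where the IOVA mode can be overridden on the EAL command line):

```shell
# Force IOVA-as-VA mode; requires NICs bound to vfio-pci with the IOMMU enabled.
./my_dpdk_app -l 0-3 -n 4 --iova-mode=va
```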
> Thanks and regards
> Venu
>
> On Thu, 14 Nov 2019 at 15:14, Burakov, Anatoly
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>
> On 13-Nov-19 9:01 PM, Venumadhav Josyula wrote:
> > Hi Anatoly,
> >
> > By default w/o specifying --iova-mode option is iova-mode=pa by
> default ?
> >
> > Thanks
> > Venu
> >
>
> In 18.11, there is a very specific set of circumstances that will
> default to IOVA as VA mode. Future releases have become more
> aggressive,
> to the point of IOVA as VA mode being the default unless asked
> otherwise. So yes, it is highly likely that in your case, IOVA as PA is
> picked as the default.
>
> --
> Thanks,
> Anatoly
>
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-14 9:57 ` Burakov, Anatoly
@ 2019-11-18 16:43 ` Venumadhav Josyula
2019-12-06 10:47 ` Burakov, Anatoly
0 siblings, 1 reply; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-18 16:43 UTC (permalink / raw)
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula
Hi Anatoly,
After switching to iova-mode=va, I see my ports are no longer getting detected. I thought it was working, but I see the following:
i) Allocation is faster.
ii) But my ports are not getting detected.
What could be the problem? I take back my statement that it is entirely working.
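A first thing to check in this situation (a general suggestion, with paths and the PCI address being illustrative placeholders) is the kernel driver binding, since IOVA-as-VA generally requires vfio-pci:

```shell
# Show which driver each NIC is bound to; ports typically must be bound
# to vfio-pci for IOVA-as-VA mode to work.
./usertools/dpdk-devbind.py --status
# Rebind a port (the PCI address is an example placeholder).
./usertools/dpdk-devbind.py --bind=vfio-pci 0000:03:00.0
```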
Thanks,
Regards,
Venu
On Thu, 14 Nov 2019 at 15:27, Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:
> On 14-Nov-19 9:50 AM, Venumadhav Josyula wrote:
> > Hi Anatoly,
> >
> > Thanks for quick response. We want to understand, if there will be
> > performance implications because of iova-mode being va. We want to
> > understand, specifically in terms following
> >
> > * cache misses
> > * Branch misses etc
> > * translation of va addr -> phy addr when packet is receieved
> >
>
> There will be no impact whatsoever. You mentioned that you were already
> using VFIO, so you were already making use of IOMMU*. Cache/branch
> misses are independent of IOVA layout, and translations are done by the
> hardware (in either IOVA as PA or IOVA as VA case - IOMMU doesn't care
> what you program it with, it still does the translation, even if it's a
> 1:1 IOVA-to-PA mapping), so there is nothing that can cause degradation.
>
> In fact, under some circumstances, using IOVA as VA mode can be used to
> get performance /gains/, because the code can take advantage of the fact
> that there are large IOVA-contiguous segments and no page-by-page
> allocations. Some drivers (IIRC octeontx mempool?) even refuse to work
> in IOVA as PA mode due to huge overheads of page-by-page buffer offset
> tracking.
>
> TL;DR you'll be fine :)
>
> * Using an IOMMU can /theoretically/ affect performance due to hardware
> IOVA->PA translation and IOTLB cache misses. In practice, i have never
> been able to observe /any/ effect whatsoever on performance when using
> IOMMU vs. without using IOMMU, so this appears to not be a concern /in
> practice/.
>
> > Thanks and regards
> > Venu
> >
> > On Thu, 14 Nov 2019 at 15:14, Burakov, Anatoly
> > <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> >
> > On 13-Nov-19 9:01 PM, Venumadhav Josyula wrote:
> > > Hi Anatoly,
> > >
> > > By default w/o specifying --iova-mode option is iova-mode=pa by
> > default ?
> > >
> > > Thanks
> > > Venu
> > >
> >
> > In 18.11, there is a very specific set of circumstances that will
> > default to IOVA as VA mode. Future releases have become more
> > aggressive,
> > to the point of IOVA as VA mode being the default unless asked
> > otherwise. So yes, it is highly likely that in your case, IOVA as PA
> is
> > picked as the default.
> >
> > --
> > Thanks,
> > Anatoly
> >
>
>
> --
> Thanks,
> Anatoly
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-13 5:07 [dpdk-dev] time taken for allocation of mempool Venumadhav Josyula
2019-11-13 5:12 ` Venumadhav Josyula
2019-11-13 9:19 ` Bruce Richardson
@ 2019-11-18 16:45 ` Venumadhav Josyula
2 siblings, 0 replies; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-18 16:45 UTC (permalink / raw)
To: users, dev; +Cc: Venumadhav Josyula
Please note that I am using dpdk-18.11...
On Wed, 13 Nov, 2019, 10:37 am Venumadhav Josyula, <vjosyula@gmail.com>
wrote:
> Hi ,
> We are using 'rte_mempool_create' for allocation of flow memory. This has
> been there for a while. We just migrated to dpdk-18.11 from dpdk-17.05. Now
> here is problem statement
>
> Problem statement :
> In new dpdk ( 18.11 ), the 'rte_mempool_create' take approximately ~4.4
> sec for allocation compared to older dpdk (17.05). We have som 8-9 mempools
> for our entire product. We do upfront allocation for all of them ( i.e.
> when dpdk application is coming up). Our application is run to completion
> model.
>
> Questions:-
> i) is that acceptable / has anybody seen such a thing ?
> ii) What has changed between two dpdk versions ( 18.11 v/s 17.05 ) from
> memory perspective ?
>
> Any pointer are welcome.
>
> Thanks & regards
> Venu
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-11-18 16:43 ` Venumadhav Josyula
@ 2019-12-06 10:47 ` Burakov, Anatoly
2019-12-06 10:49 ` Venumadhav Josyula
0 siblings, 1 reply; 18+ messages in thread
From: Burakov, Anatoly @ 2019-12-06 10:47 UTC (permalink / raw)
To: Venumadhav Josyula; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula
On 18-Nov-19 4:43 PM, Venumadhav Josyula wrote:
> Hi Anatoly,
>
> After using iova-mode=va, i see my ports are not getting detected ? I
> thought it's working but I see following problem
>
> what could be the problem?
> i) I see allocation is faster
> ii) But my ports are not getting detected
> I take my word back that it entirely working..
>
> Thanks,
> Regards,
> Venu
>
"Ports are not getting detected" is a pretty vague description of the
problem. Could you please post the EAL initialization log (preferably
with --log-level=eal,8 added, so that there's more output)?
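For example (the binary name is illustrative; the comma syntax for --log-level follows what is suggested above):

```shell
# Capture a verbose EAL initialization log for debugging.
./my_dpdk_app -l 0-3 -n 4 --iova-mode=va --log-level=eal,8 2>&1 | tee eal_init.log
```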
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] time taken for allocation of mempool.
2019-12-06 10:47 ` Burakov, Anatoly
@ 2019-12-06 10:49 ` Venumadhav Josyula
0 siblings, 0 replies; 18+ messages in thread
From: Venumadhav Josyula @ 2019-12-06 10:49 UTC (permalink / raw)
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula
Hi Anatoly,
I was able to resolve the problem; it was a problem in our script.
Thanks and regards
Venu
On Fri, 6 Dec 2019 at 16:17, Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:
> On 18-Nov-19 4:43 PM, Venumadhav Josyula wrote:
> > Hi Anatoly,
> >
> > After using iova-mode=va, i see my ports are not getting detected ? I
> > thought it's working but I see following problem
> >
> > what could be the problem?
> > i) I see allocation is faster
> > ii) But my ports are not getting detected
> > I take my word back that it entirely working..
> >
> > Thanks,
> > Regards,
> > Venu
> >
>
> "Ports are not getting detected" is a pretty vague description of the
> problem. Could you please post the EAL initialization log (preferably
> with --log-level=eal,8 added, so that there's more output)?
>
> --
> Thanks,
> Anatoly
>
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2019-12-06 10:50 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-13 5:07 [dpdk-dev] time taken for allocation of mempool Venumadhav Josyula
2019-11-13 5:12 ` Venumadhav Josyula
2019-11-13 8:32 ` Olivier Matz
2019-11-13 9:11 ` Venumadhav Josyula
2019-11-13 9:30 ` Olivier Matz
2019-11-13 9:19 ` Bruce Richardson
2019-11-13 17:26 ` Burakov, Anatoly
2019-11-13 21:01 ` Venumadhav Josyula
2019-11-14 9:44 ` Burakov, Anatoly
2019-11-14 9:50 ` Venumadhav Josyula
2019-11-14 9:57 ` Burakov, Anatoly
2019-11-18 16:43 ` Venumadhav Josyula
2019-12-06 10:47 ` Burakov, Anatoly
2019-12-06 10:49 ` Venumadhav Josyula
2019-11-14 8:12 ` Venumadhav Josyula
2019-11-14 9:49 ` Burakov, Anatoly
2019-11-14 9:53 ` Venumadhav Josyula
2019-11-18 16:45 ` Venumadhav Josyula