* [dpdk-dev] time taken for allocation of mempool.
From: Venumadhav Josyula @ 2019-11-13 5:07 UTC
To: users, dev; +Cc: Venumadhav Josyula

Hi,

We are using 'rte_mempool_create' to allocate flow memory. This has been in place for a while. We just migrated from dpdk-17.05 to dpdk-18.11.

Problem statement:
In the new DPDK (18.11), 'rte_mempool_create' takes approximately 4.4 sec for allocation, noticeably longer than with the older DPDK (17.05). We have some 8-9 mempools for our entire product, and we allocate all of them upfront (i.e. when the DPDK application is coming up). Our application uses the run-to-completion model.

Questions:
i) Is that acceptable / has anybody seen such a thing?
ii) What has changed between the two DPDK versions (18.11 vs 17.05) from a memory perspective?

Any pointers are welcome.

Thanks & regards
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Venumadhav Josyula @ 2019-11-13 5:12 UTC
To: users, dev; +Cc: Venumadhav Josyula

Hi,

A few more points:

Operating system: CentOS 7.6
Logging mechanism: syslog

We logged via syslog before the call and again after the call.

Thanks & Regards
Venu
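The before/after measurement described in the message above can be sketched in plain C as follows. This is only an illustration, not the code used in the product: `create_pools()` is a hypothetical stand-in for whatever calls `rte_mempool_create()`, and the helper names are made up for this sketch. It reads the clock directly instead of relying on syslog timestamps.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Difference between two timestamps, in seconds. */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    return (double)(end->tv_sec - start->tv_sec) +
           (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Time one call to fn() and return the elapsed wall-clock seconds.
 * timespec_get() is standard C11. On Linux, clock_gettime() with
 * CLOCK_MONOTONIC is usually preferable because it is not affected
 * by adjustments of the system clock.
 */
static double time_call(void (*fn)(void))
{
    struct timespec t0, t1;

    timespec_get(&t0, TIME_UTC);
    fn();
    timespec_get(&t1, TIME_UTC);
    return elapsed_seconds(&t0, &t1);
}

/* Hypothetical stand-in for the code that creates the mempools. */
static void create_pools(void)
{
}

/* Print how long fn() took, labelled for the log. */
static void time_and_report(const char *label, void (*fn)(void))
{
    printf("%s took %.6f s\n", label, time_call(fn));
}
```

Calling `time_and_report("mempool setup", create_pools)` once per DPDK version gives numbers that can be compared directly.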
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Olivier Matz @ 2019-11-13 8:32 UTC
To: Venumadhav Josyula; +Cc: users, dev, Venumadhav Josyula

Hi Venu,

Could you give some more details about your use case? (hugepage size, number of objects, object size, additional mempool flags, ...)

Did you manage to reproduce it in a small test example? We could do some profiling to investigate.

Thanks for the feedback.
Olivier
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Venumadhav Josyula @ 2019-11-13 9:11 UTC
To: Olivier Matz; +Cc: users, dev, Venumadhav Josyula

Hi Olivier,

> Could you give some more details about your use case? (hugepage size,
> number of objects, object size, additional mempool flags, ...)

Ours is a telecom product, and we support multiple RATs. Take the 4G case as an example, where we act as a GTP-U proxy.

- Hugepage size: 2 MB
- rte_mempool_create input parameters:
    name="gtpu-mem",
    n=1500000,
    elt_size=224,
    cache_size=0,
    private_data_size=0,
    mp_init=NULL,
    mp_init_arg=NULL,
    obj_init=NULL,
    obj_init_arg=NULL,
    socket_id=rte_socket_id(),
    flags=MEMPOOL_F_SP_PUT

> Did you manage to reproduce it in a small test example? We could do some
> profiling to investigate.

No, but I would love to try that. Are there examples?

Thanks,
Regards,
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Olivier Matz @ 2019-11-13 9:30 UTC
To: Venumadhav Josyula; +Cc: users, dev, Venumadhav Josyula

Hi Venu,

On Wed, Nov 13, 2019 at 02:41:04PM +0530, Venumadhav Josyula wrote:
> - Hugepage size: 2 MB
> - rte_mempool_create input parameters:
>     name="gtpu-mem", n=1500000, elt_size=224, cache_size=0,
>     private_data_size=0, mp_init=NULL, mp_init_arg=NULL,
>     obj_init=NULL, obj_init_arg=NULL, socket_id=rte_socket_id(),
>     flags=MEMPOOL_F_SP_PUT

OK, those are quite big mempools (~300MB), but I don't think it should take that much time. I suspect that using 1G hugepages could help, in case it is related to the memory allocator.

> No, but I would love to try that. Are there examples?

The simplest way for me is to hack the unit tests. Add this code (not tested) at the beginning of test_mempool.c:test_mempool():

	int i;

	for (i = 0; i < 100; i++) {
		struct rte_mempool *mp;

		mp = rte_mempool_create("test", 1500000, 224, 0, 0,
				NULL, NULL, NULL, NULL, SOCKET_ID_ANY,
				MEMPOOL_F_SP_PUT);
		if (mp == NULL) {
			printf("rte_mempool_create() failed\n");
			return -1;
		}
		rte_mempool_free(mp);
	}
	return 0;

Then you can launch the test application and run your test with "mempool_autotest". I suggest compiling with EXTRA_CFLAGS="-g", so you can run "perf top" (https://perf.wiki.kernel.org/index.php/Main_Page) to see where the time is spent. By using "perf record" / "perf report" with options, you can also analyze the call stack.

Please share your results, especially the comparison between 17.05 and 18.11.

Thanks,
Olivier
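The "~300MB" estimate and the hugepage suggestion above can be checked with a little arithmetic. The sketch below is a rough lower bound only: it multiplies the object count by the object size from the thread (1500000 x 224 bytes) and ignores the per-object header/trailer and alignment padding that the mempool library adds, so the real footprint is larger. The function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
#define GIB (1024ULL * MIB)

/*
 * Raw payload of a pool: object count times object size.
 * Assumption: no per-object header/trailer and no padding are counted.
 */
static uint64_t pool_payload_bytes(uint64_t n, uint64_t elt_size)
{
    return n * elt_size;
}

/* Minimum number of pages of page_size bytes needed to hold `bytes`. */
static uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}
```

For the parameters in this thread, the payload is 336,000,000 bytes (about 320 MiB). That needs at least 161 hugepages of 2 MB but fits in a single 1 GB hugepage, which is one plausible reason why the page size could matter if the time is spent in the memory allocator.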
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Bruce Richardson @ 2019-11-13 9:19 UTC
To: Venumadhav Josyula; +Cc: users, dev, Venumadhav Josyula

On Wed, Nov 13, 2019 at 10:37:57AM +0530, Venumadhav Josyula wrote:
> Questions:
> i) Is that acceptable / has anybody seen such a thing?
> ii) What has changed between the two DPDK versions (18.11 vs 17.05) from
> a memory perspective?
>
> Any pointers are welcome.

Hi,

From 17.05 to 18.11 there was a change in the default memory model for DPDK. In 17.05, all DPDK memory was allocated statically upfront and then used for the memory pools. With 18.11, no large blocks of memory are allocated at init time; instead, memory is requested from the kernel as it is needed by the app. This makes the initial startup of an app faster, but the allocation of new objects like mempools slower, and it could be this that you are seeing.

Some things to try:
1. Use the "--socket-mem" EAL flag to do an upfront allocation of memory for use by your memory pools and see if it improves things.
2. Try using the "--legacy-mem" flag to revert to the old memory model.

Regards,
/Bruce
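As a rough illustration of the two suggestions above, the following self-contained sketch assembles an EAL-style argument vector containing "--socket-mem" and, optionally, "--legacy-mem". The helper `build_eal_argv()` and the value "1024,0" (megabytes per NUMA socket) are assumptions made for this example and are not taken from the thread. In a real application the resulting vector would be handed to `rte_eal_init(int argc, char **argv)`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill argv[] with an EAL-style command line and return argc,
 * or -1 if the inputs are invalid or argv[] is too small.
 * socket_mem may be NULL to omit "--socket-mem".
 */
static int build_eal_argv(const char *argv[], size_t cap,
                          const char *prog, const char *socket_mem,
                          int legacy_mem)
{
    size_t n = 0;
    /* program name + optional flags + terminating NULL */
    size_t need = 1 + (socket_mem ? 2 : 0) + (legacy_mem ? 1 : 0) + 1;

    assert(argv != NULL && prog != NULL);
    if (socket_mem != NULL && strlen(socket_mem) == 0)
        return -1;
    if (cap < need)
        return -1;

    argv[n++] = prog;
    if (socket_mem != NULL) {
        argv[n++] = "--socket-mem";   /* preallocate memory at startup */
        argv[n++] = socket_mem;
    }
    if (legacy_mem)
        argv[n++] = "--legacy-mem";   /* revert to the old memory model */
    argv[n] = NULL;
    return (int)n;
}
```

Building the vector in one place makes it easy to measure the same mempool setup with each flag toggled separately, which is what the two suggestions amount to.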
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Burakov, Anatoly @ 2019-11-13 17:26 UTC
To: Bruce Richardson, Venumadhav Josyula; +Cc: users, dev, Venumadhav Josyula

On 13-Nov-19 9:19 AM, Bruce Richardson wrote:
> Some things to try:
> 1. Use the "--socket-mem" EAL flag to do an upfront allocation of memory
> for use by your memory pools and see if it improves things.
> 2. Try using the "--legacy-mem" flag to revert to the old memory model.

I would also add to this the fact that the mempool will, by default, attempt to allocate IOVA-contiguous memory, with a fallback to non-IOVA-contiguous memory whenever getting IOVA-contiguous memory isn't possible.

If you are running in IOVA as PA mode (as would be the case if you are using the igb_uio kernel driver), then, since it is now impossible to preallocate large PA-contiguous chunks in advance, what will likely happen is that the mempool will try to allocate IOVA-contiguous memory, fail, and retry with non-IOVA-contiguous memory (essentially allocating memory twice). For large mempools (or a large number of mempools) that can take a bit of time.

The obvious workaround is using VFIO and IOVA as VA mode. This lets the allocator get IOVA-contiguous memory at the outset, and allocation will complete faster.

The other two alternatives, already suggested in this thread by Bruce and Olivier, are:

1) use bigger page sizes (such as 1G)
2) use legacy mode (and lose out on all of the benefits provided by the new memory model)

The recommended solution is to use VFIO/IOMMU, and IOVA as VA mode.

--
Thanks,
Anatoly
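The "try contiguous, fail, retry" behaviour described above can be pictured with a toy model. This is not DPDK's implementation: `reserve_fn`, `reserve_with_fallback()` and `toy_reserve()` are hypothetical names, and the rule that a contiguous request larger than `max_contig` always fails is a simplifying assumption used only to show why the work may be done twice in IOVA as PA mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical reservation callback: returns true when `len` bytes
 * could be reserved; `contig` asks for IOVA-contiguous memory.
 */
typedef bool (*reserve_fn)(size_t len, bool contig, void *ctx);

/*
 * Try an IOVA-contiguous reservation first, then retry without the
 * constraint. Returns the number of attempts made (1 or 2), or 0 if
 * both attempts failed.
 */
static int reserve_with_fallback(reserve_fn reserve, size_t len, void *ctx)
{
    assert(reserve != NULL);
    if (reserve(len, true, ctx))
        return 1;
    if (reserve(len, false, ctx))
        return 2;
    return 0;
}

/* Toy environment: largest contiguous block available, plus a call counter. */
struct toy_env {
    size_t max_contig;
    int calls;
};

/* Toy allocator: contiguous requests above max_contig fail (assumption). */
static bool toy_reserve(size_t len, bool contig, void *ctx)
{
    struct toy_env *env = ctx;

    env->calls++;
    return !contig || len <= env->max_contig;
}
```

With `max_contig` set to one 2 MB page (standing in for IOVA as PA), a 336,000,000-byte request takes two attempts. With an effectively unlimited `max_contig` (standing in for IOVA as VA), the first attempt succeeds.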
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Venumadhav Josyula @ 2019-11-13 21:01 UTC
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula

Hi Anatoly,

If the --iova-mode option is not specified, is iova-mode=pa the default?

Thanks
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Burakov, Anatoly @ 2019-11-14 9:44 UTC
To: Venumadhav Josyula; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula

On 13-Nov-19 9:01 PM, Venumadhav Josyula wrote:
> If the --iova-mode option is not specified, is iova-mode=pa the default?

In 18.11, there is a very specific set of circumstances that will default to IOVA as VA mode. Later releases have become more aggressive, to the point of IOVA as VA mode being the default unless asked otherwise. So yes, it is highly likely that in your case IOVA as PA is picked as the default.

--
Thanks,
Anatoly
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Venumadhav Josyula @ 2019-11-14 9:50 UTC
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula

Hi Anatoly,

Thanks for the quick response. We want to understand whether there will be performance implications from using iova-mode=va, specifically in terms of the following:

- cache misses
- branch misses etc.
- translation of VA -> physical address when a packet is received

Thanks and regards
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Burakov, Anatoly @ 2019-11-14 9:57 UTC
To: Venumadhav Josyula; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula

On 14-Nov-19 9:50 AM, Venumadhav Josyula wrote:
> We want to understand whether there will be performance implications from
> using iova-mode=va, specifically in terms of the following:
>
> * cache misses
> * branch misses etc.
> * translation of VA -> physical address when a packet is received

There will be no impact whatsoever. You mentioned that you were already using VFIO, so you were already making use of the IOMMU*. Cache/branch misses are independent of IOVA layout, and translations are done by the hardware (in either the IOVA as PA or the IOVA as VA case - the IOMMU doesn't care what you program it with, it still does the translation, even if it's a 1:1 IOVA-to-PA mapping), so there is nothing that can cause degradation.

In fact, under some circumstances, IOVA as VA mode can yield performance /gains/, because the code can take advantage of the fact that there are large IOVA-contiguous segments and no page-by-page allocations. Some drivers (IIRC octeontx mempool?) even refuse to work in IOVA as PA mode due to the huge overhead of page-by-page buffer offset tracking.

TL;DR you'll be fine :)

* Using an IOMMU can /theoretically/ affect performance due to hardware IOVA->PA translation and IOTLB cache misses. In practice, I have never been able to observe /any/ effect whatsoever on performance when using the IOMMU vs. not using it, so this appears not to be a concern /in practice/.

--
Thanks,
Anatoly
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Venumadhav Josyula @ 2019-11-18 16:43 UTC
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula

Hi Anatoly,

After switching to iova-mode=va, I see that my ports are not getting detected. I thought it was working, but I now see the following:

i) Allocation is faster.
ii) But my ports are not getting detected.

What could be the problem? I take back my statement that it is entirely working.

Thanks,
Regards,
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Burakov, Anatoly @ 2019-12-06 10:47 UTC
To: Venumadhav Josyula; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula

On 18-Nov-19 4:43 PM, Venumadhav Josyula wrote:
> After switching to iova-mode=va, I see that my ports are not getting
> detected.
>
> What could be the problem?

"Ports are not getting detected" is a pretty vague description of the problem. Could you please post the EAL initialization log (preferably with --log-level=eal,8 added, so that there's more output)?

--
Thanks,
Anatoly
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Venumadhav Josyula @ 2019-12-06 10:49 UTC
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula

Hi Anatoly,

I was able to resolve the problem; it turned out to be an issue in our script.

Thanks and regards
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
From: Venumadhav Josyula @ 2019-11-14 8:12 UTC
To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula

Hi Olivier, Bruce,

- We were already using the --socket-mem EAL flag.
- We wanted to avoid going back to legacy mode.
- We also wanted to avoid 1G hugepages.

Thanks for your inputs.

Hi Anatoly,

We were using VFIO with the IOMMU, but by default it is iova-mode=pa. After changing to iova-mode=va via the EAL options, the allocation time for the mempools came down drastically: from ~4.4 sec to 0.165254 sec.

Thanks and regards
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
  2019-11-14  8:12 ` Venumadhav Josyula
@ 2019-11-14  9:49 ` Burakov, Anatoly
  2019-11-14  9:53 ` Venumadhav Josyula
  0 siblings, 1 reply; 18+ messages in thread
From: Burakov, Anatoly @ 2019-11-14  9:49 UTC (permalink / raw)
  To: Venumadhav Josyula; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula

On 14-Nov-19 8:12 AM, Venumadhav Josyula wrote:
> Hi Olivier, Bruce,
>
>   * We were using the --socket-mem EAL flag.
>   * We did not want to go back to legacy mode.
>   * We also wanted to avoid 1G huge-pages.
>
> Thanks for your inputs.
>
> Hi Anatoly,
>
> We were using vfio with iommu, but by default it is iova-mode=pa. After
> changing to iova-mode=va via EAL, the allocation time for mempools came
> down drastically: from ~4.4 sec to 0.165254 sec.
>
> Thanks and regards
> Venu

That's great to hear.

As a final note, --socket-mem is no longer necessary, because 18.11 will
allocate memory as needed. It is however still advisable to use it if
you expect to end up in a situation where the runtime allocation
could conceivably fail (such as if you have other applications running
on your system, and DPDK has to compete for hugepage memory).

I would also suggest using --limit-mem if you desire to limit the
maximum amount of memory DPDK will be able to allocate. This will make
DPDK behave similarly to older releases in that it will not attempt to
allocate more memory than you allow it.

--
Thanks,
Anatoly
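The options discussed above can be sketched as an EAL argument vector built in C. This is only an illustration under stated assumptions: `struct eal_args` and `build_eal_args` are hypothetical names, the megabyte values are placeholders, and a single NUMA socket is assumed (multi-socket systems take a comma-separated list per socket). Note also that the thread says `--limit-mem`, while the EAL option list, as far as I know, names the per-socket cap `--socket-limit`; please check your build's `--help` output before relying on either spelling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EAL_ARG_MAX 4   /* program name + the three options below */
#define EAL_ARG_LEN 64  /* longer strings are truncated by snprintf */

/* Hypothetical container for an argv that owns its own strings. */
struct eal_args {
    int   argc;
    char  storage[EAL_ARG_MAX][EAL_ARG_LEN];
    char *argv[EAL_ARG_MAX + 1];   /* NULL-terminated */
};

/*
 * Fill 'a' with: prog --iova-mode=va [--socket-mem=N] [--socket-limit=M]
 * A value of 0 means "leave that option out". Returns argc.
 */
int build_eal_args(struct eal_args *a, const char *prog,
                   unsigned socket_mem_mb, unsigned socket_limit_mb)
{
    int n = 0;
    int i;

    assert(a != NULL && prog != NULL);
    memset(a, 0, sizeof(*a));

    snprintf(a->storage[n++], EAL_ARG_LEN, "%s", prog);
    /* IOVA as VA: the mode that brought the allocation time down. */
    snprintf(a->storage[n++], EAL_ARG_LEN, "--iova-mode=va");
    /* Optional upfront reservation, useful when hugepages are contended. */
    if (socket_mem_mb != 0)
        snprintf(a->storage[n++], EAL_ARG_LEN,
                 "--socket-mem=%u", socket_mem_mb);
    /* Optional upper bound on what DPDK may allocate at runtime. */
    if (socket_limit_mb != 0)
        snprintf(a->storage[n++], EAL_ARG_LEN,
                 "--socket-limit=%u", socket_limit_mb);

    for (i = 0; i < n; i++)
        a->argv[i] = a->storage[i];
    a->argv[n] = NULL;
    a->argc = n;
    return n;
}
```

In a DPDK application the resulting `a.argc` and `a.argv` would then be passed to `rte_eal_init()`; that call is left out here so the sketch compiles without the DPDK headers.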
* Re: [dpdk-dev] time taken for allocation of mempool.
  2019-11-14  9:49 ` Burakov, Anatoly
@ 2019-11-14  9:53 ` Venumadhav Josyula
  0 siblings, 0 replies; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-14  9:53 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: Bruce Richardson, users, dev, Venumadhav Josyula

Hi Anatoly,

> I would also suggest using --limit-mem if you desire to limit the
> maximum amount of memory DPDK will be able to allocate.

We are already using that.

Thanks and regards,
Venu
* Re: [dpdk-dev] time taken for allocation of mempool.
  2019-11-13  5:07 [dpdk-dev] time taken for allocation of mempool Venumadhav Josyula
  2019-11-13  5:12 ` Venumadhav Josyula
  2019-11-13  9:19 ` Bruce Richardson
@ 2019-11-18 16:45 ` Venumadhav Josyula
  2 siblings, 0 replies; 18+ messages in thread
From: Venumadhav Josyula @ 2019-11-18 16:45 UTC (permalink / raw)
  To: users, dev; +Cc: Venumadhav Josyula

Please note that I am using DPDK 18.11.

On Wed, 13 Nov, 2019, 10:37 am Venumadhav Josyula, <vjosyula@gmail.com>
wrote:

> Hi,
> We are using 'rte_mempool_create' for allocation of flow memory. This has
> been there for a while. We just migrated to dpdk-18.11 from dpdk-17.05. Now
> here is problem statement
>
> Problem statement :
> In new dpdk ( 18.11 ), the 'rte_mempool_create' takes approximately ~4.4
> sec for allocation compared to older dpdk (17.05). We have some 8-9 mempools
> for our entire product. We do upfront allocation for all of them ( i.e.
> when dpdk application is coming up). Our application is run to completion
> model.
>
> Questions:-
> i) is that acceptable / has anybody seen such a thing ?
> ii) What has changed between two dpdk versions ( 18.11 v/s 17.05 ) from
> memory perspective ?
>
> Any pointers are welcome.
>
> Thanks & regards
> Venu
>
end of thread, other threads:[~2019-12-06 10:50 UTC | newest]

Thread overview: 18+ messages
2019-11-13  5:07 [dpdk-dev] time taken for allocation of mempool Venumadhav Josyula
2019-11-13  5:12 ` Venumadhav Josyula
2019-11-13  8:32 ` Olivier Matz
2019-11-13  9:11 ` Venumadhav Josyula
2019-11-13  9:30 ` Olivier Matz
2019-11-13  9:19 ` Bruce Richardson
2019-11-13 17:26 ` Burakov, Anatoly
2019-11-13 21:01 ` Venumadhav Josyula
2019-11-14  9:44 ` Burakov, Anatoly
2019-11-14  9:50 ` Venumadhav Josyula
2019-11-14  9:57 ` Burakov, Anatoly
2019-11-18 16:43 ` Venumadhav Josyula
2019-12-06 10:47 ` Burakov, Anatoly
2019-12-06 10:49 ` Venumadhav Josyula
2019-11-14  8:12 ` Venumadhav Josyula
2019-11-14  9:49 ` Burakov, Anatoly
2019-11-14  9:53 ` Venumadhav Josyula
2019-11-18 16:45 ` Venumadhav Josyula