* [dpdk-dev] [PATCH] Performance optimization of ACL build process @ 2016-08-16 14:01 Vladyslav Buslov 2016-08-16 14:01 ` [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation Vladyslav Buslov 2016-08-31 15:24 ` [dpdk-dev] [PATCH] Performance optimization of ACL build process Stephen Hemminger 0 siblings, 2 replies; 7+ messages in thread From: Vladyslav Buslov @ 2016-08-16 14:01 UTC (permalink / raw) To: konstantin.ananyev; +Cc: dev Hello, In our application we need to be able to allocate tens of thousands of ACLs at runtime. Testing revealed significant performance problems. We were able to track them to memset in calloc function which caused multiple page walks per invocation. Modifying tb_mem to use huge page memory resulted ~2x performance gain for that operation. Regards, Vladyslav Vladyslav Buslov (1): acl: use rte_calloc for temporary memory allocation lib/librte_acl/tb_mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.8.3 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation 2016-08-16 14:01 [dpdk-dev] [PATCH] Performance optimization of ACL build process Vladyslav Buslov @ 2016-08-16 14:01 ` Vladyslav Buslov 2016-08-31 1:27 ` Ananyev, Konstantin 2016-08-31 15:24 ` [dpdk-dev] [PATCH] Performance optimization of ACL build process Stephen Hemminger 1 sibling, 1 reply; 7+ messages in thread From: Vladyslav Buslov @ 2016-08-16 14:01 UTC (permalink / raw) To: konstantin.ananyev; +Cc: dev Acl build process uses significant amount of memory which degrades performance by causing page walks when memory is allocated on regular heap using libc calloc. This commit changes tb_mem to allocate temporary memory on huge pages with rte_calloc. Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com> --- lib/librte_acl/tb_mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_acl/tb_mem.c b/lib/librte_acl/tb_mem.c index 157e608..c373673 100644 --- a/lib/librte_acl/tb_mem.c +++ b/lib/librte_acl/tb_mem.c @@ -52,7 +52,7 @@ tb_pool(struct tb_mem_pool *pool, size_t sz) size_t size; size = sz + pool->alignment - 1; - block = calloc(1, size + sizeof(*pool->block)); + block = rte_calloc("ACL_TBMEM_BLOCK", 1, size + sizeof(*pool->block), 0); if (block == NULL) { RTE_LOG(ERR, MALLOC, "%s(%zu)\n failed, currently allocated " "by pool: %zu bytes\n", __func__, sz, pool->alloc); -- 2.8.3 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation 2016-08-16 14:01 ` [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation Vladyslav Buslov @ 2016-08-31 1:27 ` Ananyev, Konstantin 2016-08-31 8:38 ` Vladyslav Buslov 0 siblings, 1 reply; 7+ messages in thread From: Ananyev, Konstantin @ 2016-08-31 1:27 UTC (permalink / raw) To: Vladyslav Buslov; +Cc: dev Hi Vladyslav, > -----Original Message----- > From: Vladyslav Buslov [mailto:vladyslav.buslov@harmonicinc.com] > Sent: Tuesday, August 16, 2016 3:01 PM > To: Ananyev, Konstantin <konstantin.ananyev@intel.com> > Cc: dev@dpdk.org > Subject: [PATCH] acl: use rte_calloc for temporary memory allocation > > Acl build process uses significant amount of memory which degrades performance by causing page walks when memory is allocated on > regular heap using libc calloc. > > This commit changes tb_mem to allocate temporary memory on huge pages with rte_calloc. We deliberately used standard system memory allocation routines (calloc/free) here. With current design build phase was never considered to be an 'RT' phase operation. It is pretty cpu and memory expensive. So if we'll use RTE memory for build phase it could easily consume all (or most) of it, and might cause DPDK process failure or degradation. If you really feel that you (and other users) would benefit from using rte_calloc here (I personally still in doubt), then at least it should be a new field inside rte_acl_config, that would allow user to control that behavior. Though, as I said above, librte_acl was never designed to ' to allocate tens of thousands of ACLs at runtime'. To add ability to add/delete rules at runtime while keeping lookup time reasonably low some new approach need to be introduced. Konstantin > > Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com> > --- > lib/librte_acl/tb_mem.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/librte_acl/tb_mem.c b/lib/librte_acl/tb_mem.c index 157e608..c373673 100644 > --- a/lib/librte_acl/tb_mem.c > +++ b/lib/librte_acl/tb_mem.c > @@ -52,7 +52,7 @@ tb_pool(struct tb_mem_pool *pool, size_t sz) > size_t size; > > size = sz + pool->alignment - 1; > - block = calloc(1, size + sizeof(*pool->block)); > + block = rte_calloc("ACL_TBMEM_BLOCK", 1, size + sizeof(*pool->block), > +0); > if (block == NULL) { > RTE_LOG(ERR, MALLOC, "%s(%zu)\n failed, currently allocated " > "by pool: %zu bytes\n", __func__, sz, pool->alloc); > -- > 2.8.3 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation 2016-08-31 1:27 ` Ananyev, Konstantin @ 2016-08-31 8:38 ` Vladyslav Buslov 2016-08-31 9:33 ` Thomas Monjalon 2016-08-31 9:59 ` Ananyev, Konstantin 0 siblings, 2 replies; 7+ messages in thread From: Vladyslav Buslov @ 2016-08-31 8:38 UTC (permalink / raw) To: Ananyev, Konstantin; +Cc: dev Hi Konstantin, Thanks for your feedback. Would you accept this change as config file compile-time parameter with libc calloc as default? It is one line change only so it is easy to ifdef. Regards, Vlad -----Original Message----- From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com] Sent: Wednesday, August 31, 2016 4:28 AM To: Vladyslav Buslov Cc: dev@dpdk.org Subject: RE: [PATCH] acl: use rte_calloc for temporary memory allocation Hi Vladyslav, > -----Original Message----- > From: Vladyslav Buslov [mailto:vladyslav.buslov@harmonicinc.com] > Sent: Tuesday, August 16, 2016 3:01 PM > To: Ananyev, Konstantin <konstantin.ananyev@intel.com> > Cc: dev@dpdk.org > Subject: [PATCH] acl: use rte_calloc for temporary memory allocation > > Acl build process uses significant amount of memory which degrades > performance by causing page walks when memory is allocated on regular heap using libc calloc. > > This commit changes tb_mem to allocate temporary memory on huge pages with rte_calloc. We deliberately used standard system memory allocation routines (calloc/free) here. With current design build phase was never considered to be an 'RT' phase operation. It is pretty cpu and memory expensive. So if we'll use RTE memory for build phase it could easily consume all (or most) of it, and might cause DPDK process failure or degradation. If you really feel that you (and other users) would benefit from using rte_calloc here (I personally still in doubt), then at least it should be a new field inside rte_acl_config, that would allow user to control that behavior. Though, as I said above, librte_acl was never designed to ' to allocate tens of thousands of ACLs at runtime'. To add ability to add/delete rules at runtime while keeping lookup time reasonably low some new approach need to be introduced. Konstantin > > Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com> > --- > lib/librte_acl/tb_mem.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/librte_acl/tb_mem.c b/lib/librte_acl/tb_mem.c index > 157e608..c373673 100644 > --- a/lib/librte_acl/tb_mem.c > +++ b/lib/librte_acl/tb_mem.c > @@ -52,7 +52,7 @@ tb_pool(struct tb_mem_pool *pool, size_t sz) > size_t size; > > size = sz + pool->alignment - 1; > - block = calloc(1, size + sizeof(*pool->block)); > + block = rte_calloc("ACL_TBMEM_BLOCK", 1, size + > +sizeof(*pool->block), 0); > if (block == NULL) { > RTE_LOG(ERR, MALLOC, "%s(%zu)\n failed, currently allocated " > "by pool: %zu bytes\n", __func__, sz, pool->alloc); > -- > 2.8.3 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation 2016-08-31 8:38 ` Vladyslav Buslov @ 2016-08-31 9:33 ` Thomas Monjalon 2016-08-31 9:59 ` Ananyev, Konstantin 1 sibling, 0 replies; 7+ messages in thread From: Thomas Monjalon @ 2016-08-31 9:33 UTC (permalink / raw) To: Vladyslav Buslov; +Cc: dev, Ananyev, Konstantin 2016-08-31 08:38, Vladyslav Buslov: > Would you accept this change as config file compile-time parameter with libc calloc as default? > It is one line change only so it is easy to ifdef. The configuration should not be compile-time. Please think about a runtime configuration via an API. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation 2016-08-31 8:38 ` Vladyslav Buslov 2016-08-31 9:33 ` Thomas Monjalon @ 2016-08-31 9:59 ` Ananyev, Konstantin 1 sibling, 0 replies; 7+ messages in thread From: Ananyev, Konstantin @ 2016-08-31 9:59 UTC (permalink / raw) To: Vladyslav Buslov; +Cc: dev Hi Vlad, > > Hi Konstantin, > > Thanks for your feedback. > > Would you accept this change as config file compile-time parameter with libc calloc as default? > It is one line change only so it is easy to ifdef. It is an option, but the main requirements from the community is to minimize number of build time config options, and instead provide ability to the user to configure things at runtime. That's why I thought about something like: + /* use EAL hugepages for temporary memory allocations, + * might improve build time, build increases hugepages + * demand significantly. + */ #define RTE_ACL_CFG_FLAG_HMEM 1 struct rte_acl_config { uint32_t num_categories; /**< Number of categories to build with. */ uint32_t num_fields; /**< Number of field definitions. */ struct rte_acl_field_def defs[RTE_ACL_MAX_FIELDS]; /**< array of field definitions. */ size_t max_size; /**< max memory limit for internal run-time structures. */ + uint64_t flags; }; And then change tb_pool() to use either calloc() or rte_alloc() based on the flags value. Another, probably even better and more flexible way is to allow user to specify his own alloc routine: struct rte_acl_config { uint32_t num_categories; /**< Number of categories to build with. */ uint32_t num_fields; /**< Number of field definitions. */ struct rte_acl_field_def defs[RTE_ACL_MAX_FIELDS]; /**< array of field definitions. */ size_t max_size; /**< max memory limit for internal run-time structures. */ + void *(tballoc)( (size_t, size_t); + void (*tbfree)(void *); }; In that case user can provide his own alloc/free based on rte_calloc/rte_free or even on something else. The only drawback I see with that approaches, that in that case, you'll need to follow ABI compliance rules, which probably means that your change might not make 16.11. Konstantin > > Regards, > Vlad > > -----Original Message----- > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com] > Sent: Wednesday, August 31, 2016 4:28 AM > To: Vladyslav Buslov > Cc: dev@dpdk.org > Subject: RE: [PATCH] acl: use rte_calloc for temporary memory allocation > > Hi Vladyslav, > > > -----Original Message----- > > From: Vladyslav Buslov [mailto:vladyslav.buslov@harmonicinc.com] > > Sent: Tuesday, August 16, 2016 3:01 PM > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com> > > Cc: dev@dpdk.org > > Subject: [PATCH] acl: use rte_calloc for temporary memory allocation > > > > Acl build process uses significant amount of memory which degrades > > performance by causing page walks when memory is allocated on regular heap using libc calloc. > > > > This commit changes tb_mem to allocate temporary memory on huge pages with rte_calloc. > > We deliberately used standard system memory allocation routines (calloc/free) here. > With current design build phase was never considered to be an 'RT' phase operation. > It is pretty cpu and memory expensive. > So if we'll use RTE memory for build phase it could easily consume all (or most) of it, and might cause DPDK process failure or degradation. > If you really feel that you (and other users) would benefit from using rte_calloc here (I personally still in doubt), then at least it should be a > new field inside rte_acl_config, that would allow user to control that behavior. > Though, as I said above, librte_acl was never designed to ' to allocate tens of thousands of ACLs at runtime'. > To add ability to add/delete rules at runtime while keeping lookup time reasonably low some new approach need to be introduced. > Konstantin > > > > > Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com> > > --- > > lib/librte_acl/tb_mem.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/lib/librte_acl/tb_mem.c b/lib/librte_acl/tb_mem.c index > > 157e608..c373673 100644 > > --- a/lib/librte_acl/tb_mem.c > > +++ b/lib/librte_acl/tb_mem.c > > @@ -52,7 +52,7 @@ tb_pool(struct tb_mem_pool *pool, size_t sz) > > size_t size; > > > > size = sz + pool->alignment - 1; > > - block = calloc(1, size + sizeof(*pool->block)); > > + block = rte_calloc("ACL_TBMEM_BLOCK", 1, size + > > +sizeof(*pool->block), 0); > > if (block == NULL) { > > RTE_LOG(ERR, MALLOC, "%s(%zu)\n failed, currently allocated " > > "by pool: %zu bytes\n", __func__, sz, pool->alloc); > > -- > > 2.8.3 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [PATCH] Performance optimization of ACL build process 2016-08-16 14:01 [dpdk-dev] [PATCH] Performance optimization of ACL build process Vladyslav Buslov 2016-08-16 14:01 ` [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation Vladyslav Buslov @ 2016-08-31 15:24 ` Stephen Hemminger 1 sibling, 0 replies; 7+ messages in thread From: Stephen Hemminger @ 2016-08-31 15:24 UTC (permalink / raw) To: Vladyslav Buslov; +Cc: konstantin.ananyev, dev On Tue, 16 Aug 2016 17:01:27 +0300 Vladyslav Buslov <vladyslav.buslov@harmonicinc.com> wrote: > Hello, > > In our application we need to be able to allocate tens of thousands of ACLs at runtime. > Testing revealed significant performance problems. We were able to track them to memset in calloc function which caused multiple page walks per invocation. > Modifying tb_mem to use huge page memory resulted ~2x performance gain for that operation. > > Regards, > Vladyslav > > Vladyslav Buslov (1): > acl: use rte_calloc for temporary memory allocation > > lib/librte_acl/tb_mem.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > What about making sure that the ACL code does it's own initialization of all fields and just use malloc rather than calloc? ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-08-31 15:24 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-08-16 14:01 [dpdk-dev] [PATCH] Performance optimization of ACL build process Vladyslav Buslov 2016-08-16 14:01 ` [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation Vladyslav Buslov 2016-08-31 1:27 ` Ananyev, Konstantin 2016-08-31 8:38 ` Vladyslav Buslov 2016-08-31 9:33 ` Thomas Monjalon 2016-08-31 9:59 ` Ananyev, Konstantin 2016-08-31 15:24 ` [dpdk-dev] [PATCH] Performance optimization of ACL build process Stephen Hemminger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).