* [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy @ 2016-05-24 14:50 Jerin Jacob 2016-05-24 14:59 ` Olivier Matz 2016-05-26 8:07 ` [dpdk-dev] [PATCH v2] mempool: " Jerin Jacob 0 siblings, 2 replies; 32+ messages in thread From: Jerin Jacob @ 2016-05-24 14:50 UTC (permalink / raw) To: dev; +Cc: thomas.monjalon, bruce.richardson, konstantin.ananyev, Jerin Jacob Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> --- lib/librte_mempool/rte_mempool.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index ed2c110..ebe399a 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -74,6 +74,7 @@ #include <rte_memory.h> #include <rte_branch_prediction.h> #include <rte_ring.h> +#include <rte_memcpy.h> #ifdef __cplusplus extern "C" { @@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, unsigned n, __rte_unused int is_mp) { struct rte_mempool_cache *cache; - uint32_t index; void **cache_objs; unsigned lcore_id = rte_lcore_id(); uint32_t cache_size = mp->cache_size; @@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, */ /* Add elements back into the cache */ - for (index = 0; index < n; ++index, obj_table++) - cache_objs[index] = *obj_table; + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); cache->len += n; -- 2.5.5 ^ permalink raw reply [flat|nested] 32+ messages in thread
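For context, the code path being touched is behind the caller-facing bulk get/put API. A minimal usage sketch follows (the function name, pool setup, and object count are illustrative, not part of the patch):

	#include <rte_mempool.h>

	static void
	bulk_put_example(struct rte_mempool *mp)
	{
		void *objs[32];

		/* objects may be served from the per-lcore cache */
		if (rte_mempool_get_bulk(mp, objs, 32) < 0)
			return;

		/* ... use the objects ... */

		/* with a cache enabled, returning them in bulk reaches
		 * __mempool_put_bulk(), where the pointer-copy loop is
		 * replaced by rte_memcpy() in this patch */
		rte_mempool_put_bulk(mp, objs, 32);
	}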
* Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy 2016-05-24 14:50 [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy Jerin Jacob @ 2016-05-24 14:59 ` Olivier Matz 2016-05-24 15:17 ` Jerin Jacob 2016-05-26 8:07 ` [dpdk-dev] [PATCH v2] mempool: " Jerin Jacob 1 sibling, 1 reply; 32+ messages in thread From: Olivier Matz @ 2016-05-24 14:59 UTC (permalink / raw) To: Jerin Jacob, dev; +Cc: thomas.monjalon, bruce.richardson, konstantin.ananyev Hi Jerin, On 05/24/2016 04:50 PM, Jerin Jacob wrote: > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > --- > lib/librte_mempool/rte_mempool.h | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h > index ed2c110..ebe399a 100644 > --- a/lib/librte_mempool/rte_mempool.h > +++ b/lib/librte_mempool/rte_mempool.h > @@ -74,6 +74,7 @@ > #include <rte_memory.h> > #include <rte_branch_prediction.h> > #include <rte_ring.h> > +#include <rte_memcpy.h> > > #ifdef __cplusplus > extern "C" { > @@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, > unsigned n, __rte_unused int is_mp) > { > struct rte_mempool_cache *cache; > - uint32_t index; > void **cache_objs; > unsigned lcore_id = rte_lcore_id(); > uint32_t cache_size = mp->cache_size; > @@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, > */ > > /* Add elements back into the cache */ > - for (index = 0; index < n; ++index, obj_table++) > - cache_objs[index] = *obj_table; > + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); > > cache->len += n; > > The commit title should be "mempool" instead of "mbuf". Are you seeing some performance improvement by using rte_memcpy()? Regards Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy 2016-05-24 14:59 ` Olivier Matz @ 2016-05-24 15:17 ` Jerin Jacob 2016-05-27 10:24 ` Hunt, David 2016-06-24 15:56 ` Hunt, David 0 siblings, 2 replies; 32+ messages in thread From: Jerin Jacob @ 2016-05-24 15:17 UTC (permalink / raw) To: Olivier Matz; +Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote: > Hi Jerin, > > > On 05/24/2016 04:50 PM, Jerin Jacob wrote: > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > --- > > lib/librte_mempool/rte_mempool.h | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h > > index ed2c110..ebe399a 100644 > > --- a/lib/librte_mempool/rte_mempool.h > > +++ b/lib/librte_mempool/rte_mempool.h > > @@ -74,6 +74,7 @@ > > #include <rte_memory.h> > > #include <rte_branch_prediction.h> > > #include <rte_ring.h> > > +#include <rte_memcpy.h> > > > > #ifdef __cplusplus > > extern "C" { > > @@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, > > unsigned n, __rte_unused int is_mp) > > { > > struct rte_mempool_cache *cache; > > - uint32_t index; > > void **cache_objs; > > unsigned lcore_id = rte_lcore_id(); > > uint32_t cache_size = mp->cache_size; > > @@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, > > */ > > > > /* Add elements back into the cache */ > > - for (index = 0; index < n; ++index, obj_table++) > > - cache_objs[index] = *obj_table; > > + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); > > > > cache->len += n; > > > > > > The commit title should be "mempool" instead of "mbuf". I will fix it. > Are you seeing some performance improvement by using rte_memcpy()? Yes, In some case, In default case, It was replaced with memcpy by the compiler itself(gcc 5.3). But when I tried external mempool manager patch and then performance dropped almost 800Kpps. Debugging further it turns out that external mempool managers unrelated change was knocking out the memcpy. explicit rte_memcpy brought back 500Kpps. Remaing 300Kpps drop is still unknown(In my test setup, packets are in the local cache, so it must be something do with __mempool_put_bulk text alignment change or similar. Anyone else observed performance drop with external poolmanager? Jerin > > Regards > Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy 2016-05-24 15:17 ` Jerin Jacob @ 2016-05-27 10:24 ` Hunt, David 2016-05-27 11:42 ` Jerin Jacob 2016-05-27 13:45 ` Hunt, David 2016-06-24 15:56 ` Hunt, David 1 sibling, 2 replies; 32+ messages in thread From: Hunt, David @ 2016-05-27 10:24 UTC (permalink / raw) To: Jerin Jacob, Olivier Matz Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev On 5/24/2016 4:17 PM, Jerin Jacob wrote: > On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote: > >> Are you seeing some performance improvement by using rte_memcpy()? > Yes, In some case, In default case, It was replaced with memcpy by the > compiler itself(gcc 5.3). But when I tried external mempool manager patch and > then performance dropped almost 800Kpps. Debugging further it turns out that > external mempool managers unrelated change was knocking out the memcpy. > explicit rte_memcpy brought back 500Kpps. Remaing 300Kpps drop is still > unknown(In my test setup, packets are in the local cache, so it must be > something do with __mempool_put_bulk text alignment change or similar. > > Anyone else observed performance drop with external poolmanager? > > Jerin Jerin, I'm seeing a 300kpps drop in throughput when I apply this on top of the external mempool manager patch. If you're seeing an increase if you apply this patch first, then a drop when applying the mempool manager, the two patches must be conflicting in some way. We probably need to investigate further. Regards, Dave. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy 2016-05-27 10:24 ` Hunt, David @ 2016-05-27 11:42 ` Jerin Jacob 2016-05-27 15:05 ` Thomas Monjalon 2016-05-27 13:45 ` Hunt, David 1 sibling, 1 reply; 32+ messages in thread From: Jerin Jacob @ 2016-05-27 11:42 UTC (permalink / raw) To: Hunt, David Cc: Olivier Matz, dev, thomas.monjalon, bruce.richardson, konstantin.ananyev On Fri, May 27, 2016 at 11:24:57AM +0100, Hunt, David wrote: > > > On 5/24/2016 4:17 PM, Jerin Jacob wrote: > > On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote: > > > > > Are you seeing some performance improvement by using rte_memcpy()? > > Yes, In some case, In default case, It was replaced with memcpy by the > > compiler itself(gcc 5.3). But when I tried external mempool manager patch and > > then performance dropped almost 800Kpps. Debugging further it turns out that > > external mempool managers unrelated change was knocking out the memcpy. > > explicit rte_memcpy brought back 500Kpps. Remaing 300Kpps drop is still > > unknown(In my test setup, packets are in the local cache, so it must be > > something do with __mempool_put_bulk text alignment change or similar. > > > > Anyone else observed performance drop with external poolmanager? > > > > Jerin > > Jerin, > I'm seeing a 300kpps drop in throughput when I apply this on top of the > external > mempool manager patch. If you're seeing an increase if you apply this patch > first, then > a drop when applying the mempool manager, the two patches must be > conflicting in > some way. We probably need to investigate further. In general, my concern is that this patch will most probably also get dropped on the floor due to conflicts on other architectures, and some architectures/platforms would need to maintain it out of tree. Unlike other projects, DPDK modules are hand optimized, so some changes depend on register allocation, compiler version, text alignment, etc. IMHO, I think we should have means to abstract this _logical_ changes under conditional compilation flags and any arch/platform can choose to select what it suites better for that arch/platform. We may NOT need to have frequent patches to select the specific configuration, but logical patches under compilation flags can be accepted and each arch/platform can choose specific set configuration when we make the final release candidate for the release. Any thoughts? Jerin ^ permalink raw reply [flat|nested] 32+ messages in thread
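As an illustration of the proposal, a compile-time selection in the cache-put path could look roughly like the sketch below; the RTE_MEMPOOL_CACHE_USE_RTE_MEMCPY flag and the helper name are hypothetical, not existing DPDK configuration options:

	#include <rte_memcpy.h>

	/* Sketch only: the config flag below does not exist in DPDK. */
	static inline void
	cache_store_objs(void **cache_objs, void * const *obj_table, unsigned n)
	{
	#ifdef RTE_MEMPOOL_CACHE_USE_RTE_MEMCPY
		/* platform opted in to the vectorized copy */
		rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
	#else
		/* original scalar pointer-copy loop */
		unsigned index;

		for (index = 0; index < n; ++index, obj_table++)
			cache_objs[index] = *obj_table;
	#endif
	}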
* Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy 2016-05-27 11:42 ` Jerin Jacob @ 2016-05-27 15:05 ` Thomas Monjalon 2016-05-30 8:44 ` Olivier Matz 0 siblings, 1 reply; 32+ messages in thread From: Thomas Monjalon @ 2016-05-27 15:05 UTC (permalink / raw) To: Jerin Jacob Cc: Hunt, David, Olivier Matz, dev, bruce.richardson, konstantin.ananyev 2016-05-27 17:12, Jerin Jacob: > IMHO, I think we should have means to abstract this _logical_ changes > under conditional compilation flags and any arch/platform can choose > to select what it suites better for that arch/platform. > > We may NOT need to have frequent patches to select the specific > configuration, but logical patches under compilation flags can be accepted and > each arch/platform can choose specific set configuration when we make > the final release candidate for the release. > > Any thoughts? Yes having some #ifdefs for arch configuration may be reasonnable. But other methods must be preffered first: 1/ try implementing the function in arch-specific files 2/ and check at runtime if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_X 3/ or check #ifdef RTE_MACHINE_CPUFLAG_X 4/ or check #ifdef RTE_ARCH_Y 5/ or check a specific #ifdef RTE_FEATURE_NAME to choose in config files The option 2 is a nice to have which implies other options. Maybe that doc/guides/contributing/design.rst needs to be updated. ^ permalink raw reply [flat|nested] 32+ messages in thread
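For reference, a minimal sketch of option 2 (runtime dispatch on a CPU flag); the copy helpers are illustrative placeholders, and the RTE_CPUFLAG_AVX2 choice assumes an x86 build:

	#include <stddef.h>
	#include <string.h>
	#include <rte_cpuflags.h>

	typedef void (*copy_fn_t)(void *dst, const void *src, size_t n);

	/* placeholder implementations; a real AVX2 path would use intrinsics */
	static void copy_generic(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }
	static void copy_avx2(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }

	static copy_fn_t copy_fn = copy_generic;

	/* selected once at init time, so the fast path stays branch-free */
	static void
	copy_select(void)
	{
		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
			copy_fn = copy_avx2;
	}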
* Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy 2016-05-27 15:05 ` Thomas Monjalon @ 2016-05-30 8:44 ` Olivier Matz 0 siblings, 0 replies; 32+ messages in thread From: Olivier Matz @ 2016-05-30 8:44 UTC (permalink / raw) To: Thomas Monjalon, Jerin Jacob Cc: Hunt, David, dev, bruce.richardson, konstantin.ananyev On 05/27/2016 05:05 PM, Thomas Monjalon wrote: > 2016-05-27 17:12, Jerin Jacob: >> IMHO, I think we should have means to abstract this _logical_ changes >> under conditional compilation flags and any arch/platform can choose >> to select what it suites better for that arch/platform. >> >> We may NOT need to have frequent patches to select the specific >> configuration, but logical patches under compilation flags can be accepted and >> each arch/platform can choose specific set configuration when we make >> the final release candidate for the release. >> >> Any thoughts? > > Yes having some #ifdefs for arch configuration may be reasonnable. > But other methods must be preffered first: > 1/ try implementing the function in arch-specific files I agree with Thomas. This option should be preferred, and I think we should avoid as much as possible to have: #if ARCH1 do stuff optimized for arch1 #elif ARCH2 do the same stuff optimized for arch2 #else ... In this particular case, rte_memcpy() seems to be the appropriate function, because it should already be arch-optimized. > 2/ and check at runtime if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_X > 3/ or check #ifdef RTE_MACHINE_CPUFLAG_X > 4/ or check #ifdef RTE_ARCH_Y > 5/ or check a specific #ifdef RTE_FEATURE_NAME to choose in config files > > The option 2 is a nice to have which implies other options. > > Maybe that doc/guides/contributing/design.rst needs to be updated. Regards, Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy 2016-05-27 10:24 ` Hunt, David 2016-05-27 11:42 ` Jerin Jacob @ 2016-05-27 13:45 ` Hunt, David 1 sibling, 0 replies; 32+ messages in thread From: Hunt, David @ 2016-05-27 13:45 UTC (permalink / raw) To: Jerin Jacob, Olivier Matz Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev On 5/27/2016 11:24 AM, Hunt, David wrote: > > > On 5/24/2016 4:17 PM, Jerin Jacob wrote: >> On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote: >> >>> Are you seeing some performance improvement by using rte_memcpy()? >> Yes, In some case, In default case, It was replaced with memcpy by the >> compiler itself(gcc 5.3). But when I tried external mempool manager >> patch and >> then performance dropped almost 800Kpps. Debugging further it turns >> out that >> external mempool managers unrelated change was knocking out the memcpy. >> explicit rte_memcpy brought back 500Kpps. Remaing 300Kpps drop is still >> unknown(In my test setup, packets are in the local cache, so it must be >> something do with __mempool_put_bulk text alignment change or similar. >> >> Anyone else observed performance drop with external poolmanager? >> >> Jerin > > Jerin, > I'm seeing a 300kpps drop in throughput when I apply this on top > of the external > mempool manager patch. If you're seeing an increase if you apply this > patch first, then > a drop when applying the mempool manager, the two patches must be > conflicting in > some way. We probably need to investigate further. > Regards, > Dave. > On further investigation, I now have a setup with no performance degradation. My previous tests were accessing the NICS on a different NUMA node. Once I initiated testPMD with the correct coremask, the difference between pre and post rte_memcpy patch is negligible (maybe 0.1% drop). Regards, Dave. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy 2016-05-24 15:17 ` Jerin Jacob 2016-05-27 10:24 ` Hunt, David @ 2016-06-24 15:56 ` Hunt, David 2016-06-24 16:02 ` Olivier Matz 1 sibling, 1 reply; 32+ messages in thread From: Hunt, David @ 2016-06-24 15:56 UTC (permalink / raw) To: Jerin Jacob, Olivier Matz Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev Hi Jerin, I just ran a couple of tests on this patch on the latest master head on a couple of machines. An older quad socket E5-4650 and a quad socket E5-2699 v3 E5-4650: I'm seeing a gain of 2% for un-cached tests and a gain of 9% on the cached tests. E5-2699 v3: I'm seeing a loss of 0.1% for un-cached tests and a gain of 11% on the cached tests. This is purely the autotest comparison, I don't have traffic generator results. But based on the above, I don't think there are any performance issues with the patch. Regards, Dave. On 24/5/2016 4:17 PM, Jerin Jacob wrote: > On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote: >> Hi Jerin, >> >> >> On 05/24/2016 04:50 PM, Jerin Jacob wrote: >>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> >>> --- >>> lib/librte_mempool/rte_mempool.h | 5 ++--- >>> 1 file changed, 2 insertions(+), 3 deletions(-) >>> >>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h >>> index ed2c110..ebe399a 100644 >>> --- a/lib/librte_mempool/rte_mempool.h >>> +++ b/lib/librte_mempool/rte_mempool.h >>> @@ -74,6 +74,7 @@ >>> #include <rte_memory.h> >>> #include <rte_branch_prediction.h> >>> #include <rte_ring.h> >>> +#include <rte_memcpy.h> >>> >>> #ifdef __cplusplus >>> extern "C" { >>> @@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, >>> unsigned n, __rte_unused int is_mp) >>> { >>> struct rte_mempool_cache *cache; >>> - uint32_t index; >>> void **cache_objs; >>> unsigned lcore_id = rte_lcore_id(); >>> uint32_t cache_size = mp->cache_size; >>> @@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, >>> */ >>> >>> /* Add elements back into the cache */ >>> - for (index = 0; index < n; ++index, obj_table++) >>> - cache_objs[index] = *obj_table; >>> + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); >>> >>> cache->len += n; >>> >>> >> The commit title should be "mempool" instead of "mbuf". > I will fix it. > >> Are you seeing some performance improvement by using rte_memcpy()? > Yes, In some case, In default case, It was replaced with memcpy by the > compiler itself(gcc 5.3). But when I tried external mempool manager patch and > then performance dropped almost 800Kpps. Debugging further it turns out that > external mempool managers unrelated change was knocking out the memcpy. > explicit rte_memcpy brought back 500Kpps. Remaing 300Kpps drop is still > unknown(In my test setup, packets are in the local cache, so it must be > something do with __mempool_put_bulk text alignment change or similar. > > Anyone else observed performance drop with external poolmanager? > > Jerin > >> Regards >> Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy 2016-06-24 15:56 ` Hunt, David @ 2016-06-24 16:02 ` Olivier Matz 0 siblings, 0 replies; 32+ messages in thread From: Olivier Matz @ 2016-06-24 16:02 UTC (permalink / raw) To: Hunt, David, Jerin Jacob Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev Hi Dave, On 06/24/2016 05:56 PM, Hunt, David wrote: > Hi Jerin, > > I just ran a couple of tests on this patch on the latest master head on > a couple of machines. An older quad socket E5-4650 and a quad socket > E5-2699 v3 > > E5-4650: > I'm seeing a gain of 2% for un-cached tests and a gain of 9% on the > cached tests. > > E5-2699 v3: > I'm seeing a loss of 0.1% for un-cached tests and a gain of 11% on the > cached tests. > > This is purely the autotest comparison, I don't have traffic generator > results. But based on the above, I don't think there are any performance > issues with the patch. > Thanks for doing the test on your side. I think it's probably enough to integrate Jerin's patch . About using a rte_memcpy() in the mempool_get(), I don't think I'll have the time to do a more exhaustive test before the 16.07, so I'll come back with it later. I'm sending an ack on the v2 thread. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-05-24 14:50 [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy Jerin Jacob 2016-05-24 14:59 ` Olivier Matz @ 2016-05-26 8:07 ` Jerin Jacob 2016-05-30 8:45 ` Olivier Matz ` (2 more replies) 1 sibling, 3 replies; 32+ messages in thread From: Jerin Jacob @ 2016-05-26 8:07 UTC (permalink / raw) To: dev Cc: thomas.monjalon, bruce.richardson, konstantin.ananyev, olivier.matz, Jerin Jacob Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> --- v1..v2 Corrected the the git commit message(s/mbuf/mempool/g) --- lib/librte_mempool/rte_mempool.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 60339bd..24876a2 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -73,6 +73,7 @@ #include <rte_memory.h> #include <rte_branch_prediction.h> #include <rte_ring.h> +#include <rte_memcpy.h> #ifdef __cplusplus extern "C" { @@ -739,7 +740,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, unsigned n, int is_mp) { struct rte_mempool_cache *cache; - uint32_t index; void **cache_objs; unsigned lcore_id = rte_lcore_id(); uint32_t cache_size = mp->cache_size; @@ -768,8 +768,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, */ /* Add elements back into the cache */ - for (index = 0; index < n; ++index, obj_table++) - cache_objs[index] = *obj_table; + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); cache->len += n; -- 2.5.5 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-05-26 8:07 ` [dpdk-dev] [PATCH v2] mempool: " Jerin Jacob @ 2016-05-30 8:45 ` Olivier Matz 2016-05-31 12:58 ` Jerin Jacob 2016-06-30 9:41 ` Thomas Monjalon 2016-06-30 12:16 ` [dpdk-dev] [PATCH v3] " Jerin Jacob 2 siblings, 1 reply; 32+ messages in thread From: Olivier Matz @ 2016-05-30 8:45 UTC (permalink / raw) To: Jerin Jacob, dev; +Cc: thomas.monjalon, bruce.richardson, konstantin.ananyev Hi Jerin, On 05/26/2016 10:07 AM, Jerin Jacob wrote: > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > --- > v1..v2 > Corrected the the git commit message(s/mbuf/mempool/g) > --- > lib/librte_mempool/rte_mempool.h | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h > index 60339bd..24876a2 100644 > --- a/lib/librte_mempool/rte_mempool.h > +++ b/lib/librte_mempool/rte_mempool.h > @@ -73,6 +73,7 @@ > #include <rte_memory.h> > #include <rte_branch_prediction.h> > #include <rte_ring.h> > +#include <rte_memcpy.h> > > #ifdef __cplusplus > extern "C" { > @@ -739,7 +740,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, > unsigned n, int is_mp) > { > struct rte_mempool_cache *cache; > - uint32_t index; > void **cache_objs; > unsigned lcore_id = rte_lcore_id(); > uint32_t cache_size = mp->cache_size; > @@ -768,8 +768,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, > */ > > /* Add elements back into the cache */ > - for (index = 0; index < n; ++index, obj_table++) > - cache_objs[index] = *obj_table; > + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); > > cache->len += n; > > I also checked in the get_bulk() function, which looks like that: /* Now fill in the response ... */ for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++) *obj_table = cache_objs[len]; I think we could replace it by something like: rte_memcpy(obj_table, &cache_objs[len - n], sizeof(void *) * n); The only difference is that it won't reverse the pointers in the table, but I don't see any problem with that. What do you think? Regards, Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
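A sketch of that get-side suggestion, assuming the existing rte_mempool_cache layout (len and objs members); it is an illustration of the idea, not an applied patch:

	#include <rte_memcpy.h>
	#include <rte_mempool.h>

	/* Fill the caller's table from the tail of the cache in one copy
	 * instead of the reversing loop. Assumes cache->len >= n, as the
	 * original code does at this point. */
	static inline void
	cache_fill_from_tail(struct rte_mempool_cache *cache, void **obj_table,
			     unsigned n)
	{
		void **cache_objs = cache->objs;

		rte_memcpy(obj_table, &cache_objs[cache->len - n], sizeof(void *) * n);
		cache->len -= n;
	}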
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-05-30 8:45 ` Olivier Matz @ 2016-05-31 12:58 ` Jerin Jacob 2016-05-31 21:05 ` Olivier MATZ 0 siblings, 1 reply; 32+ messages in thread From: Jerin Jacob @ 2016-05-31 12:58 UTC (permalink / raw) To: Olivier Matz; +Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev On Mon, May 30, 2016 at 10:45:11AM +0200, Olivier Matz wrote: > Hi Jerin, > > On 05/26/2016 10:07 AM, Jerin Jacob wrote: > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > --- > > v1..v2 > > Corrected the the git commit message(s/mbuf/mempool/g) > > --- > > lib/librte_mempool/rte_mempool.h | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h > > index 60339bd..24876a2 100644 > > --- a/lib/librte_mempool/rte_mempool.h > > +++ b/lib/librte_mempool/rte_mempool.h > > @@ -73,6 +73,7 @@ > > #include <rte_memory.h> > > #include <rte_branch_prediction.h> > > #include <rte_ring.h> > > +#include <rte_memcpy.h> > > > > #ifdef __cplusplus > > extern "C" { > > @@ -739,7 +740,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, > > unsigned n, int is_mp) > > { > > struct rte_mempool_cache *cache; > > - uint32_t index; > > void **cache_objs; > > unsigned lcore_id = rte_lcore_id(); > > uint32_t cache_size = mp->cache_size; > > @@ -768,8 +768,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, > > */ > > > > /* Add elements back into the cache */ > > - for (index = 0; index < n; ++index, obj_table++) > > - cache_objs[index] = *obj_table; > > + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); > > > > cache->len += n; > > > > > > I also checked in the get_bulk() function, which looks like that: > > /* Now fill in the response ... */ > for (index = 0, len = cache->len - 1; > index < n; > ++index, len--, obj_table++) > *obj_table = cache_objs[len]; > > I think we could replace it by something like: > > rte_memcpy(obj_table, &cache_objs[len - n], sizeof(void *) * n); > > The only difference is that it won't reverse the pointers in the > table, but I don't see any problem with that. > > What do you think? In true sense, it will _not_ be LIFO. Not sure about cache usage implications on the specific use cases. Jerin > > > Regards, > Olivier > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-05-31 12:58 ` Jerin Jacob @ 2016-05-31 21:05 ` Olivier MATZ 2016-06-01 7:00 ` Jerin Jacob 0 siblings, 1 reply; 32+ messages in thread From: Olivier MATZ @ 2016-05-31 21:05 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev Hi Jerin, >>> /* Add elements back into the cache */ >>> - for (index = 0; index < n; ++index, obj_table++) >>> - cache_objs[index] = *obj_table; >>> + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); >>> >>> cache->len += n; >>> >>> >> >> I also checked in the get_bulk() function, which looks like that: >> >> /* Now fill in the response ... */ >> for (index = 0, len = cache->len - 1; >> index < n; >> ++index, len--, obj_table++) >> *obj_table = cache_objs[len]; >> >> I think we could replace it by something like: >> >> rte_memcpy(obj_table, &cache_objs[len - n], sizeof(void *) * n); >> >> The only difference is that it won't reverse the pointers in the >> table, but I don't see any problem with that. >> >> What do you think? > > In true sense, it will _not_ be LIFO. Not sure about cache usage implications > on the specific use cases. Today, the objects pointers are reversed only in the get(). It means that this code: rte_mempool_get_bulk(mp, table, 4); for (i = 0; i < 4; i++) printf("obj = %p\n", t[i]); rte_mempool_put_bulk(mp, table, 4); printf("-----\n"); rte_mempool_get_bulk(mp, table, 4); for (i = 0; i < 4; i++) printf("obj = %p\n", t[i]); rte_mempool_put_bulk(mp, table, 4); prints: addr1 addr2 addr3 addr4 ----- addr4 addr3 addr2 addr1 Which is quite strange. I don't think it would be an issue to replace the loop by a rte_memcpy(), it may increase the copy speed and it will be more coherent with the put(). Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-05-31 21:05 ` Olivier MATZ @ 2016-06-01 7:00 ` Jerin Jacob 2016-06-02 7:36 ` Olivier MATZ 0 siblings, 1 reply; 32+ messages in thread From: Jerin Jacob @ 2016-06-01 7:00 UTC (permalink / raw) To: Olivier MATZ; +Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev On Tue, May 31, 2016 at 11:05:30PM +0200, Olivier MATZ wrote: > Hi Jerin, Hi Olivier, > > >>> /* Add elements back into the cache */ > >>> - for (index = 0; index < n; ++index, obj_table++) > >>> - cache_objs[index] = *obj_table; > >>> + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); > >>> > >>> cache->len += n; > >>> > >>> > >> > >> I also checked in the get_bulk() function, which looks like that: > >> > >> /* Now fill in the response ... */ > >> for (index = 0, len = cache->len - 1; > >> index < n; > >> ++index, len--, obj_table++) > >> *obj_table = cache_objs[len]; > >> > >> I think we could replace it by something like: > >> > >> rte_memcpy(obj_table, &cache_objs[len - n], sizeof(void *) * n); > >> > >> The only difference is that it won't reverse the pointers in the > >> table, but I don't see any problem with that. > >> > >> What do you think? > > > > In true sense, it will _not_ be LIFO. Not sure about cache usage implications > > on the specific use cases. > > Today, the objects pointers are reversed only in the get(). It means > that this code: > > rte_mempool_get_bulk(mp, table, 4); > for (i = 0; i < 4; i++) > printf("obj = %p\n", t[i]); > rte_mempool_put_bulk(mp, table, 4); > > > printf("-----\n"); > rte_mempool_get_bulk(mp, table, 4); > for (i = 0; i < 4; i++) > printf("obj = %p\n", t[i]); > rte_mempool_put_bulk(mp, table, 4); > > prints: > > addr1 > addr2 > addr3 > addr4 > ----- > addr4 > addr3 > addr2 > addr1 > > Which is quite strange. IMO, It is the expected LIFO behavior. Right ? What is not expected is the following, which is the case after change. Or Am I missing something here? addr1 addr2 addr3 addr4 ----- addr1 addr2 addr3 addr4 > > I don't think it would be an issue to replace the loop by a > rte_memcpy(), it may increase the copy speed and it will be > more coherent with the put(). > > > Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-06-01 7:00 ` Jerin Jacob @ 2016-06-02 7:36 ` Olivier MATZ 2016-06-02 9:39 ` Jerin Jacob 0 siblings, 1 reply; 32+ messages in thread From: Olivier MATZ @ 2016-06-02 7:36 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev Hi Jerin, On 06/01/2016 09:00 AM, Jerin Jacob wrote: > On Tue, May 31, 2016 at 11:05:30PM +0200, Olivier MATZ wrote: >> Today, the objects pointers are reversed only in the get(). It means >> that this code: >> >> rte_mempool_get_bulk(mp, table, 4); >> for (i = 0; i < 4; i++) >> printf("obj = %p\n", t[i]); >> rte_mempool_put_bulk(mp, table, 4); >> >> >> printf("-----\n"); >> rte_mempool_get_bulk(mp, table, 4); >> for (i = 0; i < 4; i++) >> printf("obj = %p\n", t[i]); >> rte_mempool_put_bulk(mp, table, 4); >> >> prints: >> >> addr1 >> addr2 >> addr3 >> addr4 >> ----- >> addr4 >> addr3 >> addr2 >> addr1 >> >> Which is quite strange. > > IMO, It is the expected LIFO behavior. Right ? > > What is not expected is the following, which is the case after change. Or Am I > missing something here? > > addr1 > addr2 > addr3 > addr4 > ----- > addr1 > addr2 > addr3 > addr4 > >> >> I don't think it would be an issue to replace the loop by a >> rte_memcpy(), it may increase the copy speed and it will be >> more coherent with the put(). >> I think the LIFO behavior should occur on a per-bulk basis. I mean, it should behave like in the exemplaes below: // pool cache is in state X obj1 = mempool_get(mp) obj2 = mempool_get(mp) mempool_put(mp, obj2) mempool_put(mp, obj1) // pool cache is back in state X // pool cache is in state X bulk1 = mempool_get_bulk(mp, 16) bulk2 = mempool_get_bulk(mp, 16) mempool_put_bulk(mp, bulk2, 16) mempool_put_bulk(mp, bulk1, 16) // pool cache is back in state X Note that today it's not the case for bulks, since object addresses are reversed only in get(), we are not back in the original state. I don't really see the advantage of this. Removing the reversing may accelerate the cache in case of bulk get, I think. Regards, Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-06-02 7:36 ` Olivier MATZ @ 2016-06-02 9:39 ` Jerin Jacob 2016-06-02 21:16 ` Olivier MATZ 0 siblings, 1 reply; 32+ messages in thread From: Jerin Jacob @ 2016-06-02 9:39 UTC (permalink / raw) To: Olivier MATZ; +Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev On Thu, Jun 02, 2016 at 09:36:34AM +0200, Olivier MATZ wrote: > Hi Jerin, > > On 06/01/2016 09:00 AM, Jerin Jacob wrote: > > On Tue, May 31, 2016 at 11:05:30PM +0200, Olivier MATZ wrote: > >> Today, the objects pointers are reversed only in the get(). It means > >> that this code: > >> > >> rte_mempool_get_bulk(mp, table, 4); > >> for (i = 0; i < 4; i++) > >> printf("obj = %p\n", t[i]); > >> rte_mempool_put_bulk(mp, table, 4); > >> > >> > >> printf("-----\n"); > >> rte_mempool_get_bulk(mp, table, 4); > >> for (i = 0; i < 4; i++) > >> printf("obj = %p\n", t[i]); > >> rte_mempool_put_bulk(mp, table, 4); > >> > >> prints: > >> > >> addr1 > >> addr2 > >> addr3 > >> addr4 > >> ----- > >> addr4 > >> addr3 > >> addr2 > >> addr1 > >> > >> Which is quite strange. > > > > IMO, It is the expected LIFO behavior. Right ? > > > > What is not expected is the following, which is the case after change. Or Am I > > missing something here? > > > > addr1 > > addr2 > > addr3 > > addr4 > > ----- > > addr1 > > addr2 > > addr3 > > addr4 > > > >> > >> I don't think it would be an issue to replace the loop by a > >> rte_memcpy(), it may increase the copy speed and it will be > >> more coherent with the put(). > >> > > I think the LIFO behavior should occur on a per-bulk basis. I mean, > it should behave like in the exemplaes below: > > // pool cache is in state X > obj1 = mempool_get(mp) > obj2 = mempool_get(mp) > mempool_put(mp, obj2) > mempool_put(mp, obj1) > // pool cache is back in state X > > // pool cache is in state X > bulk1 = mempool_get_bulk(mp, 16) > bulk2 = mempool_get_bulk(mp, 16) > mempool_put_bulk(mp, bulk2, 16) > mempool_put_bulk(mp, bulk1, 16) > // pool cache is back in state X > Per entry LIFO behavior make more sense in _bulk_ case as recently en-queued buffer comes out for next "get" makes more chance that buffer in Last level cache. > Note that today it's not the case for bulks, since object addresses > are reversed only in get(), we are not back in the original state. > I don't really see the advantage of this. > > Removing the reversing may accelerate the cache in case of bulk get, > I think. I tried in my setup, it was dropping the performance. Have you got improvement in any setup? Jerin > > Regards, > Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-06-02 9:39 ` Jerin Jacob @ 2016-06-02 21:16 ` Olivier MATZ 2016-06-03 7:02 ` Jerin Jacob 0 siblings, 1 reply; 32+ messages in thread From: Olivier MATZ @ 2016-06-02 21:16 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev Hi Jerin, On 06/02/2016 11:39 AM, Jerin Jacob wrote: > On Thu, Jun 02, 2016 at 09:36:34AM +0200, Olivier MATZ wrote: >> I think the LIFO behavior should occur on a per-bulk basis. I mean, >> it should behave like in the exemplaes below: >> >> // pool cache is in state X >> obj1 = mempool_get(mp) >> obj2 = mempool_get(mp) >> mempool_put(mp, obj2) >> mempool_put(mp, obj1) >> // pool cache is back in state X >> >> // pool cache is in state X >> bulk1 = mempool_get_bulk(mp, 16) >> bulk2 = mempool_get_bulk(mp, 16) >> mempool_put_bulk(mp, bulk2, 16) >> mempool_put_bulk(mp, bulk1, 16) >> // pool cache is back in state X >> > > Per entry LIFO behavior make more sense in _bulk_ case as recently en-queued buffer > comes out for next "get" makes more chance that buffer in Last level cache. Yes, from a memory cache perspective, I think you are right. In practise, I'm not sure it's so important because on many hw drivers, a packet buffer returns to the pool only after a round of the tx ring. So I'd say it won't make a big difference here. >> Note that today it's not the case for bulks, since object addresses >> are reversed only in get(), we are not back in the original state. >> I don't really see the advantage of this. >> >> Removing the reversing may accelerate the cache in case of bulk get, >> I think. > > I tried in my setup, it was dropping the performance. Have you got > improvement in any setup? I know that the mempool_perf autotest is not representative of a real use case, but it gives a trend. 
I did a quick test with - the legacy code, - the rte_memcpy in put() - the rte_memcpy in both put() and get() (no reverse) It seems that removing the reversing brings ~50% of enhancement with bulks of 32 (on an westmere): legacy mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=839922483 mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=849792204 mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=1617022156 mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=1675087052 mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=3202914713 mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=3268725963 rte_memcpy in put() (your patch proposal) mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=762157465 mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=744593817 mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=1500276326 mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=1461347942 mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=2974076107 mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=2928122264 rte_memcpy in put() and get() mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=974834892 mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=1129329459 mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=2147798220 mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=2232457625 mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=4510816664 mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=4582421298 This is probably more a measure of the pure CPU cost of the mempool function, without considering the memory cache aspect. So, of course, a real use-case test should be done to confirm or not that it increases the performance. I'll manage to do a test and let you know the result. By the way, not all drivers are allocating or freeing the mbufs by bulk, so this modification would only affect these ones. What driver are you using for your test? Regards, Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-06-02 21:16 ` Olivier MATZ @ 2016-06-03 7:02 ` Jerin Jacob 2016-06-17 10:40 ` Olivier Matz 0 siblings, 1 reply; 32+ messages in thread From: Jerin Jacob @ 2016-06-03 7:02 UTC (permalink / raw) To: Olivier MATZ; +Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev On Thu, Jun 02, 2016 at 11:16:16PM +0200, Olivier MATZ wrote: Hi Olivier, > This is probably more a measure of the pure CPU cost of the mempool > function, without considering the memory cache aspect. So, of course, > a real use-case test should be done to confirm or not that it increases > the performance. I'll manage to do a test and let you know the result. OK IMO, the put-side rte_memcpy (this patch) makes sense as there is no behavior change. However, if the get-side rte_memcpy with its behavior change makes sense on some platform, then we can enable it on a conditional basis (I am OK with that). > > By the way, not all drivers are allocating or freeing the mbufs by > bulk, so this modification would only affect these ones. What driver > are you using for your test? I have tested with the ThunderX nicvf PMD (it uses the bulk mode); the driver was recently sent out on the ML for review. Jerin > > > Regards, > Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-06-03 7:02 ` Jerin Jacob @ 2016-06-17 10:40 ` Olivier Matz 2016-06-24 16:04 ` Olivier Matz 0 siblings, 1 reply; 32+ messages in thread From: Olivier Matz @ 2016-06-17 10:40 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev Hi Jerin, On 06/03/2016 09:02 AM, Jerin Jacob wrote: > On Thu, Jun 02, 2016 at 11:16:16PM +0200, Olivier MATZ wrote: > Hi Olivier, > >> This is probably more a measure of the pure CPU cost of the mempool >> function, without considering the memory cache aspect. So, of course, >> a real use-case test should be done to confirm or not that it increases >> the performance. I'll manage to do a test and let you know the result. > > OK > > IMO, put rte_memcpy makes sense(this patch) as their no behavior change. > However, if get rte_memcpy with behavioral changes makes sense some platform > then we can enable it on conditional basics(I am OK with that) > >> >> By the way, not all drivers are allocating or freeing the mbufs by >> bulk, so this modification would only affect these ones. What driver >> are you using for your test? > > I have tested with ThunderX nicvf pmd(uses the bulk mode). > Recently sent out driver in ml for review Just to let you know I do not forget this. I still need to find some time to do a performance test. Regards, Olivier ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-06-17 10:40 ` Olivier Matz @ 2016-06-24 16:04 ` Olivier Matz 0 siblings, 0 replies; 32+ messages in thread From: Olivier Matz @ 2016-06-24 16:04 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, thomas.monjalon, bruce.richardson, konstantin.ananyev On 06/17/2016 12:40 PM, Olivier Matz wrote: > Hi Jerin, > > On 06/03/2016 09:02 AM, Jerin Jacob wrote: >> On Thu, Jun 02, 2016 at 11:16:16PM +0200, Olivier MATZ wrote: >> Hi Olivier, >> >>> This is probably more a measure of the pure CPU cost of the mempool >>> function, without considering the memory cache aspect. So, of course, >>> a real use-case test should be done to confirm or not that it increases >>> the performance. I'll manage to do a test and let you know the result. >> >> OK >> >> IMO, put rte_memcpy makes sense(this patch) as their no behavior change. >> However, if get rte_memcpy with behavioral changes makes sense some platform >> then we can enable it on conditional basics(I am OK with that) >> >>> >>> By the way, not all drivers are allocating or freeing the mbufs by >>> bulk, so this modification would only affect these ones. What driver >>> are you using for your test? >> >> I have tested with ThunderX nicvf pmd(uses the bulk mode). >> Recently sent out driver in ml for review > > Just to let you know I do not forget this. I still need to > find some time to do a performance test. Quoting from the other thread [1] too to save this in patchwork: [1] http://dpdk.org/ml/archives/dev/2016-June/042701.html > On 06/24/2016 05:56 PM, Hunt, David wrote: >> Hi Jerin, >> >> I just ran a couple of tests on this patch on the latest master head on >> a couple of machines. An older quad socket E5-4650 and a quad socket >> E5-2699 v3 >> >> E5-4650: >> I'm seeing a gain of 2% for un-cached tests and a gain of 9% on the >> cached tests. >> >> E5-2699 v3: >> I'm seeing a loss of 0.1% for un-cached tests and a gain of 11% on the >> cached tests. >> >> This is purely the autotest comparison, I don't have traffic generator >> results. But based on the above, I don't think there are any performance >> issues with the patch. >> > > Thanks for doing the test on your side. I think it's probably enough > to integrate Jerin's patch . > > About using a rte_memcpy() in the mempool_get(), I don't think I'll have > the time to do a more exhaustive test before the 16.07, so I'll come > back with it later. > > I'm sending an ack on the v2 thread. Acked-by: Olivier Matz <olivier.matz@6wind.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-05-26 8:07 ` [dpdk-dev] [PATCH v2] mempool: " Jerin Jacob 2016-05-30 8:45 ` Olivier Matz @ 2016-06-30 9:41 ` Thomas Monjalon 2016-06-30 11:38 ` Jerin Jacob 2016-06-30 12:16 ` [dpdk-dev] [PATCH v3] " Jerin Jacob 2 siblings, 1 reply; 32+ messages in thread From: Thomas Monjalon @ 2016-06-30 9:41 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, bruce.richardson, konstantin.ananyev, olivier.matz 2016-05-26 13:37, Jerin Jacob: > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Please Jerin (or anyone else), could you rebase this patch? Thanks ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-06-30 9:41 ` Thomas Monjalon @ 2016-06-30 11:38 ` Jerin Jacob 0 siblings, 0 replies; 32+ messages in thread From: Jerin Jacob @ 2016-06-30 11:38 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev, bruce.richardson, konstantin.ananyev, olivier.matz On Thu, Jun 30, 2016 at 11:41:59AM +0200, Thomas Monjalon wrote: > 2016-05-26 13:37, Jerin Jacob: > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > Please Jerin (or anyone else), could you rebase this patch? OK. I will send the rebased version > Thanks ^ permalink raw reply [flat|nested] 32+ messages in thread
* [dpdk-dev] [PATCH v3] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-05-26 8:07 ` [dpdk-dev] [PATCH v2] mempool: " Jerin Jacob 2016-05-30 8:45 ` Olivier Matz 2016-06-30 9:41 ` Thomas Monjalon @ 2016-06-30 12:16 ` Jerin Jacob 2016-06-30 17:28 ` Thomas Monjalon 2 siblings, 1 reply; 32+ messages in thread From: Jerin Jacob @ 2016-06-30 12:16 UTC (permalink / raw) To: dev; +Cc: thomas.monjalon, olivier.matz, Jerin Jacob Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> --- v1..v2 Corrected the the git commit message(s/mbuf/mempool/g) v2..v3 re-base to master --- --- lib/librte_mempool/rte_mempool.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index b2a5197..c8a81e2 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -74,6 +74,7 @@ #include <rte_memory.h> #include <rte_branch_prediction.h> #include <rte_ring.h> +#include <rte_memcpy.h> #ifdef __cplusplus extern "C" { @@ -1028,7 +1029,6 @@ static inline void __attribute__((always_inline)) __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table, unsigned n, struct rte_mempool_cache *cache, int flags) { - uint32_t index; void **cache_objs; /* increment stat now, adding in mempool always success */ @@ -1052,8 +1052,7 @@ __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table, */ /* Add elements back into the cache */ - for (index = 0; index < n; ++index, obj_table++) - cache_objs[index] = *obj_table; + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); cache->len += n; -- 2.5.5 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v3] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-06-30 12:16 ` [dpdk-dev] [PATCH v3] " Jerin Jacob @ 2016-06-30 17:28 ` Thomas Monjalon 2016-07-05 8:43 ` Ferruh Yigit 0 siblings, 1 reply; 32+ messages in thread From: Thomas Monjalon @ 2016-06-30 17:28 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, olivier.matz 2016-06-30 17:46, Jerin Jacob: > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > Acked-by: Olivier Matz <olivier.matz@6wind.com> Applied, thanks ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v3] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-06-30 17:28 ` Thomas Monjalon @ 2016-07-05 8:43 ` Ferruh Yigit 2016-07-05 11:32 ` Yuanhan Liu 0 siblings, 1 reply; 32+ messages in thread From: Ferruh Yigit @ 2016-07-05 8:43 UTC (permalink / raw) To: Thomas Monjalon, Jerin Jacob; +Cc: dev, olivier.matz, Pablo de Lara On 6/30/2016 6:28 PM, Thomas Monjalon wrote: > 2016-06-30 17:46, Jerin Jacob: >> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> >> Acked-by: Olivier Matz <olivier.matz@6wind.com> > > Applied, thanks > Hi Jerin, This commit cause a compilation error on target i686-native-linuxapp-gcc with gcc6. Compilation error is: == Build drivers/net/virtio CC virtio_rxtx_simple.o In file included from /root/development/dpdk/build/include/rte_mempool.h:77:0, from /root/development/dpdk/drivers/net/virtio/virtio_rxtx_simple.c:46: /root/development/dpdk/drivers/net/virtio/virtio_rxtx_simple.c: In function ‘virtio_xmit_pkts_simple’: /root/development/dpdk/build/include/rte_memcpy.h:551:2: error: array subscript is above array bounds [-Werror=array-bounds] rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /root/development/dpdk/build/include/rte_memcpy.h:552:2: error: array subscript is above array bounds [-Werror=array-bounds] rte_mov16((uint8_t *)dst + 2 * 16, (const uint8_t *)src + 2 * 16); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /root/development/dpdk/build/include/rte_memcpy.h:553:2: error: array subscript is above array bounds [-Werror=array-bounds] rte_mov16((uint8_t *)dst + 3 * 16, (const uint8_t *)src + 3 * 16); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /root/development/dpdk/build/include/rte_memcpy.h:554:2: error: array subscript is above array bounds [-Werror=array-bounds] rte_mov16((uint8_t *)dst + 4 * 16, (const uint8_t *)src + 4 * 16); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... ... Thanks, ferruh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v3] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-07-05 8:43 ` Ferruh Yigit @ 2016-07-05 11:32 ` Yuanhan Liu 2016-07-05 13:13 ` Jerin Jacob 0 siblings, 1 reply; 32+ messages in thread From: Yuanhan Liu @ 2016-07-05 11:32 UTC (permalink / raw) To: Jerin Jacob Cc: Ferruh Yigit, Thomas Monjalon, dev, olivier.matz, Pablo de Lara On Tue, Jul 05, 2016 at 09:43:03AM +0100, Ferruh Yigit wrote: > On 6/30/2016 6:28 PM, Thomas Monjalon wrote: > > 2016-06-30 17:46, Jerin Jacob: > >> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > >> Acked-by: Olivier Matz <olivier.matz@6wind.com> > > > > Applied, thanks > > > > Hi Jerin, > > This commit cause a compilation error on target i686-native-linuxapp-gcc > with gcc6. Besides that, I'm more curious to know have you actually seen any performance boost? --yliu ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v3] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-07-05 11:32 ` Yuanhan Liu @ 2016-07-05 13:13 ` Jerin Jacob 2016-07-05 13:42 ` Yuanhan Liu 2016-07-05 14:09 ` Ferruh Yigit 0 siblings, 2 replies; 32+ messages in thread From: Jerin Jacob @ 2016-07-05 13:13 UTC (permalink / raw) To: Yuanhan Liu Cc: Ferruh Yigit, Thomas Monjalon, dev, olivier.matz, Pablo de Lara On Tue, Jul 05, 2016 at 07:32:46PM +0800, Yuanhan Liu wrote: > On Tue, Jul 05, 2016 at 09:43:03AM +0100, Ferruh Yigit wrote: > > On 6/30/2016 6:28 PM, Thomas Monjalon wrote: > > > 2016-06-30 17:46, Jerin Jacob: > > >> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > >> Acked-by: Olivier Matz <olivier.matz@6wind.com> > > > > > > Applied, thanks > > > > > > > Hi Jerin, > > > > This commit cause a compilation error on target i686-native-linuxapp-gcc > > with gcc6. > > Besides that, I'm more curious to know have you actually seen any > performance boost? let me first address your curiosity, http://dpdk.org/dev/patchwork/patch/12993/( check the second comment) http://dpdk.org/ml/archives/dev/2016-June/042701.html Ferruh, I have tested on a x86 machine with gcc 6.1. I could n't see any issues with i686-native-linuxapp-gcc target Steps following to create gcc 6.1 toolchain https://sahas.ra.naman.ms/2016/05/31/building-gcc6-1-on-fedora-23/ (removed --disable-multilib to have support for -m32) ➜ [dpdk-master] $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/opt/gcc-6.1.0/libexec/gcc/x86_64-pc-linux-gnu/6.1.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: ../gcc-6.1.0/configure --prefix=/opt/gcc-6.1.0 --enable-languages=c,c++ --enable-libmudflap --with-system-zlib Thread model: posix gcc version 6.1.0 (GCC) More over this issue seems like an issue from x86 rte_memcpy implementation. Jerin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v3] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-07-05 13:13 ` Jerin Jacob @ 2016-07-05 13:42 ` Yuanhan Liu 2016-07-05 14:09 ` Ferruh Yigit 1 sibling, 0 replies; 32+ messages in thread From: Yuanhan Liu @ 2016-07-05 13:42 UTC (permalink / raw) To: Jerin Jacob Cc: Ferruh Yigit, Thomas Monjalon, dev, olivier.matz, Pablo de Lara On Tue, Jul 05, 2016 at 06:43:57PM +0530, Jerin Jacob wrote: > On Tue, Jul 05, 2016 at 07:32:46PM +0800, Yuanhan Liu wrote: > > On Tue, Jul 05, 2016 at 09:43:03AM +0100, Ferruh Yigit wrote: > > > On 6/30/2016 6:28 PM, Thomas Monjalon wrote: > > > > 2016-06-30 17:46, Jerin Jacob: > > > >> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > > >> Acked-by: Olivier Matz <olivier.matz@6wind.com> > > > > > > > > Applied, thanks > > > > > > > > > > Hi Jerin, > > > > > > This commit cause a compilation error on target i686-native-linuxapp-gcc > > > with gcc6. > > > > Besides that, I'm more curious to know have you actually seen any > > performance boost? > > let me first address your curiosity, > http://dpdk.org/dev/patchwork/patch/12993/( check the second comment) > http://dpdk.org/ml/archives/dev/2016-June/042701.html Thanks. Maybe it's good to include those info in the commit log next time. --yliu ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v3] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-07-05 13:13 ` Jerin Jacob 2016-07-05 13:42 ` Yuanhan Liu @ 2016-07-05 14:09 ` Ferruh Yigit 2016-07-06 16:21 ` Ferruh Yigit 1 sibling, 1 reply; 32+ messages in thread From: Ferruh Yigit @ 2016-07-05 14:09 UTC (permalink / raw) To: Jerin Jacob, Yuanhan Liu Cc: Thomas Monjalon, dev, olivier.matz, Pablo de Lara On 7/5/2016 2:13 PM, Jerin Jacob wrote: > On Tue, Jul 05, 2016 at 07:32:46PM +0800, Yuanhan Liu wrote: >> On Tue, Jul 05, 2016 at 09:43:03AM +0100, Ferruh Yigit wrote: >>> On 6/30/2016 6:28 PM, Thomas Monjalon wrote: >>>> 2016-06-30 17:46, Jerin Jacob: >>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> >>>>> Acked-by: Olivier Matz <olivier.matz@6wind.com> >>>> >>>> Applied, thanks >>>> >>> >>> Hi Jerin, >>> >>> This commit cause a compilation error on target i686-native-linuxapp-gcc >>> with gcc6. >> >> Besides that, I'm more curious to know have you actually seen any >> performance boost? > > let me first address your curiosity, > http://dpdk.org/dev/patchwork/patch/12993/( check the second comment) > http://dpdk.org/ml/archives/dev/2016-June/042701.html > > Ferruh, Hi Jerin, > > I have tested on a x86 machine with gcc 6.1. I could n't see any issues > with i686-native-linuxapp-gcc target Thanks for investigating the issue. > > Steps following to create gcc 6.1 toolchain > https://sahas.ra.naman.ms/2016/05/31/building-gcc6-1-on-fedora-23/ > (removed --disable-multilib to have support for -m32) > > ➜ [dpdk-master] $ gcc -v > Using built-in specs. > COLLECT_GCC=gcc > COLLECT_LTO_WRAPPER=/opt/gcc-6.1.0/libexec/gcc/x86_64-pc-linux-gnu/6.1.0/lto-wrapper > Target: x86_64-pc-linux-gnu > Configured with: ../gcc-6.1.0/configure --prefix=/opt/gcc-6.1.0 > --enable-languages=c,c++ --enable-libmudflap --with-system-zlib > Thread model: posix > gcc version 6.1.0 (GCC) I am using Fedora24, which has gcc6 (6.1.1) as default. > > More over this issue seems like an issue from x86 rte_memcpy implementation. You are right. But i686 compilation starts failing with this commit. And reverting this commit in the current HEAD solves the compilation problem. I am not really clear about reason of the compilation error. Thanks, ferruh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [dpdk-dev] [PATCH v3] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-07-05 14:09 ` Ferruh Yigit @ 2016-07-06 16:21 ` Ferruh Yigit 2016-07-07 13:51 ` Ferruh Yigit 0 siblings, 1 reply; 32+ messages in thread From: Ferruh Yigit @ 2016-07-06 16:21 UTC (permalink / raw) To: Jerin Jacob, Yuanhan Liu Cc: Thomas Monjalon, dev, olivier.matz, Pablo de Lara On 7/5/2016 3:09 PM, Ferruh Yigit wrote: > On 7/5/2016 2:13 PM, Jerin Jacob wrote: >> On Tue, Jul 05, 2016 at 07:32:46PM +0800, Yuanhan Liu wrote: >>> On Tue, Jul 05, 2016 at 09:43:03AM +0100, Ferruh Yigit wrote: >>>> On 6/30/2016 6:28 PM, Thomas Monjalon wrote: >>>>> 2016-06-30 17:46, Jerin Jacob: >>>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> >>>>>> Acked-by: Olivier Matz <olivier.matz@6wind.com> >>>>> >>>>> Applied, thanks >>>>> >>>> >>>> Hi Jerin, >>>> >>>> This commit cause a compilation error on target i686-native-linuxapp-gcc >>>> with gcc6. >>> >>> Besides that, I'm more curious to know have you actually seen any >>> performance boost? >> >> let me first address your curiosity, >> http://dpdk.org/dev/patchwork/patch/12993/( check the second comment) >> http://dpdk.org/ml/archives/dev/2016-June/042701.html >> >> Ferruh, > > Hi Jerin, > >> >> I have tested on a x86 machine with gcc 6.1. I could n't see any issues >> with i686-native-linuxapp-gcc target > Thanks for investigating the issue. > >> >> Steps following to create gcc 6.1 toolchain >> https://sahas.ra.naman.ms/2016/05/31/building-gcc6-1-on-fedora-23/ >> (removed --disable-multilib to have support for -m32) >> >> ➜ [dpdk-master] $ gcc -v >> Using built-in specs. >> COLLECT_GCC=gcc >> COLLECT_LTO_WRAPPER=/opt/gcc-6.1.0/libexec/gcc/x86_64-pc-linux-gnu/6.1.0/lto-wrapper >> Target: x86_64-pc-linux-gnu >> Configured with: ../gcc-6.1.0/configure --prefix=/opt/gcc-6.1.0 >> --enable-languages=c,c++ --enable-libmudflap --with-system-zlib >> Thread model: posix >> gcc version 6.1.0 (GCC) > I am using Fedora 24, which has gcc6 (6.1.1) as the default. > >> >> More over this issue seems like an issue from x86 rte_memcpy implementation. > You are right. But the i686 compilation starts failing with this commit, > and reverting this commit in the current HEAD solves the compilation > problem. > I am not really clear about the reason for the compilation error. The compile error occurs because the compiler is now very smart, but at the same time not smart enough. The call stack is as follows: virtio_xmit_pkts_simple virtio_xmit_cleanup rte_mempool_put_bulk rte_mempool_generic_put __mempool_generic_put rte_memcpy The array used as the source buffer in virtio_xmit_cleanup (free) is a pointer array with 32 elements; in 32bit this makes 128 bytes. In the rte_memcpy() implementation, there is a code piece like the following: if (size > 256) { rte_move128(...); rte_move128(...); <--- [1] .... } The compiler traces the array all through the call stack, knows the size of the array is 128 bytes, and generates a warning at [1] above, which tries to access beyond byte 128. But unfortunately it ignores the "(size > 256)" check. In 64bit, this warning is not generated because the array size becomes 256 bytes. So this warning is a false positive. Although I am working on it, if anybody has a suggestion to prevent the warning, it is quite welcome. At worst I will disable this compiler warning for the file. Thanks, ferruh ^ permalink raw reply [flat|nested] 32+ messages in thread
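A minimal, self-contained sketch of the pattern described above (hypothetical names and simplified copy logic, not the actual DPDK sources) looks like this: a copy routine whose large-copy branch is guarded by a runtime size check can still draw an out-of-bounds warning when the compiler has traced the source pointer back to a 128-byte array in the caller.

/*
 * Minimal sketch of the reported pattern; names and sizes are illustrative
 * only and do not come from the DPDK tree.
 */
#include <stdint.h>
#include <string.h>

static inline void
move128(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 128);
}

static inline void
my_copy(void *dst, const void *src, size_t size)
{
	if (size > 256) {	/* guard that gcc 6 reportedly ignores */
		move128(dst, src);
		move128((uint8_t *)dst + 128, (const uint8_t *)src + 128);	/* [1] */
		/* ... rest of the large-copy path ... */
		return;
	}
	memcpy(dst, src, size);
}

void
cleanup(void **cache_objs)	/* dst must hold at least 32 pointers */
{
	void *free_list[32] = { 0 };	/* 32 pointers: 128 bytes on a 32-bit target */

	/* ... gather freed object pointers into free_list ... */
	my_copy(cache_objs, free_list, sizeof(free_list));
}

Whether a given gcc 6 build actually flags [1] here depends on its inlining decisions, so this is only an approximation of the call chain listed above, not a guaranteed reproducer.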
* Re: [dpdk-dev] [PATCH v3] mempool: replace c memcpy code semantics with optimized rte_memcpy 2016-07-06 16:21 ` Ferruh Yigit @ 2016-07-07 13:51 ` Ferruh Yigit 0 siblings, 0 replies; 32+ messages in thread From: Ferruh Yigit @ 2016-07-07 13:51 UTC (permalink / raw) To: Jerin Jacob, Yuanhan Liu Cc: Thomas Monjalon, dev, olivier.matz, Pablo de Lara On 7/6/2016 5:21 PM, Ferruh Yigit wrote: > On 7/5/2016 3:09 PM, Ferruh Yigit wrote: >> On 7/5/2016 2:13 PM, Jerin Jacob wrote: >>> On Tue, Jul 05, 2016 at 07:32:46PM +0800, Yuanhan Liu wrote: >>>> On Tue, Jul 05, 2016 at 09:43:03AM +0100, Ferruh Yigit wrote: >>>>> On 6/30/2016 6:28 PM, Thomas Monjalon wrote: >>>>>> 2016-06-30 17:46, Jerin Jacob: >>>>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> >>>>>>> Acked-by: Olivier Matz <olivier.matz@6wind.com> >>>>>> >>>>>> Applied, thanks >>>>>> >>>>> >>>>> Hi Jerin, >>>>> >>>>> This commit cause a compilation error on target i686-native-linuxapp-gcc >>>>> with gcc6. >>>> >>>> Besides that, I'm more curious to know have you actually seen any >>>> performance boost? >>> >>> let me first address your curiosity, >>> http://dpdk.org/dev/patchwork/patch/12993/( check the second comment) >>> http://dpdk.org/ml/archives/dev/2016-June/042701.html >>> >>> Ferruh, >> >> Hi Jerin, >> >>> >>> I have tested on a x86 machine with gcc 6.1. I could n't see any issues >>> with i686-native-linuxapp-gcc target >> Thanks for investigating the issue. >> >>> >>> Steps following to create gcc 6.1 toolchain >>> https://sahas.ra.naman.ms/2016/05/31/building-gcc6-1-on-fedora-23/ >>> (removed --disable-multilib to have support for -m32) >>> >>> ➜ [dpdk-master] $ gcc -v >>> Using built-in specs. >>> COLLECT_GCC=gcc >>> COLLECT_LTO_WRAPPER=/opt/gcc-6.1.0/libexec/gcc/x86_64-pc-linux-gnu/6.1.0/lto-wrapper >>> Target: x86_64-pc-linux-gnu >>> Configured with: ../gcc-6.1.0/configure --prefix=/opt/gcc-6.1.0 >>> --enable-languages=c,c++ --enable-libmudflap --with-system-zlib >>> Thread model: posix >>> gcc version 6.1.0 (GCC) >> I am using Fedora 24, which has gcc6 (6.1.1) as the default. >> >>> >>> More over this issue seems like an issue from x86 rte_memcpy implementation. >> You are right. But the i686 compilation starts failing with this commit, >> and reverting this commit in the current HEAD solves the compilation >> problem. >> I am not really clear about the reason for the compilation error. > > The compile error occurs because the compiler is now very smart, but at the > same time not smart enough. > > The call stack is as follows: > > virtio_xmit_pkts_simple > virtio_xmit_cleanup > rte_mempool_put_bulk > rte_mempool_generic_put > __mempool_generic_put > rte_memcpy > > The array used as the source buffer in virtio_xmit_cleanup (free) is a > pointer array with 32 elements; in 32bit this makes 128 bytes. > > In the rte_memcpy() implementation, there is a code piece like the following: > if (size > 256) { > rte_move128(...); > rte_move128(...); <--- [1] > .... > } > > The compiler traces the array all through the call stack, knows the > size of the array is 128 bytes, and generates a warning at [1] above, > which tries to access beyond byte 128. > But unfortunately it ignores the "(size > 256)" check. > > In 64bit, this warning is not generated because the array size becomes > 256 bytes. > > So this warning is a false positive. Although I am working on it, if > anybody has a suggestion to prevent the warning, it is quite welcome. At worst I > will disable this compiler warning for the file.
I have sent a patch: http://dpdk.org/ml/archives/dev/2016-July/043492.html Giving the compiler a hint that the variable "size" is related to the size of the source buffer fixes the compiler warning. - This update is in the virtio fast path; can you please review it for its performance effect? - Isn't this surprisingly smart of the compiler, or am I missing something? :-) Thanks, ferruh ^ permalink raw reply [flat|nested] 32+ messages in thread
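The linked patch is the authoritative fix; the sketch below is only a hypothetical illustration of the kind of hint being described, where the relationship between the copy length and the source array is made visible to the compiler so that the oversized-copy branch in rte_memcpy() can be seen as unreachable for this call site. The names flush_free_list and FREE_LIST_SIZE, and the clamp itself, are invented for the example and are not taken from the virtio driver.

/*
 * Hypothetical sketch of "hinting" the compiler; this is not the patch
 * linked above. Bounding n by the known array size is intended to let the
 * compiler see that sizeof(void *) * n stays within the source array, so
 * the size > 256 path in rte_memcpy() cannot apply here.
 */
#include <rte_memcpy.h>

#define FREE_LIST_SIZE 32

static inline void
flush_free_list(void **cache_objs, void * const *free_list, unsigned int n)
{
	if (n > FREE_LIST_SIZE)	/* hint: n is bounded by the array size */
		n = FREE_LIST_SIZE;

	rte_memcpy(cache_objs, free_list, sizeof(void *) * n);
}

Whether such a clamp, or some other formulation of the hint, is acceptable depends on the cost of the extra compare in the fast path, which is exactly the performance question raised above.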
end of thread, other threads:[~2016-07-07 13:51 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-05-24 14:50 [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy Jerin Jacob 2016-05-24 14:59 ` Olivier Matz 2016-05-24 15:17 ` Jerin Jacob 2016-05-27 10:24 ` Hunt, David 2016-05-27 11:42 ` Jerin Jacob 2016-05-27 15:05 ` Thomas Monjalon 2016-05-30 8:44 ` Olivier Matz 2016-05-27 13:45 ` Hunt, David 2016-06-24 15:56 ` Hunt, David 2016-06-24 16:02 ` Olivier Matz 2016-05-26 8:07 ` [dpdk-dev] [PATCH v2] mempool: " Jerin Jacob 2016-05-30 8:45 ` Olivier Matz 2016-05-31 12:58 ` Jerin Jacob 2016-05-31 21:05 ` Olivier MATZ 2016-06-01 7:00 ` Jerin Jacob 2016-06-02 7:36 ` Olivier MATZ 2016-06-02 9:39 ` Jerin Jacob 2016-06-02 21:16 ` Olivier MATZ 2016-06-03 7:02 ` Jerin Jacob 2016-06-17 10:40 ` Olivier Matz 2016-06-24 16:04 ` Olivier Matz 2016-06-30 9:41 ` Thomas Monjalon 2016-06-30 11:38 ` Jerin Jacob 2016-06-30 12:16 ` [dpdk-dev] [PATCH v3] " Jerin Jacob 2016-06-30 17:28 ` Thomas Monjalon 2016-07-05 8:43 ` Ferruh Yigit 2016-07-05 11:32 ` Yuanhan Liu 2016-07-05 13:13 ` Jerin Jacob 2016-07-05 13:42 ` Yuanhan Liu 2016-07-05 14:09 ` Ferruh Yigit 2016-07-06 16:21 ` Ferruh Yigit 2016-07-07 13:51 ` Ferruh Yigit