* [dpdk-dev] some questions about rte_memcpy
From: Linhaifeng @ 2015-01-22 3:39 UTC
To: dev

#define rte_memcpy(dst, src, n)              \
    ((__builtin_constant_p(n)) ?             \
    memcpy((dst), (src), (n)) :              \
    rte_memcpy_func((dst), (src), (n)))

Why does the macro call memcpy when n is a constant? Can I change it to the following code?

#define rte_memcpy(dst, src, n)              \
{                                            \
    int num = n;                             \
    rte_memcpy_func((dst), (src), (num));    \
}

-- 
Regards,
Haifeng

* Re: [dpdk-dev] some questions about rte_memcpy
From: Matthew Hall @ 2015-01-22 4:45 UTC
To: Linhaifeng; +Cc: dev

On Thu, Jan 22, 2015 at 11:39:11AM +0800, Linhaifeng wrote:
> Why call memcpy when n is constant variable?

One theory. Many DPDK functions crash if they are called before rte_eal_init()
is called. So perhaps this could be a cause, since that won't have been called
when working on a constant?

Matthew.

* Re: [dpdk-dev] some questions about rte_memcpy
From: Linhaifeng @ 2015-01-22 5:32 UTC
To: Matthew Hall; +Cc: dev

On 2015/1/22 12:45, Matthew Hall wrote:
> One theory. Many DPDK functions crash if they are called before rte_eal_init()
> is called. So perhaps this could be a cause, since that won't have been called
> when working on a constant

Hi Matthew,

Thank you for your response. Do you mean that calling rte_memcpy before rte_eal_init() would crash? Why?

-- 
Regards,
Haifeng

* Re: [dpdk-dev] some questions about rte_memcpy
From: Matthew Hall @ 2015-01-22 7:35 UTC
To: Linhaifeng; +Cc: dev

On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?

No guarantee. But a theory. It might use some things from the EAL init to
figure out which version of the accelerated algorithm to use.

Matthew.

* Re: [dpdk-dev] some questions about rte_memcpy
From: Tetsuya Mukawa @ 2015-01-22 10:23 UTC
To: Linhaifeng; +Cc: dev

On 2015/01/22 16:35, Matthew Hall wrote:
> On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
>> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?
> No guarantee. But a theory. It might use some things from the EAL init to
> figure out which version of the accelerated algorithm to use.

This selection is done at compile time.
And if the size is constant, I guess DPDK assumes memcpy is replaced by
an inline __builtin_memcpy.
I haven't checked the performance of the builtin memcpy, but it is probably
much faster.

Tetsuya

> Matthew.

* Re: [dpdk-dev] some questions about rte_memcpy
From: Bruce Richardson @ 2015-01-22 11:34 UTC
To: Tetsuya Mukawa; +Cc: dev

On Thu, Jan 22, 2015 at 07:23:49PM +0900, Tetsuya Mukawa wrote:
> On 2015/01/22 16:35, Matthew Hall wrote:
> > On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
> >> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?
> > No guarantee. But a theory. It might use some things from the EAL init to
> > figure out which version of the accelerated algorithm to use.
>
> This selection is done at compile-time.
> And if the size is constant, I guess DPDK assumes memcpy is replaced by
> inline __builtin_memcpy.
> I haven't checked the performance of builtin memcpy, but probably much
> faster.
>

Yes, that assumption is correct. A couple of years ago we discovered that for
constant size values, the compiler would generate much faster code for us
using a regular memcpy than rte_memcpy, hence the macro.

/Bruce

> Tetsuya
>
> > Matthew.
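
To make the dispatch described above concrete, here is a minimal sketch of the same pattern that builds without DPDK. The names copy_dispatch, copy_func, fallback_calls and demo_dispatch are hypothetical (copy_func only plays the role of rte_memcpy_func), and the sketch assumes GCC or Clang, which provide __builtin_constant_p.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counter so the chosen path can be observed. */
static unsigned long fallback_calls;

/* Hypothetical stand-in for rte_memcpy_func: a plain byte-wise copy. */
static void *copy_func(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    fallback_calls++;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Same shape as the rte_memcpy macro quoted in this thread. */
#define copy_dispatch(dst, src, n)           \
    ((__builtin_constant_p(n)) ?             \
    memcpy((dst), (src), (n)) :              \
    copy_func((dst), (src), (n)))

/*
 * Copy 16 bytes twice: once with a literal size, once with a size read
 * through a volatile object so the compiler cannot treat it as constant.
 * Returns how many times the fallback function was taken.
 */
static unsigned long demo_dispatch(char *dst, const char *src)
{
    volatile size_t hidden = 16;
    size_t n = hidden;

    fallback_calls = 0;
    (void)copy_dispatch(dst, src, 16); /* literal size: memcpy branch */
    (void)copy_dispatch(dst, src, n);  /* unknown size: copy_func branch */
    return fallback_calls;
}
```

With a literal size the compiler sees an ordinary memcpy call of known length, which it is free to expand inline; only the call with an unknown size reaches the fallback function.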

* Re: [dpdk-dev] some questions about rte_memcpy
From: Linhaifeng @ 2015-01-22 12:53 UTC
To: Bruce Richardson, Tetsuya Mukawa; +Cc: dev

On 2015/1/22 19:34, Bruce Richardson wrote:
> On Thu, Jan 22, 2015 at 07:23:49PM +0900, Tetsuya Mukawa wrote:
>> On 2015/01/22 16:35, Matthew Hall wrote:
>>> On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
>>>> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?
>>> No guarantee. But a theory. It might use some things from the EAL init to
>>> figure out which version of the accelerated algorithm to use.
>>
>> This selection is done at compile-time.
>> And if the size is constant, I guess DPDK assumes memcpy is replaced by
>> inline __builtin_memcpy.
>> I haven't checked the performance of builtin memcpy, but probably much
>> faster.
>>
>
> Yes, that assumption is correct. A couple of years ago we discovered that for
> constant size values, the compiler would generate much faster code for us
> using a regular memcpy than rte_memcpy, hence the macro.
>
> /Bruce
>
>> Tetsuya
>>
>>> Matthew.

Hi Bruce,

I tested it. Most results are as you said (the constant case may be faster), but sometimes it is not.

linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
rte_memcpy(constant) used:279893712 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277818600
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
rte_memcpy(constant) used:279264328 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277667116
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
rte_memcpy(constant) used:279491832 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277622772
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
rte_memcpy(constant) used:279402156 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277738464
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
rte_memcpy(constant) used:279305172 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277483004
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
rte_memcpy(constant) used:279784124 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277605332
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
rte_memcpy(constant) used:322817260
rte_memcpy(variable) used:350333864
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
rte_memcpy(constant) used:322840748
rte_memcpy(variable) used:350297868
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
rte_memcpy(constant) used:322488240
rte_memcpy(variable) used:350348652
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
rte_memcpy(constant) used:322021428
rte_memcpy(variable) used:350416440
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
rte_memcpy(constant) used:321370900
rte_memcpy(variable) used:350355796
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
rte_memcpy(constant) used:322704552
rte_memcpy(variable) used:349900832
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
rte_memcpy(constant) used:422705828
rte_memcpy(variable) used:425493328
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
rte_memcpy(constant) used:422421840 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:413691412
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
rte_memcpy(constant) used:425233088 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:421136724
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
rte_memcpy(constant) used:901014608 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:900997388
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
rte_memcpy(constant) used:900803308 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:900794076
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
rte_memcpy(constant) used:901842436 @@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:901218984
linux-mnSyvH:/mnt/sdb/linhf/test #

Here is my test code:

#include <stdio.h>
#include <stdlib.h>     /* atoi */
#include <stdint.h>     /* uint64_t */
#include <inttypes.h>   /* PRIu64 */
#include <rte_memcpy.h>
#include <rte_cycles.h>

int main(int narg, char** args)
{
    int i;
    char buf[1024];
    uint64_t start, end;

    if (narg < 3) {
        printf("usage:./rte_memcpy_test size times\n");
        return 0;
    }

    /* Both sizes are parsed from the command line. */
    size_t size_v = atoi(args[1]);
    const size_t size_c = atoi(args[1]);
    int times = atoi(args[2]);

    /* Time the copies that use the const-qualified size. */
    start = rte_rdtsc();
    for (i = 0; i < times; i++) {
        rte_memcpy(buf, buf, size_c);
    }
    end = rte_rdtsc();
    printf("rte_memcpy(constant) used:%" PRIu64 "\n", end - start);

    /* Time the copies that use the plain variable size. */
    start = rte_rdtsc();
    for (i = 0; i < times; i++) {
        rte_memcpy(buf, buf, size_v);
    }
    end = rte_rdtsc();
    printf("rte_memcpy(variable) used:%" PRIu64 "\n", end - start);

    return 0;
}

-- 
Regards,
Haifeng

* Re: [dpdk-dev] some questions about rte_memcpy
From: Bruce Richardson @ 2015-01-22 15:21 UTC
To: Linhaifeng; +Cc: dev

On Thu, Jan 22, 2015 at 08:53:13PM +0800, Linhaifeng wrote:
> Hi,Bruce
>
> I test it,most results like you said use constant may be faster,but sometimes not.
>
> [...]
>
> here is my test codes:
>
> #include <stdio.h>
> #include <rte_memcpy.h>
> #include <rte_cycles.h>
>
> int main(int narg, char** args)
> {
>     int i;
>     char buf[1024];
>     uint64_t start, end;
>
>     if (narg < 3) {
>         printf("usage:./rte_memcpy_test size times\n");
>         return 0;
>     }
>
>     size_t size_v = atoi(args[1]);
>     const size_t size_c = atoi(args[1]);

This (size_c) is a run-time constant, not a compile-time constant. To trigger the
memcpy optimizations inside the compiler, the size value must be constant at
compile time.

Regards,
/Bruce

>     int times = atoi(args[2]);
>
>     start = rte_rdtsc();
>     for(i = 0; i < times; i++) {
>         rte_memcpy(buf, buf, size_c);
>     }
>     end = rte_rdtsc();
>     printf("rte_memcpy(constant) used:%llu\n", end - start);
>
>     start = rte_rdtsc();
>     for (i = 0; i < times; i++) {
>         rte_memcpy(buf, buf, size_v);
>     }
>     end = rte_rdtsc();
>     printf("rte_memcpy(variable) used:%llu\n", end - start);
>
>     return 0;
> }
>
> --
> Regards,
> Haifeng
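
The distinction between a run-time and a compile-time constant can be checked directly with the builtin that the macro relies on. This is a minimal sketch assuming GCC or Clang; the three function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* A literal is a compile-time constant: returns 1. */
static int literal_is_constant(void)
{
    return __builtin_constant_p(16);
}

/* sizeof on a fixed-size array is also a compile-time constant: returns 1. */
static int sizeof_is_constant(void)
{
    char buf[16];

    return __builtin_constant_p(sizeof buf);
}

/*
 * const only forbids later writes; the value still comes from atoi() at
 * run time, so the compiler normally reports 0 here.
 */
static int runtime_const_is_constant(const char *text)
{
    const size_t size_c = (size_t)atoi(text);

    return __builtin_constant_p(size_c);
}
```

The last result can depend on the compiler and optimization level (an optimizer that inlines the call and folds atoi on a known string may still see a constant), which is why only the first two cases are stated as certain.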

* Re: [dpdk-dev] some questions about rte_memcpy
From: Linhaifeng @ 2015-01-23 2:58 UTC
To: Bruce Richardson; +Cc: dev

On 2015/1/22 23:21, Bruce Richardson wrote:
> This (size_c) is a run-time constant, not a compile-time constant. To trigger the
> memcpy optimizations inside the compiler, the size value must be constant at
> compile time.

Hi Bruce,

You are right. With a compile-time constant, memcpy is faster. Thank you all.

Here is my test result:

rte_memcpy(constant) size:8 time:876
rte_memcpy(variable) size:8 time:2824
rte_memcpy(constant) size:16 time:868
rte_memcpy(variable) size:16 time:4436
rte_memcpy(constant) size:32 time:856
rte_memcpy(variable) size:32 time:3264
rte_memcpy(constant) size:48 time:872
rte_memcpy(variable) size:48 time:3972
rte_memcpy(constant) size:64 time:856
rte_memcpy(variable) size:64 time:3644
rte_memcpy(constant) size:128 time:868
rte_memcpy(variable) size:128 time:4720
rte_memcpy(constant) size:256 time:868
rte_memcpy(variable) size:256 time:9624

Here is my test program (does anyone know how to use a loop to test the 'constant memcpy' case?):

#include <stdio.h>
#include <stdlib.h>     /* atoi */
#include <stdint.h>     /* uint64_t */
#include <inttypes.h>   /* PRIu64 */
#include <rte_memcpy.h>
#include <rte_cycles.h>

int main(int narg, char** args)
{
    size_t i;
    int t;
    char buf[256];
    int tests[7] = {8,16,32,48,64,128,256};
    char buf8[8],buf16[16],buf32[32],buf48[48],buf64[64],buf128[128],buf256[256];
    uint64_t start, end;
    int times = 9999999;
    uint64_t result_c[7];

    if (narg < 2) {
        printf("usage:./rte_memcpy_test times\n");
        return -1;
    }

    times = atoi(args[1]);

    /* Constant sizes: each loop passes a sizeof expression, known at compile time. */
    start = rte_rdtsc();
    for(t = 0; t < times; t++) {
        rte_memcpy(buf8, buf8, sizeof buf8);
    }
    end = rte_rdtsc();
    result_c[0] = end - start;

    start = rte_rdtsc();
    for(t = 0; t < times; t++) {
        rte_memcpy(buf16, buf16, sizeof buf16);
    }
    end = rte_rdtsc();
    result_c[1] = end - start;

    start = rte_rdtsc();
    for(t = 0; t < times; t++) {
        rte_memcpy(buf32, buf32, sizeof buf32);
    }
    end = rte_rdtsc();
    result_c[2] = end - start;

    start = rte_rdtsc();
    for(t = 0; t < times; t++) {
        rte_memcpy(buf48, buf48, sizeof buf48);
    }
    end = rte_rdtsc();
    result_c[3] = end - start;

    start = rte_rdtsc();
    for(t = 0; t < times; t++) {
        rte_memcpy(buf64, buf64, sizeof buf64);
    }
    end = rte_rdtsc();
    result_c[4] = end - start;

    start = rte_rdtsc();
    for(t = 0; t < times; t++) {
        rte_memcpy(buf128, buf128, sizeof buf128);
    }
    end = rte_rdtsc();
    result_c[5] = end - start;

    start = rte_rdtsc();
    for(t = 0; t < times; t++) {
        rte_memcpy(buf256, buf256, sizeof buf256);
    }
    end = rte_rdtsc();
    result_c[6] = end - start;

    /* Variable sizes: the size is read from the tests[] array at run time. */
    for (i = 0; i < (sizeof tests / sizeof tests[0]); i++) {
        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
            rte_memcpy(buf, buf, tests[i]);
        }
        end = rte_rdtsc();
        printf("rte_memcpy(constant) size:%d time:%" PRIu64 "\n", tests[i], result_c[i]);
        printf("rte_memcpy(variable) size:%d time:%" PRIu64 "\n", tests[i], end - start);
    }

    return 0;
}

-- 
Regards,
Haifeng
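
One possible answer to the loop question above is to let the preprocessor generate one copy function per size, so that every size is still a literal at the call to the copy routine, and then loop over a table of those functions. This is only a sketch under stated assumptions: it uses plain memcpy and clock() as stand-ins so that it builds without DPDK (the thread's program uses rte_memcpy and rte_rdtsc), and COPY_SIZES, DEFINE_COPY, copy_cases and time_copy_case are hypothetical names.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* The sizes from the test program above, listed once. */
#define COPY_SIZES(X) X(8) X(16) X(32) X(48) X(64) X(128) X(256)

/* Expands to one function per size; inside each one, n is a literal. */
#define DEFINE_COPY(n)                                   \
    static void copy_##n(char *dst, const char *src)     \
    {                                                    \
        memcpy(dst, src, n);                             \
    }
COPY_SIZES(DEFINE_COPY)

/* One table entry per generated function. */
struct copy_case {
    size_t size;
    void (*fn)(char *dst, const char *src);
};

#define COPY_ENTRY(n) { n, copy_##n },
static const struct copy_case copy_cases[] = { COPY_SIZES(COPY_ENTRY) };
#define NUM_COPY_CASES (sizeof copy_cases / sizeof copy_cases[0])

/* Time `times` copies for one table entry; returns elapsed clock() ticks. */
static clock_t time_copy_case(size_t idx, char *dst, const char *src, long times)
{
    clock_t start = clock();
    long t;

    for (t = 0; t < times; t++)
        copy_cases[idx].fn(dst, src);
    return clock() - start;
}
```

A caller can then loop idx from 0 to NUM_COPY_CASES - 1. The call through a function pointer adds its own overhead, so the absolute numbers are not comparable with the hand-unrolled program; expanding the whole timing loop inside the macro instead would avoid that cost.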