DPDK patches and discussions
* [dpdk-dev] some questions about  rte_memcpy
@ 2015-01-22  3:39 Linhaifeng
  2015-01-22  4:45 ` Matthew Hall
  0 siblings, 1 reply; 9+ messages in thread
From: Linhaifeng @ 2015-01-22  3:39 UTC (permalink / raw)
  To: dev

#define rte_memcpy(dst, src, n)              \
	((__builtin_constant_p(n)) ?          \
	memcpy((dst), (src), (n)) :          \
	rte_memcpy_func((dst), (src), (n)))


Why does the macro call memcpy when n is a constant?
Can I change it to the following code?

#define rte_memcpy(dst, src, n)              \
    {   \
        int num = (n);    \
        rte_memcpy_func((dst), (src), (num));   \
    }


-- 
Regards,
Haifeng

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] some questions about  rte_memcpy
  2015-01-22  3:39 [dpdk-dev] some questions about rte_memcpy Linhaifeng
@ 2015-01-22  4:45 ` Matthew Hall
  2015-01-22  5:32   ` Linhaifeng
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Hall @ 2015-01-22  4:45 UTC (permalink / raw)
  To: Linhaifeng; +Cc: dev

On Thu, Jan 22, 2015 at 11:39:11AM +0800, Linhaifeng wrote:
> Why call memcpy when n is constant variable?

One theory. Many DPDK functions crash if they are called before rte_eal_init() 
is called. So perhaps this could be a cause, since that won't have been called 
when working on a constant?

Matthew.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] some questions about  rte_memcpy
  2015-01-22  4:45 ` Matthew Hall
@ 2015-01-22  5:32   ` Linhaifeng
  2015-01-22  7:35     ` Matthew Hall
  0 siblings, 1 reply; 9+ messages in thread
From: Linhaifeng @ 2015-01-22  5:32 UTC (permalink / raw)
  To: Matthew Hall; +Cc: dev



On 2015/1/22 12:45, Matthew Hall wrote:
> One theory. Many DPDK functions crash if they are called before rte_eal_init() 
> is called. So perhaps this could be a cause, since that won't have been called 
> when working on a constant

Hi, Matthew

Thank you for your response.

Do you mean that calling rte_memcpy before rte_eal_init() would crash? Why?
-- 
Regards,
Haifeng

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] some questions about  rte_memcpy
  2015-01-22  5:32   ` Linhaifeng
@ 2015-01-22  7:35     ` Matthew Hall
  2015-01-22 10:23       ` Tetsuya Mukawa
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Hall @ 2015-01-22  7:35 UTC (permalink / raw)
  To: Linhaifeng; +Cc: dev

On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?

No guarantee. But a theory. It might use some things from the EAL init to 
figure out which version of the accelerated algorithm to use.

Matthew.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] some questions about  rte_memcpy
  2015-01-22  7:35     ` Matthew Hall
@ 2015-01-22 10:23       ` Tetsuya Mukawa
  2015-01-22 11:34         ` Bruce Richardson
  0 siblings, 1 reply; 9+ messages in thread
From: Tetsuya Mukawa @ 2015-01-22 10:23 UTC (permalink / raw)
  To: Linhaifeng; +Cc: dev

On 2015/01/22 16:35, Matthew Hall wrote:
> On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
>> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?
> No guarantee. But a theory. It might use some things from the EAL init to 
> figure out which version of the accelerated algorithm to use.

This selection is done at compile-time.
And if the size is constant, I guess DPDK assumes memcpy is replaced by
inline __builtin_memcpy.
I haven't checked the performance of the builtin memcpy, but it is probably
much faster.

Tetsuya


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] some questions about  rte_memcpy
  2015-01-22 10:23       ` Tetsuya Mukawa
@ 2015-01-22 11:34         ` Bruce Richardson
  2015-01-22 12:53           ` Linhaifeng
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce Richardson @ 2015-01-22 11:34 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev

On Thu, Jan 22, 2015 at 07:23:49PM +0900, Tetsuya Mukawa wrote:
> This selection is done at compile-time.
> And if the size is constant, I guess DPDK assumes memcpy is replaced by
> inline __builtin_memcpy.
> I haven't checked the performance of builtin memcpy, but probably much
> faster.
> 

Yes, that assumption is correct. A couple of years ago we discovered that for
constant size values, the compiler would generate much faster code for us
using a regular memcpy than rte_memcpy, hence the macro.

/Bruce
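
A minimal sketch of the dispatch described above, assuming GCC or Clang (both provide __builtin_constant_p). The names demo_memcpy, copy_func, copy_fixed, copy_runtime and generic_calls are hypothetical stand-ins and not DPDK code; copy_func only plays the role of rte_memcpy_func so that the chosen path can be observed.

```c
#include <stddef.h>
#include <string.h>

/* Counts how often the generic (non-builtin) path is taken. */
static unsigned generic_calls;

/* Hypothetical stand-in for rte_memcpy_func(); it just forwards to memcpy
 * and records that it was called. */
static void *copy_func(void *dst, const void *src, size_t n)
{
    generic_calls++;
    return memcpy(dst, src, n);
}

/* Same shape as the rte_memcpy macro quoted at the top of the thread. */
#define demo_memcpy(dst, src, n)          \
    ((__builtin_constant_p(n)) ?          \
    memcpy((dst), (src), (n)) :           \
    copy_func((dst), (src), (n)))

/* The size is a literal, so the macro selects plain memcpy, which the
 * compiler is free to expand inline. */
static void copy_fixed(char *dst, const char *src)
{
    (void)demo_memcpy(dst, src, 16);
}

/* The size is only known at run time. The volatile read is an assumption
 * of this sketch: it keeps the optimizer from seeing through the value. */
static void copy_runtime(char *dst, const char *src, size_t len)
{
    volatile size_t hidden = len;
    size_t n = hidden;
    (void)demo_memcpy(dst, src, n);
}
```

After one call to copy_fixed the counter is still 0; each call to copy_runtime increments it, because only that call reaches copy_func.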

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] some questions about  rte_memcpy
  2015-01-22 11:34         ` Bruce Richardson
@ 2015-01-22 12:53           ` Linhaifeng
  2015-01-22 15:21             ` Bruce Richardson
  0 siblings, 1 reply; 9+ messages in thread
From: Linhaifeng @ 2015-01-22 12:53 UTC (permalink / raw)
  To: Bruce Richardson, Tetsuya Mukawa; +Cc: dev



On 2015/1/22 19:34, Bruce Richardson wrote:
> Yes, that assumption is correct. A couple of years ago we discovered that for
> constant size values, the compiler would generate much faster code for us
> using a regular memcpy than rte_memcpy, hence the macro.

Hi, Bruce

I tested it. In most runs the constant size is faster, as you said, but not always.

linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
rte_memcpy(constant) used:279893712	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277818600
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
rte_memcpy(constant) used:279264328	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277667116
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
rte_memcpy(constant) used:279491832	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277622772
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
rte_memcpy(constant) used:279402156	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277738464
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
rte_memcpy(constant) used:279305172	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277483004
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
rte_memcpy(constant) used:279784124	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:277605332
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
rte_memcpy(constant) used:322817260
rte_memcpy(variable) used:350333864
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
rte_memcpy(constant) used:322840748
rte_memcpy(variable) used:350297868
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
rte_memcpy(constant) used:322488240
rte_memcpy(variable) used:350348652
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
rte_memcpy(constant) used:322021428
rte_memcpy(variable) used:350416440
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
rte_memcpy(constant) used:321370900
rte_memcpy(variable) used:350355796
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
rte_memcpy(constant) used:322704552
rte_memcpy(variable) used:349900832
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
rte_memcpy(constant) used:422705828
rte_memcpy(variable) used:425493328
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
rte_memcpy(constant) used:422421840	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:413691412
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
rte_memcpy(constant) used:425233088	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:421136724
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
rte_memcpy(constant) used:901014608	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:900997388
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
rte_memcpy(constant) used:900803308	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:900794076
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
rte_memcpy(constant) used:901842436	@@@@@@@@@@@@@@ not faster
rte_memcpy(variable) used:901218984
linux-mnSyvH:/mnt/sdb/linhf/test #



Here is my test code:

#include <stdio.h>
#include <stdlib.h>
#include <rte_memcpy.h>
#include <rte_cycles.h>


int main(int narg, char** args)
{
        int i;
        char buf[1024];
        uint64_t start, end;

        if (narg < 3) {
                printf("usage:./rte_memcpy_test size times\n");
                return 0;
        }

        /* The same size twice: once as a plain variable, once const-qualified. */
        size_t size_v = atoi(args[1]);
        const size_t size_c = atoi(args[1]);
        int times = atoi(args[2]);

        /* Time the copies that use the const-qualified size. */
        start = rte_rdtsc();
        for (i = 0; i < times; i++) {
                rte_memcpy(buf, buf, size_c);
        }
        end = rte_rdtsc();
        printf("rte_memcpy(constant) used:%llu\n", (unsigned long long)(end - start));

        /* Time the copies that use the plain variable size. */
        start = rte_rdtsc();
        for (i = 0; i < times; i++) {
                rte_memcpy(buf, buf, size_v);
        }
        end = rte_rdtsc();
        printf("rte_memcpy(variable) used:%llu\n", (unsigned long long)(end - start));

        return 0;
}





-- 
Regards,
Haifeng

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] some questions about  rte_memcpy
  2015-01-22 12:53           ` Linhaifeng
@ 2015-01-22 15:21             ` Bruce Richardson
  2015-01-23  2:58               ` Linhaifeng
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce Richardson @ 2015-01-22 15:21 UTC (permalink / raw)
  To: Linhaifeng; +Cc: dev

On Thu, Jan 22, 2015 at 08:53:13PM +0800, Linhaifeng wrote:
> Hi,Bruce
> 
> I test it,most results like you said use constant may be faster,but sometimes not.
> 
> here is my test codes:
> 
> #include <stdio.h>
> #include <rte_memcpy.h>
> #include <rte_cycles.h>
> 
> 
> int main(int narg, char** args)
> {
>         int i;
>         char buf[1024];
>         uint64_t start, end;
> 
>         if (narg < 3) {
>                 printf("usage:./rte_memcpy_test size times\n");
>                 return 0;
>         }
> 
>         size_t size_v = atoi(args[1]);
>         const size_t size_c = atoi(args[1]);

This (size_c) is a run-time constant, not a compile-time constant. To trigger the
memcpy optimizations inside the compiler, the size value must be constant at
compile time.

Regards,
/Bruce
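
The distinction above can be checked directly with __builtin_constant_p, assuming GCC or Clang. This is only a sketch: the three function names are hypothetical, and the volatile read stands in for a value that arrives at run time, such as the result of atoi(args[1]).

```c
#include <stddef.h>

/* A literal is a compile-time constant. */
static int literal_is_constant(void)
{
    return __builtin_constant_p(64);
}

/* sizeof of a fixed-size array is a compile-time constant as well. */
static int sizeof_is_constant(void)
{
    char buf[64];
    return __builtin_constant_p(sizeof buf);
}

/* const only forbids later assignment. The value itself still arrives at
 * run time (modelled here by a volatile read), so the compiler cannot
 * treat it as a compile-time constant. */
static int runtime_const_is_constant(size_t from_user)
{
    volatile size_t input = from_user;
    const size_t size_c = input;
    return __builtin_constant_p(size_c);
}
```

The first two functions return 1 and the third returns 0, which is why a macro keyed on __builtin_constant_p takes the non-constant path for size_c.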


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] some questions about  rte_memcpy
  2015-01-22 15:21             ` Bruce Richardson
@ 2015-01-23  2:58               ` Linhaifeng
  0 siblings, 0 replies; 9+ messages in thread
From: Linhaifeng @ 2015-01-23  2:58 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 2015/1/22 23:21, Bruce Richardson wrote:
> This (size_c) is a run-time constant, not a compile-time constant. To trigger the
> memcpy optimizations inside the compiler, the size value must be constant at
> compile time.


Hi, Bruce

You are right. With a compile-time constant, memcpy is faster. Thank you all.

Here is my test result:

rte_memcpy(constant) size:8 time:876
rte_memcpy(variable) size:8 time:2824
rte_memcpy(constant) size:16 time:868
rte_memcpy(variable) size:16 time:4436
rte_memcpy(constant) size:32 time:856
rte_memcpy(variable) size:32 time:3264
rte_memcpy(constant) size:48 time:872
rte_memcpy(variable) size:48 time:3972
rte_memcpy(constant) size:64 time:856
rte_memcpy(variable) size:64 time:3644
rte_memcpy(constant) size:128 time:868
rte_memcpy(variable) size:128 time:4720
rte_memcpy(constant) size:256 time:868
rte_memcpy(variable) size:256 time:9624

Here is my test program (does anyone know how to test the 'constant memcpy' case in a loop?):

#include <stdio.h>
#include <stdlib.h>
#include <rte_memcpy.h>
#include <rte_cycles.h>


int main(int narg, char** args)
{
        int i,t;
        char buf[256];
        int tests[7] = {8,16,32,48,64,128,256};
        char buf8[8],buf16[16],buf32[32],buf48[48],buf64[64],buf128[128],buf256[256];
        uint64_t start, end;
        int times = 9999999;
        uint64_t result_c[7];

        if (narg < 2) {
                printf("usage:./rte_memcpy_test times\n");
                return -1;
        }

        times = atoi(args[1]);

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf8, buf8, sizeof buf8);
        }
        end = rte_rdtsc();
        result_c[0] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf16, buf16, sizeof buf16);
        }
        end = rte_rdtsc();
        result_c[1] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf32, buf32, sizeof buf32);
        }
        end = rte_rdtsc();
        result_c[2] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf48, buf48, sizeof buf48);
        }
        end = rte_rdtsc();
        result_c[3] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf64, buf64, sizeof buf64);
        }
        end = rte_rdtsc();
        result_c[4] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf128, buf128, sizeof buf128);
        }
        end = rte_rdtsc();
        result_c[5] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf256, buf256, sizeof buf256);
        }
        end = rte_rdtsc();
        result_c[6] = end - start;

        for (i = 0; i < (sizeof tests / sizeof tests[0]); i++) {
                start = rte_rdtsc();
                for(t = 0; t < times; t++) {
                        rte_memcpy(buf, buf, tests[i]);
                }
                end = rte_rdtsc();
                printf("rte_memcpy(constant) size:%d time:%llu\n", tests[i], (unsigned long long)result_c[i]);
                printf("rte_memcpy(variable) size:%d time:%llu\n", tests[i], (unsigned long long)(end - start));
        }

        return 0;
}
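
One possible answer to the question about testing the constant case in a loop, offered as a sketch rather than a DPDK recipe: let the preprocessor generate one function per size, so each size is a literal inside its own function, and then loop over a table of those functions. Everything here is an assumption made for illustration: SIZES, DEFINE_BENCH, bench_cases, run_all and read_ticks are hypothetical names, plain memcpy stands in for rte_memcpy, clock() stands in for rte_rdtsc(), and the empty asm statement (GCC/Clang syntax) is a guard against the compiler dropping a copy whose result is never read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* One list of sizes drives both the generated functions and the table. */
#define SIZES(X) X(8) X(16) X(32) X(48) X(64) X(128) X(256)

/* Hypothetical timer; with DPDK this would be rte_rdtsc(). */
static uint64_t read_ticks(void)
{
    return (uint64_t)clock();
}

/* Generates bench_const_N(): N is a literal inside each function, so the
 * copy size is a compile-time constant. */
#define DEFINE_BENCH(N)                                              \
static uint64_t bench_const_##N(unsigned times)                      \
{                                                                    \
    static char src[N], dst[N];                                      \
    uint64_t start = read_ticks();                                   \
    for (unsigned t = 0; t < times; t++) {                           \
        memcpy(dst, src, N);                                         \
        /* Compiler barrier so the copy is not optimized away. */    \
        __asm__ volatile("" : : "r"(dst) : "memory");                \
    }                                                                \
    return read_ticks() - start;                                     \
}
SIZES(DEFINE_BENCH)

struct bench_case {
    size_t size;
    uint64_t (*run)(unsigned times);
};

/* Table with one entry per generated function. */
#define BENCH_ENTRY(N) { N, bench_const_##N },
static const struct bench_case bench_cases[] = { SIZES(BENCH_ENTRY) };
#define NUM_CASES (sizeof bench_cases / sizeof bench_cases[0])

/* An ordinary loop over all constant-size cases; results needs room for
 * NUM_CASES entries. Returns the number of cases run. */
static size_t run_all(unsigned times, uint64_t *results)
{
    for (size_t i = 0; i < NUM_CASES; i++)
        results[i] = bench_cases[i].run(times);
    return NUM_CASES;
}
```

Calling run_all(times, results) fills results[i] with the elapsed ticks for bench_cases[i].size, so the printing loop no longer needs one hand-written block per size.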

-- 
Regards,
Haifeng

^ permalink raw reply	[flat|nested] 9+ messages in thread
