Hi Stephen,
Thank you very much for your reply!
>I would just replace all of the rte_memcpy with memcpy  
OK, I will replace all of the rte_memcpy calls with memcpy.
>I expect that rte_memcpy() is able to do better than memcpy() for larger copies because it is
>likely to use bigger vector instructions and check for alignment.
>For small copies just doing the mov's directly is going to be as fast or faster.
>In fact, lots of places in DPDK should
>replace rte_memcpy() with simple structure assignment to preserve type safety.
I don't know where the dividing line (in terms of data size) between rte_memcpy and memcpy lies.
We ran a simple test copying 1500 bytes, and memcpy seemed to be faster, but our test may not be accurate enough.
>This is somewhat historical data, it might be wrong. It would be worthwhile to have benchmarks
>across different sizes (variable and fixed), different compilers, and different CPU's.
>There might be surprising results.
So I hope this work can continue and lead to more thorough rte_memcpy documentation. Thanks!
Huichao Cai