From: Thomas Monjalon
To: "Li, Xiaoyun"
Cc: "Ananyev, Konstantin", "Richardson, Bruce", dev@dpdk.org, "Lu, Wenzhuo", "Zhang, Helin", "ophirmu@mellanox.com"
Date: Thu, 19 Oct 2017 10:33:36 +0200
Message-ID: <3438028.jIYWTcBuhA@xps>
In-Reply-To:
References: <1507206794-79941-1-git-send-email-xiaoyun.li@intel.com> <1661434.2yK3chXuTC@xps>
Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over memcpy

19/10/2017 09:51, Li, Xiaoyun:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 19/10/2017 04:45, Li, Xiaoyun:
> > > Hi
> > >
> > > > > > The significant change of this patch is to call a function
> > > > > > pointer for packet size > 128 (RTE_X86_MEMCPY_THRESH).
> > > > > The perf drop is due to the function call replacing the inline.
> > > > >
> > > > > > Please could you provide some benchmark numbers?
> > > > > I ran memcpy_perf_test, which shows the time cost of memcpy. I
> > > > > ran it on Broadwell with SSE and AVX2.
> > > > > But I just drew pictures and looked at the trend; I did not
> > > > > compute the exact percentage. Sorry about that.
> > > > > The picture shows results for copy sizes of 2, 4, 6, 8, 9, 12, 16,
> > > > > 32, 64, 128, 192, 256, 320, 384, 448, 512, 768, 1024, 1518, 1522,
> > > > > 1536, 1600, 2048, 2560, 3072, 3584, 4096, 4608, 5120, 5632, 6144,
> > > > > 6656, 7168, 7680, 8192.
> > > > > In my test, as the size grows, the drop shrinks.
> > > > > (Copy time is used to indicate the perf.) From the trend
> > > > > picture, when the size is smaller than 128 bytes, the perf drops
> > > > > a lot, almost 50%. And above 128 bytes, it approaches the
> > > > > original DPDK.
> > > > > I computed it just now; it shows that between 128 bytes and
> > > > > 1024 bytes, the perf drops about 15%. Above 1024 bytes, the
> > > > > perf drops about 4%.
> > > > >
> > > > > > From a test done at Mellanox, there might be a performance
> > > > > > degradation of about 15% in testpmd txonly with AVX2.
> > > > >
> > > I did tests on X710, XXV710, X540 and MT27710 but didn't see any
> > > performance degradation.
> > >
> > > I used the command "./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf
> > > -n 4 -- -I" and set fwd txonly.
> > > I tested it on v17.11-rc1, then reverted my patch and tested it again.
> > > I ran "show port stats all" and looked at the throughput in pps.
> > > The results are similar, with no drop.
> > >
> > > Did I miss something?
> >
> > I do not understand. Yesterday you confirmed a 15% drop with buffers
> > between 128 and 1024 bytes.
> > But you do not see this drop in your txonly tests, right?
> >
> Yes. The drop shows up in the unit test.
> I use the command "make test -j", then "./build/app/test -c f -n 4",
> and then run "memcpy_perf_autotest".
> The results are the cycles that the memory copy costs.
> But I just use it to show the trend, because I heard that it is not
> recommended to use micro-benchmarks like test_memcpy_perf for memcpy
> performance reports, as they are unlikely to reflect the performance
> of real-world applications.

Yes, real applications can hide the memcpy cost.
Sometimes, the cost appears for real :)

> Details can be seen at
> https://software.intel.com/en-us/articles/performance-optimization-of-memcpy-in-dpdk
>
> And I didn't see a drop in the testpmd txonly test. Maybe it's because
> there are not a lot of memcpy calls.

It has been seen in a mlx4 use-case using more memcpy.
I think 15% in a micro-benchmark is too much.
What can we do? Raise the threshold?

> > > > Another thing: I will test testpmd txonly with Intel NICs and
> > > > Mellanox NICs these days,
> > > > and try adjusting RTE_X86_MEMCPY_THRESH to see if there is any
> > > > improvement.
> > > >
> > > Is there someone else seeing a performance degradation?
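
To keep the discussion concrete, the dispatch scheme being debated boils
down to something like the sketch below. It is only an illustration under
assumptions: RTE_X86_MEMCPY_THRESH is the macro named in this thread, but
the 128-byte default, the function names and the resolver are made up for
the example, not taken from the patch.

/*
 * Minimal sketch (not the patch): copies at or below the threshold stay
 * inline; larger copies go through a pointer resolved once at startup.
 */
#include <stddef.h>
#include <string.h>

#define RTE_X86_MEMCPY_THRESH 128	/* assumed default from the discussion */

/* Stand-ins for the ISA-specific copy routines selected at run time. */
static void *memcpy_generic(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *memcpy_avx2(void *d, const void *s, size_t n) { return memcpy(d, s, n); /* AVX2 body elided */ }

/* Resolved once at startup to the best variant the CPU supports. */
static void *(*memcpy_ptr)(void *, const void *, size_t) = memcpy_generic;

static void
memcpy_select(int cpu_has_avx2)
{
	if (cpu_has_avx2)
		memcpy_ptr = memcpy_avx2;
}

static inline void *
memcpy_dispatch(void *dst, const void *src, size_t n)
{
	if (n <= RTE_X86_MEMCPY_THRESH)
		return memcpy(dst, src, n);	/* inline fast path, no call cost */
	return memcpy_ptr(dst, src, n);	/* indirect call for larger copies */
}

int main(void)
{
	char src[256] = "example", dst[256];

	memcpy_select(1);
	memcpy_dispatch(dst, src, 64);	/* takes the inline branch */
	memcpy_dispatch(dst, src, 256);	/* takes the function-pointer branch */
	return 0;
}

Raising the threshold, as suggested above, only moves the boundary of the
first branch: more sizes keep the inline path and avoid the indirect call
measured by memcpy_perf_autotest, at the cost of not using the
run-time-selected implementation for those sizes.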