Date: Thu, 06 Sep 2018 10:40:17 +0430
From: Saber Rezvani
To: "Wiles, Keith"
Cc: "Stephen Hemminger", "dev@dpdk.org"
Message-Id: <165ad80b137.e083efff311035.3287926262954203334@zoho.com>
Subject: Re: [dpdk-dev] IXGBE throughput loss with 4+ cores

On 08/29/2018 11:22 PM, Wiles, Keith wrote:
>
>> On Aug 29, 2018, at 12:19 PM, Saber Rezvani wrote:
>>
>> On 08/29/2018 01:39 AM, Wiles, Keith wrote:
>>>> On Aug 28, 2018, at 2:16 PM, Saber Rezvani wrote:
>>>>
>>>> On 08/28/2018 11:39 PM, Wiles, Keith wrote:
>>>>> Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a performance problem.
>>>> I use Pktgen version 3.0.0. Indeed it is fine as long as I use one core (10 Gb/s), but when I increase the number of cores (one core per queue) I lose some performance (roughly 8.5 Gb/s for 8 cores). In my scenario Pktgen shows it is generating at line rate, but receiving 8.5 Gb/s.
>>>> Is it because of Pktgen?
>>> Normally Pktgen can receive at line rate up to 10G with 64-byte frames, which means Pktgen should not be the problem. You can verify that by looping the cable from one port to another on the pktgen machine to create an external loopback. Then send whatever traffic you can from one port; you should be able to receive those packets unless something is configured wrong.
>>>
>>> Please send me the command line for pktgen.
>>>
>>> In pktgen, if you have the config -m "[1-4:5-8].0" then you have 4 cores sending traffic and 4 cores receiving packets.
>>>
>>> In this case the TX cores will be sending the packets on all 4 lcores to the same port. On the rx side you have 4 cores polling 4 rx queues. The rx queues are controlled by RSS, which means the RX traffic 5-tuple hash must divide the inbound packets across all 4 queues to make sure each core is doing the same amount of work. If you are sending only a single packet on the Tx cores, then only one rx queue will be used.
>>>
>>> I hope that makes sense.
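A minimal sketch of the multi-queue RSS setup Keith describes above, assuming the DPDK 18.x rte_ethdev API; the port id, queue count, descriptor counts, and hash fields are illustrative placeholders, not values taken from this thread:

#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Configure one port with nb_rx_queues RX queues and let RSS hash the
 * 5-tuple of inbound packets so the queues (and the cores polling them)
 * share the load. A single flow still hashes to a single queue. */
static int
setup_rss_port(uint16_t port_id, uint16_t nb_rx_queues, struct rte_mempool *mp)
{
	struct rte_eth_conf conf = { 0 };
	uint16_t q;
	int ret;

	conf.rxmode.mq_mode = ETH_MQ_RX_RSS;
	conf.rx_adv_conf.rss_conf.rss_hf = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP;

	ret = rte_eth_dev_configure(port_id, nb_rx_queues, 1, &conf);
	if (ret < 0)
		return ret;

	for (q = 0; q < nb_rx_queues; q++) {
		ret = rte_eth_rx_queue_setup(port_id, q, 512,
				rte_eth_dev_socket_id(port_id), NULL, mp);
		if (ret < 0)
			return ret;
	}

	ret = rte_eth_tx_queue_setup(port_id, 0, 512,
			rte_eth_dev_socket_id(port_id), NULL);
	if (ret < 0)
		return ret;

	return rte_eth_dev_start(port_id);
}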
>> I think there is a misunderstanding of the problem. Indeed the problem is not Pktgen.
>> Here is my command -->
>> ./app/app/x86_64-native-linuxapp-gcc/pktgen -c ffc0000 -n 4 -w 84:00.0 -w 84:00.1 --file-prefix pktgen_F2 --socket-mem 1000,2000,1000,1000 -- -T -P -m "[18-19:20-21].0, [22:23].1"
>>
>> The problem is that when I run the symmetric_mp example for $numberOfProcesses=8 cores I get less throughput (roughly 8.4 Gb/s), but when I run it for $numberOfProcesses=3 cores the throughput is 10G.
>> for i in `seq $numberOfProcesses`;
>> do
>>     .... some calculation goes here .....
>>     symmetric_mp -c $coremask -n 2 --proc-type=auto -w 0b:00.0 -w 0b:00.1 --file-prefix sm --socket-mem 4000,1000,1000,1000 -- -p 3 --num-procs=$numberOfProcesses --proc-id=$procid";
>>     .....
>> done
> Most NICs have a limited amount of memory on the NIC, and when you start to segment that memory because you are using more queues, it can affect performance.
>
> In one of the NICs, if you go over say 5 or 6 queues, the memory per queue for Rx/Tx packets starts to become a bottleneck, as you do not have enough memory in the Tx/Rx queues to hold enough packets. This can cause the NIC to drop Rx packets because the host cannot pull the data from the NIC or the Rx ring on the host fast enough. This seems to be the problem, since the amount of time to process a packet on the host has not changed; only the amount of buffer space in the NIC has, as you increase the number of queues.
>
> I am not sure this is your issue, but I figured I would state this point.
What you said sounded logical, but is there a way that I can be sure? I mean, are there registers on the NIC which show the number of packets lost on the NIC, or does DPDK have an API which shows the number of packets lost at the NIC level?
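For reference, the ethdev layer does report NIC-level drops per port. A minimal sketch of reading those counters, assuming the 18.x rte_ethdev API (the port id below is a placeholder):

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Read the per-port counters kept by the PMD. imissed is the count of
 * packets the NIC dropped because no RX descriptors/buffers were available,
 * i.e. the counter to watch for the drop behaviour discussed here. */
static void
print_drop_counters(uint16_t port_id)
{
	struct rte_eth_stats stats;

	if (rte_eth_stats_get(port_id, &stats) != 0)
		return;

	printf("port %" PRIu16 ": ipackets=%" PRIu64 " imissed=%" PRIu64
	       " ierrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
	       port_id, stats.ipackets, stats.imissed,
	       stats.ierrors, stats.rx_nombuf);
}

rte_eth_xstats_get() additionally exposes the driver-specific hardware counters if more detail than the generic stats is needed.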
>
>> I am trying to find out what causes this loss!
>>
>>>>>> On Aug 28, 2018, at 12:05 PM, Saber Rezvani wrote:
>>>>>>
>>>>>> On 08/28/2018 08:31 PM, Stephen Hemminger wrote:
>>>>>>> On Tue, 28 Aug 2018 17:34:27 +0430
>>>>>>> Saber Rezvani wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have run the multi_process/symmetric_mp example from the DPDK examples directory.
>>>>>>>> For one process its throughput is line rate, but as I increase the
>>>>>>>> number of cores I see a decrease in throughput. For example, if the number
>>>>>>>> of queues is set to 4 and each queue is assigned to a single core, then the
>>>>>>>> throughput is about 9.4 Gb/s; with 8 queues, throughput is 8.5 Gb/s.
>>>>>>>>
>>>>>>>> I have read the following, but it was not convincing.
>>>>>>>>
>>>>>>>> http://mails.dpdk.org/archives/dev/2015-October/024960.html
>>>>>>>>
>>>>>>>> I am eagerly looking forward to hearing from you all.
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>>
>>>>>>>> Saber
>>>>>>>>
>>>>>>> Not completely surprising. If you have more cores than the packet line rate needs,
>>>>>>> then the number of packets returned for each call to rx_burst will be smaller.
>>>>>>> With a large number of cores, most of the time will be spent doing reads of
>>>>>>> PCI registers for no packets!
>>>>>> Indeed pktgen says it is generating traffic at line rate, but receiving less than 10 Gb/s.
>>>>>> So, in that case there should be something that causes the reduction in throughput :(
>>>>>
>>>>> Regards,
>>>>> Keith
>>>>
>>> Regards,
>>> Keith
>>>
>> Best regards,
>> Saber
> Regards,
> Keith
>
Best regards,
Saber

From xiaolong.ye@intel.com Thu Sep 6 08:29:09 2018
From: Xiaolong Ye
To: dev@dpdk.org, Maxime Coquelin, Tiwei Bie, Zhihong Wang
Cc: xiao.w.wang@intel.com, Xiaolong Ye
Date: Thu, 6 Sep 2018 21:16:52 +0800
Message-Id: <20180906131653.10752-1-xiaolong.ye@intel.com>
Subject: [dpdk-dev] [PATCH v1 1/2] vhost: introduce rte_vdpa_get_device_num api

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 lib/librte_vhost/rte_vdpa.h | 3 +++
 lib/librte_vhost/vdpa.c     | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index 90465ca26..b8223e337 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -84,4 +84,7 @@ rte_vdpa_find_device_id(struct rte_vdpa_dev_addr *addr);
 struct rte_vdpa_device * __rte_experimental
 rte_vdpa_get_device(int did);
 
+/* Get current available vdpa device number */
+int __rte_experimental
+rte_vdpa_get_device_num(void);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
index c82fd4370..c2c5dff1d 100644
--- a/lib/librte_vhost/vdpa.c
+++ b/lib/librte_vhost/vdpa.c
@@ -113,3 +113,9 @@ rte_vdpa_get_device(int did)
 
 	return vdpa_devices[did];
 }
+
+int
+rte_vdpa_get_device_num(void)
+{
+	return vdpa_device_num;
+}
-- 
2.18.0.rc1.1.g6f333ff2f
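A possible caller-side sketch of how the new rte_vdpa_get_device_num() could be used together with the existing rte_vdpa_get_device() accessor; this is illustrative only and not part of the patch, and the scan bound plus the assumption that unused ids resolve to NULL are mine:

#include <stdio.h>
#include <rte_vdpa.h>

/* Arbitrary upper bound on device ids probed in this sketch (assumption,
 * not something defined by the public vdpa API). */
#define VDPA_SCAN_MAX 1024

/* List the registered vdpa devices: rte_vdpa_get_device_num() gives the
 * count, rte_vdpa_get_device() resolves each device id (assumed to return
 * NULL for ids with no registered device). */
static void
list_vdpa_devices(void)
{
	int num = rte_vdpa_get_device_num();
	int did, found = 0;

	printf("%d vdpa device(s) registered\n", num);

	for (did = 0; did < VDPA_SCAN_MAX && found < num; did++) {
		struct rte_vdpa_device *dev = rte_vdpa_get_device(did);

		if (dev == NULL)
			continue;
		printf("vdpa device id %d is present\n", did);
		found++;
	}
}

Both symbols are marked __rte_experimental, so a caller would have to build with experimental APIs enabled.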