From: Saber Rezvani
To: "Wiles, Keith"
Cc: Stephen Hemminger, "dev@dpdk.org"
Subject: Re: [dpdk-dev] IXGBE throughput loss with 4+ cores
Date: Wed, 29 Aug 2018 21:49:26 +0430

On 08/29/2018 01:39 AM, Wiles, Keith wrote:
>
>> On Aug 28, 2018, at 2:16 PM, Saber Rezvani wrote:
>>
>> On 08/28/2018 11:39 PM, Wiles, Keith wrote:
>>> Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a performance problem.
>> I use Pktgen version 3.0.0. Indeed it is fine as long as I use one core (10 Gb/s), but when I increase the number of cores (one core per queue) I lose some performance (roughly 8.5 Gb/s with 8 cores). In my scenario Pktgen shows it is generating at line rate but receiving only 8.5 Gb/s.
>> Is it because of Pktgen?
> Normally Pktgen can receive at line rate up to 10G with 64-byte frames, which means Pktgen should not be the problem. You can verify that by looping a cable from one port to another on the pktgen machine to create an external loopback. Then send whatever traffic you can from one port; you should be able to receive those packets unless something is configured wrong.
>
> Please send me the command line for pktgen.
>
> In pktgen, if you have the config -m "[1-4:5-8].0" then you have 4 cores sending traffic and 4 cores receiving packets.
>
> In this case the TX cores will be sending the packets on all 4 lcores to the same port. On the rx side you have 4 cores polling 4 rx queues. The rx queues are controlled by RSS, which means the 5-tuple hash of the RX traffic must divide the inbound packets across all 4 queues to make sure each core is doing the same amount of work. If you are sending only a single packet from the Tx cores then only one rx queue will be used.
>
> I hope that makes sense.
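(For reference, the RSS behaviour Keith describes corresponds to a port configuration along the lines of the sketch below. This is only a minimal sketch using the standard rte_ethdev API; the function name and the hash fields chosen here are illustrative and are not taken from symmetric_mp or Pktgen.)

    #include <rte_ethdev.h>

    /*
     * Minimal sketch: configure a port with several rx queues and let RSS
     * spread flows across them. Unless the generated traffic varies the
     * 5-tuple (src/dst IP, src/dst port, protocol), the hash maps all
     * packets to one queue and only one lcore ends up doing the rx work.
     */
    static int
    configure_rss_port(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
    {
        struct rte_eth_conf conf = {
            .rxmode = {
                .mq_mode = ETH_MQ_RX_RSS,
            },
            .rx_adv_conf = {
                .rss_conf = {
                    .rss_key = NULL,  /* use the driver's default key */
                    .rss_hf  = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP,
                },
            },
        };

        return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
    }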
I think there is a misunderstanding of the problem. Indeed the problem is not Pktgen.

Here is my command:

    ./app/app/x86_64-native-linuxapp-gcc/pktgen -c ffc0000 -n 4 -w 84:00.0 -w 84:00.1 --file-prefix pktgen_F2 --socket-mem 1000,2000,1000,1000 -- -T -P -m "[18-19:20-21].0, [22:23].1"

The problem is that when I run the symmetric_mp example with $numberOfProcesses=8 cores, I get less throughput (roughly 8.4 Gb/s), but when I run it with $numberOfProcesses=3 cores the throughput is 10 Gb/s.

    for i in `seq $numberOfProcesses`;
    do
        .... some calculation goes here .....
        symmetric_mp -c $coremask -n 2 --proc-type=auto -w 0b:00.0 -w 0b:00.1 --file-prefix sm --socket-mem 4000,1000,1000,1000 -- -p 3 --num-procs=$numberOfProcesses --proc-id=$procid
        .....
    done

I am trying to find out what causes this loss!

>>>> On Aug 28, 2018, at 12:05 PM, Saber Rezvani wrote:
>>>>
>>>> On 08/28/2018 08:31 PM, Stephen Hemminger wrote:
>>>>> On Tue, 28 Aug 2018 17:34:27 +0430
>>>>> Saber Rezvani wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have run the multi_process/symmetric_mp example from the DPDK examples directory.
>>>>>> For one process its throughput is line rate, but as I increase the
>>>>>> number of cores I see a decrease in throughput. For example, if the number
>>>>>> of queues is set to 4 and each queue is assigned to a single core, the
>>>>>> throughput is about 9.4 Gb/s; with 8 queues the throughput is 8.5 Gb/s.
>>>>>>
>>>>>> I have read the following, but it was not convincing.
>>>>>>
>>>>>> http://mails.dpdk.org/archives/dev/2015-October/024960.html
>>>>>>
>>>>>> I am eagerly looking forward to hearing from you all.
>>>>>>
>>>>>> Best wishes,
>>>>>>
>>>>>> Saber
>>>>>>
>>>>> Not completely surprising. If you have more cores than the packet rate warrants,
>>>>> then the number of packets returned by each call to rx_burst will be smaller.
>>>>> With a large number of cores, most of the time will be spent doing reads of
>>>>> PCI registers for no packets!
>>>> Indeed pktgen says it is generating traffic at line rate but receiving less than 10 Gb/s. So in that case there should be something that causes the reduction in throughput :(
>>>>
>>> Regards,
>>> Keith
>>>
>>
> Regards,
> Keith
>

Best regards,
Saber
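P.S. Stephen's point about idle polling shows up in the shape of the usual receive loop. Below is a minimal sketch with generic names (it is not code from symmetric_mp): as the per-queue packet rate drops, more and more rte_eth_rx_burst() calls return zero packets, and the core's cycles go into polling an empty queue.

    #include <stdint.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    /*
     * With N rx cores sharing a fixed aggregate packet rate, each queue sees
     * roughly rate/N packets, so a growing fraction of these bursts come back
     * empty and the call amounts to a PCI register read for nothing.
     */
    static void
    rx_loop(uint16_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];

        for (;;) {
            uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
            if (nb_rx == 0)
                continue;  /* empty poll */

            for (uint16_t i = 0; i < nb_rx; i++) {
                /* ... process the packet ... */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }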