From: Saber Rezvani <irsaber@zoho.com>
To: "Wiles, Keith"
Cc: Stephen Hemminger, "dev@dpdk.org"
Subject: Re: [dpdk-dev] IXGBE throughput loss with 4+ cores
Date: Thu, 30 Aug 2018 08:38:45 +0430

On 08/29/2018 11:22 PM, Wiles, Keith wrote:
>
>> On Aug 29, 2018, at 12:19 PM, Saber Rezvani wrote:
>>
>> On 08/29/2018 01:39 AM, Wiles, Keith wrote:
>>>> On Aug 28, 2018, at 2:16 PM, Saber Rezvani wrote:
>>>>
>>>> On 08/28/2018 11:39 PM, Wiles, Keith wrote:
>>>>> Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a performance problem.
>>>> I use Pktgen version 3.0.0. Indeed it is fine as long as I use one core (10 Gb/s), but when I increase the number of cores (one core per queue) I lose some performance (roughly 8.5 Gb/s with 8 cores). In my scenario Pktgen shows it is generating at line rate but receiving only 8.5 Gb/s.
>>>> Is it because of Pktgen?
>>> Normally Pktgen can receive at line rate up to 10G with 64-byte frames, which means Pktgen should not be the problem. You can verify that by looping a cable from one port to another on the pktgen machine to create an external loopback. Then send whatever traffic you can from one port; you should be able to receive those packets unless something is configured wrong.
>>>
>>> Please send me the command line for pktgen.
>>>
>>> In pktgen, if you have the config -m "[1-4:5-8].0" then you have 4 cores sending traffic and 4 cores receiving packets.
>>>
>>> In this case the TX cores will be sending packets on all 4 lcores to the same port. On the RX side you have 4 cores polling 4 RX queues. The RX queues are controlled by RSS, which means the 5-tuple hash of the inbound traffic must divide the packets across all 4 queues to make sure each core is doing the same amount of work. If you are sending only a single packet (one flow) from the TX cores, then only one RX queue will be used.
>>>
>>> I hope that makes sense.
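(A side note on the RSS point above, just so we are talking about the same thing: below is roughly how I understand the port has to be configured for the 5-tuple hash to spread flows across the RX queues. It is only a sketch, not the exact code from the example; configure_rss is my own wrapper name, while rte_eth_dev_configure() and the ETH_MQ_RX_RSS / ETH_RSS_* flags are the standard ethdev API.)

    #include <rte_ethdev.h>

    /*
     * Sketch: enable RSS so the NIC hashes each packet's 5-tuple and
     * distributes it over nb_rx_queues RX queues.  Traffic only spreads
     * across cores when there are many distinct flows to hash.
     */
    static int
    configure_rss(uint16_t port_id, uint16_t nb_rx_queues, uint16_t nb_tx_queues)
    {
        struct rte_eth_conf port_conf = {
            .rxmode = {
                .mq_mode = ETH_MQ_RX_RSS,
            },
            .rx_adv_conf = {
                .rss_conf = {
                    .rss_key = NULL,  /* use the driver's default hash key */
                    .rss_hf  = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP,
                },
            },
        };

        return rte_eth_dev_configure(port_id, nb_rx_queues, nb_tx_queues,
                                     &port_conf);
    }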
>> I think there is a misunderstanding of the problem. Indeed the problem is not Pktgen.
>> Here is my command:
>> ./app/app/x86_64-native-linuxapp-gcc/pktgen -c ffc0000 -n 4 -w 84:00.0 -w 84:00.1 --file-prefix pktgen_F2 --socket-mem 1000,2000,1000,1000 -- -T -P -m "[18-19:20-21].0, [22:23].1"
>>
>> The problem is that when I run the symmetric_mp example with $numberOfProcesses=8 cores I get less throughput (roughly 8.4 Gb/s), but when I run it with $numberOfProcesses=3 cores the throughput is 10G.
>> for i in `seq $numberOfProcesses`;
>> do
>>     .... some calculation goes here.....
>>     symmetric_mp -c $coremask -n 2 --proc-type=auto -w 0b:00.0 -w 0b:00.1 --file-prefix sm --socket-mem 4000,1000,1000,1000 -- -p 3 --num-procs=$numberOfProcesses --proc-id=$procid";
>>     .....
>> done
> Most NICs have a limited amount of memory on the NIC, and when you start to segment that memory because you are using more queues, it can affect performance.
>
> On one of the NICs, if you go over roughly 5 or 6 queues, the memory per queue for Rx/Tx packets starts to become a bottleneck, as you no longer have enough memory in the Tx/Rx queues to hold enough packets. This can cause the NIC to drop Rx packets because the host cannot pull the data from the NIC or the Rx ring on the host fast enough. This seems to be the problem, as the amount of time to process a packet on the host has not changed, only the amount of buffer space in the NIC as you increase the number of queues.
>
> I am not sure this is your issue, but I figured I would state this point.
What you said sounds logical, but is there a way that I can be sure? I mean, are there registers on the NIC which show the number of packets dropped by the NIC? Or does DPDK have an API which shows the number of packets lost at the NIC level?
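If it helps, this is the kind of check I was hoping exists. Just a sketch (print_port_drops is my own helper name), assuming rte_eth_stats_get() and its imissed/ierrors/rx_nombuf counters are the right place to look; rte_eth_xstats_get() should give the more detailed per-queue and driver-specific counters.

    #include <stdio.h>
    #include <inttypes.h>
    #include <rte_ethdev.h>

    /*
     * Sketch: read the per-port drop counters from the ethdev layer.
     * imissed counts packets dropped by the hardware because the RX
     * queues were full, which is the kind of loss described above.
     */
    static void
    print_port_drops(uint16_t port_id)
    {
        struct rte_eth_stats stats;

        if (rte_eth_stats_get(port_id, &stats) != 0)
            return;

        printf("port %u: ipackets=%" PRIu64 " imissed=%" PRIu64
               " ierrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
               port_id, stats.ipackets, stats.imissed,
               stats.ierrors, stats.rx_nombuf);
    }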
>
>> I am trying to find out what causes this loss!
>>
>>>>>> On Aug 28, 2018, at 12:05 PM, Saber Rezvani wrote:
>>>>>>
>>>>>> On 08/28/2018 08:31 PM, Stephen Hemminger wrote:
>>>>>>> On Tue, 28 Aug 2018 17:34:27 +0430 Saber Rezvani wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have run the multi_process/symmetric_mp example in the DPDK examples directory. For a single process its throughput is line rate, but as I increase the number of cores I see a decrease in throughput. For example, if the number of queues is set to 4 and each queue is assigned to a single core, the throughput is about 9.4 Gb/s; with 8 queues, the throughput is 8.5 Gb/s.
>>>>>>>>
>>>>>>>> I have read the following, but it was not convincing:
>>>>>>>>
>>>>>>>> http://mails.dpdk.org/archives/dev/2015-October/024960.html
>>>>>>>>
>>>>>>>> I am eagerly looking forward to hearing from you, all.
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>>
>>>>>>>> Saber
>>>>>>>>
>>>>>>> Not completely surprising. If you have more cores than the packet line rate requires, then the number of packets returned by each call to rx_burst will be smaller. With a large number of cores, most of the time will be spent doing reads of PCI registers for no packets!
>>>>>> Indeed pktgen says it is generating traffic at line rate but receiving less than 10 Gb/s. So in that case there should be something that causes the reduction in throughput :(
>>>>>>
>>>>> Regards,
>>>>> Keith
>>>>>
>>> Regards,
>>> Keith
>>>
>> Best regards,
>> Saber
> Regards,
> Keith
>
Best regards,
Saber