From: Saber Rezvani <irsaber@zoho.com>
To: "Wiles, Keith"
Cc: Stephen Hemminger, "dev@dpdk.org"
Subject: Re: [dpdk-dev] IXGBE throughput loss with 4+ cores
Date: Thu, 30 Aug 2018 08:38:45 +0430

On 08/29/2018 11:22 PM, Wiles, Keith wrote:
>
>> On Aug 29, 2018, at 12:19 PM, Saber Rezvani wrote:
>>
>> On 08/29/2018 01:39 AM, Wiles, Keith wrote:
>>>> On Aug 28, 2018, at 2:16 PM, Saber Rezvani wrote:
>>>>
>>>> On 08/28/2018 11:39 PM, Wiles, Keith wrote:
>>>>> Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a performance problem.
>>>> I use Pktgen version 3.0.0. Indeed it is fine as long as I use one core (10 Gb/s), but when I increase the number of cores (one core per queue) I lose some performance (roughly 8.5 Gb/s with 8 cores). In my scenario Pktgen shows it is generating at line rate but receiving only 8.5 Gb/s.
>>>> Is it because of Pktgen?
>>> Normally Pktgen can receive at line rate up to 10G with 64-byte frames, which means Pktgen should not be the problem. You can verify that by looping a cable from one port to another on the pktgen machine to create an external loopback. Then send whatever traffic you can from one port; you should be able to receive those packets unless something is configured wrong.
>>>
>>> Please send me the command line for pktgen.
>>>
>>> In pktgen, if you have the config -m "[1-4:5-8].0" then you have 4 cores sending traffic and 4 cores receiving packets.
>>>
>>> In this case the TX cores will be sending packets on all 4 lcores to the same port. On the RX side you have 4 cores polling 4 RX queues. The RX queues are controlled by RSS, which means the 5-tuple hash of the inbound traffic must divide the packets across all 4 queues to make sure each core is doing the same amount of work. If you are sending only a single packet (one flow) from the TX cores, then only one RX queue will be used.
>>>
>>> I hope that makes sense.
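(A side note on the RSS point above, just so we are talking about the same thing: below is roughly how I understand the port has to be configured for the 5-tuple hash to spread flows across the RX queues. It is only a sketch, not the exact code from the example; configure_rss is my own wrapper name, while rte_eth_dev_configure() and the ETH_MQ_RX_RSS / ETH_RSS_* flags are the standard ethdev API.)

    #include <rte_ethdev.h>

    /*
     * Sketch: enable RSS so the NIC hashes each packet's 5-tuple and
     * distributes it over nb_rx_queues RX queues.  Traffic only spreads
     * across cores when there are many distinct flows to hash.
     */
    static int
    configure_rss(uint16_t port_id, uint16_t nb_rx_queues, uint16_t nb_tx_queues)
    {
        struct rte_eth_conf port_conf = {
            .rxmode = {
                .mq_mode = ETH_MQ_RX_RSS,
            },
            .rx_adv_conf = {
                .rss_conf = {
                    .rss_key = NULL,  /* use the driver's default hash key */
                    .rss_hf  = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP,
                },
            },
        };

        return rte_eth_dev_configure(port_id, nb_rx_queues, nb_tx_queues,
                                     &port_conf);
    }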
>> I think there is a misunderstanding of the problem. Indeed the problem is not Pktgen.
>> Here is my command:
>> ./app/app/x86_64-native-linuxapp-gcc/pktgen -c ffc0000 -n 4 -w 84:00.0 -w 84:00.1 --file-prefix pktgen_F2 --socket-mem 1000,2000,1000,1000 -- -T -P -m "[18-19:20-21].0, [22:23].1"
>>
>> The problem is that when I run the symmetric_mp example with $numberOfProcesses=8 cores I get less throughput (roughly 8.4 Gb/s), but when I run it with $numberOfProcesses=3 cores the throughput is 10G.
>> for i in `seq $numberOfProcesses`;
>> do
>>     .... some calculation goes here.....
>>     symmetric_mp -c $coremask -n 2 --proc-type=auto -w 0b:00.0 -w 0b:00.1 --file-prefix sm --socket-mem 4000,1000,1000,1000 -- -p 3 --num-procs=$numberOfProcesses --proc-id=$procid";
>>     .....
>> done
> Most NICs have a limited amount of memory on the NIC, and when you start to segment that memory because you are using more queues, it can affect performance.
>
> On one of the NICs, if you go over roughly 5 or 6 queues, the memory per queue for Rx/Tx packets starts to become a bottleneck, as you no longer have enough memory in the Tx/Rx queues to hold enough packets. This can cause the NIC to drop Rx packets because the host cannot pull the data from the NIC or the Rx ring on the host fast enough. This seems to be the problem, as the amount of time to process a packet on the host has not changed, only the amount of buffer space in the NIC as you increase the number of queues.
>
> I am not sure this is your issue, but I figured I would state this point.
What you said sounds logical, but is there a way that I can be sure? I mean, are there registers on the NIC which show the number of packets dropped by the NIC? Or does DPDK have an API which shows the number of packets lost at the NIC level?
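If it helps, this is the kind of check I was hoping exists. Just a sketch (print_port_drops is my own helper name), assuming rte_eth_stats_get() and its imissed/ierrors/rx_nombuf counters are the right place to look; rte_eth_xstats_get() should give the more detailed per-queue and driver-specific counters.

    #include <stdio.h>
    #include <inttypes.h>
    #include <rte_ethdev.h>

    /*
     * Sketch: read the per-port drop counters from the ethdev layer.
     * imissed counts packets dropped by the hardware because the RX
     * queues were full, which is the kind of loss described above.
     */
    static void
    print_port_drops(uint16_t port_id)
    {
        struct rte_eth_stats stats;

        if (rte_eth_stats_get(port_id, &stats) != 0)
            return;

        printf("port %u: ipackets=%" PRIu64 " imissed=%" PRIu64
               " ierrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
               port_id, stats.ipackets, stats.imissed,
               stats.ierrors, stats.rx_nombuf);
    }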
>
>> I am trying to find out what causes this loss!
>>
>>>>>> On Aug 28, 2018, at 12:05 PM, Saber Rezvani wrote:
>>>>>>
>>>>>> On 08/28/2018 08:31 PM, Stephen Hemminger wrote:
>>>>>>> On Tue, 28 Aug 2018 17:34:27 +0430 Saber Rezvani wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have run the multi_process/symmetric_mp example in the DPDK examples directory. For a single process its throughput is line rate, but as I increase the number of cores I see a decrease in throughput. For example, if the number of queues is set to 4 and each queue is assigned to a single core, the throughput is about 9.4 Gb/s; with 8 queues, the throughput is 8.5 Gb/s.
>>>>>>>>
>>>>>>>> I have read the following, but it was not convincing:
>>>>>>>>
>>>>>>>> http://mails.dpdk.org/archives/dev/2015-October/024960.html
>>>>>>>>
>>>>>>>> I am eagerly looking forward to hearing from you, all.
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>>
>>>>>>>> Saber
>>>>>>>>
>>>>>>> Not completely surprising. If you have more cores than the packet line rate requires, then the number of packets returned by each call to rx_burst will be smaller. With a large number of cores, most of the time will be spent doing reads of PCI registers for no packets!
>>>>>> Indeed pktgen says it is generating traffic at line rate but receiving less than 10 Gb/s. So in that case there should be something that causes the reduction in throughput :(
>>>>>>
>>>>> Regards,
>>>>> Keith
>>>>>
>>> Regards,
>>> Keith
>>>
>> Best regards,
>> Saber
> Regards,
> Keith
>
Best regards,
Saber