From: Thomas Monjalon
To: Hrvoje Habjanic
Cc: users@dpdk.org, galco@mellanox.com, asafp@mellanox.com, olgas@mellanox.com, ci@dpdk.org
Date: Thu, 26 Mar 2020 21:54:40 +0100
Subject: Re: [dpdk-ci] [dpdk-users] DPDK TX problems

Thanks for the interesting feedback.
It seems we should test this performance use case in our labs.

18/02/2020 09:36, Hrvoje Habjanic:
> On 08. 04. 2019. 11:52, Hrvoje Habjanić wrote:
> > On 29/03/2019 08:24, Hrvoje Habjanić wrote:
> >>> Hi.
> >>>
> >>> I wrote an application using DPDK 17.11 (I also tried 18.11), and
> >>> while doing some performance testing I am seeing very odd
> >>> behavior. To verify that this is not caused by my app, I ran the
> >>> same test with the l2fwd example app, and I am still confused by
> >>> the results.
> >>>
> >>> In short, I am trying to push a lot of L2 packets through the
> >>> DPDK engine - packet processing is minimal. When testing, I start
> >>> with a small number of packets per second and then gradually
> >>> increase it to find the limit. At some point I do reach this
> >>> limit - packets start to get dropped. And this is where things
> >>> become weird.
> >>>
> >>> When I reach the peak packet rate (at which packets start to get
> >>> dropped), I would expect that reducing the packet rate would stop
> >>> the drops. But this is not the case. For example, assume the peak
> >>> packet rate is 3.5 Mpps. At this point everything works fine.
> >>> Increasing the rate to 4.0 Mpps causes a lot of dropped packets.
> >>> But when the rate is reduced back to 3.5 Mpps, the app is still
> >>> broken - packets are still dropped.
> >>>
> >>> At this point I need to drastically reduce the rate (to 1.4 Mpps)
> >>> to make the drops go away. Also, the app is then unable to
> >>> forward anything beyond this 1.4 Mpps, despite the fact that in
> >>> the beginning it did forward 3.5 Mpps! The only way to recover is
> >>> to restart the app.
> >>>
> >>> Also, sometimes the app just stops forwarding any packets -
> >>> packets are received (as seen by the counters), but the app is
> >>> unable to send anything back.
> >>>
> >>> As mentioned, I see the same behavior with the l2fwd example app.
> >>> I tested DPDK 17.11 and also DPDK 18.11 - the results are the
> >>> same.
> >>>
> >>> My test environment is an HP DL380G8 with 82599ES 10GbE (ixgbe)
> >>> cards, connected to a Cisco Nexus 9300 switch. On the other side
> >>> is an Ixia test appliance. The application runs in a virtual
> >>> machine (VM) using KVM (OpenStack, with SR-IOV enabled and NUMA
> >>> restrictions). I checked that the VM uses only CPUs from the NUMA
> >>> node to which the network card is connected, so there is no
> >>> cross-NUMA traffic. OpenStack is Queens, the host runs Ubuntu
> >>> Bionic, and the VM also runs Ubuntu Bionic.
> >>>
> >>> I do not know how to debug this. Does someone else have the same
> >>> observations?
> >>>
> >>> Regards,
> >>>
> >>> H.
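
For anyone trying to reproduce this, a useful first step is to read
the port counters from inside the application to see which side is
dropping. A minimal sketch, assuming a single port that is already
configured and started:

#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

static void
print_drop_counters(uint16_t port_id)
{
	struct rte_eth_stats stats;

	if (rte_eth_stats_get(port_id, &stats) != 0)
		return;

	/* imissed: dropped because the RX queues were full, i.e. the
	 * application did not poll fast enough.
	 * rx_nombuf: RX mbuf allocation failures (mempool exhausted).
	 * oerrors: packets the NIC failed to transmit. */
	printf("rx=%" PRIu64 " tx=%" PRIu64 " imissed=%" PRIu64
	       " rx_nombuf=%" PRIu64 " oerrors=%" PRIu64 "\n",
	       stats.ipackets, stats.opackets,
	       stats.imissed, stats.rx_nombuf, stats.oerrors);
}

A growing imissed points at the RX path being too slow, while a
growing rx_nombuf suggests the mempool is running dry - which matches
the observations in the next mail.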
> >> There are additional findings. It seems that when I reach the
> >> peak pps rate, the application is not fast enough, and I can see
> >> "rx missed" errors in the card statistics on the host. At the same
> >> time, the tx side starts to show problems (tx burst starts to
> >> report that it did not send all packets). Shortly after that, tx
> >> falls apart completely and the top pps rate drops.
> >>
> >> Since I did not disable pause frames, I can see that the "RX
> >> pause" frame counter on the switch is increasing. On the other
> >> hand, if I disable pause frames (on the NIC of the server), the
> >> host driver (ixgbe) reports "TX unit hang" in dmesg and issues a
> >> card reset. Of course, after the reset none of the DPDK apps in
> >> the VMs on this host works anymore.
> >>
> >> Is it possible that at the time of congestion DPDK does not
> >> release mbufs back to the pool, and the tx ring becomes "filled"
> >> with zombie packets (not sent by the card but still marked as in
> >> use by their reference counters)?
> >>
> >> Is there a way to check the mempool or tx ring for "leftovers"?
> >> Is it possible to somehow "flush" the tx ring and/or the mempool?
> >>
> >> H.
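
On the two questions above: the mempool fill level can be read at
runtime, and unsent packets from a partial tx burst must be freed by
the application itself - the NIC will not do it. A minimal sketch,
assuming an application-created pool (the name mbuf_pool is
hypothetical):

#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Watch the pool: if in_use keeps growing under a steady load, mbufs
 * are leaking somewhere, e.g. stuck in a TX descriptor ring. */
static void
check_pool(struct rte_mempool *mbuf_pool)
{
	printf("mempool: avail=%u in_use=%u\n",
	       rte_mempool_avail_count(mbuf_pool),
	       rte_mempool_in_use_count(mbuf_pool));
}

/* On a partial burst, free what the NIC did not accept so those
 * mbufs return to the pool instead of leaking (similar to what the
 * l2fwd example does with its drop callback). */
static void
send_burst(uint16_t port_id, uint16_t queue_id,
	   struct rte_mbuf **pkts, uint16_t n)
{
	uint16_t sent = rte_eth_tx_burst(port_id, queue_id, pkts, n);

	while (sent < n)
		rte_pktmbuf_free(pkts[sent++]);
}

As for "flushing" the tx ring, rte_eth_tx_done_cleanup() asks the
driver to release the mbufs of already-transmitted packets, though
support for it depends on the driver.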
> > After a few more tests, things become even weirder - if I do not
> > free the mbufs which were not sent, but resend them instead, I can
> > "survive" the over-the-peak event! But then the peak rate starts
> > to drop gradually ...
> >
> > I would like to ask if someone can try this on their platform and
> > report back? I would really like to know whether this is a problem
> > with my deployment, or whether there is something wrong with DPDK.
> >
> > The test should be simple - use l2fwd or l3fwd, and determine the
> > max pps. Then drive the pps 30% over the max, then return below it
> > and confirm that you can still get the max pps.
> >
> > Thanks in advance.
> >
> > H.
>
> I did receive a few mails from users facing this issue, asking how
> it was resolved.
>
> Unfortunately, there is no real fix. It seems that this issue is
> related to the card and hardware used. I am still not sure which is
> more to blame, but the combination I had is definitely problematic.
>
> Anyhow, in the end I concluded that the card driver has some issues
> when it is saturated with packets. My suspicion is that the
> driver/software does not properly free packets, the DPDK mempool
> becomes fragmented, and this causes the performance drops.
> Restarting the software releases the pools and restores proper
> functionality.
>
> After no luck with ixgbe, we migrated to Mellanox (ConnectX-4 Lx),
> and now there is no more of this permanent performance drop. With
> mlx, when the limit is reached, reducing the number of packets
> restores packet forwarding, and this limit seems to be stable.
>
> Also, we moved to newer servers - DL380G10 - and got a significant
> performance increase. We also moved to a newer switch (also Cisco)
> with 25G ports, which reduced latency - almost by a factor of 2!
>
> I did not try the old ixgbe cards in the newer server, but I did try
> Intel's XL710, and it is not as happy as the Mellanox. It gives
> better pps, but it is more unstable in terms of maximum bandwidth
> (it has issues similar to ixgbe).
>
> Regards,
>
> H.
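
For reference, the resend approach described in the quoted mails
above roughly corresponds to retrying the unsent tail of a burst
instead of freeing it immediately. A minimal sketch, with an
arbitrary retry bound (an assumption, not something from the original
mails):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

static void
send_burst_retry(uint16_t port_id, uint16_t queue_id,
		 struct rte_mbuf **pkts, uint16_t n)
{
	uint16_t sent = 0;
	int retries = 100;	/* arbitrary bound */

	/* Keep offering the unsent tail to the NIC. */
	while (sent < n && retries-- > 0)
		sent += rte_eth_tx_burst(port_id, queue_id,
					 pkts + sent, n - sent);

	/* Give up eventually so one stuck queue cannot wedge the
	 * whole forwarding loop; free whatever remains. */
	while (sent < n)
		rte_pktmbuf_free(pkts[sent++]);
}

For the reproduction test itself, the stock l2fwd example should be
enough (e.g. ./l2fwd -l 1-2 -n 4 -- -p 0x3), with the offered rate
driven from the traffic generator as described above.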