From: François-Frédéric Ozog <ff@ozog.com>
To: 'Michael Quicquaro'
Cc: dev@dpdk.org
Date: Fri, 24 Jan 2014 10:18:05 +0100
Subject: Re: [dpdk-dev] Rx-errors with testpmd (only 75% line rate)

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
> Sent: Friday, January 24, 2014 00:23
> To: Robert Sanford
> Cc: dev@dpdk.org; mayhan@mayhan.org
> Subject: Re: [dpdk-dev] Rx-errors with testpmd (only 75% line rate)
>
> Thank you, everyone, for all of your suggestions, but unfortunately I'm
> still having the problem.
>
> I have reduced the test down to using 2 cores (one is the master core),
> both of which are on the socket to which the NIC's PCI slot is connected.
> I am running in rxonly mode, so I am basically just counting the packets.
> I've tried all different burst sizes. Nothing seems to make any
> difference.
>
> Since my original post, I have acquired an IXIA tester, so I have better
> control over my testing. I send 250,000,000 packets to the interface. I
> am getting roughly 25,000,000 Rx-errors with every run. I have verified
> that the number of Rx-errors is consistent with the value in the RXMPC
> register of the NIC.
>
> Just for sanity's sake, I tried switching the cores to the other socket
> and ran the same test. As expected, I got more packet loss: roughly
> 87,000,000.
>
> I am running Red Hat 6.4, which uses kernel 2.6.32-358.
>
> This is a NUMA-supported system, but whether or not I use --numa doesn't
> seem to make a difference.
>

Is the BIOS configured for NUMA?
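
Independently of the BIOS setting, it is also worth confirming from inside
the application that the rx lcore really sits on the NIC's socket. A rough
sketch of such a check (not from the testpmd sources; it assumes the
rte_eth_dev_socket_id() and rte_lcore_to_socket_id() helpers found in the
DPDK headers, called after rte_eal_init() and port setup) could look like
this:

/* Warn when the rx lcore and the port sit on different NUMA sockets. */
#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>

static void
check_rx_affinity(uint8_t port_id, unsigned rx_lcore)
{
        int port_socket = rte_eth_dev_socket_id(port_id); /* -1 if unknown */
        unsigned lcore_socket = rte_lcore_to_socket_id(rx_lcore);

        if (port_socket >= 0 && (unsigned)port_socket != lcore_socket)
                printf("warning: port %u on socket %d, rx lcore %u on socket %u\n",
                       (unsigned)port_id, port_socket, rx_lcore, lcore_socket);
}

If that warning never fires with your restricted core mask, the cross-socket
problem is at the platform level rather than in the application.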

If it is not, the BIOS may program System Address Decoding so that the
memory address space is interleaved between sockets on 64 MB boundaries
(you may have a look at the Xeon 7500 datasheet, volume 2 - a public
document - §4.4 for an "explanation" of this).

In general you do not want memory interleaving: QPI bandwidth tops out at
16 GB/s on the latest processors, while the aggregated memory bandwidth of
a single node can be over 60 GB/s.

> Looking at the Intel documentation, it appears that I should be able to
> easily do what I am trying to do. Actually, the documentation implies
> that I should be able to do roughly 40 Gbps with a single 2.x GHz
> processor core with other configuration (memory, OS, etc.) similar to my
> system. It appears to me that many of the details of these benchmarks
> are missing.
>
> Can someone on this list actually verify for me that what I am trying to
> do is possible and that they have done it with success?

I have done a NAT64 proof of concept that handled 40 Gbps throughput on a
single Xeon E5-2697 v2. The Intel NIC chip was an 82599ES (if I recall
correctly; I don't have the card handy anymore), with 4 rx queues and 4 tx
queues per port, 32768 descriptors per queue, Intel DCA on, and Ethernet
pause parameters off: 14.8 Mpps per port, no packet loss.

However, this was with a kernel-based proprietary packet framework. I
expect DPDK to achieve the same results.

>
> Much appreciation for all the help.
> - Michael
>
>
> On Wed, Jan 22, 2014 at 3:38 PM, Robert Sanford wrote:
>
> > Hi Michael,
> >
> > > What can I do to trace down this problem?
> >
> > May I suggest that you try to be more selective in the core masks on
> > the command line. The test app may choose some cores from "other" CPU
> > sockets. Only enable cores of the one socket to which the NIC is
> > attached.
> >
> > > It seems very similar to a
> > > thread on this list back in May titled "Best example for showing
> > > throughput?" where no resolution was ever mentioned in the thread.
> >
> > After re-reading *that* thread, it appears that their problem may have
> > been trying to achieve ~40 Gbits/s of bandwidth (2 ports x 10 Gb Rx +
> > 2 ports x 10 Gb Tx), plus overhead, over a typical dual-port NIC whose
> > total bus bandwidth is a maximum of 32 Gbits/s (PCI Express 2.1 x8).

PCIe x8 Gen2 is "32 Gbps" full duplex, meaning 32 Gbps in each direction.
On a single dual-port card you have 20 Gbps of inbound traffic (below
32 Gbps) and 20 Gbps of outbound traffic (below 32 Gbps).

A 10 Gbps port runs at 10,000,000,000 bps (10^10 bps, *not* a power of
two). A 64-byte frame (including CRC) is preceded by a preamble and
followed by an inter-frame gap, so on the wire there are
7 + 1 + 64 + 12 = 84 bytes = 672 bits per frame. The maximum packet rate
is thus 10^10 / 672 = 14,880,952 pps.

On the PCI Express side, 60 bytes (the frame excluding CRC) are
transferred in a single DMA transaction with additional overhead, plus
8b/10b encoding, per packet: 60 + 8 + 16 = 84 bytes (which fits into a
typical 128-byte max payload), or 840 "bits" once 8b/10b encoded.

An 8-lane 5 GT/s link (GigaTransfers = 5*10^9 "transfers" per second per
lane, i.e. a symbol bit every 200 picoseconds) can be viewed as a 40 GT/s
link, so we can have 4*10^10 / 840 = 47,619,047 pps per direction (PCIe is
full duplex).

So two fully loaded ports generate 29,761,904 pps in each direction, which
the PCI Express Gen2 x8 link can absorb even taking the DMA overhead into
account (the small program after the quoted text below reproduces this
arithmetic).

> >
> > --
> > Regards,
> > Robert
> >
> >
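
For completeness, here is a small stand-alone program that reproduces the
arithmetic above. The 84-byte wire cost per frame is standard Ethernet
framing; the 60 + 8 + 16 byte PCIe cost per packet is the assumption made
above rather than a measured value, so treat the PCIe figure as a rough
upper bound.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* 10GbE wire cost of a 64-byte frame:
         * preamble 7 + SFD 1 + frame 64 + IFG 12 = 84 bytes */
        const uint64_t wire_bits = (7 + 1 + 64 + 12) * 8;       /* 672 bits   */
        const uint64_t line_pps  = 10000000000ULL / wire_bits;  /* 14,880,952 */

        /* PCIe Gen2 x8: 8 lanes * 5 GT/s = 40e9 symbol bits/s per direction.
         * Per packet: 60-byte payload + ~24 bytes of assumed DMA/header
         * overhead, 8b/10b encoded -> 84 * 10 = 840 symbol bits. */
        const uint64_t pcie_bits = 84 * 10;                      /* 840        */
        const uint64_t pcie_pps  = 40000000000ULL / pcie_bits;   /* 47,619,047 */

        printf("10GbE line rate, 64B frames : %llu pps per port\n",
               (unsigned long long)line_pps);
        printf("PCIe Gen2 x8 packet budget  : %llu pps per direction\n",
               (unsigned long long)pcie_pps);
        printf("Two ports at line rate      : %llu pps (< PCIe budget)\n",
               (unsigned long long)(2 * line_pps));
        return 0;
}

Compiled and run, it prints 14,880,952 pps per port on the wire against a
budget of roughly 47.6 Mpps per direction on the PCIe link, so two ports at
line rate (29,761,904 pps) fit with headroom.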