From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by dpdk.org (Postfix) with ESMTP id 676755917
 for <dev@dpdk.org>; Wed, 22 Oct 2014 16:45:52 +0200 (CEST)
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga101.fm.intel.com with ESMTP; 22 Oct 2014 07:54:09 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.04,769,1406617200"; d="scan'208";a="618555696"
Received: from irsmsx103.ger.corp.intel.com ([163.33.3.157])
 by fmsmga002.fm.intel.com with ESMTP; 22 Oct 2014 07:53:57 -0700
Received: from irsmsx107.ger.corp.intel.com (163.33.3.99) by
 IRSMSX103.ger.corp.intel.com (163.33.3.157) with Microsoft SMTP Server (TLS)
 id 14.3.195.1; Wed, 22 Oct 2014 15:53:01 +0100
Received: from irsmsx105.ger.corp.intel.com ([169.254.7.174]) by
 IRSMSX107.ger.corp.intel.com ([169.254.10.68]) with mapi id 14.03.0195.001;
 Wed, 22 Oct 2014 15:53:01 +0100
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>, "Liang, Cunming"
 <cunming.liang@intel.com>
Thread-Topic: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx
 cycles/packet
Thread-Index: AQHP5LMIlpl/13ytTkee3VaYZ7hCTJwsQCWAgA4aiICAAC3CgIABn0EAgAAaWqA=
Date: Wed, 22 Oct 2014 14:53:00 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258213955A6@IRSMSX105.ger.corp.intel.com>
References: <1408947174-11323-1-git-send-email-cunming.liang@intel.com>
 <1412944201-30703-1-git-send-email-cunming.liang@intel.com>
 <1412944201-30703-2-git-send-email-cunming.liang@intel.com>
 <20141010175226.GG19499@hmsreliant.think-freely.org>
 <D0158A423229094DA7ABF71CF2FA0DA31183E8F0@shsmsx102.ccr.corp.intel.com>
 <20141021103315.GB12795@hmsreliant.think-freely.org>
 <D0158A423229094DA7ABF71CF2FA0DA3118501DD@shsmsx102.ccr.corp.intel.com>
 <20141022140316.GA13913@localhost.localdomain>
In-Reply-To: <20141022140316.GA13913@localhost.localdomain>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.181]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx
 cycles/packet
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2014 14:45:55 -0000



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> Sent: Wednesday, October 22, 2014 3:03 PM
> To: Liang, Cunming
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx =
cycles/packet
>=20
> On Tue, Oct 21, 2014 at 01:17:01PM +0000, Liang, Cunming wrote:
> >
> >
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > Sent: Tuesday, October 21, 2014 6:33 PM
> > > To: Liang, Cunming
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and=
 tx
> > > cycles/packet
> > >
> > > On Sun, Oct 12, 2014 at 11:10:39AM +0000, Liang, Cunming wrote:
> > > > Hi Neil,
> > > >
> > > > Very appreciate your comments.
> > > > I add inline reply, will send v3 asap when we get alignment.
> > > >
> > > > BRs,
> > > > Liang Cunming
> > > >
> > > > > -----Original Message-----
> > > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > > Sent: Saturday, October 11, 2014 1:52 AM
> > > > > To: Liang, Cunming
> > > > > Cc: dev@dpdk.org
> > > > > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx=
 and tx
> > > cycles/packet
> > > > >
> > > > > On Fri, Oct 10, 2014 at 08:29:58PM +0800, Cunming Liang wrote:
> > > > > > It provides unit test to measure cycles/packet in NIC loopback =
mode.
> > > > > > It simply gives the average cycles of IO used per packet withou=
t test
> > > equipment.
> > > > > > When doing the test, make sure the link is UP.
> > > > > >
> > > > > > Usage Example:
> > > > > > 1. Run unit test app in interactive mode
> > > > > >     app/test -c f -n 4 -- -i
> > > > > > 2. Run and wait for the result
> > > > > >     pmd_perf_autotest
> > > > > >
> > > > > > There's option to choose rx/tx pair, default is vector.
> > > > > >     set_rxtx_mode [vector|scalar|full|hybrid]
> > > > > > Note: To get acurate scalar fast, please choose 'vector' or 'hy=
brid' without
> > > > > INC_VEC=3Dy in config
> > > > > >
> > > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > >
> > > > > Notes inline
> > > > >
> > > > > > ---
> > > > > >  app/test/Makefile                   |    1 +
> > > > > >  app/test/commands.c                 |   38 +++
> > > > > >  app/test/packet_burst_generator.c   |    4 +-
> > > > > >  app/test/test.h                     |    4 +
> > > > > >  app/test/test_pmd_perf.c            |  626
> > > > > +++++++++++++++++++++++++++++++++++
> > > > > >  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |    6 +
> > > > > >  6 files changed, 677 insertions(+), 2 deletions(-)
> > > > > >  create mode 100644 app/test/test_pmd_perf.c
> > > > > >
> > > > > > diff --git a/app/test/Makefile b/app/test/Makefile
> > > > > > index 6af6d76..ebfa0ba 100644
> > > > > > --- a/app/test/Makefile
> > > > > > +++ b/app/test/Makefile
> > > > > > @@ -56,6 +56,7 @@ SRCS-y +=3D test_memzone.c
> > > > > >
> > > > > >  SRCS-y +=3D test_ring.c
> > > > > >  SRCS-y +=3D test_ring_perf.c
> > > > > > +SRCS-y +=3D test_pmd_perf.c
> > > > > >
> > > > > >  ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
> > > > > >  SRCS-y +=3D test_table.c
> > > > > > diff --git a/app/test/commands.c b/app/test/commands.c
> > > > > > index a9e36b1..f1e746e 100644
> > > > > > --- a/app/test/commands.c
> > > > > > +++ b/app/test/commands.c
> > > > > > @@ -310,12 +310,50 @@ cmdline_parse_inst_t cmd_quit =3D {
> > > > > >
> > > > > > +#define NB_ETHPORTS_USED                (1)
> > > > > > +#define NB_SOCKETS                      (2)
> > > > > > +#define MEMPOOL_CACHE_SIZE 250
> > > > > > +#define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) +
> > > > > RTE_PKTMBUF_HEADROOM)
> > > > > Don't you want to size this in accordance with the amount of data=
 your
> > > sending
> > > > > (64 Bytes as noted above)?
> > > > [Liang, Cunming] The case is designed to measure small packet IO co=
st with
> > > normal mbuf size.
> > > > Even if decreasing the size, it won't gain significant cycles.
> > > > >
> > > That presumes a non-memory constrained system, doesn't it?  I suppose=
 in the
> > > end
> > > as long as you have consistency, its not overly relevant, but it seem=
s like
> > > you'll want to add data sizing as a feature to this eventually (i.e. =
the ability
> > > to test performance for larger frames sizes), at which point you'll n=
eed to make
> > > this non-static anyway.
> > [Liang, Cunming] For a normal Ethernet packet(w/o jumbo frame), packet =
size is 1518B.
> > As in really network, there won't have huge number of jumbo frames.
> > The mbuf size 2048 is a reasonable value to cover most of the packet si=
ze.
> > It's also be chosen by lots of NIC as the default receiving buffer size=
 in DMA register.
> > In case larger than the size, it need do scatter and gather but lose so=
me performance.
> > The unit test won't measure size from 64 to 9600, won't plan to measure=
 scatter-gather rx/tx.
> > It focus on 64B packet size and taking the mbuf size being used the mos=
t often.
> Fine.
>=20
> > >
> > > > > > +static void
> > > > > > +print_ethaddr(const char *name, const struct ether_addr *eth_a=
ddr)
> > > > > > +{
> > > > > > +	printf("%s%02X:%02X:%02X:%02X:%02X:%02X", name,
> > > > > > +		eth_addr->addr_bytes[0],
> > > > > > +		eth_addr->addr_bytes[1],
> > > > > > +		eth_addr->addr_bytes[2],
> > > > > > +		eth_addr->addr_bytes[3],
> > > > > > +		eth_addr->addr_bytes[4],
> > > > > > +		eth_addr->addr_bytes[5]);
> > > > > > +}
> > > > > > +
> > > > > This was copieed from print_ethaddr.  Seems like a good candidate=
 for a
> > > common
> > > > > function in rte_ether.h
> > > > [Liang, Cunming] Agree with you, some of samples now use it with th=
e same
> > > copy.
> > > > I'll rework it. Adding 'ether_format_addr' in rte_ether.h only for =
format the
> > > 48bits address output.
> > > > And leaving other prints for application customization.
> > > > >
> > > Sounds good.
> > >
> > > > >
> > > > > > +}
> > > > > > +
> > > > > > +static void
> > > > > > +signal_handler(int signum)
> > > > > > +{
> > > > > > +	/* When we receive a USR1 signal, print stats */
> > > > > I think you mean SIGUSR2, below, SIGUSR1 tears the test down and =
exits the
> > > > > program
> > > > [Liang, Cunming] Thanks, it's a typo.
> > > > >
> > > > > > +	if (signum =3D=3D SIGUSR1) {
> > > > > SIGINT instead.  Thats the common practice.
> > > > [Liang, Cunming] I understood your opinion.
> > > > The considerations I'm not using SIGINT instead are:
> > > > 1. We unset ISIG in c_lflag of term. CRTL+C won't trigger SIGINT in=
 command
> > > interactive.
> > > >   It always has to explicitly send signal. No matter SIGUSR1 or SIG=
INT.
> > > > 2. By SIGINT semantic, expect to terminate the process.
> > > >   Here I expect to force stop this case, but still alive in command=
 line.
> > > >   After it stopped, it can run again or start to run other test cas=
es.
> > > >   So I keep SIGINT, SIGUSR1 in different behavior.
> > > > 3. It should be rarely used.
> > > >   Only when exception timeout, I leave this backdoor for automation=
 test
> > > control.
> > > >   For manual test, we can easily force kill the process.
> > > >
> > > Hmm, ok, that sounds reasonable.
> > >
> > > > >
> > > > > > +		printf("Force Stop!\n");
> > > > > > +		stop =3D 1;
> > > > > > +	}
> > > > > > +	if (signum =3D=3D SIGUSR2)
> > > > > > +		stats_display(0);
> > > > > > +}
> > > > > > +/* main processing loop */
> > > > > > +static int
> > > > > > +main_loop(__rte_unused void *args)
> > > > > > +{
> > > > > > +#define PACKET_SIZE 64
> > > > > > +#define FRAME_GAP 12
> > > > > > +#define MAC_PREAMBLE 8
> > > > > > +	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> > > > > > +	unsigned lcore_id;
> > > > > > +	unsigned i, portid, nb_rx =3D 0, nb_tx =3D 0;
> > > > > > +	struct lcore_conf *conf;
> > > > > > +	uint64_t prev_tsc, cur_tsc;
> > > > > > +	int pkt_per_port;
> > > > > > +	uint64_t packets_per_second, total_packets;
> > > > > > +
> > > > > > +	lcore_id =3D rte_lcore_id();
> > > > > > +	conf =3D &lcore_conf[lcore_id];
> > > > > > +	if (conf->status !=3D LCORE_USED)
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	pkt_per_port =3D  MAX_TRAFIC_BURST / conf->nb_ports;
> > > > > > +
> > > > > > +	int idx =3D 0;
> > > > > > +	for (i =3D 0; i < conf->nb_ports; i++) {
> > > > > > +		int num =3D pkt_per_port;
> > > > > > +		portid =3D conf->portlist[i];
> > > > > > +		printf("inject %d packet to port %d\n", num, portid);
> > > > > > +		while (num) {
> > > > > > +			nb_tx =3D RTE_MIN(MAX_PKT_BURST, num);
> > > > > > +			nb_tx =3D rte_eth_tx_burst(portid, 0,
> > > > > > +						&tx_burst[idx], nb_tx);
> > > > > > +			num -=3D nb_tx;
> > > > > > +			idx +=3D nb_tx;
> > > > > > +		}
> > > > > > +	}
> > > > > > +	printf("Total packets inject to prime ports =3D %u\n", idx);
> > > > > > +
> > > > > > +	packets_per_second =3D (link_mbps * 1000 * 1000) /
> > > > > > +		+((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT);
> > > > > > +	printf("Each port will do %"PRIu64" packets per second\n",
> > > > > > +		+packets_per_second);
> > > > > > +
> > > > > > +	total_packets =3D RTE_TEST_DURATION * conf->nb_ports *
> > > > > packets_per_second;
> > > > > > +	printf("Test will stop after at least %"PRIu64" packets recei=
ved\n",
> > > > > > +		+ total_packets);
> > > > > > +
> > > > > > +	prev_tsc =3D rte_rdtsc();
> > > > > > +
> > > > > > +	while (likely(!stop)) {
> > > > > > +		for (i =3D 0; i < conf->nb_ports; i++) {
> > > > > > +			portid =3D conf->portlist[i];
> > > > > > +			nb_rx =3D rte_eth_rx_burst((uint8_t) portid, 0,
> > > > > > +						 pkts_burst, MAX_PKT_BURST);
> > > > > > +			if (unlikely(nb_rx =3D=3D 0)) {
> > > > > > +				idle++;
> > > > > > +				continue;
> > > > > > +			}
> > > > > > +
> > > > > > +			count +=3D nb_rx;
> > > > > Doesn't take into consideration error conditions.  rte_eth_rx_bur=
st can
> > > return
> > > > > -ENOTSUP
> > > > [Liang, Cunming] It returns -ENOTSUP when turning on ETHDEV_DEBUG
> > > CONFIG.
> > > > The error is used to avoid no function call register.
> > > > When ETHDEV_DEBUG turn off, the NULL function call cause segfault d=
irectly.
> > > > So I think it's a library internal error.
> > > No, look at FUNC_PTR_OR_ERR_RET.  If the passed in function pointer i=
s null,
> > > -ENOTSUPP will be returned to the application, you need to handle the=
 error
> > > condition.
> > [Liang, Cunming] The runtime rte_eth_rx_burst is a inline function in r=
te_ethdev.h.
> > The one you're talking about is the one defined in rte_ethdev.c which i=
s a extern non-inline function.
> > It works only when RTE_LIBRTE_ETHDEV_DEBUG turns on.
> > If we always turns such library internal checking on, it lose performan=
ce.
> > So I insist it's a library internal error checking, doesn't need to tak=
e care in application level.
> I'm really flored that you would respond this way.
>=20
> What you have is two versions of a function, one returns errors and one d=
oesn't,
> and the version used is based on compile time configuration.  You've writ=
ten
> your application such that it can only gracefully handle one of the two
> versions, and you're happy to allow the other version, the one thats supp=
osed to
> be in use when you are debugging applications, to create silent accountin=
g
> failures in your application.  And it completely ignores the fact that th=
e
> non-debug version might one day return errors as well (even if it doesn't
> currently).  Your assertion above that we can ignore it because its debug=
 code
> is the most short sighted thing I've read in a long time.

Actually looking at rte_eth_rx_burst(): it returns uint16_t.=20
Which is sort of a strange if it expects to return a negative value as erro=
r code.
Also reading though 'formal' comments of rte_eth_rx_burst() - it seems that=
 it not supposed to return negative values:
"...
The rte_eth_rx_burst() function does not provide any error
 notification to avoid the corresponding overhead. As a hint, the
 upper-level application might check the status of the device link once
 being systematically returned a 0 value for a given number of tries.
...
@return
 *   The number of packets actually retrieved, which is the number
 *   of pointers to *rte_mbuf* structures effectively supplied to the
 *   *rx_pkts* array."

Again, if you looks at rte_eth_rx_burst() implementation, when RTE_LIBRTE_E=
THDEV_DEBUG is on -
For some error cases it returns '0', foor other's: -ENOTSUP:

FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, -ENOTSUP);
if (queue_id >=3D dev->data->nb_rx_queues) {
                PMD_DEBUG_TRACE("Invalid RX queue_id=3D%d\n", queue_id);
                return 0;
}

Which makes me thing that we just have errnoneous implementation of rte_eth=
_rx_burst()
when RTE_LIBRTE_ETHDEV_DEBUG is on.
Probably just a mechanical mistake, when FUNC_PTR_OR_ERR_RET() was added.
 I'd say we change rte_eth_rx_burst() to always return 0 (as was initially =
planned).

Konstantin


>=20
> > >
> > > > In such library exceptional case, I prefer not expecting sample/app=
lication to
> > > condition check library functional error.
> > > But you're assertion about the library handling the exception is wron=
g.
> > >
> > > > >
> > > > > > +			nb_tx =3D rte_eth_tx_burst(portid, 0, pkts_burst, nb_rx);
> > > > > Ditto with -ENOTSUP
> > > > >
> > > > > > +			if (unlikely(nb_tx < nb_rx)) {
> > > > > What makes this unlikely?  Seems like a perfectly reasonable cond=
ition to
> > > happen
> > > > > to me.  If the network is busy, its completely likely that you wi=
ll receive more
> > > > > frames than you send, if you elect to receive all frames.
> > > > [Liang, Cunming] For this case, NIC works in MAC loopback mode.
> > > > It firstly injects numbers of packets to NIC. Then NIC will loopbac=
k all packets to
> > > ingress side.
> > > > The code here will receive the packets, and send them back to NIC
> > > > Packets looping inside all come from initial injection.
> > > > As the total number of injected packets is much less than in-chip q=
ueue size,
> > > the tx egress queue shouldn't block desc. ring update.
> > > > So when receiving packets in line rate and nb_tx < nb_rx, it means =
tx cannot
> > > archive line rate.
> > > > When it happens, the cycles/packets result make no sense, as the bo=
ttle neck is
> > > NIC.
> > > > The drop counter can record it.
> > > Ok.
> > >
> > > > >
> > > > > > +				drop +=3D (nb_rx - nb_tx);
> > > > > > +				do {
> > > > > > +					rte_pktmbuf_free(pkts_burst[nb_tx]);
> > > > > Defer this, it skews your timing
> > > > [Liang, Cunming] Agree with you, I ever thought about it.
> > > > This test cases is designed to measure pure IO RX/TX routine.
> > > > When doing tx burst with result nb_tx < nb_rx, we either repeat re-=
send or
> > > drop it.
> > > > Each way introduces noise(adding additional control code), resendin=
g much
> > > times even cost more than free it.
> > > > The cycles/packets is useful when there's no packet drop, otherwise=
 it gives the
> > > hint where the problem comes from (by idle or drop).
> > > I'm not sure what you're asserting here.  Are you suggesting that you=
 want to
> > > include the time it takes to free memory buffers in your testing?  Th=
at seems
> > > dubious at best to me.  If you want to measure I/O, thats great, but =
memory
> > > management of packet buffers is a separate operation from that I/O
> > [Liang, Cunming] Agree with you. I means it doesn't need to take care o=
f the mbuf free cost.
> > As I said in previous reply, it rarely happens in line rate.
> > The cycle measurement doesn't make much sense if happens.
> > On that time the unit test just notify it happens, and keep safe free a=
ll the unsent mbuf.
> >
> > >
> > > > >
> > > > > > +				} while (++nb_tx < nb_rx);
> > > > > > +			}
> > > > > > +		}
> > > > > > +		if (unlikely(count >=3D total_packets))
> > > > > > +			break;
> > > > > Whats the reasoning here?  Do you only ever expect to receive fra=
mes that
> > > you
> > > > > send?  If so, seems like this should call for a big warning to ge=
t printed.
> > > > [Liang, Cunming] The loop exits when the pre-calculated total_packe=
ts are
> > > received.
> > > > As the nb_rx is unpredictable, the packet counter may large equal t=
han
> > > total_packets the last time.
> > > > The reason unlikely used here is because the condition becomes true=
 only the
> > > last time.
> > > ok
> > >
> > > > >
> > > > > > +	}
> > > > > > +
> > > > > > +	cur_tsc =3D rte_rdtsc();
> > > > > > +
> > > > > > +	for (i =3D 0; i < conf->nb_ports; i++) {
> > > > > > +		portid =3D conf->portlist[i];
> > > > > > +		int nb_free =3D pkt_per_port;
> > > > > > +		do { /* dry out */
> > > > > > +			nb_rx =3D rte_eth_rx_burst((uint8_t) portid, 0,
> > > > > > +						 pkts_burst, MAX_PKT_BURST);
> > > > > > +			nb_tx =3D 0;
> > > > > > +			while (nb_tx < nb_rx)
> > > > > > +				rte_pktmbuf_free(pkts_burst[nb_tx++]);
> > > > > > +			nb_free -=3D nb_rx;
> > > > > > +		} while (nb_free !=3D 0);
> > > > > > +		printf("free %d mbuf left in port %u\n", pkt_per_port, porti=
d);
> > > > > > +	}
> > > > > > +
> > > > > Whats the purpose of this?  Are you trying to flush the device?  =
Wouldn't it
> > > be
> > > > > enough just to stop the interface?
> > > > [Liang, Cunming] If we only run the cases once and exit, it doesn't=
 matter.
> > > > But it's designed to run multi-times without exit, for the purpose =
of warming
> > > up or for getting average number.
> > > > So stopping device is not enough, we have to release the flying pac=
kets.
> > > >
> > > Ok
> > >
> > > > >
> > > > > > +	if (count =3D=3D 0)
> > > > > > +		return -1;
> > > > > > +
> > > > > > +	printf("%lu packet, %lu drop, %lu idle\n", count, drop, idle)=
;
> > > > > > +	printf("Result: %ld cycles per packet\n", (cur_tsc - prev_tsc=
) / count);
> > > > > > +
> > > > > Bad math here.  Theres no guarantee that the tsc hasn't wrapped
> > > (potentially
> > > > > more than once) depending on your test length.  you need to check=
 the tsc
> > > before
> > > > > and after each burst and record an average of deltas instead, acc=
ounting in
> > > each
> > > > > instance for the possibility of wrap.
> > > > [Liang, Cunming] I'm not sure catch your point correctly.
> > > > I think both cur_tsc and prev_tsc are 64 bits width.
> > > > For 3GHz, I think it won't wrapped so quick.
> > > > As it's uint64_t, so even get wrapped, the delta should still be co=
rrect.
> > > But theres no guarantee that the tsc starts at zero when you begin yo=
ur test.
> > > The system may have been up for a long time and near wrapping already=
.
> > > Regardless, you need to account for the possibility that cur_tsc is s=
maller
> > > than prev_tsc, or this breaks.
> > [Liang, Cunming] In case prev_tsc near wrapping, and cur_tsc get wrappe=
d.
> > As they are unsigned, it still ok.
> > e.g. cur_tsc=3D0x2, prev_tsc=3D0xFFFFFFFFFFFFFFFC
> > Delta=3Dcur_tsc-prev_tsc is 6, which is still correct.
> Ah, you're right, my fault.
> > And for uint64_t, we need to start computer for hundreds of years.
> That only assumes that the tsc has been kept at its initial state from bo=
ot, its
> entirely possible that the tsc is writen to a different value.  IIRC, in =
several
> debug configurations, the linux kernel inits the tsc to a large value so =
that it
> rolls over shortly after boot to catch wrapping errors.  Though as you no=
te,
> your used of uint64 makes it irrelevant.
>=20
> > >
> > > > >
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int
> > > > > > +test_pmd_perf(void)
> > > > > > +{
> > > > > > +	uint16_t nb_ports, num, nb_lcores, slave_id =3D (uint16_t)-1;
> > > > > > +	uint16_t nb_rxd =3D RTE_TEST_RX_DESC_DEFAULT;
> > > > > > +	uint16_t nb_txd =3D RTE_TEST_TX_DESC_DEFAULT;
> > > > > > +	uint16_t portid;
> > > > > > +	uint16_t nb_rx_queue =3D 1, nb_tx_queue =3D 1;
> > > > > > +	int socketid =3D -1;
> > > > > > +	int ret;
> > > > > > +
> > > > > > +	printf("Start PMD RXTX cycles cost test.\n");
> > > > > > +
> > > > > > +	signal(SIGUSR1, signal_handler);
> > > > > Again SIGINT here.
> > > > >
> > > > > > +	signal(SIGUSR2, signal_handler);
> > > > > > +
> > > > > > +	nb_ports =3D rte_eth_dev_count();
> > > > > > +	if (nb_ports < NB_ETHPORTS_USED) {
> > > > > > +		printf("At least %u port(s) used for perf. test\n",
> > > > > > +		       NB_ETHPORTS_USED);
> > > > > > +		return -1;
> > > > > > +	}
> > > > > > +
> > > > > > +	if (nb_ports > RTE_MAX_ETHPORTS)
> > > > > > +		nb_ports =3D RTE_MAX_ETHPORTS;
> > > > > > +
> > > > > > +	nb_lcores =3D rte_lcore_count();
> > > > > > +
> > > > > > +	memset(lcore_conf, 0, sizeof(lcore_conf));
> > > > > > +	init_lcores();
> > > > > > +
> > > > > > +	init_mbufpool(NB_MBUF);
> > > > > > +
> > > > > > +	reset_count();
> > > > > > +	num =3D 0;
> > > > > > +	for (portid =3D 0; portid < nb_ports; portid++) {
> > > > > > +		if (socketid =3D=3D -1) {
> > > > > > +			socketid =3D rte_eth_dev_socket_id(portid);
> > > > > > +			slave_id =3D alloc_lcore(socketid);
> > > > > > +			if (slave_id =3D=3D (uint16_t)-1) {
> > > > > > +				printf("No avail lcore to run test\n");
> > > > > > +				return -1;
> > > > > > +			}
> > > > > > +			printf("Performance test runs on lcore %u socket %u\n",
> > > > > > +			       slave_id, socketid);
> > > > > > +		}
> > > > > > +
> > > > > > +		if (socketid !=3D rte_eth_dev_socket_id(portid)) {
> > > > > > +			printf("Skip port %d\n", portid);
> > > > > > +			continue;
> > > > > > +		}
> > > > > > +
> > > > > > +		/* port configure */
> > > > > > +		ret =3D rte_eth_dev_configure(portid, nb_rx_queue,
> > > > > > +					    nb_tx_queue, &port_conf);
> > > > > > +		if (ret < 0)
> > > > > > +			rte_exit(EXIT_FAILURE,
> > > > > > +				"Cannot configure device: err=3D%d, port=3D%d\n",
> > > > > > +				 ret, portid);
> > > > > > +
> > > > > > +		rte_eth_macaddr_get(portid, &ports_eth_addr[portid]);
> > > > > > +		printf("Port %u ", portid);
> > > > > > +		print_ethaddr("Address:", &ports_eth_addr[portid]);
> > > > > > +		printf("\n");
> > > > > > +
> > > > > > +		/* tx queue setup */
> > > > > > +		ret =3D rte_eth_tx_queue_setup(portid, 0, nb_txd,
> > > > > > +					     socketid, &tx_conf);
> > > > > > +		if (ret < 0)
> > > > > > +			rte_exit(EXIT_FAILURE,
> > > > > > +				"rte_eth_tx_queue_setup: err=3D%d, "
> > > > > > +				"port=3D%d\n", ret, portid);
> > > > > > +
> > > > > > +		/* rx queue steup */
> > > > > > +		ret =3D rte_eth_rx_queue_setup(portid, 0, nb_rxd,
> > > > > > +						socketid, &rx_conf,
> > > > > > +						mbufpool[socketid]);
> > > > > > +		if (ret < 0)
> > > > > > +			rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup: err=3D%d,"
> > > > > > +				 "port=3D%d\n", ret, portid);
> > > > > > +
> > > > > > +		/* Start device */
> > > > > > +		stop =3D 0;
> > > > > > +		ret =3D rte_eth_dev_start(portid);
> > > > > > +		if (ret < 0)
> > > > > > +			rte_exit(EXIT_FAILURE,
> > > > > > +				"rte_eth_dev_start: err=3D%d, port=3D%d\n",
> > > > > > +				ret, portid);
> > > > > > +
> > > > > > +		/* always eanble promiscuous */
> > > > > > +		rte_eth_promiscuous_enable(portid);
> > > > > > +
> > > > > > +		lcore_conf[slave_id].portlist[num++] =3D portid;
> > > > > > +		lcore_conf[slave_id].nb_ports++;
> > > > > > +	}
> > > > > > +	check_all_ports_link_status(nb_ports, RTE_PORT_ALL);
> > > > > > +
> > > > > > +	init_traffic(mbufpool[socketid], tx_burst, MAX_TRAFIC_BURST);
> > > > > > +
> > > > > > +	rte_eal_remote_launch(main_loop, NULL, slave_id);
> > > > > > +	if (rte_eal_wait_lcore(slave_id) < 0)
> > > > > > +		return -1;
> > > > > > +
> > > > > > +	/* port tear down */
> > > > > > +	for (portid =3D 0; portid < nb_ports; portid++) {
> > > > > > +		if (socketid !=3D rte_eth_dev_socket_id(portid))
> > > > > > +			continue;
> > > > > > +
> > > > > > +		rte_eth_dev_stop(portid);
> > > > > > +	}
> > > > > > +
> > > > > Clean up your allocated memory/lcores/etc?
> > > > [Liang, Cunming] It do cleanup on the beginning of case.
> > > > "Init_lcores","init_mbufpool","reset_count" guarantee the data clea=
n before
> > > each testing.
> > > > And mbufpool only allocated once even if we run multiple times.
> > > Its a janitorial issue.  Before the program exits, you need to free a=
ny
> > > resources that you've allocated.
> > [Liang, Cunming] Yes, I do but only not for the mbuf pool.
> > Firstly the case exit is not equal to program exit. I expect the re-ent=
er will use the same mempool, so that can do cache warm up.
> > Secondly, DPDK doesn't supply really mempool release. The program exit,=
 the memzone then free.
> I continue to find the fact that freeing mempools isn't supported is redi=
culous,
> but I suppose thats neither here nor there, you need to be able to clean =
up
> resources that you allocate, weather you are exiting a test case, a progr=
am, or
> simply a function, its just not sane to not be able to do that.
>=20
> Given the return code check above, all thats left to say I think is NAK t=
o this
> code.  If Thomas wants to ignore me, I suppose thats his perogative, but =
I can't
> see how you can ignore that case.
>=20
> Neil