DPDK patches and discussions
* [dpdk-dev] [Patch] Eth Driver: Optimization for improved NIC processing rates
@ 2015-10-27 20:56 Polehn, Mike A
  2015-10-28 10:44 ` Bruce Richardson
  0 siblings, 1 reply; 3+ messages in thread
From: Polehn, Mike A @ 2015-10-27 20:56 UTC (permalink / raw)
  To: dev

Prefetch of interface access variables while calling into driver RX and TX subroutines.

For converging zero loss packet task tests, a small drop in latency for zero loss measurements 
and small drop in lost packet counts for the lossy measurement points was observed, 
indicating some savings of execution clock cycles.

Signed-off-by: Mike A. Polehn <mike.a.polehn@intel.com>

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8a8c82b..09f1069 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2357,11 +2357,15 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
 		 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
 {
 	struct rte_eth_dev *dev;
+	void *rxq;
 
 	dev = &rte_eth_devices[port_id];
 
-	int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
-			rx_pkts, nb_pkts);
+	/* rxq is going to be immediately used, prefetch it */
+	rxq = dev->data->rx_queues[queue_id];
+	rte_prefetch0(rxq);
+
+	int16_t nb_rx = (*dev->rx_pkt_burst)(rxq, rx_pkts, nb_pkts);
 
 #ifdef RTE_ETHDEV_RXTX_CALLBACKS
 	struct rte_eth_rxtx_callback *cb = dev->post_rx_burst_cbs[queue_id];
@@ -2499,6 +2503,7 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct rte_eth_dev *dev;
+	void *txq;
 
 	dev = &rte_eth_devices[port_id];
 
@@ -2514,7 +2519,11 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	}
 #endif
 
-	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
+	/* txq is going to be immediately used, prefetch it */
+	txq = dev->data->tx_queues[queue_id];
+	rte_prefetch0(txq);
+
+	return (*dev->tx_pkt_burst)(txq, tx_pkts, nb_pkts);
 }
 #endif
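
For readers unfamiliar with the helper used in the patch: rte_prefetch0()
issues a non-binding hint that the cache line holding its argument will be
needed soon, so the load of the queue structure can overlap the call into the
driver. A minimal sketch of an equivalent helper, using the GCC/Clang builtin
(an illustration only; DPDK's actual implementation may use per-architecture
inline assembly):

	/* Hint the CPU to pull the line holding p into all cache levels
	 * (read access, high temporal locality). The hint is advisory:
	 * the CPU may ignore it, and it never faults, which is what makes
	 * it safe to issue speculatively before the pointer is used. */
	static inline void
	prefetch0_sketch(const volatile void *p)
	{
		__builtin_prefetch((const void *)p, 0, 3);
	}

Because the driver dereferences the queue pointer as one of its first
actions, issuing the hint a few instructions earlier can hide part of a
potential cache miss; if the line is already resident, the hint costs
essentially nothing.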


* Re: [dpdk-dev] [Patch] Eth Driver: Optimization for improved NIC processing rates
  2015-10-27 20:56 [dpdk-dev] [Patch] Eth Driver: Optimization for improved NIC processing rates Polehn, Mike A
@ 2015-10-28 10:44 ` Bruce Richardson
  2015-10-28 21:27   ` Polehn, Mike A
  0 siblings, 1 reply; 3+ messages in thread
From: Bruce Richardson @ 2015-10-28 10:44 UTC (permalink / raw)
  To: Polehn, Mike A; +Cc: dev

On Tue, Oct 27, 2015 at 08:56:31PM +0000, Polehn, Mike A wrote:
> Prefetch of interface access variables while calling into driver RX and TX subroutines.
> 
> For converging zero loss packet task tests, a small drop in latency for zero loss measurements 
> and small drop in lost packet counts for the lossy measurement points was observed, 
> indicating some savings of execution clock cycles.
> 
Hi Mike,

the commit log message above seems a bit awkward to read. If I understand it
correctly, would the below suggestion be a shorter, clearer equivalent?

	Prefetch RX and TX queue variables in ethdev before driver function call

	This has been measured to produce higher throughput and reduced latency
	in RFC 2544 throughput tests.

Or perhaps you could suggest some similar wording yourself. It would also be
good to clarify which applications the improvements were seen with - was it
testpmd, l3fwd, or something else?

Regards,
/Bruce


* Re: [dpdk-dev] [Patch] Eth Driver: Optimization for improved NIC processing rates
  2015-10-28 10:44 ` Bruce Richardson
@ 2015-10-28 21:27   ` Polehn, Mike A
  0 siblings, 0 replies; 3+ messages in thread
From: Polehn, Mike A @ 2015-10-28 21:27 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev

Hi Bruce!

Thank you for reviewing; sorry I didn't write as clearly as possible.

I was trying to say more than "The performance improved". I didn't call out RFC 2544 since many
people may not know much about it. I was also trying to convey what was observed and the
conclusion derived from that observation without getting too long.

When the NIC processing loop rate is around 400,000/sec, the entry and exit savings are not easily
observable when the average data rate variation from test to test is higher than the packet rate
gain. If the RFC 2544 zero loss convergence is set too fine, the time it takes to make a complete
test increases substantially, at 60 seconds per measurement point (I set my convergence to about
0.25% of line rate). Unless the current convergence data rate is close to zero loss for the
next point, a small improvement is not going to show up as a higher zero loss rate. However, the
test has a series of measurements, each with average latency and packet loss. Also, since the
test equipment uses a predefined sequence algorithm that causes the same data rate to be
generated for each test to a high degree of accuracy, the results for the same data rates can be
compared across tests. If someone repeats the tests, I am pointing out the particular data to
look at. One 60-second measurement by itself does not give sufficient accuracy to draw a
conclusion, but information correlated across multiple measurements gives a basis for a
correct conclusion. (A rough sketch of such a zero-loss search loop is below.)
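
(For context, the zero-loss convergence described above is essentially a
binary search on the offered rate. The sketch below is hypothetical;
run_trial() is an illustrative stand-in for driving the test equipment for
one 60-second point, not a real API.)

	#include <stdint.h>

	/* Offers `rate` (a fraction of line rate) for one measurement
	 * interval and returns the number of packets lost. Hypothetical. */
	typedef uint64_t (*trial_fn)(double rate);

	static double
	find_zero_loss_rate(trial_fn run_trial)
	{
		double lo = 0.0, hi = 1.0;     /* fractions of line rate */
		const double step = 0.0025;    /* stop at 0.25% of line rate */

		while (hi - lo > step) {
			double rate = (lo + hi) / 2.0;
			if (run_trial(rate) == 0)
				lo = rate;     /* no loss: search higher */
			else
				hi = rate;     /* loss: search lower */
		}
		return lo;  /* highest rate measured with zero loss */
	}

At 60 seconds per trial, each extra halving of the resolution adds another
expensive measurement, which is why the chosen convergence threshold
dominates total test time.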

For l3fwd to be stable with i40e, the queue sizes need to be increased (I use 2k) and the
packet count also needs to be increased. This then gets 100% zero loss line rate with 64-byte
packets on two 10 GbE connections (given the correct Fortville firmware); see the line-rate
arithmetic sketched below. This makes it good for verifying the correct NIC firmware, but it
does not work well for testing since the data is network limited. I have my own stable packet
processing code which I used for testing. I have multiple programs, but during the optimization
cycle I hit line rate and had to move to a 5-tuple processing program to get a higher load to
proceed. I have a doc that covers this setup and the optimization results, but it cannot be
shared. Anyone making their own measurements needs to have made sufficient tests to understand
the stability of their test environment.
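
(As a side note on the line-rate claim above: the theoretical 64-byte rate
for 10 GbE follows from the 20 bytes of preamble/SFD plus inter-frame gap
that accompany every frame on the wire. A quick check - my arithmetic, not
a figure from the thread:)

	#include <stdio.h>

	int main(void)
	{
		const double link_bps = 10e9;   /* 10 GbE */
		const double frame = 64.0;      /* minimum Ethernet frame */
		const double overhead = 20.0;   /* 8 B preamble/SFD + 12 B IFG */

		double pps = link_bps / ((frame + overhead) * 8.0);
		printf("64B line rate: %.0f pkt/s per port\n", pps); /* ~14.88M */
		printf("two ports:     %.0f pkt/s total\n", 2.0 * pps);
		return 0;
	}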

Mike

