DPDK patches and discussions
From: Ciara Loftus <ciara.loftus@intel.com>
To: dev@dpdk.org
Cc: stephen@networkplumber.org, magnus.karlsson@intel.com,
	qi.z.zhang@intel.com, Ciara Loftus <ciara.loftus@intel.com>
Subject: [dpdk-dev] [PATCH v2] net/af_xdp: optimisations to improve packet loss
Date: Tue, 23 Jun 2020 14:29:25 +0000
Message-ID: <20200623142925.28305-1-ciara.loftus@intel.com>

This commit makes some changes to the AF_XDP PMD in an effort to improve
its packet loss characteristics.

1. When transmission fails because a tx descriptor cannot be reserved, the
PMD now pulls from the completion ring, issues a syscall in which the
kernel attempts to complete outstanding tx operations, then tries to
reserve the tx descriptor again. Previously we dropped the packet after
the syscall without retrying the reservation.

2. During completion ring cleanup, always pull as many entries as possible
from the ring, rather than limiting the pull to the batch size or to the
number of packets we are about to send. Keeping the completion ring emptier
should reduce failed transmissions in the kernel, as the kernel requires
space in the completion ring to successfully tx.

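3. Size the fill ring as twice the receive ring size, which may help reduce
allocation failures in the driver. The number of buffers placed on the fill
ring at setup (reserve_size in xsk_configure()) doubles accordingly.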

4. Emulate a tx_free_thresh: once the number of entries available in the
completion ring reaches this threshold, we pull from it. The threshold is
set to half the completion ring size (1k entries with the default size).

With these changes, a benchmark measuring the highest packet rate
sustainable at 0.01% packet loss improved from ~0.1 Gbps to ~3 Gbps.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>
---

v1->v2:
 - added emulated tx_free_thresh as suggested by Stephen Hemminger
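
Illustration only, not part of the patch: a toy model of points 1, 2 and 4
above, written without libbpf so that it builds on its own. Every name
(toy_txq, toy_kick, toy_send_burst, TOY_RING_SIZE) is hypothetical, the
ring size is deliberately tiny, and the "kernel" is reduced to a single
rule taken from the description above: it can only transmit while the
completion ring has room. It is a sketch of the control flow under those
assumptions, not the PMD's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* hypothetical toy size, not the PMD default */

/* Toy stand-in for one socket's TX ring and completion ring (hypothetical). */
struct toy_txq {
	uint32_t tx_used; /* TX descriptors submitted but not yet sent */
	uint32_t cq_used; /* completions not yet collected by user space */
};

/* Point 2: collect every pending completion, not just a batch. */
static uint32_t toy_pull_cq(struct toy_txq *q)
{
	uint32_t n = q->cq_used;

	q->cq_used = 0;
	return n;
}

/* Model the kernel: it only transmits while the completion ring has room. */
static void toy_kernel_tx(struct toy_txq *q)
{
	assert(q->cq_used <= TOY_RING_SIZE);

	uint32_t room = TOY_RING_SIZE - q->cq_used;
	uint32_t n = q->tx_used < room ? q->tx_used : room;

	q->tx_used -= n;
	q->cq_used += n;
}

/* Drain completions first, then let the "kernel" make progress. */
static void toy_kick(struct toy_txq *q)
{
	toy_pull_cq(q);
	toy_kernel_tx(q);
}

/* Claim one TX slot; fails when the TX ring is full. */
static bool toy_reserve(struct toy_txq *q)
{
	if (q->tx_used == TOY_RING_SIZE)
		return false;
	q->tx_used++;
	return true;
}

/* Queue up to nb_pkts packets; 'retry' selects the new behaviour (point 1). */
static uint16_t toy_send_burst(struct toy_txq *q, uint16_t nb_pkts, bool retry)
{
	/* Point 4: emulated free threshold at half the completion ring. */
	const uint32_t free_thresh = TOY_RING_SIZE >> 1;
	uint16_t count = 0;

	if (q->cq_used >= free_thresh)
		toy_pull_cq(q);

	for (uint16_t i = 0; i < nb_pkts; i++) {
		if (!toy_reserve(q)) {
			toy_kick(q);
			/* Old flow stopped here; new flow reserves again. */
			if (!retry || !toy_reserve(q))
				break;
		}
		count++;
	}
	return count;
}
```

In this model, starting from a queue with both rings full
(tx_used == cq_used == TOY_RING_SIZE), toy_send_burst(&q, 4, false)
queues 0 packets while toy_send_burst(&q, 4, true) queues all 4: the kick
frees the TX ring in both cases, but only the retry makes use of it.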

 drivers/net/af_xdp/rte_eth_af_xdp.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 06124ba789..2d69221c1b 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -396,6 +396,8 @@ kick_tx(struct pkt_tx_queue *txq)
 {
 	struct xsk_umem_info *umem = txq->umem;
 
+	pull_umem_cq(umem, XSK_RING_CONS__DEFAULT_NUM_DESCS);
+
 #if defined(XDP_USE_NEED_WAKEUP)
 	if (xsk_ring_prod__needs_wakeup(&txq->tx))
 #endif
@@ -407,11 +409,9 @@ kick_tx(struct pkt_tx_queue *txq)
 
 			/* pull from completion queue to leave more space */
 			if (errno == EAGAIN)
-				pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+				pull_umem_cq(umem,
+					     XSK_RING_CONS__DEFAULT_NUM_DESCS);
 		}
-#ifndef XDP_UMEM_UNALIGNED_CHUNK_FLAG
-	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
-#endif
 }
 
 #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
@@ -427,8 +427,10 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	uint16_t count = 0;
 	struct xdp_desc *desc;
 	uint64_t addr, offset;
+	uint32_t free_thresh = umem->cq.size >> 1;
 
-	pull_umem_cq(umem, nb_pkts);
+	if (xsk_cons_nb_avail(&umem->cq, free_thresh) >= free_thresh)
+		pull_umem_cq(umem, XSK_RING_CONS__DEFAULT_NUM_DESCS);
 
 	for (i = 0; i < nb_pkts; i++) {
 		mbuf = bufs[i];
@@ -436,7 +438,9 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		if (mbuf->pool == umem->mb_pool) {
 			if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
 				kick_tx(txq);
-				goto out;
+				if (!xsk_ring_prod__reserve(&txq->tx, 1,
+							    &idx_tx))
+					goto out;
 			}
 			desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx);
 			desc->len = mbuf->pkt_len;
@@ -758,7 +762,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals __rte_unused,
 	struct xsk_umem_info *umem;
 	int ret;
 	struct xsk_umem_config usr_config = {
-		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS * 2,
 		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
 		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG};
 	void *base_addr = NULL;
@@ -867,7 +871,7 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	struct xsk_socket_config cfg;
 	struct pkt_tx_queue *txq = rxq->pair;
 	int ret = 0;
-	int reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	int reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS;
 	struct rte_mbuf *fq_bufs[reserve_size];
 
 	rxq->umem = xdp_umem_configure(internals, rxq);
-- 
2.17.1

