DPDK patches and discussions
* [dpdk-dev] [PATCH 0/4] cxgbe: latency and performance fixes
@ 2017-05-27  3:47 Rahul Lakkireddy
  2017-05-27  3:47 ` [dpdk-dev] [PATCH 1/4] cxgbe: improve latency for slow traffic Rahul Lakkireddy
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Rahul Lakkireddy @ 2017-05-27  3:47 UTC (permalink / raw)
  To: dev; +Cc: Nirranjan Kirubaharan, Indranil Choudhury, Kumar Sanghvi

This series of patches reworks the TX and RX paths to reduce latency
and improve performance.

Patch 1 reduces latency for slow traffic by using the status page
update on the RX path to process a batch of packets, and improves the
coalesce TX path to handle slow-moving traffic.

Patch 2 fixes an issue with RXQ default parameters not being applied
to all ports under the same PF.

Patch 3 fixes the rmb bottleneck in the RX path.

Patch 4 adds the ability to configure PCIe extended tags.

This series depends on the following series:

"cxgbe: add support for Chelsio T6 family of adapters"

Thanks,
Rahul

Rahul Lakkireddy (4):
  cxgbe: improve latency for slow traffic
  cxgbe: fix rxq default params for ports under same PF
  cxgbe: remove rmb bottleneck in RX path
  cxgbe: configure PCIe extended tags

 config/common_base                      |   3 +-
 doc/guides/nics/cxgbe.rst               |   4 +
 doc/guides/rel_notes/release_17_08.rst  |   5 +
 drivers/net/cxgbe/base/adapter.h        |   5 +-
 drivers/net/cxgbe/base/t4_regs.h        |  20 +++
 drivers/net/cxgbe/base/t4_regs_values.h |   2 +-
 drivers/net/cxgbe/base/t4fw_interface.h |   8 +
 drivers/net/cxgbe/cxgbe.h               |   3 +-
 drivers/net/cxgbe/cxgbe_compat.h        |  11 +-
 drivers/net/cxgbe/cxgbe_ethdev.c        |   5 +-
 drivers/net/cxgbe/cxgbe_main.c          |  69 +++++---
 drivers/net/cxgbe/sge.c                 | 270 ++++++++++++++++++--------------
 12 files changed, 255 insertions(+), 150 deletions(-)

-- 
2.5.3

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [dpdk-dev] [PATCH 1/4] cxgbe: improve latency for slow traffic
  2017-05-27  3:47 [dpdk-dev] [PATCH 0/4] cxgbe: latency and performance fixes Rahul Lakkireddy
@ 2017-05-27  3:47 ` Rahul Lakkireddy
  2017-05-27  3:47 ` [dpdk-dev] [PATCH 2/4] cxgbe: fix rxq default params for ports under same PF Rahul Lakkireddy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Rahul Lakkireddy @ 2017-05-27  3:47 UTC (permalink / raw)
  To: dev; +Cc: Nirranjan Kirubaharan, Indranil Choudhury, Kumar Sanghvi

TX coalescing waits for ETH_COALESCE_PKT_NUM packets to be coalesced
across bursts before transmitting them.  For slow traffic, such as
100 PPS, this approach increases latency, since packets are received
one at a time and TX coalescing has to wait for ETH_COALESCE_PKT_NUM
packets to arrive before transmitting.

To fix this:

- Update the RX path to use the status page instead, and only receive
  packets when either the ingress interrupt timer threshold (5 us) or
  the ingress interrupt packet count threshold (32 packets) fires,
  whichever happens first.

- If the number of packets coalesced reaches the number of packets
  passed to the TX burst function, stop coalescing and transmit these
  packets immediately.

Also add a compile-time option to favor throughput over latency by
default.
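The new ship decision can be sketched as a standalone predicate (a minimal sketch; `should_ship`, `prefer_throughput`, and the value used for `ETH_COALESCE_PKT_NUM` are illustrative assumptions, not the driver's actual code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_COALESCE_PKT_NUM 15  /* illustrative value, not necessarily the driver's */

/*
 * Hypothetical sketch: ship the coalesced work request either when the
 * coalesce buffer is full (throughput mode), or, when latency is
 * preferred, as soon as every packet the caller handed to the TX burst
 * function (nb_pkts) has been coalesced -- so a slow, small burst is
 * not held back waiting for more packets from future bursts.
 */
static bool should_ship(unsigned int coalesce_idx, uint16_t nb_pkts,
			bool prefer_throughput)
{
	if (coalesce_idx == ETH_COALESCE_PKT_NUM)
		return true;	/* buffer full: always ship */
	if (!prefer_throughput && coalesce_idx >= nb_pkts)
		return true;	/* latency mode: don't wait across bursts */
	return false;
}
```

In latency mode a 4-packet burst ships as soon as all 4 are coalesced; in throughput mode (the `CONFIG_RTE_LIBRTE_CXGBE_TPUT` default) shipping still waits for a full buffer.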

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 config/common_base                      |   3 +-
 doc/guides/nics/cxgbe.rst               |   4 ++
 doc/guides/rel_notes/release_17_08.rst  |   5 ++
 drivers/net/cxgbe/base/adapter.h        |   4 +-
 drivers/net/cxgbe/base/t4_regs_values.h |   2 +-
 drivers/net/cxgbe/base/t4fw_interface.h |   8 +++
 drivers/net/cxgbe/cxgbe_compat.h        |  11 +++-
 drivers/net/cxgbe/cxgbe_ethdev.c        |   3 +-
 drivers/net/cxgbe/cxgbe_main.c          |   5 +-
 drivers/net/cxgbe/sge.c                 | 109 ++++++++++++++++----------------
 10 files changed, 92 insertions(+), 62 deletions(-)

diff --git a/config/common_base b/config/common_base
index 67ef2ec..b2a6ff6 100644
--- a/config/common_base
+++ b/config/common_base
@@ -240,7 +240,7 @@ CONFIG_RTE_LIBRTE_BNX2X_MF_SUPPORT=n
 CONFIG_RTE_LIBRTE_BNX2X_DEBUG_PERIODIC=n
 
 #
-# Compile burst-oriented Chelsio Terminator 10GbE/40GbE (CXGBE) PMD
+# Compile burst-oriented Chelsio Terminator (CXGBE) PMD
 #
 CONFIG_RTE_LIBRTE_CXGBE_PMD=y
 CONFIG_RTE_LIBRTE_CXGBE_DEBUG=n
@@ -248,6 +248,7 @@ CONFIG_RTE_LIBRTE_CXGBE_DEBUG_REG=n
 CONFIG_RTE_LIBRTE_CXGBE_DEBUG_MBOX=n
 CONFIG_RTE_LIBRTE_CXGBE_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_CXGBE_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_CXGBE_TPUT=y
 
 #
 # Compile burst-oriented Cisco ENIC PMD driver
diff --git a/doc/guides/nics/cxgbe.rst b/doc/guides/nics/cxgbe.rst
index 176c189..8651a7b 100644
--- a/doc/guides/nics/cxgbe.rst
+++ b/doc/guides/nics/cxgbe.rst
@@ -130,6 +130,10 @@ enabling debugging options may affect system performance.
 
   Toggle display of receiving data path run-time check messages.
 
+- ``CONFIG_RTE_LIBRTE_CXGBE_TPUT`` (default **y**)
+
+  Toggle behaviour to prefer Throughput or Latency.
+
 .. _driver-compilation:
 
 Driver compilation and testing
diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index 39a3398..bd4ea2c 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -79,6 +79,11 @@ EAL
 Drivers
 ~~~~~~~
 
+* **net/cxgbe: latency and performance improvements**
+
+  TX and RX path reworked to improve performance.  Also reduced latency
+  for slow traffic.
+
 
 Libraries
 ~~~~~~~~~
diff --git a/drivers/net/cxgbe/base/adapter.h b/drivers/net/cxgbe/base/adapter.h
index cc89e49..58c6903 100644
--- a/drivers/net/cxgbe/base/adapter.h
+++ b/drivers/net/cxgbe/base/adapter.h
@@ -148,6 +148,7 @@ struct sge_rspq {                   /* state for an SGE response queue */
 
 	void __iomem *bar2_addr;    /* address of BAR2 Queue registers */
 	unsigned int bar2_qid;      /* Queue ID for BAR2 Queue registers */
+	struct sge_qstat *stat;
 
 	unsigned int cidx;          /* consumer index */
 	unsigned int gts_idx;	    /* last gts write sent */
@@ -708,7 +709,8 @@ void reclaim_completed_tx(struct sge_txq *q);
 void t4_free_sge_resources(struct adapter *adap);
 void t4_sge_tx_monitor_start(struct adapter *adap);
 void t4_sge_tx_monitor_stop(struct adapter *adap);
-int t4_eth_xmit(struct sge_eth_txq *txq, struct rte_mbuf *mbuf);
+int t4_eth_xmit(struct sge_eth_txq *txq, struct rte_mbuf *mbuf,
+		uint16_t nb_pkts);
 int t4_ethrx_handler(struct sge_rspq *q, const __be64 *rsp,
 		     const struct pkt_gl *gl);
 int t4_sge_init(struct adapter *adap);
diff --git a/drivers/net/cxgbe/base/t4_regs_values.h b/drivers/net/cxgbe/base/t4_regs_values.h
index 1326594..9085ff6d 100644
--- a/drivers/net/cxgbe/base/t4_regs_values.h
+++ b/drivers/net/cxgbe/base/t4_regs_values.h
@@ -82,7 +82,7 @@
 /*
  * Ingress Context field values
  */
-#define X_UPDATEDELIVERY_INTERRUPT	1
+#define X_UPDATEDELIVERY_STATUS_PAGE	2
 
 #define X_RSPD_TYPE_FLBUF		0
 #define X_RSPD_TYPE_CPL			1
diff --git a/drivers/net/cxgbe/base/t4fw_interface.h b/drivers/net/cxgbe/base/t4fw_interface.h
index fcc61bf..6283fe9 100644
--- a/drivers/net/cxgbe/base/t4fw_interface.h
+++ b/drivers/net/cxgbe/base/t4fw_interface.h
@@ -84,6 +84,7 @@ enum fw_memtype {
 enum fw_wr_opcodes {
 	FW_ETH_TX_PKT_WR	= 0x08,
 	FW_ETH_TX_PKTS_WR	= 0x09,
+	FW_ETH_TX_PKTS2_WR      = 0x78,
 };
 
 /*
@@ -591,6 +592,13 @@ struct fw_iq_cmd {
 #define G_FW_IQ_CMD_IQESIZE(x)	\
 	(((x) >> S_FW_IQ_CMD_IQESIZE) & M_FW_IQ_CMD_IQESIZE)
 
+#define S_FW_IQ_CMD_IQRO                30
+#define M_FW_IQ_CMD_IQRO                0x1
+#define V_FW_IQ_CMD_IQRO(x)             ((x) << S_FW_IQ_CMD_IQRO)
+#define G_FW_IQ_CMD_IQRO(x)             \
+	(((x) >> S_FW_IQ_CMD_IQRO) & M_FW_IQ_CMD_IQRO)
+#define F_FW_IQ_CMD_IQRO                V_FW_IQ_CMD_IQRO(1U)
+
 #define S_FW_IQ_CMD_IQFLINTCONGEN	27
 #define M_FW_IQ_CMD_IQFLINTCONGEN	0x1
 #define V_FW_IQ_CMD_IQFLINTCONGEN(x)	((x) << S_FW_IQ_CMD_IQFLINTCONGEN)
diff --git a/drivers/net/cxgbe/cxgbe_compat.h b/drivers/net/cxgbe/cxgbe_compat.h
index 1551cbf..03bba9f 100644
--- a/drivers/net/cxgbe/cxgbe_compat.h
+++ b/drivers/net/cxgbe/cxgbe_compat.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2014-2015 Chelsio Communications.
+ *   Copyright(c) 2014-2017 Chelsio Communications.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -226,6 +226,15 @@ static inline int cxgbe_fls(int x)
 	return x ? sizeof(x) * 8 - __builtin_clz(x) : 0;
 }
 
+/**
+ * cxgbe_ffs - find first bit set
+ * @x: the word to search
+ */
+static inline int cxgbe_ffs(int x)
+{
+	return x ? __builtin_ffs(x) : 0;
+}
+
 static inline unsigned long ilog2(unsigned long n)
 {
 	unsigned int e = 0;
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index ac70f22..7282575 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -104,7 +104,8 @@ static uint16_t cxgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		pkts_remain = nb_pkts - total_sent;
 
 		for (pkts_sent = 0; pkts_sent < pkts_remain; pkts_sent++) {
-			ret = t4_eth_xmit(txq, tx_pkts[total_sent + pkts_sent]);
+			ret = t4_eth_xmit(txq, tx_pkts[total_sent + pkts_sent],
+					  nb_pkts);
 			if (ret < 0)
 				break;
 		}
diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c
index 42238ef..2522354 100644
--- a/drivers/net/cxgbe/cxgbe_main.c
+++ b/drivers/net/cxgbe/cxgbe_main.c
@@ -301,7 +301,7 @@ void cfg_queues(struct rte_eth_dev *eth_dev)
 		for (i = 0; i < ARRAY_SIZE(s->ethrxq); i++) {
 			struct sge_eth_rxq *r = &s->ethrxq[i];
 
-			init_rspq(adap, &r->rspq, 0, 0, 1024, 64);
+			init_rspq(adap, &r->rspq, 5, 32, 1024, 64);
 			r->usembufs = 1;
 			r->fl.size = (r->usembufs ? 1024 : 72);
 		}
@@ -445,6 +445,9 @@ static int adap_init0_tweaks(struct adapter *adapter)
 			 V_CREDITCNT(M_CREDITCNT) | M_CREDITCNTPACKING,
 			 V_CREDITCNT(3) | V_CREDITCNTPACKING(1));
 
+	t4_set_reg_field(adapter, A_SGE_INGRESS_RX_THRESHOLD,
+			 V_THRESHOLD_3(M_THRESHOLD_3), V_THRESHOLD_3(32U));
+
 	t4_set_reg_field(adapter, A_SGE_CONTROL2, V_IDMAARBROUNDROBIN(1U),
 			 V_IDMAARBROUNDROBIN(1U));
 
diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index 020879a..d98c3f6 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -848,7 +848,7 @@ static inline void ship_tx_pkt_coalesce_wr(struct adapter *adap,
 
 	/* fill the pkts WR header */
 	wr = (void *)&q->desc[q->pidx];
-	wr->op_pkd = htonl(V_FW_WR_OP(FW_ETH_TX_PKTS_WR));
+	wr->op_pkd = htonl(V_FW_WR_OP(FW_ETH_TX_PKTS2_WR));
 
 	wr_mid = V_FW_WR_LEN16(DIV_ROUND_UP(q->coalesce.flits, 2));
 	ndesc = flits_to_desc(q->coalesce.flits);
@@ -971,7 +971,7 @@ static inline int tx_do_packet_coalesce(struct sge_eth_txq *txq,
 					struct rte_mbuf *mbuf,
 					int flits, struct adapter *adap,
 					const struct port_info *pi,
-					dma_addr_t *addr)
+					dma_addr_t *addr, uint16_t nb_pkts)
 {
 	u64 cntrl, *end;
 	struct sge_txq *q = &txq->q;
@@ -981,6 +981,10 @@ static inline int tx_do_packet_coalesce(struct sge_eth_txq *txq,
 	struct tx_sw_desc *sd;
 	unsigned int idx = q->coalesce.idx, len = mbuf->pkt_len;
 
+#ifdef RTE_LIBRTE_CXGBE_TPUT
+	RTE_SET_USED(nb_pkts);
+#endif
+
 	if (q->coalesce.type == 0) {
 		mc = (struct ulp_txpkt *)q->coalesce.ptr;
 		mc->cmd_dest = htonl(V_ULPTX_CMD(4) | V_ULP_TXPKT_DEST(0) |
@@ -1050,7 +1054,11 @@ static inline int tx_do_packet_coalesce(struct sge_eth_txq *txq,
 	sd->coalesce.idx = (idx & 1) + 1;
 
 	/* send the coaelsced work request if max reached */
-	if (++q->coalesce.idx == ETH_COALESCE_PKT_NUM)
+	if (++q->coalesce.idx == ETH_COALESCE_PKT_NUM
+#ifndef RTE_LIBRTE_CXGBE_TPUT
+	    || q->coalesce.idx >= nb_pkts
+#endif
+	    )
 		ship_tx_pkt_coalesce_wr(adap, txq);
 	return 0;
 }
@@ -1062,7 +1070,8 @@ static inline int tx_do_packet_coalesce(struct sge_eth_txq *txq,
  *
  * Add a packet to an SGE Ethernet Tx queue.  Runs with softirqs disabled.
  */
-int t4_eth_xmit(struct sge_eth_txq *txq, struct rte_mbuf *mbuf)
+int t4_eth_xmit(struct sge_eth_txq *txq, struct rte_mbuf *mbuf,
+		uint16_t nb_pkts)
 {
 	const struct port_info *pi;
 	struct cpl_tx_pkt_lso_core *lso;
@@ -1116,7 +1125,7 @@ int t4_eth_xmit(struct sge_eth_txq *txq, struct rte_mbuf *mbuf)
 			}
 			rte_prefetch0((volatile void *)addr);
 			return tx_do_packet_coalesce(txq, mbuf, cflits, adap,
-						     pi, addr);
+						     pi, addr, nb_pkts);
 		} else {
 			return -EBUSY;
 		}
@@ -1398,20 +1407,6 @@ int t4_ethrx_handler(struct sge_rspq *q, const __be64 *rsp,
 	return 0;
 }
 
-/**
- * is_new_response - check if a response is newly written
- * @r: the response descriptor
- * @q: the response queue
- *
- * Returns true if a response descriptor contains a yet unprocessed
- * response.
- */
-static inline bool is_new_response(const struct rsp_ctrl *r,
-				   const struct sge_rspq *q)
-{
-	return (r->u.type_gen >> S_RSPD_GEN) == q->gen;
-}
-
 #define CXGB4_MSG_AN ((void *)1)
 
 /**
@@ -1453,12 +1448,12 @@ static int process_responses(struct sge_rspq *q, int budget,
 	struct sge_eth_rxq *rxq = container_of(q, struct sge_eth_rxq, rspq);
 
 	while (likely(budget_left)) {
+		if (q->cidx == ntohs(q->stat->pidx))
+			break;
+
 		rc = (const struct rsp_ctrl *)
 		     ((const char *)q->cur_desc + (q->iqe_len - sizeof(*rc)));
 
-		if (!is_new_response(rc, q))
-			break;
-
 		/*
 		 * Ensure response has been read
 		 */
@@ -1548,35 +1543,6 @@ static int process_responses(struct sge_rspq *q, int budget,
 
 		rspq_next(q);
 		budget_left--;
-
-		if (R_IDXDIFF(q, gts_idx) >= 64) {
-			unsigned int cidx_inc = R_IDXDIFF(q, gts_idx);
-			unsigned int params;
-			u32 val;
-
-			if (fl_cap(&rxq->fl) - rxq->fl.avail >= 64)
-				__refill_fl(q->adapter, &rxq->fl);
-			params = V_QINTR_TIMER_IDX(X_TIMERREG_UPDATE_CIDX);
-			q->next_intr_params = params;
-			val = V_CIDXINC(cidx_inc) | V_SEINTARM(params);
-
-			if (unlikely(!q->bar2_addr))
-				t4_write_reg(q->adapter, MYPF_REG(A_SGE_PF_GTS),
-					     val |
-					     V_INGRESSQID((u32)q->cntxt_id));
-			else {
-				writel(val | V_INGRESSQID(q->bar2_qid),
-				       (void *)((uintptr_t)q->bar2_addr +
-				       SGE_UDB_GTS));
-				/*
-				 * This Write memory Barrier will force the
-				 * write to the User Doorbell area to be
-				 * flushed.
-				 */
-				wmb();
-			}
-			q->gts_idx = q->cidx;
-		}
 	}
 
 	/*
@@ -1594,10 +1560,38 @@ static int process_responses(struct sge_rspq *q, int budget,
 int cxgbe_poll(struct sge_rspq *q, struct rte_mbuf **rx_pkts,
 	       unsigned int budget, unsigned int *work_done)
 {
-	int err = 0;
+	struct sge_eth_rxq *rxq = container_of(q, struct sge_eth_rxq, rspq);
+	unsigned int cidx_inc;
+	unsigned int params;
+	u32 val;
 
 	*work_done = process_responses(q, budget, rx_pkts);
-	return err;
+
+	if (*work_done) {
+		cidx_inc = R_IDXDIFF(q, gts_idx);
+
+		if (q->offset >= 0 && fl_cap(&rxq->fl) - rxq->fl.avail >= 64)
+			__refill_fl(q->adapter, &rxq->fl);
+
+		params = q->intr_params;
+		q->next_intr_params = params;
+		val = V_CIDXINC(cidx_inc) | V_SEINTARM(params);
+
+		if (unlikely(!q->bar2_addr)) {
+			t4_write_reg(q->adapter, MYPF_REG(A_SGE_PF_GTS),
+				     val | V_INGRESSQID((u32)q->cntxt_id));
+		} else {
+			writel(val | V_INGRESSQID(q->bar2_qid),
+			       (void *)((uintptr_t)q->bar2_addr + SGE_UDB_GTS));
+			/* This Write memory Barrier will force the
+			 * write to the User Doorbell area to be
+			 * flushed.
+			 */
+			wmb();
+		}
+		q->gts_idx = q->cidx;
+	}
+	return 0;
 }
 
 /**
@@ -1687,18 +1681,20 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 		      V_FW_IQ_CMD_IQASYNCH(fwevtq) |
 		      V_FW_IQ_CMD_VIID(pi->viid) |
 		      V_FW_IQ_CMD_IQANDST(intr_idx < 0) |
-		      V_FW_IQ_CMD_IQANUD(X_UPDATEDELIVERY_INTERRUPT) |
+		      V_FW_IQ_CMD_IQANUD(X_UPDATEDELIVERY_STATUS_PAGE) |
 		      V_FW_IQ_CMD_IQANDSTINDEX(intr_idx >= 0 ? intr_idx :
 							       -intr_idx - 1));
 	c.iqdroprss_to_iqesize =
-		htons(V_FW_IQ_CMD_IQPCIECH(pi->tx_chan) |
+		htons(V_FW_IQ_CMD_IQPCIECH(cong > 0 ? cxgbe_ffs(cong) - 1 :
+						      pi->tx_chan) |
 		      F_FW_IQ_CMD_IQGTSMODE |
 		      V_FW_IQ_CMD_IQINTCNTTHRESH(iq->pktcnt_idx) |
 		      V_FW_IQ_CMD_IQESIZE(ilog2(iq->iqe_len) - 4));
 	c.iqsize = htons(iq->size);
 	c.iqaddr = cpu_to_be64(iq->phys_addr);
 	if (cong >= 0)
-		c.iqns_to_fl0congen = htonl(F_FW_IQ_CMD_IQFLINTCONGEN);
+		c.iqns_to_fl0congen = htonl(F_FW_IQ_CMD_IQFLINTCONGEN |
+					    F_FW_IQ_CMD_IQRO);
 
 	if (fl) {
 		struct sge_eth_rxq *rxq = container_of(fl, struct sge_eth_rxq,
@@ -1773,6 +1769,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 	iq->bar2_addr = bar2_address(adap, iq->cntxt_id, T4_BAR2_QTYPE_INGRESS,
 				     &iq->bar2_qid);
 	iq->size--;                           /* subtract status entry */
+	iq->stat = (void *)&iq->desc[iq->size * 8];
 	iq->eth_dev = eth_dev;
 	iq->handler = hnd;
 	iq->port_id = pi->port_id;
-- 
2.5.3

* [dpdk-dev] [PATCH 2/4] cxgbe: fix rxq default params for ports under same PF
  2017-05-27  3:47 [dpdk-dev] [PATCH 0/4] cxgbe: latency and performance fixes Rahul Lakkireddy
  2017-05-27  3:47 ` [dpdk-dev] [PATCH 1/4] cxgbe: improve latency for slow traffic Rahul Lakkireddy
@ 2017-05-27  3:47 ` Rahul Lakkireddy
  2017-05-27  3:47 ` [dpdk-dev] [PATCH 3/4] cxgbe: remove rmb bottleneck in RX path Rahul Lakkireddy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Rahul Lakkireddy @ 2017-05-27  3:47 UTC (permalink / raw)
  To: dev; +Cc: Nirranjan Kirubaharan, Indranil Choudhury, Kumar Sanghvi

Enabling RX queues with default interrupt parameters doesn't happen
for the other ports under the same PF, because the FULL_INIT_DONE flag
is set by the first port.

Fix this by allowing each port to enable its own RX queues with
default parameters.
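The shape of the fix can be sketched with a toy model (hypothetical names; `arm_port_rxqs` and `demo_port` are illustrative, standing in for `cxgbe_enable_rx_queues` and `port_info`): instead of the first port walking every port's queues once, each port arms only its own slice of the shared queue array.

```c
#include <assert.h>

/* Toy stand-in for port_info: each port owns the qset range
 * [first_qset, first_qset + n_rx_qsets) of the shared array. */
struct demo_port {
	unsigned int first_qset;
	unsigned int n_rx_qsets;
};

/* Arm this port's own RX queues; armed[] stands in for the
 * 0-increment GTS doorbell write that starts the queue's timer. */
static unsigned int arm_port_rxqs(const struct demo_port *pi, int *armed,
				  unsigned int total_qsets)
{
	unsigned int i, count = 0;

	for (i = 0; i < pi->n_rx_qsets; i++) {
		unsigned int q = pi->first_qset + i;

		if (q < total_qsets && !armed[q]) {
			armed[q] = 1;
			count++;
		}
	}
	return count;
}
```

Each port's dev_start can now arm its queues regardless of which port initialized the adapter first.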

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 drivers/net/cxgbe/cxgbe.h        |  3 ++-
 drivers/net/cxgbe/cxgbe_ethdev.c |  2 ++
 drivers/net/cxgbe/cxgbe_main.c   | 33 +++++++++++----------------------
 3 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/drivers/net/cxgbe/cxgbe.h b/drivers/net/cxgbe/cxgbe.h
index 0201c99..9120c43 100644
--- a/drivers/net/cxgbe/cxgbe.h
+++ b/drivers/net/cxgbe/cxgbe.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2014-2015 Chelsio Communications.
+ *   Copyright(c) 2014-2017 Chelsio Communications.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -59,5 +59,6 @@ int setup_sge_fwevtq(struct adapter *adapter);
 void cfg_queues(struct rte_eth_dev *eth_dev);
 int cfg_queue_count(struct rte_eth_dev *eth_dev);
 int setup_rss(struct port_info *pi);
+void cxgbe_enable_rx_queues(struct port_info *pi);
 
 #endif /* _CXGBE_H_ */
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 7282575..ade0b11 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -339,6 +339,8 @@ static int cxgbe_dev_start(struct rte_eth_dev *eth_dev)
 			goto out;
 	}
 
+	cxgbe_enable_rx_queues(pi);
+
 	err = setup_rss(pi);
 	if (err)
 		goto out;
diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c
index 2522354..4d95f5d 100644
--- a/drivers/net/cxgbe/cxgbe_main.c
+++ b/drivers/net/cxgbe/cxgbe_main.c
@@ -984,33 +984,22 @@ int setup_rss(struct port_info *pi)
 /*
  * Enable NAPI scheduling and interrupt generation for all Rx queues.
  */
-static void enable_rx(struct adapter *adap)
+static void enable_rx(struct adapter *adap, struct sge_rspq *q)
 {
-	struct sge *s = &adap->sge;
-	struct sge_rspq *q = &s->fw_evtq;
-	int i, j;
-
 	/* 0-increment GTS to start the timer and enable interrupts */
 	t4_write_reg(adap, MYPF_REG(A_SGE_PF_GTS),
 		     V_SEINTARM(q->intr_params) |
 		     V_INGRESSQID(q->cntxt_id));
+}
 
-	for_each_port(adap, i) {
-		const struct port_info *pi = &adap->port[i];
-		struct rte_eth_dev *eth_dev = pi->eth_dev;
-
-		for (j = 0; j < eth_dev->data->nb_rx_queues; j++) {
-			q = eth_dev->data->rx_queues[j];
-
-			/*
-			 * 0-increment GTS to start the timer and enable
-			 * interrupts
-			 */
-			t4_write_reg(adap, MYPF_REG(A_SGE_PF_GTS),
-				     V_SEINTARM(q->intr_params) |
-				     V_INGRESSQID(q->cntxt_id));
-		}
-	}
+void cxgbe_enable_rx_queues(struct port_info *pi)
+{
+	struct adapter *adap = pi->adapter;
+	struct sge *s = &adap->sge;
+	unsigned int i;
+
+	for (i = 0; i < pi->n_rx_qsets; i++)
+		enable_rx(adap, &s->ethrxq[pi->first_qset + i].rspq);
 }
 
 /**
@@ -1023,7 +1012,7 @@ static void enable_rx(struct adapter *adap)
  */
 int cxgbe_up(struct adapter *adap)
 {
-	enable_rx(adap);
+	enable_rx(adap, &adap->sge.fw_evtq);
 	t4_sge_tx_monitor_start(adap);
 	t4_intr_enable(adap);
 	adap->flags |= FULL_INIT_DONE;
-- 
2.5.3

* [dpdk-dev] [PATCH 3/4] cxgbe: remove rmb bottleneck in RX path
  2017-05-27  3:47 [dpdk-dev] [PATCH 0/4] cxgbe: latency and performance fixes Rahul Lakkireddy
  2017-05-27  3:47 ` [dpdk-dev] [PATCH 1/4] cxgbe: improve latency for slow traffic Rahul Lakkireddy
  2017-05-27  3:47 ` [dpdk-dev] [PATCH 2/4] cxgbe: fix rxq default params for ports under same PF Rahul Lakkireddy
@ 2017-05-27  3:47 ` Rahul Lakkireddy
  2017-05-27  3:48 ` [dpdk-dev] [PATCH 4/4] cxgbe: configure PCIe extended tags Rahul Lakkireddy
  2017-05-30 11:25 ` [dpdk-dev] [PATCH 0/4] cxgbe: latency and performance fixes Ferruh Yigit
  4 siblings, 0 replies; 6+ messages in thread
From: Rahul Lakkireddy @ 2017-05-27  3:47 UTC (permalink / raw)
  To: dev; +Cc: Nirranjan Kirubaharan, Indranil Choudhury, Kumar Sanghvi

The rmb before determining rsp_type is a bottleneck.
Once we determine the rsp_type is FL, we can directly go ahead and
read packets based on q->stat->pidx and budget_left.

This removes the bottleneck of one rmb per RX packet.
Now, the rmb occurs once per RX batch.
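The wrap-aware distance between the consumer index and the producer index published in the status page can be sketched as below (a minimal sketch mirroring the patch's `PIDXDIFF()` idea; one read of pidx, covered by a single rmb, bounds a whole batch of responses instead of re-checking per packet):

```c
#include <assert.h>

/* Distance from consumer index cidx to producer index pidx on a ring
 * of q_size entries, accounting for wrap-around.  Matches the
 * PIDXDIFF(head, tail, wrap) macro introduced by the patch. */
static unsigned int pidx_diff(unsigned int cidx, unsigned int pidx,
			      unsigned int q_size)
{
	return pidx >= cidx ? pidx - cidx : q_size - cidx + pidx;
}
```

The RX loop reads `q->stat->pidx` once, computes this diff, and then consumes up to that many FL responses with a single read barrier for the batch.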

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 drivers/net/cxgbe/sge.c | 161 +++++++++++++++++++++++++++++-------------------
 1 file changed, 96 insertions(+), 65 deletions(-)

diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index d98c3f6..9cbd4ec 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -683,6 +683,10 @@ static void write_sgl(struct rte_mbuf *mbuf, struct sge_txq *q,
 #define Q_IDXDIFF(q, idx) IDXDIFF((q)->pidx, (q)->idx, (q)->size)
 #define R_IDXDIFF(q, idx) IDXDIFF((q)->cidx, (q)->idx, (q)->size)
 
+#define PIDXDIFF(head, tail, wrap) \
+	((tail) >= (head) ? (tail) - (head) : (wrap) - (head) + (tail))
+#define P_IDXDIFF(q, idx) PIDXDIFF((q)->cidx, idx, (q)->size)
+
 /**
  * ring_tx_db - ring a Tx queue's doorbell
  * @adap: the adapter
@@ -1461,74 +1465,101 @@ static int process_responses(struct sge_rspq *q, int budget,
 		rsp_type = G_RSPD_TYPE(rc->u.type_gen);
 
 		if (likely(rsp_type == X_RSPD_TYPE_FLBUF)) {
-			const struct rx_sw_desc *rsd =
-						&rxq->fl.sdesc[rxq->fl.cidx];
-			const struct rss_header *rss_hdr =
-						(const void *)q->cur_desc;
-			const struct cpl_rx_pkt *cpl =
-						(const void *)&q->cur_desc[1];
-			struct rte_mbuf *pkt, *npkt;
-			u32 len, bufsz;
-			bool csum_ok;
-			u16 err_vec;
-
-			len = ntohl(rc->pldbuflen_qid);
-			BUG_ON(!(len & F_RSPD_NEWBUF));
-			pkt = rsd->buf;
-			npkt = pkt;
-			len = G_RSPD_LEN(len);
-			pkt->pkt_len = len;
-
-			/* Compressed error vector is enabled for
-			 * T6 only
-			 */
-			if (q->adapter->params.tp.rx_pkt_encap)
-				err_vec = G_T6_COMPR_RXERR_VEC(
-						ntohs(cpl->err_vec));
-			else
-				err_vec = ntohs(cpl->err_vec);
-			csum_ok = cpl->csum_calc && !err_vec;
-
-			/* Chain mbufs into len if necessary */
-			while (len) {
-				struct rte_mbuf *new_pkt = rsd->buf;
-
-				bufsz = min(get_buf_size(q->adapter, rsd), len);
-				new_pkt->data_len = bufsz;
-				unmap_rx_buf(&rxq->fl);
-				len -= bufsz;
-				npkt->next = new_pkt;
-				npkt = new_pkt;
-				pkt->nb_segs++;
-				rsd = &rxq->fl.sdesc[rxq->fl.cidx];
-			}
-			npkt->next = NULL;
-			pkt->nb_segs--;
-
-			if (cpl->l2info & htonl(F_RXF_IP)) {
-				pkt->packet_type = RTE_PTYPE_L3_IPV4;
-				if (unlikely(!csum_ok))
-					pkt->ol_flags |= PKT_RX_IP_CKSUM_BAD;
-
-				if ((cpl->l2info &
-				     htonl(F_RXF_UDP | F_RXF_TCP)) && !csum_ok)
-					pkt->ol_flags |= PKT_RX_L4_CKSUM_BAD;
-			} else if (cpl->l2info & htonl(F_RXF_IP6)) {
-				pkt->packet_type = RTE_PTYPE_L3_IPV6;
-			}
+			unsigned int stat_pidx;
+			int stat_pidx_diff;
+
+			stat_pidx = ntohs(q->stat->pidx);
+			stat_pidx_diff = P_IDXDIFF(q, stat_pidx);
+			while (stat_pidx_diff && budget_left) {
+				const struct rx_sw_desc *rsd =
+					&rxq->fl.sdesc[rxq->fl.cidx];
+				const struct rss_header *rss_hdr =
+					(const void *)q->cur_desc;
+				const struct cpl_rx_pkt *cpl =
+					(const void *)&q->cur_desc[1];
+				struct rte_mbuf *pkt, *npkt;
+				u32 len, bufsz;
+				bool csum_ok;
+				u16 err_vec;
+
+				rc = (const struct rsp_ctrl *)
+				     ((const char *)q->cur_desc +
+				      (q->iqe_len - sizeof(*rc)));
+
+				rsp_type = G_RSPD_TYPE(rc->u.type_gen);
+				if (unlikely(rsp_type != X_RSPD_TYPE_FLBUF))
+					break;
+
+				len = ntohl(rc->pldbuflen_qid);
+				BUG_ON(!(len & F_RSPD_NEWBUF));
+				pkt = rsd->buf;
+				npkt = pkt;
+				len = G_RSPD_LEN(len);
+				pkt->pkt_len = len;
+
+				/* Compressed error vector is enabled for
+				 * T6 only
+				 */
+				if (q->adapter->params.tp.rx_pkt_encap)
+					err_vec = G_T6_COMPR_RXERR_VEC(
+							ntohs(cpl->err_vec));
+				else
+					err_vec = ntohs(cpl->err_vec);
+				csum_ok = cpl->csum_calc && !err_vec;
+
+				/* Chain mbufs into len if necessary */
+				while (len) {
+					struct rte_mbuf *new_pkt = rsd->buf;
+
+					bufsz = min(get_buf_size(q->adapter,
+								 rsd), len);
+					new_pkt->data_len = bufsz;
+					unmap_rx_buf(&rxq->fl);
+					len -= bufsz;
+					npkt->next = new_pkt;
+					npkt = new_pkt;
+					pkt->nb_segs++;
+					rsd = &rxq->fl.sdesc[rxq->fl.cidx];
+				}
+				npkt->next = NULL;
+				pkt->nb_segs--;
+
+				if (cpl->l2info & htonl(F_RXF_IP)) {
+					pkt->packet_type = RTE_PTYPE_L3_IPV4;
+					if (unlikely(!csum_ok))
+						pkt->ol_flags |=
+							PKT_RX_IP_CKSUM_BAD;
+
+					if ((cpl->l2info &
+					     htonl(F_RXF_UDP | F_RXF_TCP)) &&
+					    !csum_ok)
+						pkt->ol_flags |=
+							PKT_RX_L4_CKSUM_BAD;
+				} else if (cpl->l2info & htonl(F_RXF_IP6)) {
+					pkt->packet_type = RTE_PTYPE_L3_IPV6;
+				}
 
-			if (!rss_hdr->filter_tid && rss_hdr->hash_type) {
-				pkt->ol_flags |= PKT_RX_RSS_HASH;
-				pkt->hash.rss = ntohl(rss_hdr->hash_val);
-			}
+				if (!rss_hdr->filter_tid &&
+				    rss_hdr->hash_type) {
+					pkt->ol_flags |= PKT_RX_RSS_HASH;
+					pkt->hash.rss =
+						ntohl(rss_hdr->hash_val);
+				}
+
+				if (cpl->vlan_ex) {
+					pkt->ol_flags |= PKT_RX_VLAN_PKT;
+					pkt->vlan_tci = ntohs(cpl->vlan);
+				}
+
+				rxq->stats.pkts++;
+				rxq->stats.rx_bytes += pkt->pkt_len;
+				rx_pkts[budget - budget_left] = pkt;
 
-			if (cpl->vlan_ex) {
-				pkt->ol_flags |= PKT_RX_VLAN_PKT;
-				pkt->vlan_tci = ntohs(cpl->vlan);
+				rspq_next(q);
+				budget_left--;
+				stat_pidx_diff--;
 			}
-			rxq->stats.pkts++;
-			rxq->stats.rx_bytes += pkt->pkt_len;
-			rx_pkts[budget - budget_left] = pkt;
+			continue;
 		} else if (likely(rsp_type == X_RSPD_TYPE_CPL)) {
 			ret = q->handler(q, q->cur_desc, NULL);
 		} else {
-- 
2.5.3

* [dpdk-dev] [PATCH 4/4] cxgbe: configure PCIe extended tags
  2017-05-27  3:47 [dpdk-dev] [PATCH 0/4] cxgbe: latency and performance fixes Rahul Lakkireddy
                   ` (2 preceding siblings ...)
  2017-05-27  3:47 ` [dpdk-dev] [PATCH 3/4] cxgbe: remove rmb bottleneck in RX path Rahul Lakkireddy
@ 2017-05-27  3:48 ` Rahul Lakkireddy
  2017-05-30 11:25 ` [dpdk-dev] [PATCH 0/4] cxgbe: latency and performance fixes Ferruh Yigit
  4 siblings, 0 replies; 6+ messages in thread
From: Rahul Lakkireddy @ 2017-05-27  3:48 UTC (permalink / raw)
  To: dev; +Cc: Nirranjan Kirubaharan, Indranil Choudhury, Kumar Sanghvi

Add support to configure the minimum and maximum PCIe extended tags.
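The Device Control update is a read-modify-write that sets the Extended Tag Field Enable bit while leaving the other control bits untouched; a minimal sketch (the helper name `enable_ext_tag` is illustrative, the bit value is the one defined in the patch):

```c
#include <assert.h>
#include <stdint.h>

#define PCI_EXP_DEVCTL_EXT_TAG 0x0100 /* Extended Tag Field Enable */

/* Hypothetical sketch of the read-modify-write on the PCIe Device
 * Control register performed by configure_pcie_ext_tag(): OR in the
 * Extended Tag enable bit, preserving all other settings. */
static uint16_t enable_ext_tag(uint16_t devctl)
{
	return devctl | PCI_EXP_DEVCTL_EXT_TAG;
}
```

The patch then programs the min/max tag counts (MINTAG/TOTMAXTAG) through chip registers, with wider T6 fields handled separately.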

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 drivers/net/cxgbe/base/adapter.h |  1 +
 drivers/net/cxgbe/base/t4_regs.h | 20 ++++++++++++++++++++
 drivers/net/cxgbe/cxgbe_main.c   | 31 +++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/drivers/net/cxgbe/base/adapter.h b/drivers/net/cxgbe/base/adapter.h
index 58c6903..5e5f221 100644
--- a/drivers/net/cxgbe/base/adapter.h
+++ b/drivers/net/cxgbe/base/adapter.h
@@ -462,6 +462,7 @@ static inline void t4_write_reg64(struct adapter *adapter, u32 reg_addr,
 #define PCI_CAP_LIST_NEXT       1       /* Next capability in the list */
 #define PCI_EXP_DEVCTL          0x0008  /* Device control */
 #define PCI_EXP_DEVCTL2         40      /* Device Control 2 */
+#define PCI_EXP_DEVCTL_EXT_TAG  0x0100  /* Extended Tag Field Enable */
 #define PCI_EXP_DEVCTL_PAYLOAD  0x00E0  /* Max payload */
 #define PCI_CAP_ID_VPD          0x03    /* Vital Product Data */
 #define PCI_VPD_ADDR            2       /* Address to access (15 bits!) */
diff --git a/drivers/net/cxgbe/base/t4_regs.h b/drivers/net/cxgbe/base/t4_regs.h
index 289c7e4..1100e16 100644
--- a/drivers/net/cxgbe/base/t4_regs.h
+++ b/drivers/net/cxgbe/base/t4_regs.h
@@ -420,6 +420,26 @@
 #define A_PCIE_FW 0x30b8
 #define A_PCIE_FW_PF 0x30bc
 
+#define A_PCIE_CFG2 0x3018
+
+#define S_TOTMAXTAG    0
+#define M_TOTMAXTAG    0x3U
+#define V_TOTMAXTAG(x) ((x) << S_TOTMAXTAG)
+
+#define S_T6_TOTMAXTAG    0
+#define M_T6_TOTMAXTAG    0x7U
+#define V_T6_TOTMAXTAG(x) ((x) << S_T6_TOTMAXTAG)
+
+#define A_PCIE_CMD_CFG	0x5980
+
+#define S_MINTAG	0
+#define M_MINTAG	0xffU
+#define V_MINTAG(x)	((x) << S_MINTAG)
+
+#define S_T6_MINTAG	0
+#define M_T6_MINTAG	0xffU
+#define V_T6_MINTAG(x)	((x) << S_T6_MINTAG)
+
 /* registers for module CIM */
 #define CIM_BASE_ADDR 0x7b00
 
diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c
index 4d95f5d..ac5b48f 100644
--- a/drivers/net/cxgbe/cxgbe_main.c
+++ b/drivers/net/cxgbe/cxgbe_main.c
@@ -414,6 +414,36 @@ static void print_port_info(struct adapter *adap)
 	}
 }
 
+static void configure_pcie_ext_tag(struct adapter *adapter)
+{
+	u16 v;
+	int pos = t4_os_find_pci_capability(adapter, PCI_CAP_ID_EXP);
+
+	if (!pos)
+		return;
+
+	if (pos > 0) {
+		t4_os_pci_read_cfg2(adapter, pos + PCI_EXP_DEVCTL, &v);
+		v |= PCI_EXP_DEVCTL_EXT_TAG;
+		t4_os_pci_write_cfg2(adapter, pos + PCI_EXP_DEVCTL, v);
+		if (is_t6(adapter->params.chip)) {
+			t4_set_reg_field(adapter, A_PCIE_CFG2,
+					 V_T6_TOTMAXTAG(M_T6_TOTMAXTAG),
+					 V_T6_TOTMAXTAG(7));
+			t4_set_reg_field(adapter, A_PCIE_CMD_CFG,
+					 V_T6_MINTAG(M_T6_MINTAG),
+					 V_T6_MINTAG(8));
+		} else {
+			t4_set_reg_field(adapter, A_PCIE_CFG2,
+					 V_TOTMAXTAG(M_TOTMAXTAG),
+					 V_TOTMAXTAG(3));
+			t4_set_reg_field(adapter, A_PCIE_CMD_CFG,
+					 V_MINTAG(M_MINTAG),
+					 V_MINTAG(8));
+		}
+	}
+}
+
 /*
  * Tweak configuration based on system architecture, etc.  Most of these have
  * defaults assigned to them by Firmware Configuration Files (if we're using
@@ -799,6 +829,7 @@ static int adap_init0(struct adapter *adap)
 	}
 	t4_init_sge_params(adap);
 	t4_init_tp_params(adap);
+	configure_pcie_ext_tag(adap);
 
 	adap->params.drv_memwin = MEMWIN_NIC;
 	adap->flags |= FW_OK;
-- 
2.5.3

* Re: [dpdk-dev] [PATCH 0/4] cxgbe: latency and performance fixes
  2017-05-27  3:47 [dpdk-dev] [PATCH 0/4] cxgbe: latency and performance fixes Rahul Lakkireddy
                   ` (3 preceding siblings ...)
  2017-05-27  3:48 ` [dpdk-dev] [PATCH 4/4] cxgbe: configure PCIe extended tags Rahul Lakkireddy
@ 2017-05-30 11:25 ` Ferruh Yigit
  4 siblings, 0 replies; 6+ messages in thread
From: Ferruh Yigit @ 2017-05-30 11:25 UTC (permalink / raw)
  To: Rahul Lakkireddy, dev
  Cc: Nirranjan Kirubaharan, Indranil Choudhury, Kumar Sanghvi

On 5/27/2017 4:47 AM, Rahul Lakkireddy wrote:
> This series of patches rework TX and RX path to reduce latency
> and improve performance.
> 
> Patch 1 reduces latency for slow traffic by using status page update
> on RX path to process batch of packets and improves coalesce TX path
> to handle slow moving traffic.
> 
> Patch 2 fixes an issue with RXQ default parameters not being applied
> to all ports under same PF.
> 
> Patch 3 fixes rmb bottleneck in RX path.
> 
> Patch 4 adds ability to configure PCIe extended tags.
> 
> This series depend on following series:
> 
> "cxgbe: add support for Chelsio T6 family of adapters"
> 
> Thanks,
> Rahul
> 
> Rahul Lakkireddy (4):
>   cxgbe: improve latency for slow traffic
>   cxgbe: fix rxq default params for ports under same PF
>   cxgbe: remove rmb bottleneck in RX path
>   cxgbe: configure PCIe extended tags

Series applied to dpdk-next-net/master, thanks.
