DPDK patches and discussions
* [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k
@ 2015-09-29 13:03 Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 01/14] fm10k: add new vPMD file Chen Jing D(Mark)
                   ` (13 more replies)
  0 siblings, 14 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

This patch set adds Vector Rx/Tx functions to receive and transmit packets
on fm10k devices. It also adds sanity checks that select the proper
RX/TX function.

Chen Jing D(Mark) (14):
  fm10k: add new vPMD file
  fm10k: add vPMD pre-condition check for each RX queue
  fm10k: Add a new func to initialize all parameters
  fm10k: add func to re-allocate mbuf for RX ring
  fm10k: add 2 functions to parse pkt_type and offload flag
  fm10k: add Vector RX function
  fm10k: add func to do Vector RX condition check
  fm10k: add Vector RX scatter function
  fm10k: add function to decide best RX function
  fm10k: add func to release mbuf in case Vector RX applied
  fm10k: add Vector TX function
  fm10k: use func pointer to reset TX queue and mbuf release
  fm10k: introduce 2 funcs to reset TX queue and mbuf release
  fm10k: Add function to decide best TX func

 drivers/net/fm10k/Makefile         |    1 +
 drivers/net/fm10k/fm10k.h          |   41 ++-
 drivers/net/fm10k/fm10k_ethdev.c   |  149 ++++++-
 drivers/net/fm10k/fm10k_rxtx_vec.c |  851 ++++++++++++++++++++++++++++++++++++
 4 files changed, 1016 insertions(+), 26 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

-- 
1.7.7.6


* [dpdk-dev] [PATCH 01/14] fm10k: add new vPMD file
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 02/14] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add new file fm10k_rxtx_vec.c and add it to the build.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/Makefile         |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   45 ++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

diff --git a/drivers/net/fm10k/Makefile b/drivers/net/fm10k/Makefile
index a4a8f56..06ebf83 100644
--- a/drivers/net/fm10k/Makefile
+++ b/drivers/net/fm10k/Makefile
@@ -93,6 +93,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx_vec.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
new file mode 100644
index 0000000..69174d9
--- /dev/null
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -0,0 +1,45 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <inttypes.h>
+
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#include "fm10k.h"
+#include "base/fm10k_type.h"
+
+#include <tmmintrin.h>
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
-- 
1.7.7.6


* [dpdk-dev] [PATCH 02/14] fm10k: add vPMD pre-condition check for each RX queue
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 01/14] fm10k: add new vPMD file Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 03/14] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add a condition check to the rx_queue_setup function. If the number of
RX descriptors can't satisfy the vPMD requirement, record that in a
variable. Otherwise, call fm10k_rxq_vec_setup to initialize Vector RX.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
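A minimal standalone sketch of the check described above, for readers
who want to try the logic outside the driver. The struct and function
names here (port_info, check_queue, ring_advance) are hypothetical
stand-ins, not fm10k or DPDK symbols, and the power-of-two test is the
usual "exactly one bit set" idiom rather than a copy of
rte_is_power_of_2().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-port driver info. */
struct port_info {
	bool rx_vec_allowed;	/* cleared as soon as one queue fails */
};

/* True when exactly one bit of n is set. */
static bool is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Called once per RX queue. A single failing queue disables Vector RX
 * for the whole port. Returns true when this queue passes the check.
 */
static bool check_queue(struct port_info *info, uint16_t nb_desc)
{
	if (!is_power_of_2(nb_desc)) {
		info->rx_vec_allowed = false;
		return false;
	}
	return true;
}

/* With a power-of-two ring size, index wrap-around reduces to a mask. */
static uint16_t ring_advance(uint16_t idx, uint16_t n, uint16_t nb_desc)
{
	assert(is_power_of_2(nb_desc));
	return (uint16_t)((idx + n) & (nb_desc - 1));
}
```

The mask in ring_advance() is presumably why the requirement exists:
the receive path can then wrap its ring index without a branch.
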
 drivers/net/fm10k/fm10k.h          |   11 ++++++++---
 drivers/net/fm10k/fm10k_ethdev.c   |   11 +++++++++++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   21 +++++++++++++++++++++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c089882..8f9a878 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -135,6 +135,8 @@ struct fm10k_dev_info {
 	/* Protect the mailbox to avoid race condition */
 	rte_spinlock_t    mbx_lock;
 	struct fm10k_macvlan_filter_info    macvlan;
+	/* Flag to indicate if RX vector conditions satisfied */
+	bool rx_vec_allowed;
 };
 
 /*
@@ -165,9 +167,10 @@ struct fm10k_rx_queue {
 	struct rte_mempool *mp;
 	struct rte_mbuf **sw_ring;
 	volatile union fm10k_rx_desc *hw_ring;
-	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
-	struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+	struct rte_mbuf *pkt_first_seg; /* First segment of current packet. */
+	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
+	uint64_t mbuf_initializer; /* value to init mbufs */
 	uint16_t next_dd;
 	uint16_t next_alloc;
 	uint16_t next_trigger;
@@ -177,7 +180,7 @@ struct fm10k_rx_queue {
 	uint16_t queue_id;
 	uint8_t port_id;
 	uint8_t drop_en;
-	uint8_t rx_deferred_start; /**< don't start this queue in dev start. */
+	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
 };
 
 /*
@@ -313,4 +316,6 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
 
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
+
+int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a69c990..3c7784e 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1251,6 +1251,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	const struct rte_eth_rxconf *conf, struct rte_mempool *mp)
 {
 	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
 	struct fm10k_rx_queue *q;
 	const struct rte_memzone *mz;
 
@@ -1333,6 +1334,16 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->hw_ring_phys_addr = mz->phys_addr;
 #endif
 
+	/* Check if number of descs satisfies Vector requirement */
+	if (!rte_is_power_of_2(nb_desc)) {
+		PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
+				    "preconditions - canceling the feature for "
+				    "the whole port[%d]",
+			     q->queue_id, q->port_id);
+		dev_info->rx_vec_allowed = false;
+	} else
+		fm10k_rxq_vec_setup(q);
+
 	dev->data->rx_queues[queue_id] = q;
 	return 0;
 }
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 69174d9..34b677b 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -43,3 +43,24 @@
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
+
+int __attribute__((cold))
+fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	/* data_off will be adjusted after a new mbuf is allocated, for
+	 * 512-byte alignment.
+	 */
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+	return 0;
+}
-- 
1.7.7.6


* [dpdk-dev] [PATCH 03/14] fm10k: Add a new func to initialize all parameters
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 01/14] fm10k: add new vPMD file Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 02/14] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 04/14] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add new function fm10k_params_init to initialize all fm10k related
variables.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_ethdev.c |   34 ++++++++++++++++++++++------------
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 3c7784e..1bc1e7c 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2066,6 +2066,26 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rss_hash_conf_get	= fm10k_rss_hash_conf_get,
 };
 
+static void
+fm10k_params_init(struct rte_eth_dev *dev)
+{
+	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+	/* Initialize bus info. Normally we would call fm10k_get_bus_info(), but
+	 * there is no way to get link status without reading BAR4.  Until this
+	 * works, assume we have maximum bandwidth.
+	 * @todo - fix bus info
+	 */
+	hw->bus_caps.speed = fm10k_bus_speed_8000;
+	hw->bus_caps.width = fm10k_bus_width_pcie_x8;
+	hw->bus_caps.payload = fm10k_bus_payload_512;
+	hw->bus.speed = fm10k_bus_speed_8000;
+	hw->bus.width = fm10k_bus_width_pcie_x8;
+	hw->bus.payload = fm10k_bus_payload_256;
+
+	info->rx_vec_allowed = true;
+}
+
 static int
 eth_fm10k_dev_init(struct rte_eth_dev *dev)
 {
@@ -2112,18 +2132,8 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 		return -EIO;
 	}
 
-	/*
-	 * Inialize bus info. Normally we would call fm10k_get_bus_info(), but
-	 * there is no way to get link status without reading BAR4.  Until this
-	 * works, assume we have maximum bandwidth.
-	 * @todo - fix bus info
-	 */
-	hw->bus_caps.speed = fm10k_bus_speed_8000;
-	hw->bus_caps.width = fm10k_bus_width_pcie_x8;
-	hw->bus_caps.payload = fm10k_bus_payload_512;
-	hw->bus.speed = fm10k_bus_speed_8000;
-	hw->bus.width = fm10k_bus_width_pcie_x8;
-	hw->bus.payload = fm10k_bus_payload_256;
+	/* Initialize parameters */
+	fm10k_params_init(dev);
 
 	/* Initialize the hw */
 	diag = fm10k_init_hw(hw);
-- 
1.7.7.6


* [dpdk-dev] [PATCH 04/14] fm10k: add func to re-allocate mbuf for RX ring
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (2 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 03/14] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 05/14] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add function fm10k_rxq_rearm to re-allocate mbufs for used descriptors
in the RX HW ring.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
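The address arithmetic that the rearm function performs with SSE can be
sketched in scalar form as below. This is an illustration under stated
assumptions, not the driver code: HEADROOM stands in for
RTE_PKTMBUF_HEADROOM (128 is only a common default), DATA_ALIGN stands
in for FM10K_RX_DATABUF_ALIGN (512, per the comments in the patch), and
the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HEADROOM   128u	/* assumed value of RTE_PKTMBUF_HEADROOM */
#define DATA_ALIGN 512u	/* assumed value of FM10K_RX_DATABUF_ALIGN */

/* Round (addr + HEADROOM) up to the next multiple of DATA_ALIGN.
 * ~(DATA_ALIGN - 1) equals UINT64_MAX - DATA_ALIGN + 1, the mask
 * the patch builds for the descriptor address.
 */
static uint64_t rx_aligned_addr(uint64_t buf_addr)
{
	const uint64_t mask = ~((uint64_t)DATA_ALIGN - 1);

	assert((DATA_ALIGN & (DATA_ALIGN - 1)) == 0);
	return (buf_addr + HEADROOM + DATA_ALIGN - 1) & mask;
}

/* Offset from the start of the buffer to the aligned data area. */
static uint16_t rx_data_off(uint64_t buf_addr)
{
	return (uint16_t)(rx_aligned_addr(buf_addr) - buf_addr);
}

/* Tail index written to hardware after a rearm: one slot behind the
 * next position to rearm, wrapping to the end of the ring.
 */
static uint16_t rearm_tail(uint16_t rearm_start, uint16_t nb_desc)
{
	return (uint16_t)(rearm_start == 0 ? nb_desc - 1 : rearm_start - 1);
}
```

For a buffer at 0x1000 this gives an aligned address of 0x1200 and a
data offset of 512; a buffer at 0x1180 is already aligned once the
headroom is added, so its offset stays at 128.
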
 drivers/net/fm10k/fm10k.h          |    9 ++++
 drivers/net/fm10k/fm10k_ethdev.c   |    3 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   90 ++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8f9a878..d924cae 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -123,6 +123,12 @@
 #define FM10K_VFTA_BIT(vlan_id)    (1 << ((vlan_id) & 0x1F))
 #define FM10K_VFTA_IDX(vlan_id)    ((vlan_id) >> 5)
 
+#define RTE_FM10K_RXQ_REARM_THRESH      32
+#define RTE_FM10K_VPMD_TX_BURST         32
+#define RTE_FM10K_MAX_RX_BURST          RTE_FM10K_RXQ_REARM_THRESH
+#define RTE_FM10K_TX_MAX_FREE_BUF_SZ    64
+#define RTE_FM10K_DESCS_PER_LOOP    4
+
 struct fm10k_macvlan_filter_info {
 	uint16_t vlan_num;       /* Total VLAN number */
 	uint16_t mac_num;        /* Total mac number */
@@ -178,6 +184,9 @@ struct fm10k_rx_queue {
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
 	uint16_t queue_id;
+	/* Below 2 fields only valid in case vPMD is applied. */
+	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
+	uint16_t rxrearm_start;  /* the idx we start the re-arming from */
 	uint8_t port_id;
 	uint8_t drop_en;
 	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 1bc1e7c..24f936a 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -121,6 +121,9 @@ rx_queue_reset(struct fm10k_rx_queue *q)
 	q->next_alloc = 0;
 	q->next_trigger = q->alloc_thresh - 1;
 	FM10K_PCI_REG_WRITE(q->tail_ptr, q->nb_desc - 1);
+	q->rxrearm_start = 0;
+	q->rxrearm_nb = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 34b677b..75533f9 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -64,3 +64,93 @@ fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 	rxq->mbuf_initializer = *(uint64_t *)p;
 	return 0;
 }
+
+static inline void
+fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
+{
+	int i;
+	uint16_t rx_id;
+	volatile union fm10k_rx_desc *rxdp;
+	struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
+	struct rte_mbuf *mb0, *mb1;
+	__m128i head_off = _mm_set_epi64x(
+			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1,
+			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1);
+	__m128i dma_addr0, dma_addr1;
+	/* Rx buffers need to be aligned to 512 bytes */
+	const __m128i hba_msk = _mm_set_epi64x(0,
+				UINT64_MAX - FM10K_RX_DATABUF_ALIGN + 1);
+
+	rxdp = rxq->hw_ring + rxq->rxrearm_start;
+
+	/* Pull 'n' more MBUFs into the software ring */
+	if (rte_mempool_get_bulk(rxq->mp,
+				 (void *)mb_alloc,
+				 RTE_FM10K_RXQ_REARM_THRESH) < 0) {
+		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_FM10K_RXQ_REARM_THRESH;
+		return;
+	}
+
+	/* Initialize the mbufs in vector, process 2 mbufs in one loop */
+	for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i += 2, mb_alloc += 2) {
+		__m128i vaddr0, vaddr1;
+		uintptr_t p0, p1;
+
+		mb0 = mb_alloc[0];
+		mb1 = mb_alloc[1];
+
+		/* Flush mbuf with pkt template.
+		 * Data to be rearmed is 6 bytes long.
+		 * Though, RX will overwrite ol_flags that are coming next
+		 * anyway. So overwrite whole 8 bytes with one load:
+		 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
+		 */
+		p0 = (uintptr_t)&mb0->rearm_data;
+		*(uint64_t *)p0 = rxq->mbuf_initializer;
+		p1 = (uintptr_t)&mb1->rearm_data;
+		*(uint64_t *)p1 = rxq->mbuf_initializer;
+
+		/* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
+		vaddr0 = _mm_loadu_si128((__m128i *)&(mb0->buf_addr));
+		vaddr1 = _mm_loadu_si128((__m128i *)&(mb1->buf_addr));
+
+		/* convert pa to dma_addr hdr/data */
+		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
+		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
+
+		/* add headroom to pa values */
+		dma_addr0 = _mm_add_epi64(dma_addr0, head_off);
+		dma_addr1 = _mm_add_epi64(dma_addr1, head_off);
+
+		/* Do 512 byte alignment to satisfy HW requirement and, at
+		 * the same time, set Header Buffer Address to zero.
+		 */
+		dma_addr0 = _mm_and_si128(dma_addr0, hba_msk);
+		dma_addr1 = _mm_and_si128(dma_addr1, hba_msk);
+
+		/* flush desc with pa dma_addr */
+		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr0);
+		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr1);
+
+		/* enforce 512B alignment on default Rx virtual addresses */
+		mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr
+				+ RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
+				- (char *)mb0->buf_addr);
+		mb1->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb1->buf_addr
+				+ RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
+				- (char *)mb1->buf_addr);
+	}
+
+	rxq->rxrearm_start += RTE_FM10K_RXQ_REARM_THRESH;
+	if (rxq->rxrearm_start >= rxq->nb_desc)
+		rxq->rxrearm_start = 0;
+
+	rxq->rxrearm_nb -= RTE_FM10K_RXQ_REARM_THRESH;
+
+	rx_id = (uint16_t) ((rxq->rxrearm_start == 0) ?
+			     (rxq->nb_desc - 1) : (rxq->rxrearm_start - 1));
+
+	/* Update the tail pointer on the NIC */
+	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
+}
-- 
1.7.7.6


* [dpdk-dev] [PATCH 05/14] fm10k: add 2 functions to parse pkt_type and offload flag
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (3 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 04/14] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function Chen Jing D(Mark)
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add 2 functions that use SSE instructions to parse RX descriptors
and fill in pkt_type and ol_flags in the mbuf.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
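The SSE code in this patch uses _mm_shuffle_epi8 as a 16-entry lookup
table: the masked descriptor field is the index and the constant
register holds the values. A scalar model of the ol_flags path is
sketched below. The flag values and the names rss_lut and
desc_to_olflags are assumptions made for illustration; the real
PKT_RX_* constants come from rte_mbuf.h and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, for illustration only. */
#define FLAG_VLAN_PKT 0x0001u
#define FLAG_RSS_HASH 0x0002u
#define VP_SHIFT      2	/* VLAN-present bit position, as in the patch */

/* Index is the 4-bit RSS type, value is the flag to report. Entries
 * follow the rss_flags constant in the patch, lowest byte first.
 */
static const uint8_t rss_lut[16] = {
	0, FLAG_RSS_HASH, FLAG_RSS_HASH, FLAG_RSS_HASH,
	0, FLAG_RSS_HASH, 0, FLAG_RSS_HASH,
	FLAG_RSS_HASH, 0, 0, 0,
	0, 0, 0, 0,
};

/* Scalar equivalent of one 16-bit lane of the vector routine. */
static uint16_t desc_to_olflags(uint16_t rss_type_word, uint16_t status_word)
{
	uint16_t flags;

	/* The shift-and-mask below only works if the VLAN flag is bit 0. */
	assert(FLAG_VLAN_PKT == 1u);

	flags = rss_lut[rss_type_word & 0x000F];
	flags |= (uint16_t)((status_word >> VP_SHIFT) & FLAG_VLAN_PKT);
	return flags;
}
```

The vector version does the same work for four descriptors at once,
which is why the descriptors are first interleaved with the unpack
instructions so that matching fields land in adjacent lanes.
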
 drivers/net/fm10k/fm10k_rxtx_vec.c |  127 ++++++++++++++++++++++++++++++++++++
 1 files changed, 127 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 75533f9..581a309 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,133 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+/* Handling the offload flags (olflags) field takes computation
+ * time when receiving packets. Therefore we provide a flag to disable
+ * the processing of the olflags field when they are not needed. This
+ * gives improved performance, at the cost of losing the offload info
+ * in the received packet
+ */
+#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+
+/* Vlan present flag shift */
+#define VP_SHIFT     (2)
+/* L3 type shift */
+#define L3TYPE_SHIFT     (4)
+/* L4 type shift */
+#define L4TYPE_SHIFT     (7)
+
+static inline void
+fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+	__m128i ptype0, ptype1, vtag0, vtag1;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	const __m128i pkttype_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT,
+			PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT);
+
+	/* mask everything except rss type */
+	const __m128i rsstype_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x000F, 0x000F, 0x000F, 0x000F);
+
+	/* map rss type to rss hash flag */
+	const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
+			0, 0, 0, PKT_RX_RSS_HASH,
+			PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH, 0,
+			PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, 0);
+
+	ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+	ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+	vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+	vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);
+
+	ptype0 = _mm_unpacklo_epi32(ptype0, ptype1);
+	ptype0 = _mm_and_si128(ptype0, rsstype_msk);
+	ptype0 = _mm_shuffle_epi8(rss_flags, ptype0);
+
+	vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+	vtag1 = _mm_srli_epi16(vtag1, VP_SHIFT);
+	vtag1 = _mm_and_si128(vtag1, pkttype_msk);
+
+	vtag1 = _mm_or_si128(ptype0, vtag1);
+	vol.dword = _mm_cvtsi128_si64(vtag1);
+
+	rx_pkts[0]->ol_flags = vol.e[0];
+	rx_pkts[1]->ol_flags = vol.e[1];
+	rx_pkts[2]->ol_flags = vol.e[2];
+	rx_pkts[3]->ol_flags = vol.e[3];
+}
+
+static inline void
+fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+	__m128i l3l4type0, l3l4type1, l3type, l4type;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	/* L3 pkt type mask  Bit4 to Bit6 */
+	const __m128i l3type_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x0070, 0x0070, 0x0070, 0x0070);
+
+	/* L4 pkt type mask  Bit7 to Bit9 */
+	const __m128i l4type_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x0380, 0x0380, 0x0380, 0x0380);
+
+	/* convert RRC l3 type to mbuf format */
+	const __m128i l3type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
+			0, 0, 0, RTE_PTYPE_L3_IPV6_EXT,
+			RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV4_EXT,
+			RTE_PTYPE_L3_IPV4, 0);
+
+	/* Convert RRC l4 type to mbuf format. The l4type_flags values are
+	 * shifted right by 8 bits so that each one fits into 8 bits.
+	 */
+	const __m128i l4type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, 0,
+			RTE_PTYPE_TUNNEL_GENEVE >> 8,
+			RTE_PTYPE_TUNNEL_NVGRE >> 8,
+			RTE_PTYPE_TUNNEL_VXLAN >> 8,
+			RTE_PTYPE_TUNNEL_GRE >> 8,
+			RTE_PTYPE_L4_UDP >> 8,
+			RTE_PTYPE_L4_TCP >> 8,
+			0);
+
+	l3l4type0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+	l3l4type1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+	l3l4type0 = _mm_unpacklo_epi32(l3l4type0, l3l4type1);
+
+	l3type = _mm_and_si128(l3l4type0, l3type_msk);
+	l4type = _mm_and_si128(l3l4type0, l4type_msk);
+
+	l3type = _mm_srli_epi16(l3type, L3TYPE_SHIFT);
+	l4type = _mm_srli_epi16(l4type, L4TYPE_SHIFT);
+
+	l3type = _mm_shuffle_epi8(l3type_flags, l3type);
+	/* l4type_flags were shifted right by 8 bits, shift back after lookup */
+	l4type = _mm_shuffle_epi8(l4type_flags, l4type);
+
+	l4type = _mm_slli_epi16(l4type, 8);
+	l3l4type0 = _mm_or_si128(l3type, l4type);
+	vol.dword = _mm_cvtsi128_si64(l3l4type0);
+
+	rx_pkts[0]->packet_type = vol.e[0];
+	rx_pkts[1]->packet_type = vol.e[1];
+	rx_pkts[2]->packet_type = vol.e[2];
+	rx_pkts[3]->packet_type = vol.e[3];
+}
+#else
+#define fm10k_desc_to_olflags_v(desc, rx_pkts) do {} while (0)
+#define fm10k_desc_to_pktype_v(desc, rx_pkts) do {} while (0)
+#endif
+
 int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
-- 
1.7.7.6


* [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (4 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 05/14] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:14   ` Ananyev, Konstantin
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 07/14] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_recv_raw_pkts_vec to parse raw packets, which may
include chained packets.
Add func fm10k_recv_pkts_vec to receive single-mbuf packets.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
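The receive loop decides how many packets are ready by counting DD
(descriptor done) bits four descriptors at a time. A scalar sketch of
that counting step follows, assuming the DD bit is bit 0 of the status
word as the dd_check mask in the patch suggests. The names count_dd and
count_ready are hypothetical, and __builtin_popcountll is the GCC/Clang
builtin that the patch itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESCS_PER_LOOP 4
#define STATUS_DD      0x1u	/* assumed DD bit, per the dd_check mask */

/* Pack the DD bit of four status words into 16-bit lanes of one
 * 64-bit value, then count the set bits. This mirrors the
 * and / packs / popcount sequence of the vector code.
 */
static unsigned count_dd(const uint32_t staterr[DESCS_PER_LOOP])
{
	uint64_t packed = 0;
	int i;

	for (i = 0; i < DESCS_PER_LOOP; i++)
		packed |= (uint64_t)(staterr[i] & STATUS_DD) << (16 * i);
	return (unsigned)__builtin_popcountll(packed);
}

/* Walk the status words in groups of four and stop at the first group
 * that is not completely done.
 */
static uint16_t count_ready(const uint32_t *staterr, uint16_t nb_pkts)
{
	uint16_t total = 0;
	uint16_t pos;

	assert(staterr != NULL);
	for (pos = 0; pos + DESCS_PER_LOOP <= nb_pkts; pos += DESCS_PER_LOOP) {
		unsigned n = count_dd(&staterr[pos]);

		total = (uint16_t)(total + n);
		if (n != DESCS_PER_LOOP)
			break;
	}
	return total;
}
```

Note that the popcount counts set bits regardless of position, so the
result is only a packet count if hardware completes descriptors in
order.
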
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  213 ++++++++++++++++++++++++++++++++++++
 2 files changed, 214 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index d924cae..285254f 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -327,4 +327,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 581a309..63b34b5 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -281,3 +281,216 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 	/* Update the tail pointer on the NIC */
 	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
+
+/*
+ * vPMD receive routine, now only accept (nb_pkts == RTE_FM10K_MAX_RX_BURST)
+ * in one loop
+ *
+ * Notice:
+ * - nb_pkts < RTE_FM10K_MAX_RX_BURST, just return no packet
+ * - nb_pkts > RTE_FM10K_MAX_RX_BURST, only scan RTE_FM10K_MAX_RX_BURST
+ *   numbers of DD bit
+ * - don't support ol_flags for rss and csum err
+ */
+static inline uint16_t
+fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts, uint8_t *split_packet)
+{
+	volatile union fm10k_rx_desc *rxdp;
+	struct rte_mbuf **mbufp;
+	uint16_t nb_pkts_recd;
+	int pos;
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint64_t var;
+	__m128i shuf_msk;
+	__m128i dd_check, eop_check;
+	uint16_t next_dd;
+
+	next_dd = rxq->next_dd;
+
+	if (unlikely(nb_pkts < RTE_FM10K_MAX_RX_BURST))
+		return 0;
+
+	/* Just the act of getting into the function from the application is
+	 * going to cost about 7 cycles
+	 */
+	rxdp = rxq->hw_ring + next_dd;
+
+	_mm_prefetch((const void *)rxdp, _MM_HINT_T0);
+
+	/* See if we need to rearm the RX queue - gives the prefetch a bit
+	 * of time to act
+	 */
+	if (rxq->rxrearm_nb > RTE_FM10K_RXQ_REARM_THRESH)
+		fm10k_rxq_rearm(rxq);
+
+	/* Before we start moving massive data around, check to see if
+	 * there is actually a packet available
+	 */
+	if (!(rxdp->d.staterr & FM10K_RXD_STATUS_DD))
+		return 0;
+
+	/* 4 packets DD mask */
+	dd_check = _mm_set_epi64x(0x0000000100000001LL, 0x0000000100000001LL);
+
+	/* 4 packets EOP mask */
+	eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
+
+	/* mask to shuffle from desc. to mbuf */
+	shuf_msk = _mm_set_epi8(
+		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
+		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
+		13, 12,      /* octet 12~13, 16 bits data_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
+		13, 12,      /* octet 12~13, low 16 bits pkt_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+		0xFF, 0xFF   /* Skip pkt_type field in shuffle operation */
+		);
+
+	/* Cache is empty -> need to scan the buffer rings, but first move
+	 * the next 'n' mbufs into the cache
+	 */
+	mbufp = &rxq->sw_ring[next_dd];
+
+	/* A. load 4 packet in one loop
+	 * [A*. mask out 4 unused dirty field in desc]
+	 * B. copy 4 mbuf point from swring to rx_pkts
+	 * C. calc the number of DD bits among the 4 packets
+	 * [C*. extract the end-of-packet bit, if requested]
+	 * D. fill info. from desc to mbuf
+	 */
+	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
+			pos += RTE_FM10K_DESCS_PER_LOOP,
+			rxdp += RTE_FM10K_DESCS_PER_LOOP) {
+		__m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
+		__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
+		__m128i zero, staterr, sterr_tmp1, sterr_tmp2;
+		__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */
+
+		if (split_packet) {
+			rte_prefetch0(&rx_pkts[pos]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 1]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 2]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 3]->cacheline1);
+		}
+
+		/* B.1 load 1 mbuf point */
+		mbp1 = _mm_loadu_si128((__m128i *)&mbufp[pos]);
+
+		/* Read desc statuses backwards to avoid race condition */
+		/* A.1 load 4 pkts desc */
+		descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+
+		/* B.2 copy 2 mbuf point into rx_pkts  */
+		_mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
+
+		/* B.1 load 1 mbuf point */
+		mbp2 = _mm_loadu_si128((__m128i *)&mbufp[pos+2]);
+
+		descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		/* B.1 load 2 mbuf point */
+		descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));
+
+		/* B.2 copy 2 mbuf point into rx_pkts  */
+		_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
+
+		/* avoid compiler reorder optimization */
+		rte_compiler_barrier();
+
+		/* D.1 pkt 3,4 convert format from desc to pktmbuf */
+		pkt_mb4 = _mm_shuffle_epi8(descs0[3], shuf_msk);
+		pkt_mb3 = _mm_shuffle_epi8(descs0[2], shuf_msk);
+
+		/* C.1 4=>2 filter staterr info only */
+		sterr_tmp2 = _mm_unpackhi_epi32(descs0[3], descs0[2]);
+		/* C.1 4=>2 filter staterr info only */
+		sterr_tmp1 = _mm_unpackhi_epi32(descs0[1], descs0[0]);
+
+		/* set ol_flags with vlan packet type */
+		fm10k_desc_to_olflags_v(descs0, &rx_pkts[pos]);
+
+		/* D.1 pkt 1,2 convert format from desc to pktmbuf */
+		pkt_mb2 = _mm_shuffle_epi8(descs0[1], shuf_msk);
+		pkt_mb1 = _mm_shuffle_epi8(descs0[0], shuf_msk);
+
+		/* C.2 get 4 pkts staterr value  */
+		zero = _mm_xor_si128(dd_check, dd_check);
+		staterr = _mm_unpacklo_epi32(sterr_tmp1, sterr_tmp2);
+
+		/* D.3 copy final 3,4 data to rx_pkts */
+		_mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
+				pkt_mb4);
+		_mm_storeu_si128((void *)&rx_pkts[pos+2]->rx_descriptor_fields1,
+				pkt_mb3);
+
+		/* C* extract and record EOP bit */
+		if (split_packet) {
+			__m128i eop_shuf_mask = _mm_set_epi8(
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0x04, 0x0C, 0x00, 0x08
+					);
+
+			/* and with mask to extract bits, flipping 1-0 */
+			__m128i eop_bits = _mm_andnot_si128(staterr, eop_check);
+			/* the staterr values are not in order, as the count
+			 * of dd bits doesn't care. However, for end of
+			 * packet tracking, we do care, so shuffle. This also
+			 * compresses the 32-bit values to 8-bit
+			 */
+			eop_bits = _mm_shuffle_epi8(eop_bits, eop_shuf_mask);
+			/* store the resulting 32-bit value */
+			*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
+			split_packet += RTE_FM10K_DESCS_PER_LOOP;
+
+			/* zero-out next pointers */
+			rx_pkts[pos]->next = NULL;
+			rx_pkts[pos + 1]->next = NULL;
+			rx_pkts[pos + 2]->next = NULL;
+			rx_pkts[pos + 3]->next = NULL;
+		}
+
+		/* C.3 calc available number of desc */
+		staterr = _mm_and_si128(staterr, dd_check);
+		staterr = _mm_packs_epi32(staterr, zero);
+
+		/* D.3 copy final 1,2 data to rx_pkts */
+		_mm_storeu_si128((void *)&rx_pkts[pos+1]->rx_descriptor_fields1,
+				pkt_mb2);
+		_mm_storeu_si128((void *)&rx_pkts[pos]->rx_descriptor_fields1,
+				pkt_mb1);
+
+		fm10k_desc_to_pktype_v(descs0, &rx_pkts[pos]);
+
+		/* C.4 calc available number of desc */
+		var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
+		nb_pkts_recd += var;
+		if (likely(var != RTE_FM10K_DESCS_PER_LOOP))
+			break;
+	}
+
+	/* Update our internal tail pointer */
+	rxq->next_dd = (uint16_t)(rxq->next_dd + nb_pkts_recd);
+	rxq->next_dd = (uint16_t)(rxq->next_dd & (rxq->nb_desc - 1));
+	rxq->rxrearm_nb = (uint16_t)(rxq->rxrearm_nb + nb_pkts_recd);
+
+	return nb_pkts_recd;
+}
+
+/* vPMD receive routine, only accepts (nb_pkts >= RTE_FM10K_DESCS_PER_LOOP)
+ *
+ * Notice:
+ * - nb_pkts < RTE_FM10K_DESCS_PER_LOOP, just return no packet
+ * - nb_pkts > RTE_FM10K_MAX_RX_BURST, only scan RTE_FM10K_MAX_RX_BURST
+ *   DD bits
+ * - floor align nb_pkts to RTE_FM10K_DESCS_PER_LOOP (a power of two)
+ * - don't support ol_flags for rss and csum err
+ */
+uint16_t
+fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts)
+{
+	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
+}
-- 
1.7.7.6
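
For readers following steps C.3 and C.4 and the tail update in the patch above, here is a scalar sketch of the same bookkeeping: count how many of the four status words have the DD bit set, then advance the ring index with a power-of-two mask. The names `count_done`, `ring_advance`, and `DD_BIT` are hypothetical, and the DD bit position is an assumption for illustration; the driver does this with SSE masks and a popcount.

```c
#include <assert.h>
#include <stdint.h>

#define DESCS_PER_LOOP 4   /* mirrors RTE_FM10K_DESCS_PER_LOOP */
#define DD_BIT 0x1u        /* assumed position of the "descriptor done" bit */

/* Count how many of the four status words have DD set. */
static unsigned count_done(const uint32_t staterr[DESCS_PER_LOOP])
{
	unsigned i, n = 0;

	for (i = 0; i < DESCS_PER_LOOP; i++)
		n += (staterr[i] & DD_BIT) != 0;
	return n;
}

/* Advance a ring index by n, wrapping on a power-of-two ring size. */
static uint16_t ring_advance(uint16_t idx, uint16_t n, uint16_t nb_desc)
{
	assert(nb_desc != 0 && (nb_desc & (nb_desc - 1)) == 0);
	return (uint16_t)((idx + n) & (nb_desc - 1));
}
```

A count below DESCS_PER_LOOP corresponds to the point where the receive loop breaks out, since the hardware has not finished the remaining descriptors.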


* [dpdk-dev] [PATCH 07/14] fm10k: add func to do Vector RX condition check
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (5 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 08/14] fm10k: add Vector RX scatter function Chen Jing D(Mark)
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_rx_vec_condition_check to check if Vector RX
func can be applied.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 285254f..e109525 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -327,5 +327,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 63b34b5..c9b009e 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -172,6 +172,37 @@ fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 #endif
 
 int __attribute__((cold))
+fm10k_rx_vec_condition_check(struct rte_eth_dev *dev)
+{
+#ifndef RTE_LIBRTE_IEEE1588
+	struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+	struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;
+
+#ifndef RTE_FM10K_RX_OLFLAGS_ENABLE
+	/* without rx ol_flags, no VP flag report */
+	if (rxmode->hw_vlan_extend != 0)
+		return -1;
+#endif
+
+	/* no fdir support */
+	if (fconf->mode != RTE_FDIR_MODE_NONE)
+		return -1;
+
+	/* - no csum error report support
+	 * - no header split support
+	 */
+	if (rxmode->hw_ip_checksum == 1 ||
+	    rxmode->header_split == 1)
+		return -1;
+
+	return 0;
+#else
+	RTE_SET_USED(dev);
+	return -1;
+#endif
+}
+
+int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
 	uintptr_t p;
-- 
1.7.7.6
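
The decision made by fm10k_rx_vec_condition_check in the patch above can be read as a plain predicate over a handful of configuration bits. The sketch below restates it that way. `struct rx_conf_model` and `rx_vec_allowed` are hypothetical names, and the two build-time macros are modelled as runtime booleans purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of what the condition check inspects. */
struct rx_conf_model {
	bool ieee1588;        /* build defines RTE_LIBRTE_IEEE1588 */
	bool olflags_enabled; /* build defines RTE_FM10K_RX_OLFLAGS_ENABLE */
	bool hw_vlan_extend;
	bool fdir_enabled;    /* fdir mode != RTE_FDIR_MODE_NONE */
	bool hw_ip_checksum;
	bool header_split;
};

/* Returns 0 when vector RX may be used, -1 otherwise. */
static int rx_vec_allowed(const struct rx_conf_model *c)
{
	assert(c != NULL);
	if (c->ieee1588)
		return -1;
	if (!c->olflags_enabled && c->hw_vlan_extend)
		return -1;
	if (c->fdir_enabled)
		return -1;
	if (c->hw_ip_checksum || c->header_split)
		return -1;
	return 0;
}
```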


* [dpdk-dev] [PATCH 08/14] fm10k: add Vector RX scatter function
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (6 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 07/14] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 09/14] fm10k: add function to decide best RX function Chen Jing D(Mark)
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_recv_scattered_pkts_vec to receive chained packets
with SSE instructions.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    2 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index e109525..744e8d0 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,4 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
+uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
+					uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index c9b009e..afeca50 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -525,3 +525,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 {
 	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }
+
+static inline uint16_t
+fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
+		struct rte_mbuf **rx_bufs,
+		uint16_t nb_bufs, uint8_t *split_flags)
+{
+	struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished pkts*/
+	struct rte_mbuf *start = rxq->pkt_first_seg;
+	struct rte_mbuf *end =  rxq->pkt_last_seg;
+	unsigned pkt_idx, buf_idx;
+
+
+	for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
+		if (end != NULL) {
+			/* processing a split packet */
+			end->next = rx_bufs[buf_idx];
+			start->nb_segs++;
+			start->pkt_len += rx_bufs[buf_idx]->data_len;
+			end = end->next;
+
+			if (!split_flags[buf_idx]) {
+				/* it's the last segment of the packet */
+				start->hash = end->hash;
+				start->ol_flags = end->ol_flags;
+				pkts[pkt_idx++] = start;
+				start = end = NULL;
+			}
+		} else {
+			/* not processing a split packet */
+			if (!split_flags[buf_idx]) {
+				/* not a split packet, save and skip */
+				pkts[pkt_idx++] = rx_bufs[buf_idx];
+				continue;
+			}
+			end = start = rx_bufs[buf_idx];
+		}
+	}
+
+	/* save the partial packet for next time */
+	rxq->pkt_first_seg = start;
+	rxq->pkt_last_seg = end;
+	memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
+	return pkt_idx;
+}
+
+/*
+ * vPMD receive routine that reassembles scattered packets
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ * - nb_pkts < RTE_FM10K_DESCS_PER_LOOP, just return no packet
+ * - nb_pkts > RTE_FM10K_MAX_RX_BURST, only scan RTE_FM10K_MAX_RX_BURST
+ *   DD bits
+ * - floor align nb_pkts to RTE_FM10K_DESCS_PER_LOOP (a power of two)
+ */
+uint16_t
+fm10k_recv_scattered_pkts_vec(void *rx_queue,
+				struct rte_mbuf **rx_pkts,
+				uint16_t nb_pkts)
+{
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
+	unsigned i = 0;
+
+	/* get some new buffers */
+	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
+			split_flags);
+	if (nb_bufs == 0)
+		return 0;
+
+	/* happy day case, full burst + no packets to be joined */
+	const uint64_t *split_fl64 = (uint64_t *)split_flags;
+	if (rxq->pkt_first_seg == NULL &&
+			split_fl64[0] == 0 && split_fl64[1] == 0 &&
+			split_fl64[2] == 0 && split_fl64[3] == 0)
+		return nb_bufs;
+
+	/* reassemble any packets that need reassembly*/
+	if (rxq->pkt_first_seg == NULL) {
+		/* find the first split flag, and only reassemble then*/
+		while (i < nb_bufs && !split_flags[i])
+			i++;
+		if (i == nb_bufs)
+			return nb_bufs;
+	}
+	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
+		&split_flags[i]);
+}
-- 
1.7.7.6
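
The chaining logic of fm10k_reassemble_packets in the patch above can be tried out in isolation with a small model. In this sketch `struct seg` is a hypothetical stand-in for struct rte_mbuf, `struct reasm_state` stands in for the pkt_first_seg / pkt_last_seg pair kept in the RX queue, and completed packets are compacted in place rather than through a temporary array and memcpy as the patch does. It assumes every buffer arrives with nb_segs == 1, pkt_len == data_len, and next == NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf (only the fields used here). */
struct seg {
	struct seg *next;
	uint32_t pkt_len;
	uint16_t data_len;
	uint8_t nb_segs;
};

/* Partial packet carried over between bursts. */
struct reasm_state {
	struct seg *first;
	struct seg *last;
};

/* split[i] != 0 means "more segments follow buffer i".
 * Completed packets are moved to the front of bufs; returns their count. */
static uint16_t reassemble(struct reasm_state *st, struct seg **bufs,
		uint16_t nb_bufs, const uint8_t *split)
{
	struct seg *start, *end;
	uint16_t i, out = 0;

	assert(st != NULL && bufs != NULL && split != NULL);
	start = st->first;
	end = st->last;

	for (i = 0; i < nb_bufs; i++) {
		struct seg *b = bufs[i];

		if (end != NULL) {
			/* continuing a split packet: append this segment */
			end->next = b;
			start->nb_segs++;
			start->pkt_len += b->data_len;
			end = b;
		} else {
			start = end = b;
		}
		if (!split[i]) {
			/* last segment: hand the finished packet back */
			bufs[out++] = start;
			start = end = NULL;
		}
	}

	/* save the partial packet for the next burst */
	st->first = start;
	st->last = end;
	return out;
}
```

Writing into bufs[out] is safe because out never exceeds the index of the buffer just read.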


* [dpdk-dev] [PATCH 09/14] fm10k: add function to decide best RX function
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (7 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 08/14] fm10k: add Vector RX scatter function Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 10/14] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_set_rx_function to decide the best RX func in
fm10k_dev_rx_init.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    1 +
 drivers/net/fm10k/fm10k_ethdev.c |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 744e8d0..d22debe 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -187,6 +187,7 @@ struct fm10k_rx_queue {
 	/* Below 2 fields only valid in case vPMD is applied. */
 	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
 	uint16_t rxrearm_start;  /* the idx we start the re-arming from */
+	uint16_t rx_using_sse; /* indicates that vector RX is in use */
 	uint8_t port_id;
 	uint8_t drop_en;
 	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 24f936a..53c4ef1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -67,6 +67,7 @@ static void
 fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
+static void fm10k_set_rx_function(struct rte_eth_dev *dev);
 
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -462,7 +463,6 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
 			dev->data->dev_conf.rxmode.enable_scatter) {
 			uint32_t reg;
 			dev->data->scattered_rx = 1;
-			dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
 			reg = FM10K_READ_REG(hw, FM10K_SRRCTL(i));
 			reg |= FM10K_SRRCTL_BUFFER_CHAINING_EN;
 			FM10K_WRITE_REG(hw, FM10K_SRRCTL(i), reg);
@@ -478,6 +478,9 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
 
 	/* Configure RSS if applicable */
 	fm10k_dev_mq_rx_configure(dev);
+
+	/* Decide the best RX function */
+	fm10k_set_rx_function(dev);
 	return 0;
 }
 
@@ -2069,6 +2072,34 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rss_hash_conf_get	= fm10k_rss_hash_conf_get,
 };
 
+static void __attribute__((cold))
+fm10k_set_rx_function(struct rte_eth_dev *dev)
+{
+	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+	uint16_t i, rx_using_sse;
+
+	/* In order to allow Vector Rx there are a few configuration
+	 * conditions to be met.
+	 */
+	if (!fm10k_rx_vec_condition_check(dev) && dev_info->rx_vec_allowed) {
+		if (dev->data->scattered_rx)
+			dev->rx_pkt_burst = fm10k_recv_scattered_pkts_vec;
+		else
+			dev->rx_pkt_burst = fm10k_recv_pkts_vec;
+	} else if (dev->data->scattered_rx)
+		dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+
+	rx_using_sse =
+		(dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
+		dev->rx_pkt_burst == fm10k_recv_pkts_vec);
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];
+		rxq->rx_using_sse = rx_using_sse;
+	}
+
+}
+
 static void
 fm10k_params_init(struct rte_eth_dev *dev)
 {
@@ -2102,9 +2133,6 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
 
-	if (dev->data->scattered_rx)
-		dev->rx_pkt_burst = &fm10k_recv_scattered_pkts;
-
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
-- 
1.7.7.6
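
The selection performed by fm10k_set_rx_function in the patch above reduces to a small truth table. The sketch below models it with a hypothetical enum in place of the real burst function pointers. One simplification is assumed: where the patch leaves dev->rx_pkt_burst untouched (non-vector, non-scattered), the model returns RX_SCALAR, which matches the default assigned in eth_fm10k_dev_init.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels standing in for the four burst functions. */
enum rx_func {
	RX_SCALAR,           /* fm10k_recv_pkts */
	RX_SCALAR_SCATTERED, /* fm10k_recv_scattered_pkts */
	RX_VEC,              /* fm10k_recv_pkts_vec */
	RX_VEC_SCATTERED     /* fm10k_recv_scattered_pkts_vec */
};

/* cond_ok: the vector condition check returned 0.
 * vec_allowed: per-device rx_vec_allowed flag. */
static enum rx_func choose_rx_func(bool cond_ok, bool vec_allowed,
		bool scattered)
{
	if (cond_ok && vec_allowed)
		return scattered ? RX_VEC_SCATTERED : RX_VEC;
	return scattered ? RX_SCALAR_SCATTERED : RX_SCALAR;
}

/* Value the patch stores in every queue's rx_using_sse field. */
static bool rx_uses_sse(enum rx_func f)
{
	assert(f <= RX_VEC_SCATTERED);
	return f == RX_VEC || f == RX_VEC_SCATTERED;
}
```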


* [dpdk-dev] [PATCH 10/14] fm10k: add func to release mbuf in case Vector RX applied
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (8 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 09/14] fm10k: add function to decide best RX function Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 11/14] fm10k: add Vector TX function Chen Jing D(Mark)
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Vector RX uses different variables to track the RX HW ring, so a
different func is needed to release mbufs properly.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_ethdev.c   |    6 ++++++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   18 ++++++++++++++++++
 3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index d22debe..b7f3cc5 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,6 +329,7 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
+void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 53c4ef1..2c3d8be 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -143,6 +143,12 @@ rx_queue_clean(struct fm10k_rx_queue *q)
 	for (i = 0; i < q->nb_desc; ++i)
 		q->hw_ring[i] = zero;
 
+	/* vPMD driver has a different way of releasing mbufs. */
+	if (q->rx_using_sse) {
+		fm10k_rx_queue_release_mbufs_vec(q);
+		return;
+	}
+
 	/* free software buffers */
 	for (i = 0; i < q->nb_desc; ++i) {
 		if (q->sw_ring[i]) {
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index afeca50..d0b0141 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -313,6 +313,24 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
 
+void __attribute__((cold))
+fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq)
+{
+	const unsigned mask = rxq->nb_desc - 1;
+	unsigned i;
+
+	if (rxq->sw_ring == NULL || rxq->rxrearm_nb >= rxq->nb_desc)
+		return;
+
+	/* free all mbufs that are valid in the ring */
+	for (i = rxq->next_dd; i != rxq->rxrearm_start; i = (i + 1) & mask)
+		rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+	rxq->rxrearm_nb = rxq->nb_desc;
+
+	/* set all entries to NULL */
+	memset(rxq->sw_ring, 0, sizeof(rxq->sw_ring[0]) * rxq->nb_desc);
+}
+
 /*
  * vPMD receive routine, now only accept (nb_pkts == RTE_IXGBE_VPMD_RX_BURST)
  * in one loop
-- 
1.7.7.6
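
The release loop in fm10k_rx_queue_release_mbufs_vec above walks the ring from next_dd up to, but not including, rxrearm_start, wrapping with a power-of-two mask. A minimal sketch of that walk follows. `ring_release` is a hypothetical helper, and clearing the slot stands in for the call to rte_pktmbuf_free_seg.

```c
#include <assert.h>
#include <stddef.h>

/* Visit slots [from, to) on a power-of-two ring, wrapping at the end.
 * Returns the number of slots visited. */
static unsigned ring_release(void **ring, unsigned nb_desc,
		unsigned from, unsigned to)
{
	const unsigned mask = nb_desc - 1;
	unsigned i, n = 0;

	assert(nb_desc != 0 && (nb_desc & mask) == 0);
	for (i = from; i != to; i = (i + 1) & mask) {
		ring[i] = NULL; /* the driver frees the mbuf segment here */
		n++;
	}
	return n;
}
```

Note that from == to visits nothing, which is why the patch needs the separate rxrearm_nb check to tell an empty ring from a full one.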


* [dpdk-dev] [PATCH 11/14] fm10k: add Vector TX function
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (9 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 10/14] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 12/14] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add Vector TX func fm10k_xmit_pkts_vec to transmit packets.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    5 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  150 ++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index b7f3cc5..d4b9ed9 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -215,6 +215,9 @@ struct fm10k_tx_queue {
 	uint16_t nb_used;
 	uint16_t free_thresh;
 	uint16_t rs_thresh;
+	/* Below 2 fields only valid in case vPMD is applied. */
+	uint16_t next_rs; /* Next pos to set RS flag */
+	uint16_t next_dd; /* Next pos to check DD flag */
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
 	uint8_t port_id;
@@ -333,4 +336,6 @@ void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
+uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index d0b0141..89fb956 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -631,3 +631,153 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
 		&split_flags[i]);
 }
+
+static inline void
+vtx1(volatile struct fm10k_tx_desc *txdp,
+		struct rte_mbuf *pkt, uint64_t flags)
+{
+	__m128i descriptor = _mm_set_epi64x(flags << 56 |
+			(uint64_t)pkt->vlan_tci << 16 | pkt->data_len,
+			MBUF_DMA_ADDR(pkt));
+	_mm_store_si128((__m128i *)txdp, descriptor);
+}
+
+static inline void
+vtx(volatile struct fm10k_tx_desc *txdp,
+		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
+{
+	int i;
+
+	for (i = 0; i < nb_pkts; ++i, ++txdp, ++pkt)
+		vtx1(txdp, *pkt, flags);
+}
+
+static inline int __attribute__((always_inline))
+fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
+{
+	struct rte_mbuf **txep;
+	uint8_t flags;
+	uint32_t n;
+	uint32_t i;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[RTE_FM10K_TX_MAX_FREE_BUF_SZ];
+
+	/* check DD bit on threshold descriptor */
+	flags = txq->hw_ring[txq->next_dd].flags;
+	if (!(flags & FM10K_TXD_FLAG_DONE))
+		return 0;
+
+	n = txq->rs_thresh;
+
+	/* First buffer to free from S/W ring is at index
+	 * next_dd - (rs_thresh-1)
+	 */
+	txep = &txq->sw_ring[txq->next_dd - (n - 1)];
+	m = __rte_pktmbuf_prefree_seg(txep[0]);
+	if (likely(m != NULL)) {
+		free[0] = m;
+		nb_free = 1;
+		for (i = 1; i < n; i++) {
+			m = __rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool))
+					free[nb_free++] = m;
+				else {
+					rte_mempool_put_bulk(free[0]->pool,
+							(void **)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+		}
+		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	} else {
+		for (i = 1; i < n; i++) {
+			m = __rte_pktmbuf_prefree_seg(txep[i]);
+			if (m != NULL)
+				rte_mempool_put(m->pool, m);
+		}
+	}
+
+	/* buffers were freed, update counters */
+	txq->nb_free = (uint16_t)(txq->nb_free + txq->rs_thresh);
+	txq->next_dd = (uint16_t)(txq->next_dd + txq->rs_thresh);
+	if (txq->next_dd >= txq->nb_desc)
+		txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+
+	return txq->rs_thresh;
+}
+
+static inline void __attribute__((always_inline))
+tx_backlog_entry(struct rte_mbuf **txep,
+		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i;
+
+	for (i = 0; i < (int)nb_pkts; ++i)
+		txep[i] = tx_pkts[i];
+}
+
+uint16_t
+fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+			uint16_t nb_pkts)
+{
+	struct fm10k_tx_queue *txq = (struct fm10k_tx_queue *)tx_queue;
+	volatile struct fm10k_tx_desc *txdp;
+	struct rte_mbuf **txep;
+	uint16_t n, nb_commit, tx_id;
+	uint64_t flags = FM10K_TXD_FLAG_LAST;
+	uint64_t rs = FM10K_TXD_FLAG_RS | FM10K_TXD_FLAG_LAST;
+	int i;
+
+	/* crossing the rs_thresh boundary is not allowed */
+	nb_pkts = RTE_MIN(nb_pkts, txq->rs_thresh);
+
+	if (txq->nb_free < txq->free_thresh)
+		fm10k_tx_free_bufs(txq);
+
+	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_free, nb_pkts);
+	if (unlikely(nb_pkts == 0))
+		return 0;
+
+	tx_id = txq->next_free;
+	txdp = &txq->hw_ring[tx_id];
+	txep = &txq->sw_ring[tx_id];
+
+	txq->nb_free = (uint16_t)(txq->nb_free - nb_pkts);
+
+	n = (uint16_t)(txq->nb_desc - tx_id);
+	if (nb_commit >= n) {
+		tx_backlog_entry(txep, tx_pkts, n);
+
+		for (i = 0; i < n - 1; ++i, ++tx_pkts, ++txdp)
+			vtx1(txdp, *tx_pkts, flags);
+
+		vtx1(txdp, *tx_pkts++, rs);
+
+		nb_commit = (uint16_t)(nb_commit - n);
+
+		tx_id = 0;
+		txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+		/* avoid reaching the end of the ring */
+		txdp = &(txq->hw_ring[tx_id]);
+		txep = &txq->sw_ring[tx_id];
+	}
+
+	tx_backlog_entry(txep, tx_pkts, nb_commit);
+
+	vtx(txdp, tx_pkts, nb_commit, flags);
+
+	tx_id = (uint16_t)(tx_id + nb_commit);
+	if (tx_id > txq->next_rs) {
+		txq->hw_ring[txq->next_rs].flags |= FM10K_TXD_FLAG_RS;
+		txq->next_rs = (uint16_t)(txq->next_rs + txq->rs_thresh);
+	}
+
+	txq->next_free = tx_id;
+
+	FM10K_PCI_REG_WRITE(txq->tail_ptr, txq->next_free);
+
+	return nb_pkts;
+}
-- 
1.7.7.6
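
The upper 64 bits that vtx1 writes in the patch above are built from three fields with shifts, and the burst size is clamped before any descriptor is written. The scalar sketch below mirrors both expressions. `tx_desc_hi` and `tx_clamp` are hypothetical names, and the explicit 64-bit widening of each field is an assumption added here so that a VLAN tag with its top bit set cannot sign-extend into the flag bits.

```c
#include <assert.h>
#include <stdint.h>

/* Pack flags, VLAN tag and data length the way vtx1 composes the
 * high quadword: flags << 56 | vlan_tci << 16 | data_len. */
static uint64_t tx_desc_hi(uint8_t flags, uint16_t vlan_tci,
		uint16_t data_len)
{
	return (uint64_t)flags << 56 |
	       (uint64_t)vlan_tci << 16 |
	       (uint64_t)data_len;
}

/* Clamp a burst: at most rs_thresh packets, at most nb_free descriptors. */
static uint16_t tx_clamp(uint16_t nb_pkts, uint16_t rs_thresh,
		uint16_t nb_free)
{
	assert(rs_thresh != 0);
	if (nb_pkts > rs_thresh)
		nb_pkts = rs_thresh;
	if (nb_pkts > nb_free)
		nb_pkts = nb_free;
	return nb_pkts;
}
```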


* [dpdk-dev] [PATCH 12/14] fm10k: use func pointer to reset TX queue and mbuf release
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (10 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 11/14] fm10k: add Vector TX function Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 13/14] fm10k: introduce 2 funcs " Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 14/14] fm10k: Add function to decide best TX func Chen Jing D(Mark)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Vector TX manages the TX queue differently, so it needs different
functions to reset the TX queue and to release the mbufs in it.
Introduce 2 function pointers for these ops.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    9 +++++++++
 drivers/net/fm10k/fm10k_ethdev.c |   21 ++++++++++++++++-----
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index d4b9ed9..4e737c1 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -204,11 +204,14 @@ struct fifo {
 	uint16_t *endp;
 };
 
+struct fm10k_txq_ops;
+
 struct fm10k_tx_queue {
 	struct rte_mbuf **sw_ring;
 	struct fm10k_tx_desc *hw_ring;
 	uint64_t hw_ring_phys_addr;
 	struct fifo rs_tracker;
+	const struct fm10k_txq_ops *ops; /* txq ops */
 	uint16_t last_free;
 	uint16_t next_free;
 	uint16_t nb_free;
@@ -225,6 +228,11 @@ struct fm10k_tx_queue {
 	uint16_t queue_id;
 };
 
+struct fm10k_txq_ops {
+	void (*release_mbufs)(struct fm10k_tx_queue *txq);
+	void (*reset)(struct fm10k_tx_queue *txq);
+};
+
 #define MBUF_DMA_ADDR(mb) \
 	((uint64_t) ((mb)->buf_physaddr + (mb)->data_off))
 
@@ -338,4 +346,5 @@ uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
 uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
+void fm10k_txq_vec_setup(struct fm10k_tx_queue *txq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 2c3d8be..0a523eb 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -292,6 +292,11 @@ tx_queue_disable(struct fm10k_hw *hw, uint16_t qnum)
 	return 0;
 }
 
+static const struct fm10k_txq_ops def_txq_ops = {
+	.release_mbufs = tx_queue_free,
+	.reset = tx_queue_reset,
+};
+
 static int
 fm10k_dev_configure(struct rte_eth_dev *dev)
 {
@@ -571,7 +576,8 @@ fm10k_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	PMD_INIT_FUNC_TRACE();
 
 	if (tx_queue_id < dev->data->nb_tx_queues) {
-		tx_queue_reset(dev->data->tx_queues[tx_queue_id]);
+		struct fm10k_tx_queue *q = dev->data->tx_queues[tx_queue_id];
+		q->ops->reset(q);
 
 		/* reset head and tail pointers */
 		FM10K_WRITE_REG(hw, FM10K_TDH(tx_queue_id), 0);
@@ -837,8 +843,10 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
 	PMD_INIT_FUNC_TRACE();
 
 	if (dev->data->tx_queues) {
-		for (i = 0; i < dev->data->nb_tx_queues; i++)
-			fm10k_tx_queue_release(dev->data->tx_queues[i]);
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			struct fm10k_tx_queue *txq = dev->data->tx_queues[i];
+			txq->ops->release_mbufs(txq);
+		}
 	}
 
 	if (dev->data->rx_queues) {
@@ -1454,7 +1462,8 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	 * different socket than was previously used.
 	 */
 	if (dev->data->tx_queues[queue_id] != NULL) {
-		tx_queue_free(dev->data->tx_queues[queue_id]);
+		struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];
+		txq->ops->release_mbufs(txq);
 		dev->data->tx_queues[queue_id] = NULL;
 	}
 
@@ -1470,6 +1479,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->nb_desc = nb_desc;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
+	q->ops = &def_txq_ops;
 	q->tail_ptr = (volatile uint32_t *)
 		&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
 	if (handle_txconf(q, conf))
@@ -1528,9 +1538,10 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 static void
 fm10k_tx_queue_release(void *queue)
 {
+	struct fm10k_tx_queue *q = queue;
 	PMD_INIT_FUNC_TRACE();
 
-	tx_queue_free(queue);
+	q->ops->release_mbufs(q);
 }
 
 static int
-- 
1.7.7.6
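
The per-queue ops table introduced in the patch above is an ordinary function-pointer dispatch. The sketch below shows the pattern on a toy queue. All names (`txq_model`, `txq_ops_model`, `def_ops`, `vec_ops`, `txq_reset`) are hypothetical, and the `last_op` field exists only so the dispatch can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct txq_model;

/* Same shape as struct fm10k_txq_ops in the patch. */
struct txq_ops_model {
	void (*release_mbufs)(struct txq_model *q);
	void (*reset)(struct txq_model *q);
};

struct txq_model {
	const struct txq_ops_model *ops;
	int last_op; /* records which implementation ran; illustration only */
};

static void def_release(struct txq_model *q) { q->last_op = 1; }
static void def_reset(struct txq_model *q)   { q->last_op = 2; }
static void vec_release(struct txq_model *q) { q->last_op = 3; }
static void vec_reset(struct txq_model *q)   { q->last_op = 4; }

static const struct txq_ops_model def_ops = {
	.release_mbufs = def_release,
	.reset = def_reset,
};

static const struct txq_ops_model vec_ops = {
	.release_mbufs = vec_release,
	.reset = vec_reset,
};

/* Callers never name an implementation; they go through q->ops. */
static void txq_reset(struct txq_model *q)
{
	assert(q != NULL && q->ops != NULL);
	q->ops->reset(q);
}
```

Switching a queue to the vector path then only requires pointing q->ops at the other table, which is what fm10k_txq_vec_setup is declared for.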


* [dpdk-dev] [PATCH 13/14] fm10k: introduce 2 funcs to reset TX queue and mbuf release
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (11 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 12/14] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 14/14] fm10k: Add function to decide best TX func Chen Jing D(Mark)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add 2 funcs to reset the TX queue and release its mbufs when Vector TX
is applied.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |   68 ++++++++++++++++++++++++++++++++++++
 1 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 89fb956..05db6ef 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,11 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+static void
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
+static void
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
+
 /* Handling the offload flags (olflags) field takes computation
  * time when receiving packets. Therefore we provide a flag to disable
  * the processing of the olflags field when they are not needed. This
@@ -632,6 +637,17 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 		&split_flags[i]);
 }
 
+static const struct fm10k_txq_ops vec_txq_ops = {
+	.release_mbufs = fm10k_tx_queue_release_mbufs_vec,
+	.reset = fm10k_reset_tx_queue,
+};
+
+void __attribute__((cold))
+fm10k_txq_vec_setup(struct fm10k_tx_queue *txq)
+{
+	txq->ops = &vec_txq_ops;
+}
+
 static inline void
 vtx1(volatile struct fm10k_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
@@ -781,3 +797,55 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return nb_pkts;
 }
+
+static void __attribute__((cold))
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
+{
+	unsigned i;
+	const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
+
+	if (txq->sw_ring == NULL || txq->nb_free == max_desc)
+		return;
+
+	/* release the used mbufs in sw_ring */
+	for (i = txq->next_dd - (txq->rs_thresh - 1);
+	     i != txq->next_free;
+	     i = (i + 1) & max_desc)
+		rte_pktmbuf_free_seg(txq->sw_ring[i]);
+
+	txq->nb_free = max_desc;
+
+	/* reset tx_entry */
+	for (i = 0; i < txq->nb_desc; i++)
+		txq->sw_ring[i] = NULL;
+
+	rte_free(txq->sw_ring);
+	txq->sw_ring = NULL;
+}
+
+static void __attribute__((cold))
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
+{
+	static const struct fm10k_tx_desc zeroed_desc = {0};
+	struct rte_mbuf **txe = txq->sw_ring;
+	uint16_t i;
+
+	/* Zero out HW ring memory */
+	for (i = 0; i < txq->nb_desc; i++)
+		txq->hw_ring[i] = zeroed_desc;
+
+	/* Initialize SW ring entries */
+	for (i = 0; i < txq->nb_desc; i++)
+		txe[i] = NULL;
+
+	txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+	txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+	txq->next_free = 0;
+	txq->nb_used = 0;
+	/* Always allow 1 descriptor to be un-allocated to avoid
+	 * a H/W race condition
+	 */
+	txq->nb_free = (uint16_t)(txq->nb_desc - 1);
+	FM10K_PCI_REG_WRITE(txq->tail_ptr, 0);
+}
-- 
1.7.7.6


* [dpdk-dev] [PATCH 14/14] fm10k: Add function to decide best TX func
  2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                   ` (12 preceding siblings ...)
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 13/14] fm10k: introduce 2 funcs " Chen Jing D(Mark)
@ 2015-09-29 13:03 ` Chen Jing D(Mark)
  13 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-09-29 13:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_set_tx_function to decide the best TX func in
fm10k_dev_tx_init.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    1 +
 drivers/net/fm10k/fm10k_ethdev.c |   38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 4e737c1..d059d73 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -222,6 +222,7 @@ struct fm10k_tx_queue {
 	uint16_t next_rs; /* Next pos to set RS flag */
 	uint16_t next_dd; /* Next pos to check DD flag */
 	volatile uint32_t *tail_ptr;
+	uint32_t txq_flags; /* Holds flags for this TXq */
 	uint16_t nb_desc;
 	uint8_t port_id;
 	uint8_t tx_deferred_start; /** < don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 0a523eb..046979d 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -53,6 +53,9 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)
 
+#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+				ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -68,6 +71,7 @@ fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
 static void fm10k_set_rx_function(struct rte_eth_dev *dev);
+static void fm10k_set_tx_function(struct rte_eth_dev *dev);
 
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -414,6 +418,10 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
 				base_addr >> (CHAR_BIT * sizeof(uint32_t)));
 		FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size);
 	}
+
+	/* set up vector or scalar TX function as appropriate */
+	fm10k_set_tx_function(dev);
+
 	return 0;
 }
 
@@ -980,8 +988,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		},
 		.tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
 		.tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
-		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-				ETH_TXQ_FLAGS_NOOFFLOADS,
+		.txq_flags = FM10K_SIMPLE_TX_FLAG,
 	};
 
 }
@@ -1479,6 +1486,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->nb_desc = nb_desc;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
+	q->txq_flags = conf->txq_flags;
 	q->ops = &def_txq_ops;
 	q->tail_ptr = (volatile uint32_t *)
 		&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
@@ -2090,6 +2098,32 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 };
 
 static void __attribute__((cold))
+fm10k_set_tx_function(struct rte_eth_dev *dev)
+{
+	struct fm10k_tx_queue *txq;
+	int i;
+	int use_sse = 1;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		if ((txq->txq_flags & FM10K_SIMPLE_TX_FLAG) != \
+			FM10K_SIMPLE_TX_FLAG) {
+			use_sse = 0;
+			break;
+		}
+	}
+
+	if (use_sse) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			fm10k_txq_vec_setup(txq);
+		}
+		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+	} else
+		dev->tx_pkt_burst = fm10k_xmit_pkts;
+}
+
+static void __attribute__((cold))
 fm10k_set_rx_function(struct rte_eth_dev *dev)
 {
 	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
-- 
1.7.7.6
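
The selection rule in fm10k_set_tx_function above can be read as "use the vector TX path only if every TX queue has all of the simple-TX flag bits set". A minimal standalone sketch of that check follows. The sk_ names and the numeric flag values are placeholders chosen for illustration, not the real ETH_TXQ_FLAGS_* definitions from rte_ethdev.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values for illustration only; the real
 * ETH_TXQ_FLAGS_* constants are defined in rte_ethdev.h.
 */
#define SK_TXQ_NOMULTSEGS 0x0001u
#define SK_TXQ_NOOFFLOADS 0x0F00u
#define SK_SIMPLE_TX_FLAG (SK_TXQ_NOMULTSEGS | SK_TXQ_NOOFFLOADS)

/* Hypothetical stand-in for struct fm10k_tx_queue */
struct sk_txq {
	uint32_t txq_flags;
};

/* Return 1 if every queue has all simple-TX bits set, 0 otherwise.
 * A single queue missing any bit rules out the vector path for the port.
 */
static int
sk_can_use_vector_tx(const struct sk_txq *queues, size_t nb_queues)
{
	size_t i;

	for (i = 0; i < nb_queues; i++) {
		if ((queues[i].txq_flags & SK_SIMPLE_TX_FLAG) !=
				SK_SIMPLE_TX_FLAG)
			return 0;
	}
	return 1;
}
```

Note that the comparison is against the full mask rather than "non-zero", so a queue that sets only one of the two flags still forces the scalar path.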


* Re: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function Chen Jing D(Mark)
@ 2015-09-29 13:14   ` Ananyev, Konstantin
  2015-09-29 14:22     ` Bruce Richardson
  2015-09-30 13:18     ` Chen, Jing D
  0 siblings, 2 replies; 109+ messages in thread
From: Ananyev, Konstantin @ 2015-09-29 13:14 UTC (permalink / raw)
  To: Chen, Jing D, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Chen Jing D(Mark)
> Sent: Tuesday, September 29, 2015 2:04 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
> 
> From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> 
> Add func fm10k_recv_raw_pkts_vec to parse raw packets, in which
> includes possible chained packets.
> Add func fm10k_recv_pkts_vec to receive single mbuf packet.
> 
> Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> ---
>  drivers/net/fm10k/fm10k.h          |    1 +
>  drivers/net/fm10k/fm10k_rxtx_vec.c |  213 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 214 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> index d924cae..285254f 100644
> --- a/drivers/net/fm10k/fm10k.h
> +++ b/drivers/net/fm10k/fm10k.h
> @@ -327,4 +327,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	uint16_t nb_pkts);
> 
>  int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
> +uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
>  #endif
> diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
> index 581a309..63b34b5 100644
> --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> @@ -281,3 +281,216 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
>  	/* Update the tail pointer on the NIC */
>  	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
>  }
> +
> +/*
> + * vPMD receive routine, now only accept (nb_pkts == RTE_IXGBE_VPMD_RX_BURST)
> + * in one loop
> + *
> + * Notice:
> + * - nb_pkts < RTE_IXGBE_VPMD_RX_BURST, just return no packet
> + * - nb_pkts > RTE_IXGBE_VPMD_RX_BURST, only scan RTE_IXGBE_VPMD_RX_BURST
> + *   numbers of DD bit
> + * - don't support ol_flags for rss and csum err
> + */
> +static inline uint16_t
> +fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> +		uint16_t nb_pkts, uint8_t *split_packet)
> +{
> +	volatile union fm10k_rx_desc *rxdp;
> +	struct rte_mbuf **mbufp;
> +	uint16_t nb_pkts_recd;
> +	int pos;
> +	struct fm10k_rx_queue *rxq = rx_queue;
> +	uint64_t var;
> +	__m128i shuf_msk;
> +	__m128i dd_check, eop_check;
> +	uint16_t next_dd;
> +
> +	next_dd = rxq->next_dd;
> +
> +	if (unlikely(nb_pkts < RTE_FM10K_MAX_RX_BURST))
> +		return 0;
> +
> +	/* Just the act of getting into the function from the application is
> +	 * going to cost about 7 cycles
> +	 */
> +	rxdp = rxq->hw_ring + next_dd;
> +
> +	_mm_prefetch((const void *)rxdp, _MM_HINT_T0);
> +
> +	/* See if we need to rearm the RX queue - gives the prefetch a bit
> +	 * of time to act
> +	 */
> +	if (rxq->rxrearm_nb > RTE_FM10K_RXQ_REARM_THRESH)
> +		fm10k_rxq_rearm(rxq);
> +
> +	/* Before we start moving massive data around, check to see if
> +	 * there is actually a packet available
> +	 */
> +	if (!(rxdp->d.staterr & FM10K_RXD_STATUS_DD))
> +		return 0;
> +
> +	/* 4 packets DD mask */
> +	dd_check = _mm_set_epi64x(0x0000000100000001LL, 0x0000000100000001LL);
> +
> +	/* 4 packets EOP mask */
> +	eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
> +
> +	/* mask to shuffle from desc. to mbuf */
> +	shuf_msk = _mm_set_epi8(
> +		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
> +		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
> +		13, 12,      /* octet 12~13, 16 bits data_len */
> +		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
> +		13, 12,      /* octet 12~13, low 16 bits pkt_len */
> +		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
> +		0xFF, 0xFF   /* Skip pkt_type field in shuffle operation */
> +		);
> +
> +	/* Cache is empty -> need to scan the buffer rings, but first move
> +	 * the next 'n' mbufs into the cache
> +	 */
> +	mbufp = &rxq->sw_ring[next_dd];
> +
> +	/* A. load 4 packet in one loop
> +	 * [A*. mask out 4 unused dirty field in desc]
> +	 * B. copy 4 mbuf point from swring to rx_pkts
> +	 * C. calc the number of DD bits among the 4 packets
> +	 * [C*. extract the end-of-packet bit, if requested]
> +	 * D. fill info. from desc to mbuf
> +	 */
> +	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
> +			pos += RTE_FM10K_DESCS_PER_LOOP,
> +			rxdp += RTE_FM10K_DESCS_PER_LOOP) {
> +		__m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
> +		__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
> +		__m128i zero, staterr, sterr_tmp1, sterr_tmp2;
> +		__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */
> +
> +		if (split_packet) {
> +			rte_prefetch0(&rx_pkts[pos]->cacheline1);
> +			rte_prefetch0(&rx_pkts[pos + 1]->cacheline1);
> +			rte_prefetch0(&rx_pkts[pos + 2]->cacheline1);
> +			rte_prefetch0(&rx_pkts[pos + 3]->cacheline1);
> +		}


Same thing as with i40e vPMD:
You are prefetching junk addresses here.
Check out Zoltan's patch:
http://dpdk.org/dev/patchwork/patch/7190/
and related conversation:
http://dpdk.org/ml/archives/dev/2015-September/023715.html
I think there is the same issue here.
Konstantin 
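
To illustrate the ordering problem: at the point of the prefetch, rx_pkts[pos..pos+3] has not yet been filled from the software ring, so it still holds whatever the caller left there. One possible shape of a fix, sketched below with hypothetical sk_ names and a simplified mbuf, is to take the prefetch addresses from the software ring, which already holds valid pointers. This is only a sketch of the idea; the change in the referenced patch may differ. __builtin_prefetch assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct rte_mbuf: two 64-byte cache lines */
struct sk_mbuf {
	char cacheline0[64];
	char cacheline1[64];
};

/* Prefetch the second cache line of four mbufs, reading the pointers
 * from the driver's software ring (valid) instead of from the caller's
 * output array (not yet written at this point in the loop).
 */
static void
sk_prefetch_from_sw_ring(struct sk_mbuf **sw_ring, size_t pos)
{
	size_t i;

	for (i = 0; i < 4; i++)
		__builtin_prefetch(sw_ring[pos + i]->cacheline1);
}

/* Copy four mbuf pointers to the caller's array; only after this step
 * do rx_pkts[pos..pos+3] hold meaningful addresses.
 */
static void
sk_copy_mbuf_ptrs(struct sk_mbuf **rx_pkts, struct sk_mbuf **sw_ring,
		size_t pos)
{
	size_t i;

	for (i = 0; i < 4; i++)
		rx_pkts[pos + i] = sw_ring[pos + i];
}
```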

> +
> +		/* B.1 load 1 mbuf point */
> +		mbp1 = _mm_loadu_si128((__m128i *)&mbufp[pos]);
> +
> +		/* Read desc statuses backwards to avoid race condition */
> +		/* A.1 load 4 pkts desc */
> +		descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
> +
> +		/* B.2 copy 2 mbuf point into rx_pkts  */
> +		_mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
> +
> +		/* B.1 load 1 mbuf point */
> +		mbp2 = _mm_loadu_si128((__m128i *)&mbufp[pos+2]);
> +
> +		descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
> +		/* B.1 load 2 mbuf point */
> +		descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
> +		descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));
> +
> +		/* B.2 copy 2 mbuf point into rx_pkts  */
> +		_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
> +
> +		/* avoid compiler reorder optimization */
> +		rte_compiler_barrier();
> +
> +		/* D.1 pkt 3,4 convert format from desc to pktmbuf */
> +		pkt_mb4 = _mm_shuffle_epi8(descs0[3], shuf_msk);
> +		pkt_mb3 = _mm_shuffle_epi8(descs0[2], shuf_msk);
> +
> +		/* C.1 4=>2 filter staterr info only */
> +		sterr_tmp2 = _mm_unpackhi_epi32(descs0[3], descs0[2]);
> +		/* C.1 4=>2 filter staterr info only */
> +		sterr_tmp1 = _mm_unpackhi_epi32(descs0[1], descs0[0]);
> +
> +		/* set ol_flags with vlan packet type */
> +		fm10k_desc_to_olflags_v(descs0, &rx_pkts[pos]);
> +
> +		/* D.1 pkt 1,2 convert format from desc to pktmbuf */
> +		pkt_mb2 = _mm_shuffle_epi8(descs0[1], shuf_msk);
> +		pkt_mb1 = _mm_shuffle_epi8(descs0[0], shuf_msk);
> +
> +		/* C.2 get 4 pkts staterr value  */
> +		zero = _mm_xor_si128(dd_check, dd_check);
> +		staterr = _mm_unpacklo_epi32(sterr_tmp1, sterr_tmp2);
> +
> +		/* D.3 copy final 3,4 data to rx_pkts */
> +		_mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
> +				pkt_mb4);
> +		_mm_storeu_si128((void *)&rx_pkts[pos+2]->rx_descriptor_fields1,
> +				pkt_mb3);
> +
> +		/* C* extract and record EOP bit */
> +		if (split_packet) {
> +			__m128i eop_shuf_mask = _mm_set_epi8(
> +					0xFF, 0xFF, 0xFF, 0xFF,
> +					0xFF, 0xFF, 0xFF, 0xFF,
> +					0xFF, 0xFF, 0xFF, 0xFF,
> +					0x04, 0x0C, 0x00, 0x08
> +					);
> +
> +			/* and with mask to extract bits, flipping 1-0 */
> +			__m128i eop_bits = _mm_andnot_si128(staterr, eop_check);
> +			/* the staterr values are not in order, as the count
> +			 * of dd bits doesn't care. However, for end of
> +			 * packet tracking, we do care, so shuffle. This also
> +			 * compresses the 32-bit values to 8-bit
> +			 */
> +			eop_bits = _mm_shuffle_epi8(eop_bits, eop_shuf_mask);
> +			/* store the resulting 32-bit value */
> +			*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
> +			split_packet += RTE_FM10K_DESCS_PER_LOOP;
> +
> +			/* zero-out next pointers */
> +			rx_pkts[pos]->next = NULL;
> +			rx_pkts[pos + 1]->next = NULL;
> +			rx_pkts[pos + 2]->next = NULL;
> +			rx_pkts[pos + 3]->next = NULL;
> +		}
> +
> +		/* C.3 calc available number of desc */
> +		staterr = _mm_and_si128(staterr, dd_check);
> +		staterr = _mm_packs_epi32(staterr, zero);
> +
> +		/* D.3 copy final 1,2 data to rx_pkts */
> +		_mm_storeu_si128((void *)&rx_pkts[pos+1]->rx_descriptor_fields1,
> +				pkt_mb2);
> +		_mm_storeu_si128((void *)&rx_pkts[pos]->rx_descriptor_fields1,
> +				pkt_mb1);
> +
> +		fm10k_desc_to_pktype_v(descs0, &rx_pkts[pos]);
> +
> +		/* C.4 calc available number of desc */
> +		var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
> +		nb_pkts_recd += var;
> +		if (likely(var != RTE_FM10K_DESCS_PER_LOOP))
> +			break;
> +	}
> +
> +	/* Update our internal tail pointer */
> +	rxq->next_dd = (uint16_t)(rxq->next_dd + nb_pkts_recd);
> +	rxq->next_dd = (uint16_t)(rxq->next_dd & (rxq->nb_desc - 1));
> +	rxq->rxrearm_nb = (uint16_t)(rxq->rxrearm_nb + nb_pkts_recd);
> +
> +	return nb_pkts_recd;
> +}
> +
> +/* vPMD receive routine, only accept(nb_pkts >= RTE_IXGBE_DESCS_PER_LOOP)
> + *
> + * Notice:
> + * - nb_pkts < RTE_IXGBE_DESCS_PER_LOOP, just return no packet
> + * - nb_pkts > RTE_IXGBE_MAX_RX_BURST, only scan RTE_IXGBE_MAX_RX_BURST
> + *   numbers of DD bit
> + * - floor align nb_pkts to a RTE_IXGBE_DESC_PER_LOOP power-of-two
> + * - don't support ol_flags for rss and csum err
> + */
> +uint16_t
> +fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> +		uint16_t nb_pkts)
> +{
> +	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
> +}
> --
> 1.7.7.6
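
For readers following steps C.3 and C.4 in the quoted loop, here is a scalar stand-in for the DD counting, assuming (as the dd_check mask in the patch suggests) that DD is bit 0 of each 32-bit status word. The sk_ names are hypothetical; the loop packs one masked bit per descriptor into 16-bit lanes, mimicking _mm_packs_epi32, and then counts them. __builtin_popcountll assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

#define SK_STATUS_DD 0x1u	/* assumed: DD is bit 0 of staterr */

/* Count how many of four descriptor status words have DD set.
 * Each masked status value is placed in its own 16-bit lane of a
 * 64-bit word, then the set bits are counted in one operation.
 */
static unsigned
sk_count_dd(const uint32_t staterr[4])
{
	uint64_t packed = 0;
	int i;

	for (i = 0; i < 4; i++)
		packed |= (uint64_t)(staterr[i] & SK_STATUS_DD) << (16 * i);

	return (unsigned)__builtin_popcountll(packed);
}
```

The count ignores which lanes are set, so it is only meaningful as "number of packets received" if descriptors complete in ring order, which is what the early break on a partial group of four relies on.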


* Re: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
  2015-09-29 13:14   ` Ananyev, Konstantin
@ 2015-09-29 14:22     ` Bruce Richardson
  2015-09-30 13:23       ` Chen, Jing D
  2015-09-30 13:18     ` Chen, Jing D
  1 sibling, 1 reply; 109+ messages in thread
From: Bruce Richardson @ 2015-09-29 14:22 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Tue, Sep 29, 2015 at 01:14:26PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Chen Jing D(Mark)
> > Sent: Tuesday, September 29, 2015 2:04 PM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
> > 
> > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> > 
> > Add func fm10k_recv_raw_pkts_vec to parse raw packets, in which
> > includes possible chained packets.
> > Add func fm10k_recv_pkts_vec to receive single mbuf packet.
> > 
> > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > ---
> >  drivers/net/fm10k/fm10k.h          |    1 +
> >  drivers/net/fm10k/fm10k_rxtx_vec.c |  213 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 214 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> > index d924cae..285254f 100644
> > --- a/drivers/net/fm10k/fm10k.h
> > +++ b/drivers/net/fm10k/fm10k.h
> > @@ -327,4 +327,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >  	uint16_t nb_pkts);
> > 
> >  int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
> > +uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
> >  #endif
> > diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > index 581a309..63b34b5 100644
> > --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> > +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > @@ -281,3 +281,216 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
> >  	/* Update the tail pointer on the NIC */
> >  	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
> >  }
> > +
> > +/*
> > + * vPMD receive routine, now only accept (nb_pkts == RTE_IXGBE_VPMD_RX_BURST)
> > + * in one loop
> > + *
> > + * Notice:
> > + * - nb_pkts < RTE_IXGBE_VPMD_RX_BURST, just return no packet

Why this limitation? I believe this limitation has already been removed for
ixgbe, so the same solution should be applicable here

/Bruce
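
A sketch of what relaxing the limitation could look like: instead of returning 0 whenever nb_pkts is below the full burst size, clamp the request to the maximum burst and round it down to a multiple of the per-loop descriptor count. The constants and the sk_ name below are assumptions for illustration; the exact change made in ixgbe is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define SK_DESCS_PER_LOOP 4u	/* assumed: descriptors handled per loop */
#define SK_MAX_RX_BURST 32u	/* assumed: upper bound on one burst */

/* Clamp nb_pkts to the maximum burst, then floor-align it to a
 * multiple of SK_DESCS_PER_LOOP (which must be a power of two).
 * Requests smaller than one loop iteration yield 0.
 */
static uint16_t
sk_usable_burst(uint16_t nb_pkts)
{
	if (nb_pkts > SK_MAX_RX_BURST)
		nb_pkts = SK_MAX_RX_BURST;

	return (uint16_t)(nb_pkts & ~(SK_DESCS_PER_LOOP - 1u));
}
```

With these values a request for 7 packets scans 4 descriptors rather than being rejected outright.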


* Re: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
  2015-09-29 13:14   ` Ananyev, Konstantin
  2015-09-29 14:22     ` Bruce Richardson
@ 2015-09-30 13:18     ` Chen, Jing D
  1 sibling, 0 replies; 109+ messages in thread
From: Chen, Jing D @ 2015-09-30 13:18 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, September 29, 2015 9:14 PM
> To: Chen, Jing D; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
> 
> 
> > +	/* A. load 4 packet in one loop
> > +	 * [A*. mask out 4 unused dirty field in desc]
> > +	 * B. copy 4 mbuf point from swring to rx_pkts
> > +	 * C. calc the number of DD bits among the 4 packets
> > +	 * [C*. extract the end-of-packet bit, if requested]
> > +	 * D. fill info. from desc to mbuf
> > +	 */
> > +	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
> > +			pos += RTE_FM10K_DESCS_PER_LOOP,
> > +			rxdp += RTE_FM10K_DESCS_PER_LOOP) {
> > +		__m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
> > +		__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
> > +		__m128i zero, staterr, sterr_tmp1, sterr_tmp2;
> > +		__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg.
> */
> > +
> > +		if (split_packet) {
> > +			rte_prefetch0(&rx_pkts[pos]->cacheline1);
> > +			rte_prefetch0(&rx_pkts[pos + 1]->cacheline1);
> > +			rte_prefetch0(&rx_pkts[pos + 2]->cacheline1);
> > +			rte_prefetch0(&rx_pkts[pos + 3]->cacheline1);
> > +		}
> 
> 
> Same thing as with i40e vPMD:
> You are prefetching junk addresses here.
> Check out Zoltan's patch:
> http://dpdk.org/dev/patchwork/patch/7190/
> and related conversation:
> http://dpdk.org/ml/archives/dev/2015-September/023715.html
> I think there is the same issue here.
> Konstantin
> 

Thanks for the comments, Konstantin!  I'll check the material you referred to.


* Re: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
  2015-09-29 14:22     ` Bruce Richardson
@ 2015-09-30 13:23       ` Chen, Jing D
  0 siblings, 0 replies; 109+ messages in thread
From: Chen, Jing D @ 2015-09-30 13:23 UTC (permalink / raw)
  To: Richardson, Bruce, Ananyev, Konstantin; +Cc: dev

Hi, Bruce,

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Tuesday, September 29, 2015 10:23 PM
> To: Ananyev, Konstantin
> Cc: Chen, Jing D; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
> 
> On Tue, Sep 29, 2015 at 01:14:26PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Chen Jing
> > > D(Mark)
> > > Sent: Tuesday, September 29, 2015 2:04 PM
> > > To: dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
> > >
> > > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> > >
> > > Add func fm10k_recv_raw_pkts_vec to parse raw packets, in which
> > > includes possible chained packets.
> > > Add func fm10k_recv_pkts_vec to receive single mbuf packet.
> > >
> > > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > > ---
> > >  drivers/net/fm10k/fm10k.h          |    1 +
> > >  drivers/net/fm10k/fm10k_rxtx_vec.c |  213
> > > ++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 214 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> > > index d924cae..285254f 100644
> > > --- a/drivers/net/fm10k/fm10k.h
> > > +++ b/drivers/net/fm10k/fm10k.h
> > > @@ -327,4 +327,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct
> rte_mbuf **tx_pkts,
> > >  	uint16_t nb_pkts);
> > >
> > >  int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
> > > +uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
> > >  #endif
> > > diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > index 581a309..63b34b5 100644
> > > --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > @@ -281,3 +281,216 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
> > >  	/* Update the tail pointer on the NIC */
> > >  	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);  }
> > > +
> > > +/*
> > > + * vPMD receive routine, now only accept (nb_pkts ==
> > > +RTE_IXGBE_VPMD_RX_BURST)
> > > + * in one loop
> > > + *
> > > + * Notice:
> > > + * - nb_pkts < RTE_IXGBE_VPMD_RX_BURST, just return no packet
> 
> Why this limitation? I believe this limitation has already been removed for
> ixgbe, so the same solution should be applicable here
> 
> /Bruce

Thanks! I'll change it accordingly.  


* [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k
  2015-09-29 13:03 ` [dpdk-dev] [PATCH 01/14] fm10k: add new vPMD file Chen Jing D(Mark)
@ 2015-10-22  9:44   ` Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
                       ` (15 more replies)
  0 siblings, 16 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

v2:
 - Fix a typo issue
 - Fix an improper prefetch in the vector RX function, which prefetched an
   un-initialized mbuf.
 - Remove limitation on number of desc pointer in vector RX function.
 - Re-organize some comments.
 - Add a new patch to fix a crash issue in vector RX func.
 - Add a new patch to update release notes.

v1:
This patch set includes Vector Rx/Tx functions to receive/transmit packets
for fm10k devices. It also contains logic to do sanity check for proper
RX/TX function selections.

Chen Jing D(Mark) (16):
  fm10k: add new vPMD file
  fm10k: add vPMD pre-condition check for each RX queue
  fm10k: Add a new func to initialize all parameters
  fm10k: add func to re-allocate mbuf for RX ring
  fm10k: add 2 functions to parse pkt_type and offload flag
  fm10k: add Vector RX function
  fm10k: add func to do Vector RX condition check
  fm10k: add Vector RX scatter function
  fm10k: add function to decide best RX function
  fm10k: add func to release mbuf in case Vector RX applied
  fm10k: add Vector TX function
  fm10k: use func pointer to reset TX queue and mbuf release
  fm10k: introduce 2 funcs to reset TX queue and mbuf release
  fm10k: Add function to decide best TX func
  fm10k: fix a crash issue in vector RX func
  doc: release notes update for fm10k Vector PMD

 doc/guides/rel_notes/release_2_2.rst |    5 +
 drivers/net/fm10k/Makefile           |    1 +
 drivers/net/fm10k/fm10k.h            |   45 ++-
 drivers/net/fm10k/fm10k_ethdev.c     |  168 ++++++-
 drivers/net/fm10k/fm10k_rxtx_vec.c   |  834 ++++++++++++++++++++++++++++++++++
 5 files changed, 1025 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

-- 
1.7.7.6


* [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-22 15:58       ` Stephen Hemminger
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 02/16] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
                       ` (14 subsequent siblings)
  15 siblings, 2 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add new file fm10k_rxtx_vec.c and add it to the build.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/Makefile         |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   45 ++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

diff --git a/drivers/net/fm10k/Makefile b/drivers/net/fm10k/Makefile
index a4a8f56..06ebf83 100644
--- a/drivers/net/fm10k/Makefile
+++ b/drivers/net/fm10k/Makefile
@@ -93,6 +93,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx_vec.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
new file mode 100644
index 0000000..69174d9
--- /dev/null
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -0,0 +1,45 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <inttypes.h>
+
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#include "fm10k.h"
+#include "base/fm10k_type.h"
+
+#include <tmmintrin.h>
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
-- 
1.7.7.6


* [dpdk-dev] [PATCH v2 02/16] fm10k: add vPMD pre-condition check for each RX queue
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 03/16] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
                       ` (13 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add a condition check in the rx_queue_setup function. If the number of
RX descriptors does not satisfy the vPMD requirement, record that in a
flag; otherwise call fm10k_rxq_vec_setup to initialize Vector RX.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |   11 ++++++++---
 drivers/net/fm10k/fm10k_ethdev.c   |   11 +++++++++++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   21 +++++++++++++++++++++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c089882..362a2d0 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -135,6 +135,8 @@ struct fm10k_dev_info {
 	/* Protect the mailbox to avoid race condition */
 	rte_spinlock_t    mbx_lock;
 	struct fm10k_macvlan_filter_info    macvlan;
+	/* Flag to indicate if RX vector conditions satisfied */
+	bool rx_vec_allowed;
 };
 
 /*
@@ -165,9 +167,10 @@ struct fm10k_rx_queue {
 	struct rte_mempool *mp;
 	struct rte_mbuf **sw_ring;
 	volatile union fm10k_rx_desc *hw_ring;
-	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
-	struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+	struct rte_mbuf *pkt_first_seg; /* First segment of current packet. */
+	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
+	uint64_t mbuf_initializer; /* value to init mbufs */
 	uint16_t next_dd;
 	uint16_t next_alloc;
 	uint16_t next_trigger;
@@ -177,7 +180,7 @@ struct fm10k_rx_queue {
 	uint16_t queue_id;
 	uint8_t port_id;
 	uint8_t drop_en;
-	uint8_t rx_deferred_start; /**< don't start this queue in dev start. */
+	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
 };
 
 /*
@@ -313,4 +316,6 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
 
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
+
+int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a69c990..3c7784e 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1251,6 +1251,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	const struct rte_eth_rxconf *conf, struct rte_mempool *mp)
 {
 	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
 	struct fm10k_rx_queue *q;
 	const struct rte_memzone *mz;
 
@@ -1333,6 +1334,16 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->hw_ring_phys_addr = mz->phys_addr;
 #endif
 
+	/* Check if number of descs satisfied Vector requirement */
+	if (!rte_is_power_of_2(nb_desc)) {
+		PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
+				    "preconditions - canceling the feature for "
+				    "the whole port[%d]",
+			     q->queue_id, q->port_id);
+		dev_info->rx_vec_allowed = false;
+	} else
+		fm10k_rxq_vec_setup(q);
+
 	dev->data->rx_queues[queue_id] = q;
 	return 0;
 }
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 69174d9..34b677b 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -43,3 +43,24 @@
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
+
+int __attribute__((cold))
+fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	/* data_off will be adjusted after new mbuf allocated for 512-byte
+	 * alignment.
+	 */
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+	return 0;
+}
-- 
1.7.7.6
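
A likely reason for the power-of-two precondition is that the vector RX path wraps its ring index with a bit mask, as in "next_dd & (nb_desc - 1)", which equals a modulo only when nb_desc is a power of two. The sketch below, using hypothetical sk_ helper names, shows the check and the masked wrap side by side.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero values with exactly one bit set are powers of two */
static int
sk_is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Advance a ring index by n entries, wrapping with a mask.
 * Correct only when nb_desc is a power of two.
 */
static uint16_t
sk_ring_advance(uint16_t idx, uint16_t n, uint16_t nb_desc)
{
	return (uint16_t)((idx + n) & (nb_desc - 1));
}
```

For a 512-entry ring, advancing index 510 by 4 wraps to 2 as expected. For a 500-entry ring the mask does not act as a modulo, so the index would not wrap to the right slot, which is consistent with disabling Vector RX for the port in that case.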


* [dpdk-dev] [PATCH v2 03/16] fm10k: Add a new func to initialize all parameters
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 02/16] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-22 15:57       ` Stephen Hemminger
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 04/16] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
                       ` (12 subsequent siblings)
  15 siblings, 1 reply; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add new function fm10k_params_init to initialize all fm10k related
variables.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_ethdev.c |   34 ++++++++++++++++++++++------------
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 3c7784e..1bc1e7c 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2066,6 +2066,26 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rss_hash_conf_get	= fm10k_rss_hash_conf_get,
 };
 
+static void
+fm10k_params_init(struct rte_eth_dev *dev)
+{
+	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+	/* Inialize bus info. Normally we would call fm10k_get_bus_info(), but
+	 * there is no way to get link status without reading BAR4.  Until this
+	 * works, assume we have maximum bandwidth.
+	 * @todo - fix bus info
+	 */
+	hw->bus_caps.speed = fm10k_bus_speed_8000;
+	hw->bus_caps.width = fm10k_bus_width_pcie_x8;
+	hw->bus_caps.payload = fm10k_bus_payload_512;
+	hw->bus.speed = fm10k_bus_speed_8000;
+	hw->bus.width = fm10k_bus_width_pcie_x8;
+	hw->bus.payload = fm10k_bus_payload_256;
+
+	info->rx_vec_allowed = true;
+}
+
 static int
 eth_fm10k_dev_init(struct rte_eth_dev *dev)
 {
@@ -2112,18 +2132,8 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 		return -EIO;
 	}
 
-	/*
-	 * Inialize bus info. Normally we would call fm10k_get_bus_info(), but
-	 * there is no way to get link status without reading BAR4.  Until this
-	 * works, assume we have maximum bandwidth.
-	 * @todo - fix bus info
-	 */
-	hw->bus_caps.speed = fm10k_bus_speed_8000;
-	hw->bus_caps.width = fm10k_bus_width_pcie_x8;
-	hw->bus_caps.payload = fm10k_bus_payload_512;
-	hw->bus.speed = fm10k_bus_speed_8000;
-	hw->bus.width = fm10k_bus_width_pcie_x8;
-	hw->bus.payload = fm10k_bus_payload_256;
+	/* Initialize parameters */
+	fm10k_params_init(dev);
 
 	/* Initialize the hw */
 	diag = fm10k_init_hw(hw);
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 04/16] fm10k: add func to re-allocate mbuf for RX ring
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (2 preceding siblings ...)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 03/16] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 05/16] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
                       ` (11 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add function fm10k_rxq_rearm to re-allocate mbufs for used descriptors
in the RX HW ring.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    9 ++++
 drivers/net/fm10k/fm10k_ethdev.c   |    3 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   90 ++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 0 deletions(-)
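
For readers following the address arithmetic in fm10k_rxq_rearm, here is a
minimal scalar sketch of what each 64-bit SSE lane computes, plus the tail
index written after a rearm pass. HEADROOM and DATABUF_ALIGN are hypothetical
stand-ins for RTE_PKTMBUF_HEADROOM and FM10K_RX_DATABUF_ALIGN, and the helper
names are illustrative only, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real values come from the DPDK build
 * configuration and fm10k.h. */
#define HEADROOM       128u
#define DATABUF_ALIGN  512u

/* Round (addr + HEADROOM) up to the next DATABUF_ALIGN boundary.
 * Scalar equivalent of the add-then-mask applied to the low lane. */
static uint64_t rx_dma_addr(uint64_t buf_physaddr)
{
	assert((DATABUF_ALIGN & (DATABUF_ALIGN - 1)) == 0); /* power of two */
	return (buf_physaddr + HEADROOM + DATABUF_ALIGN - 1) &
	       ~(uint64_t)(DATABUF_ALIGN - 1);
}

/* Offset from the buffer start to the aligned data start, which is the
 * kind of value stored in data_off. */
static uint16_t rx_data_off(uint64_t buf_addr)
{
	return (uint16_t)(rx_dma_addr(buf_addr) - buf_addr);
}

/* Index written to the tail register after a rearm pass: one slot behind
 * the next rearm position, wrapping at the end of the ring. */
static uint16_t rx_tail_after_rearm(uint16_t rearm_start, uint16_t nb_desc)
{
	return (uint16_t)(rearm_start == 0 ? nb_desc - 1 : rearm_start - 1);
}
```

With these assumed values, a buffer at 0x1000 gets its data start moved to
0x1200, so the offset grows from 128 to 512 bytes when the buffer itself is
not already placed to suit the alignment.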

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 362a2d0..5df7960 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -123,6 +123,12 @@
 #define FM10K_VFTA_BIT(vlan_id)    (1 << ((vlan_id) & 0x1F))
 #define FM10K_VFTA_IDX(vlan_id)    ((vlan_id) >> 5)
 
+#define RTE_FM10K_RXQ_REARM_THRESH      32
+#define RTE_FM10K_VPMD_TX_BURST         32
+#define RTE_FM10K_MAX_RX_BURST          RTE_FM10K_RXQ_REARM_THRESH
+#define RTE_FM10K_TX_MAX_FREE_BUF_SZ    64
+#define RTE_FM10K_DESCS_PER_LOOP    4
+
 struct fm10k_macvlan_filter_info {
 	uint16_t vlan_num;       /* Total VLAN number */
 	uint16_t mac_num;        /* Total mac number */
@@ -178,6 +184,9 @@ struct fm10k_rx_queue {
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
 	uint16_t queue_id;
+	/* Below 2 fields only valid in case vPMD is applied. */
+	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
+	uint16_t rxrearm_start;  /* the idx we start the re-arming from */
 	uint8_t port_id;
 	uint8_t drop_en;
 	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 1bc1e7c..24f936a 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -121,6 +121,9 @@ rx_queue_reset(struct fm10k_rx_queue *q)
 	q->next_alloc = 0;
 	q->next_trigger = q->alloc_thresh - 1;
 	FM10K_PCI_REG_WRITE(q->tail_ptr, q->nb_desc - 1);
+	q->rxrearm_start = 0;
+	q->rxrearm_nb = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 34b677b..75533f9 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -64,3 +64,93 @@ fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 	rxq->mbuf_initializer = *(uint64_t *)p;
 	return 0;
 }
+
+static inline void
+fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
+{
+	int i;
+	uint16_t rx_id;
+	volatile union fm10k_rx_desc *rxdp;
+	struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
+	struct rte_mbuf *mb0, *mb1;
+	__m128i head_off = _mm_set_epi64x(
+			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1,
+			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1);
+	__m128i dma_addr0, dma_addr1;
+	/* Rx buffers need to be aligned to 512 bytes */
+	const __m128i hba_msk = _mm_set_epi64x(0,
+				UINT64_MAX - FM10K_RX_DATABUF_ALIGN + 1);
+
+	rxdp = rxq->hw_ring + rxq->rxrearm_start;
+
+	/* Pull 'n' more MBUFs into the software ring */
+	if (rte_mempool_get_bulk(rxq->mp,
+				 (void *)mb_alloc,
+				 RTE_FM10K_RXQ_REARM_THRESH) < 0) {
+		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_FM10K_RXQ_REARM_THRESH;
+		return;
+	}
+
+	/* Initialize the mbufs in vector, process 2 mbufs in one loop */
+	for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i += 2, mb_alloc += 2) {
+		__m128i vaddr0, vaddr1;
+		uintptr_t p0, p1;
+
+		mb0 = mb_alloc[0];
+		mb1 = mb_alloc[1];
+
+		/* Flush mbuf with pkt template.
+		 * Data to be rearmed is 6 bytes long.
+		 * Though, RX will overwrite ol_flags that are coming next
+		 * anyway. So overwrite whole 8 bytes with one load:
+		 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
+		 */
+		p0 = (uintptr_t)&mb0->rearm_data;
+		*(uint64_t *)p0 = rxq->mbuf_initializer;
+		p1 = (uintptr_t)&mb1->rearm_data;
+		*(uint64_t *)p1 = rxq->mbuf_initializer;
+
+		/* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
+		vaddr0 = _mm_loadu_si128((__m128i *)&(mb0->buf_addr));
+		vaddr1 = _mm_loadu_si128((__m128i *)&(mb1->buf_addr));
+
+		/* convert pa to dma_addr hdr/data */
+		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
+		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
+
+		/* add headroom to pa values */
+		dma_addr0 = _mm_add_epi64(dma_addr0, head_off);
+		dma_addr1 = _mm_add_epi64(dma_addr1, head_off);
+
+		/* Do 512 byte alignment to satisfy HW requirement and, at
+		 * the same time, set Header Buffer Address to zero.
+		 */
+		dma_addr0 = _mm_and_si128(dma_addr0, hba_msk);
+		dma_addr1 = _mm_and_si128(dma_addr1, hba_msk);
+
+		/* flush desc with pa dma_addr */
+		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr0);
+		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr1);
+
+		/* enforce 512B alignment on default Rx virtual addresses */
+		mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr
+				+ RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
+				- (char *)mb0->buf_addr);
+		mb1->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb1->buf_addr
+				+ RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
+				- (char *)mb1->buf_addr);
+	}
+
+	rxq->rxrearm_start += RTE_FM10K_RXQ_REARM_THRESH;
+	if (rxq->rxrearm_start >= rxq->nb_desc)
+		rxq->rxrearm_start = 0;
+
+	rxq->rxrearm_nb -= RTE_FM10K_RXQ_REARM_THRESH;
+
+	rx_id = (uint16_t) ((rxq->rxrearm_start == 0) ?
+			     (rxq->nb_desc - 1) : (rxq->rxrearm_start - 1));
+
+	/* Update the tail pointer on the NIC */
+	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 05/16] fm10k: add 2 functions to parse pkt_type and offload flag
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (3 preceding siblings ...)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 04/16] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 06/16] fm10k: add Vector RX function Chen Jing D(Mark)
                       ` (10 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add 2 functions that use SSE instructions to parse RX descriptors
and fill in pkt_type and ol_flags in the mbuf.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |  127 ++++++++++++++++++++++++++++++++++++
 1 files changed, 127 insertions(+), 0 deletions(-)
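
As a reading aid, the byte-shuffle lookup in fm10k_desc_to_pktype_v can be
modelled in scalar C roughly as below. This is a sketch under assumptions:
the flag values are placeholders chosen for illustration (the patch uses the
RTE_PTYPE_* constants from rte_mbuf.h), and desc_to_pkt_type is a
hypothetical name. It handles one descriptor where the SSE code handles four.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag values for illustration only. */
enum {
	L3_IPV4 = 0x10, L3_IPV4_EXT = 0x30, L3_IPV6 = 0x40, L3_IPV6_EXT = 0xC0,
	L4_TCP = 0x0100, L4_UDP = 0x0200,
	TUN_GRE = 0x2000, TUN_VXLAN = 0x3000,
	TUN_NVGRE = 0x4000, TUN_GENEVE = 0x5000
};

/* Indexed by the 3-bit L3 type (descriptor bits 4..6). */
static const uint8_t l3_table[8] = {
	0, L3_IPV4, L3_IPV4_EXT, L3_IPV6, L3_IPV6_EXT, 0, 0, 0
};

/* Indexed by the 3-bit L4 type (descriptor bits 7..9). Entries hold the
 * flag shifted right by 8 so each fits in one byte, as in l4type_flags. */
static const uint8_t l4_table_hi[8] = {
	0, L4_TCP >> 8, L4_UDP >> 8, TUN_GRE >> 8,
	TUN_VXLAN >> 8, TUN_NVGRE >> 8, TUN_GENEVE >> 8, 0
};

static uint16_t desc_to_pkt_type(uint16_t pkt_info)
{
	unsigned l3_idx = (pkt_info & 0x0070u) >> 4;
	unsigned l4_idx = (pkt_info & 0x0380u) >> 7;

	assert(l3_idx < 8 && l4_idx < 8); /* guaranteed by the 3-bit masks */
	/* Shift the L4 byte back up by 8 before merging with the L3 byte. */
	return (uint16_t)(l3_table[l3_idx] | (l4_table_hi[l4_idx] << 8));
}
```

The point of the right shift in the table is only to make every entry fit
into the byte-wide lanes that _mm_shuffle_epi8 operates on.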

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 75533f9..581a309 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,133 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+/* Handling the offload flags (olflags) field takes computation
+ * time when receiving packets. Therefore we provide a flag to disable
+ * the processing of the olflags field when they are not needed. This
+ * gives improved performance, at the cost of losing the offload info
+ * in the received packet
+ */
+#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+
+/* Vlan present flag shift */
+#define VP_SHIFT     (2)
+/* L3 type shift */
+#define L3TYPE_SHIFT     (4)
+/* L4 type shift */
+#define L4TYPE_SHIFT     (7)
+
+static inline void
+fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+	__m128i ptype0, ptype1, vtag0, vtag1;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	const __m128i pkttype_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT,
+			PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT);
+
+	/* mask everything except rss type */
+	const __m128i rsstype_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x000F, 0x000F, 0x000F, 0x000F);
+
+	/* map rss type to rss hash flag */
+	const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
+			0, 0, 0, PKT_RX_RSS_HASH,
+			PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH, 0,
+			PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, 0);
+
+	ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+	ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+	vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+	vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);
+
+	ptype0 = _mm_unpacklo_epi32(ptype0, ptype1);
+	ptype0 = _mm_and_si128(ptype0, rsstype_msk);
+	ptype0 = _mm_shuffle_epi8(rss_flags, ptype0);
+
+	vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+	vtag1 = _mm_srli_epi16(vtag1, VP_SHIFT);
+	vtag1 = _mm_and_si128(vtag1, pkttype_msk);
+
+	vtag1 = _mm_or_si128(ptype0, vtag1);
+	vol.dword = _mm_cvtsi128_si64(vtag1);
+
+	rx_pkts[0]->ol_flags = vol.e[0];
+	rx_pkts[1]->ol_flags = vol.e[1];
+	rx_pkts[2]->ol_flags = vol.e[2];
+	rx_pkts[3]->ol_flags = vol.e[3];
+}
+
+static inline void
+fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+	__m128i l3l4type0, l3l4type1, l3type, l4type;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	/* L3 pkt type mask  Bit4 to Bit6 */
+	const __m128i l3type_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x0070, 0x0070, 0x0070, 0x0070);
+
+	/* L4 pkt type mask  Bit7 to Bit9 */
+	const __m128i l4type_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x0380, 0x0380, 0x0380, 0x0380);
+
+	/* convert RRC l3 type to mbuf format */
+	const __m128i l3type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
+			0, 0, 0, RTE_PTYPE_L3_IPV6_EXT,
+			RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV4_EXT,
+			RTE_PTYPE_L3_IPV4, 0);
+
+	/* Convert RRC l4 type to mbuf format. l4type_flags values are shifted
+	 * right by 8 bits so that each one fits into an 8-bit lane.
+	 */
+	const __m128i l4type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, 0,
+			RTE_PTYPE_TUNNEL_GENEVE >> 8,
+			RTE_PTYPE_TUNNEL_NVGRE >> 8,
+			RTE_PTYPE_TUNNEL_VXLAN >> 8,
+			RTE_PTYPE_TUNNEL_GRE >> 8,
+			RTE_PTYPE_L4_UDP >> 8,
+			RTE_PTYPE_L4_TCP >> 8,
+			0);
+
+	l3l4type0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+	l3l4type1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+	l3l4type0 = _mm_unpacklo_epi32(l3l4type0, l3l4type1);
+
+	l3type = _mm_and_si128(l3l4type0, l3type_msk);
+	l4type = _mm_and_si128(l3l4type0, l4type_msk);
+
+	l3type = _mm_srli_epi16(l3type, L3TYPE_SHIFT);
+	l4type = _mm_srli_epi16(l4type, L4TYPE_SHIFT);
+
+	l3type = _mm_shuffle_epi8(l3type_flags, l3type);
+	/* l4type_flags were shifted right by 8 bits, shift left to restore */
+	l4type = _mm_shuffle_epi8(l4type_flags, l4type);
+
+	l4type = _mm_slli_epi16(l4type, 8);
+	l3l4type0 = _mm_or_si128(l3type, l4type);
+	vol.dword = _mm_cvtsi128_si64(l3l4type0);
+
+	rx_pkts[0]->packet_type = vol.e[0];
+	rx_pkts[1]->packet_type = vol.e[1];
+	rx_pkts[2]->packet_type = vol.e[2];
+	rx_pkts[3]->packet_type = vol.e[3];
+}
+#else
+#define fm10k_desc_to_olflags_v(desc, rx_pkts) do {} while (0)
+#define fm10k_desc_to_pktype_v(desc, rx_pkts) do {} while (0)
+#endif
+
 int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 06/16] fm10k: add Vector RX function
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (4 preceding siblings ...)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 05/16] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-27  5:24       ` Liang, Cunming
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 07/16] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
                       ` (9 subsequent siblings)
  15 siblings, 1 reply; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_recv_raw_pkts_vec to parse raw packets, which may
include chained packets.
Add func fm10k_recv_pkts_vec to receive single-mbuf packets.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  196 ++++++++++++++++++++++++++++++++++++
 2 files changed, 197 insertions(+), 0 deletions(-)
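
The bookkeeping at the end of each loop iteration in
fm10k_recv_raw_pkts_vec can be sketched in scalar form as follows. This is
not the patch code: the helper names are hypothetical, the status bit values
are taken from the dd_check and eop_check masks in the patch, and the sketch
assumes the status array length and nb_pkts are multiples of 4 and that
nb_desc is a power of two.

```c
#include <assert.h>
#include <stdint.h>

#define DESCS_PER_LOOP 4
#define STATUS_DD   0x1u  /* descriptor done, as in the dd_check mask */
#define STATUS_EOP  0x2u  /* end of packet, as in the eop_check mask */

/* Count DD bits in one group of four status words (models the popcount). */
static unsigned group_done(const uint32_t staterr[DESCS_PER_LOOP])
{
	unsigned n = 0;

	for (int i = 0; i < DESCS_PER_LOOP; i++)
		n += staterr[i] & STATUS_DD;
	return n;
}

/* Walk the ring four descriptors at a time and stop after the first
 * group that is not completely done. */
static uint16_t count_received(const uint32_t *staterr, uint16_t nb_pkts)
{
	uint16_t recd = 0;

	for (uint16_t pos = 0; pos < nb_pkts; pos += DESCS_PER_LOOP) {
		unsigned n = group_done(&staterr[pos]);

		recd = (uint16_t)(recd + n);
		if (n != DESCS_PER_LOOP)
			break;
	}
	return recd;
}

/* Advance the software index with a mask instead of a modulo. */
static uint16_t advance_next_dd(uint16_t next_dd, uint16_t recd,
				uint16_t nb_desc)
{
	assert((nb_desc & (nb_desc - 1)) == 0); /* ring size is 2^n */
	return (uint16_t)((next_dd + recd) & (nb_desc - 1));
}

/* Split flag is non-zero when EOP is clear, i.e. more segments follow. */
static uint8_t split_flag(uint32_t staterr)
{
	return (uint8_t)(~staterr & STATUS_EOP);
}
```

Note that the split flag is the inverted EOP bit, which is why the vector
code uses an and-not rather than a plain and.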

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5df7960..f04ba2c 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -327,4 +327,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 581a309..482b76c 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -281,3 +281,199 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 	/* Update the tail pointer on the NIC */
 	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
+
+static inline uint16_t
+fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts, uint8_t *split_packet)
+{
+	volatile union fm10k_rx_desc *rxdp;
+	struct rte_mbuf **mbufp;
+	uint16_t nb_pkts_recd;
+	int pos;
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint64_t var;
+	__m128i shuf_msk;
+	__m128i dd_check, eop_check;
+	uint16_t next_dd;
+
+	next_dd = rxq->next_dd;
+
+	/* Just the act of getting into the function from the application is
+	 * going to cost about 7 cycles
+	 */
+	rxdp = rxq->hw_ring + next_dd;
+
+	_mm_prefetch((const void *)rxdp, _MM_HINT_T0);
+
+	/* See if we need to rearm the RX queue - gives the prefetch a bit
+	 * of time to act
+	 */
+	if (rxq->rxrearm_nb > RTE_FM10K_RXQ_REARM_THRESH)
+		fm10k_rxq_rearm(rxq);
+
+	/* Before we start moving massive data around, check to see if
+	 * there is actually a packet available
+	 */
+	if (!(rxdp->d.staterr & FM10K_RXD_STATUS_DD))
+		return 0;
+
+	/* 4 packets DD mask */
+	dd_check = _mm_set_epi64x(0x0000000100000001LL, 0x0000000100000001LL);
+
+	/* 4 packets EOP mask */
+	eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
+
+	/* mask to shuffle from desc. to mbuf */
+	shuf_msk = _mm_set_epi8(
+		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
+		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
+		13, 12,      /* octet 12~13, 16 bits data_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
+		13, 12,      /* octet 12~13, low 16 bits pkt_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+		0xFF, 0xFF   /* Skip pkt_type field in shuffle operation */
+		);
+
+	/* Cache is empty -> need to scan the buffer rings, but first move
+	 * the next 'n' mbufs into the cache
+	 */
+	mbufp = &rxq->sw_ring[next_dd];
+
+	/* A. load 4 packet descriptors per loop iteration
+	 * [A*. mask out 4 unused dirty fields in desc]
+	 * B. copy 4 mbuf pointers from sw_ring to rx_pkts
+	 * C. calc the number of DD bits among the 4 packets
+	 * [C*. extract the end-of-packet bit, if requested]
+	 * D. fill info from desc into mbuf
+	 */
+	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
+			pos += RTE_FM10K_DESCS_PER_LOOP,
+			rxdp += RTE_FM10K_DESCS_PER_LOOP) {
+		__m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
+		__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
+		__m128i zero, staterr, sterr_tmp1, sterr_tmp2;
+		__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */
+
+		/* B.1 load 1 mbuf point */
+		mbp1 = _mm_loadu_si128((__m128i *)&mbufp[pos]);
+
+		/* Read desc statuses backwards to avoid race condition */
+		/* A.1 load 4 pkts desc */
+		descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+
+		/* B.2 copy 2 mbuf point into rx_pkts  */
+		_mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
+
+		/* B.1 load 1 mbuf point */
+		mbp2 = _mm_loadu_si128((__m128i *)&mbufp[pos+2]);
+
+		descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		/* B.1 load 2 mbuf point */
+		descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));
+
+		/* B.2 copy 2 mbuf point into rx_pkts  */
+		_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
+
+		/* avoid compiler reorder optimization */
+		rte_compiler_barrier();
+
+		if (split_packet) {
+			rte_prefetch0(&rx_pkts[pos]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 1]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 2]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 3]->cacheline1);
+		}
+
+		/* D.1 pkt 3,4 convert format from desc to pktmbuf */
+		pkt_mb4 = _mm_shuffle_epi8(descs0[3], shuf_msk);
+		pkt_mb3 = _mm_shuffle_epi8(descs0[2], shuf_msk);
+
+		/* C.1 4=>2 filter staterr info only */
+		sterr_tmp2 = _mm_unpackhi_epi32(descs0[3], descs0[2]);
+		/* C.1 4=>2 filter staterr info only */
+		sterr_tmp1 = _mm_unpackhi_epi32(descs0[1], descs0[0]);
+
+		/* set ol_flags with vlan packet type */
+		fm10k_desc_to_olflags_v(descs0, &rx_pkts[pos]);
+
+		/* D.1 pkt 1,2 convert format from desc to pktmbuf */
+		pkt_mb2 = _mm_shuffle_epi8(descs0[1], shuf_msk);
+		pkt_mb1 = _mm_shuffle_epi8(descs0[0], shuf_msk);
+
+		/* C.2 get 4 pkts staterr value  */
+		zero = _mm_xor_si128(dd_check, dd_check);
+		staterr = _mm_unpacklo_epi32(sterr_tmp1, sterr_tmp2);
+
+		/* D.3 copy final 3,4 data to rx_pkts */
+		_mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
+				pkt_mb4);
+		_mm_storeu_si128((void *)&rx_pkts[pos+2]->rx_descriptor_fields1,
+				pkt_mb3);
+
+		/* C* extract and record EOP bit */
+		if (split_packet) {
+			__m128i eop_shuf_mask = _mm_set_epi8(
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0x04, 0x0C, 0x00, 0x08
+					);
+
+			/* and with mask to extract bits, flipping 1-0 */
+			__m128i eop_bits = _mm_andnot_si128(staterr, eop_check);
+			/* the staterr values are not in order, as the count
+			 * of dd bits doesn't care. However, for end of
+			 * packet tracking, we do care, so shuffle. This also
+			 * compresses the 32-bit values to 8-bit
+			 */
+			eop_bits = _mm_shuffle_epi8(eop_bits, eop_shuf_mask);
+			/* store the resulting 32-bit value */
+			*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
+			split_packet += RTE_FM10K_DESCS_PER_LOOP;
+
+			/* zero-out next pointers */
+			rx_pkts[pos]->next = NULL;
+			rx_pkts[pos + 1]->next = NULL;
+			rx_pkts[pos + 2]->next = NULL;
+			rx_pkts[pos + 3]->next = NULL;
+		}
+
+		/* C.3 calc available number of desc */
+		staterr = _mm_and_si128(staterr, dd_check);
+		staterr = _mm_packs_epi32(staterr, zero);
+
+		/* D.3 copy final 1,2 data to rx_pkts */
+		_mm_storeu_si128((void *)&rx_pkts[pos+1]->rx_descriptor_fields1,
+				pkt_mb2);
+		_mm_storeu_si128((void *)&rx_pkts[pos]->rx_descriptor_fields1,
+				pkt_mb1);
+
+		fm10k_desc_to_pktype_v(descs0, &rx_pkts[pos]);
+
+		/* C.4 calc available number of desc */
+		var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
+		nb_pkts_recd += var;
+		if (likely(var != RTE_FM10K_DESCS_PER_LOOP))
+			break;
+	}
+
+	/* Update our internal tail pointer */
+	rxq->next_dd = (uint16_t)(rxq->next_dd + nb_pkts_recd);
+	rxq->next_dd = (uint16_t)(rxq->next_dd & (rxq->nb_desc - 1));
+	rxq->rxrearm_nb = (uint16_t)(rxq->rxrearm_nb + nb_pkts_recd);
+
+	return nb_pkts_recd;
+}
+
+/* vPMD receive routine
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ */
+uint16_t
+fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts)
+{
+	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 07/16] fm10k: add func to do Vector RX condition check
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (5 preceding siblings ...)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 06/16] fm10k: add Vector RX function Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
                       ` (8 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_rx_vec_condition_check to check if Vector RX
func can be applied.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)
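
The checks in fm10k_rx_vec_condition_check amount to a small predicate.
A sketch of it, assuming a simplified configuration struct in place of
rte_eth_rxmode and rte_fdir_conf (struct rx_mode_model and its field names
are hypothetical, and the two compile-time options are modelled as runtime
booleans):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rx_mode_model {
	bool olflags_built;   /* RX ol_flags parsing compiled in */
	bool ieee1588_built;  /* IEEE 1588 support compiled in */
	bool vlan_extend;
	bool fdir_enabled;
	bool ip_checksum;
	bool header_split;
};

/* Returns 0 when vector RX may be used, -1 otherwise. */
static int vec_rx_condition_check(const struct rx_mode_model *m)
{
	assert(m != NULL);

	if (m->ieee1588_built)
		return -1;
	/* without ol_flags parsing the VP flag cannot be reported */
	if (!m->olflags_built && m->vlan_extend)
		return -1;
	if (m->fdir_enabled)
		return -1;
	/* no csum error report and no header split support */
	if (m->ip_checksum || m->header_split)
		return -1;
	return 0;
}
```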

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index f04ba2c..1502ae3 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -327,5 +327,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 482b76c..96ca28b 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -172,6 +172,37 @@ fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 #endif
 
 int __attribute__((cold))
+fm10k_rx_vec_condition_check(struct rte_eth_dev *dev)
+{
+#ifndef RTE_LIBRTE_IEEE1588
+	struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+	struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;
+
+#ifndef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+	/* without rx ol_flags, no VP flag is reported */
+	if (rxmode->hw_vlan_extend != 0)
+		return -1;
+#endif
+
+	/* no fdir support */
+	if (fconf->mode != RTE_FDIR_MODE_NONE)
+		return -1;
+
+	/* - no csum error report support
+	 * - no header split support
+	 */
+	if (rxmode->hw_ip_checksum == 1 ||
+	    rxmode->header_split == 1)
+		return -1;
+
+	return 0;
+#else
+	RTE_SET_USED(dev);
+	return -1;
+#endif
+}
+
+int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
 	uintptr_t p;
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter function
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (6 preceding siblings ...)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 07/16] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-27  5:27       ` Liang, Cunming
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 09/16] fm10k: add function to decide best RX function Chen Jing D(Mark)
                       ` (7 subsequent siblings)
  15 siblings, 1 reply; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_recv_scattered_pkts_vec to receive chained packets
with SSE instructions.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    2 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 0 deletions(-)
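
To illustrate how the split flags drive reassembly, here is a minimal
standalone model of the chaining logic in fm10k_reassemble_packets. The
struct seg and struct rxq_state types are simplified, hypothetical stand-ins
for rte_mbuf and fm10k_rx_queue, MAX_BURST stands in for
RTE_FM10K_MAX_RX_BURST, and the hash/ol_flags copy from the last segment is
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BURST 32

struct seg {
	struct seg *next;
	uint32_t pkt_len;
	uint16_t data_len;
	uint8_t nb_segs;
};

struct rxq_state {
	struct seg *first; /* head of a packet still being assembled */
	struct seg *last;  /* its most recent segment */
};

/* Chain segments whose split flag is set onto the following buffer and
 * compact the finished packets to the front of bufs. */
static uint16_t reassemble(struct rxq_state *q, struct seg **bufs,
			   uint16_t nb_bufs, const uint8_t *split)
{
	struct seg *pkts[MAX_BURST];
	struct seg *start = q->first, *end = q->last;
	uint16_t n = 0;

	assert(nb_bufs <= MAX_BURST);
	for (uint16_t i = 0; i < nb_bufs; i++) {
		if (end != NULL) {
			/* continuing a split packet */
			end->next = bufs[i];
			start->nb_segs++;
			start->pkt_len += bufs[i]->data_len;
			end = bufs[i];
			if (!split[i]) {
				pkts[n++] = start; /* last segment seen */
				start = end = NULL;
			}
		} else if (!split[i]) {
			pkts[n++] = bufs[i]; /* single-segment packet */
		} else {
			start = end = bufs[i]; /* first segment of a chain */
		}
	}
	/* keep any partial packet for the next burst */
	q->first = start;
	q->last = end;
	memcpy(bufs, pkts, n * sizeof(*pkts));
	return n;
}
```

A buffer whose flag is set at the end of a burst is not returned to the
caller; it stays in the queue state until its final segment arrives.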

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 1502ae3..06697fa 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,4 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
+uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
+					uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 96ca28b..237de9d 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -508,3 +508,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 {
 	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }
+
+static inline uint16_t
+fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
+		struct rte_mbuf **rx_bufs,
+		uint16_t nb_bufs, uint8_t *split_flags)
+{
+	struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished pkts*/
+	struct rte_mbuf *start = rxq->pkt_first_seg;
+	struct rte_mbuf *end =  rxq->pkt_last_seg;
+	unsigned pkt_idx, buf_idx;
+
+
+	for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
+		if (end != NULL) {
+			/* processing a split packet */
+			end->next = rx_bufs[buf_idx];
+			start->nb_segs++;
+			start->pkt_len += rx_bufs[buf_idx]->data_len;
+			end = end->next;
+
+			if (!split_flags[buf_idx]) {
+				/* it's the last packet of the set */
+				start->hash = end->hash;
+				start->ol_flags = end->ol_flags;
+				pkts[pkt_idx++] = start;
+				start = end = NULL;
+			}
+		} else {
+			/* not processing a split packet */
+			if (!split_flags[buf_idx]) {
+				/* not a split packet, save and skip */
+				pkts[pkt_idx++] = rx_bufs[buf_idx];
+				continue;
+			}
+			end = start = rx_bufs[buf_idx];
+		}
+	}
+
+	/* save the partial packet for next time */
+	rxq->pkt_first_seg = start;
+	rxq->pkt_last_seg = end;
+	memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
+	return pkt_idx;
+}
+
+/*
+ * vPMD receive routine that reassembles scattered packets
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ * - nb_pkts < RTE_FM10K_DESCS_PER_LOOP, just return no packet
+ * - nb_pkts > RTE_FM10K_MAX_RX_BURST, only scan RTE_FM10K_MAX_RX_BURST
+ *   numbers of DD bit
+ * - floor align nb_pkts to a RTE_FM10K_DESCS_PER_LOOP power-of-two
+ */
+uint16_t
+fm10k_recv_scattered_pkts_vec(void *rx_queue,
+				struct rte_mbuf **rx_pkts,
+				uint16_t nb_pkts)
+{
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
+	unsigned i = 0;
+
+	/* get some new buffers */
+	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
+			split_flags);
+	if (nb_bufs == 0)
+		return 0;
+
+	/* happy day case, full burst + no packets to be joined */
+	const uint64_t *split_fl64 = (uint64_t *)split_flags;
+	if (rxq->pkt_first_seg == NULL &&
+			split_fl64[0] == 0 && split_fl64[1] == 0 &&
+			split_fl64[2] == 0 && split_fl64[3] == 0)
+		return nb_bufs;
+
+	/* reassemble any packets that need reassembly*/
+	if (rxq->pkt_first_seg == NULL) {
+		/* find the first split flag, and only reassemble then*/
+		while (i < nb_bufs && !split_flags[i])
+			i++;
+		if (i == nb_bufs)
+			return nb_bufs;
+	}
+	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
+		&split_flags[i]);
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 09/16] fm10k: add function to decide best RX function
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (7 preceding siblings ...)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 10/16] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
                       ` (6 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_set_rx_function to decide the best RX func, and call it
from fm10k_dev_rx_init.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    1 +
 drivers/net/fm10k/fm10k_ethdev.c |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 33 insertions(+), 4 deletions(-)
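
The selection made by fm10k_set_rx_function can be summarised as a small
decision function. The sketch below uses a hypothetical enum in place of the
real burst function pointers, so it only models which path is chosen, not
the driver code itself.

```c
#include <assert.h>
#include <stdbool.h>

enum rx_burst {
	RX_SCALAR,            /* stands in for fm10k_recv_pkts */
	RX_SCALAR_SCATTERED,  /* stands in for fm10k_recv_scattered_pkts */
	RX_VEC,               /* stands in for fm10k_recv_pkts_vec */
	RX_VEC_SCATTERED      /* stands in for fm10k_recv_scattered_pkts_vec */
};

/* check_result is the return value of the condition check: 0 or -1. */
static enum rx_burst select_rx_burst(enum rx_burst current, int check_result,
				     bool vec_allowed, bool scattered)
{
	assert(check_result == 0 || check_result == -1);

	if (check_result == 0 && vec_allowed)
		return scattered ? RX_VEC_SCATTERED : RX_VEC;
	if (scattered)
		return RX_SCALAR_SCATTERED;
	return current; /* otherwise the existing handler is kept */
}

/* Value recorded per queue so later code knows which ring state to use. */
static bool rx_using_sse(enum rx_burst b)
{
	return b == RX_VEC || b == RX_VEC_SCATTERED;
}
```

In the non-vector, non-scattered case nothing is assigned, so the handler
installed earlier in eth_fm10k_dev_init stays in place.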

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 06697fa..8614e81 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -187,6 +187,7 @@ struct fm10k_rx_queue {
 	/* Below 2 fields only valid in case vPMD is applied. */
 	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
 	uint16_t rxrearm_start;  /* the idx we start the re-arming from */
+	uint16_t rx_using_sse; /* indicates that vector RX is in use */
 	uint8_t port_id;
 	uint8_t drop_en;
 	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 24f936a..53c4ef1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -67,6 +67,7 @@ static void
 fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
+static void fm10k_set_rx_function(struct rte_eth_dev *dev);
 
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -462,7 +463,6 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
 			dev->data->dev_conf.rxmode.enable_scatter) {
 			uint32_t reg;
 			dev->data->scattered_rx = 1;
-			dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
 			reg = FM10K_READ_REG(hw, FM10K_SRRCTL(i));
 			reg |= FM10K_SRRCTL_BUFFER_CHAINING_EN;
 			FM10K_WRITE_REG(hw, FM10K_SRRCTL(i), reg);
@@ -478,6 +478,9 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
 
 	/* Configure RSS if applicable */
 	fm10k_dev_mq_rx_configure(dev);
+
+	/* Decide the best RX function */
+	fm10k_set_rx_function(dev);
 	return 0;
 }
 
@@ -2069,6 +2072,34 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rss_hash_conf_get	= fm10k_rss_hash_conf_get,
 };
 
+static void __attribute__((cold))
+fm10k_set_rx_function(struct rte_eth_dev *dev)
+{
+	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+	uint16_t i, rx_using_sse;
+
+	/* In order to allow Vector Rx there are a few configuration
+	 * conditions to be met.
+	 */
+	if (!fm10k_rx_vec_condition_check(dev) && dev_info->rx_vec_allowed) {
+		if (dev->data->scattered_rx)
+			dev->rx_pkt_burst = fm10k_recv_scattered_pkts_vec;
+		else
+			dev->rx_pkt_burst = fm10k_recv_pkts_vec;
+	} else if (dev->data->scattered_rx)
+		dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+
+	rx_using_sse =
+		(dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
+		dev->rx_pkt_burst == fm10k_recv_pkts_vec);
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];
+		rxq->rx_using_sse = rx_using_sse;
+	}
+
+}
+
 static void
 fm10k_params_init(struct rte_eth_dev *dev)
 {
@@ -2102,9 +2133,6 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
 
-	if (dev->data->scattered_rx)
-		dev->rx_pkt_burst = &fm10k_recv_scattered_pkts;
-
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 10/16] fm10k: add func to release mbuf in case Vector RX applied
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (8 preceding siblings ...)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 09/16] fm10k: add function to decide best RX function Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 11/16] fm10k: add Vector TX function Chen Jing D(Mark)
                       ` (5 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Since Vector RX uses different variables to track the RX HW ring, a
different function is needed to release mbufs properly.
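
For illustration only, a minimal sketch of the ring walk this release
function relies on. It assumes a power-of-two ring size and uses a
hypothetical struct demo_rxq with malloc'd buffers in place of the
driver's queue and mbufs; none of these names exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the vector RX queue; not the driver's struct. */
struct demo_rxq {
	void **sw_ring;       /* one buffer pointer per descriptor */
	unsigned nb_desc;     /* ring size, assumed to be a power of two */
	unsigned next_dd;     /* first slot that still holds a buffer */
	unsigned rearm_start; /* first slot waiting to be re-armed */
	unsigned rearm_nb;    /* number of slots waiting to be re-armed */
};

/* Free every buffer in [next_dd, rearm_start), wrapping with a mask.
 * Returns the number of buffers released.
 */
static unsigned
demo_rxq_release(struct demo_rxq *q)
{
	const unsigned mask = q->nb_desc - 1;
	unsigned i, freed = 0;

	/* the mask trick only works for power-of-two ring sizes */
	assert((q->nb_desc & mask) == 0);

	/* nothing to do if the ring is absent or already fully released */
	if (q->sw_ring == NULL || q->rearm_nb >= q->nb_desc)
		return 0;

	for (i = q->next_dd; i != q->rearm_start; i = (i + 1) & mask) {
		free(q->sw_ring[i]);
		freed++;
	}
	q->rearm_nb = q->nb_desc;

	/* set all entries to NULL so a second call is harmless */
	memset(q->sw_ring, 0, sizeof(q->sw_ring[0]) * q->nb_desc);
	return freed;
}
```

Marking the whole ring as "to be re-armed" after the walk is what makes
a repeated call return early instead of freeing the same slots twice.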

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_ethdev.c   |    6 ++++++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   18 ++++++++++++++++++
 3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8614e81..c5e66e2 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,6 +329,7 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
+void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 53c4ef1..2c3d8be 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -143,6 +143,12 @@ rx_queue_clean(struct fm10k_rx_queue *q)
 	for (i = 0; i < q->nb_desc; ++i)
 		q->hw_ring[i] = zero;
 
+	/* vPMD driver has a different way of releasing mbufs. */
+	if (q->rx_using_sse) {
+		fm10k_rx_queue_release_mbufs_vec(q);
+		return;
+	}
+
 	/* free software buffers */
 	for (i = 0; i < q->nb_desc; ++i) {
 		if (q->sw_ring[i]) {
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 237de9d..ab0218e 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -313,6 +313,24 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
 
+void __attribute__((cold))
+fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq)
+{
+	const unsigned mask = rxq->nb_desc - 1;
+	unsigned i;
+
+	if (rxq->sw_ring == NULL || rxq->rxrearm_nb >= rxq->nb_desc)
+		return;
+
+	/* free all mbufs that are valid in the ring */
+	for (i = rxq->next_dd; i != rxq->rxrearm_start; i = (i + 1) & mask)
+		rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+	rxq->rxrearm_nb = rxq->nb_desc;
+
+	/* set all entries to NULL */
+	memset(rxq->sw_ring, 0, sizeof(rxq->sw_ring[0]) * rxq->nb_desc);
+}
+
 static inline uint16_t
 fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts, uint8_t *split_packet)
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 11/16] fm10k: add Vector TX function
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (9 preceding siblings ...)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 10/16] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
@ 2015-10-22  9:44     ` Chen Jing D(Mark)
  2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 12/16] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
                       ` (4 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:44 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add Vector TX func fm10k_xmit_pkts_vec to transmit packets.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    5 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  150 ++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c5e66e2..0a4c174 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -215,6 +215,9 @@ struct fm10k_tx_queue {
 	uint16_t nb_used;
 	uint16_t free_thresh;
 	uint16_t rs_thresh;
+	/* Below 2 fields only valid in case vPMD is applied. */
+	uint16_t next_rs; /* Next pos to set RS flag */
+	uint16_t next_dd; /* Next pos to check DD flag */
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
 	uint8_t port_id;
@@ -333,4 +336,6 @@ void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
+uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index ab0218e..f119c2c 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -614,3 +614,153 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
 		&split_flags[i]);
 }
+
+static inline void
+vtx1(volatile struct fm10k_tx_desc *txdp,
+		struct rte_mbuf *pkt, uint64_t flags)
+{
+	__m128i descriptor = _mm_set_epi64x(flags << 56 |
+			pkt->vlan_tci << 16 | pkt->data_len,
+			MBUF_DMA_ADDR(pkt));
+	_mm_store_si128((__m128i *)txdp, descriptor);
+}
+
+static inline void
+vtx(volatile struct fm10k_tx_desc *txdp,
+		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
+{
+	int i;
+
+	for (i = 0; i < nb_pkts; ++i, ++txdp, ++pkt)
+		vtx1(txdp, *pkt, flags);
+}
+
+static inline int __attribute__((always_inline))
+fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
+{
+	struct rte_mbuf **txep;
+	uint8_t flags;
+	uint32_t n;
+	uint32_t i;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[RTE_FM10K_TX_MAX_FREE_BUF_SZ];
+
+	/* check DD bit on threshold descriptor */
+	flags = txq->hw_ring[txq->next_dd].flags;
+	if (!(flags & FM10K_TXD_FLAG_DONE))
+		return 0;
+
+	n = txq->rs_thresh;
+
+	/* First buffer to free from S/W ring is at index
+	 * next_dd - (rs_thresh-1)
+	 */
+	txep = &txq->sw_ring[txq->next_dd - (n - 1)];
+	m = __rte_pktmbuf_prefree_seg(txep[0]);
+	if (likely(m != NULL)) {
+		free[0] = m;
+		nb_free = 1;
+		for (i = 1; i < n; i++) {
+			m = __rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool))
+					free[nb_free++] = m;
+				else {
+					rte_mempool_put_bulk(free[0]->pool,
+							(void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+		}
+		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	} else {
+		for (i = 1; i < n; i++) {
+			m = __rte_pktmbuf_prefree_seg(txep[i]);
+			if (m != NULL)
+				rte_mempool_put(m->pool, m);
+		}
+	}
+
+	/* buffers were freed, update counters */
+	txq->nb_free = (uint16_t)(txq->nb_free + txq->rs_thresh);
+	txq->next_dd = (uint16_t)(txq->next_dd + txq->rs_thresh);
+	if (txq->next_dd >= txq->nb_desc)
+		txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+
+	return txq->rs_thresh;
+}
+
+static inline void __attribute__((always_inline))
+tx_backlog_entry(struct rte_mbuf **txep,
+		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i;
+
+	for (i = 0; i < (int)nb_pkts; ++i)
+		txep[i] = tx_pkts[i];
+}
+
+uint16_t
+fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+			uint16_t nb_pkts)
+{
+	struct fm10k_tx_queue *txq = (struct fm10k_tx_queue *)tx_queue;
+	volatile struct fm10k_tx_desc *txdp;
+	struct rte_mbuf **txep;
+	uint16_t n, nb_commit, tx_id;
+	uint64_t flags = FM10K_TXD_FLAG_LAST;
+	uint64_t rs = FM10K_TXD_FLAG_RS | FM10K_TXD_FLAG_LAST;
+	int i;
+
+	/* crossing the rs_thresh boundary is not allowed */
+	nb_pkts = RTE_MIN(nb_pkts, txq->rs_thresh);
+
+	if (txq->nb_free < txq->free_thresh)
+		fm10k_tx_free_bufs(txq);
+
+	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_free, nb_pkts);
+	if (unlikely(nb_pkts == 0))
+		return 0;
+
+	tx_id = txq->next_free;
+	txdp = &txq->hw_ring[tx_id];
+	txep = &txq->sw_ring[tx_id];
+
+	txq->nb_free = (uint16_t)(txq->nb_free - nb_pkts);
+
+	n = (uint16_t)(txq->nb_desc - tx_id);
+	if (nb_commit >= n) {
+		tx_backlog_entry(txep, tx_pkts, n);
+
+		for (i = 0; i < n - 1; ++i, ++tx_pkts, ++txdp)
+			vtx1(txdp, *tx_pkts, flags);
+
+		vtx1(txdp, *tx_pkts++, rs);
+
+		nb_commit = (uint16_t)(nb_commit - n);
+
+		tx_id = 0;
+		txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+		/* avoid running past the end of the ring */
+		txdp = &(txq->hw_ring[tx_id]);
+		txep = &txq->sw_ring[tx_id];
+	}
+
+	tx_backlog_entry(txep, tx_pkts, nb_commit);
+
+	vtx(txdp, tx_pkts, nb_commit, flags);
+
+	tx_id = (uint16_t)(tx_id + nb_commit);
+	if (tx_id > txq->next_rs) {
+		txq->hw_ring[txq->next_rs].flags |= FM10K_TXD_FLAG_RS;
+		txq->next_rs = (uint16_t)(txq->next_rs + txq->rs_thresh);
+	}
+
+	txq->next_free = tx_id;
+
+	FM10K_PCI_REG_WRITE(txq->tail_ptr, txq->next_free);
+
+	return nb_pkts;
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 12/16] fm10k: use func pointer to reset TX queue and mbuf release
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (10 preceding siblings ...)
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 11/16] fm10k: add Vector TX function Chen Jing D(Mark)
@ 2015-10-22  9:45     ` Chen Jing D(Mark)
  2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 13/16] fm10k: introduce 2 funcs " Chen Jing D(Mark)
                       ` (3 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:45 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Vector TX uses a different way to manage the TX queue, so it's necessary
to use different functions to reset the TX queue and release mbufs
in the TX queue. So, introduce 2 function pointers to do such ops.
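
For illustration only, a minimal sketch of the per-queue ops table
pattern described above. All demo_* names and the last_op field are
hypothetical and exist only to show which handler ran; the real
handlers reset rings and free mbufs.

```c
#include <assert.h>
#include <stddef.h>

struct demo_txq;

/* Table of queue operations; one instance per TX flavour. */
struct demo_txq_ops {
	void (*release_mbufs)(struct demo_txq *q);
	void (*reset)(struct demo_txq *q);
};

enum demo_op {
	DEMO_NONE,
	DEMO_SCALAR_RESET,
	DEMO_SCALAR_RELEASE,
	DEMO_VEC_RESET,
	DEMO_VEC_RELEASE
};

struct demo_txq {
	const struct demo_txq_ops *ops; /* selected at setup time */
	enum demo_op last_op;           /* records the handler that ran */
};

static void demo_scalar_reset(struct demo_txq *q) { q->last_op = DEMO_SCALAR_RESET; }
static void demo_scalar_release(struct demo_txq *q) { q->last_op = DEMO_SCALAR_RELEASE; }
static void demo_vec_reset(struct demo_txq *q) { q->last_op = DEMO_VEC_RESET; }
static void demo_vec_release(struct demo_txq *q) { q->last_op = DEMO_VEC_RELEASE; }

static const struct demo_txq_ops demo_def_ops = {
	.release_mbufs = demo_scalar_release,
	.reset = demo_scalar_reset,
};

static const struct demo_txq_ops demo_vec_ops = {
	.release_mbufs = demo_vec_release,
	.reset = demo_vec_reset,
};

/* Callers never name a concrete handler; they go through q->ops. */
static void
demo_txq_start(struct demo_txq *q)
{
	assert(q->ops != NULL);
	q->ops->reset(q);
}

static void
demo_txq_release(struct demo_txq *q)
{
	assert(q->ops != NULL);
	q->ops->release_mbufs(q);
}
```

With this layout, switching a queue from the scalar to the vector path
is a single pointer assignment, and the start/release call sites stay
unchanged.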

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    9 +++++++++
 drivers/net/fm10k/fm10k_ethdev.c |   21 ++++++++++++++++-----
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 0a4c174..2bead12 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -204,11 +204,14 @@ struct fifo {
 	uint16_t *endp;
 };
 
+struct fm10k_txq_ops;
+
 struct fm10k_tx_queue {
 	struct rte_mbuf **sw_ring;
 	struct fm10k_tx_desc *hw_ring;
 	uint64_t hw_ring_phys_addr;
 	struct fifo rs_tracker;
+	const struct fm10k_txq_ops *ops; /* txq ops */
 	uint16_t last_free;
 	uint16_t next_free;
 	uint16_t nb_free;
@@ -225,6 +228,11 @@ struct fm10k_tx_queue {
 	uint16_t queue_id;
 };
 
+struct fm10k_txq_ops {
+	void (*release_mbufs)(struct fm10k_tx_queue *txq);
+	void (*reset)(struct fm10k_tx_queue *txq);
+};
+
 #define MBUF_DMA_ADDR(mb) \
 	((uint64_t) ((mb)->buf_physaddr + (mb)->data_off))
 
@@ -338,4 +346,5 @@ uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
 uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
+void fm10k_txq_vec_setup(struct fm10k_tx_queue *txq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 2c3d8be..0a523eb 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -292,6 +292,11 @@ tx_queue_disable(struct fm10k_hw *hw, uint16_t qnum)
 	return 0;
 }
 
+static const struct fm10k_txq_ops def_txq_ops = {
+	.release_mbufs = tx_queue_free,
+	.reset = tx_queue_reset,
+};
+
 static int
 fm10k_dev_configure(struct rte_eth_dev *dev)
 {
@@ -571,7 +576,8 @@ fm10k_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	PMD_INIT_FUNC_TRACE();
 
 	if (tx_queue_id < dev->data->nb_tx_queues) {
-		tx_queue_reset(dev->data->tx_queues[tx_queue_id]);
+		struct fm10k_tx_queue *q = dev->data->tx_queues[tx_queue_id];
+		q->ops->reset(q);
 
 		/* reset head and tail pointers */
 		FM10K_WRITE_REG(hw, FM10K_TDH(tx_queue_id), 0);
@@ -837,8 +843,10 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
 	PMD_INIT_FUNC_TRACE();
 
 	if (dev->data->tx_queues) {
-		for (i = 0; i < dev->data->nb_tx_queues; i++)
-			fm10k_tx_queue_release(dev->data->tx_queues[i]);
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			struct fm10k_tx_queue *txq = dev->data->tx_queues[i];
+			txq->ops->release_mbufs(txq);
+		}
 	}
 
 	if (dev->data->rx_queues) {
@@ -1454,7 +1462,8 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	 * different socket than was previously used.
 	 */
 	if (dev->data->tx_queues[queue_id] != NULL) {
-		tx_queue_free(dev->data->tx_queues[queue_id]);
+		struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];
+		txq->ops->release_mbufs(txq);
 		dev->data->tx_queues[queue_id] = NULL;
 	}
 
@@ -1470,6 +1479,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->nb_desc = nb_desc;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
+	q->ops = &def_txq_ops;
 	q->tail_ptr = (volatile uint32_t *)
 		&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
 	if (handle_txconf(q, conf))
@@ -1528,9 +1538,10 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 static void
 fm10k_tx_queue_release(void *queue)
 {
+	struct fm10k_tx_queue *q = queue;
 	PMD_INIT_FUNC_TRACE();
 
-	tx_queue_free(queue);
+	q->ops->release_mbufs(q);
 }
 
 static int
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 13/16] fm10k: introduce 2 funcs to reset TX queue and mbuf release
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (11 preceding siblings ...)
  2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 12/16] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
@ 2015-10-22  9:45     ` Chen Jing D(Mark)
  2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 14/16] fm10k: Add function to decide best TX func Chen Jing D(Mark)
                       ` (2 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:45 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add 2 funcs to reset the TX queue and release mbufs when Vector TX
is applied.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |   68 ++++++++++++++++++++++++++++++++++++
 1 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index f119c2c..5ed8653 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,11 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+static void
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
+static void
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
+
 /* Handling the offload flags (olflags) field takes computation
  * time when receiving packets. Therefore we provide a flag to disable
  * the processing of the olflags field when they are not needed. This
@@ -615,6 +620,17 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 		&split_flags[i]);
 }
 
+static const struct fm10k_txq_ops vec_txq_ops = {
+	.release_mbufs = fm10k_tx_queue_release_mbufs_vec,
+	.reset = fm10k_reset_tx_queue,
+};
+
+void __attribute__((cold))
+fm10k_txq_vec_setup(struct fm10k_tx_queue *txq)
+{
+	txq->ops = &vec_txq_ops;
+}
+
 static inline void
 vtx1(volatile struct fm10k_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
@@ -764,3 +780,55 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return nb_pkts;
 }
+
+static void __attribute__((cold))
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
+{
+	unsigned i;
+	const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
+
+	if (txq->sw_ring == NULL || txq->nb_free == max_desc)
+		return;
+
+	/* release the used mbufs in sw_ring */
+	for (i = txq->next_dd - (txq->rs_thresh - 1);
+	     i != txq->next_free;
+	     i = (i + 1) & max_desc)
+		rte_pktmbuf_free_seg(txq->sw_ring[i]);
+
+	txq->nb_free = max_desc;
+
+	/* reset tx_entry */
+	for (i = 0; i < txq->nb_desc; i++)
+		txq->sw_ring[i] = NULL;
+
+	rte_free(txq->sw_ring);
+	txq->sw_ring = NULL;
+}
+
+static void __attribute__((cold))
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
+{
+	static const struct fm10k_tx_desc zeroed_desc = {0};
+	struct rte_mbuf **txe = txq->sw_ring;
+	uint16_t i;
+
+	/* Zero out HW ring memory */
+	for (i = 0; i < txq->nb_desc; i++)
+		txq->hw_ring[i] = zeroed_desc;
+
+	/* Initialize SW ring entries */
+	for (i = 0; i < txq->nb_desc; i++)
+		txe[i] = NULL;
+
+	txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+	txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+	txq->next_free = 0;
+	txq->nb_used = 0;
+	/* Always allow 1 descriptor to be un-allocated to avoid
+	 * a H/W race condition
+	 */
+	txq->nb_free = (uint16_t)(txq->nb_desc - 1);
+	FM10K_PCI_REG_WRITE(txq->tail_ptr, 0);
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 14/16] fm10k: Add function to decide best TX func
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (12 preceding siblings ...)
  2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 13/16] fm10k: introduce 2 funcs " Chen Jing D(Mark)
@ 2015-10-22  9:45     ` Chen Jing D(Mark)
  2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 15/16] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
  2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 16/16] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:45 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_set_tx_function to decide the best TX func in
fm10k_dev_tx_init.
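
For illustration only, a minimal sketch of the selection rule: the
vector TX path is chosen only if every queue sets all of the
"simple TX" flag bits. The DEMO_* flag values below are hypothetical
placeholders, not the real ETH_TXQ_FLAGS_* values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_NOMULTSEGS 0x1u
#define DEMO_NOOFFLOADS 0x2u
#define DEMO_SIMPLE_TX  (DEMO_NOMULTSEGS | DEMO_NOOFFLOADS)

/* Return 1 if every queue has all simple-TX bits set, otherwise 0.
 * With zero queues the result is 1, matching a default of "use vector".
 */
static int
demo_can_use_vector_tx(const uint32_t *txq_flags, size_t nb_queues)
{
	size_t i;

	assert(txq_flags != NULL || nb_queues == 0);

	for (i = 0; i < nb_queues; i++) {
		/* compare against the full mask, not just "non-zero" */
		if ((txq_flags[i] & DEMO_SIMPLE_TX) != DEMO_SIMPLE_TX)
			return 0;
	}
	return 1;
}
```

Comparing the masked value against the full mask matters: a queue that
sets only one of the two bits must still force the scalar path.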

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    1 +
 drivers/net/fm10k/fm10k_ethdev.c |   38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 2bead12..68ae1b8 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -222,6 +222,7 @@ struct fm10k_tx_queue {
 	uint16_t next_rs; /* Next pos to set RS flag */
 	uint16_t next_dd; /* Next pos to check DD flag */
 	volatile uint32_t *tail_ptr;
+	uint32_t txq_flags; /* Holds flags for this TXq */
 	uint16_t nb_desc;
 	uint8_t port_id;
 	uint8_t tx_deferred_start; /** < don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 0a523eb..046979d 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -53,6 +53,9 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)
 
+#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+				ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -68,6 +71,7 @@ fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
 static void fm10k_set_rx_function(struct rte_eth_dev *dev);
+static void fm10k_set_tx_function(struct rte_eth_dev *dev);
 
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -414,6 +418,10 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
 				base_addr >> (CHAR_BIT * sizeof(uint32_t)));
 		FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size);
 	}
+
+	/* set up vector or scalar TX function as appropriate */
+	fm10k_set_tx_function(dev);
+
 	return 0;
 }
 
@@ -980,8 +988,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		},
 		.tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
 		.tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
-		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-				ETH_TXQ_FLAGS_NOOFFLOADS,
+		.txq_flags = FM10K_SIMPLE_TX_FLAG,
 	};
 
 }
@@ -1479,6 +1486,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->nb_desc = nb_desc;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
+	q->txq_flags = conf->txq_flags;
 	q->ops = &def_txq_ops;
 	q->tail_ptr = (volatile uint32_t *)
 		&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
@@ -2090,6 +2098,32 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 };
 
 static void __attribute__((cold))
+fm10k_set_tx_function(struct rte_eth_dev *dev)
+{
+	struct fm10k_tx_queue *txq;
+	int i;
+	int use_sse = 1;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		if ((txq->txq_flags & FM10K_SIMPLE_TX_FLAG) != \
+			FM10K_SIMPLE_TX_FLAG) {
+			use_sse = 0;
+			break;
+		}
+	}
+
+	if (use_sse) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			fm10k_txq_vec_setup(txq);
+		}
+		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+	} else
+		dev->tx_pkt_burst = fm10k_xmit_pkts;
+}
+
+static void __attribute__((cold))
 fm10k_set_rx_function(struct rte_eth_dev *dev)
 {
 	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 15/16] fm10k: fix a crash issue in vector RX func
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (13 preceding siblings ...)
  2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 14/16] fm10k: Add function to decide best TX func Chen Jing D(Mark)
@ 2015-10-22  9:45     ` Chen Jing D(Mark)
  2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 16/16] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:45 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

The Vector RX function processes 4 packets at a time. When the RX
ring wraps around at the tail and the number of remaining descriptors is
not a multiple of 4, SW will overwrite memory that does not belong to it
and cause a crash. The fix allocates 4 additional HW/SW entries at the
tail to avoid the overwrite.
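
For illustration only, a minimal sketch of the padding idea. It uses a
hypothetical struct demo_ring with one "done" byte per descriptor
instead of real HW descriptors, and assumes a burst of 4 entries as
stated above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BURST 4 /* entries examined per step */

/* Hypothetical ring: nb_desc real entries plus zeroed padding. */
struct demo_ring {
	uint8_t *done;    /* 1 if the descriptor has completed */
	unsigned nb_desc; /* number of real descriptors */
	unsigned nb_fake; /* padding entries after the last real one */
};

/* Allocate nb_desc real entries plus DEMO_BURST zeroed fake entries. */
static int
demo_ring_init(struct demo_ring *r, unsigned nb_desc)
{
	r->done = calloc(nb_desc + DEMO_BURST, sizeof(r->done[0]));
	if (r->done == NULL)
		return -1;
	r->nb_desc = nb_desc;
	r->nb_fake = DEMO_BURST;
	return 0;
}

/* Count completed entries in the DEMO_BURST slots starting at pos.
 * The scan may run past nb_desc, but it only ever touches the zeroed
 * padding, which always reads as "not done".
 */
static unsigned
demo_scan_burst(const struct demo_ring *r, unsigned pos)
{
	unsigned i, n = 0;

	assert(pos < r->nb_desc);

	for (i = 0; i < DEMO_BURST; i++) {
		if (!r->done[pos + i])
			break;
		n++;
	}
	return n;
}
```

The padding trades a few extra entries per ring for a scan loop that
needs no bounds check on every burst.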

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    4 ++++
 drivers/net/fm10k/fm10k_ethdev.c |   19 +++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 68ae1b8..82a548f 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -177,12 +177,16 @@ struct fm10k_rx_queue {
 	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
 	uint64_t mbuf_initializer; /* value to init mbufs */
+	/* need to alloc dummy mbuf, for wraparound when scanning hw ring */
+	struct rte_mbuf fake_mbuf;
 	uint16_t next_dd;
 	uint16_t next_alloc;
 	uint16_t next_trigger;
 	uint16_t alloc_thresh;
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
+	/* Number of faked desc added at the tail for Vector RX function */
+	uint16_t nb_fake_desc;
 	uint16_t queue_id;
 	/* Below 2 fields only valid in case vPMD is applied. */
 	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 046979d..31c96ac 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -102,6 +102,7 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
 static inline int
 rx_queue_reset(struct fm10k_rx_queue *q)
 {
+	static const union fm10k_rx_desc zero = {{0}};
 	uint64_t dma_addr;
 	int i, diag;
 	PMD_INIT_FUNC_TRACE();
@@ -122,6 +123,15 @@ rx_queue_reset(struct fm10k_rx_queue *q)
 		q->hw_ring[i].q.hdr_addr = dma_addr;
 	}
 
+	/* initialize extra software ring entries. Space for these extra
+	 * entries is always allocated.
+	 */
+	memset(&q->fake_mbuf, 0x0, sizeof(q->fake_mbuf));
+	for (i = 0; i < q->nb_fake_desc; ++i) {
+		q->sw_ring[q->nb_desc + i] = &q->fake_mbuf;
+		q->hw_ring[q->nb_desc + i] = zero;
+	}
+
 	q->next_dd = 0;
 	q->next_alloc = 0;
 	q->next_trigger = q->alloc_thresh - 1;
@@ -147,6 +157,10 @@ rx_queue_clean(struct fm10k_rx_queue *q)
 	for (i = 0; i < q->nb_desc; ++i)
 		q->hw_ring[i] = zero;
 
+	/* zero faked descriptors */
+	for (i = 0; i < q->nb_fake_desc; ++i)
+		q->hw_ring[q->nb_desc + i] = zero;
+
 	/* vPMD driver has a different way of releasing mbufs. */
 	if (q->rx_using_sse) {
 		fm10k_rx_queue_release_mbufs_vec(q);
@@ -1323,6 +1337,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	/* setup queue */
 	q->mp = mp;
 	q->nb_desc = nb_desc;
+	q->nb_fake_desc = FM10K_MULT_RX_DESC;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
 	q->tail_ptr = (volatile uint32_t *)
@@ -1332,8 +1347,8 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 
 	/* allocate memory for the software ring */
 	q->sw_ring = rte_zmalloc_socket("fm10k sw ring",
-					nb_desc * sizeof(struct rte_mbuf *),
-					RTE_CACHE_LINE_SIZE, socket_id);
+			(nb_desc + q->nb_fake_desc) * sizeof(struct rte_mbuf *),
+			RTE_CACHE_LINE_SIZE, socket_id);
 	if (q->sw_ring == NULL) {
 		PMD_INIT_LOG(ERR, "Cannot allocate software ring");
 		rte_free(q);
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v2 16/16] doc: release notes update for fm10k Vector PMD
  2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                       ` (14 preceding siblings ...)
  2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 15/16] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
@ 2015-10-22  9:45     ` Chen Jing D(Mark)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-22  9:45 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Update 2.2 release notes, add descriptions for Vector PMD implementation
in fm10k driver.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 doc/guides/rel_notes/release_2_2.rst |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 9a70dae..44a3f74 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -39,6 +39,11 @@ Drivers
 
   Fixed issue with libvirt ``virsh destroy`` not killing the VM.
 
+* **fm10k:  Add Vector Rx/Tx implementation.**
+
+  This patch set includes Vector Rx/Tx functions to receive/transmit packets
+  for fm10k devices. It also contains logic to do sanity check for proper
+  RX/TX function selections.
 
 Libraries
 ~~~~~~~~~
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 03/16] fm10k: Add a new func to initialize all parameters
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 03/16] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
@ 2015-10-22 15:57       ` Stephen Hemminger
  2015-10-23  8:27         ` Chen, Jing D
  0 siblings, 1 reply; 109+ messages in thread
From: Stephen Hemminger @ 2015-10-22 15:57 UTC (permalink / raw)
  To: Chen Jing D(Mark); +Cc: dev

On Thu, 22 Oct 2015 17:44:51 +0800
"Chen Jing D(Mark)" <jing.d.chen@intel.com> wrote:

> +static void
> +fm10k_params_init(struct rte_eth_dev *dev)
> +{
> +	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +	struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
> +	/* Inialize bus info. Normally we would call fm10k_get_bus_info(), but
> +	 * there is no way to get link status without reading BAR4.  Until this
> +	 * works, assume we have maximum bandwidth.
> +	 * @todo - fix bus info

Minor nit. I would prefer that DPDK follow current Linux kernel
style which is to always have a blank line after declarations.
This improves readability.

I.e:

static void
fm10k_params_init(struct rte_eth_dev *dev)
{
	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
	struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);

	/* Inialize bus info. Normally we would call fm10k_get_bus_info(), but
	 * there is no way to get link status without reading BAR4.  Until this
	 * works, assume we have maximum bandwidth.
	 * @todo - fix bus info

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
@ 2015-10-22 15:58       ` Stephen Hemminger
  2015-10-23  8:39         ` Chen, Jing D
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  1 sibling, 1 reply; 109+ messages in thread
From: Stephen Hemminger @ 2015-10-22 15:58 UTC (permalink / raw)
  To: Chen Jing D(Mark); +Cc: dev

On Thu, 22 Oct 2015 17:44:49 +0800
"Chen Jing D(Mark)" <jing.d.chen@intel.com> wrote:

> +#ifndef __INTEL_COMPILER
> +#pragma GCC diagnostic ignored "-Wcast-qual"
> +#endif

Since this is new code, can't you make it work correctly
with GCC, rather than turning off a useful diagnostic?

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 03/16] fm10k: Add a new func to initialize all parameters
  2015-10-22 15:57       ` Stephen Hemminger
@ 2015-10-23  8:27         ` Chen, Jing D
  0 siblings, 0 replies; 109+ messages in thread
From: Chen, Jing D @ 2015-10-23  8:27 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi, Stephen,

Best Regards,
Mark


> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, October 22, 2015 11:58 PM
> To: Chen, Jing D
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 03/16] fm10k: Add a new func to initialize
> all parameters
> 
> On Thu, 22 Oct 2015 17:44:51 +0800
> "Chen Jing D(Mark)" <jing.d.chen@intel.com> wrote:
> 
> > +static void
> > +fm10k_params_init(struct rte_eth_dev *dev)
> > +{
> > +	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +	struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
> > +	/* Inialize bus info. Normally we would call fm10k_get_bus_info(), but
> > +	 * there is no way to get link status without reading BAR4.  Until this
> > +	 * works, assume we have maximum bandwidth.
> > +	 * @todo - fix bus info
> 
> Minor nit. I would prefer that DPDK follow current Linux kernel
> style which is to always have a blank line after declarations.
> This improves readability.
> 

Thanks for the comments! I'll change accordingly.

> I.e:
> 
> static void
> fm10k_params_init(struct rte_eth_dev *dev)
> {
> 	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
> 	struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
> 
> 	/* Inialize bus info. Normally we would call fm10k_get_bus_info(),
> but
> 	 * there is no way to get link status without reading BAR4.  Until this
> 	 * works, assume we have maximum bandwidth.
> 	 * @todo - fix bus info

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file
  2015-10-22 15:58       ` Stephen Hemminger
@ 2015-10-23  8:39         ` Chen, Jing D
  2015-10-23 10:01           ` Bruce Richardson
  0 siblings, 1 reply; 109+ messages in thread
From: Chen, Jing D @ 2015-10-23  8:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi, Stephen,

Best Regards,
Mark


> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, October 22, 2015 11:59 PM
> To: Chen, Jing D
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file
> 
> On Thu, 22 Oct 2015 17:44:49 +0800
> "Chen Jing D(Mark)" <jing.d.chen@intel.com> wrote:
> 
> > +#ifndef __INTEL_COMPILER
> > +#pragma GCC diagnostic ignored "-Wcast-qual"
> > +#endif
> 
> Since this is new code, can't you make it work correctly
> with Gcc. Rather than turning off a useful diagnostic.

This pragma is necessary for the later SSE functions; otherwise I'll have to add
some unnecessary casts to avoid compile failures.

I can add it in a later patch, but it will have to show up anyway, right?

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file
  2015-10-23  8:39         ` Chen, Jing D
@ 2015-10-23 10:01           ` Bruce Richardson
  2015-10-27  5:26             ` Chen, Jing D
  0 siblings, 1 reply; 109+ messages in thread
From: Bruce Richardson @ 2015-10-23 10:01 UTC (permalink / raw)
  To: Chen, Jing D; +Cc: dev

On Fri, Oct 23, 2015 at 08:39:56AM +0000, Chen, Jing D wrote:
> Hi, Stephen,
> 
> Best Regards,
> Mark
> 
> 
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Thursday, October 22, 2015 11:59 PM
> > To: Chen, Jing D
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file
> > 
> > On Thu, 22 Oct 2015 17:44:49 +0800
> > "Chen Jing D(Mark)" <jing.d.chen@intel.com> wrote:
> > 
> > > +#ifndef __INTEL_COMPILER
> > > +#pragma GCC diagnostic ignored "-Wcast-qual"
> > > +#endif
> > 
> > Since this is new code, can't you make it work correctly
> > with Gcc. Rather than turning off a useful diagnostic.
> 
> This macro is necessary for later SSE functions or I'll have to add some
> Un-necessary cast to avoid compile failure.
> 
> I can add it in later patch. But it will have to show up anyway, right? 

Actually, casting won't make the warnings go away either. You'll always get a
warning about removing the volatile from the structure members when you pass
them to the intrinsic functions. Only option is to temporarily disable the warning
[or else write your own versions of the intrinsics that do support volatiles :-)]

/Bruce
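
For illustration only, a minimal sketch of "temporarily disable the warning",
assuming GCC's diagnostic push/pop pragmas are used to limit the scope to one
function. load_plain() and read_desc() are hypothetical names, not part of the
patch; load_plain() stands in for an intrinsic such as _mm_loadu_si128(),
whose pointer parameter is not volatile-qualified.

```c
#include <stdint.h>

/* Stand-in for an SSE load intrinsic: the parameter is const but not
 * volatile, so a volatile pointer has to be cast before the call.
 */
static uint64_t
load_plain(const uint64_t *p)
{
	return *p;
}

/* Hypothetical descriptor read. Only this function is compiled with
 * -Wcast-qual disabled; the previous diagnostic state is restored by
 * the pop below.
 */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-qual"
static uint64_t
read_desc(volatile uint64_t *desc)
{
	/* This cast drops 'volatile', which is what -Wcast-qual reports. */
	return load_plain((const uint64_t *)desc);
}
#pragma GCC diagnostic pop
```

The patch itself disables the warning for the whole file; the scoped form
above is just one alternative and was not proposed in this thread.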

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 06/16] fm10k: add Vector RX function
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 06/16] fm10k: add Vector RX function Chen Jing D(Mark)
@ 2015-10-27  5:24       ` Liang, Cunming
  2015-10-27  5:32         ` Chen, Jing D
  0 siblings, 1 reply; 109+ messages in thread
From: Liang, Cunming @ 2015-10-27  5:24 UTC (permalink / raw)
  To: Chen Jing D(Mark), dev

Hi,

On 10/22/2015 5:44 PM, Chen Jing D(Mark) wrote:
> From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
>
> Add func fm10k_recv_raw_pkts_vec to parse raw packets, in which
> includes possible chained packets.
> Add func fm10k_recv_pkts_vec to receive single mbuf packet.
>
> Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> ---
>   drivers/net/fm10k/fm10k.h          |    1 +
>   drivers/net/fm10k/fm10k_rxtx_vec.c |  196 ++++++++++++++++++++++++++++++++++++
>   2 files changed, 197 insertions(+), 0 deletions(-)
[...]
> +	/* mask to shuffle from desc. to mbuf */
> +	shuf_msk = _mm_set_epi8(
> +		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
> +		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
> +		13, 12,      /* octet 12~13, 16 bits data_len */
> +		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
> +		13, 12,      /* octet 12~13, low 16 bits pkt_len */
> +		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
> +		0xFF, 0xFF   /* Skip pkt_type field in shuffle operation */
> +		);
> +
> +	/* Cache is empty -> need to scan the buffer rings, but first move
> +	 * the next 'n' mbufs into the cache
> +	 */
> +	mbufp = &rxq->sw_ring[next_dd];
> +
> +	/* A. load 4 packet in one loop
> +	 * [A*. mask out 4 unused dirty field in desc]
> +	 * B. copy 4 mbuf point from swring to rx_pkts
> +	 * C. calc the number of DD bits among the 4 packets
> +	 * [C*. extract the end-of-packet bit, if requested]
> +	 * D. fill info. from desc to mbuf
> +	 */
> +	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
nb_pkts needs to be floor-aligned to RTE_FM10K_DESCS_PER_LOOP, otherwise the
loop may write past the end of the rx_pkts array.
E.g. if nb_pkts is 6, the loop executes twice and may complete 8 packets,
but rx_pkts only has room for 6.
>   
> +			pos += RTE_FM10K_DESCS_PER_LOOP,
> +			rxdp += RTE_FM10K_DESCS_PER_LOOP) {
> +		__m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
> +		__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
> +		__m128i zero, staterr, sterr_tmp1, sterr_tmp2;
> +		__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */
> +
>
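
A minimal sketch of the floor alignment suggested above, assuming 4
descriptors per loop iteration (the loop comment says "load 4 packet in one
loop"). DESCS_PER_LOOP and floor_align_burst() are hypothetical names used
only for this example.

```c
#include <stdint.h>

/* Assumed value of RTE_FM10K_DESCS_PER_LOOP; must be a power of two. */
#define DESCS_PER_LOOP 4

/* Round nb_pkts down to a multiple of DESCS_PER_LOOP so that a loop
 * stepping by DESCS_PER_LOOP never writes past rx_pkts[nb_pkts - 1].
 */
static uint16_t
floor_align_burst(uint16_t nb_pkts)
{
	return (uint16_t)(nb_pkts & ~(DESCS_PER_LOOP - 1));
}
```

With nb_pkts = 6 this yields 4, so the loop runs once instead of twice. In
the driver, the RTE_ALIGN_FLOOR() macro from rte_common.h could presumably be
used for the same purpose.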

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file
  2015-10-23 10:01           ` Bruce Richardson
@ 2015-10-27  5:26             ` Chen, Jing D
  0 siblings, 0 replies; 109+ messages in thread
From: Chen, Jing D @ 2015-10-27  5:26 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev

Hi, Bruce,

Best Regards,
Mark


> -----Original Message-----
> From: Richardson, Bruce
> Sent: Friday, October 23, 2015 6:01 PM
> To: Chen, Jing D
> Cc: Stephen Hemminger; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file
> 
> On Fri, Oct 23, 2015 at 08:39:56AM +0000, Chen, Jing D wrote:
> > Hi, Stephen,
> >
> > Best Regards,
> > Mark
> >
> >
> > > -----Original Message-----
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Thursday, October 22, 2015 11:59 PM
> > > To: Chen, Jing D
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file
> > >
> > > On Thu, 22 Oct 2015 17:44:49 +0800
> > > "Chen Jing D(Mark)" <jing.d.chen@intel.com> wrote:
> > >
> > > > +#ifndef __INTEL_COMPILER
> > > > +#pragma GCC diagnostic ignored "-Wcast-qual"
> > > > +#endif
> > >
> > > Since this is new code, can't you make it work correctly
> > > with Gcc. Rather than turning off a useful diagnostic.
> >
> > This macro is necessary for later SSE functions or I'll have to add some
> > Un-necessary cast to avoid compile failure.
> >
> > I can add it in later patch. But it will have to show up anyway, right?
> 
> Actually, casting won't make the warnings go away either. You'll always get a
> warning about removing the volatile from the structure members when you
> pass
> them to the intrinsic functions. Only option is to temporarily disable the
> warning
> [or else write your own versions of the intrinsics that do support volatiles :-)]
> 
> /Bruce

Thanks for the clarifications. :)

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter function
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
@ 2015-10-27  5:27       ` Liang, Cunming
  2015-10-27  5:43         ` Chen, Jing D
  0 siblings, 1 reply; 109+ messages in thread
From: Liang, Cunming @ 2015-10-27  5:27 UTC (permalink / raw)
  To: Chen Jing D(Mark), dev

Hi,

On 10/22/2015 5:44 PM, Chen Jing D(Mark) wrote:
> From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
>
> Add func fm10k_recv_scattered_pkts_vec to receive chained packets
> with SSE instructions.
>
> Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> ---
>   drivers/net/fm10k/fm10k.h          |    2 +
>   drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
>   2 files changed, 90 insertions(+), 0 deletions(-)
>
[...]
> +
> +/*
> + * vPMD receive routine that reassembles scattered packets
> + *
> + * Notice:
> + * - don't support ol_flags for rss and csum err
> + * - nb_pkts < RTE_IXGBE_DESCS_PER_LOOP, just return no packet
> + * - nb_pkts > RTE_IXGBE_MAX_RX_BURST, only scan RTE_IXGBE_MAX_RX_BURST
> + *   numbers of DD bit
To make sure nb_pkts does not exceed RTE_IXGBE_MAX_RX_BURST, it's necessary
to clamp it with RTE_MIN().
> + * - floor align nb_pkts to a RTE_IXGBE_DESC_PER_LOOP power-of-two
> + */
> +uint16_t
> +fm10k_recv_scattered_pkts_vec(void *rx_queue,
> +				struct rte_mbuf **rx_pkts,
> +				uint16_t nb_pkts)
> +{
> +	struct fm10k_rx_queue *rxq = rx_queue;
> +	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
> +	unsigned i = 0;
> +
> +	/* get some new buffers */
> +	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
> +			split_flags);
> +	if (nb_bufs == 0)
> +		return 0;
> +
> +	/* happy day case, full burst + no packets to be joined */
> +	const uint64_t *split_fl64 = (uint64_t *)split_flags;
> +	if (rxq->pkt_first_seg == NULL &&
> +			split_fl64[0] == 0 && split_fl64[1] == 0 &&
> +			split_fl64[2] == 0 && split_fl64[3] == 0)
> +		return nb_bufs;
> +
> +	/* reassemble any packets that need reassembly*/
> +	if (rxq->pkt_first_seg == NULL) {
> +		/* find the first split flag, and only reassemble then*/
> +		while (i < nb_bufs && !split_flags[i])
> +			i++;
> +		if (i == nb_bufs)
> +			return nb_bufs;
> +	}
> +	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
> +		&split_flags[i]);
> +}
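
A minimal sketch of the clamp, assuming a maximum burst of 32 (the function
above checks four uint64_t words of split_flags, i.e. 32 one-byte flags).
MAX_RX_BURST, MIN_U16() and clamp_burst() are hypothetical stand-ins for
RTE_FM10K_MAX_RX_BURST and RTE_MIN().

```c
#include <stdint.h>

/* Assumed size of the split_flags[] array in the scatter function. */
#define MAX_RX_BURST 32

#define MIN_U16(a, b) ((uint16_t)((a) < (b) ? (a) : (b)))

/* Limit the requested burst so the raw receive routine never fills
 * more than MAX_RX_BURST entries of split_flags[].
 */
static uint16_t
clamp_burst(uint16_t nb_pkts)
{
	return MIN_U16(nb_pkts, MAX_RX_BURST);
}
```

The clamped value would then be passed to fm10k_recv_raw_pkts_vec() in place
of nb_pkts.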

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 06/16] fm10k: add Vector RX function
  2015-10-27  5:24       ` Liang, Cunming
@ 2015-10-27  5:32         ` Chen, Jing D
  0 siblings, 0 replies; 109+ messages in thread
From: Chen, Jing D @ 2015-10-27  5:32 UTC (permalink / raw)
  To: Liang, Cunming, dev

Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: Liang, Cunming
> Sent: Tuesday, October 27, 2015 1:25 PM
> To: Chen, Jing D; dev@dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: Re: [PATCH v2 06/16] fm10k: add Vector RX function
> 
> Hi,
> 
> On 10/22/2015 5:44 PM, Chen Jing D(Mark) wrote:
> > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> >
> > Add func fm10k_recv_raw_pkts_vec to parse raw packets, in which
> > includes possible chained packets.
> > Add func fm10k_recv_pkts_vec to receive single mbuf packet.
> >
> > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > ---
> >   drivers/net/fm10k/fm10k.h          |    1 +
> >   drivers/net/fm10k/fm10k_rxtx_vec.c |  196
> ++++++++++++++++++++++++++++++++++++
> >   2 files changed, 197 insertions(+), 0 deletions(-)
> [...]
> > +	/* mask to shuffle from desc. to mbuf */
> > +	shuf_msk = _mm_set_epi8(
> > +		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
> > +		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
> > +		13, 12,      /* octet 12~13, 16 bits data_len */
> > +		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
> > +		13, 12,      /* octet 12~13, low 16 bits pkt_len */
> > +		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
> > +		0xFF, 0xFF   /* Skip pkt_type field in shuffle operation */
> > +		);
> > +
> > +	/* Cache is empty -> need to scan the buffer rings, but first move
> > +	 * the next 'n' mbufs into the cache
> > +	 */
> > +	mbufp = &rxq->sw_ring[next_dd];
> > +
> > +	/* A. load 4 packet in one loop
> > +	 * [A*. mask out 4 unused dirty field in desc]
> > +	 * B. copy 4 mbuf point from swring to rx_pkts
> > +	 * C. calc the number of DD bits among the 4 packets
> > +	 * [C*. extract the end-of-packet bit, if requested]
> > +	 * D. fill info. from desc to mbuf
> > +	 */
> > +	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
> It's necessary to floor align the nb_pkts into RTE_FM10K_DESCS_PER_LOOP,
> otherwise it may exceed the rx_pkts array.
> e.g. nb_pkts is 6, it executes twice in the loop which has chance to get
> 8 packets done, but rx_pkts only expect 6 packets.

You are right. I'll change accordingly. 

> >
> > +			pos += RTE_FM10K_DESCS_PER_LOOP,
> > +			rxdp += RTE_FM10K_DESCS_PER_LOOP) {
> > +		__m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
> > +		__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
> > +		__m128i zero, staterr, sterr_tmp1, sterr_tmp2;
> > +		__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg.
> */
> > +
> >

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter function
  2015-10-27  5:27       ` Liang, Cunming
@ 2015-10-27  5:43         ` Chen, Jing D
  2015-10-27  5:55           ` Chen, Jing D
  0 siblings, 1 reply; 109+ messages in thread
From: Chen, Jing D @ 2015-10-27  5:43 UTC (permalink / raw)
  To: Liang, Cunming, dev

Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: Liang, Cunming
> Sent: Tuesday, October 27, 2015 1:28 PM
> To: Chen, Jing D; dev@dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: Re: [PATCH v2 08/16] fm10k: add Vector RX scatter function
> 
> Hi,
> 
> On 10/22/2015 5:44 PM, Chen Jing D(Mark) wrote:
> > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> >
> > Add func fm10k_recv_scattered_pkts_vec to receive chained packets
> > with SSE instructions.
> >
> > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > ---
> >   drivers/net/fm10k/fm10k.h          |    2 +
> >   drivers/net/fm10k/fm10k_rxtx_vec.c |   88
> ++++++++++++++++++++++++++++++++++++
> >   2 files changed, 90 insertions(+), 0 deletions(-)
> >
> [...]
> > +
> > +/*
> > + * vPMD receive routine that reassembles scattered packets
> > + *
> > + * Notice:
> > + * - don't support ol_flags for rss and csum err
> > + * - nb_pkts < RTE_IXGBE_DESCS_PER_LOOP, just return no packet
> > + * - nb_pkts > RTE_IXGBE_MAX_RX_BURST, only scan
> RTE_IXGBE_MAX_RX_BURST
> > + *   numbers of DD bit
> In order to make sure nb_pkts > RTE_IXGBE_MAX_RX_BURST, it's necessary
> to do RTE_MIN().

I'll remove the improper comments. fm10k_recv_raw_pkts_vec uses nb_pkts as
its loop bound, and after that the function below iterates over nb_bufs, the
number of packets actually received.
So, I think RTE_MIN() is not necessary?

> > + * - floor align nb_pkts to a RTE_IXGBE_DESC_PER_LOOP power-of-two
> > + */
> > +uint16_t
> > +fm10k_recv_scattered_pkts_vec(void *rx_queue,
> > +				struct rte_mbuf **rx_pkts,
> > +				uint16_t nb_pkts)
> > +{
> > +	struct fm10k_rx_queue *rxq = rx_queue;
> > +	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
> > +	unsigned i = 0;
> > +
> > +	/* get some new buffers */
> > +	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
> > +			split_flags);
> > +	if (nb_bufs == 0)
> > +		return 0;
> > +
> > +	/* happy day case, full burst + no packets to be joined */
> > +	const uint64_t *split_fl64 = (uint64_t *)split_flags;
> > +	if (rxq->pkt_first_seg == NULL &&
> > +			split_fl64[0] == 0 && split_fl64[1] == 0 &&
> > +			split_fl64[2] == 0 && split_fl64[3] == 0)
> > +		return nb_bufs;
> > +
> > +	/* reassemble any packets that need reassembly*/
> > +	if (rxq->pkt_first_seg == NULL) {
> > +		/* find the first split flag, and only reassemble then*/
> > +		while (i < nb_bufs && !split_flags[i])
> > +			i++;
> > +		if (i == nb_bufs)
> > +			return nb_bufs;
> > +	}
> > +	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
> > +		&split_flags[i]);
> > +}

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter function
  2015-10-27  5:43         ` Chen, Jing D
@ 2015-10-27  5:55           ` Chen, Jing D
  0 siblings, 0 replies; 109+ messages in thread
From: Chen, Jing D @ 2015-10-27  5:55 UTC (permalink / raw)
  To: Liang, Cunming, dev

Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Chen, Jing D
> Sent: Tuesday, October 27, 2015 1:44 PM
> To: Liang, Cunming; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter
> function
> 
> Hi, Steve,
> 
> Best Regards,
> Mark
> 
> 
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Tuesday, October 27, 2015 1:28 PM
> > To: Chen, Jing D; dev@dpdk.org
> > Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> > Subject: Re: [PATCH v2 08/16] fm10k: add Vector RX scatter function
> >
> > Hi,
> >
> > On 10/22/2015 5:44 PM, Chen Jing D(Mark) wrote:
> > > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> > >
> > > Add func fm10k_recv_scattered_pkts_vec to receive chained packets
> > > with SSE instructions.
> > >
> > > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > > ---
> > >   drivers/net/fm10k/fm10k.h          |    2 +
> > >   drivers/net/fm10k/fm10k_rxtx_vec.c |   88
> > ++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 90 insertions(+), 0 deletions(-)
> > >
> > [...]
> > > +
> > > +/*
> > > + * vPMD receive routine that reassembles scattered packets
> > > + *
> > > + * Notice:
> > > + * - don't support ol_flags for rss and csum err
> > > + * - nb_pkts < RTE_IXGBE_DESCS_PER_LOOP, just return no packet
> > > + * - nb_pkts > RTE_IXGBE_MAX_RX_BURST, only scan
> > RTE_IXGBE_MAX_RX_BURST
> > > + *   numbers of DD bit
> > In order to make sure nb_pkts > RTE_IXGBE_MAX_RX_BURST, it's
> necessary
> > to do RTE_MIN().

My bad. You mean nb_pkts should be less than or equal to RTE_IXGBE_MAX_RX_BURST.
I'll change accordingly.

> 
> I'll remove the improper comments. In func fm10k_recv_raw_pkts_vec, it
> will use
> nb_pkts as index to iterate properly.
> After then, below func will use actual received packet size nb_bufs as index
> to iterate.
> So, I think RTE_MIN() is not necessary?
> 
> > > + * - floor align nb_pkts to a RTE_IXGBE_DESC_PER_LOOP power-of-two
> > > + */
> > > +uint16_t
> > > +fm10k_recv_scattered_pkts_vec(void *rx_queue,
> > > +				struct rte_mbuf **rx_pkts,
> > > +				uint16_t nb_pkts)
> > > +{
> > > +	struct fm10k_rx_queue *rxq = rx_queue;
> > > +	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
> > > +	unsigned i = 0;
> > > +
> > > +	/* get some new buffers */
> > > +	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
> > > +			split_flags);
> > > +	if (nb_bufs == 0)
> > > +		return 0;
> > > +
> > > +	/* happy day case, full burst + no packets to be joined */
> > > +	const uint64_t *split_fl64 = (uint64_t *)split_flags;
> > > +	if (rxq->pkt_first_seg == NULL &&
> > > +			split_fl64[0] == 0 && split_fl64[1] == 0 &&
> > > +			split_fl64[2] == 0 && split_fl64[3] == 0)
> > > +		return nb_bufs;
> > > +
> > > +	/* reassemble any packets that need reassembly*/
> > > +	if (rxq->pkt_first_seg == NULL) {
> > > +		/* find the first split flag, and only reassemble then*/
> > > +		while (i < nb_bufs && !split_flags[i])
> > > +			i++;
> > > +		if (i == nb_bufs)
> > > +			return nb_bufs;
> > > +	}
> > > +	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
> > > +		&split_flags[i]);
> > > +}

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k
  2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
  2015-10-22 15:58       ` Stephen Hemminger
@ 2015-10-27  9:46       ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
                           ` (15 more replies)
  1 sibling, 16 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

v3:
 - Add a blank line after variable definitions.
 - Floor-align the passed-in argument nb_pkts to avoid memory overwrites.
 - Scan at most 32 descriptors in the scatter Rx function to avoid memory
   overwrites.

v2:
 - Fix a typo.
 - Fix an improper prefetch in the vector RX function, which prefetched an
   uninitialized mbuf.
 - Remove limitation on number of desc pointer in vector RX function.
 - Re-organize some comments.
 - Add a new patch to fix a crash issue in vector RX func.
 - Add a new patch to update release notes.

v1:
This patch set includes Vector Rx/Tx functions to receive/transmit packets
for fm10k devices. It also contains logic to do sanity check for proper
RX/TX function selections.

Chen Jing D(Mark) (16):
  fm10k: add new vPMD file
  fm10k: add vPMD pre-condition check for each RX queue
  fm10k: Add a new func to initialize all parameters
  fm10k: add func to re-allocate mbuf for RX ring
  fm10k: add 2 functions to parse pkt_type and offload flag
  fm10k: add Vector RX function
  fm10k: add func to do Vector RX condition check
  fm10k: add Vector RX scatter function
  fm10k: add function to decide best RX function
  fm10k: add func to release mbuf in case Vector RX applied
  fm10k: add Vector TX function
  fm10k: use func pointer to reset TX queue and mbuf release
  fm10k: introduce 2 funcs to reset TX queue and mbuf release
  fm10k: Add function to decide best TX func
  fm10k: fix a crash issue in vector RX func
  doc: release notes update for fm10k Vector PMD

 doc/guides/rel_notes/release_2_2.rst |    5 +
 drivers/net/fm10k/Makefile           |    1 +
 drivers/net/fm10k/fm10k.h            |   45 ++-
 drivers/net/fm10k/fm10k_ethdev.c     |  169 ++++++-
 drivers/net/fm10k/fm10k_rxtx_vec.c   |  839 ++++++++++++++++++++++++++++++++++
 5 files changed, 1031 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 01/16] fm10k: add new vPMD file
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 02/16] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
                           ` (14 subsequent siblings)
  15 siblings, 1 reply; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add new file fm10k_rxtx_vec.c and add it to the build.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/Makefile         |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   45 ++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

diff --git a/drivers/net/fm10k/Makefile b/drivers/net/fm10k/Makefile
index a4a8f56..06ebf83 100644
--- a/drivers/net/fm10k/Makefile
+++ b/drivers/net/fm10k/Makefile
@@ -93,6 +93,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx_vec.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
new file mode 100644
index 0000000..69174d9
--- /dev/null
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -0,0 +1,45 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <inttypes.h>
+
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#include "fm10k.h"
+#include "base/fm10k_type.h"
+
+#include <tmmintrin.h>
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 02/16] fm10k: add vPMD pre-condition check for each RX queue
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 03/16] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
                           ` (13 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add a condition check in the rx_queue_setup function. If the number of RX
descriptors doesn't satisfy the vPMD requirement, record that in a variable;
otherwise call fm10k_rxq_vec_setup to initialize Vector RX.
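
For reference, a minimal sketch of the kind of precondition being checked,
i.e. whether the descriptor count is a power of two. is_power_of_2() below is
a hypothetical stand-in written for this example; the patch itself calls
rte_is_power_of_2(), whose handling of 0 may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* A power of two has exactly one bit set, so clearing the lowest set
 * bit with n & (n - 1) must leave zero. Zero itself is rejected here.
 */
static bool
is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

A ring of 512 descriptors passes this check, while 384 does not and would
disable Vector RX for the whole port.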

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |   11 ++++++++---
 drivers/net/fm10k/fm10k_ethdev.c   |   11 +++++++++++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   21 +++++++++++++++++++++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c089882..362a2d0 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -135,6 +135,8 @@ struct fm10k_dev_info {
 	/* Protect the mailbox to avoid race condition */
 	rte_spinlock_t    mbx_lock;
 	struct fm10k_macvlan_filter_info    macvlan;
+	/* Flag to indicate if RX vector conditions satisfied */
+	bool rx_vec_allowed;
 };
 
 /*
@@ -165,9 +167,10 @@ struct fm10k_rx_queue {
 	struct rte_mempool *mp;
 	struct rte_mbuf **sw_ring;
 	volatile union fm10k_rx_desc *hw_ring;
-	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
-	struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+	struct rte_mbuf *pkt_first_seg; /* First segment of current packet. */
+	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
+	uint64_t mbuf_initializer; /* value to init mbufs */
 	uint16_t next_dd;
 	uint16_t next_alloc;
 	uint16_t next_trigger;
@@ -177,7 +180,7 @@ struct fm10k_rx_queue {
 	uint16_t queue_id;
 	uint8_t port_id;
 	uint8_t drop_en;
-	uint8_t rx_deferred_start; /**< don't start this queue in dev start. */
+	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
 };
 
 /*
@@ -313,4 +316,6 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
 
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
+
+int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a69c990..3c7784e 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1251,6 +1251,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	const struct rte_eth_rxconf *conf, struct rte_mempool *mp)
 {
 	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
 	struct fm10k_rx_queue *q;
 	const struct rte_memzone *mz;
 
@@ -1333,6 +1334,16 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->hw_ring_phys_addr = mz->phys_addr;
 #endif
 
+	/* Check if number of descs satisfied Vector requirement */
+	if (!rte_is_power_of_2(nb_desc)) {
+		PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
+				    "preconditions - canceling the feature for "
+				    "the whole port[%d]",
+			     q->queue_id, q->port_id);
+		dev_info->rx_vec_allowed = false;
+	} else
+		fm10k_rxq_vec_setup(q);
+
 	dev->data->rx_queues[queue_id] = q;
 	return 0;
 }
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 69174d9..34b677b 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -43,3 +43,24 @@
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
+
+int __attribute__((cold))
+fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	/* data_off will be adjusted after new mbuf allocated for 512-byte
+	 * alignment.
+	 */
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+	return 0;
+}
-- 
1.7.7.6
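
As an illustration of the mbuf_initializer idea in fm10k_rxq_vec_setup
above: the per-queue constant fields are packed once into a 64-bit value,
which can later be written into each freshly allocated mbuf with a single
8-byte store. This is only a sketch under stated assumptions. struct
toy_mbuf is a hypothetical, simplified layout (not the real struct rte_mbuf),
TOY_HEADROOM assumes a headroom of 128 bytes, and memcpy() is used instead
of the pointer cast in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HEADROOM 128 /* assumed headroom, stands in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical mbuf: four constant fields laid out as one 8-byte region. */
struct toy_mbuf {
	void *buf_addr;
	uint16_t data_off; /* start of the 8-byte rearm region */
	uint16_t refcnt;
	uint8_t nb_segs;
	uint8_t port;
	uint16_t pad;      /* rounds the region up to 8 bytes */
	uint32_t pkt_len;
};

#define TOY_REARM_OFF offsetof(struct toy_mbuf, data_off)

/* Build the 64-bit template once per queue. */
static uint64_t
toy_make_initializer(uint8_t port_id)
{
	struct toy_mbuf def;
	uint64_t init;

	memset(&def, 0, sizeof(def));
	def.data_off = TOY_HEADROOM;
	def.refcnt = 1;
	def.nb_segs = 1;
	def.port = port_id;
	memcpy(&init, (const char *)&def + TOY_REARM_OFF, sizeof(init));
	return init;
}

/* Re-initialize the four fields of a recycled mbuf in one 8-byte copy. */
static void
toy_rearm(struct toy_mbuf *m, uint64_t init)
{
	memcpy((char *)m + TOY_REARM_OFF, &init, sizeof(init));
}
```

The template is built once at queue setup, so the per-packet rearm path does
not need to set the four fields one by one.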

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 03/16] fm10k: Add a new func to initialize all parameters
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 02/16] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
                           ` (12 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add new function fm10k_params_init to initialize all fm10k related
variables.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_ethdev.c |   35 +++++++++++++++++++++++------------
 1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 3c7784e..363ef98 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2066,6 +2066,27 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rss_hash_conf_get	= fm10k_rss_hash_conf_get,
 };
 
+static void
+fm10k_params_init(struct rte_eth_dev *dev)
+{
+	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+
+	/* Initialize bus info. Normally we would call fm10k_get_bus_info(), but
+	 * there is no way to get link status without reading BAR4.  Until this
+	 * works, assume we have maximum bandwidth.
+	 * @todo - fix bus info
+	 */
+	hw->bus_caps.speed = fm10k_bus_speed_8000;
+	hw->bus_caps.width = fm10k_bus_width_pcie_x8;
+	hw->bus_caps.payload = fm10k_bus_payload_512;
+	hw->bus.speed = fm10k_bus_speed_8000;
+	hw->bus.width = fm10k_bus_width_pcie_x8;
+	hw->bus.payload = fm10k_bus_payload_256;
+
+	info->rx_vec_allowed = true;
+}
+
 static int
 eth_fm10k_dev_init(struct rte_eth_dev *dev)
 {
@@ -2112,18 +2133,8 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 		return -EIO;
 	}
 
-	/*
-	 * Inialize bus info. Normally we would call fm10k_get_bus_info(), but
-	 * there is no way to get link status without reading BAR4.  Until this
-	 * works, assume we have maximum bandwidth.
-	 * @todo - fix bus info
-	 */
-	hw->bus_caps.speed = fm10k_bus_speed_8000;
-	hw->bus_caps.width = fm10k_bus_width_pcie_x8;
-	hw->bus_caps.payload = fm10k_bus_payload_512;
-	hw->bus.speed = fm10k_bus_speed_8000;
-	hw->bus.width = fm10k_bus_width_pcie_x8;
-	hw->bus.payload = fm10k_bus_payload_256;
+	/* Initialize parameters */
+	fm10k_params_init(dev);
 
 	/* Initialize the hw */
 	diag = fm10k_init_hw(hw);
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (2 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 03/16] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-28 13:58           ` Liang, Cunming
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 05/16] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
                           ` (11 subsequent siblings)
  15 siblings, 1 reply; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add function fm10k_rxq_rearm to re-allocate mbufs for used descriptors
in the RX HW ring.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    9 ++++
 drivers/net/fm10k/fm10k_ethdev.c   |    3 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   90 ++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 362a2d0..5df7960 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -123,6 +123,12 @@
 #define FM10K_VFTA_BIT(vlan_id)    (1 << ((vlan_id) & 0x1F))
 #define FM10K_VFTA_IDX(vlan_id)    ((vlan_id) >> 5)
 
+#define RTE_FM10K_RXQ_REARM_THRESH      32
+#define RTE_FM10K_VPMD_TX_BURST         32
+#define RTE_FM10K_MAX_RX_BURST          RTE_FM10K_RXQ_REARM_THRESH
+#define RTE_FM10K_TX_MAX_FREE_BUF_SZ    64
+#define RTE_FM10K_DESCS_PER_LOOP    4
+
 struct fm10k_macvlan_filter_info {
 	uint16_t vlan_num;       /* Total VLAN number */
 	uint16_t mac_num;        /* Total mac number */
@@ -178,6 +184,9 @@ struct fm10k_rx_queue {
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
 	uint16_t queue_id;
+	/* Below 2 fields only valid in case vPMD is applied. */
+	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
+	uint16_t rxrearm_start;  /* the idx we start the re-arming from */
 	uint8_t port_id;
 	uint8_t drop_en;
 	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 363ef98..44c3d34 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -121,6 +121,9 @@ rx_queue_reset(struct fm10k_rx_queue *q)
 	q->next_alloc = 0;
 	q->next_trigger = q->alloc_thresh - 1;
 	FM10K_PCI_REG_WRITE(q->tail_ptr, q->nb_desc - 1);
+	q->rxrearm_start = 0;
+	q->rxrearm_nb = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 34b677b..75533f9 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -64,3 +64,93 @@ fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 	rxq->mbuf_initializer = *(uint64_t *)p;
 	return 0;
 }
+
+static inline void
+fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
+{
+	int i;
+	uint16_t rx_id;
+	volatile union fm10k_rx_desc *rxdp;
+	struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
+	struct rte_mbuf *mb0, *mb1;
+	__m128i head_off = _mm_set_epi64x(
+			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1,
+			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1);
+	__m128i dma_addr0, dma_addr1;
+	/* Rx buffers need to be aligned to 512 bytes */
+	const __m128i hba_msk = _mm_set_epi64x(0,
+				UINT64_MAX - FM10K_RX_DATABUF_ALIGN + 1);
+
+	rxdp = rxq->hw_ring + rxq->rxrearm_start;
+
+	/* Pull 'n' more MBUFs into the software ring */
+	if (rte_mempool_get_bulk(rxq->mp,
+				 (void *)mb_alloc,
+				 RTE_FM10K_RXQ_REARM_THRESH) < 0) {
+		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_FM10K_RXQ_REARM_THRESH;
+		return;
+	}
+
+	/* Initialize the mbufs in vector, process 2 mbufs in one loop */
+	for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i += 2, mb_alloc += 2) {
+		__m128i vaddr0, vaddr1;
+		uintptr_t p0, p1;
+
+		mb0 = mb_alloc[0];
+		mb1 = mb_alloc[1];
+
+		/* Flush mbuf with pkt template.
+		 * Data to be rearmed is 6 bytes long.
+		 * RX will overwrite the ol_flags field that follows it
+		 * anyway, so overwrite the whole 8 bytes with one store:
+		 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
+		 */
+		p0 = (uintptr_t)&mb0->rearm_data;
+		*(uint64_t *)p0 = rxq->mbuf_initializer;
+		p1 = (uintptr_t)&mb1->rearm_data;
+		*(uint64_t *)p1 = rxq->mbuf_initializer;
+
+		/* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
+		vaddr0 = _mm_loadu_si128((__m128i *)&(mb0->buf_addr));
+		vaddr1 = _mm_loadu_si128((__m128i *)&(mb1->buf_addr));
+
+		/* convert pa to dma_addr hdr/data */
+		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
+		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
+
+		/* add headroom to pa values */
+		dma_addr0 = _mm_add_epi64(dma_addr0, head_off);
+		dma_addr1 = _mm_add_epi64(dma_addr1, head_off);
+
+		/* Align to 512 bytes to satisfy the HW requirement and, at the
+		 * same time, set Header Buffer Address to zero.
+		 */
+		dma_addr0 = _mm_and_si128(dma_addr0, hba_msk);
+		dma_addr1 = _mm_and_si128(dma_addr1, hba_msk);
+
+		/* flush desc with pa dma_addr */
+		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr0);
+		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr1);
+
+		/* enforce 512B alignment on default Rx virtual addresses */
+		mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr
+				+ RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
+				- (char *)mb0->buf_addr);
+		mb1->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb1->buf_addr
+				+ RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
+				- (char *)mb1->buf_addr);
+	}
+
+	rxq->rxrearm_start += RTE_FM10K_RXQ_REARM_THRESH;
+	if (rxq->rxrearm_start >= rxq->nb_desc)
+		rxq->rxrearm_start = 0;
+
+	rxq->rxrearm_nb -= RTE_FM10K_RXQ_REARM_THRESH;
+
+	rx_id = (uint16_t) ((rxq->rxrearm_start == 0) ?
+			     (rxq->nb_desc - 1) : (rxq->rxrearm_start - 1));
+
+	/* Update the tail pointer on the NIC */
+	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread
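
Editor's note on the alignment arithmetic in fm10k_rxq_rearm above: the SSE
code adds RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1 and then masks
with UINT64_MAX - FM10K_RX_DATABUF_ALIGN + 1, which rounds the address up to
the next 512-byte boundary. A minimal scalar sketch of the same computation
follows. HEADROOM and BUF_ALIGN are hypothetical stand-ins for the real
macros, and the helper names are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for RTE_PKTMBUF_HEADROOM and
 * FM10K_RX_DATABUF_ALIGN; the real values come from the DPDK headers.
 */
#define HEADROOM   128u
#define BUF_ALIGN  512u   /* must be a power of two */

/* Round (addr + HEADROOM) up to the next BUF_ALIGN boundary.
 * ~(BUF_ALIGN - 1) equals UINT64_MAX - BUF_ALIGN + 1, the mask the
 * patch keeps in the low 64 bits of hba_msk.
 */
static uint64_t
rx_aligned_addr(uint64_t buf_addr)
{
	return (buf_addr + HEADROOM + BUF_ALIGN - 1) &
		~(uint64_t)(BUF_ALIGN - 1);
}

/* Offset of the aligned data start from the start of the buffer,
 * i.e. the value the patch stores into data_off.
 */
static uint16_t
rx_data_off(uint64_t buf_addr)
{
	return (uint16_t)(rx_aligned_addr(buf_addr) - buf_addr);
}
```

Note that the patch derives the DMA address from buf_physaddr and data_off
from buf_addr separately; the sketch applies one helper to whichever address
is passed in.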

* [dpdk-dev] [PATCH v3 05/16] fm10k: add 2 functions to parse pkt_type and offload flag
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (3 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 06/16] fm10k: add Vector RX function Chen Jing D(Mark)
                           ` (10 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add 2 functions that use SSE instructions to parse RX descriptors and
fill pkt_type and ol_flags in the mbuf.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |  127 ++++++++++++++++++++++++++++++++++++
 1 files changed, 127 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 75533f9..581a309 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,133 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+/* Handling the offload flags (olflags) field takes computation
+ * time when receiving packets. Therefore we provide a flag to disable
+ * the processing of the olflags field when they are not needed. This
+ * gives improved performance, at the cost of losing the offload info
+ * in the received packet
+ */
+#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+
+/* Vlan present flag shift */
+#define VP_SHIFT     (2)
+/* L3 type shift */
+#define L3TYPE_SHIFT     (4)
+/* L4 type shift */
+#define L4TYPE_SHIFT     (7)
+
+static inline void
+fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+	__m128i ptype0, ptype1, vtag0, vtag1;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	const __m128i pkttype_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT,
+			PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT);
+
+	/* mask everything except rss type */
+	const __m128i rsstype_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x000F, 0x000F, 0x000F, 0x000F);
+
+	/* map rss type to rss hash flag */
+	const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
+			0, 0, 0, PKT_RX_RSS_HASH,
+			PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH, 0,
+			PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, 0);
+
+	ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+	ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+	vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+	vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);
+
+	ptype0 = _mm_unpacklo_epi32(ptype0, ptype1);
+	ptype0 = _mm_and_si128(ptype0, rsstype_msk);
+	ptype0 = _mm_shuffle_epi8(rss_flags, ptype0);
+
+	vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+	vtag1 = _mm_srli_epi16(vtag1, VP_SHIFT);
+	vtag1 = _mm_and_si128(vtag1, pkttype_msk);
+
+	vtag1 = _mm_or_si128(ptype0, vtag1);
+	vol.dword = _mm_cvtsi128_si64(vtag1);
+
+	rx_pkts[0]->ol_flags = vol.e[0];
+	rx_pkts[1]->ol_flags = vol.e[1];
+	rx_pkts[2]->ol_flags = vol.e[2];
+	rx_pkts[3]->ol_flags = vol.e[3];
+}
+
+static inline void
+fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+	__m128i l3l4type0, l3l4type1, l3type, l4type;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	/* L3 pkt type mask  Bit4 to Bit6 */
+	const __m128i l3type_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x0070, 0x0070, 0x0070, 0x0070);
+
+	/* L4 pkt type mask  Bit7 to Bit9 */
+	const __m128i l4type_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x0380, 0x0380, 0x0380, 0x0380);
+
+	/* convert RRC l3 type to mbuf format */
+	const __m128i l3type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
+			0, 0, 0, RTE_PTYPE_L3_IPV6_EXT,
+			RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV4_EXT,
+			RTE_PTYPE_L3_IPV4, 0);
+
+	/* Convert RRC l4 type to mbuf format. l4type_flags entries are
+	 * shifted right by 8 bits so that each fits into an 8-bit lane.
+	 */
+	const __m128i l4type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, 0,
+			RTE_PTYPE_TUNNEL_GENEVE >> 8,
+			RTE_PTYPE_TUNNEL_NVGRE >> 8,
+			RTE_PTYPE_TUNNEL_VXLAN >> 8,
+			RTE_PTYPE_TUNNEL_GRE >> 8,
+			RTE_PTYPE_L4_UDP >> 8,
+			RTE_PTYPE_L4_TCP >> 8,
+			0);
+
+	l3l4type0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+	l3l4type1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+	l3l4type0 = _mm_unpacklo_epi32(l3l4type0, l3l4type1);
+
+	l3type = _mm_and_si128(l3l4type0, l3type_msk);
+	l4type = _mm_and_si128(l3l4type0, l4type_msk);
+
+	l3type = _mm_srli_epi16(l3type, L3TYPE_SHIFT);
+	l4type = _mm_srli_epi16(l4type, L4TYPE_SHIFT);
+
+	l3type = _mm_shuffle_epi8(l3type_flags, l3type);
+	/* l4type_flags was shifted right by 8 bits, so shift left to restore */
+	l4type = _mm_shuffle_epi8(l4type_flags, l4type);
+
+	l4type = _mm_slli_epi16(l4type, 8);
+	l3l4type0 = _mm_or_si128(l3type, l4type);
+	vol.dword = _mm_cvtsi128_si64(l3l4type0);
+
+	rx_pkts[0]->packet_type = vol.e[0];
+	rx_pkts[1]->packet_type = vol.e[1];
+	rx_pkts[2]->packet_type = vol.e[2];
+	rx_pkts[3]->packet_type = vol.e[3];
+}
+#else
+#define fm10k_desc_to_olflags_v(desc, rx_pkts) do {} while (0)
+#define fm10k_desc_to_pktype_v(desc, rx_pkts) do {} while (0)
+#endif
+
 int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread
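
Editor's note on the table lookup in fm10k_desc_to_olflags_v above:
_mm_shuffle_epi8 uses each masked RSS type as an index into the 16-byte
rss_flags vector. Because _mm_set_epi8 lists its arguments from byte 15 down
to byte 0, the table in index order is the patch's argument list reversed. A
scalar sketch of that lookup, under the assumption that RSS_HASH_FLAG is a
hypothetical stand-in for PKT_RX_RSS_HASH (the helper name is also made up):

```c
#include <stdint.h>

/* Hypothetical stand-in for PKT_RX_RSS_HASH; only its position in the
 * table matters for this illustration.
 */
#define RSS_HASH_FLAG 0x02u

/* rss_flags in index order (byte 0 first), i.e. the _mm_set_epi8
 * argument list from the patch read right to left.
 */
static const uint8_t rss_flags_lut[16] = {
	0, RSS_HASH_FLAG, RSS_HASH_FLAG, RSS_HASH_FLAG,
	0, RSS_HASH_FLAG, 0, RSS_HASH_FLAG,
	RSS_HASH_FLAG, 0, 0, 0,
	0, 0, 0, 0
};

/* Mask the low 4 bits (rsstype_msk) and look the flag up, which is what
 * the and + shuffle pair does for four descriptors at once.
 */
static uint8_t
rss_type_to_flag(uint16_t desc_word)
{
	return rss_flags_lut[desc_word & 0x000F];
}
```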

* [dpdk-dev] [PATCH v3 06/16] fm10k: add Vector RX function
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (4 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 05/16] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 07/16] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
                           ` (9 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_recv_raw_pkts_vec to parse raw packets, which may
include chained packets.
Add func fm10k_recv_pkts_vec to receive single-mbuf packets.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  201 ++++++++++++++++++++++++++++++++++++
 2 files changed, 202 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5df7960..f04ba2c 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -327,4 +327,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 581a309..6127cf9 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -281,3 +281,204 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 	/* Update the tail pointer on the NIC */
 	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
+
+static inline uint16_t
+fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts, uint8_t *split_packet)
+{
+	volatile union fm10k_rx_desc *rxdp;
+	struct rte_mbuf **mbufp;
+	uint16_t nb_pkts_recd;
+	int pos;
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint64_t var;
+	__m128i shuf_msk;
+	__m128i dd_check, eop_check;
+	uint16_t next_dd;
+
+	next_dd = rxq->next_dd;
+
+	/* Just the act of getting into the function from the application is
+	 * going to cost about 7 cycles
+	 */
+	rxdp = rxq->hw_ring + next_dd;
+
+	_mm_prefetch((const void *)rxdp, _MM_HINT_T0);
+
+	/* See if we need to rearm the RX queue - gives the prefetch a bit
+	 * of time to act
+	 */
+	if (rxq->rxrearm_nb > RTE_FM10K_RXQ_REARM_THRESH)
+		fm10k_rxq_rearm(rxq);
+
+	/* Before we start moving massive data around, check to see if
+	 * there is actually a packet available
+	 */
+	if (!(rxdp->d.staterr & FM10K_RXD_STATUS_DD))
+		return 0;
+
+	/* Vector RX will process 4 packets at a time, strip the unaligned
+	 * tail in case nb_pkts is not a multiple of 4.
+	 */
+	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_FM10K_DESCS_PER_LOOP);
+
+	/* 4 packets DD mask */
+	dd_check = _mm_set_epi64x(0x0000000100000001LL, 0x0000000100000001LL);
+
+	/* 4 packets EOP mask */
+	eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
+
+	/* mask to shuffle from desc. to mbuf */
+	shuf_msk = _mm_set_epi8(
+		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
+		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
+		13, 12,      /* octet 12~13, 16 bits data_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
+		13, 12,      /* octet 12~13, low 16 bits pkt_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+		0xFF, 0xFF   /* Skip pkt_type field in shuffle operation */
+		);
+
+	/* Cache is empty -> need to scan the buffer rings, but first move
+	 * the next 'n' mbufs into the cache
+	 */
+	mbufp = &rxq->sw_ring[next_dd];
+
+	/* A. load 4 packets in one loop
+	 * [A*. mask out 4 unused dirty fields in desc]
+	 * B. copy 4 mbuf pointers from swring to rx_pkts
+	 * C. calc the number of DD bits among the 4 packets
+	 * [C*. extract the end-of-packet bit, if requested]
+	 * D. fill info. from desc to mbuf
+	 */
+	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
+			pos += RTE_FM10K_DESCS_PER_LOOP,
+			rxdp += RTE_FM10K_DESCS_PER_LOOP) {
+		__m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
+		__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
+		__m128i zero, staterr, sterr_tmp1, sterr_tmp2;
+		__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */
+
+		/* B.1 load 2 mbuf pointers */
+		mbp1 = _mm_loadu_si128((__m128i *)&mbufp[pos]);
+
+		/* Read desc statuses backwards to avoid race condition */
+		/* A.1 load 4 pkts desc */
+		descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+
+		/* B.2 copy 2 mbuf pointers into rx_pkts  */
+		_mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
+
+		/* B.1 load 2 mbuf pointers */
+		mbp2 = _mm_loadu_si128((__m128i *)&mbufp[pos+2]);
+
+		descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		/* A.1 load the remaining 2 pkts desc */
+		descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));
+
+		/* B.2 copy 2 mbuf pointers into rx_pkts  */
+		_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
+
+		/* avoid compiler reorder optimization */
+		rte_compiler_barrier();
+
+		if (split_packet) {
+			rte_prefetch0(&rx_pkts[pos]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 1]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 2]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 3]->cacheline1);
+		}
+
+		/* D.1 pkt 3,4 convert format from desc to pktmbuf */
+		pkt_mb4 = _mm_shuffle_epi8(descs0[3], shuf_msk);
+		pkt_mb3 = _mm_shuffle_epi8(descs0[2], shuf_msk);
+
+		/* C.1 4=>2 filter staterr info only */
+		sterr_tmp2 = _mm_unpackhi_epi32(descs0[3], descs0[2]);
+		/* C.1 4=>2 filter staterr info only */
+		sterr_tmp1 = _mm_unpackhi_epi32(descs0[1], descs0[0]);
+
+		/* set ol_flags with vlan packet type */
+		fm10k_desc_to_olflags_v(descs0, &rx_pkts[pos]);
+
+		/* D.1 pkt 1,2 convert format from desc to pktmbuf */
+		pkt_mb2 = _mm_shuffle_epi8(descs0[1], shuf_msk);
+		pkt_mb1 = _mm_shuffle_epi8(descs0[0], shuf_msk);
+
+		/* C.2 get 4 pkts staterr value  */
+		zero = _mm_xor_si128(dd_check, dd_check);
+		staterr = _mm_unpacklo_epi32(sterr_tmp1, sterr_tmp2);
+
+		/* D.3 copy final 3,4 data to rx_pkts */
+		_mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
+				pkt_mb4);
+		_mm_storeu_si128((void *)&rx_pkts[pos+2]->rx_descriptor_fields1,
+				pkt_mb3);
+
+		/* C* extract and record EOP bit */
+		if (split_packet) {
+			__m128i eop_shuf_mask = _mm_set_epi8(
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0x04, 0x0C, 0x00, 0x08
+					);
+
+			/* and with mask to extract bits, flipping 1-0 */
+			__m128i eop_bits = _mm_andnot_si128(staterr, eop_check);
+			/* the staterr values are not in order, as the
+			 * count of dd bits doesn't care. However, for end of
+			 * packet tracking, we do care, so shuffle. This also
+			 * compresses the 32-bit values to 8-bit
+			 */
+			eop_bits = _mm_shuffle_epi8(eop_bits, eop_shuf_mask);
+			/* store the resulting 32-bit value */
+			*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
+			split_packet += RTE_FM10K_DESCS_PER_LOOP;
+
+			/* zero-out next pointers */
+			rx_pkts[pos]->next = NULL;
+			rx_pkts[pos + 1]->next = NULL;
+			rx_pkts[pos + 2]->next = NULL;
+			rx_pkts[pos + 3]->next = NULL;
+		}
+
+		/* C.3 calc available number of desc */
+		staterr = _mm_and_si128(staterr, dd_check);
+		staterr = _mm_packs_epi32(staterr, zero);
+
+		/* D.3 copy final 1,2 data to rx_pkts */
+		_mm_storeu_si128((void *)&rx_pkts[pos+1]->rx_descriptor_fields1,
+				pkt_mb2);
+		_mm_storeu_si128((void *)&rx_pkts[pos]->rx_descriptor_fields1,
+				pkt_mb1);
+
+		fm10k_desc_to_pktype_v(descs0, &rx_pkts[pos]);
+
+		/* C.4 calc available number of desc */
+		var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
+		nb_pkts_recd += var;
+		if (likely(var != RTE_FM10K_DESCS_PER_LOOP))
+			break;
+	}
+
+	/* Update our internal tail pointer */
+	rxq->next_dd = (uint16_t)(rxq->next_dd + nb_pkts_recd);
+	rxq->next_dd = (uint16_t)(rxq->next_dd & (rxq->nb_desc - 1));
+	rxq->rxrearm_nb = (uint16_t)(rxq->rxrearm_nb + nb_pkts_recd);
+
+	return nb_pkts_recd;
+}
+
+/* vPMD receive routine
+ *
+ * Notice:
+ * - doesn't support ol_flags for rss and csum err
+ */
+uint16_t
+fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts)
+{
+	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread
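
Editor's note on steps C.1 to C.4 of fm10k_recv_raw_pkts_vec above: the loop
masks the DD bit of four staterr words, packs the four results into one
64-bit value and counts the set bits; fewer than four ends the burst. A
scalar sketch of that counting, plus the RTE_ALIGN_FLOOR step applied to
nb_pkts. The names below are illustrative only, and DD_BIT is assumed to be
bit 0, as the dd_check constant suggests.

```c
#include <stdint.h>

#define DESCS_PER_LOOP 4     /* mirrors RTE_FM10K_DESCS_PER_LOOP */
#define DD_BIT         0x1u  /* assumed from the dd_check constant */

/* Pack the DD bit of each descriptor into its own 16-bit lane, then
 * count the set bits. The vector code does the same with
 * _mm_and_si128, _mm_packs_epi32 and __builtin_popcountll.
 */
static unsigned
count_done(const uint32_t staterr[DESCS_PER_LOOP])
{
	uint64_t packed = 0;
	unsigned i, n = 0;

	for (i = 0; i < DESCS_PER_LOOP; i++)
		packed |= (uint64_t)(staterr[i] & DD_BIT) << (16 * i);

	/* portable popcount: clear the lowest set bit until none remain */
	while (packed != 0) {
		packed &= packed - 1;
		n++;
	}
	return n;
}

/* Round the requested burst down to a multiple of DESCS_PER_LOOP. */
static uint16_t
burst_floor(uint16_t nb_pkts)
{
	return (uint16_t)(nb_pkts & ~(DESCS_PER_LOOP - 1));
}
```

One consequence worth keeping in mind: a request for fewer than 4 packets is
floored to 0, so the loop body never runs and the function returns 0.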

* [dpdk-dev] [PATCH v3 07/16] fm10k: add func to do Vector RX condition check
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (5 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 06/16] fm10k: add Vector RX function Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
                           ` (8 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_rx_vec_condition_check to check if Vector RX
func can be applied.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index f04ba2c..1502ae3 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -327,5 +327,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 6127cf9..2e6f1a2 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -172,6 +172,37 @@ fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 #endif
 
 int __attribute__((cold))
+fm10k_rx_vec_condition_check(struct rte_eth_dev *dev)
+{
+#ifndef RTE_LIBRTE_IEEE1588
+	struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+	struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;
+
+#ifndef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+	/* without rx ol_flags, no VP flag report */
+	if (rxmode->hw_vlan_extend != 0)
+		return -1;
+#endif
+
+	/* no fdir support */
+	if (fconf->mode != RTE_FDIR_MODE_NONE)
+		return -1;
+
+	/* - no csum error report support
+	 * - no header split support
+	 */
+	if (rxmode->hw_ip_checksum == 1 ||
+	    rxmode->header_split == 1)
+		return -1;
+
+	return 0;
+#else
+	RTE_SET_USED(dev);
+	return -1;
+#endif
+}
+
+int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
 	uintptr_t p;
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 08/16] fm10k: add Vector RX scatter function
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (6 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 07/16] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-28 14:30           ` Liang, Cunming
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 09/16] fm10k: add function to decide best RX function Chen Jing D(Mark)
                           ` (7 subsequent siblings)
  15 siblings, 1 reply; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_recv_scattered_pkts_vec to receive chained packets
with SSE instructions.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    2 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 1502ae3..06697fa 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,4 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
+uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
+					uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 2e6f1a2..3fd5d45 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -513,3 +513,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 {
 	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }
+
+static inline uint16_t
+fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
+		struct rte_mbuf **rx_bufs,
+		uint16_t nb_bufs, uint8_t *split_flags)
+{
+	struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished pkts*/
+	struct rte_mbuf *start = rxq->pkt_first_seg;
+	struct rte_mbuf *end =  rxq->pkt_last_seg;
+	unsigned pkt_idx, buf_idx;
+
+
+	for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
+		if (end != NULL) {
+			/* processing a split packet */
+			end->next = rx_bufs[buf_idx];
+			start->nb_segs++;
+			start->pkt_len += rx_bufs[buf_idx]->data_len;
+			end = end->next;
+
+			if (!split_flags[buf_idx]) {
+				/* it's the last packet of the set */
+				start->hash = end->hash;
+				start->ol_flags = end->ol_flags;
+				pkts[pkt_idx++] = start;
+				start = end = NULL;
+			}
+		} else {
+			/* not processing a split packet */
+			if (!split_flags[buf_idx]) {
+				/* not a split packet, save and skip */
+				pkts[pkt_idx++] = rx_bufs[buf_idx];
+				continue;
+			}
+			end = start = rx_bufs[buf_idx];
+		}
+	}
+
+	/* save the partial packet for next time */
+	rxq->pkt_first_seg = start;
+	rxq->pkt_last_seg = end;
+	memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
+	return pkt_idx;
+}
+
+/*
+ * vPMD receive routine that reassembles scattered packets
+ *
+ * Notice:
+ * - doesn't support ol_flags for rss and csum err
+ * - if nb_pkts > RTE_FM10K_MAX_RX_BURST, only RTE_FM10K_MAX_RX_BURST
+ *   DD bits are scanned
+ */
+uint16_t
+fm10k_recv_scattered_pkts_vec(void *rx_queue,
+				struct rte_mbuf **rx_pkts,
+				uint16_t nb_pkts)
+{
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
+	unsigned i = 0;
+
+	/* split_flags can hold at most RTE_FM10K_MAX_RX_BURST entries */
+	nb_pkts = RTE_MIN(nb_pkts, RTE_FM10K_MAX_RX_BURST);
+	/* get some new buffers */
+	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
+			split_flags);
+	if (nb_bufs == 0)
+		return 0;
+
+	/* happy day case, full burst + no packets to be joined */
+	const uint64_t *split_fl64 = (uint64_t *)split_flags;
+	if (rxq->pkt_first_seg == NULL &&
+			split_fl64[0] == 0 && split_fl64[1] == 0 &&
+			split_fl64[2] == 0 && split_fl64[3] == 0)
+		return nb_bufs;
+
+	/* reassemble any packets that need reassembly */
+	if (rxq->pkt_first_seg == NULL) {
+		/* find the first split flag, and only reassemble from there */
+		while (i < nb_bufs && !split_flags[i])
+			i++;
+		if (i == nb_bufs)
+			return nb_bufs;
+	}
+	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
+		&split_flags[i]);
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread
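
Editor's note on fm10k_reassemble_packets above: each buffer either extends
the chain in progress or starts a new one, and a cleared split flag marks the
last segment of a packet. The sketch below shows that state machine on a
cut-down structure. struct seg and struct chain_state are hypothetical
stand-ins for struct rte_mbuf and the pkt_first_seg/pkt_last_seg queue
fields. Unlike the patch, it compacts the output array in place instead of
using a temporary array plus memcpy, and it omits the hash and ol_flags copy
from the last segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct rte_mbuf. */
struct seg {
	struct seg *next;
	uint32_t pkt_len;
	uint16_t data_len;
	uint8_t nb_segs;
};

/* Partial packet carried over from one burst to the next. */
struct chain_state {
	struct seg *first;
	struct seg *last;
};

/* Chain split buffers into packets. Returns the number of complete
 * packets, stored at the front of bufs[]. Writing bufs[out] is safe
 * because out never runs ahead of the read index.
 */
static unsigned
reassemble(struct chain_state *st, struct seg **bufs, unsigned nb_bufs,
	   const uint8_t *split_flags)
{
	struct seg *start = st->first, *end = st->last;
	unsigned in, out = 0;

	for (in = 0; in < nb_bufs; in++) {
		struct seg *cur = bufs[in];

		if (end != NULL) {
			/* continue the packet in progress */
			end->next = cur;
			start->nb_segs++;
			start->pkt_len += cur->data_len;
			end = cur;
		} else {
			/* first segment of a new packet */
			start = end = cur;
		}

		if (!split_flags[in]) {
			/* last segment: the packet is complete */
			bufs[out++] = start;
			start = end = NULL;
		}
	}

	/* save the partial packet for next time */
	st->first = start;
	st->last = end;
	return out;
}
```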

* [dpdk-dev] [PATCH v3 09/16] fm10k: add function to decide best RX function
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (7 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 10/16] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
                           ` (6 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_set_rx_function to decide the best RX func in
fm10k_dev_rx_init.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    1 +
 drivers/net/fm10k/fm10k_ethdev.c |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 06697fa..8614e81 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -187,6 +187,7 @@ struct fm10k_rx_queue {
 	/* Below 2 fields only valid in case vPMD is applied. */
 	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
 	uint16_t rxrearm_start;  /* the idx we start the re-arming from */
+	uint16_t rx_using_sse; /* indicates that vector RX is in use */
 	uint8_t port_id;
 	uint8_t drop_en;
 	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 44c3d34..4690a0c 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -67,6 +67,7 @@ static void
 fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
+static void fm10k_set_rx_function(struct rte_eth_dev *dev);
 
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -462,7 +463,6 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
 			dev->data->dev_conf.rxmode.enable_scatter) {
 			uint32_t reg;
 			dev->data->scattered_rx = 1;
-			dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
 			reg = FM10K_READ_REG(hw, FM10K_SRRCTL(i));
 			reg |= FM10K_SRRCTL_BUFFER_CHAINING_EN;
 			FM10K_WRITE_REG(hw, FM10K_SRRCTL(i), reg);
@@ -478,6 +478,9 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
 
 	/* Configure RSS if applicable */
 	fm10k_dev_mq_rx_configure(dev);
+
+	/* Decide the best RX function */
+	fm10k_set_rx_function(dev);
 	return 0;
 }
 
@@ -2069,6 +2072,34 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rss_hash_conf_get	= fm10k_rss_hash_conf_get,
 };
 
+static void __attribute__((cold))
+fm10k_set_rx_function(struct rte_eth_dev *dev)
+{
+	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+	uint16_t i, rx_using_sse;
+
+	/* In order to allow Vector Rx there are a few configuration
+	 * conditions to be met.
+	 */
+	if (!fm10k_rx_vec_condition_check(dev) && dev_info->rx_vec_allowed) {
+		if (dev->data->scattered_rx)
+			dev->rx_pkt_burst = fm10k_recv_scattered_pkts_vec;
+		else
+			dev->rx_pkt_burst = fm10k_recv_pkts_vec;
+	} else if (dev->data->scattered_rx)
+		dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+
+	rx_using_sse =
+		(dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
+		dev->rx_pkt_burst == fm10k_recv_pkts_vec);
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];
+		rxq->rx_using_sse = rx_using_sse;
+	}
+
+}
+
 static void
 fm10k_params_init(struct rte_eth_dev *dev)
 {
@@ -2103,9 +2134,6 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
 
-	if (dev->data->scattered_rx)
-		dev->rx_pkt_burst = &fm10k_recv_scattered_pkts;
-
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 10/16] fm10k: add func to release mbuf in case Vector RX applied
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (8 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 09/16] fm10k: add function to decide best RX function Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 11/16] fm10k: add Vector TX function Chen Jing D(Mark)
                           ` (5 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Vector RX uses different variables to track the RX HW ring, so a
separate function is needed to release the mbufs properly.
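
As a rough sketch of the idea (not the driver code), the release walks the software ring from the next descriptor to be checked up to the re-arm start, wrapping with a mask. The struct and function names are hypothetical, and the ring size is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the RX queue; only the fields the sketch needs. */
struct demo_rxq {
	void **sw_ring;        /* one slot per descriptor */
	uint16_t nb_desc;      /* ring size, assumed to be a power of two */
	uint16_t next_dd;      /* first slot still holding a buffer */
	uint16_t rearm_start;  /* first slot waiting to be re-armed */
};

/* Clear every slot from next_dd up to (not including) rearm_start,
 * wrapping around the ring, and return how many slots were cleared.
 * A real driver would free each buffer instead of only clearing the slot.
 */
static unsigned
demo_release_range(struct demo_rxq *q)
{
	const unsigned mask = q->nb_desc - 1u;
	unsigned i, n = 0;

	for (i = q->next_dd; i != q->rearm_start; i = (i + 1) & mask) {
		q->sw_ring[i] = NULL;
		n++;
	}
	return n;
}
```

Note that this sketch does nothing when the two indices are equal, so a caller has to treat that case separately.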

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_ethdev.c   |    6 ++++++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   18 ++++++++++++++++++
 3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8614e81..c5e66e2 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,6 +329,7 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
+void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 4690a0c..a46a349 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -143,6 +143,12 @@ rx_queue_clean(struct fm10k_rx_queue *q)
 	for (i = 0; i < q->nb_desc; ++i)
 		q->hw_ring[i] = zero;
 
+	/* vPMD driver has a different way of releasing mbufs. */
+	if (q->rx_using_sse) {
+		fm10k_rx_queue_release_mbufs_vec(q);
+		return;
+	}
+
 	/* free software buffers */
 	for (i = 0; i < q->nb_desc; ++i) {
 		if (q->sw_ring[i]) {
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 3fd5d45..ea85996 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -313,6 +313,24 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
 
+void __attribute__((cold))
+fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq)
+{
+	const unsigned mask = rxq->nb_desc - 1;
+	unsigned i;
+
+	if (rxq->sw_ring == NULL || rxq->rxrearm_nb >= rxq->nb_desc)
+		return;
+
+	/* free all mbufs that are valid in the ring */
+	for (i = rxq->next_dd; i != rxq->rxrearm_start; i = (i + 1) & mask)
+		rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+	rxq->rxrearm_nb = rxq->nb_desc;
+
+	/* set all entries to NULL */
+	memset(rxq->sw_ring, 0, sizeof(rxq->sw_ring[0]) * rxq->nb_desc);
+}
+
 static inline uint16_t
 fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts, uint8_t *split_packet)
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 11/16] fm10k: add Vector TX function
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (9 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 10/16] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 12/16] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
                           ` (4 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add Vector TX func fm10k_xmit_pkts_vec to transmit packets.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    5 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  150 ++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c5e66e2..0a4c174 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -215,6 +215,9 @@ struct fm10k_tx_queue {
 	uint16_t nb_used;
 	uint16_t free_thresh;
 	uint16_t rs_thresh;
+	/* Below 2 fields only valid in case vPMD is applied. */
+	uint16_t next_rs; /* Next pos to set RS flag */
+	uint16_t next_dd; /* Next pos to check DD flag */
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
 	uint8_t port_id;
@@ -333,4 +336,6 @@ void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
+uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index ea85996..e802eec 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -619,3 +619,153 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
 		&split_flags[i]);
 }
+
+static inline void
+vtx1(volatile struct fm10k_tx_desc *txdp,
+		struct rte_mbuf *pkt, uint64_t flags)
+{
+	__m128i descriptor = _mm_set_epi64x(flags << 56 |
+			pkt->vlan_tci << 16 | pkt->data_len,
+			MBUF_DMA_ADDR(pkt));
+	_mm_store_si128((__m128i *)txdp, descriptor);
+}
+
+static inline void
+vtx(volatile struct fm10k_tx_desc *txdp,
+		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
+{
+	int i;
+
+	for (i = 0; i < nb_pkts; ++i, ++txdp, ++pkt)
+		vtx1(txdp, *pkt, flags);
+}
+
+static inline int __attribute__((always_inline))
+fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
+{
+	struct rte_mbuf **txep;
+	uint8_t flags;
+	uint32_t n;
+	uint32_t i;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[RTE_FM10K_TX_MAX_FREE_BUF_SZ];
+
+	/* check DD bit on threshold descriptor */
+	flags = txq->hw_ring[txq->next_dd].flags;
+	if (!(flags & FM10K_TXD_FLAG_DONE))
+		return 0;
+
+	n = txq->rs_thresh;
+
+	/* First buffer to free from S/W ring is at index
+	 * next_dd - (rs_thresh-1)
+	 */
+	txep = &txq->sw_ring[txq->next_dd - (n - 1)];
+	m = __rte_pktmbuf_prefree_seg(txep[0]);
+	if (likely(m != NULL)) {
+		free[0] = m;
+		nb_free = 1;
+		for (i = 1; i < n; i++) {
+			m = __rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool))
+					free[nb_free++] = m;
+				else {
+					rte_mempool_put_bulk(free[0]->pool,
+							(void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+		}
+		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	} else {
+		for (i = 1; i < n; i++) {
+			m = __rte_pktmbuf_prefree_seg(txep[i]);
+			if (m != NULL)
+				rte_mempool_put(m->pool, m);
+		}
+	}
+
+	/* buffers were freed, update counters */
+	txq->nb_free = (uint16_t)(txq->nb_free + txq->rs_thresh);
+	txq->next_dd = (uint16_t)(txq->next_dd + txq->rs_thresh);
+	if (txq->next_dd >= txq->nb_desc)
+		txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+
+	return txq->rs_thresh;
+}
+
+static inline void __attribute__((always_inline))
+tx_backlog_entry(struct rte_mbuf **txep,
+		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i;
+
+	for (i = 0; i < (int)nb_pkts; ++i)
+		txep[i] = tx_pkts[i];
+}
+
+uint16_t
+fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+			uint16_t nb_pkts)
+{
+	struct fm10k_tx_queue *txq = (struct fm10k_tx_queue *)tx_queue;
+	volatile struct fm10k_tx_desc *txdp;
+	struct rte_mbuf **txep;
+	uint16_t n, nb_commit, tx_id;
+	uint64_t flags = FM10K_TXD_FLAG_LAST;
+	uint64_t rs = FM10K_TXD_FLAG_RS | FM10K_TXD_FLAG_LAST;
+	int i;
+
+	/* crossing the rs_thresh boundary is not allowed */
+	nb_pkts = RTE_MIN(nb_pkts, txq->rs_thresh);
+
+	if (txq->nb_free < txq->free_thresh)
+		fm10k_tx_free_bufs(txq);
+
+	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_free, nb_pkts);
+	if (unlikely(nb_pkts == 0))
+		return 0;
+
+	tx_id = txq->next_free;
+	txdp = &txq->hw_ring[tx_id];
+	txep = &txq->sw_ring[tx_id];
+
+	txq->nb_free = (uint16_t)(txq->nb_free - nb_pkts);
+
+	n = (uint16_t)(txq->nb_desc - tx_id);
+	if (nb_commit >= n) {
+		tx_backlog_entry(txep, tx_pkts, n);
+
+		for (i = 0; i < n - 1; ++i, ++tx_pkts, ++txdp)
+			vtx1(txdp, *tx_pkts, flags);
+
+		vtx1(txdp, *tx_pkts++, rs);
+
+		nb_commit = (uint16_t)(nb_commit - n);
+
+		tx_id = 0;
+		txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+		/* avoid reaching the end of the ring */
+		txdp = &(txq->hw_ring[tx_id]);
+		txep = &txq->sw_ring[tx_id];
+	}
+
+	tx_backlog_entry(txep, tx_pkts, nb_commit);
+
+	vtx(txdp, tx_pkts, nb_commit, flags);
+
+	tx_id = (uint16_t)(tx_id + nb_commit);
+	if (tx_id > txq->next_rs) {
+		txq->hw_ring[txq->next_rs].flags |= FM10K_TXD_FLAG_RS;
+		txq->next_rs = (uint16_t)(txq->next_rs + txq->rs_thresh);
+	}
+
+	txq->next_free = tx_id;
+
+	FM10K_PCI_REG_WRITE(txq->tail_ptr, txq->next_free);
+
+	return nb_pkts;
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 12/16] fm10k: use func pointer to reset TX queue and mbuf release
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (10 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 11/16] fm10k: add Vector TX function Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 13/16] fm10k: introduce 2 funcs " Chen Jing D(Mark)
                           ` (3 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Vector TX manages the TX queue differently, so it needs different
functions to reset the TX queue and to release the mbufs held in it.
Introduce 2 function pointers for these operations.
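
A minimal sketch of the pattern, for illustration only: each queue carries a pointer to a table of operations, and callers go through that table instead of calling a fixed function. All `demo_` names and the `last_op` field are hypothetical; only the two member names follow the patch.

```c
struct demo_txq;

/* Per-queue operations; the two members mirror the ones named in the patch. */
struct demo_txq_ops {
	void (*release_mbufs)(struct demo_txq *txq);
	void (*reset)(struct demo_txq *txq);
};

enum demo_op {
	DEMO_NONE,
	DEMO_SCALAR_RELEASE,
	DEMO_SCALAR_RESET,
	DEMO_VEC_RELEASE,
	DEMO_VEC_RESET
};

struct demo_txq {
	const struct demo_txq_ops *ops;
	enum demo_op last_op;  /* hypothetical: records which handler ran */
};

static void demo_scalar_release(struct demo_txq *q) { q->last_op = DEMO_SCALAR_RELEASE; }
static void demo_scalar_reset(struct demo_txq *q) { q->last_op = DEMO_SCALAR_RESET; }
static void demo_vec_release(struct demo_txq *q) { q->last_op = DEMO_VEC_RELEASE; }
static void demo_vec_reset(struct demo_txq *q) { q->last_op = DEMO_VEC_RESET; }

/* Default table, installed at queue setup. */
static const struct demo_txq_ops demo_scalar_ops = {
	.release_mbufs = demo_scalar_release,
	.reset = demo_scalar_reset,
};

/* Alternative table, swapped in when the vector path is selected. */
static const struct demo_txq_ops demo_vec_ops = {
	.release_mbufs = demo_vec_release,
	.reset = demo_vec_reset,
};
```

A caller then writes `q->ops->reset(q)` and gets whichever implementation matches the TX path currently in use.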

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    9 +++++++++
 drivers/net/fm10k/fm10k_ethdev.c |   21 ++++++++++++++++-----
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 0a4c174..2bead12 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -204,11 +204,14 @@ struct fifo {
 	uint16_t *endp;
 };
 
+struct fm10k_txq_ops;
+
 struct fm10k_tx_queue {
 	struct rte_mbuf **sw_ring;
 	struct fm10k_tx_desc *hw_ring;
 	uint64_t hw_ring_phys_addr;
 	struct fifo rs_tracker;
+	const struct fm10k_txq_ops *ops; /* txq ops */
 	uint16_t last_free;
 	uint16_t next_free;
 	uint16_t nb_free;
@@ -225,6 +228,11 @@ struct fm10k_tx_queue {
 	uint16_t queue_id;
 };
 
+struct fm10k_txq_ops {
+	void (*release_mbufs)(struct fm10k_tx_queue *txq);
+	void (*reset)(struct fm10k_tx_queue *txq);
+};
+
 #define MBUF_DMA_ADDR(mb) \
 	((uint64_t) ((mb)->buf_physaddr + (mb)->data_off))
 
@@ -338,4 +346,5 @@ uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
 uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
+void fm10k_txq_vec_setup(struct fm10k_tx_queue *txq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a46a349..88bd887 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -292,6 +292,11 @@ tx_queue_disable(struct fm10k_hw *hw, uint16_t qnum)
 	return 0;
 }
 
+static const struct fm10k_txq_ops def_txq_ops = {
+	.release_mbufs = tx_queue_free,
+	.reset = tx_queue_reset,
+};
+
 static int
 fm10k_dev_configure(struct rte_eth_dev *dev)
 {
@@ -571,7 +576,8 @@ fm10k_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	PMD_INIT_FUNC_TRACE();
 
 	if (tx_queue_id < dev->data->nb_tx_queues) {
-		tx_queue_reset(dev->data->tx_queues[tx_queue_id]);
+		struct fm10k_tx_queue *q = dev->data->tx_queues[tx_queue_id];
+		q->ops->reset(q);
 
 		/* reset head and tail pointers */
 		FM10K_WRITE_REG(hw, FM10K_TDH(tx_queue_id), 0);
@@ -837,8 +843,10 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
 	PMD_INIT_FUNC_TRACE();
 
 	if (dev->data->tx_queues) {
-		for (i = 0; i < dev->data->nb_tx_queues; i++)
-			fm10k_tx_queue_release(dev->data->tx_queues[i]);
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			struct fm10k_tx_queue *txq = dev->data->tx_queues[i];
+			txq->ops->release_mbufs(txq);
+		}
 	}
 
 	if (dev->data->rx_queues) {
@@ -1454,7 +1462,8 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	 * different socket than was previously used.
 	 */
 	if (dev->data->tx_queues[queue_id] != NULL) {
-		tx_queue_free(dev->data->tx_queues[queue_id]);
+		struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];
+		txq->ops->release_mbufs(txq);
 		dev->data->tx_queues[queue_id] = NULL;
 	}
 
@@ -1470,6 +1479,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->nb_desc = nb_desc;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
+	q->ops = &def_txq_ops;
 	q->tail_ptr = (volatile uint32_t *)
 		&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
 	if (handle_txconf(q, conf))
@@ -1528,9 +1538,10 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 static void
 fm10k_tx_queue_release(void *queue)
 {
+	struct fm10k_tx_queue *q = queue;
 	PMD_INIT_FUNC_TRACE();
 
-	tx_queue_free(queue);
+	q->ops->release_mbufs(q);
 }
 
 static int
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 13/16] fm10k: introduce 2 funcs to reset TX queue and mbuf release
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (11 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 12/16] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 14/16] fm10k: Add function to decide best TX func Chen Jing D(Mark)
                           ` (2 subsequent siblings)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add 2 functions to reset the TX queue and release mbufs when Vector TX
is applied.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |   68 ++++++++++++++++++++++++++++++++++++
 1 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index e802eec..7ef7910 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,11 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+static void
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
+static void
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
+
 /* Handling the offload flags (olflags) field takes computation
  * time when receiving packets. Therefore we provide a flag to disable
  * the processing of the olflags field when they are not needed. This
@@ -620,6 +625,17 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 		&split_flags[i]);
 }
 
+static const struct fm10k_txq_ops vec_txq_ops = {
+	.release_mbufs = fm10k_tx_queue_release_mbufs_vec,
+	.reset = fm10k_reset_tx_queue,
+};
+
+void __attribute__((cold))
+fm10k_txq_vec_setup(struct fm10k_tx_queue *txq)
+{
+	txq->ops = &vec_txq_ops;
+}
+
 static inline void
 vtx1(volatile struct fm10k_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
@@ -769,3 +785,55 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return nb_pkts;
 }
+
+static void __attribute__((cold))
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
+{
+	unsigned i;
+	const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
+
+	if (txq->sw_ring == NULL || txq->nb_free == max_desc)
+		return;
+
+	/* release the used mbufs in sw_ring */
+	for (i = txq->next_dd - (txq->rs_thresh - 1);
+	     i != txq->next_free;
+	     i = (i + 1) & max_desc)
+		rte_pktmbuf_free_seg(txq->sw_ring[i]);
+
+	txq->nb_free = max_desc;
+
+	/* reset tx_entry */
+	for (i = 0; i < txq->nb_desc; i++)
+		txq->sw_ring[i] = NULL;
+
+	rte_free(txq->sw_ring);
+	txq->sw_ring = NULL;
+}
+
+static void __attribute__((cold))
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
+{
+	static const struct fm10k_tx_desc zeroed_desc = {0};
+	struct rte_mbuf **txe = txq->sw_ring;
+	uint16_t i;
+
+	/* Zero out HW ring memory */
+	for (i = 0; i < txq->nb_desc; i++)
+		txq->hw_ring[i] = zeroed_desc;
+
+	/* Initialize SW ring entries */
+	for (i = 0; i < txq->nb_desc; i++)
+		txe[i] = NULL;
+
+	txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+	txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+	txq->next_free = 0;
+	txq->nb_used = 0;
+	/* Always allow 1 descriptor to be un-allocated to avoid
+	 * a H/W race condition
+	 */
+	txq->nb_free = (uint16_t)(txq->nb_desc - 1);
+	FM10K_PCI_REG_WRITE(txq->tail_ptr, 0);
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 14/16] fm10k: Add function to decide best TX func
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (12 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 13/16] fm10k: introduce 2 funcs " Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 15/16] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 16/16] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add function fm10k_set_tx_function to choose the best TX function; it
is called from fm10k_dev_tx_init.
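
The rule applied by the patch can be sketched as follows, for illustration only: the vector TX path is chosen only if every queue was configured with all of the "simple TX" flag bits. The flag values below are placeholders, not the real ETH_TXQ_FLAGS_* constants, and the function name is hypothetical.

```c
#include <stdint.h>

/* Placeholder flag values for the sketch; the real constants come from DPDK. */
#define DEMO_NOMULTSEGS 0x0001u
#define DEMO_NOOFFLOADS 0x0f00u
#define DEMO_SIMPLE_TX  (DEMO_NOMULTSEGS | DEMO_NOOFFLOADS)

/* Return 1 only when every queue has all DEMO_SIMPLE_TX bits set.
 * With zero queues the answer is 1, since no queue rules the vector path out.
 */
static int
demo_can_use_vector_tx(const uint32_t *txq_flags, unsigned nb_queues)
{
	unsigned i;

	for (i = 0; i < nb_queues; i++)
		if ((txq_flags[i] & DEMO_SIMPLE_TX) != DEMO_SIMPLE_TX)
			return 0;
	return 1;
}
```

Comparing the masked value against the full mask, rather than testing it for non-zero, is what makes a queue with only some of the bits set fall back to the scalar path.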

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    1 +
 drivers/net/fm10k/fm10k_ethdev.c |   38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 2bead12..68ae1b8 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -222,6 +222,7 @@ struct fm10k_tx_queue {
 	uint16_t next_rs; /* Next pos to set RS flag */
 	uint16_t next_dd; /* Next pos to check DD flag */
 	volatile uint32_t *tail_ptr;
+	uint32_t txq_flags; /* Holds flags for this TXq */
 	uint16_t nb_desc;
 	uint8_t port_id;
 	uint8_t tx_deferred_start; /** < don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 88bd887..469bd85 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -53,6 +53,9 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)
 
+#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+				ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -68,6 +71,7 @@ fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
 static void fm10k_set_rx_function(struct rte_eth_dev *dev);
+static void fm10k_set_tx_function(struct rte_eth_dev *dev);
 
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -414,6 +418,10 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
 				base_addr >> (CHAR_BIT * sizeof(uint32_t)));
 		FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size);
 	}
+
+	/* set up vector or scalar TX function as appropriate */
+	fm10k_set_tx_function(dev);
+
 	return 0;
 }
 
@@ -980,8 +988,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		},
 		.tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
 		.tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
-		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-				ETH_TXQ_FLAGS_NOOFFLOADS,
+		.txq_flags = FM10K_SIMPLE_TX_FLAG,
 	};
 
 }
@@ -1479,6 +1486,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->nb_desc = nb_desc;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
+	q->txq_flags = conf->txq_flags;
 	q->ops = &def_txq_ops;
 	q->tail_ptr = (volatile uint32_t *)
 		&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
@@ -2090,6 +2098,32 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 };
 
 static void __attribute__((cold))
+fm10k_set_tx_function(struct rte_eth_dev *dev)
+{
+	struct fm10k_tx_queue *txq;
+	int i;
+	int use_sse = 1;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		if ((txq->txq_flags & FM10K_SIMPLE_TX_FLAG) != \
+			FM10K_SIMPLE_TX_FLAG) {
+			use_sse = 0;
+			break;
+		}
+	}
+
+	if (use_sse) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			fm10k_txq_vec_setup(txq);
+		}
+		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+	} else
+		dev->tx_pkt_burst = fm10k_xmit_pkts;
+}
+
+static void __attribute__((cold))
 fm10k_set_rx_function(struct rte_eth_dev *dev)
 {
 	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 15/16] fm10k: fix a crash issue in vector RX func
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (13 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 14/16] fm10k: Add function to decide best TX func Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 16/16] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

The Vector RX function processes 4 packets at a time. When the RX ring
wraps around at the tail and the number of remaining descriptors is not
a multiple of 4, SW overwrites memory that does not belong to it, which
causes a crash. The fix allocates 4 additional HW/SW entries at the tail
to avoid the overwrite.
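
To illustrate the idea (a toy model, not the driver code), a ring that is always accessed 4 entries at a time can be padded with 4 extra zeroed entries, so that a group starting near the end never leaves the allocation. The names are hypothetical, and one status byte stands in for a whole descriptor.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BURST 4  /* entries touched per step, as in the commit message */

/* Allocate nb_desc real entries plus DEMO_BURST zeroed padding entries. */
static uint8_t *
demo_ring_alloc(uint16_t nb_desc)
{
	return calloc((size_t)nb_desc + DEMO_BURST, sizeof(uint8_t));
}

/* Examine the DEMO_BURST entries starting at pos and count those marked
 * done (non-zero). All four are read even when pos is the last real entry,
 * which is why the padding is needed.
 */
static unsigned
demo_scan4(const uint8_t *ring, uint16_t pos)
{
	unsigned i, done = 0;

	for (i = 0; i < DEMO_BURST; i++)
		done += ring[pos + i] != 0;
	return done;
}
```

Because the padding entries stay zero, they never look like completed descriptors, so the group at the end of the ring reports only the real ones.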

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    4 ++++
 drivers/net/fm10k/fm10k_ethdev.c |   19 +++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 68ae1b8..82a548f 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -177,12 +177,16 @@ struct fm10k_rx_queue {
 	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
 	uint64_t mbuf_initializer; /* value to init mbufs */
+	/* need to alloc dummy mbuf, for wraparound when scanning hw ring */
+	struct rte_mbuf fake_mbuf;
 	uint16_t next_dd;
 	uint16_t next_alloc;
 	uint16_t next_trigger;
 	uint16_t alloc_thresh;
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
+	/* Number of faked desc added at the tail for Vector RX function */
+	uint16_t nb_fake_desc;
 	uint16_t queue_id;
 	/* Below 2 fields only valid in case vPMD is applied. */
 	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 469bd85..fb8ec0d 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -102,6 +102,7 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
 static inline int
 rx_queue_reset(struct fm10k_rx_queue *q)
 {
+	static const union fm10k_rx_desc zero = {{0}};
 	uint64_t dma_addr;
 	int i, diag;
 	PMD_INIT_FUNC_TRACE();
@@ -122,6 +123,15 @@ rx_queue_reset(struct fm10k_rx_queue *q)
 		q->hw_ring[i].q.hdr_addr = dma_addr;
 	}
 
+	/* initialize extra software ring entries. Space for these extra
+	 * entries is always allocated.
+	 */
+	memset(&q->fake_mbuf, 0x0, sizeof(q->fake_mbuf));
+	for (i = 0; i < q->nb_fake_desc; ++i) {
+		q->sw_ring[q->nb_desc + i] = &q->fake_mbuf;
+		q->hw_ring[q->nb_desc + i] = zero;
+	}
+
 	q->next_dd = 0;
 	q->next_alloc = 0;
 	q->next_trigger = q->alloc_thresh - 1;
@@ -147,6 +157,10 @@ rx_queue_clean(struct fm10k_rx_queue *q)
 	for (i = 0; i < q->nb_desc; ++i)
 		q->hw_ring[i] = zero;
 
+	/* zero faked descriptors */
+	for (i = 0; i < q->nb_fake_desc; ++i)
+		q->hw_ring[q->nb_desc + i] = zero;
+
 	/* vPMD driver has a different way of releasing mbufs. */
 	if (q->rx_using_sse) {
 		fm10k_rx_queue_release_mbufs_vec(q);
@@ -1323,6 +1337,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	/* setup queue */
 	q->mp = mp;
 	q->nb_desc = nb_desc;
+	q->nb_fake_desc = FM10K_MULT_RX_DESC;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
 	q->tail_ptr = (volatile uint32_t *)
@@ -1332,8 +1347,8 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 
 	/* allocate memory for the software ring */
 	q->sw_ring = rte_zmalloc_socket("fm10k sw ring",
-					nb_desc * sizeof(struct rte_mbuf *),
-					RTE_CACHE_LINE_SIZE, socket_id);
+			(nb_desc + q->nb_fake_desc) * sizeof(struct rte_mbuf *),
+			RTE_CACHE_LINE_SIZE, socket_id);
 	if (q->sw_ring == NULL) {
 		PMD_INIT_LOG(ERR, "Cannot allocate software ring");
 		rte_free(q);
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v3 16/16] doc: release notes update for fm10k Vector PMD
  2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                           ` (14 preceding siblings ...)
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 15/16] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
@ 2015-10-27  9:46         ` Chen Jing D(Mark)
  15 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-27  9:46 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Update 2.2 release notes, add descriptions for Vector PMD implementation
in fm10k driver.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 doc/guides/rel_notes/release_2_2.rst |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 9a70dae..44a3f74 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -39,6 +39,11 @@ Drivers
 
   Fixed issue with libvirt ``virsh destroy`` not killing the VM.
 
+* **fm10k:  Add Vector Rx/Tx implementation.**
+
+  This patch set includes Vector Rx/Tx functions to receive/transmit packets
+  for fm10k devices. It also contains logic to do sanity check for proper
+  RX/TX function selections.
 
 Libraries
 ~~~~~~~~~
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
@ 2015-10-28 13:58           ` Liang, Cunming
  2015-10-29  5:24             ` Chen, Jing D
  0 siblings, 1 reply; 109+ messages in thread
From: Liang, Cunming @ 2015-10-28 13:58 UTC (permalink / raw)
  To: Chen Jing D(Mark), dev

Hi Mark,

On 10/27/2015 5:46 PM, Chen Jing D(Mark) wrote:
> From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
>
> Add function fm10k_rxq_rearm to re-allocate mbuf for used desc
> in RX HW ring.
>
> Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> ---
>   drivers/net/fm10k/fm10k.h          |    9 ++++
>   drivers/net/fm10k/fm10k_ethdev.c   |    3 +
>   drivers/net/fm10k/fm10k_rxtx_vec.c |   90 ++++++++++++++++++++++++++++++++++++
>   3 files changed, 102 insertions(+), 0 deletions(-)
[...]
> +static inline void
> +fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
> +{
> +	int i;
> +	uint16_t rx_id;
> +	volatile union fm10k_rx_desc *rxdp;
> +	struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
> +	struct rte_mbuf *mb0, *mb1;
> +	__m128i head_off = _mm_set_epi64x(
> +			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1,
> +			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1);
> +	__m128i dma_addr0, dma_addr1;
> +	/* Rx buffer need to be aligned with 512 byte */
> +	const __m128i hba_msk = _mm_set_epi64x(0,
> +				UINT64_MAX - FM10K_RX_DATABUF_ALIGN + 1);
> +
> +	rxdp = rxq->hw_ring + rxq->rxrearm_start;
> +
> +	/* Pull 'n' more MBUFs into the software ring */
> +	if (rte_mempool_get_bulk(rxq->mp,
> +				 (void *)mb_alloc,
> +				 RTE_FM10K_RXQ_REARM_THRESH) < 0) {
Here's one potential issue when this failure happens. As the tail won't be
updated, the head will eventually equal the tail. HW won't write back
anyway; however, the SW recv_raw_pkts_vec only checks the DD bit, so an old
'dirty' descriptor (whose DD bit has not been cleared) will be taken, and SW
will keep moving forward to check the next ones, even beyond the tail. I'm
sorry I didn't catch it the first time. /Steve
> +		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
> +			RTE_FM10K_RXQ_REARM_THRESH;
> +		return;
> +	}
> +
> +	
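
The concern raised in the reply above can be pictured with a toy model (hypothetical names, not driver code). One flag per entry stands in for the DD bit. If re-arming fails, the flags left over from the previous lap stay set, and a scan that looks only at DD treats them as newly received packets.

```c
#include <stdint.h>

/* Count consecutive entries with the "descriptor done" flag set, starting at
 * head and wrapping around, looking at no more than nb_desc entries.
 */
static unsigned
demo_count_done(const uint8_t *dd, unsigned nb_desc, unsigned head)
{
	unsigned n = 0;

	while (n < nb_desc && dd[(head + n) % nb_desc])
		n++;
	return n;
}

/* What a successful re-arm does in this model: clear the flag on count
 * entries starting at start, so they cannot be mistaken for new packets.
 */
static void
demo_rearm(uint8_t *dd, unsigned nb_desc, unsigned start, unsigned count)
{
	unsigned i;

	for (i = 0; i < count; i++)
		dd[(start + i) % nb_desc] = 0;
}
```

In this model a ring whose flags are all still set reports every entry as ready, although the hardware has written nothing new; once demo_rearm has run over the same range, the scan reports none.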

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v3 08/16] fm10k: add Vector RX scatter function
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
@ 2015-10-28 14:30           ` Liang, Cunming
  2015-10-29  5:27             ` Chen, Jing D
  0 siblings, 1 reply; 109+ messages in thread
From: Liang, Cunming @ 2015-10-28 14:30 UTC (permalink / raw)
  To: Chen Jing D(Mark), dev

Hi Mark,

On 10/27/2015 5:46 PM, Chen Jing D(Mark) wrote:
> From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
>
> Add func fm10k_recv_scattered_pkts_vec to receive chained packets
> with SSE instructions.
>
> Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> ---
>   drivers/net/fm10k/fm10k.h          |    2 +
>   drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
>   2 files changed, 90 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> index 1502ae3..06697fa 100644
> --- a/drivers/net/fm10k/fm10k.h
> +++ b/drivers/net/fm10k/fm10k.h
> @@ -329,4 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
>   int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
>   uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
> +uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
> +					uint16_t);
>   #endif
> diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
> index 2e6f1a2..3fd5d45 100644
> --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> @@ -513,3 +513,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>   {
>   	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
>   }
> +
> +static inline uint16_t
> +fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
> +		struct rte_mbuf **rx_bufs,
> +		uint16_t nb_bufs, uint8_t *split_flags)
> +{
> +	struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished pkts*/
> +	struct rte_mbuf *start = rxq->pkt_first_seg;
> +	struct rte_mbuf *end =  rxq->pkt_last_seg;
> +	unsigned pkt_idx, buf_idx;
> +
> +
> +	for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
> +		if (end != NULL) {
> +			/* processing a split packet */
> +			end->next = rx_bufs[buf_idx];
> +			start->nb_segs++;
> +			start->pkt_len += rx_bufs[buf_idx]->data_len;
> +			end = end->next;
> +
> +			if (!split_flags[buf_idx]) {
> +				/* it's the last packet of the set */
> +				start->hash = end->hash;
> +				start->ol_flags = end->ol_flags;
> +				pkts[pkt_idx++] = start;
> +				start = end = NULL;
> +			}
> +		} else {
> +			/* not processing a split packet */
> +			if (!split_flags[buf_idx]) {
> +				/* not a split packet, save and skip */
> +				pkts[pkt_idx++] = rx_bufs[buf_idx];
> +				continue;
> +			}
> +			end = start = rx_bufs[buf_idx];
> +		}
I guess you forgot to consider the crc_len during processing. /Steve

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
  2015-10-28 13:58           ` Liang, Cunming
@ 2015-10-29  5:24             ` Chen, Jing D
  2015-10-29  8:14               ` Liang, Cunming
  0 siblings, 1 reply; 109+ messages in thread
From: Chen, Jing D @ 2015-10-29  5:24 UTC (permalink / raw)
  To: Liang, Cunming, dev

Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: Liang, Cunming
> Sent: Wednesday, October 28, 2015 9:59 PM
> To: Chen, Jing D; dev@dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: Re: [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
> 
> Hi Mark,
> 
> On 10/27/2015 5:46 PM, Chen Jing D(Mark) wrote:
> > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> >
> > Add function fm10k_rxq_rearm to re-allocate mbuf for used desc
> > in RX HW ring.
> >
> > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > ---
> >   drivers/net/fm10k/fm10k.h          |    9 ++++
> >   drivers/net/fm10k/fm10k_ethdev.c   |    3 +
> >   drivers/net/fm10k/fm10k_rxtx_vec.c |   90
> ++++++++++++++++++++++++++++++++++++
> >   3 files changed, 102 insertions(+), 0 deletions(-)
> [...]
> > +static inline void
> > +fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
> > +{
> > +	int i;
> > +	uint16_t rx_id;
> > +	volatile union fm10k_rx_desc *rxdp;
> > +	struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
> > +	struct rte_mbuf *mb0, *mb1;
> > +	__m128i head_off = _mm_set_epi64x(
> > +			RTE_PKTMBUF_HEADROOM +
> FM10K_RX_DATABUF_ALIGN - 1,
> > +			RTE_PKTMBUF_HEADROOM +
> FM10K_RX_DATABUF_ALIGN - 1);
> > +	__m128i dma_addr0, dma_addr1;
> > +	/* Rx buffer need to be aligned with 512 byte */
> > +	const __m128i hba_msk = _mm_set_epi64x(0,
> > +				UINT64_MAX - FM10K_RX_DATABUF_ALIGN
> + 1);
> > +
> > +	rxdp = rxq->hw_ring + rxq->rxrearm_start;
> > +
> > +	/* Pull 'n' more MBUFs into the software ring */
> > +	if (rte_mempool_get_bulk(rxq->mp,
> > +				 (void *)mb_alloc,
> > +				 RTE_FM10K_RXQ_REARM_THRESH) < 0) {
> Here's one potential issue when the failure happens. As tail won't
> update, the head will equal to tail in the end. HW won't write back
> anyway, however the SW recv_raw_pkts_vec only check DD bit, the old
> 'dirty' descriptor(DD bit is not clean) will be taken and continue move
> forward to check the next which even beyond the tail. I'm sorry didn't
> catch it on the first time. /Steve

I have a different view on this. If mbuf allocation keeps failing and the tail equals the head,
then HW will stop sending new packets to the HW ring, as you pointed out. Then, once
mbufs can be allocated again, this function will refill the HW ring and update the tail. So, HW will
resume filling packets into the HW ring. Receive functions will continue to work. Anything I missed?

> > +		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed
> +=
> > +			RTE_FM10K_RXQ_REARM_THRESH;
> > +		return;
> > +	}
> > +
> > +

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v3 08/16] fm10k: add Vector RX scatter function
  2015-10-28 14:30           ` Liang, Cunming
@ 2015-10-29  5:27             ` Chen, Jing D
  2015-10-29  8:06               ` Liang, Cunming
  0 siblings, 1 reply; 109+ messages in thread
From: Chen, Jing D @ 2015-10-29  5:27 UTC (permalink / raw)
  To: Liang, Cunming, dev

Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: Liang, Cunming
> Sent: Wednesday, October 28, 2015 10:30 PM
> To: Chen, Jing D; dev@dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: Re: [PATCH v3 08/16] fm10k: add Vector RX scatter function
> 
> Hi Mark,
> 
> On 10/27/2015 5:46 PM, Chen Jing D(Mark) wrote:
> > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> >
> > Add func fm10k_recv_scattered_pkts_vec to receive chained packets
> > with SSE instructions.
> >
> > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > ---
> >   drivers/net/fm10k/fm10k.h          |    2 +
> >   drivers/net/fm10k/fm10k_rxtx_vec.c |   88
> ++++++++++++++++++++++++++++++++++++
> >   2 files changed, 90 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> > index 1502ae3..06697fa 100644
> > --- a/drivers/net/fm10k/fm10k.h
> > +++ b/drivers/net/fm10k/fm10k.h
> > @@ -329,4 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct
> rte_mbuf **tx_pkts,
> >   int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
> >   int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
> >   uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
> > +uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
> > +					uint16_t);
> >   #endif
> > diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c
> b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > index 2e6f1a2..3fd5d45 100644
> > --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> > +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > @@ -513,3 +513,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct
> rte_mbuf **rx_pkts,
> >   {
> >   	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts,
> NULL);
> >   }
> > +
> > +static inline uint16_t
> > +fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
> > +		struct rte_mbuf **rx_bufs,
> > +		uint16_t nb_bufs, uint8_t *split_flags)
> > +{
> > +	struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished
> pkts*/
> > +	struct rte_mbuf *start = rxq->pkt_first_seg;
> > +	struct rte_mbuf *end =  rxq->pkt_last_seg;
> > +	unsigned pkt_idx, buf_idx;
> > +
> > +
> > +	for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
> > +		if (end != NULL) {
> > +			/* processing a split packet */
> > +			end->next = rx_bufs[buf_idx];
> > +			start->nb_segs++;
> > +			start->pkt_len += rx_bufs[buf_idx]->data_len;
> > +			end = end->next;
> > +
> > +			if (!split_flags[buf_idx]) {
> > +				/* it's the last packet of the set */
> > +				start->hash = end->hash;
> > +				start->ol_flags = end->ol_flags;
> > +				pkts[pkt_idx++] = start;
> > +				start = end = NULL;
> > +			}
> > +		} else {
> > +			/* not processing a split packet */
> > +			if (!split_flags[buf_idx]) {
> > +				/* not a split packet, save and skip */
> > +				pkts[pkt_idx++] = rx_bufs[buf_idx];
> > +				continue;
> > +			}
> > +			end = start = rx_bufs[buf_idx];
> > +		}
> I guess you forgot to consider the crc_len during processing. /Steve

In fm10k, the CRC is always stripped and pkt_len/data_len carry the actual
data length. So, we needn't account for crc_len here. This is a little different
from IXGBE.
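
To illustrate the point above, here is a minimal sketch of the length
accounting during reassembly. It is not the driver code: struct seg and
chain_segments() are hypothetical stand-ins for struct rte_mbuf and the
reassembly loop. With crc_len set to 0 (the CRC-stripped case described
for fm10k) the total is simply the sum of the segment lengths; a non-zero
crc_len models a device that leaves the CRC in the buffer. The sketch
assumes n >= 1 and ignores the corner case where the last segment is
shorter than the CRC.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct rte_mbuf. */
struct seg {
	struct seg *next;
	uint32_t pkt_len;   /* total length, valid in the first segment */
	uint16_t data_len;  /* bytes held by this segment */
	uint16_t nb_segs;   /* segment count, valid in the first segment */
};

/* Chain 'n' segments (n >= 1) into one packet and return its head.
 * crc_len is subtracted once from the total; pass 0 when the hardware
 * has already stripped the CRC.
 */
static struct seg *
chain_segments(struct seg **segs, unsigned n, uint16_t crc_len)
{
	struct seg *start = segs[0];
	struct seg *end = start;
	unsigned i;

	start->nb_segs = 1;
	start->pkt_len = start->data_len;
	start->next = NULL;

	for (i = 1; i < n; i++) {
		end->next = segs[i];
		end = segs[i];
		end->next = NULL;
		start->nb_segs++;
		start->pkt_len += end->data_len;
	}

	start->pkt_len -= crc_len;
	return start;
}
```

With two segments of 2048 and 1000 bytes, crc_len = 0 gives a pkt_len of
3048, while crc_len = 4 gives 3044.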

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v3 08/16] fm10k: add Vector RX scatter function
  2015-10-29  5:27             ` Chen, Jing D
@ 2015-10-29  8:06               ` Liang, Cunming
  0 siblings, 0 replies; 109+ messages in thread
From: Liang, Cunming @ 2015-10-29  8:06 UTC (permalink / raw)
  To: Chen, Jing D, dev

Hi Mark,

> -----Original Message-----
> From: Chen, Jing D
> Sent: Thursday, October 29, 2015 1:28 PM
> To: Liang, Cunming; dev@dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: RE: [PATCH v3 08/16] fm10k: add Vector RX scatter function
> 
> Hi, Steve,
> 
> Best Regards,
> Mark
> 
> 
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Wednesday, October 28, 2015 10:30 PM
> > To: Chen, Jing D; dev@dpdk.org
> > Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> > Subject: Re: [PATCH v3 08/16] fm10k: add Vector RX scatter function
> >
> > Hi Mark,
> >
> > On 10/27/2015 5:46 PM, Chen Jing D(Mark) wrote:
> > > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> > >
> > > Add func fm10k_recv_scattered_pkts_vec to receive chained packets
> > > with SSE instructions.
> > >
> > > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > > ---
> > >   drivers/net/fm10k/fm10k.h          |    2 +
> > >   drivers/net/fm10k/fm10k_rxtx_vec.c |   88
> > ++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 90 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> > > index 1502ae3..06697fa 100644
> > > --- a/drivers/net/fm10k/fm10k.h
> > > +++ b/drivers/net/fm10k/fm10k.h
> > > @@ -329,4 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct
> > rte_mbuf **tx_pkts,
> > >   int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
> > >   int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
> > >   uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
> > > +uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
> > > +					uint16_t);
> > >   #endif
> > > diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c
> > b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > index 2e6f1a2..3fd5d45 100644
> > > --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > @@ -513,3 +513,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct
> > rte_mbuf **rx_pkts,
> > >   {
> > >   	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts,
> > NULL);
> > >   }
> > > +
> > > +static inline uint16_t
> > > +fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
> > > +		struct rte_mbuf **rx_bufs,
> > > +		uint16_t nb_bufs, uint8_t *split_flags)
> > > +{
> > > +	struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished
> > pkts*/
> > > +	struct rte_mbuf *start = rxq->pkt_first_seg;
> > > +	struct rte_mbuf *end =  rxq->pkt_last_seg;
> > > +	unsigned pkt_idx, buf_idx;
> > > +
> > > +
> > > +	for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
> > > +		if (end != NULL) {
> > > +			/* processing a split packet */
> > > +			end->next = rx_bufs[buf_idx];
> > > +			start->nb_segs++;
> > > +			start->pkt_len += rx_bufs[buf_idx]->data_len;
> > > +			end = end->next;
> > > +
> > > +			if (!split_flags[buf_idx]) {
> > > +				/* it's the last packet of the set */
> > > +				start->hash = end->hash;
> > > +				start->ol_flags = end->ol_flags;
> > > +				pkts[pkt_idx++] = start;
> > > +				start = end = NULL;
> > > +			}
> > > +		} else {
> > > +			/* not processing a split packet */
> > > +			if (!split_flags[buf_idx]) {
> > > +				/* not a split packet, save and skip */
> > > +				pkts[pkt_idx++] = rx_bufs[buf_idx];
> > > +				continue;
> > > +			}
> > > +			end = start = rx_bufs[buf_idx];
> > > +		}
> > I guess you forgot to consider the crc_len during processing. /Steve
> 
> In fm10k, crc is always be stripped and pkt_len/data_len carried actual
> data length. So, we needn't add crc_len back here.  This is a little different
> from IXGBE.
Ok, that's fine. /Steve

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
  2015-10-29  5:24             ` Chen, Jing D
@ 2015-10-29  8:14               ` Liang, Cunming
  2015-10-29  8:37                 ` Chen, Jing D
  0 siblings, 1 reply; 109+ messages in thread
From: Liang, Cunming @ 2015-10-29  8:14 UTC (permalink / raw)
  To: Chen, Jing D, dev

Hi Mark,


> -----Original Message-----
> From: Chen, Jing D
> Sent: Thursday, October 29, 2015 1:24 PM
> To: Liang, Cunming; dev@dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: RE: [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
> 
> Hi, Steve,
> 
> Best Regards,
> Mark
> 
> 
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Wednesday, October 28, 2015 9:59 PM
> > To: Chen, Jing D; dev@dpdk.org
> > Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> > Subject: Re: [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
> >
> > Hi Mark,
> >
> > On 10/27/2015 5:46 PM, Chen Jing D(Mark) wrote:
> > > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> > >
> > > Add function fm10k_rxq_rearm to re-allocate mbuf for used desc
> > > in RX HW ring.
> > >
> > > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > > ---
> > >   drivers/net/fm10k/fm10k.h          |    9 ++++
> > >   drivers/net/fm10k/fm10k_ethdev.c   |    3 +
> > >   drivers/net/fm10k/fm10k_rxtx_vec.c |   90
> > ++++++++++++++++++++++++++++++++++++
> > >   3 files changed, 102 insertions(+), 0 deletions(-)
> > [...]
> > > +static inline void
> > > +fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
> > > +{
> > > +	int i;
> > > +	uint16_t rx_id;
> > > +	volatile union fm10k_rx_desc *rxdp;
> > > +	struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
> > > +	struct rte_mbuf *mb0, *mb1;
> > > +	__m128i head_off = _mm_set_epi64x(
> > > +			RTE_PKTMBUF_HEADROOM +
> > FM10K_RX_DATABUF_ALIGN - 1,
> > > +			RTE_PKTMBUF_HEADROOM +
> > FM10K_RX_DATABUF_ALIGN - 1);
> > > +	__m128i dma_addr0, dma_addr1;
> > > +	/* Rx buffer need to be aligned with 512 byte */
> > > +	const __m128i hba_msk = _mm_set_epi64x(0,
> > > +				UINT64_MAX - FM10K_RX_DATABUF_ALIGN
> > + 1);
> > > +
> > > +	rxdp = rxq->hw_ring + rxq->rxrearm_start;
> > > +
> > > +	/* Pull 'n' more MBUFs into the software ring */
> > > +	if (rte_mempool_get_bulk(rxq->mp,
> > > +				 (void *)mb_alloc,
> > > +				 RTE_FM10K_RXQ_REARM_THRESH) < 0) {
> > Here's one potential issue when the failure happens. As tail won't
> > update, the head will equal to tail in the end. HW won't write back
> > anyway, however the SW recv_raw_pkts_vec only check DD bit, the old
> > 'dirty' descriptor(DD bit is not clean) will be taken and continue move
> > forward to check the next which even beyond the tail. I'm sorry didn't
> > catch it on the first time. /Steve
> 
> I have a different view on this. In case mbuf allocation always failed and tail
> equaled to head,
> then HW will stop to send new packet to HW ring, as you pointed out. Then,
> when
> Mbuf can be allocated, this function will refill HW ring and update tail. 
We can't guarantee that it recovers and allocates new mbufs before the polling SW has already moved beyond the un-rearmed dirty entry.
> So, HW will
> resume to fill packet to HW ring. Receive functions will continue to work.
The point is that HW is pending at that moment, but the polling receive function won't wait; it just reads the next DD bit, whose value is still 1 because it hasn't been cleared.
> Anything I missed?
> 
> > > +		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed
> > +=
> > > +			RTE_FM10K_RXQ_REARM_THRESH;
> > > +		return;
> > > +	}
> > > +
> > > +


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
  2015-10-29  8:14               ` Liang, Cunming
@ 2015-10-29  8:37                 ` Chen, Jing D
  0 siblings, 0 replies; 109+ messages in thread
From: Chen, Jing D @ 2015-10-29  8:37 UTC (permalink / raw)
  To: Liang, Cunming, dev

Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: Liang, Cunming
> Sent: Thursday, October 29, 2015 4:15 PM
> To: Chen, Jing D; dev@dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: RE: [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
> 
> Hi Mark,
> 
> 
> > -----Original Message-----
> > From: Chen, Jing D
> > Sent: Thursday, October 29, 2015 1:24 PM
> > To: Liang, Cunming; dev@dpdk.org
> > Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> > Subject: RE: [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX
> ring
> >
> > Hi, Steve,
> >
> > Best Regards,
> > Mark
> >
> >
> > > -----Original Message-----
> > > From: Liang, Cunming
> > > Sent: Wednesday, October 28, 2015 9:59 PM
> > > To: Chen, Jing D; dev@dpdk.org
> > > Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> > > Subject: Re: [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX
> ring
> > >
> > > Hi Mark,
> > >
> > > On 10/27/2015 5:46 PM, Chen Jing D(Mark) wrote:
> > > > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> > > >
> > > > Add function fm10k_rxq_rearm to re-allocate mbuf for used desc
> > > > in RX HW ring.
> > > >
> > > > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > > > ---
> > > >   drivers/net/fm10k/fm10k.h          |    9 ++++
> > > >   drivers/net/fm10k/fm10k_ethdev.c   |    3 +
> > > >   drivers/net/fm10k/fm10k_rxtx_vec.c |   90
> > > ++++++++++++++++++++++++++++++++++++
> > > >   3 files changed, 102 insertions(+), 0 deletions(-)
> > > [...]
> > > > +static inline void
> > > > +fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
> > > > +{
> > > > +	int i;
> > > > +	uint16_t rx_id;
> > > > +	volatile union fm10k_rx_desc *rxdp;
> > > > +	struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
> > > > +	struct rte_mbuf *mb0, *mb1;
> > > > +	__m128i head_off = _mm_set_epi64x(
> > > > +			RTE_PKTMBUF_HEADROOM +
> > > FM10K_RX_DATABUF_ALIGN - 1,
> > > > +			RTE_PKTMBUF_HEADROOM +
> > > FM10K_RX_DATABUF_ALIGN - 1);
> > > > +	__m128i dma_addr0, dma_addr1;
> > > > +	/* Rx buffer need to be aligned with 512 byte */
> > > > +	const __m128i hba_msk = _mm_set_epi64x(0,
> > > > +				UINT64_MAX - FM10K_RX_DATABUF_ALIGN
> > > + 1);
> > > > +
> > > > +	rxdp = rxq->hw_ring + rxq->rxrearm_start;
> > > > +
> > > > +	/* Pull 'n' more MBUFs into the software ring */
> > > > +	if (rte_mempool_get_bulk(rxq->mp,
> > > > +				 (void *)mb_alloc,
> > > > +				 RTE_FM10K_RXQ_REARM_THRESH) < 0) {
> > > Here's one potential issue when the failure happens. As tail won't
> > > update, the head will equal to tail in the end. HW won't write back
> > > anyway, however the SW recv_raw_pkts_vec only check DD bit, the old
> > > 'dirty' descriptor(DD bit is not clean) will be taken and continue move
> > > forward to check the next which even beyond the tail. I'm sorry didn't
> > > catch it on the first time. /Steve
> >
> > I have a different view on this. In case mbuf allocation always failed and tail
> > equaled to head,
> > then HW will stop to send new packet to HW ring, as you pointed out. Then,
> > when
> > Mbuf can be allocated, this function will refill HW ring and update tail.
> We can't guarantee it successful to recover and allocates new mbuf before
> the polling SW already move beyond the un-rearmed dirty entry.
> So, HW

Thanks! I got you. I'll change accordingly.

> > will
> > resume to fill packet to HW ring. Receive functions will continue to work.
> The point is HW is pending on that moment, but polling receive function
> won't wait, it just read next DD, but the value is 1 which hasn't cleared.
> > Anything I missed?
> >
> > > > +		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed
> > > +=
> > > > +			RTE_FM10K_RXQ_REARM_THRESH;
> > > > +		return;
> > > > +	}
> > > > +
> > > > +

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k
  2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
@ 2015-10-29  9:15           ` Chen Jing D(Mark)
  2015-10-29  9:15             ` [dpdk-dev] [PATCH v4 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
                               ` (16 more replies)
  0 siblings, 17 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:15 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

v4:
 - Clear HW/SW ring content after mbuf allocation fails.
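
   The reason for this change can be sketched with a toy model of the
   ring. This is an illustration only, not the driver code: toy_ring,
   toy_poll(), toy_rearm(), RING_SIZE and DD_BIT are all hypothetical
   names, and the "clear" step here just zeroes a status word. It shows
   that a receive loop trusting only the DD bit will consume stale
   descriptors again if a failed rearm leaves them untouched.

```c
#include <stdint.h>

#define RING_SIZE 8u   /* hypothetical ring size, must be a power of two */
#define DD_BIT    0x1u /* hypothetical "descriptor done" flag */

struct toy_ring {
	uint32_t status[RING_SIZE]; /* one status word per descriptor */
	uint16_t next_dd;           /* next descriptor software will poll */
};

/* Poll up to 'max' descriptors, trusting only the DD bit. Consumed
 * descriptors are not cleared here; that is left to the rearm step.
 */
static unsigned
toy_poll(struct toy_ring *r, unsigned max)
{
	unsigned n = 0;

	while (n < max && (r->status[r->next_dd] & DD_BIT)) {
		r->next_dd = (r->next_dd + 1) & (RING_SIZE - 1);
		n++;
	}
	return n;
}

/* Rearm 'count' descriptors starting at 'start'. A successful rearm
 * clears the status words (a real driver would write new buffer
 * addresses). On allocation failure, 'clear_on_fail' selects whether
 * the stale DD bits are wiped anyway.
 */
static void
toy_rearm(struct toy_ring *r, uint16_t start, unsigned count,
	  int alloc_ok, int clear_on_fail)
{
	unsigned i;

	if (!alloc_ok && !clear_on_fail)
		return; /* descriptors keep their old DD bit */

	for (i = 0; i < count; i++)
		r->status[(start + i) & (RING_SIZE - 1)] = 0;
}
```

   In this model, if all descriptors are consumed and the rearm fails
   without clearing, the next toy_poll() call returns the same
   descriptors again even though hardware wrote nothing new. With
   clear_on_fail set, the poll returns 0 until hardware really writes
   back.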

v3:
 - Add a blank line after variable definition.
 - Floor-align the nb_pkts argument to avoid memory being overwritten.
 - Scan at most 32 descriptors in the scatter Rx function to avoid memory being overwritten.
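
   As a rough sketch of the floor alignment mentioned above: a vector
   receive loop that handles a fixed group of descriptors per iteration
   can round the requested burst down to a multiple of that group size,
   so it never writes past the caller's array. DESCS_PER_LOOP and
   floor_align_burst() are hypothetical names, and the group size of 4
   is an assumption for illustration.

```c
#include <stdint.h>

#define DESCS_PER_LOOP 4u /* hypothetical group size, a power of two */

/* Round nb_pkts down to a multiple of DESCS_PER_LOOP. */
static inline uint16_t
floor_align_burst(uint16_t nb_pkts)
{
	return (uint16_t)(nb_pkts & ~(DESCS_PER_LOOP - 1));
}
```

   Under these assumptions a request for 7 packets is processed as 4,
   a request for 3 as 0, and a request for 32 is unchanged.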

v2:
 - Fix a typo issue.
 - Fix an improper prefetch in the vector RX function, which prefetched an
   uninitialized mbuf.
 - Remove limitation on number of desc pointer in vector RX function.
 - Re-organize some comments.
 - Add a new patch to fix a crash issue in vector RX func.
 - Add a new patch to update release notes.

v1:
This patch set includes Vector Rx/Tx functions to receive/transmit packets
for fm10k devices. It also contains logic to do sanity check for proper
RX/TX function selections.

Chen Jing D(Mark) (16):
  fm10k: add new vPMD file
  fm10k: add vPMD pre-condition check for each RX queue
  fm10k: Add a new func to initialize all parameters
  fm10k: add func to re-allocate mbuf for RX ring
  fm10k: add 2 functions to parse pkt_type and offload flag
  fm10k: add Vector RX function
  fm10k: add func to do Vector RX condition check
  fm10k: add Vector RX scatter function
  fm10k: add function to decide best RX function
  fm10k: add func to release mbuf in case Vector RX applied
  fm10k: add Vector TX function
  fm10k: use func pointer to reset TX queue and mbuf release
  fm10k: introduce 2 funcs to reset TX queue and mbuf release
  fm10k: Add function to decide best TX func
  fm10k: fix a crash issue in vector RX func
  doc: release notes update for fm10k Vector PMD

 doc/guides/rel_notes/release_2_2.rst |    5 +
 drivers/net/fm10k/Makefile           |    1 +
 drivers/net/fm10k/fm10k.h            |   45 ++-
 drivers/net/fm10k/fm10k_ethdev.c     |  169 ++++++-
 drivers/net/fm10k/fm10k_rxtx_vec.c   |  847 ++++++++++++++++++++++++++++++++++
 5 files changed, 1039 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v4 01/16] fm10k: add new vPMD file
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
@ 2015-10-29  9:15             ` Chen Jing D(Mark)
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-29  9:15             ` [dpdk-dev] [PATCH v4 02/16] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
                               ` (15 subsequent siblings)
  16 siblings, 1 reply; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:15 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add new file fm10k_rxtx_vec.c and add it to the build.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/Makefile         |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   45 ++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

diff --git a/drivers/net/fm10k/Makefile b/drivers/net/fm10k/Makefile
index a4a8f56..06ebf83 100644
--- a/drivers/net/fm10k/Makefile
+++ b/drivers/net/fm10k/Makefile
@@ -93,6 +93,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx_vec.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
new file mode 100644
index 0000000..69174d9
--- /dev/null
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -0,0 +1,45 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <inttypes.h>
+
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#include "fm10k.h"
+#include "base/fm10k_type.h"
+
+#include <tmmintrin.h>
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v4 02/16] fm10k: add vPMD pre-condition check for each RX queue
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-29  9:15             ` [dpdk-dev] [PATCH v4 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
@ 2015-10-29  9:15             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 03/16] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
                               ` (14 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:15 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add a condition check in the rx_queue_setup func. If the number of RX descs
can't satisfy the vPMD requirement, record that in a variable. Otherwise,
call fm10k_rxq_vec_setup to initialize Vector RX.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |   11 ++++++++---
 drivers/net/fm10k/fm10k_ethdev.c   |   11 +++++++++++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   21 +++++++++++++++++++++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c089882..362a2d0 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -135,6 +135,8 @@ struct fm10k_dev_info {
 	/* Protect the mailbox to avoid race condition */
 	rte_spinlock_t    mbx_lock;
 	struct fm10k_macvlan_filter_info    macvlan;
+	/* Flag to indicate if RX vector conditions satisfied */
+	bool rx_vec_allowed;
 };
 
 /*
@@ -165,9 +167,10 @@ struct fm10k_rx_queue {
 	struct rte_mempool *mp;
 	struct rte_mbuf **sw_ring;
 	volatile union fm10k_rx_desc *hw_ring;
-	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
-	struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+	struct rte_mbuf *pkt_first_seg; /* First segment of current packet. */
+	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
+	uint64_t mbuf_initializer; /* value to init mbufs */
 	uint16_t next_dd;
 	uint16_t next_alloc;
 	uint16_t next_trigger;
@@ -177,7 +180,7 @@ struct fm10k_rx_queue {
 	uint16_t queue_id;
 	uint8_t port_id;
 	uint8_t drop_en;
-	uint8_t rx_deferred_start; /**< don't start this queue in dev start. */
+	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
 };
 
 /*
@@ -313,4 +316,6 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
 
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
+
+int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a69c990..3c7784e 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1251,6 +1251,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	const struct rte_eth_rxconf *conf, struct rte_mempool *mp)
 {
 	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
 	struct fm10k_rx_queue *q;
 	const struct rte_memzone *mz;
 
@@ -1333,6 +1334,16 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->hw_ring_phys_addr = mz->phys_addr;
 #endif
 
+	/* Check if number of descs satisfied Vector requirement */
+	if (!rte_is_power_of_2(nb_desc)) {
+		PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
+				    "preconditions - canceling the feature for "
+				    "the whole port[%d]",
+			     q->queue_id, q->port_id);
+		dev_info->rx_vec_allowed = false;
+	} else
+		fm10k_rxq_vec_setup(q);
+
 	dev->data->rx_queues[queue_id] = q;
 	return 0;
 }
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 69174d9..34b677b 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -43,3 +43,24 @@
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
+
+int __attribute__((cold))
+fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	/* data_off will be adjusted after new mbuf allocated for 512-byte
+	 * alignment.
+	 */
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+	return 0;
+}
-- 
1.7.7.6


* [dpdk-dev] [PATCH v4 03/16] fm10k: Add a new func to initialize all parameters
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-29  9:15             ` [dpdk-dev] [PATCH v4 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
  2015-10-29  9:15             ` [dpdk-dev] [PATCH v4 02/16] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 04/16] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
                               ` (13 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add new function fm10k_params_init to initialize all fm10k related
variables.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_ethdev.c |   35 +++++++++++++++++++++++------------
 1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 3c7784e..363ef98 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2066,6 +2066,27 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rss_hash_conf_get	= fm10k_rss_hash_conf_get,
 };
 
+static void
+fm10k_params_init(struct rte_eth_dev *dev)
+{
+	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+
+	/* Initialize bus info. Normally we would call fm10k_get_bus_info(), but
+	 * there is no way to get link status without reading BAR4.  Until this
+	 * works, assume we have maximum bandwidth.
+	 * @todo - fix bus info
+	 */
+	hw->bus_caps.speed = fm10k_bus_speed_8000;
+	hw->bus_caps.width = fm10k_bus_width_pcie_x8;
+	hw->bus_caps.payload = fm10k_bus_payload_512;
+	hw->bus.speed = fm10k_bus_speed_8000;
+	hw->bus.width = fm10k_bus_width_pcie_x8;
+	hw->bus.payload = fm10k_bus_payload_256;
+
+	info->rx_vec_allowed = true;
+}
+
 static int
 eth_fm10k_dev_init(struct rte_eth_dev *dev)
 {
@@ -2112,18 +2133,8 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 		return -EIO;
 	}
 
-	/*
-	 * Inialize bus info. Normally we would call fm10k_get_bus_info(), but
-	 * there is no way to get link status without reading BAR4.  Until this
-	 * works, assume we have maximum bandwidth.
-	 * @todo - fix bus info
-	 */
-	hw->bus_caps.speed = fm10k_bus_speed_8000;
-	hw->bus_caps.width = fm10k_bus_width_pcie_x8;
-	hw->bus_caps.payload = fm10k_bus_payload_512;
-	hw->bus.speed = fm10k_bus_speed_8000;
-	hw->bus.width = fm10k_bus_width_pcie_x8;
-	hw->bus.payload = fm10k_bus_payload_256;
+	/* Initialize parameters */
+	fm10k_params_init(dev);
 
 	/* Initialize the hw */
 	diag = fm10k_init_hw(hw);
-- 
1.7.7.6


* [dpdk-dev] [PATCH v4 04/16] fm10k: add func to re-allocate mbuf for RX ring
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (2 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 03/16] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 05/16] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
                               ` (12 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add function fm10k_rxq_rearm to re-allocate mbuf for used desc
in RX HW ring.
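
For reviewers who prefer the address arithmetic in scalar form, here is a
minimal sketch (not part of the patch) of what the SSE code in
fm10k_rxq_rearm computes per mbuf: the buffer physical address plus headroom,
rounded up to the Rx data buffer alignment, and the tail index written after
a refill. SKETCH_HEADROOM, SKETCH_BUF_ALIGN and the sketch_* names are
hypothetical stand-ins for RTE_PKTMBUF_HEADROOM and FM10K_RX_DATABUF_ALIGN,
with values assumed only for illustration.

```c
#include <stdint.h>

/* Assumed illustration values; the real ones come from the DPDK and
 * fm10k headers.
 */
#define SKETCH_HEADROOM   128u
#define SKETCH_BUF_ALIGN  512u	/* must be a power of two */

/* Round (buf_physaddr + headroom) up to the next aligned address. */
static uint64_t
sketch_rx_dma_addr(uint64_t buf_physaddr)
{
	uint64_t mask = ~((uint64_t)SKETCH_BUF_ALIGN - 1);

	return (buf_physaddr + SKETCH_HEADROOM + SKETCH_BUF_ALIGN - 1) & mask;
}

/* Tail value after a refill: the descriptor just before the next rearm
 * start, wrapping to the end of the ring when the start is 0.
 */
static uint16_t
sketch_rx_tail(uint16_t rearm_start, uint16_t nb_desc)
{
	return (uint16_t)(rearm_start == 0 ? nb_desc - 1 : rearm_start - 1);
}
```

The mask here has the same value as UINT64_MAX - FM10K_RX_DATABUF_ALIGN + 1
used for the low 64 bits of hba_msk in the patch.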

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |   11 ++++
 drivers/net/fm10k/fm10k_ethdev.c   |    3 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   98 ++++++++++++++++++++++++++++++++++++
 3 files changed, 112 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 362a2d0..5513644 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -123,6 +123,12 @@
 #define FM10K_VFTA_BIT(vlan_id)    (1 << ((vlan_id) & 0x1F))
 #define FM10K_VFTA_IDX(vlan_id)    ((vlan_id) >> 5)
 
+#define RTE_FM10K_RXQ_REARM_THRESH      32
+#define RTE_FM10K_VPMD_TX_BURST         32
+#define RTE_FM10K_MAX_RX_BURST          RTE_FM10K_RXQ_REARM_THRESH
+#define RTE_FM10K_TX_MAX_FREE_BUF_SZ    64
+#define RTE_FM10K_DESCS_PER_LOOP    4
+
 struct fm10k_macvlan_filter_info {
 	uint16_t vlan_num;       /* Total VLAN number */
 	uint16_t mac_num;        /* Total mac number */
@@ -171,6 +177,8 @@ struct fm10k_rx_queue {
 	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
 	uint64_t mbuf_initializer; /* value to init mbufs */
+	/** need to alloc dummy mbuf, for wraparound when scanning hw ring */
+	struct rte_mbuf fake_mbuf;
 	uint16_t next_dd;
 	uint16_t next_alloc;
 	uint16_t next_trigger;
@@ -178,6 +186,9 @@ struct fm10k_rx_queue {
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
 	uint16_t queue_id;
+	/* Below 2 fields only valid in case vPMD is applied. */
+	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
+	uint16_t rxrearm_start;  /* the idx we start the re-arming from */
 	uint8_t port_id;
 	uint8_t drop_en;
 	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 363ef98..44c3d34 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -121,6 +121,9 @@ rx_queue_reset(struct fm10k_rx_queue *q)
 	q->next_alloc = 0;
 	q->next_trigger = q->alloc_thresh - 1;
 	FM10K_PCI_REG_WRITE(q->tail_ptr, q->nb_desc - 1);
+	q->rxrearm_start = 0;
+	q->rxrearm_nb = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 34b677b..6c21f15 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -64,3 +64,101 @@ fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 	rxq->mbuf_initializer = *(uint64_t *)p;
 	return 0;
 }
+
+static inline void
+fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
+{
+	int i;
+	uint16_t rx_id;
+	volatile union fm10k_rx_desc *rxdp;
+	struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
+	struct rte_mbuf *mb0, *mb1;
+	__m128i head_off = _mm_set_epi64x(
+			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1,
+			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1);
+	__m128i dma_addr0, dma_addr1;
+	/* Rx buffers need to be 512-byte aligned */
+	const __m128i hba_msk = _mm_set_epi64x(0,
+				UINT64_MAX - FM10K_RX_DATABUF_ALIGN + 1);
+
+	rxdp = rxq->hw_ring + rxq->rxrearm_start;
+
+	/* Pull 'n' more MBUFs into the software ring */
+	if (rte_mempool_get_bulk(rxq->mp,
+				 (void *)mb_alloc,
+				 RTE_FM10K_RXQ_REARM_THRESH) < 0) {
+		dma_addr0 = _mm_setzero_si128();
+		/* Clean up all the HW/SW ring content */
+		for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) {
+			mb_alloc[i] = &rxq->fake_mbuf;
+			_mm_store_si128((__m128i *)&rxdp[i].q,
+						dma_addr0);
+		}
+
+		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_FM10K_RXQ_REARM_THRESH;
+		return;
+	}
+
+	/* Initialize the mbufs in vector, process 2 mbufs in one loop */
+	for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i += 2, mb_alloc += 2) {
+		__m128i vaddr0, vaddr1;
+		uintptr_t p0, p1;
+
+		mb0 = mb_alloc[0];
+		mb1 = mb_alloc[1];
+
+		/* Flush mbuf with pkt template.
+		 * Data to be rearmed is 6 bytes long.
+		 * Though, RX will overwrite ol_flags that are coming next
+		 * anyway. So overwrite whole 8 bytes with one load:
+		 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
+		 */
+		p0 = (uintptr_t)&mb0->rearm_data;
+		*(uint64_t *)p0 = rxq->mbuf_initializer;
+		p1 = (uintptr_t)&mb1->rearm_data;
+		*(uint64_t *)p1 = rxq->mbuf_initializer;
+
+		/* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
+		vaddr0 = _mm_loadu_si128((__m128i *)&(mb0->buf_addr));
+		vaddr1 = _mm_loadu_si128((__m128i *)&(mb1->buf_addr));
+
+		/* convert pa to dma_addr hdr/data */
+		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
+		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
+
+		/* add headroom to pa values */
+		dma_addr0 = _mm_add_epi64(dma_addr0, head_off);
+		dma_addr1 = _mm_add_epi64(dma_addr1, head_off);
+
+		/* Align to 512 bytes to satisfy the HW requirement and, at the
+		 * same time, set the Header Buffer Address to zero.
+		 */
+		dma_addr0 = _mm_and_si128(dma_addr0, hba_msk);
+		dma_addr1 = _mm_and_si128(dma_addr1, hba_msk);
+
+		/* flush desc with pa dma_addr */
+		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr0);
+		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr1);
+
+		/* enforce 512B alignment on default Rx virtual addresses */
+		mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr
+				+ RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
+				- (char *)mb0->buf_addr);
+		mb1->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb1->buf_addr
+				+ RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
+				- (char *)mb1->buf_addr);
+	}
+
+	rxq->rxrearm_start += RTE_FM10K_RXQ_REARM_THRESH;
+	if (rxq->rxrearm_start >= rxq->nb_desc)
+		rxq->rxrearm_start = 0;
+
+	rxq->rxrearm_nb -= RTE_FM10K_RXQ_REARM_THRESH;
+
+	rx_id = (uint16_t) ((rxq->rxrearm_start == 0) ?
+			     (rxq->nb_desc - 1) : (rxq->rxrearm_start - 1));
+
+	/* Update the tail pointer on the NIC */
+	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
+}
-- 
1.7.7.6


* [dpdk-dev] [PATCH v4 05/16] fm10k: add 2 functions to parse pkt_type and offload flag
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (3 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 04/16] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 06/16] fm10k: add Vector RX function Chen Jing D(Mark)
                               ` (11 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add 2 functions that use SSE instructions to parse the RX descriptors
and fill in packet_type and ol_flags in the mbuf.
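
As a rough scalar equivalent of the packet type lookup (a sketch, not part of
the patch): the L3 type sits in bits 4 to 6 and the L4 type in bits 7 to 9 of
the low 16 bits of each descriptor, and each field indexes a small table. The
SK_* constants are placeholder values chosen for this illustration, not the
real RTE_PTYPE_* values; the L4 entries are stored shifted right by 8 so that
they fit in a byte, mirroring what the vector code does for
_mm_shuffle_epi8.

```c
#include <stdint.h>

/* Placeholder packet type values (assumed, not from rte_mbuf.h). */
enum {
	SK_L3_IPV4 = 0x01, SK_L3_IPV4_EXT = 0x02,
	SK_L3_IPV6 = 0x03, SK_L3_IPV6_EXT = 0x04,
	SK_L4_TCP = 0x0100, SK_L4_UDP = 0x0200,
	SK_TUN_GRE = 0x0300, SK_TUN_VXLAN = 0x0400,
	SK_TUN_NVGRE = 0x0500, SK_TUN_GENEVE = 0x0600
};

/* Table order follows l3type_flags / l4type_flags in the patch. */
static const uint8_t sk_l3_lut[8] = {
	0, SK_L3_IPV4, SK_L3_IPV4_EXT, SK_L3_IPV6, SK_L3_IPV6_EXT, 0, 0, 0
};

static const uint8_t sk_l4_lut[8] = {
	0, SK_L4_TCP >> 8, SK_L4_UDP >> 8, SK_TUN_GRE >> 8,
	SK_TUN_VXLAN >> 8, SK_TUN_NVGRE >> 8, SK_TUN_GENEVE >> 8, 0
};

/* Decode the low 16 bits of one RX descriptor into a packet type. */
static uint16_t
sketch_desc_to_ptype(uint16_t desc_lo16)
{
	uint16_t l3 = (desc_lo16 & 0x0070) >> 4;	/* bits 4..6 */
	uint16_t l4 = (desc_lo16 & 0x0380) >> 7;	/* bits 7..9 */

	/* the L4 table holds values >> 8, so shift left to restore them */
	return (uint16_t)(sk_l3_lut[l3] | (sk_l4_lut[l4] << 8));
}
```

The vector version performs the same two lookups for four descriptors at
once, using the descriptor field as the shuffle control.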

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |  127 ++++++++++++++++++++++++++++++++++++
 1 files changed, 127 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 6c21f15..88c9536 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,133 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+/* Handling the offload flags (olflags) field takes computation
+ * time when receiving packets. Therefore we provide a flag to disable
+ * the processing of the olflags field when they are not needed. This
+ * gives improved performance, at the cost of losing the offload info
+ * in the received packet
+ */
+#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+
+/* Vlan present flag shift */
+#define VP_SHIFT     (2)
+/* L3 type shift */
+#define L3TYPE_SHIFT     (4)
+/* L4 type shift */
+#define L4TYPE_SHIFT     (7)
+
+static inline void
+fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+	__m128i ptype0, ptype1, vtag0, vtag1;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	const __m128i pkttype_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT,
+			PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT);
+
+	/* mask everything except rss type */
+	const __m128i rsstype_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x000F, 0x000F, 0x000F, 0x000F);
+
+	/* map rss type to rss hash flag */
+	const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
+			0, 0, 0, PKT_RX_RSS_HASH,
+			PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH, 0,
+			PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, 0);
+
+	ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+	ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+	vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+	vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);
+
+	ptype0 = _mm_unpacklo_epi32(ptype0, ptype1);
+	ptype0 = _mm_and_si128(ptype0, rsstype_msk);
+	ptype0 = _mm_shuffle_epi8(rss_flags, ptype0);
+
+	vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+	vtag1 = _mm_srli_epi16(vtag1, VP_SHIFT);
+	vtag1 = _mm_and_si128(vtag1, pkttype_msk);
+
+	vtag1 = _mm_or_si128(ptype0, vtag1);
+	vol.dword = _mm_cvtsi128_si64(vtag1);
+
+	rx_pkts[0]->ol_flags = vol.e[0];
+	rx_pkts[1]->ol_flags = vol.e[1];
+	rx_pkts[2]->ol_flags = vol.e[2];
+	rx_pkts[3]->ol_flags = vol.e[3];
+}
+
+static inline void
+fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+	__m128i l3l4type0, l3l4type1, l3type, l4type;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	/* L3 pkt type mask  Bit4 to Bit6 */
+	const __m128i l3type_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x0070, 0x0070, 0x0070, 0x0070);
+
+	/* L4 pkt type mask  Bit7 to Bit9 */
+	const __m128i l4type_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x0380, 0x0380, 0x0380, 0x0380);
+
+	/* convert RRC l3 type to mbuf format */
+	const __m128i l3type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
+			0, 0, 0, RTE_PTYPE_L3_IPV6_EXT,
+			RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV4_EXT,
+			RTE_PTYPE_L3_IPV4, 0);
+
+	/* Convert RRC l4 type to mbuf format. The l4type_flags values are
+	 * shifted right by 8 bits so that each one fits into 8 bits.
+	 */
+	const __m128i l4type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, 0,
+			RTE_PTYPE_TUNNEL_GENEVE >> 8,
+			RTE_PTYPE_TUNNEL_NVGRE >> 8,
+			RTE_PTYPE_TUNNEL_VXLAN >> 8,
+			RTE_PTYPE_TUNNEL_GRE >> 8,
+			RTE_PTYPE_L4_UDP >> 8,
+			RTE_PTYPE_L4_TCP >> 8,
+			0);
+
+	l3l4type0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+	l3l4type1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+	l3l4type0 = _mm_unpacklo_epi32(l3l4type0, l3l4type1);
+
+	l3type = _mm_and_si128(l3l4type0, l3type_msk);
+	l4type = _mm_and_si128(l3l4type0, l4type_msk);
+
+	l3type = _mm_srli_epi16(l3type, L3TYPE_SHIFT);
+	l4type = _mm_srli_epi16(l4type, L4TYPE_SHIFT);
+
+	l3type = _mm_shuffle_epi8(l3type_flags, l3type);
+	/* l4type_flags were shifted right by 8 bits, shift left to restore */
+	l4type = _mm_shuffle_epi8(l4type_flags, l4type);
+
+	l4type = _mm_slli_epi16(l4type, 8);
+	l3l4type0 = _mm_or_si128(l3type, l4type);
+	vol.dword = _mm_cvtsi128_si64(l3l4type0);
+
+	rx_pkts[0]->packet_type = vol.e[0];
+	rx_pkts[1]->packet_type = vol.e[1];
+	rx_pkts[2]->packet_type = vol.e[2];
+	rx_pkts[3]->packet_type = vol.e[3];
+}
+#else
+#define fm10k_desc_to_olflags_v(desc, rx_pkts) do {} while (0)
+#define fm10k_desc_to_pktype_v(desc, rx_pkts) do {} while (0)
+#endif
+
 int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
-- 
1.7.7.6


* [dpdk-dev] [PATCH v4 06/16] fm10k: add Vector RX function
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (4 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 05/16] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 07/16] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
                               ` (10 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_recv_raw_pkts_vec to parse raw packets, which may
include chained packets.
Add func fm10k_recv_pkts_vec to receive single-mbuf packets.
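
The bookkeeping around the vector loop can be sketched in scalar C as follows
(an illustration only, not part of the patch): the burst size is floored to a
multiple of 4, the DD bits of the 4 status words are counted, and next_dd is
advanced with a mask, which is why the ring size has to be a power of two.
The sketch_* names and SK_* macros are hypothetical.

```c
#include <stdint.h>

#define SK_DESCS_PER_LOOP 4u
#define SK_DD_BIT         0x1u

/* Drop the tail of the burst that does not fill a group of 4. */
static uint16_t
sketch_floor_burst(uint16_t nb_pkts)
{
	return (uint16_t)(nb_pkts & ~(SK_DESCS_PER_LOOP - 1));
}

/* Count how many of the 4 status words have the DD bit set. */
static unsigned
sketch_count_dd(const uint32_t staterr[4])
{
	unsigned i, n = 0;

	for (i = 0; i < SK_DESCS_PER_LOOP; i++)
		n += staterr[i] & SK_DD_BIT;
	return n;
}

/* Advance the ring index; nb_desc is assumed to be a power of two. */
static uint16_t
sketch_advance(uint16_t next_dd, uint16_t nb_recd, uint16_t nb_desc)
{
	return (uint16_t)((next_dd + nb_recd) & (nb_desc - 1));
}
```

A count below 4 ends the loop early, since descriptors that are not done yet
must not be handed to the application.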

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  201 ++++++++++++++++++++++++++++++++++++
 2 files changed, 202 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5513644..96b30a7 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,4 +329,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 88c9536..8c535f0 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -289,3 +289,204 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 	/* Update the tail pointer on the NIC */
 	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
+
+static inline uint16_t
+fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts, uint8_t *split_packet)
+{
+	volatile union fm10k_rx_desc *rxdp;
+	struct rte_mbuf **mbufp;
+	uint16_t nb_pkts_recd;
+	int pos;
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint64_t var;
+	__m128i shuf_msk;
+	__m128i dd_check, eop_check;
+	uint16_t next_dd;
+
+	next_dd = rxq->next_dd;
+
+	/* Just the act of getting into the function from the application is
+	 * going to cost about 7 cycles
+	 */
+	rxdp = rxq->hw_ring + next_dd;
+
+	_mm_prefetch((const void *)rxdp, _MM_HINT_T0);
+
+	/* See if we need to rearm the RX queue - gives the prefetch a bit
+	 * of time to act
+	 */
+	if (rxq->rxrearm_nb > RTE_FM10K_RXQ_REARM_THRESH)
+		fm10k_rxq_rearm(rxq);
+
+	/* Before we start moving massive data around, check to see if
+	 * there is actually a packet available
+	 */
+	if (!(rxdp->d.staterr & FM10K_RXD_STATUS_DD))
+		return 0;
+
+	/* Vector RX processes 4 packets at a time, so strip the unaligned
+	 * tail in case nb_pkts is not a multiple of 4.
+	 */
+	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_FM10K_DESCS_PER_LOOP);
+
+	/* 4 packets DD mask */
+	dd_check = _mm_set_epi64x(0x0000000100000001LL, 0x0000000100000001LL);
+
+	/* 4 packets EOP mask */
+	eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
+
+	/* mask to shuffle from desc. to mbuf */
+	shuf_msk = _mm_set_epi8(
+		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
+		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
+		13, 12,      /* octet 12~13, 16 bits data_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
+		13, 12,      /* octet 12~13, low 16 bits pkt_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+		0xFF, 0xFF   /* Skip pkt_type field in shuffle operation */
+		);
+
+	/* Cache is empty -> need to scan the buffer rings, but first move
+	 * the next 'n' mbufs into the cache
+	 */
+	mbufp = &rxq->sw_ring[next_dd];
+
+	/* A. load 4 packet descriptors in one loop
+	 * [A*. mask out 4 unused dirty fields in desc]
+	 * B. copy 4 mbuf pointers from sw_ring to rx_pkts
+	 * C. calc the number of DD bits among the 4 packets
+	 * [C*. extract the end-of-packet bit, if requested]
+	 * D. fill info from desc to mbuf
+	 */
+	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
+			pos += RTE_FM10K_DESCS_PER_LOOP,
+			rxdp += RTE_FM10K_DESCS_PER_LOOP) {
+		__m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
+		__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
+		__m128i zero, staterr, sterr_tmp1, sterr_tmp2;
+		__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */
+
+		/* B.1 load 2 mbuf pointers */
+		mbp1 = _mm_loadu_si128((__m128i *)&mbufp[pos]);
+
+		/* Read desc statuses backwards to avoid race condition */
+		/* A.1 load 4 pkts desc */
+		descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+
+		/* B.2 copy 2 mbuf point into rx_pkts  */
+		_mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
+
+		/* B.1 load 2 mbuf pointers */
+		mbp2 = _mm_loadu_si128((__m128i *)&mbufp[pos+2]);
+
+		descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		/* A.1 load the remaining pkt descs */
+		descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));
+
+		/* B.2 copy 2 mbuf point into rx_pkts  */
+		_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
+
+		/* avoid compiler reorder optimization */
+		rte_compiler_barrier();
+
+		if (split_packet) {
+			rte_prefetch0(&rx_pkts[pos]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 1]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 2]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 3]->cacheline1);
+		}
+
+		/* D.1 pkt 3,4 convert format from desc to pktmbuf */
+		pkt_mb4 = _mm_shuffle_epi8(descs0[3], shuf_msk);
+		pkt_mb3 = _mm_shuffle_epi8(descs0[2], shuf_msk);
+
+		/* C.1 4=>2 filter staterr info only */
+		sterr_tmp2 = _mm_unpackhi_epi32(descs0[3], descs0[2]);
+		/* C.1 4=>2 filter staterr info only */
+		sterr_tmp1 = _mm_unpackhi_epi32(descs0[1], descs0[0]);
+
+		/* set ol_flags with vlan packet type */
+		fm10k_desc_to_olflags_v(descs0, &rx_pkts[pos]);
+
+		/* D.1 pkt 1,2 convert format from desc to pktmbuf */
+		pkt_mb2 = _mm_shuffle_epi8(descs0[1], shuf_msk);
+		pkt_mb1 = _mm_shuffle_epi8(descs0[0], shuf_msk);
+
+		/* C.2 get 4 pkts staterr value  */
+		zero = _mm_xor_si128(dd_check, dd_check);
+		staterr = _mm_unpacklo_epi32(sterr_tmp1, sterr_tmp2);
+
+		/* D.3 copy final 3,4 data to rx_pkts */
+		_mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
+				pkt_mb4);
+		_mm_storeu_si128((void *)&rx_pkts[pos+2]->rx_descriptor_fields1,
+				pkt_mb3);
+
+		/* C* extract and record EOP bit */
+		if (split_packet) {
+			__m128i eop_shuf_mask = _mm_set_epi8(
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0x04, 0x0C, 0x00, 0x08
+					);
+
+			/* and with mask to extract bits, flipping 1-0 */
+			__m128i eop_bits = _mm_andnot_si128(staterr, eop_check);
+			/* the staterr values are not in order, as the count
+			 * of dd bits doesn't care. However, for end of
+			 * packet tracking, we do care, so shuffle. This also
+			 * compresses the 32-bit values to 8-bit
+			 */
+			eop_bits = _mm_shuffle_epi8(eop_bits, eop_shuf_mask);
+			/* store the resulting 32-bit value */
+			*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
+			split_packet += RTE_FM10K_DESCS_PER_LOOP;
+
+			/* zero-out next pointers */
+			rx_pkts[pos]->next = NULL;
+			rx_pkts[pos + 1]->next = NULL;
+			rx_pkts[pos + 2]->next = NULL;
+			rx_pkts[pos + 3]->next = NULL;
+		}
+
+		/* C.3 calc available number of desc */
+		staterr = _mm_and_si128(staterr, dd_check);
+		staterr = _mm_packs_epi32(staterr, zero);
+
+		/* D.3 copy final 1,2 data to rx_pkts */
+		_mm_storeu_si128((void *)&rx_pkts[pos+1]->rx_descriptor_fields1,
+				pkt_mb2);
+		_mm_storeu_si128((void *)&rx_pkts[pos]->rx_descriptor_fields1,
+				pkt_mb1);
+
+		fm10k_desc_to_pktype_v(descs0, &rx_pkts[pos]);
+
+		/* C.4 calc available number of desc */
+		var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
+		nb_pkts_recd += var;
+		if (likely(var != RTE_FM10K_DESCS_PER_LOOP))
+			break;
+	}
+
+	/* Update our internal tail pointer */
+	rxq->next_dd = (uint16_t)(rxq->next_dd + nb_pkts_recd);
+	rxq->next_dd = (uint16_t)(rxq->next_dd & (rxq->nb_desc - 1));
+	rxq->rxrearm_nb = (uint16_t)(rxq->rxrearm_nb + nb_pkts_recd);
+
+	return nb_pkts_recd;
+}
+
+/* vPMD receive routine
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ */
+uint16_t
+fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts)
+{
+	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
+}
-- 
1.7.7.6


* [dpdk-dev] [PATCH v4 07/16] fm10k: add func to do Vector RX condition check
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (5 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 06/16] fm10k: add Vector RX function Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
                               ` (9 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_rx_vec_condition_check to check if Vector RX
func can be applied.
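
The decision made by the check can be summarised with a small stand-alone
predicate (a sketch under assumptions, not the driver code): Vector RX is
refused when flow director, IP checksum offload or header split is requested,
and also when VLAN extend is requested while ol_flags support is compiled
out. struct sk_rx_conf and its fields are hypothetical stand-ins for the
relevant rte_eth_rxmode / rte_fdir_conf settings.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of the settings the check looks at. */
struct sk_rx_conf {
	bool vlan_extend;	/* stands in for rxmode->hw_vlan_extend */
	bool fdir_enabled;	/* fconf->mode != RTE_FDIR_MODE_NONE */
	bool ip_checksum;	/* rxmode->hw_ip_checksum */
	bool header_split;	/* rxmode->header_split */
};

/* Return 0 if Vector RX may be used, -1 otherwise. */
static int
sketch_rx_vec_allowed(const struct sk_rx_conf *c, bool olflags_built)
{
	/* without rx ol_flags there is no VP flag report */
	if (!olflags_built && c->vlan_extend)
		return -1;
	/* no fdir support */
	if (c->fdir_enabled)
		return -1;
	/* no csum error report, no header split */
	if (c->ip_checksum || c->header_split)
		return -1;
	return 0;
}
```

In the patch the ol_flags condition is a compile-time #ifndef rather than a
runtime argument; it is a parameter here only so the sketch can be exercised
both ways.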

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 96b30a7..6c1c698 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,5 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 8c535f0..4ecb471 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -172,6 +172,37 @@ fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 #endif
 
 int __attribute__((cold))
+fm10k_rx_vec_condition_check(struct rte_eth_dev *dev)
+{
+#ifndef RTE_LIBRTE_IEEE1588
+	struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+	struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;
+
+#ifndef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+	/* without rx ol_flags, no VP flag report */
+	if (rxmode->hw_vlan_extend != 0)
+		return -1;
+#endif
+
+	/* no fdir support */
+	if (fconf->mode != RTE_FDIR_MODE_NONE)
+		return -1;
+
+	/* - no csum error report support
+	 * - no header split support
+	 */
+	if (rxmode->hw_ip_checksum == 1 ||
+	    rxmode->header_split == 1)
+		return -1;
+
+	return 0;
+#else
+	RTE_SET_USED(dev);
+	return -1;
+#endif
+}
+
+int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
 	uintptr_t p;
-- 
1.7.7.6


* [dpdk-dev] [PATCH v4 08/16] fm10k: add Vector RX scatter function
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (6 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 07/16] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 09/16] fm10k: add function to decide best RX function Chen Jing D(Mark)
                               ` (8 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_recv_scattered_pkts_vec to receive chained packets
with SSE instructions.
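
The chaining step can be illustrated with a reduced, stand-alone model (a
sketch, not the driver code): a non-zero split flag means "more segments
follow", segments are linked through next, and a packet is reported once a
buffer with a zero flag arrives. struct sk_seg and struct sk_state are
hypothetical cut-down versions of rte_mbuf and of the per-queue
pkt_first_seg / pkt_last_seg state. Unlike the patch, which collects finished
packets in a temporary array and copies them back, this sketch compacts them
in place.

```c
#include <stddef.h>
#include <stdint.h>

struct sk_seg {
	struct sk_seg *next;
	uint32_t pkt_len;
	uint16_t data_len;
	uint8_t nb_segs;
};

struct sk_state {
	struct sk_seg *first;	/* head of a packet still being built */
	struct sk_seg *last;	/* its most recent segment */
};

/* Chain buffers into packets; finished packets end up at the front of
 * bufs[]. Returns the number of finished packets.
 */
static unsigned
sketch_reassemble(struct sk_state *st, struct sk_seg **bufs,
		unsigned nb_bufs, const uint8_t *split)
{
	struct sk_seg *start = st->first, *end = st->last;
	unsigned i, done = 0;

	for (i = 0; i < nb_bufs; i++) {
		struct sk_seg *b = bufs[i];

		if (end != NULL) {
			/* continue a packet that is already open */
			end->next = b;
			start->nb_segs++;
			start->pkt_len += b->data_len;
			end = b;
		} else {
			start = end = b;
		}
		if (!split[i]) {
			/* last segment: done <= i, so the slot is free */
			bufs[done++] = start;
			start = end = NULL;
		}
	}
	/* keep any partial packet for the next call */
	st->first = start;
	st->last = end;
	return done;
}
```

A partial packet left in the state is picked up by the next call, which is
the role rxq->pkt_first_seg and rxq->pkt_last_seg play in the patch.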

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    2 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 6c1c698..8dba27b 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -331,4 +331,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
+uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
+					uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 4ecb471..2d1dfa3 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -521,3 +521,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 {
 	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }
+
+static inline uint16_t
+fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
+		struct rte_mbuf **rx_bufs,
+		uint16_t nb_bufs, uint8_t *split_flags)
+{
+	struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished pkts*/
+	struct rte_mbuf *start = rxq->pkt_first_seg;
+	struct rte_mbuf *end =  rxq->pkt_last_seg;
+	unsigned pkt_idx, buf_idx;
+
+
+	for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
+		if (end != NULL) {
+			/* processing a split packet */
+			end->next = rx_bufs[buf_idx];
+			start->nb_segs++;
+			start->pkt_len += rx_bufs[buf_idx]->data_len;
+			end = end->next;
+
+			if (!split_flags[buf_idx]) {
+				/* it's the last packet of the set */
+				start->hash = end->hash;
+				start->ol_flags = end->ol_flags;
+				pkts[pkt_idx++] = start;
+				start = end = NULL;
+			}
+		} else {
+			/* not processing a split packet */
+			if (!split_flags[buf_idx]) {
+				/* not a split packet, save and skip */
+				pkts[pkt_idx++] = rx_bufs[buf_idx];
+				continue;
+			}
+			end = start = rx_bufs[buf_idx];
+		}
+	}
+
+	/* save the partial packet for next time */
+	rxq->pkt_first_seg = start;
+	rxq->pkt_last_seg = end;
+	memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
+	return pkt_idx;
+}
+
+/*
+ * vPMD receive routine that reassembles scattered packets
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ * - if nb_pkts > RTE_FM10K_MAX_RX_BURST, only RTE_FM10K_MAX_RX_BURST
+ *   DD bits are scanned
+ */
+uint16_t
+fm10k_recv_scattered_pkts_vec(void *rx_queue,
+				struct rte_mbuf **rx_pkts,
+				uint16_t nb_pkts)
+{
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
+	unsigned i = 0;
+
+	/* Split_flags only can support max of RTE_FM10K_MAX_RX_BURST */
+	nb_pkts = RTE_MIN(nb_pkts, RTE_FM10K_MAX_RX_BURST);
+	/* get some new buffers */
+	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
+			split_flags);
+	if (nb_bufs == 0)
+		return 0;
+
+	/* happy day case, full burst + no packets to be joined */
+	const uint64_t *split_fl64 = (uint64_t *)split_flags;
+	if (rxq->pkt_first_seg == NULL &&
+			split_fl64[0] == 0 && split_fl64[1] == 0 &&
+			split_fl64[2] == 0 && split_fl64[3] == 0)
+		return nb_bufs;
+
+	/* reassemble any packets that need reassembly*/
+	if (rxq->pkt_first_seg == NULL) {
+		/* find the first split flag, and only reassemble then*/
+		while (i < nb_bufs && !split_flags[i])
+			i++;
+		if (i == nb_bufs)
+			return nb_bufs;
+	}
+	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
+		&split_flags[i]);
+}
-- 
1.7.7.6


* [dpdk-dev] [PATCH v4 09/16] fm10k: add function to decide best RX function
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (7 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 10/16] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
                               ` (7 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add fm10k_set_rx_function, called from fm10k_dev_rx_init, to select the
best RX function.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    1 +
 drivers/net/fm10k/fm10k_ethdev.c |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8dba27b..5666af6 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -189,6 +189,7 @@ struct fm10k_rx_queue {
 	/* Below 2 fields only valid in case vPMD is applied. */
 	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
 	uint16_t rxrearm_start;  /* the idx we start the re-arming from */
+	uint16_t rx_using_sse; /* indicates that vector RX is in use */
 	uint8_t port_id;
 	uint8_t drop_en;
 	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 44c3d34..4690a0c 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -67,6 +67,7 @@ static void
 fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
+static void fm10k_set_rx_function(struct rte_eth_dev *dev);
 
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -462,7 +463,6 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
 			dev->data->dev_conf.rxmode.enable_scatter) {
 			uint32_t reg;
 			dev->data->scattered_rx = 1;
-			dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
 			reg = FM10K_READ_REG(hw, FM10K_SRRCTL(i));
 			reg |= FM10K_SRRCTL_BUFFER_CHAINING_EN;
 			FM10K_WRITE_REG(hw, FM10K_SRRCTL(i), reg);
@@ -478,6 +478,9 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
 
 	/* Configure RSS if applicable */
 	fm10k_dev_mq_rx_configure(dev);
+
+	/* Decide the best RX function */
+	fm10k_set_rx_function(dev);
 	return 0;
 }
 
@@ -2069,6 +2072,34 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rss_hash_conf_get	= fm10k_rss_hash_conf_get,
 };
 
+static void __attribute__((cold))
+fm10k_set_rx_function(struct rte_eth_dev *dev)
+{
+	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+	uint16_t i, rx_using_sse;
+
+	/* In order to allow Vector Rx there are a few configuration
+	 * conditions to be met.
+	 */
+	if (!fm10k_rx_vec_condition_check(dev) && dev_info->rx_vec_allowed) {
+		if (dev->data->scattered_rx)
+			dev->rx_pkt_burst = fm10k_recv_scattered_pkts_vec;
+		else
+			dev->rx_pkt_burst = fm10k_recv_pkts_vec;
+	} else if (dev->data->scattered_rx)
+		dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+
+	rx_using_sse =
+		(dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
+		dev->rx_pkt_burst == fm10k_recv_pkts_vec);
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];
+		rxq->rx_using_sse = rx_using_sse;
+	}
+
+}
+
 static void
 fm10k_params_init(struct rte_eth_dev *dev)
 {
@@ -2103,9 +2134,6 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
 
-	if (dev->data->scattered_rx)
-		dev->rx_pkt_burst = &fm10k_recv_scattered_pkts;
-
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
-- 
1.7.7.6
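
The selection logic in fm10k_set_rx_function can be summarised by the
small decision function below. This is only a sketch: the enum and the
function names are hypothetical, and vec_conditions_met stands for a
zero return from fm10k_rx_vec_condition_check. It also assumes the
plain scalar function is the default when no other path applies, which
matches the assignment made in eth_fm10k_dev_init.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the four RX burst functions. */
enum rx_path { RX_SCALAR, RX_SCALAR_SCATTER, RX_VEC, RX_VEC_SCATTER };

/* Pick the RX path from the three inputs the driver looks at. */
enum rx_path
choose_rx_path(bool vec_conditions_met, bool vec_allowed, bool scattered)
{
	if (vec_conditions_met && vec_allowed)
		return scattered ? RX_VEC_SCATTER : RX_VEC;
	return scattered ? RX_SCALAR_SCATTER : RX_SCALAR;
}

/* Value that would be stored in each queue's rx_using_sse field. */
bool
uses_vector(enum rx_path p)
{
	return p == RX_VEC || p == RX_VEC_SCATTER;
}
```

Keeping the decision in one place means the per-queue flag is always
derived from the function actually installed, not from the inputs.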

* [dpdk-dev] [PATCH v4 10/16] fm10k: add func to release mbuf in case Vector RX applied
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (8 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 09/16] fm10k: add function to decide best RX function Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 11/16] fm10k: add Vector TX function Chen Jing D(Mark)
                               ` (6 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Vector RX uses different variables to track the RX HW ring, so it
needs a different function to release mbufs properly.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_ethdev.c   |    6 ++++++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   18 ++++++++++++++++++
 3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5666af6..d17b2fb 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -331,6 +331,7 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
+void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 4690a0c..a46a349 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -143,6 +143,12 @@ rx_queue_clean(struct fm10k_rx_queue *q)
 	for (i = 0; i < q->nb_desc; ++i)
 		q->hw_ring[i] = zero;
 
+	/* vPMD driver has a different way of releasing mbufs. */
+	if (q->rx_using_sse) {
+		fm10k_rx_queue_release_mbufs_vec(q);
+		return;
+	}
+
 	/* free software buffers */
 	for (i = 0; i < q->nb_desc; ++i) {
 		if (q->sw_ring[i]) {
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 2d1dfa3..0869aa3 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -321,6 +321,24 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
 
+void __attribute__((cold))
+fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq)
+{
+	const unsigned mask = rxq->nb_desc - 1;
+	unsigned i;
+
+	if (rxq->sw_ring == NULL || rxq->rxrearm_nb >= rxq->nb_desc)
+		return;
+
+	/* free all mbufs that are valid in the ring */
+	for (i = rxq->next_dd; i != rxq->rxrearm_start; i = (i + 1) & mask)
+		rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+	rxq->rxrearm_nb = rxq->nb_desc;
+
+	/* set all entries to NULL */
+	memset(rxq->sw_ring, 0, sizeof(rxq->sw_ring[0]) * rxq->nb_desc);
+}
+
 static inline uint16_t
 fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts, uint8_t *split_packet)
-- 
1.7.7.6
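
The release loop in fm10k_rx_queue_release_mbufs_vec relies on the ring
size being a power of two, so that "(i + 1) & mask" wraps the index. A
minimal sketch of that walk follows; ring_span is a hypothetical helper
that only counts the visited slots instead of freeing mbufs.

```c
#include <assert.h>

/* Count the slots visited when walking from 'from' up to, but not
 * including, 'to' on a ring of 'size' entries.
 */
unsigned
ring_span(unsigned from, unsigned to, unsigned size)
{
	const unsigned mask = size - 1;
	unsigned i, n = 0;

	/* the masking trick only works for power-of-two sizes */
	assert(size != 0 && (size & mask) == 0);
	assert(from < size && to < size);

	for (i = from; i != to; i = (i + 1) & mask)
		n++;
	return n;
}
```

With from == to the loop body never runs, so a queue whose two indexes
coincide is treated as holding no mbufs to free in this walk.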

* [dpdk-dev] [PATCH v4 11/16] fm10k: add Vector TX function
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (9 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 10/16] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 12/16] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
                               ` (5 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add Vector TX func fm10k_xmit_pkts_vec to transmit packets.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    5 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  150 ++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index d17b2fb..5525b72 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -217,6 +217,9 @@ struct fm10k_tx_queue {
 	uint16_t nb_used;
 	uint16_t free_thresh;
 	uint16_t rs_thresh;
+	/* Below 2 fields only valid in case vPMD is applied. */
+	uint16_t next_rs; /* Next pos to set RS flag */
+	uint16_t next_dd; /* Next pos to check DD flag */
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
 	uint8_t port_id;
@@ -335,4 +338,6 @@ void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
+uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 0869aa3..37418bf 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -627,3 +627,153 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
 		&split_flags[i]);
 }
+
+static inline void
+vtx1(volatile struct fm10k_tx_desc *txdp,
+		struct rte_mbuf *pkt, uint64_t flags)
+{
+	__m128i descriptor = _mm_set_epi64x(flags << 56 |
+			(uint64_t)pkt->vlan_tci << 16 | pkt->data_len,
+			MBUF_DMA_ADDR(pkt));
+	_mm_store_si128((__m128i *)txdp, descriptor);
+}
+
+static inline void
+vtx(volatile struct fm10k_tx_desc *txdp,
+		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
+{
+	int i;
+
+	for (i = 0; i < nb_pkts; ++i, ++txdp, ++pkt)
+		vtx1(txdp, *pkt, flags);
+}
+
+static inline int __attribute__((always_inline))
+fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
+{
+	struct rte_mbuf **txep;
+	uint8_t flags;
+	uint32_t n;
+	uint32_t i;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[RTE_FM10K_TX_MAX_FREE_BUF_SZ];
+
+	/* check DD bit on threshold descriptor */
+	flags = txq->hw_ring[txq->next_dd].flags;
+	if (!(flags & FM10K_TXD_FLAG_DONE))
+		return 0;
+
+	n = txq->rs_thresh;
+
+	/* First buffer to free from S/W ring is at index
+	 * next_dd - (rs_thresh-1)
+	 */
+	txep = &txq->sw_ring[txq->next_dd - (n - 1)];
+	m = __rte_pktmbuf_prefree_seg(txep[0]);
+	if (likely(m != NULL)) {
+		free[0] = m;
+		nb_free = 1;
+		for (i = 1; i < n; i++) {
+			m = __rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool))
+					free[nb_free++] = m;
+				else {
+					rte_mempool_put_bulk(free[0]->pool,
+							(void **)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+		}
+		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	} else {
+		for (i = 1; i < n; i++) {
+			m = __rte_pktmbuf_prefree_seg(txep[i]);
+			if (m != NULL)
+				rte_mempool_put(m->pool, m);
+		}
+	}
+
+	/* buffers were freed, update counters */
+	txq->nb_free = (uint16_t)(txq->nb_free + txq->rs_thresh);
+	txq->next_dd = (uint16_t)(txq->next_dd + txq->rs_thresh);
+	if (txq->next_dd >= txq->nb_desc)
+		txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+
+	return txq->rs_thresh;
+}
+
+static inline void __attribute__((always_inline))
+tx_backlog_entry(struct rte_mbuf **txep,
+		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i;
+
+	for (i = 0; i < (int)nb_pkts; ++i)
+		txep[i] = tx_pkts[i];
+}
+
+uint16_t
+fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+			uint16_t nb_pkts)
+{
+	struct fm10k_tx_queue *txq = (struct fm10k_tx_queue *)tx_queue;
+	volatile struct fm10k_tx_desc *txdp;
+	struct rte_mbuf **txep;
+	uint16_t n, nb_commit, tx_id;
+	uint64_t flags = FM10K_TXD_FLAG_LAST;
+	uint64_t rs = FM10K_TXD_FLAG_RS | FM10K_TXD_FLAG_LAST;
+	int i;
+
+	/* crossing the rs_thresh boundary is not allowed */
+	nb_pkts = RTE_MIN(nb_pkts, txq->rs_thresh);
+
+	if (txq->nb_free < txq->free_thresh)
+		fm10k_tx_free_bufs(txq);
+
+	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_free, nb_pkts);
+	if (unlikely(nb_pkts == 0))
+		return 0;
+
+	tx_id = txq->next_free;
+	txdp = &txq->hw_ring[tx_id];
+	txep = &txq->sw_ring[tx_id];
+
+	txq->nb_free = (uint16_t)(txq->nb_free - nb_pkts);
+
+	n = (uint16_t)(txq->nb_desc - tx_id);
+	if (nb_commit >= n) {
+		tx_backlog_entry(txep, tx_pkts, n);
+
+		for (i = 0; i < n - 1; ++i, ++tx_pkts, ++txdp)
+			vtx1(txdp, *tx_pkts, flags);
+
+		vtx1(txdp, *tx_pkts++, rs);
+
+		nb_commit = (uint16_t)(nb_commit - n);
+
+		tx_id = 0;
+		txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+		/* avoid running past the end of the ring */
+		txdp = &(txq->hw_ring[tx_id]);
+		txep = &txq->sw_ring[tx_id];
+	}
+
+	tx_backlog_entry(txep, tx_pkts, nb_commit);
+
+	vtx(txdp, tx_pkts, nb_commit, flags);
+
+	tx_id = (uint16_t)(tx_id + nb_commit);
+	if (tx_id > txq->next_rs) {
+		txq->hw_ring[txq->next_rs].flags |= FM10K_TXD_FLAG_RS;
+		txq->next_rs = (uint16_t)(txq->next_rs + txq->rs_thresh);
+	}
+
+	txq->next_free = tx_id;
+
+	FM10K_PCI_REG_WRITE(txq->tail_ptr, txq->next_free);
+
+	return nb_pkts;
+}
-- 
1.7.7.6
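
As an illustration of the descriptor layout used by vtx1, the sketch
below builds the upper 64-bit word from the same three fields: flags in
bits 63..56, the VLAN tag in bits 31..16 and the data length in bits
15..0. pack_tx_word is a hypothetical name. Each field is widened to
64 bits before shifting; this is a choice made for the sketch so that a
VLAN tag with bit 15 set cannot be sign-extended through integer
promotion.

```c
#include <assert.h>
#include <stdint.h>

/* Compose the upper 64 bits of a TX descriptor from its fields. */
uint64_t
pack_tx_word(uint8_t flags, uint16_t vlan_tci, uint16_t data_len)
{
	return (uint64_t)flags << 56 |
	       (uint64_t)vlan_tci << 16 |
	       (uint64_t)data_len;
}
```

In the patch this word is paired with the buffer DMA address in the
lower 64 bits and written with a single 128-bit store.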

* [dpdk-dev] [PATCH v4 12/16] fm10k: use func pointer to reset TX queue and mbuf release
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (10 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 11/16] fm10k: add Vector TX function Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 13/16] fm10k: introduce 2 funcs " Chen Jing D(Mark)
                               ` (4 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Vector TX manages the TX queue differently, so it needs different
functions to reset the TX queue and to release the mbufs in it.
Introduce 2 function pointers for these operations.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    9 +++++++++
 drivers/net/fm10k/fm10k_ethdev.c |   21 ++++++++++++++++-----
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5525b72..bfb71da 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -206,11 +206,14 @@ struct fifo {
 	uint16_t *endp;
 };
 
+struct fm10k_txq_ops;
+
 struct fm10k_tx_queue {
 	struct rte_mbuf **sw_ring;
 	struct fm10k_tx_desc *hw_ring;
 	uint64_t hw_ring_phys_addr;
 	struct fifo rs_tracker;
+	const struct fm10k_txq_ops *ops; /* txq ops */
 	uint16_t last_free;
 	uint16_t next_free;
 	uint16_t nb_free;
@@ -227,6 +230,11 @@ struct fm10k_tx_queue {
 	uint16_t queue_id;
 };
 
+struct fm10k_txq_ops {
+	void (*release_mbufs)(struct fm10k_tx_queue *txq);
+	void (*reset)(struct fm10k_tx_queue *txq);
+};
+
 #define MBUF_DMA_ADDR(mb) \
 	((uint64_t) ((mb)->buf_physaddr + (mb)->data_off))
 
@@ -340,4 +348,5 @@ uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
 uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
+void fm10k_txq_vec_setup(struct fm10k_tx_queue *txq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a46a349..88bd887 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -292,6 +292,11 @@ tx_queue_disable(struct fm10k_hw *hw, uint16_t qnum)
 	return 0;
 }
 
+static const struct fm10k_txq_ops def_txq_ops = {
+	.release_mbufs = tx_queue_free,
+	.reset = tx_queue_reset,
+};
+
 static int
 fm10k_dev_configure(struct rte_eth_dev *dev)
 {
@@ -571,7 +576,8 @@ fm10k_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	PMD_INIT_FUNC_TRACE();
 
 	if (tx_queue_id < dev->data->nb_tx_queues) {
-		tx_queue_reset(dev->data->tx_queues[tx_queue_id]);
+		struct fm10k_tx_queue *q = dev->data->tx_queues[tx_queue_id];
+		q->ops->reset(q);
 
 		/* reset head and tail pointers */
 		FM10K_WRITE_REG(hw, FM10K_TDH(tx_queue_id), 0);
@@ -837,8 +843,10 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
 	PMD_INIT_FUNC_TRACE();
 
 	if (dev->data->tx_queues) {
-		for (i = 0; i < dev->data->nb_tx_queues; i++)
-			fm10k_tx_queue_release(dev->data->tx_queues[i]);
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			struct fm10k_tx_queue *txq = dev->data->tx_queues[i];
+			txq->ops->release_mbufs(txq);
+		}
 	}
 
 	if (dev->data->rx_queues) {
@@ -1454,7 +1462,8 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	 * different socket than was previously used.
 	 */
 	if (dev->data->tx_queues[queue_id] != NULL) {
-		tx_queue_free(dev->data->tx_queues[queue_id]);
+		struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];
+		txq->ops->release_mbufs(txq);
 		dev->data->tx_queues[queue_id] = NULL;
 	}
 
@@ -1470,6 +1479,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->nb_desc = nb_desc;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
+	q->ops = &def_txq_ops;
 	q->tail_ptr = (volatile uint32_t *)
 		&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
 	if (handle_txconf(q, conf))
@@ -1528,9 +1538,10 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 static void
 fm10k_tx_queue_release(void *queue)
 {
+	struct fm10k_tx_queue *q = queue;
 	PMD_INIT_FUNC_TRACE();
 
-	tx_queue_free(queue);
+	q->ops->release_mbufs(q);
 }
 
 static int
-- 
1.7.7.6
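
The ops-table indirection introduced here can be shown in isolation.
The sketch below uses hypothetical types and function names (struct
txq, scalar_ops, vec_ops and so on) and dummy bodies that only record
which implementation ran; it is not the driver code.

```c
#include <assert.h>
#include <stddef.h>

struct txq;

/* Per-queue operations, selected once at setup time. */
struct txq_ops {
	void (*release_mbufs)(struct txq *q);
	void (*reset)(struct txq *q);
};

struct txq {
	const struct txq_ops *ops;
	int nb_free;   /* set by the reset implementations */
	int released;  /* set by the release implementations */
};

void scalar_release(struct txq *q) { q->released = 1; }
void scalar_reset(struct txq *q) { q->nb_free = 100; }
void vec_release(struct txq *q) { q->released = 2; }
void vec_reset(struct txq *q) { q->nb_free = 200; }

const struct txq_ops scalar_ops = {
	.release_mbufs = scalar_release,
	.reset = scalar_reset,
};

const struct txq_ops vec_ops = {
	.release_mbufs = vec_release,
	.reset = vec_reset,
};

/* Callers go through the table and never name an implementation. */
void
txq_reset(struct txq *q)
{
	assert(q != NULL && q->ops != NULL);
	q->ops->reset(q);
}

void
txq_release(struct txq *q)
{
	assert(q != NULL && q->ops != NULL);
	q->ops->release_mbufs(q);
}
```

Switching a queue between the scalar and vector paths then only means
pointing ops at the other table; the call sites stay unchanged.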

* [dpdk-dev] [PATCH v4 13/16] fm10k: introduce 2 funcs to reset TX queue and mbuf release
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (11 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 12/16] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 14/16] fm10k: Add function to decide best TX func Chen Jing D(Mark)
                               ` (3 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add 2 functions to reset the TX queue and release its mbufs when
Vector TX is in use.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |   68 ++++++++++++++++++++++++++++++++++++
 1 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 37418bf..e572715 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,11 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+static void
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
+static void
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
+
 /* Handling the offload flags (olflags) field takes computation
  * time when receiving packets. Therefore we provide a flag to disable
  * the processing of the olflags field when they are not needed. This
@@ -628,6 +633,17 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 		&split_flags[i]);
 }
 
+static const struct fm10k_txq_ops vec_txq_ops = {
+	.release_mbufs = fm10k_tx_queue_release_mbufs_vec,
+	.reset = fm10k_reset_tx_queue,
+};
+
+void __attribute__((cold))
+fm10k_txq_vec_setup(struct fm10k_tx_queue *txq)
+{
+	txq->ops = &vec_txq_ops;
+}
+
 static inline void
 vtx1(volatile struct fm10k_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
@@ -777,3 +793,55 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return nb_pkts;
 }
+
+static void __attribute__((cold))
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
+{
+	unsigned i;
+	const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
+
+	if (txq->sw_ring == NULL || txq->nb_free == max_desc)
+		return;
+
+	/* release the used mbufs in sw_ring */
+	for (i = txq->next_dd - (txq->rs_thresh - 1);
+	     i != txq->next_free;
+	     i = (i + 1) & max_desc)
+		rte_pktmbuf_free_seg(txq->sw_ring[i]);
+
+	txq->nb_free = max_desc;
+
+	/* reset tx_entry */
+	for (i = 0; i < txq->nb_desc; i++)
+		txq->sw_ring[i] = NULL;
+
+	rte_free(txq->sw_ring);
+	txq->sw_ring = NULL;
+}
+
+static void __attribute__((cold))
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
+{
+	static const struct fm10k_tx_desc zeroed_desc = {0};
+	struct rte_mbuf **txe = txq->sw_ring;
+	uint16_t i;
+
+	/* Zero out HW ring memory */
+	for (i = 0; i < txq->nb_desc; i++)
+		txq->hw_ring[i] = zeroed_desc;
+
+	/* Initialize SW ring entries */
+	for (i = 0; i < txq->nb_desc; i++)
+		txe[i] = NULL;
+
+	txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+	txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+	txq->next_free = 0;
+	txq->nb_used = 0;
+	/* Always allow 1 descriptor to be un-allocated to avoid
+	 * a H/W race condition
+	 */
+	txq->nb_free = (uint16_t)(txq->nb_desc - 1);
+	FM10K_PCI_REG_WRITE(txq->tail_ptr, 0);
+}
-- 
1.7.7.6
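
The reset above starts next_dd at rs_thresh - 1, and the free routine
in the vector TX patch advances it by rs_thresh each time a batch is
reclaimed. A small sketch of that stepping is given below.
advance_next_dd is a hypothetical helper, and the sketch assumes that
rs_thresh divides nb_desc evenly, which is an assumption of this
example and is not stated in the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Move the "check DD here" index to the last slot of the next batch,
 * wrapping back to the last slot of the first batch at the ring end.
 */
uint16_t
advance_next_dd(uint16_t next_dd, uint16_t rs_thresh, uint16_t nb_desc)
{
	/* assumption of this sketch: batches tile the ring exactly */
	assert(rs_thresh != 0 && nb_desc % rs_thresh == 0);

	next_dd = (uint16_t)(next_dd + rs_thresh);
	if (next_dd >= nb_desc)
		next_dd = (uint16_t)(rs_thresh - 1);
	return next_dd;
}
```

For a 128-entry ring with rs_thresh of 32 the index cycles through 31,
63, 95 and 127, so only one descriptor per batch has to be polled.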

* [dpdk-dev] [PATCH v4 14/16] fm10k: Add function to decide best TX func
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (12 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 13/16] fm10k: introduce 2 funcs " Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 15/16] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
                               ` (2 subsequent siblings)
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_set_tx_function to decide the best TX func in
fm10k_dev_tx_init.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    1 +
 drivers/net/fm10k/fm10k_ethdev.c |   38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index bfb71da..8e2c6a4 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -224,6 +224,7 @@ struct fm10k_tx_queue {
 	uint16_t next_rs; /* Next pos to set RS flag */
 	uint16_t next_dd; /* Next pos to check DD flag */
 	volatile uint32_t *tail_ptr;
+	uint32_t txq_flags; /* Holds flags for this TXq */
 	uint16_t nb_desc;
 	uint8_t port_id;
 	uint8_t tx_deferred_start; /** < don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 88bd887..469bd85 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -53,6 +53,9 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)
 
+#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+				ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -68,6 +71,7 @@ fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
 static void fm10k_set_rx_function(struct rte_eth_dev *dev);
+static void fm10k_set_tx_function(struct rte_eth_dev *dev);
 
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -414,6 +418,10 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
 				base_addr >> (CHAR_BIT * sizeof(uint32_t)));
 		FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size);
 	}
+
+	/* set up vector or scalar TX function as appropriate */
+	fm10k_set_tx_function(dev);
+
 	return 0;
 }
 
@@ -980,8 +988,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		},
 		.tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
 		.tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
-		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-				ETH_TXQ_FLAGS_NOOFFLOADS,
+		.txq_flags = FM10K_SIMPLE_TX_FLAG,
 	};
 
 }
@@ -1479,6 +1486,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->nb_desc = nb_desc;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
+	q->txq_flags = conf->txq_flags;
 	q->ops = &def_txq_ops;
 	q->tail_ptr = (volatile uint32_t *)
 		&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
@@ -2090,6 +2098,32 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 };
 
 static void __attribute__((cold))
+fm10k_set_tx_function(struct rte_eth_dev *dev)
+{
+	struct fm10k_tx_queue *txq;
+	int i;
+	int use_sse = 1;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		if ((txq->txq_flags & FM10K_SIMPLE_TX_FLAG) !=
+			FM10K_SIMPLE_TX_FLAG) {
+			use_sse = 0;
+			break;
+		}
+	}
+
+	if (use_sse) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			fm10k_txq_vec_setup(txq);
+		}
+		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+	} else
+		dev->tx_pkt_burst = fm10k_xmit_pkts;
+}
+
+static void __attribute__((cold))
 fm10k_set_rx_function(struct rte_eth_dev *dev)
 {
 	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
-- 
1.7.7.6
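
The test applied by fm10k_set_tx_function is "every queue has all of
the simple-TX flag bits set". A standalone sketch of that mask check is
shown below. The flag values and the names FLAG_NOMULTSEGS,
FLAG_NOOFFLOADS, SIMPLE_TX_FLAGS and all_queues_simple are hypothetical
and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_NOMULTSEGS 0x0001u /* hypothetical value */
#define FLAG_NOOFFLOADS 0x0F00u /* hypothetical value */
#define SIMPLE_TX_FLAGS (FLAG_NOMULTSEGS | FLAG_NOOFFLOADS)

/* Return 1 only if every queue has all SIMPLE_TX_FLAGS bits set.
 * An empty queue list also returns 1, like use_sse starting at 1.
 */
int
all_queues_simple(const uint32_t *txq_flags, size_t nb_queues)
{
	size_t i;

	for (i = 0; i < nb_queues; i++)
		if ((txq_flags[i] & SIMPLE_TX_FLAGS) != SIMPLE_TX_FLAGS)
			return 0;
	return 1;
}
```

Comparing the masked value against the full mask, not against zero, is
what rejects a queue that sets only some of the required bits.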

* [dpdk-dev] [PATCH v4 15/16] fm10k: fix a crash issue in vector RX func
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (13 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 14/16] fm10k: Add function to decide best TX func Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 16/16] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
  2015-10-29 10:22             ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Liang, Cunming
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

The Vector RX function processes 4 packets at a time. When the RX
ring wraps at the tail and the number of remaining descriptors is not
a multiple of 4, SW overwrites memory that does not belong to it,
causing a crash. The fix allocates 4 additional HW/SW entries at the
tail to avoid the overwrite.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    4 +++-
 drivers/net/fm10k/fm10k_ethdev.c |   19 +++++++++++++++++--
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8e2c6a4..82a548f 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -177,7 +177,7 @@ struct fm10k_rx_queue {
 	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
 	uint64_t mbuf_initializer; /* value to init mbufs */
-	/** need to alloc dummy mbuf, for wraparound when scanning hw ring */
+	/* need to alloc dummy mbuf, for wraparound when scanning hw ring */
 	struct rte_mbuf fake_mbuf;
 	uint16_t next_dd;
 	uint16_t next_alloc;
@@ -185,6 +185,8 @@ struct fm10k_rx_queue {
 	uint16_t alloc_thresh;
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
+	/* Number of faked desc added at the tail for Vector RX function */
+	uint16_t nb_fake_desc;
 	uint16_t queue_id;
 	/* Below 2 fields only valid in case vPMD is applied. */
 	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 469bd85..705b311 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -102,6 +102,7 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
 static inline int
 rx_queue_reset(struct fm10k_rx_queue *q)
 {
+	static const union fm10k_rx_desc zero = {{0} };
 	uint64_t dma_addr;
 	int i, diag;
 	PMD_INIT_FUNC_TRACE();
@@ -122,6 +123,15 @@ rx_queue_reset(struct fm10k_rx_queue *q)
 		q->hw_ring[i].q.hdr_addr = dma_addr;
 	}
 
+	/* initialize extra software ring entries. Space for these extra
+	 * entries is always allocated.
+	 */
+	memset(&q->fake_mbuf, 0x0, sizeof(q->fake_mbuf));
+	for (i = 0; i < q->nb_fake_desc; ++i) {
+		q->sw_ring[q->nb_desc + i] = &q->fake_mbuf;
+		q->hw_ring[q->nb_desc + i] = zero;
+	}
+
 	q->next_dd = 0;
 	q->next_alloc = 0;
 	q->next_trigger = q->alloc_thresh - 1;
@@ -147,6 +157,10 @@ rx_queue_clean(struct fm10k_rx_queue *q)
 	for (i = 0; i < q->nb_desc; ++i)
 		q->hw_ring[i] = zero;
 
+	/* zero faked descriptors */
+	for (i = 0; i < q->nb_fake_desc; ++i)
+		q->hw_ring[q->nb_desc + i] = zero;
+
 	/* vPMD driver has a different way of releasing mbufs. */
 	if (q->rx_using_sse) {
 		fm10k_rx_queue_release_mbufs_vec(q);
@@ -1323,6 +1337,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	/* setup queue */
 	q->mp = mp;
 	q->nb_desc = nb_desc;
+	q->nb_fake_desc = FM10K_MULT_RX_DESC;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
 	q->tail_ptr = (volatile uint32_t *)
@@ -1332,8 +1347,8 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 
 	/* allocate memory for the software ring */
 	q->sw_ring = rte_zmalloc_socket("fm10k sw ring",
-					nb_desc * sizeof(struct rte_mbuf *),
-					RTE_CACHE_LINE_SIZE, socket_id);
+			(nb_desc + q->nb_fake_desc) * sizeof(struct rte_mbuf *),
+			RTE_CACHE_LINE_SIZE, socket_id);
 	if (q->sw_ring == NULL) {
 		PMD_INIT_LOG(ERR, "Cannot allocate software ring");
 		rte_free(q);
-- 
1.7.7.6
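
The idea behind the fix, padding the ring so that a fixed-width access
near the tail stays inside the allocation, can be sketched on its own.
Everything below is hypothetical (struct ring, GROUP, ring_init,
ring_peek_group) and uses plain int pointers in place of mbufs; GROUP
is set to 4 to match the "4 packets at a time" in the description,
while the patch itself takes the pad count from FM10K_MULT_RX_DESC.

```c
#include <assert.h>
#include <stdlib.h>

#define GROUP 4 /* entries touched per step in this sketch */

struct ring {
	int **slots;  /* nb real entries followed by GROUP pad entries */
	unsigned nb;  /* number of real entries */
	int dummy;    /* shared target of all pad entries */
};

/* Allocate nb + GROUP slots and point the pad slots at the dummy. */
int
ring_init(struct ring *r, unsigned nb)
{
	unsigned i;

	r->slots = calloc(nb + GROUP, sizeof(*r->slots));
	if (r->slots == NULL)
		return -1;
	r->nb = nb;
	r->dummy = 0;
	for (i = 0; i < GROUP; i++)
		r->slots[nb + i] = &r->dummy;
	return 0;
}

/* Count non-NULL pointers in the GROUP slots starting at idx.
 * Valid for any idx < nb because of the padding.
 */
unsigned
ring_peek_group(const struct ring *r, unsigned idx)
{
	unsigned i, n = 0;

	assert(idx < r->nb);
	for (i = 0; i < GROUP; i++)
		if (r->slots[idx + i] != NULL)
			n++;
	return n;
}
```

Without the pad, a group access starting at the last real entry would
touch up to GROUP - 1 slots past the end of the array.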

* [dpdk-dev] [PATCH v4 16/16] doc: release notes update for fm10k Vector PMD
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (14 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 15/16] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
@ 2015-10-29  9:16             ` Chen Jing D(Mark)
  2015-10-29 10:22             ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Liang, Cunming
  16 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-29  9:16 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Update the 2.2 release notes with a description of the Vector PMD
implementation in the fm10k driver.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 doc/guides/rel_notes/release_2_2.rst |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 9a70dae..44a3f74 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -39,6 +39,11 @@ Drivers
 
   Fixed issue with libvirt ``virsh destroy`` not killing the VM.
 
+* **fm10k: Add Vector Rx/Tx implementation.**
+
+  Add Vector Rx/Tx functions to receive/transmit packets for fm10k
+  devices, together with sanity checks that select the proper RX/TX
+  function.
 
 Libraries
 ~~~~~~~~~
-- 
1.7.7.6

* Re: [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k
  2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                               ` (15 preceding siblings ...)
  2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 16/16] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
@ 2015-10-29 10:22             ` Liang, Cunming
  2015-10-29 23:12               ` Thomas Monjalon
  16 siblings, 1 reply; 109+ messages in thread
From: Liang, Cunming @ 2015-10-29 10:22 UTC (permalink / raw)
  To: Chen, Jing D, dev

Hi,

> -----Original Message-----
> From: Chen, Jing D
> Sent: Thursday, October 29, 2015 5:16 PM
> To: dev@dpdk.org
> Cc: Liang, Cunming; Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson,
> Bruce; Chen, Jing D
> Subject: [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k
> 
> From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> 
> v4:
>  - Clear HW/SW ring content after allocating mbuf failed.
> 
> v3:
>  - Add a blank line after variable definition.
>  - Floor-align the nb_pkts argument to avoid memory overwrites.
>  - Scan at most 32 descriptors in the scatter Rx function to avoid memory
>    overwrites.
> 
> v2:
>  - Fix a typo issue.
>  - Fix an improper prefetch in vector RX function, in which prefetches
>    un-initialized mbuf.
>  - Remove limitation on number of desc pointer in vector RX function.
>  - Re-organize some comments.
>  - Add a new patch to fix a crash issue in vector RX func.
>  - Add a new patch to update release notes.
> 
> v1:
> This patch set includes Vector Rx/Tx functions to receive/transmit packets
> for fm10k devices. It also contains logic to do sanity check for proper
> RX/TX function selections.
> 
> Chen Jing D(Mark) (16):
>   fm10k: add new vPMD file
>   fm10k: add vPMD pre-condition check for each RX queue
>   fm10k: Add a new func to initialize all parameters
>   fm10k: add func to re-allocate mbuf for RX ring
>   fm10k: add 2 functions to parse pkt_type and offload flag
>   fm10k: add Vector RX function
>   fm10k: add func to do Vector RX condition check
>   fm10k: add Vector RX scatter function
>   fm10k: add function to decide best RX function
>   fm10k: add func to release mbuf in case Vector RX applied
>   fm10k: add Vector TX function
>   fm10k: use func pointer to reset TX queue and mbuf release
>   fm10k: introduce 2 funcs to reset TX queue and mbuf release
>   fm10k: Add function to decide best TX func
>   fm10k: fix a crash issue in vector RX func
>   doc: release notes update for fm10k Vector PMD
> 
>  doc/guides/rel_notes/release_2_2.rst |    5 +
>  drivers/net/fm10k/Makefile           |    1 +
>  drivers/net/fm10k/fm10k.h            |   45 ++-
>  drivers/net/fm10k/fm10k_ethdev.c     |  169 ++++++-
>  drivers/net/fm10k/fm10k_rxtx_vec.c   |  847 ++++++++++++++++++++++++++++++++++
>  5 files changed, 1039 insertions(+), 28 deletions(-)
>  create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c
> 
> --
> 1.7.7.6

Acked-by: Cunming Liang <cunming.liang@intel.com>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k
  2015-10-29 10:22             ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Liang, Cunming
@ 2015-10-29 23:12               ` Thomas Monjalon
  2015-10-30  3:09                 ` Chen, Jing D
  0 siblings, 1 reply; 109+ messages in thread
From: Thomas Monjalon @ 2015-10-29 23:12 UTC (permalink / raw)
  To: Chen, Jing D; +Cc: dev

> > Chen Jing D(Mark) (16):
> >   fm10k: add new vPMD file
> >   fm10k: add vPMD pre-condition check for each RX queue
> >   fm10k: Add a new func to initialize all parameters
> >   fm10k: add func to re-allocate mbuf for RX ring
> >   fm10k: add 2 functions to parse pkt_type and offload flag
> >   fm10k: add Vector RX function
> >   fm10k: add func to do Vector RX condition check
> >   fm10k: add Vector RX scatter function
> >   fm10k: add function to decide best RX function
> >   fm10k: add func to release mbuf in case Vector RX applied
> >   fm10k: add Vector TX function
> >   fm10k: use func pointer to reset TX queue and mbuf release
> >   fm10k: introduce 2 funcs to reset TX queue and mbuf release
> >   fm10k: Add function to decide best TX func
> >   fm10k: fix a crash issue in vector RX func
> >   doc: release notes update for fm10k Vector PMD
> 
> Acked-by: Cunming Liang <cunming.liang@intel.com>

Sorry, there are some checkpatch warnings and a compilation error:

SPACING: No space is necessary after a cast
SPACING: spaces preferred around that '+'
LINE_CONTINUATIONS: Avoid unnecessary line continuations

And more important, with clang:

fm10k_rxtx_vec.c:69:1: error: unused function 'fm10k_rxq_rearm'

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k
  2015-10-29 23:12               ` Thomas Monjalon
@ 2015-10-30  3:09                 ` Chen, Jing D
  0 siblings, 0 replies; 109+ messages in thread
From: Chen, Jing D @ 2015-10-30  3:09 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi, Thomas,

Best Regards,
Mark


> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Friday, October 30, 2015 7:13 AM
> To: Chen, Jing D
> Cc: dev@dpdk.org; Liang, Cunming
> Subject: Re: [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation
> for fm10k
> 
> > > Chen Jing D(Mark) (16):
> > >   fm10k: add new vPMD file
> > >   fm10k: add vPMD pre-condition check for each RX queue
> > >   fm10k: Add a new func to initialize all parameters
> > >   fm10k: add func to re-allocate mbuf for RX ring
> > >   fm10k: add 2 functions to parse pkt_type and offload flag
> > >   fm10k: add Vector RX function
> > >   fm10k: add func to do Vector RX condition check
> > >   fm10k: add Vector RX scatter function
> > >   fm10k: add function to decide best RX function
> > >   fm10k: add func to release mbuf in case Vector RX applied
> > >   fm10k: add Vector TX function
> > >   fm10k: use func pointer to reset TX queue and mbuf release
> > >   fm10k: introduce 2 funcs to reset TX queue and mbuf release
> > >   fm10k: Add function to decide best TX func
> > >   fm10k: fix a crash issue in vector RX func
> > >   doc: release notes update for fm10k Vector PMD
> >
> > Acked-by: Cunming Liang <cunming.liang@intel.com>
> 
> Sorry, there are some checkpatch warnings and a compilation error:
> 
> SPACING: No space is necessary after a cast
> SPACING: spaces preferred around that '+'
> LINE_CONTINUATIONS: Avoid unnecessary line continuations
> 
> And more important, with clang:
> 
> fm10k_rxtx_vec.c:69:1: error: unused function 'fm10k_rxq_rearm'

Thanks for the comments. I'll fix it.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k
  2015-10-29  9:15             ` [dpdk-dev] [PATCH v4 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
@ 2015-10-30  8:02               ` Chen Jing D(Mark)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 01/14] fm10k: add new vPMD file Chen Jing D(Mark)
                                   ` (14 more replies)
  0 siblings, 15 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:02 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

v5:
 - Fix some warnings reported by checkpatch.pl
 - Squash 3 patches into 1 to avoid a compile error on unused functions.
 - Sync with master branch

v4:
 - Clear HW/SW ring content after mbuf allocation fails.

v3:
 - Add a blank line after variable definition.
 - Floor-align the nb_pkts argument to avoid memory overwrites.
 - Scan at most 32 descriptors in the scatter Rx function to avoid memory overwrites.

v2:
 - Fix a typo issue.
 - Fix an improper prefetch in the vector RX function, which prefetched an
   uninitialized mbuf.
 - Remove limitation on number of desc pointer in vector RX function.
 - Re-organize some comments.
 - Add a new patch to fix a crash issue in vector RX func.
 - Add a new patch to update release notes.

v1:
This patch set includes Vector Rx/Tx functions to receive/transmit packets
for fm10k devices. It also contains logic to do sanity check for proper
RX/TX function selections.
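
The v3 item about floor-aligning nb_pkts can be illustrated with a minimal
sketch. It assumes the vector loop handles 4 descriptors per iteration
(RTE_FM10K_DESCS_PER_LOOP in this series). The helper name align_floor_u16
is hypothetical; it only mirrors what a floor-alignment macro does for
power-of-two values and is not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round n down to a multiple of align.
 * align must be a non-zero power of two.
 */
static inline uint16_t
align_floor_u16(uint16_t n, uint16_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	/* Clearing the low bits drops the unaligned tail. */
	return (uint16_t)(n & ~(uint16_t)(align - 1));
}
```

With a loop width of 4, a request for 7 packets is trimmed to 4, so a loop
that always writes 4 entries at a time should not run past the caller's
array.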

Chen Jing D(Mark) (14):
  fm10k: add new vPMD file
  fm10k: add vPMD pre-condition check for each RX queue
  fm10k: Add a new func to initialize all parameters
  fm10k: add Vector RX function
  fm10k: add func to do Vector RX condition check
  fm10k: add Vector RX scatter function
  fm10k: add function to decide best RX function
  fm10k: add func to release mbuf in case Vector RX applied
  fm10k: add Vector TX function
  fm10k: use func pointer to reset TX queue and mbuf release
  fm10k: introduce 2 funcs to reset TX queue and mbuf release
  fm10k: Add function to decide best TX func
  fm10k: fix a crash issue in vector RX func
  doc: release notes update for fm10k Vector PMD

 doc/guides/rel_notes/release_2_2.rst |    6 +
 drivers/net/fm10k/Makefile           |    1 +
 drivers/net/fm10k/fm10k.h            |   45 ++-
 drivers/net/fm10k/fm10k_ethdev.c     |  172 ++++++-
 drivers/net/fm10k/fm10k_rxtx_vec.c   |  847 ++++++++++++++++++++++++++++++++++
 5 files changed, 1043 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v5 01/14] fm10k: add new vPMD file
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
@ 2015-10-30  8:02                 ` Chen Jing D(Mark)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 02/14] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
                                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:02 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add new file fm10k_rxtx_vec.c and add it to the build.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/Makefile         |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   45 ++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

diff --git a/drivers/net/fm10k/Makefile b/drivers/net/fm10k/Makefile
index a4a8f56..06ebf83 100644
--- a/drivers/net/fm10k/Makefile
+++ b/drivers/net/fm10k/Makefile
@@ -93,6 +93,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx_vec.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
new file mode 100644
index 0000000..69174d9
--- /dev/null
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -0,0 +1,45 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <inttypes.h>
+
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#include "fm10k.h"
+#include "base/fm10k_type.h"
+
+#include <tmmintrin.h>
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v5 02/14] fm10k: add vPMD pre-condition check for each RX queue
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 01/14] fm10k: add new vPMD file Chen Jing D(Mark)
@ 2015-10-30  8:02                 ` Chen Jing D(Mark)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 03/14] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
                                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:02 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add a condition check to the rx_queue_setup function. If the number of
RX descriptors does not satisfy the vPMD requirement (a power of two),
record that in rx_vec_allowed. Otherwise, call fm10k_rxq_vec_setup to
initialize Vector RX.
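
As a rough illustration of the precondition described above, the sketch
below checks descriptor counts with the usual single-bit test. Both helper
names are hypothetical, and the power-of-two test is written out here as an
assumption rather than copied from rte_is_power_of_2().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when exactly one bit of n is set. */
static inline bool
is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: Vector RX stays allowed for the port only if
 * every queue's descriptor count is a power of two.
 */
static bool
port_rx_vec_allowed(const uint16_t *nb_desc, size_t nb_queues)
{
	size_t i;

	assert(nb_desc != NULL || nb_queues == 0);
	for (i = 0; i < nb_queues; i++)
		if (!is_power_of_2(nb_desc[i]))
			return false;
	return true;
}
```

One queue with, say, 1000 descriptors is enough to disable the feature for
the whole port, which matches the per-port rx_vec_allowed flag in the patch.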

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |   11 ++++++++---
 drivers/net/fm10k/fm10k_ethdev.c   |   11 +++++++++++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   21 +++++++++++++++++++++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c089882..362a2d0 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -135,6 +135,8 @@ struct fm10k_dev_info {
 	/* Protect the mailbox to avoid race condition */
 	rte_spinlock_t    mbx_lock;
 	struct fm10k_macvlan_filter_info    macvlan;
+	/* Flag to indicate if RX vector conditions satisfied */
+	bool rx_vec_allowed;
 };
 
 /*
@@ -165,9 +167,10 @@ struct fm10k_rx_queue {
 	struct rte_mempool *mp;
 	struct rte_mbuf **sw_ring;
 	volatile union fm10k_rx_desc *hw_ring;
-	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
-	struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+	struct rte_mbuf *pkt_first_seg; /* First segment of current packet. */
+	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
+	uint64_t mbuf_initializer; /* value to init mbufs */
 	uint16_t next_dd;
 	uint16_t next_alloc;
 	uint16_t next_trigger;
@@ -177,7 +180,7 @@ struct fm10k_rx_queue {
 	uint16_t queue_id;
 	uint8_t port_id;
 	uint8_t drop_en;
-	uint8_t rx_deferred_start; /**< don't start this queue in dev start. */
+	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
 };
 
 /*
@@ -313,4 +316,6 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
 
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
+
+int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index b104fc2..680a7fe 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1252,6 +1252,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	const struct rte_eth_rxconf *conf, struct rte_mempool *mp)
 {
 	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
 	struct fm10k_rx_queue *q;
 	const struct rte_memzone *mz;
 
@@ -1334,6 +1335,16 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->hw_ring_phys_addr = mz->phys_addr;
 #endif
 
+	/* Check if number of descs satisfied Vector requirement */
+	if (!rte_is_power_of_2(nb_desc)) {
+		PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
+				    "preconditions - canceling the feature for "
+				    "the whole port[%d]",
+			     q->queue_id, q->port_id);
+		dev_info->rx_vec_allowed = false;
+	} else
+		fm10k_rxq_vec_setup(q);
+
 	dev->data->rx_queues[queue_id] = q;
 	return 0;
 }
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 69174d9..34b677b 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -43,3 +43,24 @@
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
+
+int __attribute__((cold))
+fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	/* data_off will be adjusted after new mbuf allocated for 512-byte
+	 * alignment.
+	 */
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+	return 0;
+}
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v5 03/14] fm10k: Add a new func to initialize all parameters
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 01/14] fm10k: add new vPMD file Chen Jing D(Mark)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 02/14] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
@ 2015-10-30  8:02                 ` Chen Jing D(Mark)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 04/14] fm10k: add Vector RX function Chen Jing D(Mark)
                                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:02 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add new function fm10k_params_init to initialize all fm10k related
variables.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_ethdev.c |   35 +++++++++++++++++++++++------------
 1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 680a7fe..8dd64bf 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2067,6 +2067,27 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rss_hash_conf_get	= fm10k_rss_hash_conf_get,
 };
 
+static void
+fm10k_params_init(struct rte_eth_dev *dev)
+{
+	struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+
+	/* Initialize bus info. Normally we would call fm10k_get_bus_info(), but
+	 * there is no way to get link status without reading BAR4.  Until this
+	 * works, assume we have maximum bandwidth.
+	 * @todo - fix bus info
+	 */
+	hw->bus_caps.speed = fm10k_bus_speed_8000;
+	hw->bus_caps.width = fm10k_bus_width_pcie_x8;
+	hw->bus_caps.payload = fm10k_bus_payload_512;
+	hw->bus.speed = fm10k_bus_speed_8000;
+	hw->bus.width = fm10k_bus_width_pcie_x8;
+	hw->bus.payload = fm10k_bus_payload_256;
+
+	info->rx_vec_allowed = true;
+}
+
 static int
 eth_fm10k_dev_init(struct rte_eth_dev *dev)
 {
@@ -2113,18 +2134,8 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 		return -EIO;
 	}
 
-	/*
-	 * Inialize bus info. Normally we would call fm10k_get_bus_info(), but
-	 * there is no way to get link status without reading BAR4.  Until this
-	 * works, assume we have maximum bandwidth.
-	 * @todo - fix bus info
-	 */
-	hw->bus_caps.speed = fm10k_bus_speed_8000;
-	hw->bus_caps.width = fm10k_bus_width_pcie_x8;
-	hw->bus_caps.payload = fm10k_bus_payload_512;
-	hw->bus.speed = fm10k_bus_speed_8000;
-	hw->bus.width = fm10k_bus_width_pcie_x8;
-	hw->bus.payload = fm10k_bus_payload_256;
+	/* Initialize parameters */
+	fm10k_params_init(dev);
 
 	/* Initialize the hw */
 	diag = fm10k_init_hw(hw);
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [dpdk-dev] [PATCH v5 04/14] fm10k: add Vector RX function
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (2 preceding siblings ...)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 03/14] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
@ 2015-10-30  8:02                 ` Chen Jing D(Mark)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 05/14] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
                                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:02 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

This patch adds the following functions:
1. fm10k_rxq_rearm, which re-allocates mbufs for used descriptors
in the RX HW ring.
2. Two functions that use SSE instructions to parse RX descriptors
and fill in pkt_type and ol_flags in the mbuf.
3. fm10k_recv_raw_pkts_vec, which parses raw packets, possibly
including chained packets.
4. fm10k_recv_pkts_vec, which receives single-mbuf packets.
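
For readers following the rearm step, here is a scalar sketch of the address
arithmetic that fm10k_rxq_rearm performs with SSE (the head_off add followed
by the hba_msk AND). The constants are assumptions for illustration: a
128-byte headroom and the 512-byte alignment mentioned in the patch comments.
The function name rx_dma_addr is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the driver takes them from
 * RTE_PKTMBUF_HEADROOM and FM10K_RX_DATABUF_ALIGN.
 */
#define SKETCH_HEADROOM   128u
#define SKETCH_BUF_ALIGN  512u

/* Hypothetical scalar equivalent: skip the headroom, then round up
 * to the next aligned address.
 */
static inline uint64_t
rx_dma_addr(uint64_t buf_physaddr)
{
	assert((SKETCH_BUF_ALIGN & (SKETCH_BUF_ALIGN - 1)) == 0);
	return (buf_physaddr + SKETCH_HEADROOM + SKETCH_BUF_ALIGN - 1) &
	       ~(uint64_t)(SKETCH_BUF_ALIGN - 1);
}
```

The mask ~(align - 1) is the same value as UINT64_MAX - align + 1, which is
how the patch spells it in hba_msk.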

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |   12 +
 drivers/net/fm10k/fm10k_ethdev.c   |    3 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  426 ++++++++++++++++++++++++++++++++++++
 3 files changed, 441 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 362a2d0..96b30a7 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -123,6 +123,12 @@
 #define FM10K_VFTA_BIT(vlan_id)    (1 << ((vlan_id) & 0x1F))
 #define FM10K_VFTA_IDX(vlan_id)    ((vlan_id) >> 5)
 
+#define RTE_FM10K_RXQ_REARM_THRESH      32
+#define RTE_FM10K_VPMD_TX_BURST         32
+#define RTE_FM10K_MAX_RX_BURST          RTE_FM10K_RXQ_REARM_THRESH
+#define RTE_FM10K_TX_MAX_FREE_BUF_SZ    64
+#define RTE_FM10K_DESCS_PER_LOOP    4
+
 struct fm10k_macvlan_filter_info {
 	uint16_t vlan_num;       /* Total VLAN number */
 	uint16_t mac_num;        /* Total mac number */
@@ -171,6 +177,8 @@ struct fm10k_rx_queue {
 	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
 	uint64_t mbuf_initializer; /* value to init mbufs */
+	/** need to alloc dummy mbuf, for wraparound when scanning hw ring */
+	struct rte_mbuf fake_mbuf;
 	uint16_t next_dd;
 	uint16_t next_alloc;
 	uint16_t next_trigger;
@@ -178,6 +186,9 @@ struct fm10k_rx_queue {
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
 	uint16_t queue_id;
+	/* Below 2 fields only valid in case vPMD is applied. */
+	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
+	uint16_t rxrearm_start;  /* the idx we start the re-arming from */
 	uint8_t port_id;
 	uint8_t drop_en;
 	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
@@ -318,4 +329,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 8dd64bf..6be764a 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -121,6 +121,9 @@ rx_queue_reset(struct fm10k_rx_queue *q)
 	q->next_alloc = 0;
 	q->next_trigger = q->alloc_thresh - 1;
 	FM10K_PCI_REG_WRITE(q->tail_ptr, q->nb_desc - 1);
+	q->rxrearm_start = 0;
+	q->rxrearm_nb = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 34b677b..9633f35 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,133 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+/* Handling the offload flags (olflags) field takes computation
+ * time when receiving packets. Therefore we provide a flag to disable
+ * the processing of the olflags field when they are not needed. This
+ * gives improved performance, at the cost of losing the offload info
+ * in the received packet
+ */
+#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+
+/* Vlan present flag shift */
+#define VP_SHIFT     (2)
+/* L3 type shift */
+#define L3TYPE_SHIFT     (4)
+/* L4 type shift */
+#define L4TYPE_SHIFT     (7)
+
+static inline void
+fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+	__m128i ptype0, ptype1, vtag0, vtag1;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	const __m128i pkttype_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT,
+			PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT);
+
+	/* mask everything except rss type */
+	const __m128i rsstype_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x000F, 0x000F, 0x000F, 0x000F);
+
+	/* map rss type to rss hash flag */
+	const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
+			0, 0, 0, PKT_RX_RSS_HASH,
+			PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH, 0,
+			PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, 0);
+
+	ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+	ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+	vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+	vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);
+
+	ptype0 = _mm_unpacklo_epi32(ptype0, ptype1);
+	ptype0 = _mm_and_si128(ptype0, rsstype_msk);
+	ptype0 = _mm_shuffle_epi8(rss_flags, ptype0);
+
+	vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+	vtag1 = _mm_srli_epi16(vtag1, VP_SHIFT);
+	vtag1 = _mm_and_si128(vtag1, pkttype_msk);
+
+	vtag1 = _mm_or_si128(ptype0, vtag1);
+	vol.dword = _mm_cvtsi128_si64(vtag1);
+
+	rx_pkts[0]->ol_flags = vol.e[0];
+	rx_pkts[1]->ol_flags = vol.e[1];
+	rx_pkts[2]->ol_flags = vol.e[2];
+	rx_pkts[3]->ol_flags = vol.e[3];
+}
+
+static inline void
+fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+	__m128i l3l4type0, l3l4type1, l3type, l4type;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	/* L3 pkt type mask  Bit4 to Bit6 */
+	const __m128i l3type_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x0070, 0x0070, 0x0070, 0x0070);
+
+	/* L4 pkt type mask  Bit7 to Bit9 */
+	const __m128i l4type_msk = _mm_set_epi16(
+			0x0000, 0x0000, 0x0000, 0x0000,
+			0x0380, 0x0380, 0x0380, 0x0380);
+
+	/* convert RRC l3 type to mbuf format */
+	const __m128i l3type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
+			0, 0, 0, RTE_PTYPE_L3_IPV6_EXT,
+			RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV4_EXT,
+			RTE_PTYPE_L3_IPV4, 0);
+
+	/* Convert RRC l4 type to mbuf format. l4type_flags entries are
+	 * shifted right by 8 bits so each fits into an 8-bit lane.
+	 */
+	const __m128i l4type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, 0,
+			RTE_PTYPE_TUNNEL_GENEVE >> 8,
+			RTE_PTYPE_TUNNEL_NVGRE >> 8,
+			RTE_PTYPE_TUNNEL_VXLAN >> 8,
+			RTE_PTYPE_TUNNEL_GRE >> 8,
+			RTE_PTYPE_L4_UDP >> 8,
+			RTE_PTYPE_L4_TCP >> 8,
+			0);
+
+	l3l4type0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+	l3l4type1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+	l3l4type0 = _mm_unpacklo_epi32(l3l4type0, l3l4type1);
+
+	l3type = _mm_and_si128(l3l4type0, l3type_msk);
+	l4type = _mm_and_si128(l3l4type0, l4type_msk);
+
+	l3type = _mm_srli_epi16(l3type, L3TYPE_SHIFT);
+	l4type = _mm_srli_epi16(l4type, L4TYPE_SHIFT);
+
+	l3type = _mm_shuffle_epi8(l3type_flags, l3type);
+	/* l4type_flags was shifted right by 8 bits, shift left to restore */
+	l4type = _mm_shuffle_epi8(l4type_flags, l4type);
+
+	l4type = _mm_slli_epi16(l4type, 8);
+	l3l4type0 = _mm_or_si128(l3type, l4type);
+	vol.dword = _mm_cvtsi128_si64(l3l4type0);
+
+	rx_pkts[0]->packet_type = vol.e[0];
+	rx_pkts[1]->packet_type = vol.e[1];
+	rx_pkts[2]->packet_type = vol.e[2];
+	rx_pkts[3]->packet_type = vol.e[3];
+}
+#else
+#define fm10k_desc_to_olflags_v(desc, rx_pkts) do {} while (0)
+#define fm10k_desc_to_pktype_v(desc, rx_pkts) do {} while (0)
+#endif
+
 int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
@@ -64,3 +191,302 @@ fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 	rxq->mbuf_initializer = *(uint64_t *)p;
 	return 0;
 }
+
+static inline void
+fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
+{
+	int i;
+	uint16_t rx_id;
+	volatile union fm10k_rx_desc *rxdp;
+	struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
+	struct rte_mbuf *mb0, *mb1;
+	__m128i head_off = _mm_set_epi64x(
+			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1,
+			RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1);
+	__m128i dma_addr0, dma_addr1;
+	/* Rx buffers need to be 512-byte aligned */
+	const __m128i hba_msk = _mm_set_epi64x(0,
+				UINT64_MAX - FM10K_RX_DATABUF_ALIGN + 1);
+
+	rxdp = rxq->hw_ring + rxq->rxrearm_start;
+
+	/* Pull 'n' more MBUFs into the software ring */
+	if (rte_mempool_get_bulk(rxq->mp,
+				 (void *)mb_alloc,
+				 RTE_FM10K_RXQ_REARM_THRESH) < 0) {
+		dma_addr0 = _mm_setzero_si128();
+		/* Clean up all the HW/SW ring content */
+		for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) {
+			mb_alloc[i] = &rxq->fake_mbuf;
+			_mm_store_si128((__m128i *)&rxdp[i].q,
+						dma_addr0);
+		}
+
+		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_FM10K_RXQ_REARM_THRESH;
+		return;
+	}
+
+	/* Initialize the mbufs in vector, process 2 mbufs in one loop */
+	for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i += 2, mb_alloc += 2) {
+		__m128i vaddr0, vaddr1;
+		uintptr_t p0, p1;
+
+		mb0 = mb_alloc[0];
+		mb1 = mb_alloc[1];
+
+		/* Flush mbuf with pkt template.
+		 * Data to be rearmed is 6 bytes long.
+		 * Though, RX will overwrite ol_flags that are coming next
+		 * anyway. So overwrite whole 8 bytes with one load:
+		 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
+		 */
+		p0 = (uintptr_t)&mb0->rearm_data;
+		*(uint64_t *)p0 = rxq->mbuf_initializer;
+		p1 = (uintptr_t)&mb1->rearm_data;
+		*(uint64_t *)p1 = rxq->mbuf_initializer;
+
+		/* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
+		vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
+		vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
+
+		/* convert pa to dma_addr hdr/data */
+		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
+		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
+
+		/* add headroom to pa values */
+		dma_addr0 = _mm_add_epi64(dma_addr0, head_off);
+		dma_addr1 = _mm_add_epi64(dma_addr1, head_off);
+
+		/* Align to 512 bytes to satisfy the HW requirement and, at the
+		 * same time, set the Header Buffer Address to zero.
+		 */
+		dma_addr0 = _mm_and_si128(dma_addr0, hba_msk);
+		dma_addr1 = _mm_and_si128(dma_addr1, hba_msk);
+
+		/* flush desc with pa dma_addr */
+		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr0);
+		_mm_store_si128((__m128i *)&rxdp++->q, dma_addr1);
+
+		/* enforce 512B alignment on default Rx virtual addresses */
+		mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr
+				+ RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
+				- (char *)mb0->buf_addr);
+		mb1->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb1->buf_addr
+				+ RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
+				- (char *)mb1->buf_addr);
+	}
+
+	rxq->rxrearm_start += RTE_FM10K_RXQ_REARM_THRESH;
+	if (rxq->rxrearm_start >= rxq->nb_desc)
+		rxq->rxrearm_start = 0;
+
+	rxq->rxrearm_nb -= RTE_FM10K_RXQ_REARM_THRESH;
+
+	rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
+			(rxq->nb_desc - 1) : (rxq->rxrearm_start - 1));
+
+	/* Update the tail pointer on the NIC */
+	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
+}
+
+static inline uint16_t
+fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts, uint8_t *split_packet)
+{
+	volatile union fm10k_rx_desc *rxdp;
+	struct rte_mbuf **mbufp;
+	uint16_t nb_pkts_recd;
+	int pos;
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint64_t var;
+	__m128i shuf_msk;
+	__m128i dd_check, eop_check;
+	uint16_t next_dd;
+
+	next_dd = rxq->next_dd;
+
+	/* Just the act of getting into the function from the application is
+	 * going to cost about 7 cycles
+	 */
+	rxdp = rxq->hw_ring + next_dd;
+
+	_mm_prefetch((const void *)rxdp, _MM_HINT_T0);
+
+	/* See if we need to rearm the RX queue - gives the prefetch a bit
+	 * of time to act
+	 */
+	if (rxq->rxrearm_nb > RTE_FM10K_RXQ_REARM_THRESH)
+		fm10k_rxq_rearm(rxq);
+
+	/* Before we start moving massive data around, check to see if
+	 * there is actually a packet available
+	 */
+	if (!(rxdp->d.staterr & FM10K_RXD_STATUS_DD))
+		return 0;
+
+	/* Vector RX will process 4 packets at a time, strip the unaligned
+	 * tails in case it's not multiple of 4.
+	 */
+	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_FM10K_DESCS_PER_LOOP);
+
+	/* 4 packets DD mask */
+	dd_check = _mm_set_epi64x(0x0000000100000001LL, 0x0000000100000001LL);
+
+	/* 4 packets EOP mask */
+	eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
+
+	/* mask to shuffle from desc. to mbuf */
+	shuf_msk = _mm_set_epi8(
+		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
+		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
+		13, 12,      /* octet 12~13, 16 bits data_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
+		13, 12,      /* octet 12~13, low 16 bits pkt_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+		0xFF, 0xFF   /* Skip pkt_type field in shuffle operation */
+		);
+
+	/* Cache is empty -> need to scan the buffer rings, but first move
+	 * the next 'n' mbufs into the cache
+	 */
+	mbufp = &rxq->sw_ring[next_dd];
+
+	/* A. load 4 packets in one loop
+	 * [A*. mask out 4 unused dirty fields in desc]
+	 * B. copy 4 mbuf pointers from sw_ring to rx_pkts
+	 * C. calc the number of DD bits among the 4 packets
+	 * [C*. extract the end-of-packet bit, if requested]
+	 * D. fill info from desc to mbuf
+	 */
+	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
+			pos += RTE_FM10K_DESCS_PER_LOOP,
+			rxdp += RTE_FM10K_DESCS_PER_LOOP) {
+		__m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
+		__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
+		__m128i zero, staterr, sterr_tmp1, sterr_tmp2;
+		__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */
+
+		/* B.1 load 2 mbuf pointers */
+		mbp1 = _mm_loadu_si128((__m128i *)&mbufp[pos]);
+
+		/* Read desc statuses backwards to avoid race condition */
+		/* A.1 load 4 pkts desc */
+		descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+
+		/* B.2 copy 2 mbuf point into rx_pkts  */
+		_mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
+
+		/* B.1 load 2 mbuf pointers */
+		mbp2 = _mm_loadu_si128((__m128i *)&mbufp[pos+2]);
+
+		descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		/* A.1 load the remaining 2 pkts desc */
+		descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));
+
+		/* B.2 copy 2 mbuf point into rx_pkts  */
+		_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
+
+		/* avoid compiler reorder optimization */
+		rte_compiler_barrier();
+
+		if (split_packet) {
+			rte_prefetch0(&rx_pkts[pos]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 1]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 2]->cacheline1);
+			rte_prefetch0(&rx_pkts[pos + 3]->cacheline1);
+		}
+
+		/* D.1 pkt 3,4 convert format from desc to pktmbuf */
+		pkt_mb4 = _mm_shuffle_epi8(descs0[3], shuf_msk);
+		pkt_mb3 = _mm_shuffle_epi8(descs0[2], shuf_msk);
+
+		/* C.1 4=>2 filter staterr info only */
+		sterr_tmp2 = _mm_unpackhi_epi32(descs0[3], descs0[2]);
+		/* C.1 4=>2 filter staterr info only */
+		sterr_tmp1 = _mm_unpackhi_epi32(descs0[1], descs0[0]);
+
+		/* set ol_flags with vlan packet type */
+		fm10k_desc_to_olflags_v(descs0, &rx_pkts[pos]);
+
+		/* D.1 pkt 1,2 convert format from desc to pktmbuf */
+		pkt_mb2 = _mm_shuffle_epi8(descs0[1], shuf_msk);
+		pkt_mb1 = _mm_shuffle_epi8(descs0[0], shuf_msk);
+
+		/* C.2 get 4 pkts staterr value  */
+		zero = _mm_xor_si128(dd_check, dd_check);
+		staterr = _mm_unpacklo_epi32(sterr_tmp1, sterr_tmp2);
+
+		/* D.3 copy final 3,4 data to rx_pkts */
+		_mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
+				pkt_mb4);
+		_mm_storeu_si128((void *)&rx_pkts[pos+2]->rx_descriptor_fields1,
+				pkt_mb3);
+
+		/* C* extract and record EOP bit */
+		if (split_packet) {
+			__m128i eop_shuf_mask = _mm_set_epi8(
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 0xFF,
+					0x04, 0x0C, 0x00, 0x08
+					);
+
+			/* and with mask to extract bits, flipping 1-0 */
+			__m128i eop_bits = _mm_andnot_si128(staterr, eop_check);
+			/* the staterr values are not in order, as the
+			 * count of dd bits doesn't care. However, for end of
+			 * packet tracking, we do care, so shuffle. This also
+			 * compresses the 32-bit values to 8-bit
+			 */
+			eop_bits = _mm_shuffle_epi8(eop_bits, eop_shuf_mask);
+			/* store the resulting 32-bit value */
+			*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
+			split_packet += RTE_FM10K_DESCS_PER_LOOP;
+
+			/* zero-out next pointers */
+			rx_pkts[pos]->next = NULL;
+			rx_pkts[pos + 1]->next = NULL;
+			rx_pkts[pos + 2]->next = NULL;
+			rx_pkts[pos + 3]->next = NULL;
+		}
+
+		/* C.3 calc available number of desc */
+		staterr = _mm_and_si128(staterr, dd_check);
+		staterr = _mm_packs_epi32(staterr, zero);
+
+		/* D.3 copy final 1,2 data to rx_pkts */
+		_mm_storeu_si128((void *)&rx_pkts[pos+1]->rx_descriptor_fields1,
+				pkt_mb2);
+		_mm_storeu_si128((void *)&rx_pkts[pos]->rx_descriptor_fields1,
+				pkt_mb1);
+
+		fm10k_desc_to_pktype_v(descs0, &rx_pkts[pos]);
+
+		/* C.4 calc available number of desc */
+		var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
+		nb_pkts_recd += var;
+		if (likely(var != RTE_FM10K_DESCS_PER_LOOP))
+			break;
+	}
+
+	/* Update our internal tail pointer */
+	rxq->next_dd = (uint16_t)(rxq->next_dd + nb_pkts_recd);
+	rxq->next_dd = (uint16_t)(rxq->next_dd & (rxq->nb_desc - 1));
+	rxq->rxrearm_nb = (uint16_t)(rxq->rxrearm_nb + nb_pkts_recd);
+
+	return nb_pkts_recd;
+}
+
+/* vPMD receive routine
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ */
+uint16_t
+fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts)
+{
+	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
+}
-- 
1.7.7.6
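
For readers following steps C.1 to C.4 of the receive loop in the patch above, the DD-bit counting can be modelled in scalar C. This is only a sketch under stated assumptions: DD_BIT is taken to be bit 0 of each status word (matching the dd_check mask in the patch), count_done_descs is a hypothetical helper name, and __builtin_popcountll is the same GCC/Clang builtin the patch relies on.

```c
#include <stdint.h>

#define DD_BIT 0x1u          /* assumed: descriptor-done is bit 0 */
#define DESCS_PER_LOOP 4     /* descriptors handled per iteration */

/* Scalar model of steps C.1-C.4: keep only the DD bit of each of the
 * four status words, place each one in its own 16-bit lane of a
 * 64-bit value, and count the set bits.
 */
static unsigned
count_done_descs(const uint32_t staterr[DESCS_PER_LOOP])
{
	uint64_t packed = 0;
	int i;

	for (i = 0; i < DESCS_PER_LOOP; i++)
		packed |= (uint64_t)(staterr[i] & DD_BIT) << (16 * i);

	return (unsigned)__builtin_popcountll(packed);
}
```

Note that a population count gives the number of set DD bits, not the length of the leading run of completed descriptors, so this style of counting relies on descriptors being completed in ring order.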

* [dpdk-dev] [PATCH v5 05/14] fm10k: add func to do Vector RX condition check
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (3 preceding siblings ...)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 04/14] fm10k: add Vector RX function Chen Jing D(Mark)
@ 2015-10-30  8:02                 ` Chen Jing D(Mark)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 06/14] fm10k: add Vector RX scatter function Chen Jing D(Mark)
                                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:02 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_rx_vec_condition_check to check if Vector RX
func can be applied.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 96b30a7..6c1c698 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,5 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 9633f35..64036e3 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -172,6 +172,37 @@ fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 #endif
 
 int __attribute__((cold))
+fm10k_rx_vec_condition_check(struct rte_eth_dev *dev)
+{
+#ifndef RTE_LIBRTE_IEEE1588
+	struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+	struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;
+
+#ifndef RTE_FM10K_RX_OLFLAGS_ENABLE
+	/* without rx ol_flags, no VP flag report */
+	if (rxmode->hw_vlan_extend != 0)
+		return -1;
+#endif
+
+	/* no fdir support */
+	if (fconf->mode != RTE_FDIR_MODE_NONE)
+		return -1;
+
+	/* - no csum error report support
+	 * - no header split support
+	 */
+	if (rxmode->hw_ip_checksum == 1 ||
+	    rxmode->header_split == 1)
+		return -1;
+
+	return 0;
+#else
+	RTE_SET_USED(dev);
+	return -1;
+#endif
+}
+
+int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
 	uintptr_t p;
-- 
1.7.7.6

* [dpdk-dev] [PATCH v5 06/14] fm10k: add Vector RX scatter function
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (4 preceding siblings ...)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 05/14] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
@ 2015-10-30  8:02                 ` Chen Jing D(Mark)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 07/14] fm10k: add function to decide best RX function Chen Jing D(Mark)
                                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:02 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_recv_scattered_pkts_vec to receive chained packets
with SSE instructions.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    2 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 6c1c698..8dba27b 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -331,4 +331,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
+uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
+					uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 64036e3..ffd022a 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -521,3 +521,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 {
 	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }
+
+static inline uint16_t
+fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
+		struct rte_mbuf **rx_bufs,
+		uint16_t nb_bufs, uint8_t *split_flags)
+{
+	struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished pkts*/
+	struct rte_mbuf *start = rxq->pkt_first_seg;
+	struct rte_mbuf *end =  rxq->pkt_last_seg;
+	unsigned pkt_idx, buf_idx;
+
+	for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
+		if (end != NULL) {
+			/* processing a split packet */
+			end->next = rx_bufs[buf_idx];
+			start->nb_segs++;
+			start->pkt_len += rx_bufs[buf_idx]->data_len;
+			end = end->next;
+
+			if (!split_flags[buf_idx]) {
+				/* it's the last packet of the set */
+				start->hash = end->hash;
+				start->ol_flags = end->ol_flags;
+				pkts[pkt_idx++] = start;
+				start = end = NULL;
+			}
+		} else {
+			/* not processing a split packet */
+			if (!split_flags[buf_idx]) {
+				/* not a split packet, save and skip */
+				pkts[pkt_idx++] = rx_bufs[buf_idx];
+				continue;
+			}
+			end = start = rx_bufs[buf_idx];
+		}
+	}
+
+	/* save the partial packet for next time */
+	rxq->pkt_first_seg = start;
+	rxq->pkt_last_seg = end;
+	memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
+	return pkt_idx;
+}
+
+/*
+ * vPMD receive routine that reassembles scattered packets
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ * - nb_pkts > RTE_FM10K_MAX_RX_BURST, only scan RTE_FM10K_MAX_RX_BURST
+ *   numbers of DD bit
+ */
+uint16_t
+fm10k_recv_scattered_pkts_vec(void *rx_queue,
+				struct rte_mbuf **rx_pkts,
+				uint16_t nb_pkts)
+{
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
+	unsigned i = 0;
+
+	/* split_flags can only hold RTE_FM10K_MAX_RX_BURST entries */
+	nb_pkts = RTE_MIN(nb_pkts, RTE_FM10K_MAX_RX_BURST);
+	/* get some new buffers */
+	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
+			split_flags);
+	if (nb_bufs == 0)
+		return 0;
+
+	/* happy day case, full burst + no packets to be joined */
+	const uint64_t *split_fl64 = (uint64_t *)split_flags;
+
+	if (rxq->pkt_first_seg == NULL &&
+			split_fl64[0] == 0 && split_fl64[1] == 0 &&
+			split_fl64[2] == 0 && split_fl64[3] == 0)
+		return nb_bufs;
+
+	/* reassemble any packets that need reassembly */
+	if (rxq->pkt_first_seg == NULL) {
+		/* find the first split flag, and only reassemble from there */
+		while (i < nb_bufs && !split_flags[i])
+			i++;
+		if (i == nb_bufs)
+			return nb_bufs;
+	}
+	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
+		&split_flags[i]);
+}
-- 
1.7.7.6
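
The reassembly loop in the patch above can be tried out without DPDK using a cut-down model. The sketch below is not the driver code: struct seg and struct reasm_state are hypothetical stand-ins for struct rte_mbuf and the pkt_first_seg/pkt_last_seg queue fields, the hash and ol_flags copies are left out, and finished packets are compacted in place rather than through a temporary array. It assumes every incoming segment arrives with pkt_len equal to data_len, nb_segs equal to 1 and next set to NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct rte_mbuf. */
struct seg {
	struct seg *next;
	uint32_t pkt_len;
	uint16_t data_len;
	uint16_t nb_segs;
};

/* Partial packet carried over from one burst to the next. */
struct reasm_state {
	struct seg *first;
	struct seg *last;
};

/* Chain split segments together; return the number of complete
 * packets, which are left at the front of bufs[].
 */
static uint16_t
reassemble(struct reasm_state *st, struct seg **bufs,
		uint16_t nb_bufs, const uint8_t *split_flags)
{
	struct seg *start = st->first;
	struct seg *end = st->last;
	uint16_t pkt_idx = 0, buf_idx;

	for (buf_idx = 0; buf_idx < nb_bufs; buf_idx++) {
		struct seg *cur = bufs[buf_idx];

		if (end != NULL) {
			/* continue the packet in progress */
			end->next = cur;
			start->nb_segs++;
			start->pkt_len += cur->data_len;
			end = cur;
			if (!split_flags[buf_idx]) {
				/* last segment: packet is complete */
				bufs[pkt_idx++] = start;
				start = end = NULL;
			}
		} else if (!split_flags[buf_idx]) {
			/* single-segment packet, pass through */
			bufs[pkt_idx++] = cur;
		} else {
			/* first segment of a new split packet */
			start = end = cur;
		}
	}

	/* save the partial packet for the next call */
	st->first = start;
	st->last = end;
	return pkt_idx;
}
```

Writing bufs[pkt_idx] in place is safe here because pkt_idx never runs ahead of buf_idx, and bufs[buf_idx] is read before any store in the same iteration.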

* [dpdk-dev] [PATCH v5 07/14] fm10k: add function to decide best RX function
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (5 preceding siblings ...)
  2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 06/14] fm10k: add Vector RX scatter function Chen Jing D(Mark)
@ 2015-10-30  8:03                 ` Chen Jing D(Mark)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 08/14] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
                                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_set_rx_function to decide the best RX func in
fm10k_dev_rx_init.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    1 +
 drivers/net/fm10k/fm10k_ethdev.c |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8dba27b..5666af6 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -189,6 +189,7 @@ struct fm10k_rx_queue {
 	/* Below 2 fields only valid in case vPMD is applied. */
 	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
 	uint16_t rxrearm_start;  /* the idx we start the re-arming from */
+	uint16_t rx_using_sse; /* indicates that vector RX is in use */
 	uint8_t port_id;
 	uint8_t drop_en;
 	uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 6be764a..70dac2a 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -67,6 +67,7 @@ static void
 fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
+static void fm10k_set_rx_function(struct rte_eth_dev *dev);
 
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -462,7 +463,6 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
 			dev->data->dev_conf.rxmode.enable_scatter) {
 			uint32_t reg;
 			dev->data->scattered_rx = 1;
-			dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
 			reg = FM10K_READ_REG(hw, FM10K_SRRCTL(i));
 			reg |= FM10K_SRRCTL_BUFFER_CHAINING_EN;
 			FM10K_WRITE_REG(hw, FM10K_SRRCTL(i), reg);
@@ -478,6 +478,9 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
 
 	/* Configure RSS if applicable */
 	fm10k_dev_mq_rx_configure(dev);
+
+	/* Decide the best RX function */
+	fm10k_set_rx_function(dev);
 	return 0;
 }
 
@@ -2070,6 +2073,34 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rss_hash_conf_get	= fm10k_rss_hash_conf_get,
 };
 
+static void __attribute__((cold))
+fm10k_set_rx_function(struct rte_eth_dev *dev)
+{
+	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+	uint16_t i, rx_using_sse;
+
+	/* In order to allow Vector Rx there are a few configuration
+	 * conditions to be met.
+	 */
+	if (!fm10k_rx_vec_condition_check(dev) && dev_info->rx_vec_allowed) {
+		if (dev->data->scattered_rx)
+			dev->rx_pkt_burst = fm10k_recv_scattered_pkts_vec;
+		else
+			dev->rx_pkt_burst = fm10k_recv_pkts_vec;
+	} else if (dev->data->scattered_rx)
+		dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+
+	rx_using_sse =
+		(dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
+		dev->rx_pkt_burst == fm10k_recv_pkts_vec);
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];
+
+		rxq->rx_using_sse = rx_using_sse;
+	}
+}
+
 static void
 fm10k_params_init(struct rte_eth_dev *dev)
 {
@@ -2104,9 +2135,6 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
 
-	if (dev->data->scattered_rx)
-		dev->rx_pkt_burst = &fm10k_recv_scattered_pkts;
-
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
-- 
1.7.7.6
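
The decision made by fm10k_set_rx_function in the patch above can be written as a small pure function, which makes the four outcomes easier to see. This is an illustrative sketch only: enum rx_path and select_rx_path are hypothetical names, and the burst function pointers are replaced by enum values.

```c
#include <stdbool.h>

/* Hypothetical labels for the four RX burst functions. */
enum rx_path {
	RX_SCALAR,          /* fm10k_recv_pkts */
	RX_SCALAR_SCATTER,  /* fm10k_recv_scattered_pkts */
	RX_VEC,             /* fm10k_recv_pkts_vec */
	RX_VEC_SCATTER      /* fm10k_recv_scattered_pkts_vec */
};

/* cond_check_ret models the return value of
 * fm10k_rx_vec_condition_check(): 0 means vector RX is possible.
 */
static enum rx_path
select_rx_path(enum rx_path current, int cond_check_ret,
		bool vec_allowed, bool scattered)
{
	if (cond_check_ret == 0 && vec_allowed)
		return scattered ? RX_VEC_SCATTER : RX_VEC;
	if (scattered)
		return RX_SCALAR_SCATTER;
	/* neither branch taken: the current selection is kept */
	return current;
}
```

As in the patch, when vector RX is ruled out and scattered RX is off, the function leaves whatever was selected before unchanged.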

* [dpdk-dev] [PATCH v5 08/14] fm10k: add func to release mbuf in case Vector RX applied
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (6 preceding siblings ...)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 07/14] fm10k: add function to decide best RX function Chen Jing D(Mark)
@ 2015-10-30  8:03                 ` Chen Jing D(Mark)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 09/14] fm10k: add Vector TX function Chen Jing D(Mark)
                                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Since Vector RX uses different variables to track the RX HW ring, a
different func is needed to release mbufs properly.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    1 +
 drivers/net/fm10k/fm10k_ethdev.c   |    6 ++++++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   18 ++++++++++++++++++
 3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5666af6..d17b2fb 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -331,6 +331,7 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
+void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 70dac2a..3c7b707 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -143,6 +143,12 @@ rx_queue_clean(struct fm10k_rx_queue *q)
 	for (i = 0; i < q->nb_desc; ++i)
 		q->hw_ring[i] = zero;
 
+	/* vPMD driver has a different way of releasing mbufs. */
+	if (q->rx_using_sse) {
+		fm10k_rx_queue_release_mbufs_vec(q);
+		return;
+	}
+
 	/* free software buffers */
 	for (i = 0; i < q->nb_desc; ++i) {
 		if (q->sw_ring[i]) {
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index ffd022a..4d90d6a 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -321,6 +321,24 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 	FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
 
+void __attribute__((cold))
+fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq)
+{
+	const unsigned mask = rxq->nb_desc - 1;
+	unsigned i;
+
+	if (rxq->sw_ring == NULL || rxq->rxrearm_nb >= rxq->nb_desc)
+		return;
+
+	/* free all mbufs that are valid in the ring */
+	for (i = rxq->next_dd; i != rxq->rxrearm_start; i = (i + 1) & mask)
+		rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+	rxq->rxrearm_nb = rxq->nb_desc;
+
+	/* set all entries to NULL */
+	memset(rxq->sw_ring, 0, sizeof(rxq->sw_ring[0]) * rxq->nb_desc);
+}
+
 static inline uint16_t
 fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts, uint8_t *split_packet)
-- 
1.7.7.6
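
The ring walk used by fm10k_rx_queue_release_mbufs_vec in the patch above can be checked in isolation. The sketch below is a hypothetical helper (ring_walk is not a driver function) that records which slots such a walk visits instead of freeing mbufs. It assumes nb_desc is a power of two, as the mask arithmetic requires.

```c
#include <stdint.h>

/* Record the ring slots visited by a walk from next_dd up to, but
 * not including, rearm_start. Returns the number of slots visited.
 * nb_desc must be a power of two; visited[] needs nb_desc entries.
 */
static unsigned
ring_walk(uint16_t next_dd, uint16_t rearm_start,
		uint16_t nb_desc, uint16_t *visited)
{
	const unsigned mask = nb_desc - 1u;
	unsigned i, n = 0;

	for (i = next_dd; i != rearm_start; i = (i + 1) & mask)
		visited[n++] = (uint16_t)i;

	return n;
}
```

With an 8-entry ring, next_dd = 6 and rearm_start = 2, the walk visits slots 6, 7, 0 and 1. When the two indices are equal the loop body never runs, so that case has to be covered by a separate check.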

* [dpdk-dev] [PATCH v5 09/14] fm10k: add Vector TX function
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (7 preceding siblings ...)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 08/14] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
@ 2015-10-30  8:03                 ` Chen Jing D(Mark)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 10/14] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
                                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add Vector TX func fm10k_xmit_pkts_vec to transmit packets.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    5 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  150 ++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index d17b2fb..5525b72 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -217,6 +217,9 @@ struct fm10k_tx_queue {
 	uint16_t nb_used;
 	uint16_t free_thresh;
 	uint16_t rs_thresh;
+	/* Below 2 fields only valid in case vPMD is applied. */
+	uint16_t next_rs; /* Next pos to set RS flag */
+	uint16_t next_dd; /* Next pos to check DD flag */
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
 	uint8_t port_id;
@@ -335,4 +338,6 @@ void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
+uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 4d90d6a..4515b26 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -627,3 +627,153 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
 		&split_flags[i]);
 }
+
+static inline void
+vtx1(volatile struct fm10k_tx_desc *txdp,
+		struct rte_mbuf *pkt, uint64_t flags)
+{
+	__m128i descriptor = _mm_set_epi64x(flags << 56 |
+			(uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len,
+			MBUF_DMA_ADDR(pkt));
+	_mm_store_si128((__m128i *)txdp, descriptor);
+}
+
+static inline void
+vtx(volatile struct fm10k_tx_desc *txdp,
+		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
+{
+	int i;
+
+	for (i = 0; i < nb_pkts; ++i, ++txdp, ++pkt)
+		vtx1(txdp, *pkt, flags);
+}
+
+static inline int __attribute__((always_inline))
+fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
+{
+	struct rte_mbuf **txep;
+	uint8_t flags;
+	uint32_t n;
+	uint32_t i;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[RTE_FM10K_TX_MAX_FREE_BUF_SZ];
+
+	/* check DD bit on threshold descriptor */
+	flags = txq->hw_ring[txq->next_dd].flags;
+	if (!(flags & FM10K_TXD_FLAG_DONE))
+		return 0;
+
+	n = txq->rs_thresh;
+
+	/* First buffer to free from S/W ring is at index
+	 * next_dd - (rs_thresh-1)
+	 */
+	txep = &txq->sw_ring[txq->next_dd - (n - 1)];
+	m = __rte_pktmbuf_prefree_seg(txep[0]);
+	if (likely(m != NULL)) {
+		free[0] = m;
+		nb_free = 1;
+		for (i = 1; i < n; i++) {
+			m = __rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool))
+					free[nb_free++] = m;
+				else {
+					rte_mempool_put_bulk(free[0]->pool,
+							(void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+		}
+		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	} else {
+		for (i = 1; i < n; i++) {
+			m = __rte_pktmbuf_prefree_seg(txep[i]);
+			if (m != NULL)
+				rte_mempool_put(m->pool, m);
+		}
+	}
+
+	/* buffers were freed, update counters */
+	txq->nb_free = (uint16_t)(txq->nb_free + txq->rs_thresh);
+	txq->next_dd = (uint16_t)(txq->next_dd + txq->rs_thresh);
+	if (txq->next_dd >= txq->nb_desc)
+		txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+
+	return txq->rs_thresh;
+}
+
+static inline void __attribute__((always_inline))
+tx_backlog_entry(struct rte_mbuf **txep,
+		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i;
+
+	for (i = 0; i < (int)nb_pkts; ++i)
+		txep[i] = tx_pkts[i];
+}
+
+uint16_t
+fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+			uint16_t nb_pkts)
+{
+	struct fm10k_tx_queue *txq = (struct fm10k_tx_queue *)tx_queue;
+	volatile struct fm10k_tx_desc *txdp;
+	struct rte_mbuf **txep;
+	uint16_t n, nb_commit, tx_id;
+	uint64_t flags = FM10K_TXD_FLAG_LAST;
+	uint64_t rs = FM10K_TXD_FLAG_RS | FM10K_TXD_FLAG_LAST;
+	int i;
+
+	/* cross rs_thresh boundary is not allowed */
+	nb_pkts = RTE_MIN(nb_pkts, txq->rs_thresh);
+
+	if (txq->nb_free < txq->free_thresh)
+		fm10k_tx_free_bufs(txq);
+
+	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_free, nb_pkts);
+	if (unlikely(nb_pkts == 0))
+		return 0;
+
+	tx_id = txq->next_free;
+	txdp = &txq->hw_ring[tx_id];
+	txep = &txq->sw_ring[tx_id];
+
+	txq->nb_free = (uint16_t)(txq->nb_free - nb_pkts);
+
+	n = (uint16_t)(txq->nb_desc - tx_id);
+	if (nb_commit >= n) {
+		tx_backlog_entry(txep, tx_pkts, n);
+
+		for (i = 0; i < n - 1; ++i, ++tx_pkts, ++txdp)
+			vtx1(txdp, *tx_pkts, flags);
+
+		vtx1(txdp, *tx_pkts++, rs);
+
+		nb_commit = (uint16_t)(nb_commit - n);
+
+		tx_id = 0;
+		txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+		/* avoid reaching the end of ring */
+		txdp = &(txq->hw_ring[tx_id]);
+		txep = &txq->sw_ring[tx_id];
+	}
+
+	tx_backlog_entry(txep, tx_pkts, nb_commit);
+
+	vtx(txdp, tx_pkts, nb_commit, flags);
+
+	tx_id = (uint16_t)(tx_id + nb_commit);
+	if (tx_id > txq->next_rs) {
+		txq->hw_ring[txq->next_rs].flags |= FM10K_TXD_FLAG_RS;
+		txq->next_rs = (uint16_t)(txq->next_rs + txq->rs_thresh);
+	}
+
+	txq->next_free = tx_id;
+
+	FM10K_PCI_REG_WRITE(txq->tail_ptr, txq->next_free);
+
+	return nb_pkts;
+}
-- 
1.7.7.6
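
The upper 64 bits of the TX descriptor built by vtx1 in the patch above can be reproduced with plain integer arithmetic. This is a sketch under stated assumptions: tx_desc_hi is a hypothetical helper, and the field positions (data_len in bits 0-15, vlan_tci in bits 16-31, flags in bits 56-63) are simply the shifts that vtx1 uses. Each 16-bit field is widened to 64 bits before shifting, because a uint16_t is otherwise promoted to int and a value with bit 15 set would overflow when shifted left by 16.

```c
#include <stdint.h>

/* Pack flags, VLAN tag and buffer length into the upper quadword of
 * a TX descriptor, using the same shifts as vtx1(). All operands are
 * widened to 64 bits first so no shift happens in signed int.
 */
static uint64_t
tx_desc_hi(uint8_t flags, uint16_t vlan_tci, uint16_t data_len)
{
	return (uint64_t)flags << 56 |
		(uint64_t)vlan_tci << 16 |
		(uint64_t)data_len;
}
```

For example, flags 0x60 (an arbitrary value chosen for illustration), a VLAN tag of 0x8001 and a 64-byte buffer pack to 0x6000000080010040.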

* [dpdk-dev] [PATCH v5 10/14] fm10k: use func pointer to reset TX queue and mbuf release
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (8 preceding siblings ...)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 09/14] fm10k: add Vector TX function Chen Jing D(Mark)
@ 2015-10-30  8:03                 ` Chen Jing D(Mark)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 11/14] fm10k: introduce 2 funcs " Chen Jing D(Mark)
                                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Vector TX manages the TX queue differently, so different functions are
needed to reset the TX queue and release the mbufs in it. Introduce 2
function pointers for these ops.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    9 +++++++++
 drivers/net/fm10k/fm10k_ethdev.c |   24 +++++++++++++++++++-----
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5525b72..bfb71da 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -206,11 +206,14 @@ struct fifo {
 	uint16_t *endp;
 };
 
+struct fm10k_txq_ops;
+
 struct fm10k_tx_queue {
 	struct rte_mbuf **sw_ring;
 	struct fm10k_tx_desc *hw_ring;
 	uint64_t hw_ring_phys_addr;
 	struct fifo rs_tracker;
+	const struct fm10k_txq_ops *ops; /* txq ops */
 	uint16_t last_free;
 	uint16_t next_free;
 	uint16_t nb_free;
@@ -227,6 +230,11 @@ struct fm10k_tx_queue {
 	uint16_t queue_id;
 };
 
+struct fm10k_txq_ops {
+	void (*release_mbufs)(struct fm10k_tx_queue *txq);
+	void (*reset)(struct fm10k_tx_queue *txq);
+};
+
 #define MBUF_DMA_ADDR(mb) \
 	((uint64_t) ((mb)->buf_physaddr + (mb)->data_off))
 
@@ -340,4 +348,5 @@ uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 					uint16_t);
 uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
+void fm10k_txq_vec_setup(struct fm10k_tx_queue *txq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 3c7b707..0b40797 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -292,6 +292,11 @@ tx_queue_disable(struct fm10k_hw *hw, uint16_t qnum)
 	return 0;
 }
 
+static const struct fm10k_txq_ops def_txq_ops = {
+	.release_mbufs = tx_queue_free,
+	.reset = tx_queue_reset,
+};
+
 static int
 fm10k_dev_configure(struct rte_eth_dev *dev)
 {
@@ -571,7 +576,9 @@ fm10k_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	PMD_INIT_FUNC_TRACE();
 
 	if (tx_queue_id < dev->data->nb_tx_queues) {
-		tx_queue_reset(dev->data->tx_queues[tx_queue_id]);
+		struct fm10k_tx_queue *q = dev->data->tx_queues[tx_queue_id];
+
+		q->ops->reset(q);
 
 		/* reset head and tail pointers */
 		FM10K_WRITE_REG(hw, FM10K_TDH(tx_queue_id), 0);
@@ -837,8 +844,11 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
 	PMD_INIT_FUNC_TRACE();
 
 	if (dev->data->tx_queues) {
-		for (i = 0; i < dev->data->nb_tx_queues; i++)
-			fm10k_tx_queue_release(dev->data->tx_queues[i]);
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			struct fm10k_tx_queue *txq = dev->data->tx_queues[i];
+
+			txq->ops->release_mbufs(txq);
+		}
 	}
 
 	if (dev->data->rx_queues) {
@@ -1455,7 +1465,9 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	 * different socket than was previously used.
 	 */
 	if (dev->data->tx_queues[queue_id] != NULL) {
-		tx_queue_free(dev->data->tx_queues[queue_id]);
+		struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];
+
+		txq->ops->release_mbufs(txq);
 		dev->data->tx_queues[queue_id] = NULL;
 	}
 
@@ -1471,6 +1483,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->nb_desc = nb_desc;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
+	q->ops = &def_txq_ops;
 	q->tail_ptr = (volatile uint32_t *)
 		&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
 	if (handle_txconf(q, conf))
@@ -1529,9 +1542,10 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 static void
 fm10k_tx_queue_release(void *queue)
 {
+	struct fm10k_tx_queue *q = queue;
 	PMD_INIT_FUNC_TRACE();
 
-	tx_queue_free(queue);
+	q->ops->release_mbufs(q);
 }
 
 static int
-- 
1.7.7.6

* [dpdk-dev] [PATCH v5 11/14] fm10k: introduce 2 funcs to reset TX queue and mbuf release
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (9 preceding siblings ...)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 10/14] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
@ 2015-10-30  8:03                 ` Chen Jing D(Mark)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 12/14] fm10k: Add function to decide best TX func Chen Jing D(Mark)
                                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add 2 functions to reset the TX queue and release mbufs when Vector TX
is applied.
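
As a rough illustration of the mechanism (not the driver code itself), the
sketch below shows a per-queue ops table whose handlers are swapped when the
vector path is chosen. Every demo_* name and the last_call field are
hypothetical and exist only for this example; the real structures are
fm10k_tx_queue and fm10k_txq_ops in the diff below.

```c
#include <assert.h>
#include <stddef.h>

struct demo_txq;

/* Table of per-queue handlers, mirroring the release_mbufs/reset pair. */
struct demo_txq_ops {
	void (*release_mbufs)(struct demo_txq *txq);
	void (*reset)(struct demo_txq *txq);
};

/* Hypothetical, simplified queue: only what the sketch needs. */
struct demo_txq {
	const struct demo_txq_ops *ops;
	int last_call; /* records which handler ran, for illustration only */
};

enum {
	CALL_NONE,
	CALL_SCALAR_RELEASE,
	CALL_SCALAR_RESET,
	CALL_VEC_RELEASE,
	CALL_VEC_RESET
};

static void scalar_release(struct demo_txq *q) { q->last_call = CALL_SCALAR_RELEASE; }
static void scalar_reset(struct demo_txq *q) { q->last_call = CALL_SCALAR_RESET; }
static void vec_release(struct demo_txq *q) { q->last_call = CALL_VEC_RELEASE; }
static void vec_reset(struct demo_txq *q) { q->last_call = CALL_VEC_RESET; }

static const struct demo_txq_ops demo_def_ops = {
	.release_mbufs = scalar_release,
	.reset = scalar_reset,
};

static const struct demo_txq_ops demo_vec_ops = {
	.release_mbufs = vec_release,
	.reset = vec_reset,
};

/* Queue setup installs the scalar handlers by default. */
static void demo_txq_init(struct demo_txq *q)
{
	assert(q != NULL);
	q->ops = &demo_def_ops;
	q->last_call = CALL_NONE;
}

/* Choosing the vector path only swaps the table pointer. */
static void demo_txq_vec_setup(struct demo_txq *q)
{
	assert(q != NULL);
	q->ops = &demo_vec_ops;
}
```

Callers always go through q->ops->reset(q) or q->ops->release_mbufs(q), so
they do not need to know which TX path is active.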

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |   68 ++++++++++++++++++++++++++++++++++++
 1 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 4515b26..06beca9 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,11 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
+static void
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
+static void
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
+
 /* Handling the offload flags (olflags) field takes computation
  * time when receiving packets. Therefore we provide a flag to disable
  * the processing of the olflags field when they are not needed. This
@@ -628,6 +633,17 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 		&split_flags[i]);
 }
 
+static const struct fm10k_txq_ops vec_txq_ops = {
+	.release_mbufs = fm10k_tx_queue_release_mbufs_vec,
+	.reset = fm10k_reset_tx_queue,
+};
+
+void __attribute__((cold))
+fm10k_txq_vec_setup(struct fm10k_tx_queue *txq)
+{
+	txq->ops = &vec_txq_ops;
+}
+
 static inline void
 vtx1(volatile struct fm10k_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
@@ -777,3 +793,55 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return nb_pkts;
 }
+
+static void __attribute__((cold))
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
+{
+	unsigned i;
+	const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
+
+	if (txq->sw_ring == NULL || txq->nb_free == max_desc)
+		return;
+
+	/* release the used mbufs in sw_ring */
+	for (i = txq->next_dd - (txq->rs_thresh - 1);
+	     i != txq->next_free;
+	     i = (i + 1) & max_desc)
+		rte_pktmbuf_free_seg(txq->sw_ring[i]);
+
+	txq->nb_free = max_desc;
+
+	/* reset tx_entry */
+	for (i = 0; i < txq->nb_desc; i++)
+		txq->sw_ring[i] = NULL;
+
+	rte_free(txq->sw_ring);
+	txq->sw_ring = NULL;
+}
+
+static void __attribute__((cold))
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
+{
+	static const struct fm10k_tx_desc zeroed_desc = {0};
+	struct rte_mbuf **txe = txq->sw_ring;
+	uint16_t i;
+
+	/* Zero out HW ring memory */
+	for (i = 0; i < txq->nb_desc; i++)
+		txq->hw_ring[i] = zeroed_desc;
+
+	/* Initialize SW ring entries */
+	for (i = 0; i < txq->nb_desc; i++)
+		txe[i] = NULL;
+
+	txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+	txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+	txq->next_free = 0;
+	txq->nb_used = 0;
+	/* Always allow 1 descriptor to be un-allocated to avoid
+	 * a H/W race condition
+	 */
+	txq->nb_free = (uint16_t)(txq->nb_desc - 1);
+	FM10K_PCI_REG_WRITE(txq->tail_ptr, 0);
+}
-- 
1.7.7.6

* [dpdk-dev] [PATCH v5 12/14] fm10k: Add function to decide best TX func
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (10 preceding siblings ...)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 11/14] fm10k: introduce 2 funcs " Chen Jing D(Mark)
@ 2015-10-30  8:03                 ` Chen Jing D(Mark)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 13/14] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
                                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add function fm10k_set_tx_function, called from fm10k_dev_tx_init, to
decide the best TX function.
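
The selection rule in the diff below can be sketched in isolation: the vector
path is taken only if every queue has all of the "simple TX" bits set. The
DEMO_* flag values and demo_can_use_vector_tx are assumptions made up for this
example; the real constants are ETH_TXQ_FLAGS_NOMULTSEGS and
ETH_TXQ_FLAGS_NOOFFLOADS from the DPDK headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define DEMO_TXQ_NOMULTSEGS 0x0001u
#define DEMO_TXQ_NOOFFLOADS 0x0002u
#define DEMO_SIMPLE_TX_FLAG (DEMO_TXQ_NOMULTSEGS | DEMO_TXQ_NOOFFLOADS)

/*
 * Return 1 when every queue has all simple-TX bits set, else 0.
 * A single queue missing any of the bits forces the scalar path.
 */
static int demo_can_use_vector_tx(const uint32_t *txq_flags, size_t nb_queues)
{
	size_t i;

	assert(txq_flags != NULL || nb_queues == 0);

	for (i = 0; i < nb_queues; i++) {
		if ((txq_flags[i] & DEMO_SIMPLE_TX_FLAG) != DEMO_SIMPLE_TX_FLAG)
			return 0;
	}
	return 1;
}
```

Note that the comparison is against the full mask, not just non-zero, so a
queue with only one of the two bits set still disables the vector path.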

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    1 +
 drivers/net/fm10k/fm10k_ethdev.c |   38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index bfb71da..8e2c6a4 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -224,6 +224,7 @@ struct fm10k_tx_queue {
 	uint16_t next_rs; /* Next pos to set RS flag */
 	uint16_t next_dd; /* Next pos to check DD flag */
 	volatile uint32_t *tail_ptr;
+	uint32_t txq_flags; /* Holds flags for this TXq */
 	uint16_t nb_desc;
 	uint8_t port_id;
 	uint8_t tx_deferred_start; /** < don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 0b40797..05ed90d 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -53,6 +53,9 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)
 
+#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+				ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -68,6 +71,7 @@ fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
 static void fm10k_set_rx_function(struct rte_eth_dev *dev);
+static void fm10k_set_tx_function(struct rte_eth_dev *dev);
 
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -414,6 +418,10 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
 				base_addr >> (CHAR_BIT * sizeof(uint32_t)));
 		FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size);
 	}
+
+	/* set up vector or scalar TX function as appropriate */
+	fm10k_set_tx_function(dev);
+
 	return 0;
 }
 
@@ -983,8 +991,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		},
 		.tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
 		.tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
-		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-				ETH_TXQ_FLAGS_NOOFFLOADS,
+		.txq_flags = FM10K_SIMPLE_TX_FLAG,
 	};
 
 }
@@ -1483,6 +1490,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	q->nb_desc = nb_desc;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
+	q->txq_flags = conf->txq_flags;
 	q->ops = &def_txq_ops;
 	q->tail_ptr = (volatile uint32_t *)
 		&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
@@ -2094,6 +2102,32 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 };
 
 static void __attribute__((cold))
+fm10k_set_tx_function(struct rte_eth_dev *dev)
+{
+	struct fm10k_tx_queue *txq;
+	int i;
+	int use_sse = 1;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		if ((txq->txq_flags & FM10K_SIMPLE_TX_FLAG) !=
+			FM10K_SIMPLE_TX_FLAG) {
+			use_sse = 0;
+			break;
+		}
+	}
+
+	if (use_sse) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			fm10k_txq_vec_setup(txq);
+		}
+		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+	} else
+		dev->tx_pkt_burst = fm10k_xmit_pkts;
+}
+
+static void __attribute__((cold))
 fm10k_set_rx_function(struct rte_eth_dev *dev)
 {
 	struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
-- 
1.7.7.6

* [dpdk-dev] [PATCH v5 13/14] fm10k: fix a crash issue in vector RX func
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (11 preceding siblings ...)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 12/14] fm10k: Add function to decide best TX func Chen Jing D(Mark)
@ 2015-10-30  8:03                 ` Chen Jing D(Mark)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 14/14] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
  2015-10-30  8:26                 ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Liang, Cunming
  14 siblings, 0 replies; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

The Vector RX function processes 4 packets at a time. When the RX
ring wraps around to the tail and the number of remaining descriptors is
not a multiple of 4, SW overwrites memory that does not belong to it,
causing a crash. The fix allocates 4 additional HW/SW entries at the
tail to avoid the overwrite.
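
The arithmetic behind the fix can be checked with a small stand-alone sketch.
It assumes, as the text above says, that one vector step touches 4 consecutive
ring entries; DEMO_BURST and the demo_* helpers are hypothetical names that do
not appear in the driver.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BURST 4u /* entries touched per vector step (assumed) */

/* Highest ring index touched by one burst that starts at 'start'. */
static uint32_t demo_last_index_touched(uint16_t start)
{
	return (uint32_t)start + DEMO_BURST - 1u;
}

/*
 * Return 1 if a burst starting at 'start' stays inside a ring made of
 * nb_desc real entries followed by nb_fake padding entries, else 0.
 */
static int demo_burst_in_bounds(uint16_t start, uint16_t nb_desc,
				uint16_t nb_fake)
{
	assert(start < nb_desc);
	return demo_last_index_touched(start) <
		(uint32_t)nb_desc + (uint32_t)nb_fake;
}
```

For a 512-entry ring, a burst starting at index 510 touches index 513, which
is past the end without padding but inside the ring once 4 padding entries
are appended.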

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    4 +++-
 drivers/net/fm10k/fm10k_ethdev.c |   19 +++++++++++++++++--
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8e2c6a4..82a548f 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -177,7 +177,7 @@ struct fm10k_rx_queue {
 	struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
 	uint64_t hw_ring_phys_addr;
 	uint64_t mbuf_initializer; /* value to init mbufs */
-	/** need to alloc dummy mbuf, for wraparound when scanning hw ring */
+	/* need to alloc dummy mbuf, for wraparound when scanning hw ring */
 	struct rte_mbuf fake_mbuf;
 	uint16_t next_dd;
 	uint16_t next_alloc;
@@ -185,6 +185,8 @@ struct fm10k_rx_queue {
 	uint16_t alloc_thresh;
 	volatile uint32_t *tail_ptr;
 	uint16_t nb_desc;
+	/* Number of faked desc added at the tail for Vector RX function */
+	uint16_t nb_fake_desc;
 	uint16_t queue_id;
 	/* Below 2 fields only valid in case vPMD is applied. */
 	uint16_t rxrearm_nb;     /* number of remaining to be re-armed */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 05ed90d..dde067f 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -102,6 +102,7 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
 static inline int
 rx_queue_reset(struct fm10k_rx_queue *q)
 {
+	static const union fm10k_rx_desc zero = {{0} };
 	uint64_t dma_addr;
 	int i, diag;
 	PMD_INIT_FUNC_TRACE();
@@ -122,6 +123,15 @@ rx_queue_reset(struct fm10k_rx_queue *q)
 		q->hw_ring[i].q.hdr_addr = dma_addr;
 	}
 
+	/* initialize extra software ring entries. Space for these extra
+	 * entries is always allocated.
+	 */
+	memset(&q->fake_mbuf, 0x0, sizeof(q->fake_mbuf));
+	for (i = 0; i < q->nb_fake_desc; ++i) {
+		q->sw_ring[q->nb_desc + i] = &q->fake_mbuf;
+		q->hw_ring[q->nb_desc + i] = zero;
+	}
+
 	q->next_dd = 0;
 	q->next_alloc = 0;
 	q->next_trigger = q->alloc_thresh - 1;
@@ -147,6 +157,10 @@ rx_queue_clean(struct fm10k_rx_queue *q)
 	for (i = 0; i < q->nb_desc; ++i)
 		q->hw_ring[i] = zero;
 
+	/* zero faked descriptors */
+	for (i = 0; i < q->nb_fake_desc; ++i)
+		q->hw_ring[q->nb_desc + i] = zero;
+
 	/* vPMD driver has a different way of releasing mbufs. */
 	if (q->rx_using_sse) {
 		fm10k_rx_queue_release_mbufs_vec(q);
@@ -1326,6 +1340,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 	/* setup queue */
 	q->mp = mp;
 	q->nb_desc = nb_desc;
+	q->nb_fake_desc = FM10K_MULT_RX_DESC;
 	q->port_id = dev->data->port_id;
 	q->queue_id = queue_id;
 	q->tail_ptr = (volatile uint32_t *)
@@ -1335,8 +1350,8 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 
 	/* allocate memory for the software ring */
 	q->sw_ring = rte_zmalloc_socket("fm10k sw ring",
-					nb_desc * sizeof(struct rte_mbuf *),
-					RTE_CACHE_LINE_SIZE, socket_id);
+			(nb_desc + q->nb_fake_desc) * sizeof(struct rte_mbuf *),
+			RTE_CACHE_LINE_SIZE, socket_id);
 	if (q->sw_ring == NULL) {
 		PMD_INIT_LOG(ERR, "Cannot allocate software ring");
 		rte_free(q);
-- 
1.7.7.6

* [dpdk-dev] [PATCH v5 14/14] doc: release notes update for fm10k Vector PMD
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (12 preceding siblings ...)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 13/14] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
@ 2015-10-30  8:03                 ` Chen Jing D(Mark)
  2015-11-02  8:36                   ` Thomas Monjalon
  2015-10-30  8:26                 ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Liang, Cunming
  14 siblings, 1 reply; 109+ messages in thread
From: Chen Jing D(Mark) @ 2015-10-30  8:03 UTC (permalink / raw)
  To: dev

From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Update the 2.2 release notes to describe the Vector PMD implementation
in the fm10k driver.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 doc/guides/rel_notes/release_2_2.rst |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 89e4d58..d9d4ce5 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -79,6 +79,12 @@ Drivers
 
   Fixed issue when releasing null control queue.
 
+* **fm10k:  Add Vector Rx/Tx implementation.**
+
+  This patch set includes Vector Rx/Tx functions to receive/transmit packets
+  for fm10k devices. It also contains logic to do sanity check for proper
+  RX/TX function selections.
+
 
 Libraries
 ~~~~~~~~~
-- 
1.7.7.6

* Re: [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k
  2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
                                   ` (13 preceding siblings ...)
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 14/14] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
@ 2015-10-30  8:26                 ` Liang, Cunming
  2015-11-02  8:38                   ` Thomas Monjalon
  14 siblings, 1 reply; 109+ messages in thread
From: Liang, Cunming @ 2015-10-30  8:26 UTC (permalink / raw)
  To: Chen, Jing D, dev



> -----Original Message-----
> From: Chen, Jing D
> Sent: Friday, October 30, 2015 4:03 PM
> To: dev@dpdk.org
> Cc: Liang, Cunming; Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson,
> Bruce; Chen, Jing D
> Subject: [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k
> 
> From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> 
> v5:
>  - Fix some warnings reported by checkpatch.pl
>  - Squash 3 patches into 1 to avoid compile errors on unused functions.
>  - Sync with master branch
> 
> v4:
>  - Clear HW/SW ring content after mbuf allocation fails.
> 
> v3:
>  - Add a blank line after variable definition.
>  - Floor-align the passed-in nb_pkts argument to avoid memory overwrites.
>  - Scan at most 32 descriptors in the scatter Rx function to avoid memory
> overwrites.
> 
> v2:
>  - Fix a typo issue.
>  - Fix an improper prefetch in the vector RX function, which prefetched
>    an uninitialized mbuf.
>  - Remove limitation on number of desc pointer in vector RX function.
>  - Re-organize some comments.
>  - Add a new patch to fix a crash issue in vector RX func.
>  - Add a new patch to update release notes.
> 
> v1:
> This patch set includes Vector Rx/Tx functions to receive/transmit packets
> for fm10k devices. It also contains logic to do sanity check for proper
> RX/TX function selections.
> 
> Chen Jing D(Mark) (14):
>   fm10k: add new vPMD file
>   fm10k: add vPMD pre-condition check for each RX queue
>   fm10k: Add a new func to initialize all parameters
>   fm10k: add Vector RX function
>   fm10k: add func to do Vector RX condition check
>   fm10k: add Vector RX scatter function
>   fm10k: add function to decide best RX function
>   fm10k: add func to release mbuf in case Vector RX applied
>   fm10k: add Vector TX function
>   fm10k: use func pointer to reset TX queue and mbuf release
>   fm10k: introduce 2 funcs to reset TX queue and mbuf release
>   fm10k: Add function to decide best TX func
>   fm10k: fix a crash issue in vector RX func
>   doc: release notes update for fm10k Vector PMD
> 
>  doc/guides/rel_notes/release_2_2.rst |    6 +
>  drivers/net/fm10k/Makefile           |    1 +
>  drivers/net/fm10k/fm10k.h            |   45 ++-
>  drivers/net/fm10k/fm10k_ethdev.c     |  172 ++++++-
>  drivers/net/fm10k/fm10k_rxtx_vec.c   |  847 ++++++++++++++++++++++++++++++++++
>  5 files changed, 1043 insertions(+), 28 deletions(-)
>  create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c
> 
> --
> 1.7.7.6
Acked-by: Cunming Liang <cunming.liang@intel.com>

* Re: [dpdk-dev] [PATCH v5 14/14] doc: release notes update for fm10k Vector PMD
  2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 14/14] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
@ 2015-11-02  8:36                   ` Thomas Monjalon
  0 siblings, 0 replies; 109+ messages in thread
From: Thomas Monjalon @ 2015-11-02  8:36 UTC (permalink / raw)
  To: Chen Jing D(Mark); +Cc: dev

> --- a/doc/guides/rel_notes/release_2_2.rst
> +++ b/doc/guides/rel_notes/release_2_2.rst
> @@ -79,6 +79,12 @@ Drivers
>  
>    Fixed issue when releasing null control queue.
>  
> +* **fm10k:  Add Vector Rx/Tx implementation.**
> +
> +  This patch set includes Vector Rx/Tx functions to receive/transmit packets
> +  for fm10k devices. It also contains logic to do sanity check for proper
> +  RX/TX function selections.

I am going to reword it in the right section.

* Re: [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k
  2015-10-30  8:26                 ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Liang, Cunming
@ 2015-11-02  8:38                   ` Thomas Monjalon
  0 siblings, 0 replies; 109+ messages in thread
From: Thomas Monjalon @ 2015-11-02  8:38 UTC (permalink / raw)
  To: Chen, Jing D; +Cc: dev

> > Chen Jing D(Mark) (14):
> >   fm10k: add new vPMD file
> >   fm10k: add vPMD pre-condition check for each RX queue
> >   fm10k: Add a new func to initialize all parameters
> >   fm10k: add Vector RX function
> >   fm10k: add func to do Vector RX condition check
> >   fm10k: add Vector RX scatter function
> >   fm10k: add function to decide best RX function
> >   fm10k: add func to release mbuf in case Vector RX applied
> >   fm10k: add Vector TX function
> >   fm10k: use func pointer to reset TX queue and mbuf release
> >   fm10k: introduce 2 funcs to reset TX queue and mbuf release
> >   fm10k: Add function to decide best TX func
> >   fm10k: fix a crash issue in vector RX func
> >   doc: release notes update for fm10k Vector PMD
> 
> Acked-by: Cunming Liang <cunming.liang@intel.com>

Applied, thanks

Thread overview: 109+ messages
2015-09-29 13:03 [dpdk-dev] [PATCH 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 01/14] fm10k: add new vPMD file Chen Jing D(Mark)
2015-10-22  9:44   ` [dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
2015-10-22 15:58       ` Stephen Hemminger
2015-10-23  8:39         ` Chen, Jing D
2015-10-23 10:01           ` Bruce Richardson
2015-10-27  5:26             ` Chen, Jing D
2015-10-27  9:46       ` [dpdk-dev] [PATCH v3 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
2015-10-29  9:15           ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
2015-10-29  9:15             ` [dpdk-dev] [PATCH v4 01/16] fm10k: add new vPMD file Chen Jing D(Mark)
2015-10-30  8:02               ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Chen Jing D(Mark)
2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 01/14] fm10k: add new vPMD file Chen Jing D(Mark)
2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 02/14] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 03/14] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 04/14] fm10k: add Vector RX function Chen Jing D(Mark)
2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 05/14] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
2015-10-30  8:02                 ` [dpdk-dev] [PATCH v5 06/14] fm10k: add Vector RX scatter function Chen Jing D(Mark)
2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 07/14] fm10k: add function to decide best RX function Chen Jing D(Mark)
2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 08/14] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 09/14] fm10k: add Vector TX function Chen Jing D(Mark)
2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 10/14] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 11/14] fm10k: introduce 2 funcs " Chen Jing D(Mark)
2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 12/14] fm10k: Add function to decide best TX func Chen Jing D(Mark)
2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 13/14] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
2015-10-30  8:03                 ` [dpdk-dev] [PATCH v5 14/14] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
2015-11-02  8:36                   ` Thomas Monjalon
2015-10-30  8:26                 ` [dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k Liang, Cunming
2015-11-02  8:38                   ` Thomas Monjalon
2015-10-29  9:15             ` [dpdk-dev] [PATCH v4 02/16] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 03/16] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 04/16] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 05/16] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 06/16] fm10k: add Vector RX function Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 07/16] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 09/16] fm10k: add function to decide best RX function Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 10/16] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 11/16] fm10k: add Vector TX function Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 12/16] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 13/16] fm10k: introduce 2 funcs " Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 14/16] fm10k: Add function to decide best TX func Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 15/16] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
2015-10-29  9:16             ` [dpdk-dev] [PATCH v4 16/16] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
2015-10-29 10:22             ` [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k Liang, Cunming
2015-10-29 23:12               ` Thomas Monjalon
2015-10-30  3:09                 ` Chen, Jing D
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 02/16] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 03/16] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
2015-10-28 13:58           ` Liang, Cunming
2015-10-29  5:24             ` Chen, Jing D
2015-10-29  8:14               ` Liang, Cunming
2015-10-29  8:37                 ` Chen, Jing D
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 05/16] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 06/16] fm10k: add Vector RX function Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 07/16] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
2015-10-28 14:30           ` Liang, Cunming
2015-10-29  5:27             ` Chen, Jing D
2015-10-29  8:06               ` Liang, Cunming
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 09/16] fm10k: add function to decide best RX function Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 10/16] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 11/16] fm10k: add Vector TX function Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 12/16] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 13/16] fm10k: introduce 2 funcs " Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 14/16] fm10k: Add function to decide best TX func Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 15/16] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
2015-10-27  9:46         ` [dpdk-dev] [PATCH v3 16/16] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 02/16] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 03/16] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
2015-10-22 15:57       ` Stephen Hemminger
2015-10-23  8:27         ` Chen, Jing D
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 04/16] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 05/16] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 06/16] fm10k: add Vector RX function Chen Jing D(Mark)
2015-10-27  5:24       ` Liang, Cunming
2015-10-27  5:32         ` Chen, Jing D
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 07/16] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter function Chen Jing D(Mark)
2015-10-27  5:27       ` Liang, Cunming
2015-10-27  5:43         ` Chen, Jing D
2015-10-27  5:55           ` Chen, Jing D
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 09/16] fm10k: add function to decide best RX function Chen Jing D(Mark)
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 10/16] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
2015-10-22  9:44     ` [dpdk-dev] [PATCH v2 11/16] fm10k: add Vector TX function Chen Jing D(Mark)
2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 12/16] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 13/16] fm10k: introduce 2 funcs " Chen Jing D(Mark)
2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 14/16] fm10k: Add function to decide best TX func Chen Jing D(Mark)
2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 15/16] fm10k: fix a crash issue in vector RX func Chen Jing D(Mark)
2015-10-22  9:45     ` [dpdk-dev] [PATCH v2 16/16] doc: release notes update for fm10k Vector PMD Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 02/14] fm10k: add vPMD pre-condition check for each RX queue Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 03/14] fm10k: Add a new func to initialize all parameters Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 04/14] fm10k: add func to re-allocate mbuf for RX ring Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 05/14] fm10k: add 2 functions to parse pkt_type and offload flag Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function Chen Jing D(Mark)
2015-09-29 13:14   ` Ananyev, Konstantin
2015-09-29 14:22     ` Bruce Richardson
2015-09-30 13:23       ` Chen, Jing D
2015-09-30 13:18     ` Chen, Jing D
2015-09-29 13:03 ` [dpdk-dev] [PATCH 07/14] fm10k: add func to do Vector RX condition check Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 08/14] fm10k: add Vector RX scatter function Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 09/14] fm10k: add function to decide best RX function Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 10/14] fm10k: add func to release mbuf in case Vector RX applied Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 11/14] fm10k: add Vector TX function Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 12/14] fm10k: use func pointer to reset TX queue and mbuf release Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 13/14] fm10k: introduce 2 funcs " Chen Jing D(Mark)
2015-09-29 13:03 ` [dpdk-dev] [PATCH 14/14] fm10k: Add function to decide best TX func Chen Jing D(Mark)