DPDK patches and discussions
* [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF
@ 2016-05-04 21:10 Wenzhuo Lu
  2016-05-04 21:10 ` [dpdk-dev] [PATCH 1/4] ixgbe: VF supports mailbox interruption for PF link up/down Wenzhuo Lu
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Wenzhuo Lu @ 2016-05-04 21:10 UTC (permalink / raw)
  To: dev

Currently, if the PF link goes down and comes back up, the VF
does not handle this event; the user needs to reset the VF port
to let it recover.
This patch set adds support for the mailbox interrupt on the
VF, so the VF can receive the messages for physical link
down/up, handle this event, and let the VF link recover
automatically.
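For illustration only (not part of the patch set), the intended flow -
a PF mailbox message marks the VF for recovery, and the VF then resets
itself - can be sketched in plain C. All names and message codes below
are invented stand-ins, not the real driver definitions:

```c
#include <assert.h>

/* Hypothetical message code standing in for the PF mailbox protocol. */
enum { PF_MSG_NONE = 0, PF_MSG_LINK_CHANGE = 1 };

/* Minimal VF model: link state plus a "needs reset" flag that the
 * interrupt path sets and the recovery path clears. */
struct vf_port {
	int link_up;
	int pending_reset;
};

/* Mailbox interrupt path: mark the port for recovery. */
static void vf_on_mailbox_msg(struct vf_port *vf, int msg)
{
	if (msg == PF_MSG_LINK_CHANGE)
		vf->pending_reset = 1;
}

/* Recovery path: reset the port and bring the link back up. */
static void vf_recover(struct vf_port *vf)
{
	if (!vf->pending_reset)
		return;
	vf->link_up = 0;	/* stop the port */
	vf->link_up = 1;	/* start it again */
	vf->pending_reset = 0;
}
```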

Wenzhuo Lu (4):
  ixgbe: VF supports mailbox interruption for PF link up/down
  igb: VF supports mailbox interruption for PF link up/down
  ixgbe: automatic link recovery on VF
  igb: automatic link recovery on VF

 doc/guides/rel_notes/release_16_07.rst |  11 ++
 drivers/net/e1000/e1000_ethdev.h       |  14 ++
 drivers/net/e1000/igb_ethdev.c         | 244 +++++++++++++++++++++++++++++++++
 drivers/net/e1000/igb_rxtx.c           |  38 +++++
 drivers/net/ixgbe/ixgbe_ethdev.c       | 169 ++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h       |  14 ++
 drivers/net/ixgbe/ixgbe_rxtx.c         |  34 +++++
 drivers/net/ixgbe/ixgbe_rxtx.h         |   2 +
 8 files changed, 523 insertions(+), 3 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [dpdk-dev] [PATCH 1/4] ixgbe: VF supports mailbox interruption for PF link up/down
  2016-05-04 21:10 [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF Wenzhuo Lu
@ 2016-05-04 21:10 ` Wenzhuo Lu
  2016-05-04 21:10 ` [dpdk-dev] [PATCH 2/4] igb: " Wenzhuo Lu
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Wenzhuo Lu @ 2016-05-04 21:10 UTC (permalink / raw)
  To: dev; +Cc: Wenzhuo Lu

In the kernel PF + DPDK VF scenario, when the PF finds that the
link state has changed, up -> down or down -> up, it sends a
mailbox message to the VF.
This patch enables the mailbox interrupt, so the VF can receive
the message for link up/down.
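As a rough illustration of the get-status/action split this patch adds:
the handler first latches causes from the read-on-clear cause register,
then acts on the latched flags. The constants and names below are
simplified stand-ins, not the real DPDK/ixgbe definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in constants; the real values live in the ixgbe driver. */
#define VTEICR_MASK   0x07u		/* valid cause bits */
#define MISC_VEC_ID   1u		/* misc (mailbox) vector id */
#define FLAG_MAILBOX  (1u << 0)

/* get_status: mask the cause register and latch flags. */
static uint32_t vf_interrupt_get_status(uint32_t eicr)
{
	uint32_t flags = 0;

	eicr &= VTEICR_MASK;		/* only one misc vector supported */
	if (eicr == MISC_VEC_ID)
		flags |= FLAG_MAILBOX;
	return flags;
}

/* action: consume latched flags; mailbox processing would go here. */
static int vf_interrupt_action(uint32_t *flags, int *mbx_processed)
{
	if (*flags & FLAG_MAILBOX) {
		*mbx_processed = 1;	/* stands in for mbx_process() */
		*flags &= ~FLAG_MAILBOX;
	}
	return 0;
}
```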

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |  6 +++
 drivers/net/ixgbe/ixgbe_ethdev.c       | 85 ++++++++++++++++++++++++++++++++--
 2 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index 83c841b..be702fd 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -34,6 +34,12 @@ This section should contain new features added in this release. Sample format:
 
   Refer to the previous release notes for examples.
 
+* **Added mailbox interruption support for ixgbe VF.**
+
+  When the link becomes down or up, PF will use mailbox message to notice
+  VF. To handle this link up/down event, add the mailbox interruption
+  support to receive the message.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index eec607c..8e5f64f 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -151,6 +151,8 @@
 #define IXGBE_VMTIR(_i) (0x00017000 + ((_i) * 4)) /* 64 of these (0-63) */
 #define IXGBE_QDE_STRIP_TAG                    0x00000004
 
+#define IXGBE_VTEICR_MASK        0x07
+
 enum ixgbevf_xcast_modes {
 	IXGBEVF_XCAST_MODE_NONE = 0,
 	IXGBEVF_XCAST_MODE_MULTI,
@@ -361,6 +363,8 @@ static int ixgbe_timesync_read_time(struct rte_eth_dev *dev,
 				   struct timespec *timestamp);
 static int ixgbe_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
+static void ixgbevf_dev_interrupt_handler(struct rte_intr_handle *handle,
+					  void *param);
 
 static int ixgbe_dev_l2_tunnel_eth_type_conf
 	(struct rte_eth_dev *dev, struct rte_eth_l2_tunnel_conf *l2_tunnel);
@@ -1441,6 +1445,12 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
 			return -EIO;
 	}
 
+	rte_intr_callback_register(&pci_dev->intr_handle,
+				   ixgbevf_dev_interrupt_handler,
+				   (void *)eth_dev);
+	rte_intr_enable(&pci_dev->intr_handle);
+	ixgbevf_intr_enable(hw);
+
 	PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x mac.type=%s",
 		     eth_dev->data->port_id, pci_dev->id.vendor_id,
 		     pci_dev->id.device_id, "ixgbe_mac_82599_vf");
@@ -1454,6 +1464,7 @@ static int
 eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
 {
 	struct ixgbe_hw *hw;
+	struct rte_pci_device *pci_dev = eth_dev->pci_dev;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -1475,6 +1486,11 @@ eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
 	rte_free(eth_dev->data->mac_addrs);
 	eth_dev->data->mac_addrs = NULL;
 
+	rte_intr_disable(&pci_dev->intr_handle);
+	rte_intr_callback_unregister(&pci_dev->intr_handle,
+				     ixgbevf_dev_interrupt_handler,
+				     (void *)eth_dev);
+
 	return 0;
 }
 
@@ -4073,6 +4089,8 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
 
 	PMD_INIT_FUNC_TRACE();
 
+	ixgbevf_intr_disable(hw);
+
 	hw->adapter_stopped = 1;
 	ixgbe_stop_adapter(hw);
 
@@ -4796,6 +4814,9 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 	uint32_t q_idx;
 	uint32_t vector_idx = IXGBE_MISC_VEC_ID;
 
+	/* Configure VF other cause ivar */
+	ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
+
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd.
 	 */
@@ -4810,9 +4831,6 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 		ixgbevf_set_ivar_map(hw, 0, q_idx, vector_idx);
 		intr_handle->intr_vec[q_idx] = vector_idx;
 	}
-
-	/* Configure VF other cause ivar */
-	ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
 }
 
 /**
@@ -7131,6 +7149,67 @@ ixgbevf_dev_allmulticast_disable(struct rte_eth_dev *dev)
 	ixgbevf_update_xcast_mode(hw, IXGBEVF_XCAST_MODE_NONE);
 }
 
+static void ixgbevf_mbx_process(struct rte_eth_dev *dev)
+{
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	u32 in_msg = 0;
+
+	if (ixgbe_read_mbx(hw, &in_msg, 1, 0))
+		return;
+
+	/* PF reset VF event */
+	if (in_msg == IXGBE_PF_CONTROL_MSG)
+		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET);
+}
+
+static int
+ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev)
+{
+	uint32_t eicr;
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct ixgbe_interrupt *intr =
+		IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
+	ixgbevf_intr_disable(hw);
+
+	/* read-on-clear nic registers here */
+	eicr = IXGBE_READ_REG(hw, IXGBE_VTEICR);
+	intr->flags = 0;
+
+	/* only one misc vector supported - mailbox */
+	eicr &= IXGBE_VTEICR_MASK;
+	if (eicr == IXGBE_MISC_VEC_ID)
+		intr->flags |= IXGBE_FLAG_MAILBOX;
+
+	return 0;
+}
+
+static int
+ixgbevf_dev_interrupt_action(struct rte_eth_dev *dev)
+{
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct ixgbe_interrupt *intr =
+		IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
+
+	if (intr->flags & IXGBE_FLAG_MAILBOX) {
+		ixgbevf_mbx_process(dev);
+		intr->flags &= ~IXGBE_FLAG_MAILBOX;
+	}
+
+	ixgbevf_intr_enable(hw);
+
+	return 0;
+}
+
+static void
+ixgbevf_dev_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
+			      void *param)
+{
+	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
+
+	ixgbevf_dev_interrupt_get_status(dev);
+	ixgbevf_dev_interrupt_action(dev);
+}
+
 static struct rte_driver rte_ixgbe_driver = {
 	.type = PMD_PDEV,
 	.init = rte_ixgbe_pmd_init,
-- 
1.9.3


* [dpdk-dev] [PATCH 2/4] igb: VF supports mailbox interruption for PF link up/down
  2016-05-04 21:10 [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF Wenzhuo Lu
  2016-05-04 21:10 ` [dpdk-dev] [PATCH 1/4] ixgbe: VF supports mailbox interruption for PF link up/down Wenzhuo Lu
@ 2016-05-04 21:10 ` Wenzhuo Lu
  2016-05-04 21:10 ` [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF Wenzhuo Lu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Wenzhuo Lu @ 2016-05-04 21:10 UTC (permalink / raw)
  To: dev; +Cc: Wenzhuo Lu

In the kernel PF + DPDK VF scenario, when the PF finds that the
link state has changed, up -> down or down -> up, it sends a
mailbox message to the VF.
This patch enables the mailbox interrupt, so the VF can receive
the message for link up/down.
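The IVAR encoding used for the mailbox ("other cause") vector can be
sketched as follows. The masks mirror the values added by this patch,
but the helper itself is an illustrative stand-in:

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout from the patch: the low bits select the MSI-X vector,
 * bit 7 marks the IVAR entry valid. */
#define VTIVAR_MISC_INTR_MASK  0x3u
#define VTIVAR_VALID           0x80u

/* Encode the "other cause" (mailbox) IVAR entry, as the patch's
 * igbvf_set_ivar_map() does before writing E1000_VTIVAR_MISC. */
static uint32_t ivar_misc_encode(uint8_t msix_vector)
{
	uint32_t tmp = 0;

	tmp |= (msix_vector & VTIVAR_MISC_INTR_MASK);
	tmp |= VTIVAR_VALID;
	return tmp;
}
```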

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |   2 +-
 drivers/net/e1000/igb_ethdev.c         | 159 +++++++++++++++++++++++++++++++++
 2 files changed, 160 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index be702fd..8d45915 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -34,7 +34,7 @@ This section should contain new features added in this release. Sample format:
 
   Refer to the previous release notes for examples.
 
-* **Added mailbox interruption support for ixgbe VF.**
+* **Added mailbox interruption support for ixgbe/igb VF.**
 
   When the link becomes down or up, PF will use mailbox message to notice
   VF. To handle this link up/down event, add the mailbox interruption
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index f0921ee..b0e5e6a 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -86,6 +86,12 @@
 #define E1000_INCVALUE_82576         (16 << IGB_82576_TSYNC_SHIFT)
 #define E1000_TSAUXC_DISABLE_SYSTIME 0x80000000
 
+#define E1000_VTIVAR_MISC                0x01740
+#define E1000_VTIVAR_MISC_MASK           0xFF
+#define E1000_VTIVAR_VALID               0x80
+#define E1000_VTIVAR_MISC_MAILBOX        0
+#define E1000_VTIVAR_MISC_INTR_MASK      0x3
+
 static int  eth_igb_configure(struct rte_eth_dev *dev);
 static int  eth_igb_start(struct rte_eth_dev *dev);
 static void eth_igb_stop(struct rte_eth_dev *dev);
@@ -259,6 +265,9 @@ static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
 			       uint8_t index, uint8_t offset);
 static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
+					void *param);
+static void igbvf_mbx_process(struct rte_eth_dev *dev);
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -554,6 +563,41 @@ igb_intr_disable(struct e1000_hw *hw)
 	E1000_WRITE_FLUSH(hw);
 }
 
+static inline void
+igbvf_intr_enable(struct rte_eth_dev *dev)
+{
+	struct e1000_hw *hw =
+		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	/* only for mailbox */
+	E1000_WRITE_REG(hw, E1000_EIAM, 1 << E1000_VTIVAR_MISC_MAILBOX);
+	E1000_WRITE_REG(hw, E1000_EIAC, 1 << E1000_VTIVAR_MISC_MAILBOX);
+	E1000_WRITE_REG(hw, E1000_EIMS, 1 << E1000_VTIVAR_MISC_MAILBOX);
+	E1000_WRITE_FLUSH(hw);
+}
+
+/* Only for mailbox now. If RX/TX interrupts are needed, extend this function. */
+static void
+igbvf_set_ivar_map(struct e1000_hw *hw, uint8_t msix_vector)
+{
+	uint32_t tmp = 0;
+
+	/* mailbox */
+	tmp |= (msix_vector & E1000_VTIVAR_MISC_INTR_MASK);
+	tmp |= E1000_VTIVAR_VALID;
+	E1000_WRITE_REG(hw, E1000_VTIVAR_MISC, tmp);
+}
+
+static void
+eth_igbvf_configure_msix_intr(struct rte_eth_dev *dev)
+{
+	struct e1000_hw *hw =
+		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	/* Configure VF other cause ivar */
+	igbvf_set_ivar_map(hw, E1000_VTIVAR_MISC_MAILBOX);
+}
+
 static inline int32_t
 igb_pf_reset_hw(struct e1000_hw *hw)
 {
@@ -942,6 +986,10 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 		     eth_dev->data->port_id, pci_dev->id.vendor_id,
 		     pci_dev->id.device_id, "igb_mac_82576_vf");
 
+	rte_intr_callback_register(&pci_dev->intr_handle,
+				   eth_igbvf_interrupt_handler,
+				   (void *)eth_dev);
+
 	return 0;
 }
 
@@ -950,6 +998,7 @@ eth_igbvf_dev_uninit(struct rte_eth_dev *eth_dev)
 {
 	struct e1000_adapter *adapter =
 		E1000_DEV_PRIVATE(eth_dev->data->dev_private);
+	struct rte_pci_device *pci_dev = eth_dev->pci_dev;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -966,6 +1015,12 @@ eth_igbvf_dev_uninit(struct rte_eth_dev *eth_dev)
 	rte_free(eth_dev->data->mac_addrs);
 	eth_dev->data->mac_addrs = NULL;
 
+	/* disable uio intr before callback unregister */
+	rte_intr_disable(&pci_dev->intr_handle);
+	rte_intr_callback_unregister(&pci_dev->intr_handle,
+				     eth_igbvf_interrupt_handler,
+				     (void *)eth_dev);
+
 	return 0;
 }
 
@@ -2564,6 +2619,69 @@ eth_igb_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
 }
 
 static int
+eth_igbvf_interrupt_get_status(struct rte_eth_dev *dev)
+{
+	uint32_t eicr;
+	struct e1000_hw *hw =
+		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct e1000_interrupt *intr =
+		E1000_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
+
+	igbvf_intr_disable(hw);
+
+	/* read-on-clear nic registers here */
+	eicr = E1000_READ_REG(hw, E1000_EICR);
+	intr->flags = 0;
+
+	if (eicr == E1000_VTIVAR_MISC_MAILBOX)
+		intr->flags |= E1000_FLAG_MAILBOX;
+
+	return 0;
+}
+
+void igbvf_mbx_process(struct rte_eth_dev *dev)
+{
+	struct e1000_hw *hw =
+		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct e1000_mbx_info *mbx = &hw->mbx;
+	u32 in_msg = 0;
+
+	if (mbx->ops.read(hw, &in_msg, 1, 0))
+		return;
+
+	/* PF reset VF event */
+	if (in_msg == E1000_PF_CONTROL_MSG)
+		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET);
+}
+
+static int
+eth_igbvf_interrupt_action(struct rte_eth_dev *dev)
+{
+	struct e1000_interrupt *intr =
+		E1000_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
+
+	if (intr->flags & E1000_FLAG_MAILBOX) {
+		igbvf_mbx_process(dev);
+		intr->flags &= ~E1000_FLAG_MAILBOX;
+	}
+
+	igbvf_intr_enable(dev);
+	rte_intr_enable(&dev->pci_dev->intr_handle);
+
+	return 0;
+}
+
+static void
+eth_igbvf_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
+			    void *param)
+{
+	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
+
+	eth_igbvf_interrupt_get_status(dev);
+	eth_igbvf_interrupt_action(dev);
+}
+
+static int
 eth_igb_led_on(struct rte_eth_dev *dev)
 {
 	struct e1000_hw *hw;
@@ -2834,6 +2952,8 @@ igbvf_dev_start(struct rte_eth_dev *dev)
 	struct e1000_adapter *adapter =
 		E1000_DEV_PRIVATE(dev->data->dev_private);
 	int ret;
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+	uint32_t intr_vector = 0;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -2853,12 +2973,41 @@ igbvf_dev_start(struct rte_eth_dev *dev)
 		return ret;
 	}
 
+	/* check and configure queue intr-vector mapping */
+	if (dev->data->dev_conf.intr_conf.rxq != 0) {
+		intr_vector = dev->data->nb_rx_queues;
+		ret = rte_intr_efd_enable(intr_handle, intr_vector);
+		if (ret)
+			return ret;
+	}
+
+	if (rte_intr_dp_is_en(intr_handle) && !intr_handle->intr_vec) {
+		intr_handle->intr_vec =
+			rte_zmalloc("intr_vec",
+				    dev->data->nb_rx_queues * sizeof(int), 0);
+		if (!intr_handle->intr_vec) {
+			PMD_INIT_LOG(ERR, "Failed to allocate %d rx_queues"
+				     " intr_vec\n", dev->data->nb_rx_queues);
+			return -ENOMEM;
+		}
+	}
+
+	eth_igbvf_configure_msix_intr(dev);
+
+	/* enable uio/vfio intr/eventfd mapping */
+	rte_intr_enable(intr_handle);
+
+	/* resume enabled intr since hw reset */
+	igbvf_intr_enable(dev);
+
 	return 0;
 }
 
 static void
 igbvf_dev_stop(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+
 	PMD_INIT_FUNC_TRACE();
 
 	igbvf_stop_adapter(dev);
@@ -2870,6 +3019,16 @@ igbvf_dev_stop(struct rte_eth_dev *dev)
 	igbvf_set_vfta_all(dev,0);
 
 	igb_dev_clear_queues(dev);
+
+	/* disable intr eventfd mapping */
+	rte_intr_disable(intr_handle);
+
+	/* Clean datapath event and queue/vec mapping */
+	rte_intr_efd_disable(intr_handle);
+	if (intr_handle->intr_vec) {
+		rte_free(intr_handle->intr_vec);
+		intr_handle->intr_vec = NULL;
+	}
 }
 
 static void
-- 
1.9.3


* [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF
  2016-05-04 21:10 [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF Wenzhuo Lu
  2016-05-04 21:10 ` [dpdk-dev] [PATCH 1/4] ixgbe: VF supports mailbox interruption for PF link up/down Wenzhuo Lu
  2016-05-04 21:10 ` [dpdk-dev] [PATCH 2/4] igb: " Wenzhuo Lu
@ 2016-05-04 21:10 ` Wenzhuo Lu
  2016-05-16 12:01   ` Olivier Matz
  2016-05-04 21:10 ` [dpdk-dev] [PATCH 4/4] igb: " Wenzhuo Lu
  2016-05-24  5:46 ` [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF Lu, Wenzhuo
  4 siblings, 1 reply; 10+ messages in thread
From: Wenzhuo Lu @ 2016-05-04 21:10 UTC (permalink / raw)
  To: dev; +Cc: Wenzhuo Lu

When the physical link goes down and recovers later,
the VF link cannot recover until the user stops and
starts the port manually.
This patch implements automatic recovery of the VF
port.
The automatic recovery is based on the link up/down
message received from the PF. When the VF receives
the link up/down message, it replaces the RX/TX and
operation functions with fake ones to stop RX/TX and
any further operations, then resets the VF port.
After the port is reset successfully, the RX/TX and
operation functions are restored.
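The RX/TX function swap can be illustrated with a minimal,
single-threaded sketch in plain C. The names are invented, and the real
patch additionally swaps dev_ops and serializes the reset with a
spinlock:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mbuf;	/* opaque stand-in for rte_mbuf */

typedef uint16_t (*rx_burst_t)(void *q, struct mbuf **pkts, uint16_t n);

struct port {
	rx_burst_t rx_burst;	/* current RX entry point */
	rx_burst_t rx_backup;	/* real function saved during recovery */
	int        in_reset;
};

static uint16_t real_rx(void *q, struct mbuf **pkts, uint16_t n)
{
	(void)q; (void)pkts;
	return n;		/* pretend all requested packets arrived */
}

/* Fake burst: receives nothing; the real driver triggers the
 * link up/down handler from here instead. */
static uint16_t fake_rx(void *q, struct mbuf **pkts, uint16_t n)
{
	(void)q; (void)pkts; (void)n;
	return 0;
}

/* On the PF message: park the datapath on the fake function. */
static void port_enter_reset(struct port *p)
{
	p->rx_backup = p->rx_burst;
	p->rx_burst = fake_rx;
	p->in_reset = 1;
}

/* After a successful restart: restore the real function. */
static void port_leave_reset(struct port *p)
{
	p->rx_burst = p->rx_backup;
	p->in_reset = 0;
}
```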

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |  5 ++
 drivers/net/ixgbe/ixgbe_ethdev.c       | 86 +++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h       | 14 ++++++
 drivers/net/ixgbe/ixgbe_rxtx.c         | 34 ++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h         |  2 +
 5 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index 8d45915..d80f449 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -40,6 +40,11 @@ This section should contain new features added in this release. Sample format:
   VF. To handle this link up/down event, add the mailbox interruption
   support to receive the message.
 
+* **Added the support of automatic link recovery for ixgbe VF.**
+
+  When the physical link becomes down and recover later, VF will receive
+  the mailbox message for that. VF handles this message by resetting the
+  VF port. Then the VF link can recover automatically.
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 8e5f64f..f1f67f2 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -589,6 +589,8 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
 	.rss_hash_conf_get    = ixgbe_dev_rss_hash_conf_get,
 };
 
+static const struct eth_dev_ops ixgbevf_eth_dev_ops_fake = {NULL};
+
 /* store statistics names and its offset in stats structure */
 struct rte_ixgbe_xstats_name_off {
 	char name[RTE_ETH_XSTATS_NAME_SIZE];
@@ -1322,12 +1324,15 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
 	struct ixgbe_hwstrip *hwstrip =
 		IXGBE_DEV_PRIVATE_TO_HWSTRIP_BITMAP(eth_dev->data->dev_private);
 	struct ether_addr *perm_addr = (struct ether_addr *) hw->mac.perm_addr;
+	struct ixgbe_adapter *adapter =
+		(struct ixgbe_adapter *)eth_dev->data->dev_private;
 
 	PMD_INIT_FUNC_TRACE();
 
 	eth_dev->dev_ops = &ixgbevf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	rte_spinlock_init(&adapter->vf_reset_lock);
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -7152,14 +7157,93 @@ ixgbevf_dev_allmulticast_disable(struct rte_eth_dev *dev)
 static void ixgbevf_mbx_process(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct ixgbe_adapter *adapter =
+		(struct ixgbe_adapter *)dev->data->dev_private;
 	u32 in_msg = 0;
 
 	if (ixgbe_read_mbx(hw, &in_msg, 1, 0))
 		return;
 
 	/* PF reset VF event */
-	if (in_msg == IXGBE_PF_CONTROL_MSG)
+	if (in_msg == IXGBE_PF_CONTROL_MSG) {
 		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET);
+
+		/* Stop the ops and rx/tx */
+		if (dev->data->dev_started) {
+			PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+			dev->dev_ops = &ixgbevf_eth_dev_ops_fake;
+
+			adapter->rx_backup = dev->rx_pkt_burst;
+			adapter->tx_backup = dev->tx_pkt_burst;
+			dev->rx_pkt_burst = ixgbevf_recv_pkts_fake;
+			dev->tx_pkt_burst = ixgbevf_xmit_pkts_fake;
+		}
+	}
+}
+
+void
+ixgbevf_dev_link_up_down_handler(struct rte_eth_dev *dev)
+{
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct ixgbe_adapter *adapter =
+		(struct ixgbe_adapter *)dev->data->dev_private;
+	int diag;
+	uint32_t vteiam;
+
+	/* Only one working core needs to perform the VF reset */
+	if (rte_spinlock_trylock(&adapter->vf_reset_lock)) {
+		/**
+		 * After the fake recv/xmit functions are installed, a
+		 * working thread may still be running the real RX/TX
+		 * functions, so wait long enough to assume all working
+		 * threads have exited. The assumption is that each
+		 * execution of the RX/TX functions takes less than 100us.
+		 */
+		rte_delay_us(100);
+
+		do {
+			dev->data->dev_started = 0;
+			ixgbevf_dev_stop(dev);
+			rte_delay_us(1000000);
+
+			diag = ixgbevf_dev_start(dev);
+			if (diag) {
+				PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
+					     "Failed to start device.");
+				return;
+			}
+			dev->data->dev_started = 1;
+			ixgbevf_dev_stats_reset(dev);
+			if (dev->data->dev_conf.intr_conf.lsc == 0)
+				diag = ixgbe_dev_link_update(dev, 0);
+			if (diag) {
+				PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+					     "Failed to update link.");
+			}
+
+			/**
+			 * When the PF link is down, there is a chance
+			 * that the VF cannot access its registers. Check
+			 * whether the register write succeeded. If not,
+			 * repeat stop/start until the PF link is up, in
+			 * other words, until the registers can be
+			 * written.
+			 */
+			vteiam = IXGBE_READ_REG(hw, IXGBE_VTEIAM);
+		/* Reference ixgbevf_intr_enable when checking */
+		} while (vteiam != IXGBE_VF_IRQ_ENABLE_MASK);
+
+		dev->rx_pkt_burst = adapter->rx_backup;
+		dev->tx_pkt_burst = adapter->tx_backup;
+		dev->dev_ops = &ixgbevf_eth_dev_ops;
+
+		/**
+		 * Wait a while to ensure the other working threads are
+		 * running with the real RX/TX functions. This avoids
+		 * another working thread entering the fake functions and
+		 * resetting the device again.
+		 */
+		rte_delay_us(100);
+		rte_spinlock_unlock(&adapter->vf_reset_lock);
+	}
 }
 
 static int
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..daca27c 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -38,6 +38,7 @@
 #include "base/ixgbe_dcb_82598.h"
 #include "ixgbe_bypass.h"
 #include <rte_time.h>
+#include <rte_spinlock.h>
 
 /* need update link, bit flag */
 #define IXGBE_FLAG_NEED_LINK_UPDATE (uint32_t)(1 << 0)
@@ -289,6 +290,9 @@ struct ixgbe_adapter {
 	struct rte_timecounter      systime_tc;
 	struct rte_timecounter      rx_tstamp_tc;
 	struct rte_timecounter      tx_tstamp_tc;
+	eth_rx_burst_t              rx_backup;
+	eth_tx_burst_t              tx_backup;
+	rte_spinlock_t              vf_reset_lock;
 };
 
 #define IXGBE_DEV_PRIVATE_TO_HW(adapter)\
@@ -396,6 +400,14 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbevf_recv_pkts_fake(void *rx_queue,
+				struct rte_mbuf **rx_pkts,
+				uint16_t nb_pkts);
+
+uint16_t ixgbevf_xmit_pkts_fake(void *tx_queue,
+				struct rte_mbuf **tx_pkts,
+				uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
@@ -442,4 +454,6 @@ uint32_t ixgbe_convert_vm_rx_mask_to_val(uint16_t rx_mask, uint32_t orig_val);
 
 int ixgbe_fdir_ctrl_func(struct rte_eth_dev *dev,
 			enum rte_filter_op filter_op, void *arg);
+
+void ixgbevf_dev_link_up_down_handler(struct rte_eth_dev *dev);
 #endif /* _IXGBE_ETHDEV_H_ */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 9fb38a6..d99e1fe 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -2344,6 +2344,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	txq->dev = dev;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
 	txq->tx_free_thresh = tx_free_thresh;
@@ -2623,6 +2624,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
 				 RTE_CACHE_LINE_SIZE, socket_id);
 	if (rxq == NULL)
 		return -ENOMEM;
+	rxq->dev = dev;
 	rxq->mb_pool = mp;
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
@@ -5245,3 +5247,35 @@ ixgbe_rxq_vec_setup(struct ixgbe_rx_queue __rte_unused *rxq)
 {
 	return -1;
 }
+
+/**
+ * A fake RX function installed while a link up/down event is
+ * being handled. It triggers the handler and receives nothing.
+ */
+uint16_t
+ixgbevf_recv_pkts_fake(void *rx_queue,
+		       struct rte_mbuf __rte_unused **rx_pkts,
+		       uint16_t __rte_unused nb_pkts)
+{
+	struct ixgbe_rx_queue *rxq;
+
+	rxq = rx_queue;
+	ixgbevf_dev_link_up_down_handler(rxq->dev);
+	return 0;
+}
+
+/**
+ * A fake TX function installed while a link up/down event is
+ * being handled. It triggers the handler and transmits nothing.
+ */
+uint16_t
+ixgbevf_xmit_pkts_fake(void *tx_queue,
+		       struct rte_mbuf __rte_unused **tx_pkts,
+		       uint16_t __rte_unused nb_pkts)
+{
+	struct ixgbe_tx_queue *txq;
+
+	txq = tx_queue;
+	ixgbevf_dev_link_up_down_handler(txq->dev);
+	return 0;
+}
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 3691a19..50971e1 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -125,6 +125,7 @@ struct ixgbe_rx_queue {
 	struct ixgbe_scattered_rx_entry *sw_sc_ring; /**< address of scattered Rx software ring. */
 	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
 	struct rte_mbuf *pkt_last_seg; /**< Last segment of current packet. */
+	struct rte_eth_dev *dev; /**< device this queue belongs to. */
 	uint64_t            mbuf_initializer; /**< value to init mbufs */
 	uint16_t            nb_rx_desc; /**< number of RX descriptors. */
 	uint16_t            rx_tail;  /**< current value of RDT register. */
@@ -212,6 +213,7 @@ struct ixgbe_tx_queue {
 		struct ixgbe_tx_entry_v *sw_ring_v; /**< address of SW ring for vector PMD */
 	};
 	volatile uint32_t   *tdt_reg_addr; /**< Address of TDT register. */
+	struct rte_eth_dev  *dev; /**< device this queue belongs to. */
 	uint16_t            nb_tx_desc;    /**< number of TX descriptors. */
 	uint16_t            tx_tail;       /**< current value of TDT reg. */
 	/**< Start freeing TX buffers if there are less free descriptors than
-- 
1.9.3


* [dpdk-dev] [PATCH 4/4] igb: automatic link recovery on VF
  2016-05-04 21:10 [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF Wenzhuo Lu
                   ` (2 preceding siblings ...)
  2016-05-04 21:10 ` [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF Wenzhuo Lu
@ 2016-05-04 21:10 ` Wenzhuo Lu
  2016-05-24  5:46 ` [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF Lu, Wenzhuo
  4 siblings, 0 replies; 10+ messages in thread
From: Wenzhuo Lu @ 2016-05-04 21:10 UTC (permalink / raw)
  To: dev; +Cc: Wenzhuo Lu

When the physical link goes down and recovers later,
the VF link cannot recover until the user stops and
starts the port manually.
This patch implements automatic recovery of the VF
port.
The automatic recovery is based on the link up/down
message received from the PF. When the VF receives
the link up/down message, it replaces the RX/TX and
operation functions with fake ones to stop RX/TX and
any further operations, then resets the VF port.
After the port is reset successfully, the RX/TX and
operation functions are restored.
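The retry loop in this patch - repeat stop/start until an
interrupt-enable register write is observed to stick - can be modeled
as below. The register model and the bounded retry count are invented
for the simulation; the real code loops until the write succeeds:

```c
#include <assert.h>
#include <stdint.h>

#define EIAM_MBX_BIT (1u << 0)	/* mailbox bit, as in the patch */

/* Model of a VF register file that silently drops writes while the
 * PF link is down (the failure mode the loop guards against). */
struct regs {
	int pf_link_up;
	uint32_t eiam;
};

static void write_eiam(struct regs *r, uint32_t v)
{
	if (r->pf_link_up)
		r->eiam = v;
}

/* Repeat "stop/start" until the interrupt-enable write is observed,
 * i.e. until the PF link is back. `max_tries` bounds the simulation;
 * returns the number of attempts, or -1 if it gave up. */
static int reset_until_written(struct regs *r, int link_up_after,
			       int max_tries)
{
	int tries = 0;

	do {
		if (++tries > max_tries)
			return -1;
		if (tries >= link_up_after)
			r->pf_link_up = 1;	/* PF link recovers */
		write_eiam(r, EIAM_MBX_BIT);
	} while (!(r->eiam & EIAM_MBX_BIT));
	return tries;
}
```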

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |  2 +-
 drivers/net/e1000/e1000_ethdev.h       | 14 ++++++
 drivers/net/e1000/igb_ethdev.c         | 87 +++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_rxtx.c           | 38 +++++++++++++++
 4 files changed, 139 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index d80f449..8144450 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -40,7 +40,7 @@ This section should contain new features added in this release. Sample format:
   VF. To handle this link up/down event, add the mailbox interruption
   support to receive the message.
 
-* **Added the support of automatic link recovery for ixgbe VF.**
+* **Added the support of automatic link recovery for ixgbe/igb VF.**
 
   When the physical link becomes down and recover later, VF will receive
   the mailbox message for that. VF handles this message by resetting the
diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index e8bf8da..1b71f9b 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -34,6 +34,7 @@
 #ifndef _E1000_ETHDEV_H_
 #define _E1000_ETHDEV_H_
 #include <rte_time.h>
+#include <rte_spinlock.h>
 
 /* need update link, bit flag */
 #define E1000_FLAG_NEED_LINK_UPDATE (uint32_t)(1 << 0)
@@ -261,6 +262,9 @@ struct e1000_adapter {
 	struct rte_timecounter  systime_tc;
 	struct rte_timecounter  rx_tstamp_tc;
 	struct rte_timecounter  tx_tstamp_tc;
+	eth_rx_burst_t rx_backup;
+	eth_tx_burst_t tx_backup;
+	rte_spinlock_t vf_reset_lock;
 };
 
 #define E1000_DEV_PRIVATE(adapter) \
@@ -316,6 +320,14 @@ uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igbvf_xmit_pkts_fake(void *txq,
+				  struct rte_mbuf **tx_pkts,
+				  uint16_t nb_pkts);
+
+uint16_t eth_igbvf_recv_pkts_fake(void *rxq,
+				  struct rte_mbuf **rx_pkts,
+				  uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_scattered_pkts(void *rxq,
 		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
 
@@ -388,4 +400,6 @@ void em_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 
 void igb_pf_host_uninit(struct rte_eth_dev *dev);
 
+void igbvf_dev_link_up_down_handler(struct rte_eth_dev *dev);
+
 #endif /* _E1000_ETHDEV_H_ */
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index b0e5e6a..5dc3182 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -411,6 +411,8 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = {
 	.get_reg              = igbvf_get_regs,
 };
 
+static const struct eth_dev_ops igbvf_eth_dev_ops_fake = {NULL};
+
 /* store statistics names and its offset in stats structure */
 struct rte_igb_xstats_name_off {
 	char name[RTE_ETH_XSTATS_NAME_SIZE];
@@ -911,6 +913,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	rte_spinlock_init(&adapter->vf_reset_lock);
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2641,6 +2644,8 @@ eth_igbvf_interrupt_get_status(struct rte_eth_dev *dev)
 
 void igbvf_mbx_process(struct rte_eth_dev *dev)
 {
+	struct e1000_adapter *adapter =
+		(struct e1000_adapter *)dev->data->dev_private;
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct e1000_mbx_info *mbx = &hw->mbx;
@@ -2650,8 +2655,88 @@ void igbvf_mbx_process(struct rte_eth_dev *dev)
 		return;
 
 	/* PF reset VF event */
-	if (in_msg == E1000_PF_CONTROL_MSG)
+	if (in_msg == E1000_PF_CONTROL_MSG) {
 		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET);
+
+		/* Stop the ops and rx/tx */
+		if (dev->data->dev_started) {
+			PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+			dev->dev_ops = &igbvf_eth_dev_ops_fake;
+
+			adapter->rx_backup = dev->rx_pkt_burst;
+			adapter->tx_backup = dev->tx_pkt_burst;
+			dev->rx_pkt_burst = eth_igbvf_recv_pkts_fake;
+			dev->tx_pkt_burst = eth_igbvf_xmit_pkts_fake;
+		}
+	}
+}
+
+void
+igbvf_dev_link_up_down_handler(struct rte_eth_dev *dev)
+{
+	struct e1000_adapter *adapter =
+		(struct e1000_adapter *)dev->data->dev_private;
+	struct e1000_hw *hw =
+		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	int diag;
+	uint32_t eiam;
+
+	/* Reference igbvf_intr_enable */
+	uint32_t eiam_mbx = 1 << E1000_VTIVAR_MISC_MAILBOX;
+
+	/* Only one working core needs to perform the VF reset */
+	if (rte_spinlock_trylock(&adapter->vf_reset_lock)) {
+		/**
+		 * When the fake recv/xmit functions are installed, a working
+		 * thread may still be running in the real RX/TX functions, so
+		 * wait long enough to assume all working threads have exited.
+		 * The assumption is each RX/TX function call takes < 100us.
+		 */
+		rte_delay_us(100);
+
+		do {
+			dev->data->dev_started = 0;
+			igbvf_dev_stop(dev);
+			rte_delay_us(1000000);
+
+			diag = igbvf_dev_start(dev);
+			if (diag) {
+				PMD_INIT_LOG(ERR, "Igb VF reset: "
+					     "Failed to start device.");
+				return;
+			}
+			dev->data->dev_started = 1;
+			eth_igbvf_stats_reset(dev);
+			if (dev->data->dev_conf.intr_conf.lsc == 0)
+				diag = eth_igb_link_update(dev, 0);
+			if (diag) {
+				PMD_INIT_LOG(INFO, "Igb VF reset: "
+					     "Failed to update link.");
+			}
+
+			/**
+			 * When the PF link is down, there is a chance
+			 * that the VF cannot access its registers.
+			 * Check whether the register write succeeded.
+			 * If not, repeat stop/start until the PF link
+			 * is up, in other words, until the registers
+			 * can be written again.
+			 */
+			eiam = E1000_READ_REG(hw, E1000_EIAM);
+		} while (!(eiam & eiam_mbx));
+
+		dev->rx_pkt_burst = adapter->rx_backup;
+		dev->tx_pkt_burst = adapter->tx_backup;
+		dev->dev_ops = &igbvf_eth_dev_ops;
+
+		/**
+		 * Wait a while to ensure the other working threads are
+		 * running the real RX/TX functions again, so that no thread
+		 * re-enters this handler and resets the device a second time.
+		 */
+		rte_delay_us(100);
+		rte_spinlock_unlock(&adapter->vf_reset_lock);
+	}
 }
 
 static int
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 4a987e3..5e3b47b 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -117,6 +117,7 @@ struct igb_rx_queue {
 	struct igb_rx_entry *sw_ring;   /**< address of RX software ring. */
 	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
 	struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+	struct rte_eth_dev  *dev; /**< device this queue belongs to. */
 	uint16_t            nb_rx_desc; /**< number of RX descriptors. */
 	uint16_t            rx_tail;    /**< current value of RDT register. */
 	uint16_t            nb_rx_hold; /**< number of held free RX desc. */
@@ -185,6 +186,7 @@ struct igb_tx_queue {
 	uint64_t               tx_ring_phys_addr; /**< TX ring DMA address. */
 	struct igb_tx_entry    *sw_ring; /**< virtual address of SW ring. */
 	volatile uint32_t      *tdt_reg_addr; /**< Address of TDT register. */
+	struct rte_eth_dev     *dev; /**< device this queue belongs to. */
 	uint32_t               txd_type;      /**< Device-specific TXD type */
 	uint16_t               nb_tx_desc;    /**< number of TX descriptors. */
 	uint16_t               tx_tail; /**< Current value of TDT register. */
@@ -1344,6 +1346,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	txq->dev = dev;
 	txq->nb_tx_desc = nb_desc;
 	txq->pthresh = tx_conf->tx_thresh.pthresh;
 	txq->hthresh = tx_conf->tx_thresh.hthresh;
@@ -1461,6 +1464,7 @@ eth_igb_rx_queue_setup(struct rte_eth_dev *dev,
 			  RTE_CACHE_LINE_SIZE);
 	if (rxq == NULL)
 		return -ENOMEM;
+	rxq->dev = dev;
 	rxq->mb_pool = mp;
 	rxq->nb_rx_desc = nb_desc;
 	rxq->pthresh = rx_conf->rx_thresh.pthresh;
@@ -2524,3 +2528,37 @@ igb_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 	qinfo->conf.tx_thresh.hthresh = txq->hthresh;
 	qinfo->conf.tx_thresh.wthresh = txq->wthresh;
 }
+
+/**
+ * A fake TX function used during link up/down handling.
+ * Handles the link up/down event but does not transmit.
+ */
+uint16_t
+eth_igbvf_xmit_pkts_fake(void *tx_queue,
+			 struct rte_mbuf __rte_unused **tx_pkts,
+			 uint16_t __rte_unused nb_pkts)
+{
+	struct igb_tx_queue *txq;
+
+	txq = tx_queue;
+	igbvf_dev_link_up_down_handler(txq->dev);
+
+	return 0;
+}
+
+/**
+ * A fake RX function used during link up/down handling.
+ * Handles the link up/down event but does not receive.
+ */
+uint16_t
+eth_igbvf_recv_pkts_fake(void *rx_queue,
+			 struct rte_mbuf __rte_unused **rx_pkts,
+			 uint16_t __rte_unused nb_pkts)
+{
+	struct igb_rx_queue *rxq;
+
+	rxq = rx_queue;
+	igbvf_dev_link_up_down_handler(rxq->dev);
+
+	return 0;
+}
-- 
1.9.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF
  2016-05-04 21:10 ` [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF Wenzhuo Lu
@ 2016-05-16 12:01   ` Olivier Matz
  2016-05-17  1:11     ` Lu, Wenzhuo
  0 siblings, 1 reply; 10+ messages in thread
From: Olivier Matz @ 2016-05-16 12:01 UTC (permalink / raw)
  To: Wenzhuo Lu, dev

Hi Wenzhuo,

On 05/04/2016 11:10 PM, Wenzhuo Lu wrote:
> When the physical link is down and recover later,
> the VF link cannot recover until the user stop and
> start it manually.
> This patch implements the automatic recovery of VF
> port.
> The automatic recovery bases on the link up/down
> message received from PF. When VF receives the link
> up/down message, it will replace the RX/TX and
> operation functions with fake ones to stop RX/TX
> and any future operation. Then reset the VF port.
> After successfully resetting the port, recover the
> RX/TX and operation functions.
> 
> Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> 
> [...]
> 
> +void
> +ixgbevf_dev_link_up_down_handler(struct rte_eth_dev *dev)
> +{
> +	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +	struct ixgbe_adapter *adapter =
> +		(struct ixgbe_adapter *)dev->data->dev_private;
> +	int diag;
> +	uint32_t vteiam;
> +
> +	/* Only one working core need to performance VF reset */
> +	if (rte_spinlock_trylock(&adapter->vf_reset_lock)) {
> +		/**
> +		 * When fake rec/xmit is replaced, working thread may is running
> +		 * into real RX/TX func, so wait long enough to assume all
> +		 * working thread exit. The assumption is it will spend less
> +		 * than 100us for each execution of RX and TX func.
> +		 */
> +		rte_delay_us(100);
> +
> +		do {
> +			dev->data->dev_started = 0;
> +			ixgbevf_dev_stop(dev);
> +			rte_delay_us(1000000);

If I understand well, ixgbevf_dev_link_up_down_handler() is called
by ixgbevf_recv_pkts_fake() on a dataplane core. It means that the
core that acquired the lock will loop for at least 100us + 1 sec.
If this core was also in charge of polling other queues of other
ports, or timers, many packets will be dropped (even with a 100us
loop). I don't think it is acceptable to actively wait inside an
RX function.

I think it would avoid many issues to delegate this work to the
application, maybe by notifying it that the port is in a bad state
and must be restarted. The application could then properly stop
polling the queues, and stop and restart the port in a separate thread,
without bothering the dataplane cores.


Regards,
Olivier


* Re: [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF
  2016-05-16 12:01   ` Olivier Matz
@ 2016-05-17  1:11     ` Lu, Wenzhuo
  2016-05-17  7:50       ` Olivier MATZ
  0 siblings, 1 reply; 10+ messages in thread
From: Lu, Wenzhuo @ 2016-05-17  1:11 UTC (permalink / raw)
  To: Olivier Matz, dev

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, May 16, 2016 8:01 PM
> To: Lu, Wenzhuo; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF
> 
> Hi Wenzhuo,
> 
> On 05/04/2016 11:10 PM, Wenzhuo Lu wrote:
> > When the physical link is down and recover later, the VF link cannot
> > recover until the user stop and start it manually.
> > This patch implements the automatic recovery of VF port.
> > The automatic recovery bases on the link up/down message received from
> > PF. When VF receives the link up/down message, it will replace the
> > RX/TX and operation functions with fake ones to stop RX/TX and any
> > future operation. Then reset the VF port.
> > After successfully resetting the port, recover the RX/TX and operation
> > functions.
> >
> > Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> >
> > [...]
> >
> > +void
> > +ixgbevf_dev_link_up_down_handler(struct rte_eth_dev *dev) {
> > +	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
> > +	struct ixgbe_adapter *adapter =
> > +		(struct ixgbe_adapter *)dev->data->dev_private;
> > +	int diag;
> > +	uint32_t vteiam;
> > +
> > +	/* Only one working core need to performance VF reset */
> > +	if (rte_spinlock_trylock(&adapter->vf_reset_lock)) {
> > +		/**
> > +		 * When fake rec/xmit is replaced, working thread may is
> running
> > +		 * into real RX/TX func, so wait long enough to assume all
> > +		 * working thread exit. The assumption is it will spend less
> > +		 * than 100us for each execution of RX and TX func.
> > +		 */
> > +		rte_delay_us(100);
> > +
> > +		do {
> > +			dev->data->dev_started = 0;
> > +			ixgbevf_dev_stop(dev);
> > +			rte_delay_us(1000000);
> 
> If I understand well, ixgbevf_dev_link_up_down_handler() is called by
> ixgbevf_recv_pkts_fake() on a dataplane core. It means that the core that
> acquired the lock will loop during 100us + 1sec at least.
> If this core was also in charge of polling other queues of other ports, or timers,
> many packets will be dropped (even with a 100us loop). I don't think it is
> acceptable to actively wait inside a rx function.
> 
> I think it would avoid many issues to delegate this work to the application,
> maybe by notifying it that the port is in a bad state and must be restarted. The
> application could then properly stop polling the queues, and stop and restart the
> port in a separate thread, without bothering the dataplane cores.
Thanks for the comments.
Yes, you're right. I wrongly assumed that every queue is handled by a single core.
But of course that's not guaranteed; we cannot tell how users will deploy their systems.

I plan to update this patch set. The new solution is: first, let users choose whether they want this
auto-reset feature. If they do, we will provide an alternative set of RX/TX functions that take a lock, so we
can stop the RX/TX of the bad ports.
We will also provide a reset API for users. The application should call this API from its management thread or similar.
That means the application must guarantee thread safety when calling the API.
So there are 2 things:
1. Lock the RX/TX to stop them for the users.
2. Provide a reset API for the users, so every NIC can do its own job. Applications need not worry about the differences
between different NICs.

Admittedly, it's not *automatic* anymore. The reason is that DPDK doesn't guarantee thread safety, so the operations have to be
left to the applications, which must guarantee thread safety themselves.

And if the users choose not to use the auto-reset feature, we will leave this work to the application :)

> 
> 
> Regards,
> Olivier


* Re: [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF
  2016-05-17  1:11     ` Lu, Wenzhuo
@ 2016-05-17  7:50       ` Olivier MATZ
  2016-05-17  8:20         ` Lu, Wenzhuo
  0 siblings, 1 reply; 10+ messages in thread
From: Olivier MATZ @ 2016-05-17  7:50 UTC (permalink / raw)
  To: Lu, Wenzhuo, dev

Hi Wenzhuo,

On 05/17/2016 03:11 AM, Lu, Wenzhuo wrote:
>> -----Original Message-----
>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>> If I understand well, ixgbevf_dev_link_up_down_handler() is called by
>> ixgbevf_recv_pkts_fake() on a dataplane core. It means that the core that
>> acquired the lock will loop during 100us + 1sec at least.
>> If this core was also in charge of polling other queues of other ports, or timers,
>> many packets will be dropped (even with a 100us loop). I don't think it is
>> acceptable to actively wait inside a rx function.
>>
>> I think it would avoid many issues to delegate this work to the application,
>> maybe by notifying it that the port is in a bad state and must be restarted. The
>> application could then properly stop polling the queues, and stop and restart the
>> port in a separate thread, without bothering the dataplane cores.
> Thanks for the comments.
> Yes, you're right. I had a wrong assumption that every queue is handled by one core.
> But surely it's not right, we cannot tell how the users will deploy their system.
>
> I plan to update this patch set. The solution now is, first let the users choose if they want this
> auto-reset feature. If so, we will apply another series rx/tx functions which have lock. So we
> can stop the rx/tx of the bad ports.
> And we also apply a reset API for users. The APPs should call this API in their management thread or so.
> It means APPs should guarantee the thread safe for the API.
> You see, there're 2 things,
> 1, Lock the rx/tx to stop them for users.
> 2, Apply a resetting API for users, and every NIC can do their own job. APPs need not to worry about the difference
> between different NICs.
>
> Surely, it's not *automatic* now. The reason is DPDK doesn't guarantee the thread safe. So the operations have to be
> left to the APPs and let them to guarantee the thread safe.
>
> And if the users choose not using auto-reset feature, we will leave this work to the APP :)

Yes, I think having 2 modes is a good approach:

- the first mode would let the application know a reset has to
   be performed, without active loop or lock in the rx/tx funcs.
- the second mode would transparently manage the reset in the driver,
   but may lock the core during some time.

By the way, you mention a reset API; why not just use the usual
stop/start functions? I think it would work the same.

Regards,
Olivier


* Re: [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF
  2016-05-17  7:50       ` Olivier MATZ
@ 2016-05-17  8:20         ` Lu, Wenzhuo
  0 siblings, 0 replies; 10+ messages in thread
From: Lu, Wenzhuo @ 2016-05-17  8:20 UTC (permalink / raw)
  To: Olivier MATZ, dev

Hi Olivier,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, May 17, 2016 3:51 PM
> To: Lu, Wenzhuo; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF
> 
> Hi Wenzhuo,
> 
> On 05/17/2016 03:11 AM, Lu, Wenzhuo wrote:
> >> -----Original Message-----
> >> From: Olivier Matz [mailto:olivier.matz@6wind.com] If I understand
> >> well, ixgbevf_dev_link_up_down_handler() is called by
> >> ixgbevf_recv_pkts_fake() on a dataplane core. It means that the core
> >> that acquired the lock will loop during 100us + 1sec at least.
> >> If this core was also in charge of polling other queues of other
> >> ports, or timers, many packets will be dropped (even with a 100us
> >> loop). I don't think it is acceptable to actively wait inside a rx function.
> >>
> >> I think it would avoid many issues to delegate this work to the
> >> application, maybe by notifying it that the port is in a bad state
> >> and must be restarted. The application could then properly stop
> >> polling the queues, and stop and restart the port in a separate thread,
> without bothering the dataplane cores.
> > Thanks for the comments.
> > Yes, you're right. I had a wrong assumption that every queue is handled by one
> core.
> > But surely it's not right, we cannot tell how the users will deploy their system.
> >
> > I plan to update this patch set. The solution now is, first let the
> > users choose if they want this auto-reset feature. If so, we will
> > apply another series rx/tx functions which have lock. So we can stop the rx/tx
> of the bad ports.
> > And we also apply a reset API for users. The APPs should call this API in their
> management thread or so.
> > It means APPs should guarantee the thread safe for the API.
> > You see, there're 2 things,
> > 1, Lock the rx/tx to stop them for users.
> > 2, Apply a resetting API for users, and every NIC can do their own
> > job. APPs need not to worry about the difference between different NICs.
> >
> > Surely, it's not *automatic* now. The reason is DPDK doesn't guarantee
> > the thread safe. So the operations have to be left to the APPs and let them to
> guarantee the thread safe.
> >
> > And if the users choose not using auto-reset feature, we will leave
> > this work to the APP :)
> 
> Yes, I think having 2 modes is a good approach:
> 
> - the first mode would let the application know a reset has to
>    be performed, without active loop or lock in the rx/tx funcs.
> - the second mode would transparently manage the reset in the driver,
>    but may lock the core during some time.
For the second mode, we initially wanted to let the driver manage the reset transparently. But the bad news is that
the operations in the driver layer are not thread safe. If we want the reset to be transparent,
we need a whole new mechanism to guarantee thread safety for the driver-layer operations.
Obviously, that needs to be discussed and cannot be finished in this release.
So for now we provide a reset API for the application; the application calls this API and guarantees thread safety for all the operations.
It's not transparent, but it seems to be what we can do at this stage.

> 
> By the way, you talk about a reset API, why not just using the usual stop/start
> functions? I think it would work the same.
For ixgbe/igb, stop/start is enough. But for i40e, some additional work must be done. (For example, the queue resources must be re-initialized.)
So we are thinking about introducing a new API, so that different NICs can each do what they have to do.

> 
> Regards,
> Olivier


* Re: [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF
  2016-05-04 21:10 [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF Wenzhuo Lu
                   ` (3 preceding siblings ...)
  2016-05-04 21:10 ` [dpdk-dev] [PATCH 4/4] igb: " Wenzhuo Lu
@ 2016-05-24  5:46 ` Lu, Wenzhuo
  4 siblings, 0 replies; 10+ messages in thread
From: Lu, Wenzhuo @ 2016-05-24  5:46 UTC (permalink / raw)
  To: Lu, Wenzhuo, dev

Hi,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wenzhuo Lu
> Sent: Thursday, May 5, 2016 5:11 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF
> 
> Now if the PF link is down and up, VF doesn't handle this event, user need to
> reset the VF port to let it recover.
> This patch set addes the support of the mailbox interruption on VF. So, VF can
> receice the messges for physical link down/up.
> And VF will handle this event and let the VF link recover automatically.
> 
> Wenzhuo Lu (4):
>   ixgbe: VF supports mailbox interruption for PF link up/down
>   igb: VF supports mailbox interruption for PF link up/down
>   ixgbe: automatic link recovery on VF
>   igb: automatic link recovery on VF
> 
>  doc/guides/rel_notes/release_16_07.rst |  11 ++
>  drivers/net/e1000/e1000_ethdev.h       |  14 ++
>  drivers/net/e1000/igb_ethdev.c         | 244
> +++++++++++++++++++++++++++++++++
>  drivers/net/e1000/igb_rxtx.c           |  38 +++++
>  drivers/net/ixgbe/ixgbe_ethdev.c       | 169 ++++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_ethdev.h       |  14 ++
>  drivers/net/ixgbe/ixgbe_rxtx.c         |  34 +++++
>  drivers/net/ixgbe/ixgbe_rxtx.h         |   2 +
>  8 files changed, 523 insertions(+), 3 deletions(-)
> 
> --
> 1.9.3
Self NACK. I will split this patch set into 2. One provides support for the mailbox interrupt on PF link up/down.
The other is for the mechanism of VF link recovery, since there is a discussion about whether we need to handle the link up/down
event in the driver layer. Whether or not the driver should handle the event, we need the mailbox interrupt first.


end of thread, other threads:[~2016-05-24  5:46 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-04 21:10 [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF Wenzhuo Lu
2016-05-04 21:10 ` [dpdk-dev] [PATCH 1/4] ixgbe: VF supports mailbox interruption for PF link up/down Wenzhuo Lu
2016-05-04 21:10 ` [dpdk-dev] [PATCH 2/4] igb: " Wenzhuo Lu
2016-05-04 21:10 ` [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF Wenzhuo Lu
2016-05-16 12:01   ` Olivier Matz
2016-05-17  1:11     ` Lu, Wenzhuo
2016-05-17  7:50       ` Olivier MATZ
2016-05-17  8:20         ` Lu, Wenzhuo
2016-05-04 21:10 ` [dpdk-dev] [PATCH 4/4] igb: " Wenzhuo Lu
2016-05-24  5:46 ` [dpdk-dev] [PATCH 0/4] automatic link recovery on ixgbe/igb VF Lu, Wenzhuo
