DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] MSI-X vector #1 seems to be stalled sometimes after VF reset (ixgbe)
@ 2017-01-10  2:42 Ruslan Nikolaev
  2017-01-10  2:47 ` Ruslan Nikolaev
  2017-01-10  2:48 ` Ruslan Nikolaev
  0 siblings, 2 replies; 4+ messages in thread
From: Ruslan Nikolaev @ 2017-01-10  2:42 UTC (permalink / raw)
  To: dev

Attached are 2 patches, and the discussion below is related to the slightly modified version of the dpdk 16.07 library: interrupts_excerpt.patch and dpdk_vfreset.patch

1. We use a single-shot interrupt mechanism for the RX queue (vector #1, intr_handle.efds[0]file descriptor).

When we receive first interrupt, we start a polling thread. When the polling thread becomes idle again, we enable interrupts.
(to enable interrupts, we use rte_eth_dev_rx_intr_enable, queue_id = 0)

2. We enable interrupts right away for mailbox, reset adapter notifications (vector #0, intr_handle.fd file descriptor)
(to enable interrupts, we use rte_eth_dev_rx_intr_enable, queue_id = UINT16_MAX which we reserved for non-RX interrupts)

3. Changes related to interrupt setup and enabling/disabling are in interrupt_excerpt.patch

Changes: Seems like writing to the register already implies OR-semantic in interrupt enabling, so it does not seem be necessary to read previous value of the register (especially that now we have to have 2 vectors and want to avoid any race condition between reading and writing the register). Also, rte_intr_enable is going to write the same configuration to VFIO which does not seem to be necessary. Could you confirm and/or clarify that?

For disabling interrupt, it seems we have to use a different register.

4. Changes related to resetting devices are in dpdk_vfreset.patch

We used an unofficial patch from http://dpdk.org/dev/patchwork/patch/14009/ as the model. The patch is doing pretty the same thing but just maintains a state machine for our convenience, so that we can have a loop outside the reset function.

5. We see an intermittent stall of interrupt vector #1 when links are toggled. It does not always happen but only intermittently. Vector #0 still seems to work fine because we are able to get mailbox interrupts (when the adapter is reset).

Our current suspicion is that it may have something to do to the reset adapter handling in the unofficial patch (which, in turns, relies on dev_stop/dev_start functions). It appears that vector #1 (RX) interrupts are stalled intermittently only after the adapter reset takes place.

Please give your advice / suggestions.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] MSI-X vector #1 seems to be stalled sometimes after VF reset (ixgbe)
  2017-01-10  2:42 [dpdk-dev] MSI-X vector #1 seems to be stalled sometimes after VF reset (ixgbe) Ruslan Nikolaev
@ 2017-01-10  2:47 ` Ruslan Nikolaev
  2017-01-10  2:48 ` Ruslan Nikolaev
  1 sibling, 0 replies; 4+ messages in thread
From: Ruslan Nikolaev @ 2017-01-10  2:47 UTC (permalink / raw)
  To: dev

interrupts_excerpts.patch:

drivers/net/ixgbe/ixgbe_ethdev.c

eth_ixgbevf_dev_init:

@@ -1462,8 +1467,9 @@
        rte_intr_callback_register(&pci_dev->intr_handle,
                                   ixgbevf_dev_interrupt_handler,
                                   (void *)eth_dev);
-       rte_intr_enable(&pci_dev->intr_handle);
-       ixgbevf_intr_enable(hw);
+
+       /* XXX: not enabling interrupts here since they are enabled in dev_start anyway,
+          and we do not have correct number of interrupt vectors here yet. */

        PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x mac.type=%s",
                     eth_dev->data->port_id, pci_dev->id.vendor_id,

ixgbevf_dev_start:
@@ -4168,7 +4174,7 @@

        /* check and configure queue intr-vector mapping */
        if (dev->data->dev_conf.intr_conf.rxq != 0) {
-               intr_vector = dev->data->nb_rx_queues;
+               intr_vector = 1;
                if (rte_intr_efd_enable(intr_handle, intr_vector))
                        return -1;
        }

@@ -4789,31 +4801,27 @@
 static int
 ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
 {
-       uint32_t mask;
+       /* use vector #0 for mailbox interrupts and vector #1 for all RX queues */
+       uint8_t idx = (queue_id < IXGBE_MAX_QUEUE_NUM_PER_VF);
+       uint32_t mask = 1U << idx;
        struct ixgbe_hw *hw =
                IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);

-       mask = IXGBE_READ_REG(hw, IXGBE_VTEIMS);
-       mask |= (1 << IXGBE_MISC_VEC_ID);
-       RTE_SET_USED(queue_id);
        IXGBE_WRITE_REG(hw, IXGBE_VTEIMS, mask);

-       rte_intr_enable(&dev->pci_dev->intr_handle);
-
        return 0;
 }

 static int
 ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
 {
-       uint32_t mask;
+       /* use vector #0 for mailbox interrupts and vector #1 for all RX queues */
+       uint8_t idx = (queue_id < IXGBE_MAX_QUEUE_NUM_PER_VF);
+       uint32_t mask = 1U << idx;
        struct ixgbe_hw *hw =
                IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);

-       mask = IXGBE_READ_REG(hw, IXGBE_VTEIMS);
-       mask &= ~(1 << IXGBE_MISC_VEC_ID);
-       RTE_SET_USED(queue_id);
-       IXGBE_WRITE_REG(hw, IXGBE_VTEIMS, mask);
+       IXGBE_WRITE_REG(hw, IXGBE_VTEIMC, mask);

        return 0;
 }

ixgbevf_configure_msix:
@@ -4948,10 +4956,9 @@
        struct ixgbe_hw *hw =
                IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
        uint32_t q_idx;
-       uint32_t vector_idx = IXGBE_MISC_VEC_ID;

        /* Configure VF other cause ivar */
-       ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
+       ixgbevf_set_ivar_map(hw, -1, 1, IXGBE_MISC_VEC_ID);

        /* won't configure msix register if no mapping is done
         * between intr vector and event fd.
@@ -4961,11 +4968,9 @@

        /* Configure all RX queues of VF */
        for (q_idx = 0; q_idx < dev->data->nb_rx_queues; q_idx++) {
-               /* Force all queue use vector 0,
-                * as IXGBE_VF_MAXMSIVECOTR = 1
-                */
-               ixgbevf_set_ivar_map(hw, 0, q_idx, vector_idx);
-               intr_handle->intr_vec[q_idx] = vector_idx;
+               /* Force all queues to use vector 1 */
+               ixgbevf_set_ivar_map(hw, 0, q_idx, IXGBE_RX_VEC_START);
+               intr_handle->intr_vec[q_idx] = IXGBE_RX_VEC_START;
        }
 }

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] MSI-X vector #1 seems to be stalled sometimes after VF reset (ixgbe)
  2017-01-10  2:42 [dpdk-dev] MSI-X vector #1 seems to be stalled sometimes after VF reset (ixgbe) Ruslan Nikolaev
  2017-01-10  2:47 ` Ruslan Nikolaev
@ 2017-01-10  2:48 ` Ruslan Nikolaev
  2017-01-10 10:39   ` Mcnamara, John
  1 sibling, 1 reply; 4+ messages in thread
From: Ruslan Nikolaev @ 2017-01-10  2:48 UTC (permalink / raw)
  To: dev

dpdk_vfreset.patch:

diff -urN dpdk-16.07/doc/guides/nics/overview.rst dpdk-16.07-new/doc/guides/nics/overview.rst
--- dpdk-16.07/doc/guides/nics/overview.rst	2016-07-28 11:48:41.000000000 -0700
+++ dpdk-16.07-new/doc/guides/nics/overview.rst	2016-12-15 17:32:27.436425563 -0800
@@ -89,6 +89,7 @@
    Speed capabilities
    Link status            Y Y Y   Y Y   Y Y Y     Y   Y Y Y Y         Y Y         Y Y   Y Y Y Y Y
    Link status event      Y Y       Y     Y Y     Y   Y Y             Y Y         Y Y     Y Y
+   Link reset                               Y Y   Y     Y Y
    Queue status event                                                                       Y
    Rx interrupt                     Y     Y Y Y Y Y Y Y Y Y Y Y Y Y Y
    Queue start/stop           Y   Y   Y Y Y Y Y Y     Y Y     Y Y Y Y Y Y               Y Y   Y Y
diff -urN dpdk-16.07/doc/guides/rel_notes/release_16_07.rst dpdk-16.07-new/doc/guides/rel_notes/release_16_07.rst
--- dpdk-16.07/doc/guides/rel_notes/release_16_07.rst	2016-07-28 11:48:41.000000000 -0700
+++ dpdk-16.07-new/doc/guides/rel_notes/release_16_07.rst	2016-12-15 17:32:27.436425563 -0800
@@ -15,6 +15,15 @@
 
       firefox build/doc/html/guides/rel_notes/release_16_07.html
 
+* **Added device reset support for ixgbe VF.**
+
+  Added the device reset API. APP can call this API to reset the VF port
+  when it's not working.
+  Based on the mailbox interruption support, when VF reseives the control
+  message from PF, it means the PF link state changes, VF uses the reset
+  callback in the message handler to notice the APP. APP need call the device
+  reset API to reset the VF port.
+
 
 New Features
 ------------
diff -urN dpdk-16.07/drivers/net/ixgbe/ixgbe_ethdev.c dpdk-16.07-new/drivers/net/ixgbe/ixgbe_ethdev.c
--- dpdk-16.07/drivers/net/ixgbe/ixgbe_ethdev.c	2016-07-28 11:48:41.000000000 -0700
+++ dpdk-16.07-new/drivers/net/ixgbe/ixgbe_ethdev.c	2016-12-15 17:47:51.212425563 -0800
@@ -388,6 +388,10 @@
 static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
 					 struct rte_eth_udp_tunnel *udp_tunnel);
 
+static int ixgbevf_dev_reset(struct rte_eth_dev *dev,
+			     struct rte_eth_reset_state *state,
+			     uint32_t nonce);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -593,6 +597,7 @@
 	.reta_query           = ixgbe_dev_rss_reta_query,
 	.rss_hash_update      = ixgbe_dev_rss_hash_update,
 	.rss_hash_conf_get    = ixgbe_dev_rss_hash_conf_get,
+	.dev_reset            = ixgbevf_dev_reset,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -4157,7 +4162,9 @@
 		ETH_VLAN_EXTEND_MASK;
 	ixgbevf_vlan_offload_set(dev, mask);
 
-	ixgbevf_dev_rxtx_start(dev);
+	err = ixgbevf_dev_rxtx_start(dev);
+	if (err)
+		return err;
 
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0) {
@@ -7305,6 +7312,75 @@
 }
 
 static int
+ixgbevf_dev_reset(struct rte_eth_dev *dev, struct rte_eth_reset_state *state, uint32_t nonce)
+{
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	int diag = 0;
+	uint32_t vteiam;
+	struct timespec ts;
+
+	/* STATES: [0] initial, [2] stopped, and [1] started (reset complete). */
+	if (state->state <= 1)
+	{
+		/* Nothing needs to be done if reset is complete and the nonce is the same. */
+		if (state->state == 1 && state->nonce == nonce)
+			return 0;
+
+		state->nonce = nonce;
+		state->state = 2;
+		/* Performance VF reset. */
+		dev->data->dev_started = 0;
+		ixgbevf_dev_stop(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = ixgbe_dev_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+				     "Failed to update link.");
+		}
+		clock_gettime(CLOCK_MONOTONIC_COARSE, &ts);
+		state->time = ts.tv_sec;
+		return 1;
+	}
+
+	/* Delay of 1s */
+	clock_gettime(CLOCK_MONOTONIC_COARSE, &ts);
+	if (ts.tv_sec - state->time < 1) {
+		return 1;
+	}
+
+	state->state = 0;
+	diag = ixgbevf_dev_start(dev);
+	/* If fail to start the device, need to stop/start it again. */
+	if (diag) {
+		PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
+			     "Failed to start device.");
+		return 1;
+	}
+	dev->data->dev_started = 1;
+	ixgbevf_dev_stats_reset(dev);
+	if (dev->data->dev_conf.intr_conf.lsc == 0)
+		diag = ixgbe_dev_link_update(dev, 0);
+	if (diag) {
+		PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+			     "Failed to update link.");
+	}
+
+	/**
+	 * When the PF link is down, there has chance
+	 * that VF cannot operate its registers. Will
+	 * check if the registers is written
+	 * successfully. If not, repeat stop/start until
+	 * the PF link is up, in other words, until the
+	 * registers can be written.
+	 */
+	vteiam = IXGBE_READ_REG(hw, IXGBE_VTEIAM);
+	/* Reference ixgbevf_intr_enable when checking. */
+	state->state = (vteiam == IXGBE_VF_IRQ_ENABLE_MASK && state->nonce == nonce);
+	/* The state is going to be 1 if successful and 0 -- otherwise. */
+	return !state->state;
+}
+
+static int
 ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev)
 {
 	uint32_t eicr;
diff -urN dpdk-16.07/drivers/net/ixgbe/ixgbe_ethdev.h dpdk-16.07-new/drivers/net/ixgbe/ixgbe_ethdev.h
--- dpdk-16.07/drivers/net/ixgbe/ixgbe_ethdev.h	2016-07-28 11:48:41.000000000 -0700
+++ dpdk-16.07-new/drivers/net/ixgbe/ixgbe_ethdev.h	2016-12-15 17:32:27.436425563 -0800
@@ -377,7 +377,7 @@
 
 void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);
 
-void ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
+int ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
 
 uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
diff -urN dpdk-16.07/drivers/net/ixgbe/ixgbe_rxtx.c dpdk-16.07-new/drivers/net/ixgbe/ixgbe_rxtx.c
--- dpdk-16.07/drivers/net/ixgbe/ixgbe_rxtx.c	2016-07-28 11:48:41.000000000 -0700
+++ dpdk-16.07-new/drivers/net/ixgbe/ixgbe_rxtx.c	2016-12-15 17:32:27.440425563 -0800
@@ -5247,7 +5247,7 @@
 /*
  * [VF] Start Transmit and Receive Units.
  */
-void __attribute__((cold))
+int __attribute__((cold))
 ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw     *hw;
@@ -5283,8 +5283,10 @@
 			rte_delay_ms(1);
 			txdctl = IXGBE_READ_REG(hw, IXGBE_VFTXDCTL(i));
 		} while (--poll_ms && !(txdctl & IXGBE_TXDCTL_ENABLE));
-		if (!poll_ms)
+		if (!poll_ms) {
 			PMD_INIT_LOG(ERR, "Could not enable Tx Queue %d", i);
+			return -1;
+		}
 	}
 	for (i = 0; i < dev->data->nb_rx_queues; i++) {
 
@@ -5300,12 +5302,16 @@
 			rte_delay_ms(1);
 			rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i));
 		} while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
-		if (!poll_ms)
+		if (!poll_ms) {
 			PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
+			return -1;
+		}
 		rte_wmb();
 		IXGBE_WRITE_REG(hw, IXGBE_VFRDT(i), rxq->nb_rx_desc - 1);
 
 	}
+
+	return 0;
 }
 
 /* Stubs needed for linkage when CONFIG_RTE_IXGBE_INC_VECTOR is set to 'n' */
diff -urN dpdk-16.07/lib/librte_ether/rte_ethdev.c dpdk-16.07-new/lib/librte_ether/rte_ethdev.c
--- dpdk-16.07/lib/librte_ether/rte_ethdev.c	2016-07-28 11:48:41.000000000 -0700
+++ dpdk-16.07-new/lib/librte_ether/rte_ethdev.c	2016-12-15 17:32:27.440425563 -0800
@@ -3446,3 +3446,20 @@
 				-ENOTSUP);
 	return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
 }
+
+int
+rte_eth_dev_reset(uint8_t port_id, struct rte_eth_reset_state *state, uint32_t nonce)
+{
+	struct rte_eth_dev *dev;
+	int diag;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
+
+	diag = (*dev->dev_ops->dev_reset)(dev, state, nonce);
+
+	return diag;
+}
diff -urN dpdk-16.07/lib/librte_ether/rte_ethdev.h dpdk-16.07-new/lib/librte_ether/rte_ethdev.h
--- dpdk-16.07/lib/librte_ether/rte_ethdev.h	2016-07-28 11:48:41.000000000 -0700
+++ dpdk-16.07-new/lib/librte_ether/rte_ethdev.h	2016-12-15 17:40:44.936425563 -0800
@@ -980,6 +980,15 @@
 };
 
 /**
+ * A structure to maintain reset state.
+ */
+struct rte_eth_reset_state {
+	time_t time;
+	uint32_t nonce;
+	uint8_t state;
+};
+
+/**
  * RX/TX queue states
  */
 #define RTE_ETH_QUEUE_STATE_STOPPED 0
@@ -1347,6 +1356,11 @@
 	 uint8_t en);
 /**< @internal enable/disable the l2 tunnel offload functions */
 
+typedef int  (*eth_dev_reset_t)(struct rte_eth_dev *dev,
+				struct rte_eth_reset_state *state,
+				uint32_t nonce);
+/**< @internal Function used to reset a configured Ethernet device. */
+
 #ifdef RTE_NIC_BYPASS
 
 enum {
@@ -1537,6 +1551,8 @@
 	eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
 	/** Enable/disable l2 tunnel offload functions */
 	eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+	/** Reset device. */
+	eth_dev_reset_t dev_reset;
 };
 
 /**
@@ -4368,6 +4384,30 @@
 int
 rte_eth_dev_get_name_by_port(uint8_t port_id, char *name);
 
+/**
+ * Reset an ethernet device when it's not working. One scenario is, after PF
+ * port is down and up, the related VF port should be reset.
+ * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
+ * queues, restart the port.
+ * Before calling this API, APP should stop the rx/tx. When tx is being stopped,
+ * APP can drop the packets and release the buffer instead of sending them.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param state
+ *   The reset state (must be initialized to 0 initially).
+ * @param nonce
+ *   A 32-bit number indicating the current reset attempt.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (1) needs to be called again.
+ *   - (-ENODEV) if port identifier is invalid.
+ *   - (-ENOTSUP) if hardware doesn't support this function.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id, struct rte_eth_reset_state *state, uint32_t nonce);
+
 #ifdef __cplusplus
 }
 #endif
diff -urN dpdk-16.07/lib/librte_ether/rte_ether_version.map dpdk-16.07-new/lib/librte_ether/rte_ether_version.map
--- dpdk-16.07/lib/librte_ether/rte_ether_version.map	2016-07-28 11:48:41.000000000 -0700
+++ dpdk-16.07-new/lib/librte_ether/rte_ether_version.map	2016-12-15 17:32:27.440425563 -0800
@@ -138,4 +138,5 @@
 	rte_eth_dev_get_name_by_port;
 	rte_eth_dev_get_port_by_name;
 	rte_eth_xstats_get_names;
+	rte_eth_dev_reset;
 } DPDK_16.04;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] MSI-X vector #1 seems to be stalled sometimes after VF reset (ixgbe)
  2017-01-10  2:48 ` Ruslan Nikolaev
@ 2017-01-10 10:39   ` Mcnamara, John
  0 siblings, 0 replies; 4+ messages in thread
From: Mcnamara, John @ 2017-01-10 10:39 UTC (permalink / raw)
  To: Ruslan Nikolaev, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ruslan Nikolaev
> Sent: Tuesday, January 10, 2017 2:48 AM
> To: dev@dpdk.org
> Subject: Re: [dpdk-dev] MSI-X vector #1 seems to be stalled sometimes
> after VF reset (ixgbe)
> 
> dpdk_vfreset.patch:
> 
> diff -urN dpdk-16.07/doc/guides/nics/overview.rst dpdk-16.07-
> new/doc/guides/nics/overview.rst
> --- dpdk-16.07/doc/guides/nics/overview.rst	2016-07-28
> 11:48:41.000000000 -0700
> +++ dpdk-16.07-new/doc/guides/nics/overview.rst	2016-12-15
> 17:32:27.436425563 -0800
> @@ -89,6 +89,7 @@
>     Speed capabilities
>     Link status            Y Y Y   Y Y   Y Y Y     Y   Y Y Y Y         Y Y
> Y Y   Y Y Y Y Y
>     Link status event      Y Y       Y     Y Y     Y   Y Y             Y Y
> Y Y     Y Y
> +   Link reset                               Y Y   Y     Y Y
>     Queue status event
> Y
>     Rx interrupt                     Y     Y Y Y Y Y Y Y Y Y Y Y Y Y Y
>     Queue start/stop           Y   Y   Y Y Y Y Y Y     Y Y     Y Y Y Y Y Y
> Y Y   Y Y

Hi,

Overview.rst is no longer in this format. 

Capabilities should now be added to the ini files in doc/guides/nics/features.

John

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-01-10 10:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-10  2:42 [dpdk-dev] MSI-X vector #1 seems to be stalled sometimes after VF reset (ixgbe) Ruslan Nikolaev
2017-01-10  2:47 ` Ruslan Nikolaev
2017-01-10  2:48 ` Ruslan Nikolaev
2017-01-10 10:39   ` Mcnamara, John

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).