DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH v2 00/10] igb_uio patches
@ 2014-06-06 23:50 Stephen Hemminger
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 01/10] igb_uio: use kernel standard log message Stephen Hemminger
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

These apply to the current DPDK tree, and are an alternative to
the patches from Alan. It provides indication of IRQ mode via
sysfs attribute.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2 01/10] igb_uio: use kernel standard log message
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
@ 2014-06-06 23:50 ` Stephen Hemminger
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 02/10] igb_uio: use standard uio naming Stephen Hemminger
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

[-- Attachment #1: igb_uio-msg.patch --]
[-- Type: text/plain, Size: 2876 bytes --]

Use Linux kernel standard coding conventions for console messages.
Bare use of printk() is not desirable and is reported as a style
problem by checkpatch. Instead use pr_info() and dev_info()
 to print out log messages where appropriate.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -22,6 +22,8 @@
  *   Intel Corporation
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -314,7 +316,7 @@
 	pci_unlock(pdev);
 spin_unlock:
 	spin_unlock_irqrestore(&udev->lock, flags);
-	printk(KERN_INFO "irq 0x%x %s\n", irq, (ret == IRQ_HANDLED) ? "handled" : "not handled");
+	pr_info("irq 0x%x %s\n", irq, (ret == IRQ_HANDLED) ? "handled" : "not handled");
 
 	return ret;
 }
@@ -477,7 +479,7 @@
 	 * memory
 	 */
 	if (pci_enable_device(dev)) {
-		printk(KERN_ERR "Cannot enable PCI device\n");
+		dev_err(&dev->dev, "Cannot enable PCI device\n");
 		goto fail_free;
 	}
 
@@ -486,7 +488,7 @@
 	 * module
 	 */
 	if (pci_request_regions(dev, "igb_uio")) {
-		printk(KERN_ERR "Cannot request regions\n");
+		dev_err(&dev->dev, "Cannot request regions\n");
 		goto fail_disable;
 	}
 
@@ -499,10 +501,10 @@
 
 	/* set 64-bit DMA mask */
 	if (pci_set_dma_mask(dev,  DMA_BIT_MASK(64))) {
-		printk(KERN_ERR "Cannot set DMA mask\n");
+		dev_err(&dev->dev, "Cannot set DMA mask\n");
 		goto fail_release_iomem;
 	} else if (pci_set_consistent_dma_mask(dev, DMA_BIT_MASK(64))) {
-		printk(KERN_ERR "Cannot set consistent DMA mask\n");
+		dev_err(&dev->dev, "Cannot set consistent DMA mask\n");
 		goto fail_release_iomem;
 	}
 
@@ -533,7 +535,7 @@
 		}
 		else {
 			pci_disable_msix(udev->pdev);
-			printk(KERN_INFO "fail to enable pci msix, or not enough msix entries\n");
+			pr_info("fail to enable pci msix, or not enough msix entries\n");
 		}
 	}
 	switch (udev->mode) {
@@ -585,7 +587,7 @@
 	struct uio_info *info = pci_get_drvdata(dev);
 
 	if (info->priv == NULL) {
-		printk(KERN_DEBUG "Not igbuio device\n");
+		pr_notice("Not igbuio device\n");
 		return;
 	}
 
@@ -605,18 +607,18 @@
 igbuio_config_intr_mode(char *intr_str)
 {
 	if (!intr_str) {
-		printk(KERN_INFO "Use MSIX interrupt by default\n");
+		pr_info("Use MSIX interrupt by default\n");
 		return 0;
 	}
 
 	if (!strcmp(intr_str, "msix")) {
 		igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
-		printk(KERN_INFO "Use MSIX interrupt\n");
+		pr_info("Use MSIX interrupt\n");
 	} else if (!strcmp(intr_str, "legacy")) {
 		igbuio_intr_mode_preferred = IGBUIO_LEGACY_INTR_MODE;
-		printk(KERN_INFO "Use legacy interrupt\n");
+		pr_info("Use legacy interrupt\n");
 	} else {
-		printk(KERN_INFO "Error: bad parameter - %s\n", intr_str);
+		pr_info("Error: bad parameter - %s\n", intr_str);
 		return -EINVAL;
 	}
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2 02/10] igb_uio: use standard uio naming
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 01/10] igb_uio: use kernel standard log message Stephen Hemminger
@ 2014-06-06 23:50 ` Stephen Hemminger
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 03/10] igb_uio: fix checkpatch warnings Stephen Hemminger
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

[-- Attachment #1: igb_uio-name.patch --]
[-- Type: text/plain, Size: 577 bytes --]

Don't put capitialization and space in name since it will show
up in /proc/interrupts. Instead use driver name to follow the
conventions used in the kernel by other drivers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>


--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -509,7 +509,7 @@
 	}
 
 	/* fill uio infos */
-	udev->info.name = "Intel IGB UIO";
+	udev->info.name = "igb_uio";
 	udev->info.version = "0.1";
 	udev->info.handler = igbuio_pci_irqhandler;
 	udev->info.irqcontrol = igbuio_pci_irqcontrol;

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2 03/10] igb_uio: fix checkpatch warnings
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 01/10] igb_uio: use kernel standard log message Stephen Hemminger
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 02/10] igb_uio: use standard uio naming Stephen Hemminger
@ 2014-06-06 23:50 ` Stephen Hemminger
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 04/10] igb_uio: dont wrap pci_num_vf function needlessly Stephen Hemminger
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

[-- Attachment #1: igb_uio-checkpatch.diff --]
[-- Type: text/plain, Size: 4475 bytes --]

Fix whitespace and other style issues reported by checkpatch.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>


--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -1,23 +1,23 @@
 /*-
  * GPL LICENSE SUMMARY
- * 
+ *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- * 
+ *
  *   This program is free software; you can redistribute it and/or modify
  *   it under the terms of version 2 of the GNU General Public License as
  *   published by the Free Software Foundation.
- * 
+ *
  *   This program is distributed in the hope that it will be useful, but
  *   WITHOUT ANY WARRANTY; without even the implied warranty of
  *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  *   General Public License for more details.
- * 
+ *
  *   You should have received a copy of the GNU General Public License
  *   along with this program; if not, write to the Free Software
  *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
  *   The full GNU General Public License is included in this distribution
  *   in the file called LICENSE.GPL.
- * 
+ *
  *   Contact Information:
  *   Intel Corporation
  */
@@ -32,7 +32,7 @@
 #include <linux/msi.h>
 #include <linux/version.h>
 
-#ifdef CONFIG_XEN_DOM0 
+#ifdef CONFIG_XEN_DOM0
 #include <xen/xen.h>
 #endif
 
@@ -71,7 +71,7 @@
 		msix_entries[IGBUIO_NUM_MSI_VECTORS]; /* pointer to the msix vectors to be allocated later */
 };
 
-static char *intr_mode = NULL;
+static char *intr_mode;
 static enum igbuio_intr_mode igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
 
 /* PCI device id table */
@@ -109,11 +109,11 @@
 		u16 total;
 		u16 initial;
 		u16 nr_virtfn;
-	} *iov = (struct iov*)dev->sriov;
+	} *iov = (struct iov *)dev->sriov;
 
 	if (!dev->is_physfn)
 		return 0;
-	
+
 	return iov->nr_virtfn;
 #else
 	return pci_num_vf(dev);
@@ -146,13 +146,13 @@
 	else /* do nothing if change max_vfs number */
 		err = -EINVAL;
 
-	return err ? err : count;							
+	return err ? err : count;
 }
 
 static DEVICE_ATTR(max_vfs, S_IRUGO | S_IWUSR, show_max_vfs, store_max_vfs);
 static struct attribute *dev_attrs[] = {
 	&dev_attr_max_vfs.attr,
-        NULL,
+	NULL,
 };
 
 static const struct attribute_group dev_attr_grp = {
@@ -338,7 +338,7 @@
 }
 
 /**
- * This is uio device mmap method which will use igbuio mmap for Xen 
+ * This is uio device mmap method which will use igbuio mmap for Xen
  * Dom0 enviroment.
  */
 static int
@@ -346,9 +346,9 @@
 {
 	int idx;
 
-	if (vma->vm_pgoff >= MAX_UIO_MAPS) 
+	if (vma->vm_pgoff >= MAX_UIO_MAPS)
 		return -EINVAL;
-	if(info->mem[vma->vm_pgoff].size == 0)
+	if (info->mem[vma->vm_pgoff].size == 0)
 		return  -EINVAL;
 
 	idx = (int)vma->vm_pgoff;
@@ -360,7 +360,7 @@
 	default:
 		return -EINVAL;
 	}
-}       
+}
 #endif
 
 /* Remap pci resources described by bar #pci_bar in uio resource n. */
@@ -371,8 +371,8 @@
 	unsigned long addr, len;
 	void *internal_addr;
 
-	if (sizeof(info->mem) / sizeof (info->mem[0]) <= n)  
-		return (EINVAL);
+	if (sizeof(info->mem) / sizeof(info->mem[0]) <= n)
+		return -EINVAL;
 
 	addr = pci_resource_start(dev, pci_bar);
 	len = pci_resource_len(dev, pci_bar);
@@ -396,20 +396,20 @@
 {
 	unsigned long addr, len;
 
-	if (sizeof(info->port) / sizeof (info->port[0]) <= n)  
-		return (EINVAL);
+	if (sizeof(info->port) / sizeof(info->port[0]) <= n)
+		return -EINVAL;
 
 	addr = pci_resource_start(dev, pci_bar);
 	len = pci_resource_len(dev, pci_bar);
 	if (addr == 0 || len == 0)
-		return (-1);
+		return -1;
 
 	info->port[n].name = name;
 	info->port[n].start = addr;
 	info->port[n].size = len;
 	info->port[n].porttype = UIO_PORT_X86;
 
-	return (0);
+	return 0;
 }
 
 /* Unmap previously ioremap'd resources */
@@ -445,14 +445,18 @@
 				pci_resource_start(dev, i) != 0) {
 			flags = pci_resource_flags(dev, i);
 			if (flags & IORESOURCE_MEM) {
-				if ((ret = igbuio_pci_setup_iomem(dev, info,
-						iom, i, bar_names[i])) != 0)
-					return (ret);
+				ret = igbuio_pci_setup_iomem(dev, info,
+							     iom, i,
+							     bar_names[i]);
+				if (ret != 0)
+					return ret;
 				iom++;
 			} else if (flags & IORESOURCE_IO) {
-				if ((ret = igbuio_pci_setup_ioport(dev, info,
-						iop, i, bar_names[i])) != 0)
-					return (ret);
+				ret = igbuio_pci_setup_ioport(dev, info,
+							      iop, i,
+							      bar_names[i]);
+				if (ret != 0)
+					return ret;
 				iop++;
 			}
 		}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2 04/10] igb_uio: dont wrap pci_num_vf function needlessly
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
                   ` (2 preceding siblings ...)
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 03/10] igb_uio: fix checkpatch warnings Stephen Hemminger
@ 2014-06-06 23:50 ` Stephen Hemminger
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 05/10] Subjec: igb_uio: msix cleanups Stephen Hemminger
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

[-- Attachment #1: igb_uio-better-pci-num-vf.patch --]
[-- Type: text/plain, Size: 1211 bytes --]

It is better style to just use the pci_num_vf directly, rather
than wrapping it with a local (but globally named) function with
the same effect.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>


--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -98,9 +98,8 @@
 }
 
 /* sriov sysfs */
-int local_pci_num_vf(struct pci_dev *dev)
-{
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,34)
+static int pci_num_vf(struct pci_dev *dev)
 	struct iov {
 		int pos;
 		int nres;
@@ -115,17 +114,15 @@
 		return 0;
 
 	return iov->nr_virtfn;
-#else
-	return pci_num_vf(dev);
-#endif
 }
+#endif
 
 static ssize_t
 show_max_vfs(struct device *dev, struct device_attribute *attr,
 	     char *buf)
 {
-	return snprintf(buf, 10, "%u\n", local_pci_num_vf(
-				container_of(dev, struct pci_dev, dev)));
+	return snprintf(buf, 10, "%u\n",
+			pci_num_vf(container_of(dev, struct pci_dev, dev)));
 }
 
 static ssize_t
@@ -141,7 +138,7 @@
 
 	if (0 == max_vfs)
 		pci_disable_sriov(pdev);
-	else if (0 == local_pci_num_vf(pdev))
+	else if (0 == pci_num_vf(pdev))
 		err = pci_enable_sriov(pdev, max_vfs);
 	else /* do nothing if change max_vfs number */
 		err = -EINVAL;

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2 05/10] Subjec: igb_uio: msix cleanups
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
                   ` (3 preceding siblings ...)
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 04/10] igb_uio: dont wrap pci_num_vf function needlessly Stephen Hemminger
@ 2014-06-06 23:50 ` Stephen Hemminger
  2014-07-18 11:05   ` Thomas Monjalon
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 06/10] igb_uio: propogate error numbers in probe code Stephen Hemminger
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

[-- Attachment #1: igb_uio-msix-vector.patch --]
[-- Type: text/plain, Size: 2212 bytes --]

Since only one MSI-X entry is ever defined, there is no need to
put it as an array in the driver private data structure. One msix_entry
can just be put on the stack and initialized there.

Also remove the unused backport defines related to MSI-X.
I suspect this code was just inherited from some other project and
never cleaned up.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>


--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -36,21 +36,6 @@
 #include <xen/xen.h>
 #endif
 
-/**
- * MSI-X related macros, copy from linux/pci_regs.h in kernel 2.6.39,
- * but none of them in kernel 2.6.35.
- */
-#ifndef PCI_MSIX_ENTRY_SIZE
-#define PCI_MSIX_ENTRY_SIZE             16
-#define PCI_MSIX_ENTRY_LOWER_ADDR       0
-#define PCI_MSIX_ENTRY_UPPER_ADDR       4
-#define PCI_MSIX_ENTRY_DATA             8
-#define PCI_MSIX_ENTRY_VECTOR_CTRL      12
-#define PCI_MSIX_ENTRY_CTRL_MASKBIT     1
-#endif
-
-#define IGBUIO_NUM_MSI_VECTORS 1
-
 /* interrupt mode */
 enum igbuio_intr_mode {
 	IGBUIO_LEGACY_INTR_MODE = 0,
@@ -67,8 +52,6 @@
 	struct pci_dev *pdev;
 	spinlock_t lock; /* spinlock for accessing PCI config space or msix data in multi tasks/isr */
 	enum igbuio_intr_mode mode;
-	struct msix_entry \
-		msix_entries[IGBUIO_NUM_MSI_VECTORS]; /* pointer to the msix vectors to be allocated later */
 };
 
 static char *intr_mode;
@@ -526,17 +509,16 @@
 
 	/* check if it need to try msix first */
 	if (igbuio_intr_mode_preferred == IGBUIO_MSIX_INTR_MODE) {
-		int vector;
-
-		for (vector = 0; vector < IGBUIO_NUM_MSI_VECTORS; vector ++)
-			udev->msix_entries[vector].entry = vector;
+		/* only one MSIX vector needed */
+		struct msix_entry msix_entry = {
+			.entry = 0,
+		};
 
-		if (pci_enable_msix(udev->pdev, udev->msix_entries, IGBUIO_NUM_MSI_VECTORS) == 0) {
+		if (pci_enable_msix(udev->pdev, &msix_entry, 1) == 0) {
 			udev->mode = IGBUIO_MSIX_INTR_MODE;
-		}
-		else {
-			pci_disable_msix(udev->pdev);
-			pr_info("fail to enable pci msix, or not enough msix entries\n");
+		} else {
+			pr_err("failed to enable pci msix, or not enough msix entries\n");
+			udev->mode = IGBUIO_LEGACY_INTR_MODE;
 		}
 	}
 	switch (udev->mode) {

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2 06/10] igb_uio: propogate error numbers in probe code
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
                   ` (4 preceding siblings ...)
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 05/10] Subjec: igb_uio: msix cleanups Stephen Hemminger
@ 2014-06-06 23:50 ` Stephen Hemminger
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 07/10] igb_uio: make irq mode param read-only Stephen Hemminger
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

[-- Attachment #1: igb_uio-error-handling.patch --]
[-- Type: text/plain, Size: 2774 bytes --]

It is good practice to propogate the return values of failing
functions so that more information can be reported. The failed result
of probe will make it out to errno and get printed by modprobe
and will aid in diagnosis of failures.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>


--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -442,7 +442,7 @@
 		}
 	}
 
-	return ((iom != 0) ? ret : ENOENT);
+	return (iom != 0) ? ret : -ENOENT;
 }
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(3,8,0)
@@ -453,6 +453,7 @@
 igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 {
 	struct rte_uio_pci_dev *udev;
+	int err;
 
 	udev = kzalloc(sizeof(struct rte_uio_pci_dev), GFP_KERNEL);
 	if (!udev)
@@ -462,7 +463,8 @@
 	 * enable device: ask low-level code to enable I/O and
 	 * memory
 	 */
-	if (pci_enable_device(dev)) {
+	err = pci_enable_device(dev);
+	if (err != 0) {
 		dev_err(&dev->dev, "Cannot enable PCI device\n");
 		goto fail_free;
 	}
@@ -471,7 +473,8 @@
 	 * reserve device's PCI memory regions for use by this
 	 * module
 	 */
-	if (pci_request_regions(dev, "igb_uio")) {
+	err = pci_request_regions(dev, "igb_uio");
+	if (err != 0) {
 		dev_err(&dev->dev, "Cannot request regions\n");
 		goto fail_disable;
 	}
@@ -480,14 +483,19 @@
 	pci_set_master(dev);
 
 	/* remap IO memory */
-	if (igbuio_setup_bars(dev, &udev->info))
+	err = igbuio_setup_bars(dev, &udev->info);
+	if (err != 0)
 		goto fail_release_iomem;
 
 	/* set 64-bit DMA mask */
-	if (pci_set_dma_mask(dev,  DMA_BIT_MASK(64))) {
+	err = pci_set_dma_mask(dev,  DMA_BIT_MASK(64));
+	if (err != 0) {
 		dev_err(&dev->dev, "Cannot set DMA mask\n");
 		goto fail_release_iomem;
-	} else if (pci_set_consistent_dma_mask(dev, DMA_BIT_MASK(64))) {
+	}
+
+	err = pci_set_consistent_dma_mask(dev, DMA_BIT_MASK(64));
+	if (err != 0) {
 		dev_err(&dev->dev, "Cannot set consistent DMA mask\n");
 		goto fail_release_iomem;
 	}
@@ -536,12 +544,14 @@
 		break;
 	}
 
-	if (sysfs_create_group(&dev->dev.kobj, &dev_attr_grp))
+	err = sysfs_create_group(&dev->dev.kobj, &dev_attr_grp);
+	if (err != 0)
 		goto fail_release_iomem;
 
 	/* register uio driver */
-	if (uio_register_device(&dev->dev, &udev->info))
-		goto fail_release_iomem;
+	err = uio_register_device(&dev->dev, &udev->info);
+	if (err != 0)
+		goto fail_remove_group;
 
 	pci_set_drvdata(dev, udev);
 
@@ -550,8 +560,9 @@
 
 	return 0;
 
-fail_release_iomem:
+fail_remove_group:
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
+fail_release_iomem:
 	igbuio_pci_release_iomem(&udev->info);
 	if (udev->mode == IGBUIO_MSIX_INTR_MODE)
 		pci_disable_msix(udev->pdev);
@@ -561,7 +572,7 @@
 fail_free:
 	kfree(udev);
 
-	return -ENODEV;
+	return err;
 }
 
 static void

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2 07/10] igb_uio: make irq mode param read-only
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
                   ` (5 preceding siblings ...)
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 06/10] igb_uio: propogate error numbers in probe code Stephen Hemminger
@ 2014-06-06 23:50 ` Stephen Hemminger
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 08/10] igb_uio: fix IRQ mode handling Stephen Hemminger
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

[-- Attachment #1: igb_uio-mode-param-ro.patch --]
[-- Type: text/plain, Size: 553 bytes --]

The module parameter is read-only since changing mode after loading
isn't going to work.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -647,7 +647,7 @@
 module_init(igbuio_pci_init_module);
 module_exit(igbuio_pci_exit_module);
 
-module_param(intr_mode, charp, S_IRUGO | S_IWUSR);
+module_param(intr_mode, charp, S_IRUGO);
 MODULE_PARM_DESC(intr_mode,
 "igb_uio interrupt mode (default=msix):\n"
 "    msix       Use MSIX interrupt\n"

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2 08/10] igb_uio: fix IRQ mode handling
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
                   ` (6 preceding siblings ...)
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 07/10] igb_uio: make irq mode param read-only Stephen Hemminger
@ 2014-06-06 23:50 ` Stephen Hemminger
  2014-07-18 11:49   ` Thomas Monjalon
  2014-07-18 12:41   ` Thomas Monjalon
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs Stephen Hemminger
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

[-- Attachment #1: igb_uio-irq-mgmt.patch --]
[-- Type: text/plain, Size: 9639 bytes --]

This pach reworks how IRQ mode handling is done.

The biggest code change is to use the standard INTX management
code that exists in more recent kernels (and provide backport version).
This also fixes the pci_lock code which was broken, since it was
not protecting against config access, and was doing trylock.

Make this driver behave like other Linux drivers.
Start at MSI-X and degrade to less desireable modes
automatically if the desired type is not available.

This patch also makes MSI mode work, previously the mode
was there but it would never work.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -40,8 +40,7 @@
 enum igbuio_intr_mode {
 	IGBUIO_LEGACY_INTR_MODE = 0,
 	IGBUIO_MSI_INTR_MODE,
-	IGBUIO_MSIX_INTR_MODE,
-	IGBUIO_INTR_MODE_MAX
+	IGBUIO_MSIX_INTR_MODE
 };
 
 /**
@@ -50,7 +49,6 @@
 struct rte_uio_pci_dev {
 	struct uio_info info;
 	struct pci_dev *pdev;
-	spinlock_t lock; /* spinlock for accessing PCI config space or msix data in multi tasks/isr */
 	enum igbuio_intr_mode mode;
 };
 
@@ -139,36 +137,67 @@
 	.attrs = dev_attrs,
 };
 
-static inline int
-pci_lock(struct pci_dev * pdev)
-{
-	/* Some function names changes between 3.2.0 and 3.3.0... */
+
 #if LINUX_VERSION_CODE < KERNEL_VERSION(3,3,0)
-	pci_block_user_cfg_access(pdev);
-	return 1;
-#else
-	return pci_cfg_access_trylock(pdev);
-#endif
+/* Check if INTX works to control irq's.
+ * Set's INTX_DISABLE flag and reads it back
+ */
+static bool pci_intx_mask_supported(struct pci_dev *dev)
+{
+	bool mask_supported = false;
+	uint16_t orig, new
+
+	pci_block_user_cfg_access(dev);
+	pci_read_config_word(pdev, PCI_COMMAND, &orig);
+	pci_write_config_word(dev, PCI_COMMAND,
+			      orig ^ PCI_COMMAND_INTX_DISABLE);
+	pci_read_config_word(dev, PCI_COMMAND, &new);
+
+	if ((new ^ orig) & ~PCI_COMMAND_INTX_DISABLE) {
+		dev_err(&dev->dev, "Command register changed from "
+			"0x%x to 0x%x: driver or hardware bug?\n", orig, new);
+	} else if ((new ^ orig) & PCI_COMMAND_INTX_DISABLE) {
+		mask_supported = true;
+		pci_write_config_word(dev, PCI_COMMAND, orig);
+	}
+	pci_unblock_user_cfg_access(dev);
 }
 
-static inline void
-pci_unlock(struct pci_dev * pdev)
+static bool pci_check_and_mask_intx(struct pci_dev *pdev)
 {
-	/* Some function names changes between 3.2.0 and 3.3.0... */
-#if LINUX_VERSION_CODE < KERNEL_VERSION(3,3,0)
-	pci_unblock_user_cfg_access(pdev);
-#else
-	pci_cfg_access_unlock(pdev);
-#endif
+	bool pending;
+	uint32_t status;
+
+	pci_block_user_cfg_access(dev);
+	pci_read_config_dword(pdev, PCI_COMMAND, &status);
+
+	/* interrupt is not ours, goes to out */
+	pending = (((status >> 16) & PCI_STATUS_INTERRUPT) != 0);
+	if (pending) {
+		uint16_t old, new;
+
+		old = status;
+		if (state != 0)
+			new = old & (~PCI_COMMAND_INTX_DISABLE);
+		else
+			new = old | PCI_COMMAND_INTX_DISABLE;
+
+		if (old != new)
+			pci_write_config_word(pdev, PCI_COMMAND, new);
+	}
+	pci_unblock_user_cfg_access(dev);
+
+	return pending;
 }
+#endif
 
-/**
+/*
  * It masks the msix on/off of generating MSI-X messages.
  */
-static int
+static void
 igbuio_msix_mask_irq(struct msi_desc *desc, int32_t state)
 {
-	uint32_t mask_bits = desc->masked;
+	u32 mask_bits = desc->masked;
 	unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
 						PCI_MSIX_ENTRY_VECTOR_CTRL;
 
@@ -182,49 +211,24 @@
 		readl(desc->mask_base);
 		desc->masked = mask_bits;
 	}
-
-	return 0;
 }
 
-/**
- * This function sets/clears the masks for generating LSC interrupts.
- *
- * @param info
- *   The pointer to struct uio_info.
- * @param on
- *   The on/off flag of masking LSC.
- * @return
- *   -On success, zero value.
- *   -On failure, a negative value.
- */
-static int
-igbuio_set_interrupt_mask(struct rte_uio_pci_dev *udev, int32_t state)
+static void
+igbuio_msi_mask_irq(struct irq_data *data, u32 enable)
 {
-	struct pci_dev *pdev = udev->pdev;
-
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE) {
-		struct msi_desc *desc;
+	struct msi_desc *desc = irq_data_get_msi(data);
+	u32 mask_bits = desc->masked;
+	unsigned offset = data->irq - desc->dev->irq;
+	u32 mask = 1 << offset;
+	u32 flag = enable << offset;
 
-		list_for_each_entry(desc, &pdev->msi_list, list) {
-			igbuio_msix_mask_irq(desc, state);
-		}
-	}
-	else if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
-		uint32_t status;
-		uint16_t old, new;
+	mask_bits &= ~mask;
+	mask_bits |= flag;
 
-		pci_read_config_dword(pdev, PCI_COMMAND, &status);
-		old = status;
-		if (state != 0)
-			new = old & (~PCI_COMMAND_INTX_DISABLE);
-		else
-			new = old | PCI_COMMAND_INTX_DISABLE;
-
-		if (old != new)
-			pci_write_config_word(pdev, PCI_COMMAND, new);
+	if (desc->msi_attrib.maskbit && mask_bits != desc->masked) {
+		pci_write_config_dword(desc->dev, desc->mask_pos, mask_bits);
+		desc->masked = mask_bits;
 	}
-
-	return 0;
 }
 
 /**
@@ -243,20 +247,23 @@
 static int
 igbuio_pci_irqcontrol(struct uio_info *info, s32 irq_state)
 {
-	unsigned long flags;
 	struct rte_uio_pci_dev *udev = igbuio_get_uio_pci_dev(info);
 	struct pci_dev *pdev = udev->pdev;
 
-	spin_lock_irqsave(&udev->lock, flags);
-	if (!pci_lock(pdev)) {
-		spin_unlock_irqrestore(&udev->lock, flags);
-		return -1;
-	}
+	pci_cfg_access_lock(pdev);
+	if (udev->mode == IGBUIO_LEGACY_INTR_MODE)
+		pci_intx(pdev, !!irq_state);
+	else if (udev->mode == IGBUIO_MSI_INTR_MODE) {
+		struct irq_data *data = irq_get_irq_data(pdev->irq);
 
-	igbuio_set_interrupt_mask(udev, irq_state);
+		igbuio_msi_mask_irq(data, !!irq_state);
+	} else if (udev->mode == IGBUIO_MSIX_INTR_MODE) {
+		struct msi_desc *desc;
 
-	pci_unlock(pdev);
-	spin_unlock_irqrestore(&udev->lock, flags);
+		list_for_each_entry(desc, &pdev->msi_list, list)
+			igbuio_msix_mask_irq(desc, irq_state);
+	}
+	pci_cfg_access_unlock(pdev);
 
 	return 0;
 }
@@ -268,37 +275,15 @@
 static irqreturn_t
 igbuio_pci_irqhandler(int irq, struct uio_info *info)
 {
-	irqreturn_t ret = IRQ_NONE;
-	unsigned long flags;
 	struct rte_uio_pci_dev *udev = igbuio_get_uio_pci_dev(info);
-	struct pci_dev *pdev = udev->pdev;
-	uint32_t cmd_status_dword;
-	uint16_t status;
 
-	spin_lock_irqsave(&udev->lock, flags);
-	/* block userspace PCI config reads/writes */
-	if (!pci_lock(pdev))
-		goto spin_unlock;
-
-	/* for legacy mode, interrupt maybe shared */
-	if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
-		pci_read_config_dword(pdev, PCI_COMMAND, &cmd_status_dword);
-		status = cmd_status_dword >> 16;
-		/* interrupt is not ours, goes to out */
-		if (!(status & PCI_STATUS_INTERRUPT))
-			goto done;
-	}
-
-	igbuio_set_interrupt_mask(udev, 0);
-	ret = IRQ_HANDLED;
-done:
-	/* unblock userspace PCI config reads/writes */
-	pci_unlock(pdev);
-spin_unlock:
-	spin_unlock_irqrestore(&udev->lock, flags);
-	pr_info("irq 0x%x %s\n", irq, (ret == IRQ_HANDLED) ? "handled" : "not handled");
+	/* Legacy mode need to mask in hardware */
+	if (udev->mode == IGBUIO_LEGACY_INTR_MODE &&
+	    !pci_check_and_mask_intx(udev->pdev))
+		return IRQ_NONE;
 
-	return ret;
+	/* Message signal mode, no share IRQ and automasked */
+	return IRQ_HANDLED;
 }
 
 #ifdef CONFIG_XEN_DOM0
@@ -454,6 +439,7 @@
 {
 	struct rte_uio_pci_dev *udev;
 	int err;
+	struct msix_entry msix_entry;
 
 	udev = kzalloc(sizeof(struct rte_uio_pci_dev), GFP_KERNEL);
 	if (!udev)
@@ -505,6 +491,7 @@
 	udev->info.version = "0.1";
 	udev->info.handler = igbuio_pci_irqhandler;
 	udev->info.irqcontrol = igbuio_pci_irqcontrol;
+	udev->info.irq = dev->irq;
 #ifdef CONFIG_XEN_DOM0
 	/* check if the driver run on Xen Dom0 */
 	if (xen_initial_domain())
@@ -512,36 +499,36 @@
 #endif
 	udev->info.priv = udev;
 	udev->pdev = dev;
-	udev->mode = 0; /* set the default value for interrupt mode */
-	spin_lock_init(&udev->lock);
 
-	/* check if it need to try msix first */
-	if (igbuio_intr_mode_preferred == IGBUIO_MSIX_INTR_MODE) {
-		/* only one MSIX vector needed */
-		struct msix_entry msix_entry = {
-			.entry = 0,
-		};
-
-		if (pci_enable_msix(udev->pdev, &msix_entry, 1) == 0) {
+	switch (igbuio_intr_mode_preferred) {
+	case IGBUIO_MSIX_INTR_MODE:
+		/* Only 1 msi-x vector needed */
+		msix_entry.entry = 0;
+		if (pci_enable_msix(dev, &msix_entry, 1) == 0) {
+			dev_dbg(&dev->dev, "using MSI-X");
+			udev->info.irq = msix_entry.vector;
 			udev->mode = IGBUIO_MSIX_INTR_MODE;
-		} else {
-			pr_err("failed to enable pci msix, or not enough msix entries\n");
-			udev->mode = IGBUIO_LEGACY_INTR_MODE;
+			break;
 		}
-	}
-	switch (udev->mode) {
-	case IGBUIO_MSIX_INTR_MODE:
-		udev->info.irq_flags = 0;
-		udev->info.irq = udev->msix_entries[0].vector;
-		break;
+		/* fall back to MSI */
 	case IGBUIO_MSI_INTR_MODE:
-		break;
+		if (pci_enable_msi(dev) == 0) {
+			dev_dbg(&dev->dev, "using MSI");
+			udev->info.irq = dev->irq;
+			udev->mode = IGBUIO_MSI_INTR_MODE;
+			break;
+		}
+		/* fall back to INTX */
 	case IGBUIO_LEGACY_INTR_MODE:
-		udev->info.irq_flags = IRQF_SHARED;
-		udev->info.irq = dev->irq;
-		break;
-	default:
-		break;
+		if (pci_intx_mask_supported(dev)) {
+			dev_dbg(&dev->dev, "using INTX");
+			udev->info.irq_flags = IRQF_SHARED;
+			udev->mode = IGBUIO_LEGACY_INTR_MODE;
+		} else {
+			dev_err(&dev->dev, "PCI INTX mask not supported\n");
+			err = -EIO;
+			goto fail_release_iomem;
+		}
 	}
 
 	err = sysfs_create_group(&dev->dev.kobj, &dev_attr_grp);
@@ -566,6 +553,8 @@
 	igbuio_pci_release_iomem(&udev->info);
 	if (udev->mode == IGBUIO_MSIX_INTR_MODE)
 		pci_disable_msix(udev->pdev);
+	if (udev->mode == IGBUIO_MSI_INTR_MODE)
+		pci_disable_msi(udev->pdev);
 	pci_release_regions(dev);
 fail_disable:
 	pci_disable_device(dev);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
                   ` (7 preceding siblings ...)
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 08/10] igb_uio: fix IRQ mode handling Stephen Hemminger
@ 2014-06-06 23:50 ` Stephen Hemminger
  2014-06-08 15:37   ` Neil Horman
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 10/10] igbuio: use mode string for module param Stephen Hemminger
  2014-06-13 16:14 ` [dpdk-dev] [PATCH v2 00/10] igb_uio patches Thomas Monjalon
  10 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

[-- Attachment #1: igbuio-irq-mode-sysfs.patch --]
[-- Type: text/plain, Size: 1013 bytes --]

Since irq mode is determined dynamically on a per-device
basis, and virtio needs to know if using intx or msi-x,
make it a sysfs attribute.

--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -43,6 +43,10 @@
 	IGBUIO_MSIX_INTR_MODE
 };
 
+static const char *igbuio_intr_modes[] = {
+	"legacy", "msi", "msix"
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -128,8 +132,20 @@
 }
 
 static DEVICE_ATTR(max_vfs, S_IRUGO | S_IWUSR, show_max_vfs, store_max_vfs);
+
+static ssize_t irq_mode_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct uio_info *info = pci_get_drvdata(to_pci_dev(dev));
+	struct rte_uio_pci_dev *udev = igbuio_get_uio_pci_dev(info);
+
+	return sprintf(buf, "%s\n", igbuio_intr_modes[udev->mode]);
+}
+static DEVICE_ATTR(irq_mode, S_IRUGO, irq_mode_show, NULL);
+
 static struct attribute *dev_attrs[] = {
 	&dev_attr_max_vfs.attr,
+	&dev_attr_irq_mode.attr,
 	NULL,
 };
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2 10/10] igbuio: use mode string for module param
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
                   ` (8 preceding siblings ...)
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs Stephen Hemminger
@ 2014-06-06 23:50 ` Stephen Hemminger
  2014-06-13 16:14 ` [dpdk-dev] [PATCH v2 00/10] igb_uio patches Thomas Monjalon
  10 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-06 23:50 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

[-- Attachment #1: igb_uio-mode-param-table.patch --]
[-- Type: text/plain, Size: 1395 bytes --]

Since we now have a list of interrupt modes for sysfs,
use that to match module param values. This also allows msi
to be selected as a preferred mode.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -605,21 +605,24 @@
 static int
 igbuio_config_intr_mode(char *intr_str)
 {
+	unsigned int i;
+
 	if (!intr_str) {
 		pr_info("Use MSIX interrupt by default\n");
 		return 0;
 	}
 
-	if (!strcmp(intr_str, "msix")) {
-		igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
-		pr_info("Use MSIX interrupt\n");
-	} else if (!strcmp(intr_str, "legacy")) {
-		igbuio_intr_mode_preferred = IGBUIO_LEGACY_INTR_MODE;
-		pr_info("Use legacy interrupt\n");
-	} else {
-		pr_info("Error: bad parameter - %s\n", intr_str);
-		return -EINVAL;
-	}
+	for (i = 0; i < ARRAY_SIZE(igbuio_intr_modes); i++) {
+		const char *mode = igbuio_intr_modes[i];
+		if (!strcmp(intr_str, mode)) {
+			pr_info("Use %s interrupt mode\n", mode);
+			igbuio_intr_mode_preferred = i;
+			return 0;
+		}
+ 	}
+
+	pr_warn("bad interrupt mode parameter %s\n", intr_str);
+	return -EINVAL;
 
 	return 0;
 }
@@ -656,6 +659,7 @@
 MODULE_PARM_DESC(intr_mode,
 "igb_uio interrupt mode (default=msix):\n"
 "    msix       Use MSIX interrupt\n"
+"    msi	Use MSI interrupt\n"
 "    legacy     Use Legacy interrupt\n"
 "\n");
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs Stephen Hemminger
@ 2014-06-08 15:37   ` Neil Horman
  2014-06-11 18:27     ` Carew, Alan
  0 siblings, 1 reply; 27+ messages in thread
From: Neil Horman @ 2014-06-08 15:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On Fri, Jun 06, 2014 at 04:50:37PM -0700, Stephen Hemminger wrote:
> Since irq mode is determined dynamically on a per-device
> basis, and virtio needs to know if using intx or msi-x,
> make it a sysfs attribute.
> 
> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> @@ -43,6 +43,10 @@
>  	IGBUIO_MSIX_INTR_MODE
>  };
>  
> +static const char *igbuio_intr_modes[] = {
> +	"legacy", "msi", "msix"
> +};
> +
>  /**
>   * A structure describing the private information for a uio device.
>   */
> @@ -128,8 +132,20 @@
>  }
>  
>  static DEVICE_ATTR(max_vfs, S_IRUGO | S_IWUSR, show_max_vfs, store_max_vfs);
> +
> +static ssize_t irq_mode_show(struct device *dev,
> +			     struct device_attribute *attr, char *buf)
> +{
> +	struct uio_info *info = pci_get_drvdata(to_pci_dev(dev));
> +	struct rte_uio_pci_dev *udev = igbuio_get_uio_pci_dev(info);
> +
> +	return sprintf(buf, "%s\n", igbuio_intr_modes[udev->mode]);
> +}
> +static DEVICE_ATTR(irq_mode, S_IRUGO, irq_mode_show, NULL);
> +
>  static struct attribute *dev_attrs[] = {
>  	&dev_attr_max_vfs.attr,
> +	&dev_attr_irq_mode.attr,
>  	NULL,
>  };
>  
> 
> 
Do you really need this attribute?  The pci bus sysfs tree already exports irq
mode information for each device allocated already in
/sys/bus/pci/device/msi_irqs<irq_number>

Neil

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs
  2014-06-08 15:37   ` Neil Horman
@ 2014-06-11 18:27     ` Carew, Alan
  2014-06-11 20:08       ` Stephen Hemminger
  2014-06-13  0:28       ` Neil Horman
  0 siblings, 2 replies; 27+ messages in thread
From: Carew, Alan @ 2014-06-11 18:27 UTC (permalink / raw)
  To: Neil Horman, Stephen Hemminger; +Cc: dev

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Sunday, June 08, 2014 4:37 PM
> To: Stephen Hemminger
> Cc: Carew, Alan; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs
> 
> On Fri, Jun 06, 2014 at 04:50:37PM -0700, Stephen Hemminger wrote:
> > Since irq mode is determined dynamically on a per-device
> > basis, and virtio needs to know if using intx or msi-x,
> > make it a sysfs attribute.
> >
> > --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> > +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> > @@ -43,6 +43,10 @@
> >  	IGBUIO_MSIX_INTR_MODE
> >  };
> >
> > +static const char *igbuio_intr_modes[] = {
> > +	"legacy", "msi", "msix"
> > +};
> > +
> >  /**
> >   * A structure describing the private information for a uio device.
> >   */
> > @@ -128,8 +132,20 @@
> >  }
> >
> >  static DEVICE_ATTR(max_vfs, S_IRUGO | S_IWUSR, show_max_vfs,
> store_max_vfs);
> > +
> > +static ssize_t irq_mode_show(struct device *dev,
> > +			     struct device_attribute *attr, char *buf)
> > +{
> > +	struct uio_info *info = pci_get_drvdata(to_pci_dev(dev));
> > +	struct rte_uio_pci_dev *udev = igbuio_get_uio_pci_dev(info);
> > +
> > +	return sprintf(buf, "%s\n", igbuio_intr_modes[udev->mode]);
> > +}
> > +static DEVICE_ATTR(irq_mode, S_IRUGO, irq_mode_show, NULL);
> > +
> >  static struct attribute *dev_attrs[] = {
> >  	&dev_attr_max_vfs.attr,
> > +	&dev_attr_irq_mode.attr,
> >  	NULL,
> >  };
> >
> >
> >
> Do you really need this attribute?  The pci bus sysfs tree already exports irq
> mode information for each device allocated already in
> /sys/bus/pci/device/msi_irqs<irq_number>
> 
> Neil

Hi Neil,

After some digging around, I now realise what you mean, it is a good suggestion.
/sys/bus/pci/devices/<D:B:D.F>/msi_irqs/<IRQ>/mode, where "msi_irqs" directory only exists for msi/msix and not for int-x. Also the "mode" file contains string of configured interrupt mode.

Thanks,
Alan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs
  2014-06-11 18:27     ` Carew, Alan
@ 2014-06-11 20:08       ` Stephen Hemminger
  2014-06-16  8:03         ` Carew, Alan
  2014-06-13  0:28       ` Neil Horman
  1 sibling, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-11 20:08 UTC (permalink / raw)
  To: Carew, Alan; +Cc: dev

This is what I am testing, along with 10 other virtio patches.

Subject: virtio: check for using msix interrupt

Fix how the device driver detects MSI-X vs INTX mode.
Look in sysfs to find if MSI-X is enabled.

Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>


--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -709,6 +709,28 @@
 	return 0;
 }
 
+/*
+ * Detect if using INTX or MSI-X by looking for:
+ *  /sys/bus/pci/devices/<D:B:D.F>/msi_irqs/
+ * if directory exists, must be using msi-x
+ */
+static int
+has_msix(const struct rte_pci_addr *loc)
+{
+	DIR *d;
+	char dirname[PATH_MAX];
+
+	rte_snprintf(dirname, sizeof(dirname),
+		     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/msi_irqs",
+		     loc->domain, loc->bus, loc->devid, loc->function);
+
+	d = opendir(dirname);
+	if (d)
+		closedir(d);
+
+	return (d != NULL);
+}
+
 static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen)
 {
 	unsigned int uio_num;
@@ -872,6 +894,8 @@
 		PMD_INIT_LOG(DEBUG,
 			     "PCI Port IO found start=0x%lx with size=0x%lx\n",
 			     start, size);
+
+		hw->use_msix = has_msix(&pci_dev->addr);
 	}
 #endif
 	hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
--- a/lib/librte_pmd_virtio/virtio_pci.h
+++ b/lib/librte_pmd_virtio/virtio_pci.h
@@ -177,6 +177,7 @@
 	uint16_t    subsystem_device_id;
 	uint16_t    subsystem_vendor_id;
 	uint8_t     revision_id;
+	uint8_t	    use_msix;
 	uint8_t     mac_addr[ETHER_ADDR_LEN];
 	int         adapter_stopped;
 };
@@ -194,13 +195,11 @@
 	uint16_t   max_virtqueue_pairs;
 } __attribute__((packed));
 
-/* Value indicated in device config */
-#define VIRTIO_PCI_FLAG_MSIX  0x0020
 /*
  * The remaining space is defined by each driver as the per-driver
  * configuration space.
  */
-#define VIRTIO_PCI_CONFIG(hw) (((hw)->guest_features & VIRTIO_PCI_FLAG_MSIX) ? 24 : 20)
+#define VIRTIO_PCI_CONFIG(hw) (((hw)->use_msix) ? 24 : 20)
 
 /*
  * How many bits to shift physical queue address written to QUEUE_PFN.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs
  2014-06-11 18:27     ` Carew, Alan
  2014-06-11 20:08       ` Stephen Hemminger
@ 2014-06-13  0:28       ` Neil Horman
  1 sibling, 0 replies; 27+ messages in thread
From: Neil Horman @ 2014-06-13  0:28 UTC (permalink / raw)
  To: Carew, Alan; +Cc: dev

On Wed, Jun 11, 2014 at 06:27:35PM +0000, Carew, Alan wrote:
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Sunday, June 08, 2014 4:37 PM
> > To: Stephen Hemminger
> > Cc: Carew, Alan; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs
> > 
> > On Fri, Jun 06, 2014 at 04:50:37PM -0700, Stephen Hemminger wrote:
> > > Since irq mode is determined dynamically on a per-device
> > > basis, and virtio needs to know if using intx or msi-x,
> > > make it a sysfs attribute.
> > >
> > > --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> > > +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> > > @@ -43,6 +43,10 @@
> > >  	IGBUIO_MSIX_INTR_MODE
> > >  };
> > >
> > > +static const char *igbuio_intr_modes[] = {
> > > +	"legacy", "msi", "msix"
> > > +};
> > > +
> > >  /**
> > >   * A structure describing the private information for a uio device.
> > >   */
> > > @@ -128,8 +132,20 @@
> > >  }
> > >
> > >  static DEVICE_ATTR(max_vfs, S_IRUGO | S_IWUSR, show_max_vfs,
> > store_max_vfs);
> > > +
> > > +static ssize_t irq_mode_show(struct device *dev,
> > > +			     struct device_attribute *attr, char *buf)
> > > +{
> > > +	struct uio_info *info = pci_get_drvdata(to_pci_dev(dev));
> > > +	struct rte_uio_pci_dev *udev = igbuio_get_uio_pci_dev(info);
> > > +
> > > +	return sprintf(buf, "%s\n", igbuio_intr_modes[udev->mode]);
> > > +}
> > > +static DEVICE_ATTR(irq_mode, S_IRUGO, irq_mode_show, NULL);
> > > +
> > >  static struct attribute *dev_attrs[] = {
> > >  	&dev_attr_max_vfs.attr,
> > > +	&dev_attr_irq_mode.attr,
> > >  	NULL,
> > >  };
> > >
> > >
> > >
> > Do you really need this attribute?  The pci bus sysfs tree already exports irq
> > mode information for each device allocated already in
> > /sys/bus/pci/device/msi_irqs<irq_number>
> > 
> > Neil
> 
> Hi Neil,
> 
> After some digging around, I now realise what you mean, it is a good suggestion.
thanks.

> /sys/bus/pci/devices/<D:B:D.F>/msi_irqs/<IRQ>/mode, where "msi_irqs" directory only exists for msi/msix and not for int-x. Also the "mode" file contains string of configured interrupt mode.
Yes, thats correct (although the file layout is slightly different in later
kernels, as the mode is contained in a file that is named after the irq vector).
For int-x interrupts you can query the /sys/bus/pci/devices/<D:B:D.F/irq file.

Best
Neil

> 
> Thanks,
> Alan
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2 00/10] igb_uio patches
  2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
                   ` (9 preceding siblings ...)
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 10/10] igbuio: use mode string for module param Stephen Hemminger
@ 2014-06-13 16:14 ` Thomas Monjalon
  2014-06-13 17:24   ` Stephen Hemminger
  10 siblings, 1 reply; 27+ messages in thread
From: Thomas Monjalon @ 2014-06-13 16:14 UTC (permalink / raw)
  To: Alan Carew; +Cc: dev

Alan, Stephen,

2014-06-06 16:50, Stephen Hemminger:
> These apply to the current DPDK tree, and are an alternative to
> the patches from Alan. It provides indication of IRQ mode via
> sysfs attribute.

I need clear conclusion to these patch series.
Alan, do you acknowledge Stephen's serie?
If yes, could you self-nack your own serie?

Thanks
-- 
Thomas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2 00/10] igb_uio patches
  2014-06-13 16:14 ` [dpdk-dev] [PATCH v2 00/10] igb_uio patches Thomas Monjalon
@ 2014-06-13 17:24   ` Stephen Hemminger
  2014-06-13 17:51     ` [dpdk-dev] [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF Chris Wright
  0 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2014-06-13 17:24 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Fri, 13 Jun 2014 18:14:28 +0200
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> Alan, Stephen,
> 
> 2014-06-06 16:50, Stephen Hemminger:
> > These apply to the current DPDK tree, and are an alternative to
> > the patches from Alan. It provides indication of IRQ mode via
> > sysfs attribute.
> 
> I need clear conclusion to these patch series.
> Alan, do you acknowledge Stephen's serie?
> If yes, could you self-nack your own serie?
> 
> Thanks

I was about to resend a new set without the irq mode sysfs file
since it seems that is not needed.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
  2014-06-13 17:24   ` Stephen Hemminger
@ 2014-06-13 17:51     ` Chris Wright
  2014-06-13 18:02       ` Richardson, Bruce
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Wright @ 2014-06-13 17:51 UTC (permalink / raw)
  To: Richardson, Bruce, Stephen Hemminger; +Cc: dev

To keep from confusing users, cap max VFs at 7, despite PCI SR-IOV config
space showing a max of 8.  This reserves a queue pair for the PF.

This issue was cited here:

 http://dpdk.org/ml/archives/dev/2014-April/001832.html

Cc: Bruce Richardson <bruce.richardson@intel.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Chris Wright <chrisw@redhat.com>
---

This is what Linux kernel driver does.  I have only
compile tested it.  Stephen sending to you and Bruce
in case you want to Ack and add to your current queue.

 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 6fa7396..d5db97a 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -139,6 +139,12 @@ store_max_vfs(struct device *dev, struct device_attribute *attr,
 	if (0 != strict_strtoul(buf, 0, &max_vfs))
 		return -EINVAL;
 
+	/* reserve a queue pair for PF */
+	if (max_vfs > 7) {
+		dev_warn(dev, "Maxixum of 7 VFs per PF, using max\n");
+		max_vfs = 7;
+	}
+
 	if (0 == max_vfs)
 		pci_disable_sriov(pdev);
 	else if (0 == local_pci_num_vf(pdev))

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
  2014-06-13 17:51     ` [dpdk-dev] [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF Chris Wright
@ 2014-06-13 18:02       ` Richardson, Bruce
  2014-06-13 18:14         ` Chris Wright
  0 siblings, 1 reply; 27+ messages in thread
From: Richardson, Bruce @ 2014-06-13 18:02 UTC (permalink / raw)
  To: Chris Wright, Stephen Hemminger; +Cc: dev



> -----Original Message-----
> From: Chris Wright [mailto:chrisw@redhat.com]
> Sent: Friday, June 13, 2014 10:52 AM
> To: Richardson, Bruce; Stephen Hemminger
> Cc: Thomas Monjalon; dev@dpdk.org
> Subject: [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
> 
> To keep from confusing users, cap max VFs at 7, despite PCI SR-IOV config
> space showing a max of 8.  This reserves a queue pair for the PF.
> 
> This issue was cited here:
> 
>  http://dpdk.org/ml/archives/dev/2014-April/001832.html
> 
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Chris Wright <chrisw@redhat.com>
> ---
> 
> This is what Linux kernel driver does.  I have only
> compile tested it.  Stephen sending to you and Bruce
> in case you want to Ack and add to your current queue.
> 

Sorry, NAK - at least for this implementation.

Hardcoding this to 7 is a bad idea, as the actual max number of VFs supported will depend on the actual hardware used. For someone using an 82599, they can have up to 64 VFs, or 63+PF, so limiting so 7 in that case is a major reduction in capability. What might work there is querying the max number of VFs and limiting to max - 1.
However, even with that, I would suggest that any limit should be possible to override. It's entirely possible that someone max actually want to reserve the full number of VFs, either because they don't want to use the NIC on the host at all, or because they are happy to use a VF on the host instead. Module parameter to allow override might work - and information on it could be added to the error message when we limit the VFs inside the driver.

/Bruce

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
  2014-06-13 18:02       ` Richardson, Bruce
@ 2014-06-13 18:14         ` Chris Wright
  2014-06-13 19:22           ` Richardson, Bruce
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Wright @ 2014-06-13 18:14 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev

* Richardson, Bruce (bruce.richardson@intel.com) wrote:
> 
> 
> > -----Original Message-----
> > From: Chris Wright [mailto:chrisw@redhat.com]
> > Sent: Friday, June 13, 2014 10:52 AM
> > To: Richardson, Bruce; Stephen Hemminger
> > Cc: Thomas Monjalon; dev@dpdk.org
> > Subject: [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
> > 
> > To keep from confusing users, cap max VFs at 7, despite PCI SR-IOV config
> > space showing a max of 8.  This reserves a queue pair for the PF.
> > 
> > This issue was cited here:
> > 
> >  http://dpdk.org/ml/archives/dev/2014-April/001832.html
> > 
> > Cc: Bruce Richardson <bruce.richardson@intel.com>
> > Cc: Stephen Hemminger <stephen@networkplumber.org>
> > Signed-off-by: Chris Wright <chrisw@redhat.com>
> > ---
> > 
> > This is what Linux kernel driver does.  I have only
> > compile tested it.  Stephen sending to you and Bruce
> > in case you want to Ack and add to your current queue.
> > 
> 
> Sorry, NAK - at least for this implementation.

Oh, that's fine.

> Hardcoding this to 7 is a bad idea, as the actual max number of VFs supported will depend on the actual hardware used. For someone using an 82599, they can have up to 64 VFs, or 63+PF, so limiting so 7 in that case is a major reduction in capability. What might work there is querying the max number of VFs and limiting to max - 1.

But this is igb_uio, not 82599 (ixgbe).

> However, even with that, I would suggest that any limit should be possible to override. It's entirely possible that someone max actually want to reserve the full number of VFs, either because they don't want to use the NIC on the host at all, or because they are happy to use a VF on the host instead. Module parameter to allow override might work - and information on it could be added to the error message when we limit the VFs inside the driver.

It's been a while since I've looked at this, but my recollection is
the PF must be there (basic mailbox handling, for example).

Would you rather a simple warning message as a hint?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
  2014-06-13 18:14         ` Chris Wright
@ 2014-06-13 19:22           ` Richardson, Bruce
  2014-06-13 19:48             ` Chris Wright
  2014-06-16 14:13             ` Ananyev, Konstantin
  0 siblings, 2 replies; 27+ messages in thread
From: Richardson, Bruce @ 2014-06-13 19:22 UTC (permalink / raw)
  To: Chris Wright; +Cc: dev

> -----Original Message-----
> From: Chris Wright [mailto:chrisw@redhat.com]
> Sent: Friday, June 13, 2014 11:14 AM
> To: Richardson, Bruce
> Cc: Chris Wright; Stephen Hemminger; Thomas Monjalon; dev@dpdk.org
> Subject: Re: [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
> 
> * Richardson, Bruce (bruce.richardson@intel.com) wrote:
> >
> >
> > > -----Original Message-----
> > > From: Chris Wright [mailto:chrisw@redhat.com]
> > > Sent: Friday, June 13, 2014 10:52 AM
> > > To: Richardson, Bruce; Stephen Hemminger
> > > Cc: Thomas Monjalon; dev@dpdk.org
> > > Subject: [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
> > >
> > > To keep from confusing users, cap max VFs at 7, despite PCI SR-IOV config
> > > space showing a max of 8.  This reserves a queue pair for the PF.
> > >
> > > This issue was cited here:
> > >
> > >  http://dpdk.org/ml/archives/dev/2014-April/001832.html
> > >
> > > Cc: Bruce Richardson <bruce.richardson@intel.com>
> > > Cc: Stephen Hemminger <stephen@networkplumber.org>
> > > Signed-off-by: Chris Wright <chrisw@redhat.com>
> > > ---
> > >
> > > This is what Linux kernel driver does.  I have only
> > > compile tested it.  Stephen sending to you and Bruce
> > > in case you want to Ack and add to your current queue.
> > >
> >
> > Sorry, NAK - at least for this implementation.
> 
> Oh, that's fine.
> 
> > Hardcoding this to 7 is a bad idea, as the actual max number of VFs supported
> will depend on the actual hardware used. For someone using an 82599, they can
> have up to 64 VFs, or 63+PF, so limiting so 7 in that case is a major reduction in
> capability. What might work there is querying the max number of VFs and
> limiting to max - 1.
> 
> But this is igb_uio, not 82599 (ixgbe).

igb_uio is used as the supporting kernel module for both the e1000/igb and ixgbe pmd implementations (as well as for the forthcoming i40e pmd). Despite the name, it's not just for igb-based NICs.

> 
> > However, even with that, I would suggest that any limit should be possible to
> override. It's entirely possible that someone max actually want to reserve the
> full number of VFs, either because they don't want to use the NIC on the host at
> all, or because they are happy to use a VF on the host instead. Module
> parameter to allow override might work - and information on it could be added
> to the error message when we limit the VFs inside the driver.
> 
> It's been a while since I've looked at this, but my recollection is
> the PF must be there (basic mailbox handling, for example).
> 
> Would you rather a simple warning message as a hint?

I'm not sure about the PF still needing to be there or not - I'm not an expert in that area, so you may indeed be right. 
However, as for this patch, I'd probably be ok for now with a version that queried the max_vfs and limited based on that. If in future we do need to add an override it should be trivial to add later-on.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
  2014-06-13 19:22           ` Richardson, Bruce
@ 2014-06-13 19:48             ` Chris Wright
  2014-06-16 14:13             ` Ananyev, Konstantin
  1 sibling, 0 replies; 27+ messages in thread
From: Chris Wright @ 2014-06-13 19:48 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev

* Richardson, Bruce (bruce.richardson@intel.com) wrote:
> > -----Original Message-----
> > From: Chris Wright [mailto:chrisw@redhat.com]
> > Sent: Friday, June 13, 2014 11:14 AM
> > To: Richardson, Bruce
> > Cc: Chris Wright; Stephen Hemminger; Thomas Monjalon; dev@dpdk.org
> > Subject: Re: [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
> > 
> > * Richardson, Bruce (bruce.richardson@intel.com) wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Chris Wright [mailto:chrisw@redhat.com]
> > > > Sent: Friday, June 13, 2014 10:52 AM
> > > > To: Richardson, Bruce; Stephen Hemminger
> > > > Cc: Thomas Monjalon; dev@dpdk.org
> > > > Subject: [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
> > > >
> > > > To keep from confusing users, cap max VFs at 7, despite PCI SR-IOV config
> > > > space showing a max of 8.  This reserves a queue pair for the PF.
> > > >
> > > > This issue was cited here:
> > > >
> > > >  http://dpdk.org/ml/archives/dev/2014-April/001832.html
> > > >
> > > > Cc: Bruce Richardson <bruce.richardson@intel.com>
> > > > Cc: Stephen Hemminger <stephen@networkplumber.org>
> > > > Signed-off-by: Chris Wright <chrisw@redhat.com>
> > > > ---
> > > >
> > > > This is what Linux kernel driver does.  I have only
> > > > compile tested it.  Stephen sending to you and Bruce
> > > > in case you want to Ack and add to your current queue.
> > > >
> > >
> > > Sorry, NAK - at least for this implementation.
> > 
> > Oh, that's fine.
> > 
> > > Hardcoding this to 7 is a bad idea, as the actual max number of VFs supported
> > will depend on the actual hardware used. For someone using an 82599, they can
> > have up to 64 VFs, or 63+PF, so limiting so 7 in that case is a major reduction in
> > capability. What might work there is querying the max number of VFs and
> > limiting to max - 1.
> > 
> > But this is igb_uio, not 82599 (ixgbe).
> 
> igb_uio is used as the supporting kernel module for both the e1000/igb and ixgbe pmd implementations (as well as for the forthcoming i40e pmd). Despite the name, it's not just for igb-based NICs.

Oh, right, sorry, was looking at pmd side for each driver.

> > > However, even with that, I would suggest that any limit should be possible to
> > override. It's entirely possible that someone max actually want to reserve the
> > full number of VFs, either because they don't want to use the NIC on the host at
> > all, or because they are happy to use a VF on the host instead. Module
> > parameter to allow override might work - and information on it could be added
> > to the error message when we limit the VFs inside the driver.
> > 
> > It's been a while since I've looked at this, but my recollection is
> > the PF must be there (basic mailbox handling, for example).
> > 
> > Would you rather a simple warning message as a hint?
> 
> I'm not sure about the PF still needing to be there or not - I'm not an expert in that area, so you may indeed be right. 
> However, as for this patch, I'd probably be ok for now with a version that queried the max_vfs and limited based on that. If in future we do need to add an override it should be trivial to add later-on.

I'll look at that idea.

thanks,
-chris

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs
  2014-06-11 20:08       ` Stephen Hemminger
@ 2014-06-16  8:03         ` Carew, Alan
  0 siblings, 0 replies; 27+ messages in thread
From: Carew, Alan @ 2014-06-16  8:03 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, June 11, 2014 9:08 PM
> To: Carew, Alan
> Cc: Neil Horman; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs
> 
> This is what I am testing, along with 10 other virtio patches.
> 
> Subject: virtio: check for using msix interrupt
> 
> Fix how the device driver detects MSI-X vs INTX mode.
> Look in sysfs to find if MSI-X is enabled.
> 
> Suggested-by: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> 
> --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> @@ -709,6 +709,28 @@
>  	return 0;
>  }
> 
> +/*
> + * Detect if using INTX or MSI-X by looking for:
> + *  /sys/bus/pci/devices/<D:B:D.F>/msi_irqs/
> + * if directory exists, must be using msi-x
> + */
> +static int
> +has_msix(const struct rte_pci_addr *loc)
> +{
> +	DIR *d;
> +	char dirname[PATH_MAX];
> +
> +	rte_snprintf(dirname, sizeof(dirname),
> +		     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/msi_irqs",
> +		     loc->domain, loc->bus, loc->devid, loc->function);
> +
> +	d = opendir(dirname);
> +	if (d)
> +		closedir(d);
> +
> +	return (d != NULL);
> +}
> +
>  static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen)
>  {
>  	unsigned int uio_num;
> @@ -872,6 +894,8 @@
>  		PMD_INIT_LOG(DEBUG,
>  			     "PCI Port IO found start=0x%lx with size=0x%lx\n",
>  			     start, size);
> +
> +		hw->use_msix = has_msix(&pci_dev->addr);
>  	}
>  #endif
>  	hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
> --- a/lib/librte_pmd_virtio/virtio_pci.h
> +++ b/lib/librte_pmd_virtio/virtio_pci.h
> @@ -177,6 +177,7 @@
>  	uint16_t    subsystem_device_id;
>  	uint16_t    subsystem_vendor_id;
>  	uint8_t     revision_id;
> +	uint8_t	    use_msix;
>  	uint8_t     mac_addr[ETHER_ADDR_LEN];
>  	int         adapter_stopped;
>  };
> @@ -194,13 +195,11 @@
>  	uint16_t   max_virtqueue_pairs;
>  } __attribute__((packed));
> 
> -/* Value indicated in device config */
> -#define VIRTIO_PCI_FLAG_MSIX  0x0020
>  /*
>   * The remaining space is defined by each driver as the per-driver
>   * configuration space.
>   */
> -#define VIRTIO_PCI_CONFIG(hw) (((hw)->guest_features &
> VIRTIO_PCI_FLAG_MSIX) ? 24 : 20)
> +#define VIRTIO_PCI_CONFIG(hw) (((hw)->use_msix) ? 24 : 20)
> 
>  /*
>   * How many bits to shift physical queue address written to QUEUE_PFN.

Hi Stephen,

The mechanism is fine, however I would be against OS specific code in abstracted drivers. If "has_msix" was a helper function in eal (and suitably renamed) we could then add a FreeBSD equivalent implementation. A less favourable solution would be conditional compilation(RTE_EXEC_ENV_LINUXAPP/BSDPAPP) around the above function.

Thanks,
Alan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF
  2014-06-13 19:22           ` Richardson, Bruce
  2014-06-13 19:48             ` Chris Wright
@ 2014-06-16 14:13             ` Ananyev, Konstantin
  1 sibling, 0 replies; 27+ messages in thread
From: Ananyev, Konstantin @ 2014-06-16 14:13 UTC (permalink / raw)
  To: Richardson, Bruce, Chris Wright; +Cc: dev


Hi Bruce,

>> > However, even with that, I would suggest that any limit should be possible to
>> override. It's entirely possible that someone max actually want to reserve the
>> full number of VFs, either because they don't want to use the NIC on the host at
>> all, or because they are happy to use a VF on the host instead. Module
>> parameter to allow override might work - and information on it could be added
>> to the error message when we limit the VFs inside the driver.
>> 
>> It's been a while since I've looked at this, but my recollection is
>> the PF must be there (basic mailbox handling, for example).
>> 
>> Would you rather a simple warning message as a hint?

>I'm not sure about the PF still needing to be there or not - I'm not an expert in that area, so you may indeed be right. 
>However, as for this patch, I'd probably be ok for now with a version that queried the max_vfs and limited based on that. If in >future we do need to add an override it should be trivial to add later-on.

I don't think it is a right way to put all this logic into the kernel module:
igb_uio doesn't know how many queues user-space PF plans to use for itself.
In your example: 16 queues in total, 7 VFsx2queues = 14 queues, 2 queues left to PF.
But then user-space PF decides it needs 4 queues and we we would hit the same problem again. 

I think, that to fix that issue properly we need to do that in userspace PMD.
After rte_eth_dev_init(), we should know how many queues that device has in total, and how many of them are reserved for VFs.
So at rte_eth_dev_configure() we can add a check that number of requested queues doesn't exceed max_queues - vf_reserved_queues and make rte_eth_dev_configure() to fail, if it does.  

Konstantin
  

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2 05/10] Subjec: igb_uio: msix cleanups
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 05/10] Subjec: igb_uio: msix cleanups Stephen Hemminger
@ 2014-07-18 11:05   ` Thomas Monjalon
  0 siblings, 0 replies; 27+ messages in thread
From: Thomas Monjalon @ 2014-07-18 11:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi Stephen,

2014-06-06 16:50, Stephen Hemminger:
> Since only one MSI-X entry is ever defined, there is no need to
> put it as an array in the driver private data structure. One msix_entry
> can just be put on the stack and initialized there.

When merging this patch, I realized it's not complete:
an occurence of the msix_entries array is remaining.
See the regarding part of your patch and my proposal below
to be merged in this patch.

> @@ -67,8 +52,6 @@
>  	struct pci_dev *pdev;
>  	spinlock_t lock; /* spinlock for accessing PCI config space or msix data
> in multi tasks/isr */ enum igbuio_intr_mode mode;
> -	struct msix_entry \
> -		msix_entries[IGBUIO_NUM_MSI_VECTORS]; /* pointer to the msix vectors to
> be allocated later */ };
> 
>  static char *intr_mode;
> @@ -526,17 +509,16 @@
> 
>  	/* check if it need to try msix first */
>  	if (igbuio_intr_mode_preferred == IGBUIO_MSIX_INTR_MODE) {
> -		int vector;
> -
> -		for (vector = 0; vector < IGBUIO_NUM_MSI_VECTORS; vector ++)
> -			udev->msix_entries[vector].entry = vector;
> +		/* only one MSIX vector needed */
> +		struct msix_entry msix_entry = {
> +			.entry = 0,
> +		};
> 
> -		if (pci_enable_msix(udev->pdev, udev->msix_entries, IGBUIO_NUM_MSI_VECTORS) == 0) {
> +		if (pci_enable_msix(udev->pdev, &msix_entry, 1) == 0) {
>  			udev->mode = IGBUIO_MSIX_INTR_MODE;
> -		}
> -		else {
> -			pci_disable_msix(udev->pdev);
> -			pr_info("fail to enable pci msix, or not enough msix entries\n");
> +		} else {
> +			pr_err("failed to enable pci msix, or not enough msix entries\n");
> +			udev->mode = IGBUIO_LEGACY_INTR_MODE;
>  		}
>  	}
>  	switch (udev->mode) {


Proposed changes:
- udev->info.irq need to be set with msix_entry
- udev->mode is already the legacy one by default


 		if (pci_enable_msix(udev->pdev, &msix_entry, 1) == 0) {
 			udev->mode = RTE_INTR_MODE_MSIX;
-		} else {
+			udev->info.irq = msix_entry.vector;
+			udev->info.irq_flags = 0;
+		} else
 			pr_err("failed to enable pci msix, or not enough msix entries\n");
-			udev->mode = IGBUIO_LEGACY_INTR_MODE;
-		}
 	}
-	switch (udev->mode) {
-	case RTE_INTR_MODE_MSIX:
-		udev->info.irq_flags = 0;
-		udev->info.irq = udev->msix_entries[0].vector;
-		break;
-	case RTE_INTR_MODE_MSI:
-		break;
-	case RTE_INTR_MODE_LEGACY:
+	if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		udev->info.irq_flags = IRQF_SHARED;
 		udev->info.irq = dev->irq;
-		break;
-	default:
-		break;
 	}


Please confirm it's ok for you.
-- 
Thomas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/10] igb_uio: fix IRQ mode handling
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 08/10] igb_uio: fix IRQ mode handling Stephen Hemminger
@ 2014-07-18 11:49   ` Thomas Monjalon
  2014-07-18 12:41   ` Thomas Monjalon
  1 sibling, 0 replies; 27+ messages in thread
From: Thomas Monjalon @ 2014-07-18 11:49 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi Stephen,

I cannot merge this part because some lines were removed from the context.
I think I just have to add them but I would like confirmation.
See below.

> @@ -512,36 +499,36 @@
>  #endif
>  	udev->info.priv = udev;
>  	udev->pdev = dev;
> -	udev->mode = 0; /* set the default value for interrupt mode */
> -	spin_lock_init(&udev->lock);
>  
> -	/* check if it need to try msix first */
> -	if (igbuio_intr_mode_preferred == IGBUIO_MSIX_INTR_MODE) {
> -		/* only one MSIX vector needed */
> -		struct msix_entry msix_entry = {
> -			.entry = 0,
> -		};
> -
> -		if (pci_enable_msix(udev->pdev, &msix_entry, 1) == 0) {
> +	switch (igbuio_intr_mode_preferred) {
> +	case IGBUIO_MSIX_INTR_MODE:
> +		/* Only 1 msi-x vector needed */
> +		msix_entry.entry = 0;
> +		if (pci_enable_msix(dev, &msix_entry, 1) == 0) {
> +			dev_dbg(&dev->dev, "using MSI-X");
> +			udev->info.irq = msix_entry.vector;
>  			udev->mode = IGBUIO_MSIX_INTR_MODE;
> -		} else {
> -			pr_err("failed to enable pci msix, or not enough msix entries\n");
> -			udev->mode = IGBUIO_LEGACY_INTR_MODE;
> +			break;
>  		}
> -	}
> -	switch (udev->mode) {
> -	case IGBUIO_MSIX_INTR_MODE:
> -		udev->info.irq_flags = 0;
> -		udev->info.irq = udev->msix_entries[0].vector;
> -		break;
> +		/* fall back to MSI */
>  	case IGBUIO_MSI_INTR_MODE:
> -		break;
> +		if (pci_enable_msi(dev) == 0) {
> +			dev_dbg(&dev->dev, "using MSI");
> +			udev->info.irq = dev->irq;
> +			udev->mode = IGBUIO_MSI_INTR_MODE;
> +			break;
> +		}
> +		/* fall back to INTX */
>  	case IGBUIO_LEGACY_INTR_MODE:
> -		udev->info.irq_flags = IRQF_SHARED;
> -		udev->info.irq = dev->irq;
> -		break;
> -	default:
> -		break;
> +		if (pci_intx_mask_supported(dev)) {
> +			dev_dbg(&dev->dev, "using INTX");
> +			udev->info.irq_flags = IRQF_SHARED;
> +			udev->mode = IGBUIO_LEGACY_INTR_MODE;
> +		} else {
> +			dev_err(&dev->dev, "PCI INTX mask not supported\n");
> +			err = -EIO;
> +			goto fail_release_iomem;
> +		}
>  	}

There is a problem here. These 2 lines are missing:

	pci_set_drvdata(dev, udev);
	igbuio_pci_irqcontrol(&udev->info, 0);

>  	err = sysfs_create_group(&dev->dev.kobj, &dev_attr_grp);

-- 
Thomas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/10] igb_uio: fix IRQ mode handling
  2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 08/10] igb_uio: fix IRQ mode handling Stephen Hemminger
  2014-07-18 11:49   ` Thomas Monjalon
@ 2014-07-18 12:41   ` Thomas Monjalon
  1 sibling, 0 replies; 27+ messages in thread
From: Thomas Monjalon @ 2014-07-18 12:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi Stephen,

I have other (inlined) comments on this patch.

>  	udev->info.version = "0.1";
>  	udev->info.handler = igbuio_pci_irqhandler;
>  	udev->info.irqcontrol = igbuio_pci_irqcontrol;
> +	udev->info.irq = dev->irq;

[...]

> +		/* fall back to MSI */
>  	case IGBUIO_MSI_INTR_MODE:
> -		break;
> +		if (pci_enable_msi(dev) == 0) {
> +			dev_dbg(&dev->dev, "using MSI");
> +			udev->info.irq = dev->irq;

I think we can remove this line: info.irq is already set to the right value.

> +			udev->mode = IGBUIO_MSI_INTR_MODE;
> +			break;
> +		}

There is no default case in this switch statement. It's now required for the 
enum completeness. So I suggest to add these lines:

+	default:
+		dev_err(&dev->dev, "unknown interrupt mode\n");
+		err = -EINVAL;
+		goto fail_release_iomem;

-- 
Thomas

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-07-18 12:40 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-06 23:50 [dpdk-dev] [PATCH v2 00/10] igb_uio patches Stephen Hemminger
2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 01/10] igb_uio: use kernel standard log message Stephen Hemminger
2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 02/10] igb_uio: use standard uio naming Stephen Hemminger
2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 03/10] igb_uio: fix checkpatch warnings Stephen Hemminger
2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 04/10] igb_uio: dont wrap pci_num_vf function needlessly Stephen Hemminger
2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 05/10] Subjec: igb_uio: msix cleanups Stephen Hemminger
2014-07-18 11:05   ` Thomas Monjalon
2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 06/10] igb_uio: propogate error numbers in probe code Stephen Hemminger
2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 07/10] igb_uio: make irq mode param read-only Stephen Hemminger
2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 08/10] igb_uio: fix IRQ mode handling Stephen Hemminger
2014-07-18 11:49   ` Thomas Monjalon
2014-07-18 12:41   ` Thomas Monjalon
2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 09/10] igbuio: show irq mode in sysfs Stephen Hemminger
2014-06-08 15:37   ` Neil Horman
2014-06-11 18:27     ` Carew, Alan
2014-06-11 20:08       ` Stephen Hemminger
2014-06-16  8:03         ` Carew, Alan
2014-06-13  0:28       ` Neil Horman
2014-06-06 23:50 ` [dpdk-dev] [PATCH v2 10/10] igbuio: use mode string for module param Stephen Hemminger
2014-06-13 16:14 ` [dpdk-dev] [PATCH v2 00/10] igb_uio patches Thomas Monjalon
2014-06-13 17:24   ` Stephen Hemminger
2014-06-13 17:51     ` [dpdk-dev] [PATCH] igb_uio: cap max VFs at 7 to reserve one for PF Chris Wright
2014-06-13 18:02       ` Richardson, Bruce
2014-06-13 18:14         ` Chris Wright
2014-06-13 19:22           ` Richardson, Bruce
2014-06-13 19:48             ` Chris Wright
2014-06-16 14:13             ` Ananyev, Konstantin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).