From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: dev@dpdk.org
Subject: [dpdk-dev] [PATCH v2 4/4] doc: add librte_pmd_mlx4 documentation
Date: Sat, 21 Feb 2015 05:17:07 +0100 [thread overview]
Message-ID: <1424492227-27229-5-git-send-email-adrien.mazarguil@6wind.com> (raw)
In-Reply-To: <1422544846-10697-1-git-send-email-adrien.mazarguil@6wind.com>
This documentation covers implementation details, features and limitations,
configuration, prerequisites and provides a usage example.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
doc/guides/prog_guide/index.rst | 1 +
doc/guides/prog_guide/mlx4_poll_mode_drv.rst | 327 +++++++++++++++++++++++++++
doc/guides/prog_guide/source_org.rst | 1 +
3 files changed, 329 insertions(+)
create mode 100644 doc/guides/prog_guide/mlx4_poll_mode_drv.rst
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index de69682..87f6b35 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -56,6 +56,7 @@ Programmer's Guide
intel_dpdk_xen_based_packet_switch_sol
libpcap_ring_based_poll_mode_drv
link_bonding_poll_mode_drv_lib
+ mlx4_poll_mode_drv
timer_lib
hash_lib
lpm_lib
diff --git a/doc/guides/prog_guide/mlx4_poll_mode_drv.rst b/doc/guides/prog_guide/mlx4_poll_mode_drv.rst
new file mode 100644
index 0000000..e2c6b92
--- /dev/null
+++ b/doc/guides/prog_guide/mlx4_poll_mode_drv.rst
@@ -0,0 +1,327 @@
+.. BSD LICENSE
+ Copyright(c) 2012-2015 6WIND S.A.
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in
+ the documentation and/or other materials provided with the
+ distribution.
+ * Neither the name of 6WIND S.A. nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+MLX4 poll mode driver library
+=============================
+
+The MLX4 poll mode driver library (**librte_pmd_mlx4**) implements support
+for **Mellanox ConnectX-3** 10/40 Gbps adapters (EN 40, EN 10, Pro EN 40) as
+well as their virtual functions (VF) in SR-IOV context.
+
+.. note::
+
+ Due to external dependencies, this driver is disabled by default. It must
+ be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX4_PMD=y`` and
+ recompiling DPDK.
+
+Implementation details
+----------------------
+
+Most Mellanox ConnectX-3 devices provide two ports but expose a single PCI
+bus address, thus unlike most drivers, librte_pmd_mlx4 registers itself as a
+PCI driver that allocates one Ethernet device per detected port.
+
+For this reason, one cannot white/blacklist a single port without also
+white/blacklisting the others on the same device.
+
+Besides its dependency on libibverbs (that implies libmlx4 and associated
+kernel support), librte_pmd_mlx4 relies heavily on system calls for control
+operations such as querying/updating the MTU and flow control parameters.
+
+For security reasons and robustness, this driver only deals with virtual
+memory addresses. The way resources allocations are handled by the kernel
+combined with hardware specifications that allow it to handle virtual memory
+addresses directly ensure that DPDK applications cannot access random
+physical memory (or memory that does not belong to the current process).
+
+This capability allows the PMD to coexist with kernel network interfaces
+which remain functional, although they stop receiving unicast packets as
+long as they share the same MAC address.
+
+Compiling librte_pmd_mlx4 causes DPDK to be linked against libibverbs.
+
+Features and limitations
+------------------------
+
+- RSS, also known as RCA, is supported. In this mode the number of
+ configured RX queues must be a power of two.
+- VLAN filtering is supported.
+- Link state information is provided.
+- Promiscuous mode is supported.
+- All multicast mode is supported.
+- Multiple MAC addresses (unicast, multicast) can be configured.
+- Scattered packets are supported for TX and RX.
+
+..
+
+- RSS hash key cannot be modified.
+- Hardware counters are not implemented (they are software counters).
+- Checksum offloads are not supported yet.
+
+Configuration
+-------------
+
+Compilation options
+~~~~~~~~~~~~~~~~~~~
+
+- ``CONFIG_RTE_LIBRTE_MLX4_PMD`` (default **n**)
+
+ Toggle compilation of librte_pmd_mlx4 itself.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_DEBUG`` (default **n**)
+
+ Toggle debugging code and stricter compilation flags. Enabling this option
+ adds additional run-time checks and debugging messages at the cost of
+ lower performance.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N`` (default **4**)
+
+ Number of scatter/gather elements (SGEs) per work request (WR). Lowering
+ this number improves performance but also limits the ability to receive
+ scattered packets (packets that do not fit a single mbuf). The default
+ value is a safe tradeoff.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE`` (default **0**)
+
+ Amount of data to be inlined during TX operations. Improves latency but
+ lowers throughput.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE`` (default **8**)
+
+ Maximum number of cached memory pools (MPs) per TX queue. Each MP from
+ which buffers are to be transmitted must be associated to memory regions
+ (MRs). This is a slow operation that must be cached.
+
+ This value is always 1 for RX queues since they use a single MP.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS`` (default **1**)
+
+ Toggle software counters. No counters are available if this option is
+ disabled since hardware counters are not supported.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE`` (default **1**)
+
+ Toggle VMware compatibility code. It also requires the environment
+ variable ``MLX4_COMPAT_VMWARE`` set to a nonzero value at runtime.
+
+Environment variables
+~~~~~~~~~~~~~~~~~~~~~
+
+- ``MLX4_INLINE_RECV_SIZE``
+
+ A nonzero value enables inline receive for packets up to that size. May
+ significantly improve performance in some cases but lower it in
+ others. Requires careful testing.
+
+- ``MLX4_COMPAT_VMWARE``
+
+ Only supported when compiled with
+ ``CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE=1``. Adds workarounds to run in
+ VMware systems that do not support the flows API properly.
+
+Run-time configuration
+~~~~~~~~~~~~~~~~~~~~~~
+
+- The only constraint when RSS mode is requested is to make sure the number
+ of RX queues is a power of two. This is a hardware requirement.
+
+- librte_pmd_mlx4 brings kernel network interfaces up during initialization
+ because it is affected by their state. Forcing them down prevents packets
+ reception.
+
+- **ethtool** operations on related kernel interfaces also affect the PMD.
+
+Prerequisites
+-------------
+
+This driver relies on external libraries and kernel drivers for resources
+allocations and initialization. The following dependencies are not part of
+DPDK and must be installed separately:
+
+- **libibverbs**
+
+ User space verbs framework used by librte_pmd_mlx4. This library provides
+ a generic interface between the kernel and low-level user space drivers
+ such as libmlx4.
+
+ It allows slow and privileged operations (context initialization, hardware
+ resources allocations) to be managed by the kernel and fast operations to
+ never leave user space.
+
+- **libmlx4**
+
+ Low-level user space driver library for Mellanox ConnectX-3 devices,
+ it is automatically loaded by libibverbs.
+
+ This library basically implements send/receive calls to the hardware
+ queues.
+
+- **Kernel modules** (mlnx-ofed-kernel)
+
+ They provide the kernel-side verbs API and low level device drivers that
+ manage actual hardware initialization and resources sharing with user
+ space processes.
+
+ Unlike most other PMDs, these modules must remain loaded and bound to
+ their devices:
+
+ - mlx4_core: hardware driver managing Mellanox ConnectX-3 devices.
+ - mlx4_en: Ethernet device driver that provides kernel network interfaces.
+ - mlx4_ib: InifiniBand device driver.
+ - ib_uverbs: user space driver for verbs (entry point for libibverbs).
+
+While these libraries and kernel modules are available on OpenFabrics
+Aliance's `website <https://www.openfabrics.org/>`_ and provided by package
+managers on most distributions, this PMD requires Ethernet extensions that
+may not be supported at the moment (this is a work in progress).
+
+`Mellanox OFED
+<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers>`_
+includes the necessary support and should be used in the meantime. For DPDK,
+only libibverbs, libmlx4 and mlnx-ofed-kernel packages are required from
+that distribution.
+
+.. note::
+
+ Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
+ licensed.
+
+Usage example
+-------------
+
+This section demonstrates how to launch **testpmd** with Mellanox ConnectX-3
+devices managed by librte_pmd_mlx4.
+
+#. Load the kernel modules:
+
+ .. code-block:: console
+
+ modprobe -a ib_uverbs mlx4_en mlx4_core mlx4_ib
+
+ .. note::
+
+ User space I/O kernel modules (uio and igb_uio) are not used and do
+ not have to be loaded.
+
+#. Make sure Ethernet interfaces are in working order and linked to kernel
+ verbs. Related sysfs entries should be present:
+
+ .. code-block:: console
+
+ ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5
+
+ Example output:
+
+ .. code-block:: console
+
+ eth2
+ eth3
+ eth4
+ eth5
+
+#. Optionally, retrieve their PCI bus addresses for whitelisting:
+
+ .. code-block:: console
+
+ {
+ for intf in eth2 eth3 eth4 eth5;
+ do
+ (cd "/sys/class/net/${intf}/device/" && pwd -P);
+ done;
+ } |
+ sed -n 's,.*/\(.*\),-w \1,p'
+
+ Example output:
+
+ .. code-block:: console
+
+ -w 0000:83:00.0
+ -w 0000:83:00.0
+ -w 0000:84:00.0
+ -w 0000:84:00.0
+
+ .. note::
+
+ There are only two distinct PCI bus addresses because the Mellanox
+ ConnectX-3 adapters installed on this system are dual port.
+
+#. Request huge pages:
+
+ .. code-block:: console
+
+ echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages/nr_hugepages
+
+#. Start testpmd with basic parameters:
+
+ .. code-block:: console
+
+ testpmd -c 0xff00 -n 4 -w 0000:83:00.0 -w 0000:84:00.0 -- --rxq=2 --txq=2 -i
+
+ Example output:
+
+ .. code-block:: console
+
+ [...]
+ EAL: PCI device 0000:83:00.0 on NUMA socket 1
+ EAL: probe driver: 15b3:1007 librte_pmd_mlx4
+ PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false)
+ PMD: librte_pmd_mlx4: 2 port(s) detected
+ PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:b7:50
+ PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:b7:51
+ EAL: PCI device 0000:84:00.0 on NUMA socket 1
+ EAL: probe driver: 15b3:1007 librte_pmd_mlx4
+ PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_1" (VF: false)
+ PMD: librte_pmd_mlx4: 2 port(s) detected
+ PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:ba:b0
+ PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:ba:b1
+ Interactive-mode selected
+ Configuring Port 0 (socket 0)
+ PMD: librte_pmd_mlx4: 0x867d60: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx4: 0x867d60: RX queues number update: 0 -> 2
+ Port 0: 00:02:C9:B5:B7:50
+ Configuring Port 1 (socket 0)
+ PMD: librte_pmd_mlx4: 0x867da0: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx4: 0x867da0: RX queues number update: 0 -> 2
+ Port 1: 00:02:C9:B5:B7:51
+ Configuring Port 2 (socket 0)
+ PMD: librte_pmd_mlx4: 0x867de0: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx4: 0x867de0: RX queues number update: 0 -> 2
+ Port 2: 00:02:C9:B5:BA:B0
+ Configuring Port 3 (socket 0)
+ PMD: librte_pmd_mlx4: 0x867e20: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx4: 0x867e20: RX queues number update: 0 -> 2
+ Port 3: 00:02:C9:B5:BA:B1
+ Checking link statuses...
+ Port 0 Link Up - speed 10000 Mbps - full-duplex
+ Port 1 Link Up - speed 40000 Mbps - full-duplex
+ Port 2 Link Up - speed 10000 Mbps - full-duplex
+ Port 3 Link Up - speed 40000 Mbps - full-duplex
+ Done
+ testpmd>
diff --git a/doc/guides/prog_guide/source_org.rst b/doc/guides/prog_guide/source_org.rst
index c8ca54f..c66ad16 100644
--- a/doc/guides/prog_guide/source_org.rst
+++ b/doc/guides/prog_guide/source_org.rst
@@ -83,6 +83,7 @@ The lib directory contains::
+-- librte_pmd_e1000 # 1GbE poll mode drivers (igb and em)
+-- librte_pmd_ixgbe # 10GbE poll mode driver
+-- librte_pmd_i40e # 40GbE poll mode driver
+ +-- librte_pmd_mlx4 # Mellanox ConnectX-3 poll mode driver
+-- librte_pmd_pcap # PCAP poll mode driver
+-- librte_pmd_ring # ring poll mode driver
+-- librte_pmd_virtio # virtio poll mode driver
--
2.1.0
next prev parent reply other threads:[~2015-02-21 4:17 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-29 15:20 [dpdk-dev] [PATCH 0/2] Mellanox ConnectX-3 PMD Adrien Mazarguil
2015-01-29 15:20 ` [dpdk-dev] [PATCH 1/2] scripts: add auto-config-h.sh Adrien Mazarguil
2015-01-29 15:20 ` [dpdk-dev] [PATCH 2/2] mlx4: new poll mode driver Adrien Mazarguil
2015-01-29 15:37 ` Stephen Hemminger
2015-01-29 15:45 ` Adrien Mazarguil
2015-02-21 4:16 ` [dpdk-dev] [PATCH v2 0/4] Mellanox ConnectX-3 PMD Adrien Mazarguil
2015-02-21 4:16 ` [dpdk-dev] [PATCH v2 1/4] scripts: add auto-config-h.sh Adrien Mazarguil
2015-02-25 13:52 ` [dpdk-dev] [PATCH v3 0/3] Mellanox ConnectX-3 PMD Adrien Mazarguil
2015-02-25 13:52 ` [dpdk-dev] [PATCH v3 1/3] scripts: check features to generate configuration header Adrien Mazarguil
2015-02-25 13:52 ` [dpdk-dev] [PATCH v3 2/3] mlx4: new poll mode driver Adrien Mazarguil
2015-02-25 13:52 ` [dpdk-dev] [PATCH v3 3/3] doc: add librte_pmd_mlx4 documentation Adrien Mazarguil
2015-03-02 17:45 ` Butler, Siobhan A
2015-02-25 15:13 ` [dpdk-dev] [PATCH v3 0/3] Mellanox ConnectX-3 PMD Thomas Monjalon
2015-02-26 11:51 ` Gleb Natapov
2015-02-26 13:36 ` Thomas Monjalon
2015-02-26 13:49 ` Gleb Natapov
2015-02-26 14:18 ` Adrien Mazarguil
2015-02-26 15:03 ` Gleb Natapov
2015-02-27 18:38 ` Adrien Mazarguil
2015-03-01 11:07 ` Gleb Natapov
2015-02-21 4:17 ` [dpdk-dev] [PATCH v2 2/4] mlx4: new poll mode driver Adrien Mazarguil
2015-02-21 4:17 ` [dpdk-dev] [PATCH v2 3/4] maintainers: claim responsibility for mlx4 PMD Adrien Mazarguil
2015-02-21 4:17 ` Adrien Mazarguil [this message]
2015-03-01 11:15 ` [dpdk-dev] [PATCH 0/2] Mellanox ConnectX-3 PMD Keunhong Lee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1424492227-27229-5-git-send-email-adrien.mazarguil@6wind.com \
--to=adrien.mazarguil@6wind.com \
--cc=dev@dpdk.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).