From mboxrd@z Thu Jan 1 00:00:00 1970
From: Adrien Mazarguil
To: dev@dpdk.org
Date: Wed, 25 Feb 2015 14:52:06 +0100
Message-Id: <1424872326-17930-4-git-send-email-adrien.mazarguil@6wind.com>
X-Mailer: git-send-email 2.1.0
In-Reply-To: <1424872326-17930-1-git-send-email-adrien.mazarguil@6wind.com>
References: <1424492174-27072-1-git-send-email-adrien.mazarguil@6wind.com>
 <1424872326-17930-1-git-send-email-adrien.mazarguil@6wind.com>
Subject: [dpdk-dev] [PATCH v3 3/3] doc: add librte_pmd_mlx4 documentation
List-Id: patches and discussions about DPDK

This documentation covers implementation details, features and limitations,
configuration, prerequisites and provides a usage example.

Signed-off-by: Adrien Mazarguil
---
 MAINTAINERS                                  |   1 +
 doc/guides/prog_guide/index.rst              |   1 +
 doc/guides/prog_guide/mlx4_poll_mode_drv.rst | 326 +++++++++++++++++++++++++++
 doc/guides/prog_guide/source_org.rst         |   1 +
 4 files changed, 329 insertions(+)
 create mode 100644 doc/guides/prog_guide/mlx4_poll_mode_drv.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index d8b0fbc..ac61825 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -223,6 +223,7 @@ F: lib/librte_pmd_fm10k/
 Mellanox mlx4
 M: Adrien Mazarguil
 F: lib/librte_pmd_mlx4/
+F: doc/guides/prog_guide/mlx4_poll_mode_drv.rst
 
 RedHat virtio
 M: Changchun Ouyang
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index de69682..87f6b35 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -56,6 +56,7 @@ Programmer's Guide
     intel_dpdk_xen_based_packet_switch_sol
     libpcap_ring_based_poll_mode_drv
     link_bonding_poll_mode_drv_lib
+    mlx4_poll_mode_drv
     timer_lib
     hash_lib
     lpm_lib
diff --git a/doc/guides/prog_guide/mlx4_poll_mode_drv.rst b/doc/guides/prog_guide/mlx4_poll_mode_drv.rst
new file mode 100644
index 0000000..35570c3
--- /dev/null
+++ b/doc/guides/prog_guide/mlx4_poll_mode_drv.rst
@@ -0,0 +1,326 @@
+..  BSD LICENSE
+    Copyright 2012-2015 6WIND S.A.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of 6WIND S.A. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+MLX4 poll mode driver library
+=============================
+
+The MLX4 poll mode driver library (**librte_pmd_mlx4**) implements support
+for **Mellanox ConnectX-3** 10/40 Gbps adapters (EN 40, EN 10, Pro EN 40) as
+well as their virtual functions (VF) in SR-IOV context.
+
+.. note::
+
+   Due to external dependencies, this driver is disabled by default. It must
+   be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX4_PMD=y`` and
+   recompiling DPDK.
+
+Implementation details
+----------------------
+
+Most Mellanox ConnectX-3 devices provide two ports but expose a single PCI
+bus address, thus unlike most drivers, librte_pmd_mlx4 registers itself as a
+PCI driver that allocates one Ethernet device per detected port.
+
+For this reason, one cannot white/blacklist a single port without also
+white/blacklisting the others on the same device.
+
+Besides its dependency on libibverbs (that implies libmlx4 and associated
+kernel support), librte_pmd_mlx4 relies heavily on system calls for control
+operations such as querying/updating the MTU and flow control parameters.
+
+For security reasons and robustness, this driver only deals with virtual
+memory addresses. The way resource allocations are handled by the kernel
+combined with hardware specifications that allow it to handle virtual memory
+addresses directly ensure that DPDK applications cannot access random
+physical memory (or memory that does not belong to the current process).
+
+This capability allows the PMD to coexist with kernel network interfaces
+which remain functional, although they stop receiving unicast packets as
+long as they share the same MAC address.
+
+Compiling librte_pmd_mlx4 causes DPDK to be linked against libibverbs.
+
+Features and limitations
+------------------------
+
+- RSS, also known as RCA, is supported. In this mode the number of
+  configured RX queues must be a power of two.
+- VLAN filtering is supported.
+- Link state information is provided.
+- Promiscuous mode is supported.
+- All multicast mode is supported.
+- Multiple MAC addresses (unicast, multicast) can be configured.
+- Scattered packets are supported for TX and RX.
+
+..
+
+- RSS hash key cannot be modified.
+- Hardware counters are not implemented (they are software counters).
+- Checksum offloads are not supported yet.
+
+Configuration
+-------------
+
+Compilation options
+~~~~~~~~~~~~~~~~~~~
+
+- ``CONFIG_RTE_LIBRTE_MLX4_PMD`` (default **n**)
+
+  Toggle compilation of librte_pmd_mlx4 itself.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_DEBUG`` (default **n**)
+
+  Toggle debugging code and stricter compilation flags. Enabling this option
+  adds additional run-time checks and debugging messages at the cost of
+  lower performance.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N`` (default **4**)
+
+  Number of scatter/gather elements (SGEs) per work request (WR). Lowering
+  this number improves performance but also limits the ability to receive
+  scattered packets (packets that do not fit a single mbuf). The default
+  value is a safe tradeoff.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE`` (default **0**)
+
+  Amount of data to be inlined during TX operations. Improves latency but
+  lowers throughput.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE`` (default **8**)
+
+  Maximum number of cached memory pools (MPs) per TX queue. Each MP from
+  which buffers are to be transmitted must be associated with memory regions
+  (MRs). This is a slow operation that must be cached.
+
+  This value is always 1 for RX queues since they use a single MP.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS`` (default **1**)
+
+  Toggle software counters. No counters are available if this option is
+  disabled since hardware counters are not supported.
+
+- ``CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE`` (default **1**)
+
+  Toggle VMware compatibility code. It also requires the environment
+  variable ``MLX4_COMPAT_VMWARE`` set to a nonzero value at runtime.
+
+Environment variables
+~~~~~~~~~~~~~~~~~~~~~
+
+- ``MLX4_INLINE_RECV_SIZE``
+
+  A nonzero value enables inline receive for packets up to that size. May
+  significantly improve performance in some cases but lower it in
+  others. Requires careful testing.
+
+- ``MLX4_COMPAT_VMWARE``
+
+  Only supported when compiled with
+  ``CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE=1``. Adds workarounds to run in
+  VMware systems that do not support the flows API properly.
+
+Run-time configuration
+~~~~~~~~~~~~~~~~~~~~~~
+
+- The only constraint when RSS mode is requested is to make sure the number
+  of RX queues is a power of two. This is a hardware requirement.
+
+- librte_pmd_mlx4 brings kernel network interfaces up during initialization
+  because it is affected by their state. Forcing them down prevents packet
+  reception.
+
+- **ethtool** operations on related kernel interfaces also affect the PMD.
+
+Prerequisites
+-------------
+
+This driver relies on external libraries and kernel drivers for resource
+allocation and initialization. The following dependencies are not part of
+DPDK and must be installed separately:
+
+- **libibverbs**
+
+  User space verbs framework used by librte_pmd_mlx4. This library provides
+  a generic interface between the kernel and low-level user space drivers
+  such as libmlx4.
+
+  It allows slow and privileged operations (context initialization, hardware
+  resource allocation) to be managed by the kernel and fast operations to
+  never leave user space.
+
+- **libmlx4**
+
+  Low-level user space driver library for Mellanox ConnectX-3 devices;
+  it is automatically loaded by libibverbs.
+
+  This library basically implements send/receive calls to the hardware
+  queues.
+
+- **Kernel modules** (mlnx-ofed-kernel)
+
+  They provide the kernel-side verbs API and low level device drivers that
+  manage actual hardware initialization and resource sharing with user
+  space processes.
+
+  Unlike most other PMDs, these modules must remain loaded and bound to
+  their devices:
+
+  - mlx4_core: hardware driver managing Mellanox ConnectX-3 devices.
+  - mlx4_en: Ethernet device driver that provides kernel network interfaces.
+  - mlx4_ib: InfiniBand device driver.
+  - ib_uverbs: user space driver for verbs (entry point for libibverbs).
+
+While these libraries and kernel modules are available on OpenFabrics
+Alliance's `website `_ and provided by package
+managers on most distributions, this PMD requires Ethernet extensions that
+may not be supported at the moment (this is a work in progress).
+
+`Mellanox OFED
+`_
+includes the necessary support and should be used in the meantime. For DPDK,
+only libibverbs, libmlx4 and mlnx-ofed-kernel packages are required from
+that distribution.
+
+.. note::
+
+   Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
+   licensed.
+
+Usage example
+-------------
+
+This section demonstrates how to launch **testpmd** with Mellanox ConnectX-3
+devices managed by librte_pmd_mlx4.
+
+#. Load the kernel modules:
+
+   .. code-block:: console
+
+      modprobe -a ib_uverbs mlx4_en mlx4_core mlx4_ib
+
+   .. note::
+
+      User space I/O kernel modules (uio and igb_uio) are not used and do
+      not have to be loaded.
+
+#. Make sure Ethernet interfaces are in working order and linked to kernel
+   verbs. Related sysfs entries should be present:
+
+   .. code-block:: console
+
+      ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5
+
+   Example output:
+
+   .. code-block:: console
+
+      eth2
+      eth3
+      eth4
+      eth5
+
+#. Optionally, retrieve their PCI bus addresses for whitelisting:
+
+   .. code-block:: console
+
+      {
+          for intf in eth2 eth3 eth4 eth5;
+          do
+              (cd "/sys/class/net/${intf}/device/" && pwd -P);
+          done;
+      } |
+      sed -n 's,.*/\(.*\),-w \1,p'
+
+   Example output:
+
+   .. code-block:: console
+
+      -w 0000:83:00.0
+      -w 0000:83:00.0
+      -w 0000:84:00.0
+      -w 0000:84:00.0
+
+   .. note::
+
+      There are only two distinct PCI bus addresses because the Mellanox
+      ConnectX-3 adapters installed on this system are dual port.
+
+#. Request huge pages:
+
+   .. code-block:: console
+
+      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Start testpmd with basic parameters:
+
+   .. code-block:: console
+
+      testpmd -c 0xff00 -n 4 -w 0000:83:00.0 -w 0000:84:00.0 -- --rxq=2 --txq=2 -i
+
+   Example output:
+
+   .. code-block:: console
+
+      [...]
+      EAL: PCI device 0000:83:00.0 on NUMA socket 1
+      EAL:   probe driver: 15b3:1007 librte_pmd_mlx4
+      PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false)
+      PMD: librte_pmd_mlx4: 2 port(s) detected
+      PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:b7:50
+      PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:b7:51
+      EAL: PCI device 0000:84:00.0 on NUMA socket 1
+      EAL:   probe driver: 15b3:1007 librte_pmd_mlx4
+      PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_1" (VF: false)
+      PMD: librte_pmd_mlx4: 2 port(s) detected
+      PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:ba:b0
+      PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:ba:b1
+      Interactive-mode selected
+      Configuring Port 0 (socket 0)
+      PMD: librte_pmd_mlx4: 0x867d60: TX queues number update: 0 -> 2
+      PMD: librte_pmd_mlx4: 0x867d60: RX queues number update: 0 -> 2
+      Port 0: 00:02:C9:B5:B7:50
+      Configuring Port 1 (socket 0)
+      PMD: librte_pmd_mlx4: 0x867da0: TX queues number update: 0 -> 2
+      PMD: librte_pmd_mlx4: 0x867da0: RX queues number update: 0 -> 2
+      Port 1: 00:02:C9:B5:B7:51
+      Configuring Port 2 (socket 0)
+      PMD: librte_pmd_mlx4: 0x867de0: TX queues number update: 0 -> 2
+      PMD: librte_pmd_mlx4: 0x867de0: RX queues number update: 0 -> 2
+      Port 2: 00:02:C9:B5:BA:B0
+      Configuring Port 3 (socket 0)
+      PMD: librte_pmd_mlx4: 0x867e20: TX queues number update: 0 -> 2
+      PMD: librte_pmd_mlx4: 0x867e20: RX queues number update: 0 -> 2
+      Port 3: 00:02:C9:B5:BA:B1
+      Checking link statuses...
+      Port 0 Link Up - speed 10000 Mbps - full-duplex
+      Port 1 Link Up - speed 40000 Mbps - full-duplex
+      Port 2 Link Up - speed 10000 Mbps - full-duplex
+      Port 3 Link Up - speed 40000 Mbps - full-duplex
+      Done
+      testpmd>
diff --git a/doc/guides/prog_guide/source_org.rst b/doc/guides/prog_guide/source_org.rst
index c8ca54f..c66ad16 100644
--- a/doc/guides/prog_guide/source_org.rst
+++ b/doc/guides/prog_guide/source_org.rst
@@ -83,6 +83,7 @@ The lib directory contains::
     +-- librte_pmd_e1000 # 1GbE poll mode drivers (igb and em)
     +-- librte_pmd_ixgbe # 10GbE poll mode driver
     +-- librte_pmd_i40e # 40GbE poll mode driver
+    +-- librte_pmd_mlx4 # Mellanox ConnectX-3 poll mode driver
     +-- librte_pmd_pcap # PCAP poll mode driver
     +-- librte_pmd_ring # ring poll mode driver
     +-- librte_pmd_virtio # virtio poll mode driver
-- 
2.1.0