DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH v6 0/7] switching device representation
@ 2018-03-28 13:54 Declan Doherty
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 1/8] doc: add switch representation documentation Declan Doherty
                   ` (8 more replies)
  0 siblings, 9 replies; 73+ messages in thread
From: Declan Doherty @ 2018-03-28 13:54 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Qi Zhang, Alejandro Lucero, Andrew Rybchenko,
	Mohammad Abdul Awal, Remy Horton, John McNamara, Rony Efraim, Wu,
	Jingjing, Lu, Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson,
	Bruce, Ananyev, Konstantin, Wang, Zhihong, Declan Doherty

This patchset follows on from the port rerpesentor patchsets and the
community discussion that resulted. It outlines the model for
representing and controlling switching capable devices in a new
programmer's guide entry based upon the excellent summary by 
Adrien Mazarguil in
 (http://dpdk.org/ml/archives/dev/2018-March/092513.html).

The next patches introduce changes to librte_ether to:
1, support the definition of a switch domain and make it public to
application through the rte_eth_dev_info structure.
2, Add generic ethdev create/destroy APIs to facilitate and generalise the
creation of ethdev's on different bus types.
3, Add ethdev attribute to dev_flags to specify that a port is a
representor port and make public through the rte_eth_dev_info
structure.
4, Add devargs parsing for generic eth_devargs to facilate parsing in
NET PMDs. This will be refactored to take account of the changes in 
(http://dpdk.org/ml/archives/dev/2018-March/092513.html)

This patchset also includes the enablement of port representor for ixgbe 
and i40e PF devices.


Adrien Mazarguil (1):
  doc: add switch representation documentation

Declan Doherty (6):
  ethdev: add switch identifier parameter to port
  ethdev: add generic create/destroy ethdev APIs
  ethdev: Add port representor device flag
  app/testpmd: add port name to device info
  net/i40e: add support for representor ports
  net/ixgbe: add support for representor ports

Remy Horton (1):
  ethdev: add common devargs parser

 app/test-pmd/config.c                           |   4 +
 doc/guides/prog_guide/index.rst                 |   1 +
 doc/guides/prog_guide/switch_representation.rst | 829 ++++++++++++++++++++++++
 drivers/net/i40e/Makefile                       |   3 +
 drivers/net/i40e/i40e_ethdev.c                  |  71 +-
 drivers/net/i40e/i40e_ethdev.h                  |  12 +
 drivers/net/i40e/i40e_vf_representor.c          | 392 +++++++++++
 drivers/net/i40e/meson.build                    |   4 +-
 drivers/net/i40e/rte_pmd_i40e.c                 |  43 ++
 drivers/net/i40e/rte_pmd_i40e.h                 |  18 +
 drivers/net/ixgbe/Makefile                      |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c                |  70 +-
 drivers/net/ixgbe/ixgbe_ethdev.h                |  12 +
 drivers/net/ixgbe/ixgbe_vf_representor.c        | 210 ++++++
 drivers/net/ixgbe/meson.build                   |   4 +-
 lib/Makefile                                    |   1 +
 lib/librte_ether/Makefile                       |   1 +
 lib/librte_ether/meson.build                    |   1 +
 lib/librte_ether/rte_ethdev.c                   | 291 ++++++++-
 lib/librte_ether/rte_ethdev.h                   |  10 +-
 lib/librte_ether/rte_ethdev_core.h              |   1 +
 lib/librte_ether/rte_ethdev_driver.h            |  87 +++
 lib/librte_ether/rte_ethdev_pci.h               |  12 +
 lib/librte_ether/rte_ethdev_representor.h       |  31 +
 lib/librte_ether/rte_ethdev_version.map         |   9 +
 25 files changed, 2097 insertions(+), 21 deletions(-)
 create mode 100644 doc/guides/prog_guide/switch_representation.rst
 create mode 100644 drivers/net/i40e/i40e_vf_representor.c
 create mode 100644 drivers/net/ixgbe/ixgbe_vf_representor.c
 create mode 100644 lib/librte_ether/rte_ethdev_representor.h

-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v6 1/8] doc: add switch representation documentation
  2018-03-28 13:54 [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
@ 2018-03-28 13:54 ` Declan Doherty
  2018-03-28 14:53   ` Thomas Monjalon
  2018-04-03 15:52   ` Adrien Mazarguil
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 2/8] ethdev: add switch identifier parameter to port Declan Doherty
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 73+ messages in thread
From: Declan Doherty @ 2018-03-28 13:54 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Qi Zhang, Alejandro Lucero, Andrew Rybchenko,
	Mohammad Abdul Awal, Remy Horton, John McNamara, Rony Efraim, Wu,
	Jingjing, Lu, Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson,
	Bruce, Ananyev, Konstantin, Wang, Zhihong, Adrien Mazarguil,
	Declan Doherty

From: Adrien Mazarguil <adrien.mazarguil@6wind.com>

Add document to describe a model for representing switching capable
devices in DPDK, using a general ethdev port model and through port
representors.This document also details the port model and the
rte_flow semantics required for flow programming, as well as listing
some example use cases.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 doc/guides/prog_guide/index.rst                 |   1 +
 doc/guides/prog_guide/switch_representation.rst | 829 ++++++++++++++++++++++++
 2 files changed, 830 insertions(+)
 create mode 100644 doc/guides/prog_guide/switch_representation.rst

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index bbbe7895d..09224af2e 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -17,6 +17,7 @@ Programmer's Guide
     mbuf_lib
     poll_mode_drv
     rte_flow
+    switch_representation
     traffic_metering_and_policing
     traffic_management
     bbdev
diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
new file mode 100644
index 000000000..f1a84f6b7
--- /dev/null
+++ b/doc/guides/prog_guide/switch_representation.rst
@@ -0,0 +1,829 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 6WIND S.A.
+
+.. _switch_representation:
+
+Switch representation within DPDK applications
+==============================================
+
+.. contents:: :local:
+
+Introduction
+------------
+
+Network adapters with multiple physical ports and/or SR-IOV capabilities
+usually support the offload of traffic steering rules between their virtual
+functions (VFs), physical functions (PFs) and ports.
+
+Like for standard Ethernet switches, this involves a combination of
+automatic MAC learning and manual configuration. For most purposes it is
+managed by the host system and fully transparent to users and applications.
+
+On the other hand, applications typically found on hypervisors that process
+layer 2 (L2) traffic (such as OVS) need to steer traffic themselves
+according on their own criteria.
+
+Without a standard software interface to manage traffic steering rules
+between VFs, PFs and the various physical ports of a given device,
+applications cannot take advantage of these offloads; software processing is
+mandatory even for traffic which ends up re-injected into the device it
+originates from.
+
+This document describes how such steering rules can be configured through
+the DPDK flow API (**rte_flow**), with emphasis on the SR-IOV use case
+(PF/VF steering) using a single physical port for clarity, however the same
+logic applies to any number of ports without necessarily involving SR-IOV.
+
+Port Representors
+-----------------
+
+In many cases, traffic steering rules cannot be determined in advance;
+applications usually have to process a bit of traffic in software before
+thinking about offloading specific flows to hardware.
+
+Applications therefore need the ability to receive and inject traffic to
+various device endpoints (other VFs, PFs or physical ports) before
+connecting them together. Device drivers must provide means to hook the
+"other end" of these endpoints and to refer them when configuring flow
+rules.
+
+This role is left to so-called "port representors" (also known as "VF
+representors" in the specific context of VFs), which are to DPDK what the
+Ethernet switch device driver model (**switchdev**) [1]_ is to Linux, and
+which can be thought as a software "patch panel" front-end for applications.
+
+- DPDK port representors are implemented as additional virtual Ethernet
+  device (**ethdev**) instances, spawned on an as needed basis through
+  configuration parameters passed to the driver of the underlying
+  device using devargs.
+
+::
+
+   -w pci:dbdf,representor=0
+   -w pci:dbdf,representor=[0-3]
+   -w pci:dbdf,representor=[0,5-11]
+
+- As virtual devices, they may be more limited than their physical
+  counterparts, for instance by exposing only a subset of device
+  configuration callbacks and/or by not necessarily having Rx/Tx capability.
+
+- Among other things, they can be used to assign MAC addresses to the
+  resource they represent.
+
+- Applications can tell port representors apart from other physcial of virtual
+  port by checking the dev_flags field within their device information
+  structure for the RTE_ETH_DEV_REPRESENTOR bit-field.
+
+.. code-block:: c
+
+  struct rte_eth_dev_info {
+	..
+	uint32_t dev_flags; /**< Device flags */
+	..
+  };
+
+- The device or group relationship of ports can be discovered using the
+  switch_id field within the device information structure. By default the
+  switch_id of a port will be it's port_id but ports within the same switch
+  domain will share the same *switch_id* which in the case of SR-IOV devices
+  would align to the port_id of the physical function port.
+
+.. code-block:: c
+
+  struct rte_eth_dev_info {
+	..
+	uint16_t switch_id; /**< Switch Domain Id */
+	..
+  };
+
+
+.. [1] `Ethernet switch device driver model (switchdev)
+       <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_
+
+Basic SR-IOV
+------------
+
+"Basic" in the sense that it is not managed by applications, which
+nonetheless expect traffic to flow between the various endpoints and the
+outside as if everything was linked by an Ethernet hub.
+
+The following diagram pictures a setup involving a device with one PF, two
+VFs and one shared physical port
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+----------'                 `----------+--' `--+----------'
+          |                                       |       |
+    .-----+-----.                                 |       |
+    | port_id 3 |                                 |       |
+    `-----+-----'                                 |       |
+          |                                       |       |
+        .-+--.                                .---+--. .--+---.
+        | PF |                                | VF 1 | | VF 2 |
+        `-+--'                                `---+--' `--+---'
+          |                                       |       |
+          `---------.     .-----------------------'       |
+                    |     |     .-------------------------'
+                    |     |     |
+                 .--+-----+-----+--.
+                 | interconnection |
+                 `--------+--------'
+                          |
+                     .----+-----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+- A DPDK application running on the hypervisor owns the PF device, which is
+  arbitrarily assigned port index 3.
+
+- Both VFs are assigned to VMs and used by unknown applications; they may be
+  DPDK-based or anything else.
+
+- Interconnection is not necessarily done through a true Ethernet switch and
+  may not even exist as a separate entity. The role of this block is to show
+  that something brings PF, VFs and physical ports together and enables
+  communication between them, with a number of built-in restrictions.
+
+Subsequent sections in this document describe means for DPDK applications
+running on the hypervisor to freely assign specific flows between PF, VFs
+and physical ports based on traffic properties, by managing this
+interconnection.
+
+Controlled SR-IOV
+-----------------
+
+Initialization
+~~~~~~~~~~~~~~
+
+When a DPDK application gets assigned a PF device and is deliberately not
+started in `basic SR-IOV`_ mode, any traffic coming from physical ports is
+received by PF according to default rules, while VFs remain isolated.
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+----------'                 `----------+--' `--+----------'
+          |                                       |       |
+    .-----+-----.                                 |       |
+    | port_id 3 |                                 |       |
+    `-----+-----'                                 |       |
+          |                                       |       |
+        .-+--.                                .---+--. .--+---.
+        | PF |                                | VF 1 | | VF 2 |
+        `-+--'                                `------' `------'
+          |
+          `-----.
+                |
+             .--+----------------------.
+             | managed interconnection |
+             `------------+------------'
+                          |
+                     .----+-----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+In this mode, interconnection must be configured by the application to
+enable VF communication, for instance by explicitly directing traffic with a
+given destination MAC address to VF 1 and allowing that with the same source
+MAC address to come out of it.
+
+For this to work, hypervisor applications need a way to refer to either VF 1
+or VF 2 in addition to the PF. This is addressed by `VF representors`_.
+
+VF representors
+~~~~~~~~~~~~~~~
+
+VF representors are virtual but standard DPDK network devices (albeit with
+limited capabilities) created by PMDs when managing a PF device.
+
+Since they represent VF instances used by other applications, configuring
+them (e.g. assigning a MAC address or setting up promiscuous mode) affects
+interconnection accordingly. If supported, they may also be used as two-way
+communication ports with VFs (assuming **switchdev** topology)
+
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+---+---+--'                 `----------+--' `--+----------'
+          |   |   |                               |       |
+          |   |   `-------------------.           |       |
+          |   `---------.             |           |       |
+          |             |             |           |       |
+    .-----+-----. .-----+-----. .-----+-----.     |       |
+    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+    `-----+-----' `-----+-----' `-----+-----'     |       |
+          |             |             |           |       |
+        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+        `-+--'    `-----+-----' `-----+-----' `---+--' `--+---'
+          |             |             |           |       |
+          |             |   .---------'           |       |
+          `-----.       |   |   .-----------------'       |
+                |       |   |   |   .---------------------'
+                |       |   |   |   |
+             .--+-------+---+---+---+--.
+             | managed interconnection |
+             `------------+------------'
+                          |
+                     .----+-----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+- VF representors are assigned arbitrary port indices 4 and 5 in the
+  hypervisor application and are respectively associated with VF 1 and VF 2.
+
+- They can't be dissociated; even if VF 1 and VF 2 were not connected,
+  representors could still be used for configuration.
+
+- In this context, port index 3 can be thought as a representor for physical
+  port 0.
+
+As previously described, the "interconnection" block represents a logical
+concept. Interconnection occurs when hardware configuration enables traffic
+flows from one place to another (e.g. physical port 0 to VF 1) according to
+some criteria.
+
+This is discussed in more detail in `traffic steering`_.
+
+Traffic steering
+~~~~~~~~~~~~~~~~
+
+In the following diagram, each meaningful traffic origin or endpoint as seen
+by the hypervisor application is tagged with a unique letter from A to F.
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+---+---+--'                 `----------+--' `--+----------'
+          |   |   |                               |       |
+          |   |   `-------------------.           |       |
+          |   `---------.             |           |       |
+          |             |             |           |       |
+    .----(A)----. .----(B)----. .----(C)----.     |       |
+    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+    `-----+-----' `-----+-----' `-----+-----'     |       |
+          |             |             |           |       |
+        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+        `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
+          |             |             |           |       |
+          |             |   .---------'           |       |
+          `-----.       |   |   .-----------------'       |
+                |       |   |   |   .---------------------'
+                |       |   |   |   |
+             .--+-------+---+---+---+--.
+             | managed interconnection |
+             `------------+------------'
+                          |
+                     .---(F)----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+- **A**: PF device.
+- **B**: port representor for VF 1.
+- **C**: port representor for VF 2.
+- **D**: VF 1 proper.
+- **E**: VF 2 proper.
+- **F**: physical port.
+
+Although uncommon, some devices do not enforce a one to one mapping between
+PF and physical ports. For instance, by default all ports of **mlx4**
+adapters are available to all their PF/VF instances, in which case
+additional ports appear next to **F** in the above diagram.
+
+Assuming no interconnection is provided by default in this mode, setting up
+a `basic SR-IOV`_ configuration involving physical port 0 could be broken
+down as:
+
+PF:
+
+- **A to F**: let everything through.
+- **F to A**: PF MAC as destination.
+
+VF 1:
+
+- **A to D**, **E to D** and **F to D**: VF 1 MAC as destination.
+- **D to A**: VF 1 MAC as source and PF MAC as destination.
+- **D to E**: VF 1 MAC as source and VF 2 MAC as destination.
+- **D to F**: VF 1 MAC as source.
+
+VF 2:
+
+- **A to E**, **D to E** and **F to E**: VF 2 MAC as destination.
+- **E to A**: VF 2 MAC as source and PF MAC as destination.
+- **E to D**: VF 2 MAC as source and VF 1 MAC as destination.
+- **E to F**: VF 2 MAC as source.
+
+Devices may additionally support advanced matching criteria such as
+IPv4/IPv6 addresses or TCP/UDP ports.
+
+The combination of matching criteria with target endpoints fits well with
+**rte_flow** [6]_, which expresses flow rules as combinations of patterns
+and actions.
+
+Enhancing **rte_flow** with the ability to make flow rules match and target
+these endpoints provides a standard interface to manage their
+interconnection without introducing new concepts and whole new API to
+implement them. This is described in `flow API (rte_flow)`_.
+
+.. [6] `Generic flow API (rte_flow)
+       <http://dpdk.org/doc/guides/prog_guide/rte_flow.html>`_
+
+Flow API (rte_flow)
+-------------------
+
+Extensions
+~~~~~~~~~~
+
+Compared to creating a brand new dedicated interface, **rte_flow** was
+deemed flexible enough to manage representor traffic only with minor
+extensions:
+
+- Using physical ports, PF, VF or port representors as targets.
+
+- Affecting traffic that is not necessarily addressed to the DPDK port ID a
+  flow rule is associated with (e.g. forcing VF traffic redirection to PF).
+
+For advanced uses:
+
+- Rule-based packet counters.
+
+- The ability to combine several identical actions for traffic duplication
+  (e.g. VF representor in addition to a physical port).
+
+- Dedicated actions for traffic encapsulation / decapsulation before
+  reaching a endpoint.
+
+Traffic direction
+~~~~~~~~~~~~~~~~~
+
+From an application standpoint, "ingress" and "egress" flow rule attributes
+apply to the DPDK port ID they are associated with. They select a traffic
+direction for matching patterns, but have no impact on actions.
+
+When matching traffic coming from or going to a different place than the
+immediate port ID a flow rule is associated with, these attributes keep
+their meaning while applying to the chosen origin, as highlighted by the
+following diagram
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+---+---+--'                 `----------+--' `--+----------'
+          |   |   |                               |       |
+          |   |   `-------------------.           |       |
+          |   `---------.             |           |       |
+          | ^           | ^           | ^         |       |
+          | | ingress   | | ingress   | | ingress |       |
+          | | egress    | | egress    | | egress  |       |
+          | v           | v           | v         |       |
+    .----(A)----. .----(B)----. .----(C)----.     |       |
+    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+    `-----+-----' `-----+-----' `-----+-----'     |       |
+          |             |             |           |       |
+        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+        `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
+          |             |             |         ^ |       | ^
+          |             |             |  egress | |       | | egress
+          |             |             | ingress | |       | | ingress
+          |             |   .---------'         v |       | v
+          `-----.       |   |   .-----------------'       |
+                |       |   |   |   .---------------------'
+                |       |   |   |   |
+             .--+-------+---+---+---+--.
+             | managed interconnection |
+             `------------+------------'
+                        ^ |
+                ingress | |
+                 egress | |
+                        v |
+                     .---(F)----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+Ingress and egress are defined as relative to the application creating the
+flow rule.
+
+For instance, matching traffic sent by VM 2 would be done through an ingress
+flow rule on VF 2 (**E**). Likewise for incoming traffic on physical port
+(**F**). This also applies to **C** and **A** respectively.
+
+Transferring traffic
+~~~~~~~~~~~~~~~~~~~~
+
+Without port representors
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`Traffic direction`_ describes how an application could match traffic coming
+from or going to a specific place reachable from a DPDK port ID. This makes
+sense when the traffic in question is normally seen (i.e. sent or received)
+by the application creating the flow rule (e.g. as in "redirect all traffic
+coming from VF 1 to local queue 6").
+
+However this does not force such traffic to take a specific route. Creating
+a flow rule on **A** matching traffic coming from **D** is only meaningful
+if it can be received by **A** in the first place, otherwise doing so simply
+has no effect.
+
+A new flow rule attribute named "transfer" is necessary for that. Combining
+it with "ingress" or "egress" and a specific origin requests a flow rule to
+be applied at the lowest level
+
+::
+
+             ingress only           :       ingress + transfer
+                                    :
+    .-------------. .-------------. : .-------------. .-------------.
+    | hypervisor  | |    VM 1     | : | hypervisor  | |    VM 1     |
+    | application | | application | : | application | | application |
+    `------+------' `--+----------' : `------+------' `--+----------'
+           |           | | traffic  :        |           | | traffic
+     .----(A)----.     | v          :  .----(A)----.     | v
+     | port_id 3 |     |            :  | port_id 3 |     |
+     `-----+-----'     |            :  `-----+-----'     |
+           |           |            :        | ^         |
+           |           |            :        | | traffic |
+         .-+--.    .---+--.         :      .-+--.    .---+--.
+         | PF |    | VF 1 |         :      | PF |    | VF 1 |
+         `-+--'    `--(D)-'         :      `-+--'    `--(D)-'
+           |           | | traffic  :        | ^         | | traffic
+           |           | v          :        | | traffic | v
+        .--+-----------+--.         :     .--+-----------+--.
+        | interconnection |         :     | interconnection |
+        `--------+--------'         :     `--------+--------'
+                 | | traffic        :              |
+                 | v                :              |
+            .---(F)----.            :         .---(F)----.
+            | physical |            :         | physical |
+            |  port 0  |            :         |  port 0  |
+            `----------'            :         `----------'
+
+With "ingress" only, traffic is matched on **A** thus still goes to physical
+port **F** by default
+
+
+::
+
+   testpmd> flow create 3 ingress pattern vf id is 1 / end
+              actions queue index 6 / end
+
+With "ingress + transfer", traffic is matched on **D** and is therefore
+successfully assigned to queue 6 on **A**
+
+
+::
+
+    testpmd> flow create 3 ingress transfer pattern vf id is 1 / end
+              actions queue index 6 / end
+
+
+With port representors
+^^^^^^^^^^^^^^^^^^^^^^
+
+When port representors exist, implicit flow rules with the "transfer"
+attribute (described in `without port representors`_) are be assumed to
+exist between them and their represented resources. These may be immutable.
+
+In this case, traffic is received by default through the representor and
+neither the "transfer" attribute nor traffic origin in flow rule patterns
+are necessary. They simply have to be created on the representor port
+directly and may target a different representor as described in `PORT_ID
+action`_.
+
+Implicit traffic flow with port representor
+
+::
+
+       .-------------.   .-------------.
+       | hypervisor  |   |    VM 1     |
+       | application |   | application |
+       `--+-------+--'   `----------+--'
+          |       | ^               | | traffic
+          |       | | traffic       | v
+          |       `-----.           |
+          |             |           |
+    .----(A)----. .----(B)----.     |
+    | port_id 3 | | port_id 4 |     |
+    `-----+-----' `-----+-----'     |
+          |             |           |
+        .-+--.    .-----+-----. .---+--.
+        | PF |    | VF 1 rep. | | VF 1 |
+        `-+--'    `-----+-----' `--(D)-'
+          |             |           |
+       .--|-------------|-----------|--.
+       |  |             |           |  |
+       |  |             `-----------'  |
+       |  |              <-- traffic   |
+       `--|----------------------------'
+          |
+     .---(F)----.
+     | physical |
+     |  port 0  |
+     `----------'
+
+Pattern items and actions
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PORT pattern item
+^^^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) a physical
+port of the underlying device.
+
+Using this pattern item without specifying a port index matches the physical
+port associated with the current DPDK port ID by default. As described in
+`traffic steering`_, specifying it should be rarely needed.
+
+- Matches **F** in `traffic steering`_.
+
+PORT action
+^^^^^^^^^^^
+
+Directs matching traffic to a given physical port index.
+
+- Targets **F** in `traffic steering`_.
+
+PORT_ID pattern item
+^^^^^^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) a given DPDK
+port ID.
+
+Normally only supported if the port ID in question is known by the
+underlying PMD and related to the device the flow rule is created against.
+
+This must not be confused with the `PORT pattern item`_ which refers to the
+physical port of a device. ``PORT_ID`` refers to a ``struct rte_eth_dev``
+object on the application side (also known as "port representor" depending
+on the kind of underlying device).
+
+- Matches **A**, **B** or **C** in `traffic steering`_.
+
+PORT_ID action
+^^^^^^^^^^^^^^
+
+Directs matching traffic to a given DPDK port ID.
+
+Same restrictions as `PORT_ID pattern item`_.
+
+- Targets **A**, **B** or **C** in `traffic steering`_.
+
+PF pattern item
+^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) the physical
+function of the current device.
+
+If supported, should work even if the physical function is not managed by
+the application and thus not associated with a DPDK port ID. Its behavior is
+otherwise similar to `PORT_ID pattern item`_ using PF port ID.
+
+- Matches **A** in `traffic steering`_.
+
+PF action
+^^^^^^^^^
+
+Directs matching traffic to the physical function of the current device.
+
+Same restrictions as `PF pattern item`_.
+
+- Targets **A** in `traffic steering`_.
+
+VF pattern item
+^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) a given
+virtual function of the current device.
+
+If supported, should work even if the virtual function is not managed by
+the application and thus not associated with a DPDK port ID. Its behavior is
+otherwise similar to `PORT_ID pattern item`_ using VF port ID.
+
+Note this pattern item does not match VF representors traffic which, as
+separate entities, should be addressed through their own port IDs.
+
+- Matches **D** or **E** in `traffic steering`_.
+
+VF action
+^^^^^^^^^
+
+Directs matching traffic to a given virtual function of the current device.
+
+Same restrictions as `VF pattern item`_.
+
+- Targets **D** or **E** in `traffic steering`_.
+
+\*_ENCAP actions
+^^^^^^^^^^^^^^^^
+
+These actions are named according to the protocol they encapsulate traffic
+with (e.g. ``VXLAN_ENCAP``) and using specific parameters (e.g. VNI for
+VXLAN).
+
+While they modify traffic and can be used multiple times (order matters),
+unlike `PORT_ID action`_ and friends, they have no impact on steering.
+
+As described in `actions order and repetition`_ this means they are useless
+if used alone in an action list, the resulting traffic gets dropped unless
+combined with either ``PASSTHRU`` or other endpoint-targeting actions.
+
+\*_DECAP actions
+^^^^^^^^^^^^^^^^
+
+They perform the reverse of `\*_ENCAP actions`_ by popping protocol headers
+from traffic instead of pushing them. They can be used multiple times as
+well.
+
+Note that using these actions on non-matching traffic results in undefined
+behavior. It is recommended to match the protocol headers to decapsulate on
+the pattern side of a flow rule in order to use these actions or otherwise
+make sure only matching traffic goes through.
+
+Actions Order and Repetition
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Flow rules are currently restricted to at most a single action of each
+supported type, performed in an unpredictable order (or all at once). To
+repeat actions in a predictable fashion, applications have to make rules
+pass-through and use priority levels.
+
+It's now clear that PMD support for chaining multiple non-terminating flow
+rules of varying priority levels is prohibitively difficult to implement
+compared to simply allowing multiple identical actions performed in a
+defined order by a single flow rule.
+
+- This change is required to support protocol encapsulation offloads and the
+  ability to perform them multiple times (e.g. VLAN then VXLAN).
+
+- It makes the ``DUP`` action redundant since multiple ``QUEUE`` actions can
+  be combined for duplication.
+
+- The (non-)terminating property of actions must be discarded. Instead, flow
+  rules themselves must be considered terminating by default (i.e. dropping
+  traffic if there is no specific target) unless a ``PASSTHRU`` action is
+  also specified.
+
+Switching Examples
+------------------
+
+This section provides practical examples based on the established Testpmd
+flow command syntax [2]_, in the context described in `traffic steering`_
+
+::
+
+      .-------------.                 .-------------. .-------------.
+      | hypervisor  |                 |    VM 1     | |    VM 2     |
+      | application |                 | application | | application |
+      `--+---+---+--'                 `----------+--' `--+----------'
+         |   |   |                               |       |
+         |   |   `-------------------.           |       |
+         |   `---------.             |           |       |
+         |             |             |           |       |
+   .----(A)----. .----(B)----. .----(C)----.     |       |
+   | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+   `-----+-----' `-----+-----' `-----+-----'     |       |
+        |             |             |           |       |
+      .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+      | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+      `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
+        |             |             |           |       |
+        |             |   .---------'           |       |
+        `-----.       |   |   .-----------------'       |
+              |       |   |   |   .---------------------'
+              |       |   |   |   |
+           .--|-------|---|---|---|--.
+           |  |       |   `---|---'  |
+           |  |       `-------'      |
+           |  `---------.            |
+           `------------|------------'
+                        |
+                   .---(F)----.
+                   | physical |
+                   |  port 0  |
+                   `----------'
+
+By default, PF (**A**) can communicate with the physical port it is
+associated with (**F**), while VF 1 (**D**) and VF 2 (**E**) are isolated
+and restricted to communicate with the hypervisor application through their
+respective representors (**B** and **C**) if supported.
+
+Examples in subsequent sections apply to hypervisor applications only and
+are based on port representors **A**, **B** and **C**.
+
+.. [2] `Flow syntax
+    <http://dpdk.org/doc/guides/testpmd_app_ug/testpmd_funcs.html#flow-syntax>`
+
+Associating VF 1 with physical port 0
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Assign all port traffic (**F**) to VF 1 (**D**) indiscriminately through
+their representors
+
+::
+
+   flow create 3 ingress pattern / end actions port_id id 4 / end
+   flow create 4 ingress pattern / end actions port_id id 3 / end
+
+More practical example with MAC address restrictions
+
+::
+
+   flow create 3 ingress
+       pattern eth dst is {VF 1 MAC} / end
+       actions port_id id 4 / end
+
+::
+
+   flow create 4 ingress
+       pattern eth src is {VF 1 MAC} / end
+       actions port_id id 3 / end
+
+
+Sharing broadcasts
+~~~~~~~~~~~~~~~~~~
+
+From outside to PF and VFs
+
+::
+
+   flow create 3 ingress
+      pattern eth dst is ff:ff:ff:ff:ff:ff / end
+      actions port_id id 3 / port_id id 4 / port_id id 5 / end
+
+Note ``port_id id 3`` is necessary otherwise only VFs would receive matching
+traffic.
+
+From PF to outside and VFs
+
+::
+
+   flow create 3 egress
+      pattern eth dst is ff:ff:ff:ff:ff:ff / end
+      actions port / port_id id 4 / port_id id 5 / end
+
+From VFs to outside and PF
+
+::
+
+   flow create 4 ingress
+      pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 1 MAC} / end
+      actions port_id id 3 / port_id id 5 / end
+
+   flow create 5 ingress
+      pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 2 MAC} / end
+      actions port_id id 4 / port_id id 4 / end
+
+Similar ``33:33:*`` rules based on known MAC addresses should be added for
+IPv6 traffic.
+
+Encapsulating VF 2 traffic in VXLAN
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Assuming pass-through flow rules are supported
+
+::
+
+   flow create 5 ingress
+      pattern eth / end
+      actions vxlan_encap vni 42 / passthru / end
+
+::
+
+   flow create 5 egress
+      pattern vxlan vni is 42 / end
+      actions vxlan_decap / passthru / end
+
+Here ``passthru`` is needed since as described in `actions order and
+repetition`_, flow rules are otherwise terminating; if supported, a rule
+without a target endpoint will drop traffic.
+
+Without pass-through support, ingress encapsulation on the destination
+endpoint might not be supported and action list must provide one
+
+::
+
+   flow create 5 ingress
+      pattern eth src is {VF 2 MAC} / end
+      actions vxlan_encap vni 42 / port_id id 3 / end
+
+   flow create 3 ingress
+      pattern vxlan vni is 42 / end
+      actions vxlan_decap / port_id id 5 / end
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v6 2/8] ethdev: add switch identifier parameter to port
  2018-03-28 13:54 [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 1/8] doc: add switch representation documentation Declan Doherty
@ 2018-03-28 13:54 ` Declan Doherty
  2018-03-29  6:13   ` Shahaf Shuler
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 3/8] ethdev: add generic create/destroy ethdev APIs Declan Doherty
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Declan Doherty @ 2018-03-28 13:54 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Qi Zhang, Alejandro Lucero, Andrew Rybchenko,
	Mohammad Abdul Awal, Remy Horton, John McNamara, Rony Efraim, Wu,
	Jingjing, Lu, Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson,
	Bruce, Ananyev, Konstantin, Wang, Zhihong, Declan Doherty

Introduces a new port attribute to ethdev port's which denotes the
switch domain a port belongs to. By default all port's switch
identifiers are the their port_id. Ports which share a common switch
domain are configured with the same switch id.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 app/test-pmd/config.c              | 1 +
 lib/librte_ether/rte_ethdev.c      | 3 +++
 lib/librte_ether/rte_ethdev.h      | 1 +
 lib/librte_ether/rte_ethdev_core.h | 1 +
 4 files changed, 6 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 4bb255c62..e12f8c515 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -517,6 +517,7 @@ port_infos_display(portid_t port_id)
 	printf("Min possible number of TXDs per queue: %hu\n",
 		dev_info.tx_desc_lim.nb_min);
 	printf("TXDs number alignment: %hu\n", dev_info.tx_desc_lim.nb_align);
+	printf("Switch Id: %u\n", dev_info.switch_id);
 }
 
 void
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 23857c91f..f32d18cad 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -290,6 +290,8 @@ rte_eth_dev_allocate(const char *name)
 	eth_dev = eth_dev_get(port_id);
 	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
 	eth_dev->data->port_id = port_id;
+	eth_dev->data->switch_id = port_id;
+	/**< Default switch_id is the port_id of the device */
 	eth_dev->data->mtu = ETHER_MTU;
 
 unlock:
@@ -2395,6 +2397,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
 	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
+	dev_info->switch_id = dev->data->switch_id;
 }
 
 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 036153306..dced4fc41 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1029,6 +1029,7 @@ struct rte_eth_dev_info {
 	/** Configured number of rx/tx queues */
 	uint16_t nb_rx_queues; /**< Number of RX queues. */
 	uint16_t nb_tx_queues; /**< Number of TX queues. */
+	uint16_t switch_id; /**< Switch Domain Id */
 };
 
 /**
diff --git a/lib/librte_ether/rte_ethdev_core.h b/lib/librte_ether/rte_ethdev_core.h
index e5681e466..caed7a4e6 100644
--- a/lib/librte_ether/rte_ethdev_core.h
+++ b/lib/librte_ether/rte_ethdev_core.h
@@ -585,6 +585,7 @@ struct rte_eth_dev_data {
 	struct ether_addr* hash_mac_addrs;
 	/** Device Ethernet MAC addresses of hash filtering. */
 	uint16_t port_id;           /**< Device [external] port identifier. */
+	uint16_t switch_id;	    /**< Switch which port is associated with */
 	__extension__
 	uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
 		scattered_rx : 1,  /**< RX of scattered packets is ON(1) / OFF(0) */
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v6 3/8] ethdev: add generic create/destroy ethdev APIs
  2018-03-28 13:54 [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 1/8] doc: add switch representation documentation Declan Doherty
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 2/8] ethdev: add switch identifier parameter to port Declan Doherty
@ 2018-03-28 13:54 ` Declan Doherty
  2018-03-29  6:13   ` Shahaf Shuler
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag Declan Doherty
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Declan Doherty @ 2018-03-28 13:54 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Qi Zhang, Alejandro Lucero, Andrew Rybchenko,
	Mohammad Abdul Awal, Remy Horton, John McNamara, Rony Efraim, Wu,
	Jingjing, Lu, Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson,
	Bruce, Ananyev, Konstantin, Wang, Zhihong, Declan Doherty

Add new bus generic ethdev create/destroy APIs which are bus independent
and provide hooks for bus specific initialisation.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 lib/librte_ether/Makefile                 |  1 +
 lib/librte_ether/meson.build              |  1 +
 lib/librte_ether/rte_ethdev.c             | 96 ++++++++++++++++++++++++++++++-
 lib/librte_ether/rte_ethdev_driver.h      | 57 ++++++++++++++++++
 lib/librte_ether/rte_ethdev_pci.h         | 12 ++++
 lib/librte_ether/rte_ethdev_representor.h | 28 +++++++++
 lib/librte_ether/rte_ethdev_version.map   |  8 +++
 7 files changed, 202 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_ethdev_representor.h

diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 3ca5782bb..5698cd47b 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -32,6 +32,7 @@ SYMLINK-y-include += rte_ethdev_driver.h
 SYMLINK-y-include += rte_ethdev_core.h
 SYMLINK-y-include += rte_ethdev_pci.h
 SYMLINK-y-include += rte_ethdev_vdev.h
+SYMLINK-y-include += rte_ethdev_representor.h
 SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
diff --git a/lib/librte_ether/meson.build b/lib/librte_ether/meson.build
index 7fed86056..163891556 100644
--- a/lib/librte_ether/meson.build
+++ b/lib/librte_ether/meson.build
@@ -15,6 +15,7 @@ headers = files('rte_ethdev.h',
 	'rte_ethdev_core.h',
 	'rte_ethdev_pci.h',
 	'rte_ethdev_vdev.h',
+	'rte_ethdev_representor.h',
 	'rte_eth_ctrl.h',
 	'rte_dev_info.h',
 	'rte_flow.h',
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index f32d18cad..c719f84a3 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -345,7 +345,8 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
 	rte_eth_dev_shared_data_prepare();
 
 	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
-
+	eth_dev->device = NULL;
+	eth_dev->intr_handle = NULL;
 	eth_dev->state = RTE_ETH_DEV_UNUSED;
 
 	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
@@ -3403,6 +3404,99 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
 	return rte_memzone_reserve_aligned(z_name, size, socket_id, 0, align);
 }
 
+int __rte_experimental
+rte_eth_dev_create(struct rte_device *device, const char *name,
+	size_t priv_data_size,
+	ethdev_bus_specific_init ethdev_bus_specific_init,
+	void *bus_init_params,
+	ethdev_init_t ethdev_init, void *init_params)
+{
+	struct rte_eth_dev *ethdev;
+	int retval;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		ethdev = rte_eth_dev_allocate(name);
+		if (!ethdev) {
+			retval = -ENODEV;
+			goto probe_failed;
+		}
+
+		if (priv_data_size) {
+			ethdev->data->dev_private = rte_zmalloc_socket(
+				name, priv_data_size, RTE_CACHE_LINE_SIZE,
+				device->numa_node);
+
+			if (!ethdev->data->dev_private) {
+				RTE_LOG(ERR, EAL, "failed to allocate private data");
+				retval = -ENOMEM;
+				goto probe_failed;
+			}
+		}
+	} else {
+		ethdev = rte_eth_dev_attach_secondary(name);
+		if (!ethdev) {
+			RTE_LOG(ERR, EAL, "secondary process attach failed, "
+				"ethdev doesn't exist");
+			retval = -ENODEV;
+			goto probe_failed;
+		}
+	}
+
+	ethdev->device = device;
+
+	if (ethdev_bus_specific_init) {
+		retval = ethdev_bus_specific_init(ethdev, bus_init_params);
+		if (retval) {
+			RTE_LOG(ERR, EAL,
+				"ethdev bus specific initialisation failed");
+			goto probe_failed;
+		}
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_init, -EINVAL);
+	retval = ethdev_init(ethdev, init_params);
+	if (retval) {
+		RTE_LOG(ERR, EAL, "ethdev initialisation failed");
+		goto probe_failed;
+	}
+
+	return retval;
+probe_failed:
+	/* free ports private data if primary process */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_free(ethdev->data->dev_private);
+
+	rte_eth_dev_release_port(ethdev);
+
+	return retval;
+}
+
+int  __rte_experimental
+rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
+	ethdev_uninit_t ethdev_uninit)
+{
+	int ret;
+
+	ethdev = rte_eth_dev_allocated(ethdev->data->name);
+	if (!ethdev)
+		return -ENODEV;
+
+	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_uninit, -EINVAL);
+	if (ethdev_uninit) {
+		ret = ethdev_uninit(ethdev);
+		if (ret)
+			return ret;
+	}
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_free(ethdev->data->dev_private);
+
+	ethdev->data->dev_private = NULL;
+
+	return rte_eth_dev_release_port(ethdev);
+}
+
+
 int
 rte_eth_dev_rx_intr_ctl_q(uint16_t port_id, uint16_t queue_id,
 			  int epfd, int op, void *data)
diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
index 45f08c65e..4896cea93 100644
--- a/lib/librte_ether/rte_ethdev_driver.h
+++ b/lib/librte_ether/rte_ethdev_driver.h
@@ -125,6 +125,63 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev *eth_dev, const char *name,
 			 uint16_t queue_id, size_t size,
 			 unsigned align, int socket_id);
 
+
+typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void *init_params);
+typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
+	void *bus_specific_init_params);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * PMD helper function for the creation of a new ethdev ports.
+ *
+ * @param device
+ *  rte_device handle.
+ * @param	name
+ *  port name.
+ * @param priv_data_size
+ *  size of private data required for port.
+ * @param bus_specific_init
+ *  port bus specific initialisation callback function
+ * @param bus_init_params
+ *  port bus specific initialisation parameters
+ * @param ethdev_init
+ *  device specific port initialization callback function
+ * @param init_params
+ *  port initialisation parameters
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_dev_create(struct rte_device *device, const char *name,
+	size_t priv_data_size,
+	ethdev_bus_specific_init bus_specific_init, void *bus_init_params,
+	ethdev_init_t ethdev_init, void *init_params);
+
+
+typedef int (*ethdev_uninit_t)(struct rte_eth_dev *ethdev);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * PMD helper function for cleaing up the resources of a ethdev port on it's
+ * destruction.
+ *
+ * @param ethdev
+ *   ethdev handle of port.
+ * @param ethdev
+ *   device specific port un-initialise callback function
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
+	ethdev_uninit_t ethdev_uninit);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ether/rte_ethdev_pci.h b/lib/librte_ether/rte_ethdev_pci.h
index 897ce5b41..8604a0474 100644
--- a/lib/librte_ether/rte_ethdev_pci.h
+++ b/lib/librte_ether/rte_ethdev_pci.h
@@ -70,6 +70,18 @@ rte_eth_copy_pci_info(struct rte_eth_dev *eth_dev,
 	eth_dev->data->numa_node = pci_dev->device.numa_node;
 }
 
+static inline int
+eth_dev_pci_specific_init(struct rte_eth_dev *eth_dev, void *bus_device) {
+	struct rte_pci_device *pci_dev = bus_device;
+
+	if (!pci_dev)
+		return -ENODEV;
+
+	rte_eth_copy_pci_info(eth_dev, pci_dev);
+
+	return 0;
+}
+
 /**
  * @internal
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
diff --git a/lib/librte_ether/rte_ethdev_representor.h b/lib/librte_ether/rte_ethdev_representor.h
new file mode 100644
index 000000000..cbc1f2855
--- /dev/null
+++ b/lib/librte_ether/rte_ethdev_representor.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+
+#ifndef _RTE_ETHDEV_REPRESENTOR_H_
+#define _RTE_ETHDEV_REPRESENTOR_H_
+
+#include <rte_ethdev_driver.h>
+
+static int
+eth_dev_representor_port_init(struct rte_eth_dev *ethdev, void *init_params)
+{
+	struct rte_eth_dev *base_ethdev = init_params;
+
+	if (!ethdev || !base_ethdev)
+		return -ENODEV;
+
+	/** representor shares same driver as it's base device */
+	ethdev->device->driver = base_ethdev->device->driver;
+
+	/** representor inherits the switch id of it's base device */
+	ethdev->data->switch_id = base_ethdev->data->switch_id;
+
+	return 0;
+}
+
+#endif /* _RTE_ETHDEV_REPRESENTOR_H_ */
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 87f02fb74..48b08bc36 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -230,3 +230,11 @@ EXPERIMENTAL {
 	rte_mtr_stats_update;
 
 } DPDK_17.11;
+
+EXPERIMENTAL {
+	global:
+
+	rte_eth_dev_create;
+	rte_eth_dev_destroy;
+
+} DPDK_18.05;
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag
  2018-03-28 13:54 [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
                   ` (2 preceding siblings ...)
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 3/8] ethdev: add generic create/destroy ethdev APIs Declan Doherty
@ 2018-03-28 13:54 ` Declan Doherty
  2018-03-29  6:13   ` Shahaf Shuler
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 5/8] app/testpmd: add port name to device info Declan Doherty
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Declan Doherty @ 2018-03-28 13:54 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Qi Zhang, Alejandro Lucero, Andrew Rybchenko,
	Mohammad Abdul Awal, Remy Horton, John McNamara, Rony Efraim, Wu,
	Jingjing, Lu, Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson,
	Bruce, Ananyev, Konstantin, Wang, Zhihong, Declan Doherty

Add new device flag to specify that ethdev port is a port representor.
Extend rte_eth_dev_info structure to expose device flags to user which
enable applications to discover if a port is a representor port.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 lib/librte_ether/rte_ethdev.c             | 1 +
 lib/librte_ether/rte_ethdev.h             | 9 ++++++---
 lib/librte_ether/rte_ethdev_representor.h | 3 +++
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index c719f84a3..163246433 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2399,6 +2399,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
 	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
 	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
 	dev_info->switch_id = dev->data->switch_id;
+	dev_info->dev_flags = dev->data->dev_flags;
 }
 
 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index dced4fc41..226acc8b1 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -996,6 +996,7 @@ struct rte_eth_dev_info {
 	const char *driver_name; /**< Device Driver name. */
 	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
 		Use if_indextoname() to translate into an interface name. */
+	uint32_t dev_flags; /**< Device flags */
 	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
 	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
 	uint16_t max_rx_queues; /**< Maximum number of RX queues. */
@@ -1229,11 +1230,13 @@ struct rte_eth_dev_owner {
 };
 
 /** Device supports link state interrupt */
-#define RTE_ETH_DEV_INTR_LSC     0x0002
+#define RTE_ETH_DEV_INTR_LSC		0x0002
 /** Device is a bonded slave */
-#define RTE_ETH_DEV_BONDED_SLAVE 0x0004
+#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
 /** Device supports device removal interrupt */
-#define RTE_ETH_DEV_INTR_RMV     0x0008
+#define RTE_ETH_DEV_INTR_RMV		0x0008
+/** Device is port representor */
+#define RTE_ETH_DEV_REPRESENTOR		0x0010
 
 /**
  * @warning
diff --git a/lib/librte_ether/rte_ethdev_representor.h b/lib/librte_ether/rte_ethdev_representor.h
index cbc1f2855..f3726d0ba 100644
--- a/lib/librte_ether/rte_ethdev_representor.h
+++ b/lib/librte_ether/rte_ethdev_representor.h
@@ -22,6 +22,9 @@ eth_dev_representor_port_init(struct rte_eth_dev *ethdev, void *init_params)
 	/** representor inherits the switch id of it's base device */
 	ethdev->data->switch_id = base_ethdev->data->switch_id;
 
+	/** Set device flags to specify that device is a representor port */
+	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
+
 	return 0;
 }
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v6 5/8] app/testpmd: add port name to device info
  2018-03-28 13:54 [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
                   ` (3 preceding siblings ...)
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag Declan Doherty
@ 2018-03-28 13:54 ` Declan Doherty
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 6/8] ethdev: add common devargs parser Declan Doherty
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Declan Doherty @ 2018-03-28 13:54 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Qi Zhang, Alejandro Lucero, Andrew Rybchenko,
	Mohammad Abdul Awal, Remy Horton, John McNamara, Rony Efraim, Wu,
	Jingjing, Lu, Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson,
	Bruce, Ananyev, Konstantin, Wang, Zhihong, Declan Doherty

Add the port name to information printed by show port info <port_id>

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 app/test-pmd/config.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index e12f8c515..0fbdfdcdd 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -407,6 +407,7 @@ port_infos_display(portid_t port_id)
 	static const char *info_border = "*********************";
 	portid_t pid;
 	uint16_t mtu;
+	char name[RTE_ETH_NAME_MAX_LEN];
 
 	if (port_id_is_invalid(port_id, ENABLED_WARN)) {
 		printf("Valid port range is [0");
@@ -423,6 +424,8 @@ port_infos_display(portid_t port_id)
 	       info_border, port_id, info_border);
 	rte_eth_macaddr_get(port_id, &mac_addr);
 	print_ethaddr("MAC address: ", &mac_addr);
+	rte_eth_dev_get_name_by_port(port_id, name);
+	printf("\nDevice name: %s", name);
 	printf("\nDriver name: %s", dev_info.driver_name);
 	printf("\nConnect to socket: %u", port->socket_id);
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v6 6/8] ethdev: add common devargs parser
  2018-03-28 13:54 [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
                   ` (4 preceding siblings ...)
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 5/8] app/testpmd: add port name to device info Declan Doherty
@ 2018-03-28 13:54 ` Declan Doherty
  2018-03-29 12:12   ` Gaëtan Rivet
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 7/8] net/i40e: add support for representor ports Declan Doherty
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Declan Doherty @ 2018-03-28 13:54 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Qi Zhang, Alejandro Lucero, Andrew Rybchenko,
	Mohammad Abdul Awal, Remy Horton, John McNamara, Rony Efraim, Wu,
	Jingjing, Lu, Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson,
	Bruce, Ananyev, Konstantin, Wang, Zhihong, Declan Doherty

From: Remy Horton <remy.horton@intel.com>

Introduces a new structure, rte_eth_devargs, to support generic
ethdev arguments common across NET PMDs, with a new API
rte_eth_devargs_parse API to support PMD parsing these arguments.

Signed-off-by: Remy Horton <remy.horton@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 lib/Makefile                            |   1 +
 lib/librte_ether/rte_ethdev.c           | 191 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev_driver.h    |  30 +++++
 lib/librte_ether/rte_ethdev_version.map |   1 +
 4 files changed, 223 insertions(+)

diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..4144d99f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -21,6 +21,7 @@ DEPDIRS-librte_cmdline := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
 DEPDIRS-librte_ether := librte_net librte_eal librte_mempool librte_ring
 DEPDIRS-librte_ether += librte_mbuf
+DEPDIRS-librte_ether += librte_kvargs
 DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += librte_bbdev
 DEPDIRS-librte_bbdev := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 163246433..cdab23feb 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -35,6 +35,7 @@
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
 #include <rte_compat.h>
+#include <rte_kvargs.h>
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
@@ -4262,3 +4263,193 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
 
 	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
 }
+
+typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
+
+static int
+rte_eth_devargs_tokenise(struct rte_kvargs *arglist, const char *str_in)
+{
+	int state;
+	struct rte_kvargs_pair *pair;
+	char *letter;
+
+	arglist->str = strdup(str_in);
+	if (arglist->str == NULL)
+		return -ENOMEM;
+
+	letter = arglist->str;
+	state = 0;
+	arglist->count = 0;
+	pair = &arglist->pairs[0];
+	while (1) {
+		switch (state) {
+		case 0: /* Initial */
+			if (*letter == '=')
+				return -EINVAL;
+			else if (*letter == '\0')
+				return 0;
+			state = 1;
+			pair->key = letter;
+			/* fall-thru */
+
+		case 1: /* Parsing key */
+			if (*letter == '=') {
+				*letter = '\0';
+				pair->value = letter + 1;
+				state = 2;
+			} else if (*letter == ',' || *letter == '\0')
+				return -EINVAL;
+			break;
+
+
+		case 2: /* Parsing value */
+			if (*letter == '[')
+				state = 3;
+			else if (*letter == ',' || *letter == '\0') {
+				*letter = '\0';
+				arglist->count++;
+				pair = &arglist->pairs[arglist->count];
+				state = 0;
+			}
+			break;
+
+		case 3: /* Parsing list */
+			if (*letter == ']')
+				state = 2;
+			else if (*letter == '\0')
+				return -EINVAL;
+			break;
+		}
+		letter++;
+	}
+}
+
+
+static int
+rte_eth_devargs_parse_list(char *str, rte_eth_devargs_callback_t callback,
+	void *data)
+{
+	char *str_start;
+	int state;
+	int result;
+
+	if (*str != '[')
+		/* Single element, not a list */
+		return callback(str, data);
+
+	/* Sanity check, then strip the brackets */
+	str_start = &str[strlen(str) - 1];
+	if (*str_start != ']') {
+		RTE_LOG(ERR, EAL, "(%s): List does not end with ']'", str);
+		return -EINVAL;
+	}
+	str++;
+	*str_start = '\0';
+
+	/* Process list elements */
+	state = 0;
+	while (1) {
+		if (state == 0) {
+			if (*str == '\0')
+				break;
+			if (*str != ',') {
+				str_start = str;
+				state = 1;
+			}
+		} else if (state == 1) {
+			if (*str == ',' || *str == '\0') {
+				if (str > str_start) {
+					/* Non-empty string fragment */
+					*str = '\0';
+					result = callback(str_start, data);
+					if (result < 0)
+						return result;
+				}
+				state = 0;
+			}
+		}
+		str++;
+	}
+	return 0;
+}
+
+static int
+rte_eth_devargs_process_range(char *str, uint16_t *list, uint16_t *len_list,
+	const uint16_t max_list)
+{
+	unsigned int lo;
+	unsigned int hi;
+	unsigned int value;
+	int result;
+
+	result = sscanf(str, "%u-%u", &lo, &hi);
+	if (result == 1) {
+		if (*len_list >= max_list)
+			return -ENOMEM;
+		list[(*len_list)++] = lo;
+	} else if (result == 2) {
+		if (lo >= hi)
+			return -EINVAL;
+		for (value = lo; value <= hi; value++) {
+			if (*len_list >= max_list)
+				return -ENOMEM;
+			list[(*len_list)++] = value;
+		}
+	} else
+		return -EINVAL;
+	return 0;
+}
+
+static int
+rte_eth_devargs_parse_ports(char *str, void *data)
+{
+	struct rte_eth_devargs *eth_da = data;
+
+	return rte_eth_devargs_process_range(str, eth_da->ports,
+		&eth_da->nb_ports, RTE_MAX_ETHPORTS);
+}
+
+
+static int
+rte_eth_devargs_parse_representor_ports(char *str, void *data)
+{
+	struct rte_eth_devargs *eth_da = data;
+
+	return rte_eth_devargs_process_range(str, eth_da->representor_ports,
+		&eth_da->nb_representor_ports, RTE_MAX_ETHPORTS);
+}
+
+int __rte_experimental
+rte_eth_devargs_parse(const struct rte_devargs * const da,
+	struct rte_eth_devargs *eth_da)
+{
+	struct rte_kvargs args;
+	struct rte_kvargs_pair *pair;
+	unsigned int i;
+	int result;
+
+	memset(eth_da, 0, sizeof(*eth_da));
+
+	result = rte_eth_devargs_tokenise(&args, da->args);
+	if (result < 0)
+		return result;
+
+	for (i = 0; i < args.count; i++) {
+		pair = &args.pairs[i];
+
+		if (strcmp("port", pair->key) == 0) {
+			result = rte_eth_devargs_parse_list(pair->value,
+				rte_eth_devargs_parse_ports, eth_da);
+			if (result < 0)
+				return result;
+		} else if (strcmp("representor", pair->key) == 0) {
+			result = rte_eth_devargs_parse_list(pair->value,
+				rte_eth_devargs_parse_representor_ports,
+				eth_da);
+			if (result < 0)
+				return result;
+		}
+	}
+
+	return 0;
+}
diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
index 4896cea93..2e9ce96a6 100644
--- a/lib/librte_ether/rte_ethdev_driver.h
+++ b/lib/librte_ether/rte_ethdev_driver.h
@@ -126,6 +126,36 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev *eth_dev, const char *name,
 			 unsigned align, int socket_id);
 
 
+/** Generic Ethernet device arguments  */
+struct rte_eth_devargs {
+	uint16_t ports[RTE_MAX_ETHPORTS];
+	/** port/s number to enable on a multi-port single function */
+	uint16_t nb_ports;
+	/** number of ports in ports field */
+	uint16_t representor_ports[RTE_MAX_ETHPORTS];
+	/** representor port/s identifier to enable on device */
+	uint16_t nb_representor_ports;
+	/** number of ports in representor port field */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * PMD helper function to parse ethdev arguments
+ *
+ * @param devargs
+ *  device arguments
+ * @param eth_devargs
+ *  parsed ethdev specific arguments.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_devargs_parse(const struct rte_devargs * const devargs,
+	struct rte_eth_devargs *eth_devargs);
+
 typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void *init_params);
 typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
 	void *bus_specific_init_params);
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 48b08bc36..99d396c5f 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -234,6 +234,7 @@ EXPERIMENTAL {
 EXPERIMENTAL {
 	global:
 
+	rt_eth_devargs_parse;
 	rte_eth_dev_create;
 	rte_eth_dev_destroy;
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v6 7/8] net/i40e: add support for representor ports
  2018-03-28 13:54 [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
                   ` (5 preceding siblings ...)
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 6/8] ethdev: add common devargs parser Declan Doherty
@ 2018-03-28 13:54 ` Declan Doherty
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 8/8] net/ixgbe: " Declan Doherty
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
  8 siblings, 0 replies; 73+ messages in thread
From: Declan Doherty @ 2018-03-28 13:54 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Qi Zhang, Alejandro Lucero, Andrew Rybchenko,
	Mohammad Abdul Awal, Remy Horton, John McNamara, Rony Efraim, Wu,
	Jingjing, Lu, Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson,
	Bruce, Ananyev, Konstantin, Wang, Zhihong, Declan Doherty

Add support for virtual function representor ports to the i40e PF driver.
When SR-IOV virtual functions devices are enabled a corresponding
representor port for each VF can be enabled in the process in which the
i40e PMD is running within, by specifying the representor devarg with
the list of VF ports that representors are to be created for.

An example of the devargs which would create VF representor for virtual
functions 0,2,4,5,6 and 7 is:

-w pci:D:B:D.F,representor=[0,2,4-7]

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 drivers/net/i40e/Makefile              |   3 +
 drivers/net/i40e/i40e_ethdev.c         |  71 +++++-
 drivers/net/i40e/i40e_ethdev.h         |  12 +
 drivers/net/i40e/i40e_vf_representor.c | 392 +++++++++++++++++++++++++++++++++
 drivers/net/i40e/meson.build           |   4 +-
 drivers/net/i40e/rte_pmd_i40e.c        |  43 ++++
 drivers/net/i40e/rte_pmd_i40e.h        |  18 ++
 7 files changed, 535 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_vf_representor.c

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 5663f5b1c..6184b38f3 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -11,6 +11,8 @@ LIB = librte_pmd_i40e.a
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -DPF_DRIVER -DVF_DRIVER -DINTEGRATED_VF
 CFLAGS += -DX722_A0_SUPPORT
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash
 LDLIBS += -lrte_bus_pci
@@ -85,6 +87,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += rte_pmd_i40e.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_tm.c
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_vf_representor.c
 
 ifeq ($(findstring RTE_MACHINE_CPUFLAG_AVX2,$(CFLAGS)),RTE_MACHINE_CPUFLAG_AVX2)
 	CC_AVX2_SUPPORT=1
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 508b4171c..397d834ed 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -18,6 +18,7 @@
 #include <rte_ether.h>
 #include <rte_ethdev_driver.h>
 #include <rte_ethdev_pci.h>
+#include <rte_ethdev_representor.h>
 #include <rte_memzone.h>
 #include <rte_malloc.h>
 #include <rte_memcpy.h>
@@ -213,7 +214,7 @@
 /* Bit mask of Extended Tag enable/disable */
 #define PCI_DEV_CTRL_EXT_TAG_MASK  (1 << PCI_DEV_CTRL_EXT_TAG_SHIFT)
 
-static int eth_i40e_dev_init(struct rte_eth_dev *eth_dev);
+static int eth_i40e_dev_init(struct rte_eth_dev *eth_dev, void *init_params);
 static int eth_i40e_dev_uninit(struct rte_eth_dev *eth_dev);
 static int i40e_dev_configure(struct rte_eth_dev *dev);
 static int i40e_dev_start(struct rte_eth_dev *dev);
@@ -607,16 +608,72 @@ static const struct rte_i40e_xstats_name_off rte_i40e_txq_prio_strings[] = {
 #define I40E_NB_TXQ_PRIO_XSTATS (sizeof(rte_i40e_txq_prio_strings) / \
 		sizeof(rte_i40e_txq_prio_strings[0]))
 
-static int eth_i40e_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+
+static int
+eth_i40e_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_probe(pci_dev,
-		sizeof(struct i40e_adapter), eth_i40e_dev_init);
+	char name[RTE_ETH_NAME_MAX_LEN];
+	struct rte_eth_devargs eth_da = { .nb_representor_ports = 0 };
+	int i, retval;
+
+	retval = rte_eth_devargs_parse(pci_dev->device.devargs, &eth_da);
+	if (retval)
+		return retval;
+
+	/* physical port net_bdf_port */
+	snprintf(name, sizeof(name), "net_%s", pci_dev->device.name);
+
+	retval = rte_eth_dev_create(&pci_dev->device, name,
+		sizeof(struct i40e_adapter),
+		eth_dev_pci_specific_init, pci_dev,
+		eth_i40e_dev_init, NULL);
+
+	if (retval || eth_da.nb_representor_ports < 1)
+		return retval;
+
+	/* probe VF representor ports */
+	struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated(name);
+
+	if (pf_ethdev == NULL)
+		return -ENODEV;
+
+	for (i = 0; i < eth_da.nb_representor_ports; i++) {
+		struct i40e_vf_representor representor = {
+			.vf_id = eth_da.representor_ports[i],
+			.adapter = I40E_DEV_PRIVATE_TO_ADAPTER(
+				pf_ethdev->data->dev_private)
+		};
+
+		/* representor port net_bdf_port */
+		snprintf(name, sizeof(name), "net_%s_representor_%d",
+			pci_dev->device.name, eth_da.representor_ports[i]);
+
+		retval = rte_eth_dev_create(&pci_dev->device, name,
+			sizeof(struct i40e_vf_representor),
+			eth_dev_representor_port_init, pf_ethdev,
+			i40e_vf_representor_init, &representor);
+
+		if (retval)
+			PMD_DRV_LOG(ERR, "failed to create i40e vf "
+				"representor %s.", name);
+	}
+
+	return 0;
 }
 
 static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_remove(pci_dev, eth_i40e_dev_uninit);
+	struct rte_eth_dev *ethdev;
+
+	ethdev = rte_eth_dev_allocated(pci_dev->device.name);
+	if (!ethdev)
+		return -ENODEV;
+
+	if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+		return rte_eth_dev_destroy(ethdev, i40e_vf_representor_uninit);
+	else
+		return rte_eth_dev_destroy(ethdev, eth_i40e_dev_uninit);
 }
 
 static struct rte_pci_driver rte_i40e_pmd = {
@@ -1118,7 +1175,7 @@ i40e_support_multi_driver(struct rte_eth_dev *dev)
 }
 
 static int
-eth_i40e_dev_init(struct rte_eth_dev *dev)
+eth_i40e_dev_init(struct rte_eth_dev *dev, void *init_params __rte_unused)
 {
 	struct rte_pci_device *pci_dev;
 	struct rte_intr_handle *intr_handle;
@@ -2339,7 +2396,7 @@ i40e_dev_reset(struct rte_eth_dev *dev)
 	if (ret)
 		return ret;
 
-	ret = eth_i40e_dev_init(dev);
+	ret = eth_i40e_dev_init(dev, NULL);
 
 	return ret;
 }
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 99efb6707..2d3666e39 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -1058,6 +1058,16 @@ struct i40e_adapter {
 	uint64_t pctypes_mask;
 };
 
+/**
+ * Strucute to store private data for each VF representor instance
+ */
+struct i40e_vf_representor {
+	uint16_t vf_id;
+	/**< Virtual Function ID */
+	struct i40e_adapter *adapter;
+	/**< Private data store of assocaiated physical function */
+};
+
 extern const struct rte_flow_ops i40e_flow_ops;
 
 union i40e_filter_t {
@@ -1216,6 +1226,8 @@ int i40e_set_rss_key(struct i40e_vsi *vsi, uint8_t *key, uint8_t key_len);
 int i40e_set_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size);
 int i40e_config_rss_filter(struct i40e_pf *pf,
 		struct i40e_rte_flow_rss_conf *conf, bool add);
+int i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params);
+int i40e_vf_representor_uninit(struct rte_eth_dev *ethdev);
 
 #define I40E_DEV_TO_PCI(eth_dev) \
 	RTE_DEV_TO_PCI((eth_dev)->device)
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
new file mode 100644
index 000000000..be14c5892
--- /dev/null
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -0,0 +1,392 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <rte_bus_pci.h>
+#include <rte_ethdev.h>
+#include <rte_pci.h>
+#include <rte_malloc.h>
+
+#include "base/i40e_type.h"
+#include "base/virtchnl.h"
+#include "i40e_ethdev.h"
+#include "i40e_rxtx.h"
+#include "rte_pmd_i40e.h"
+
+static int
+i40e_vf_representor_link_update(struct rte_eth_dev *ethdev,
+	int wait_to_complete)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return i40e_dev_link_update(representor->adapter->eth_dev,
+		wait_to_complete);
+}
+static void
+i40e_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
+	struct rte_eth_dev_info *dev_info)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	/* get dev info for the vdev */
+	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(representor->adapter->eth_dev);
+
+	dev_info->max_rx_queues = ethdev->data->nb_rx_queues;
+	dev_info->max_tx_queues = ethdev->data->nb_tx_queues;
+
+	dev_info->min_rx_bufsize = I40E_BUF_SIZE_MIN;
+	dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
+	dev_info->hash_key_size = (I40E_VFQF_HKEY_MAX_INDEX + 1) *
+		sizeof(uint32_t);
+	dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
+	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
+	dev_info->max_mac_addrs = I40E_NUM_MACADDR_MAX;
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP |
+		DEV_RX_OFFLOAD_IPV4_CKSUM |
+		DEV_RX_OFFLOAD_UDP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_CKSUM;
+	dev_info->tx_offload_capa =
+		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_QINQ_INSERT |
+		DEV_TX_OFFLOAD_IPV4_CKSUM |
+		DEV_TX_OFFLOAD_UDP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_CKSUM |
+		DEV_TX_OFFLOAD_SCTP_CKSUM |
+		DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM |
+		DEV_TX_OFFLOAD_TCP_TSO |
+		DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
+		DEV_TX_OFFLOAD_GRE_TNL_TSO |
+		DEV_TX_OFFLOAD_IPIP_TNL_TSO |
+		DEV_TX_OFFLOAD_GENEVE_TNL_TSO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_thresh = {
+			.pthresh = I40E_DEFAULT_RX_PTHRESH,
+			.hthresh = I40E_DEFAULT_RX_HTHRESH,
+			.wthresh = I40E_DEFAULT_RX_WTHRESH,
+		},
+		.rx_free_thresh = I40E_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_thresh = {
+			.pthresh = I40E_DEFAULT_TX_PTHRESH,
+			.hthresh = I40E_DEFAULT_TX_HTHRESH,
+			.wthresh = I40E_DEFAULT_TX_WTHRESH,
+		},
+		.tx_free_thresh = I40E_DEFAULT_TX_FREE_THRESH,
+		.tx_rs_thresh = I40E_DEFAULT_TX_RSBIT_THRESH,
+		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
+				ETH_TXQ_FLAGS_NOOFFLOADS,
+	};
+
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = I40E_MAX_RING_DESC,
+		.nb_min = I40E_MIN_RING_DESC,
+		.nb_align = I40E_ALIGN_RING_DESC,
+	};
+
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = I40E_MAX_RING_DESC,
+		.nb_min = I40E_MIN_RING_DESC,
+		.nb_align = I40E_ALIGN_RING_DESC,
+	};
+}
+
+static int
+i40e_vf_representor_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+i40e_vf_representor_dev_start(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static void
+i40e_vf_representor_dev_stop(__rte_unused struct rte_eth_dev *dev)
+{
+}
+
+static int
+i40e_vf_representor_rx_queue_setup(__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_rxconf *rx_conf,
+	__rte_unused struct rte_mempool *mb_pool)
+{
+	return 0;
+}
+
+static int
+i40e_vf_representor_tx_queue_setup(__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_txconf *tx_conf)
+{
+	return 0;
+}
+
+static int
+i40e_vf_representor_stats_get(struct rte_eth_dev *ethdev,
+		struct rte_eth_stats *stats)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return rte_pmd_i40e_get_vf_stats(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, stats);
+}
+
+static void
+i40e_vf_representor_stats_reset(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_reset_vf_stats(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id);
+}
+
+static void
+i40e_vf_representor_promiscuous_enable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_unicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, 1);
+}
+
+static void
+i40e_vf_representor_promiscuous_disable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_unicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, 0);
+}
+
+
+static void
+i40e_vf_representor_allmulticast_enable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_multicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id,  1);
+}
+
+static void
+i40e_vf_representor_allmulticast_disable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_multicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id,  0);
+}
+
+static void
+i40e_vf_representor_mac_addr_remove(struct rte_eth_dev *ethdev, uint32_t index)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_remove_vf_mac_addr(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, &ethdev->data->mac_addrs[index]);
+}
+
+static void
+i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
+		struct ether_addr *mac_addr)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_mac_addr(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, mac_addr);
+}
+
+static int
+i40e_vf_representor_vlan_filter_set(struct rte_eth_dev *ethdev,
+		uint16_t vlan_id, int on)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+	uint64_t vf_mask = 1ULL << representor->vf_id;
+
+	return rte_pmd_i40e_set_vf_vlan_filter(
+		representor->adapter->eth_dev->data->port_id,
+		vlan_id, vf_mask, on);
+}
+
+static int
+i40e_vf_representor_vlan_offload_set(struct rte_eth_dev *ethdev, int mask)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+	struct rte_eth_dev *pdev;
+	struct i40e_pf_vf *vf;
+	struct i40e_vsi *vsi;
+	struct i40e_pf *pf;
+	uint32_t vfid;
+
+	pdev = representor->adapter->eth_dev;
+	vfid = representor->vf_id;
+
+	if (!is_i40e_supported(pdev)) {
+		PMD_DRV_LOG(ERR, "Invalid PF dev.");
+		return -EINVAL;
+	}
+
+	pf = I40E_DEV_PRIVATE_TO_PF(pdev->data->dev_private);
+
+	if (vfid >= pf->vf_num || !pf->vfs) {
+		PMD_DRV_LOG(ERR, "Invalid VF ID.");
+		return -EINVAL;
+	}
+
+	vf = &pf->vfs[vfid];
+	vsi = vf->vsi;
+	if (!vsi) {
+		PMD_DRV_LOG(ERR, "Invalid VSI.");
+		return -EINVAL;
+	}
+
+	if (mask & ETH_VLAN_FILTER_MASK) {
+		/* Enable or disable VLAN filtering offload */
+		if (ethdev->data->dev_conf.rxmode.hw_vlan_filter)
+			return i40e_vsi_config_vlan_filter(vsi, TRUE);
+		else
+			return i40e_vsi_config_vlan_filter(vsi, FALSE);
+	}
+
+	if (mask & ETH_VLAN_STRIP_MASK) {
+		/* Enable or disable VLAN stripping offload */
+		if (ethdev->data->dev_conf.rxmode.hw_vlan_strip)
+			return i40e_vsi_config_vlan_stripping(vsi, TRUE);
+		else
+			return i40e_vsi_config_vlan_stripping(vsi, FALSE);
+	}
+
+	return -EINVAL;
+}
+
+static void
+i40e_vf_representor_vlan_strip_queue_set(struct rte_eth_dev *ethdev,
+	__rte_unused uint16_t rx_queue_id, int on)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_vlan_stripq(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, on);
+}
+
+static int
+i40e_vf_representor_vlan_pvid_set(struct rte_eth_dev *ethdev, uint16_t vlan_id,
+	__rte_unused int on)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return rte_pmd_i40e_set_vf_vlan_insert(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, vlan_id);
+}
+
+struct eth_dev_ops i40e_representor_dev_ops = {
+	.dev_infos_get        = i40e_vf_representor_dev_infos_get,
+
+	.dev_start            = i40e_vf_representor_dev_start,
+	.dev_configure        = i40e_vf_representor_dev_configure,
+	.dev_stop             = i40e_vf_representor_dev_stop,
+
+	.rx_queue_setup       = i40e_vf_representor_rx_queue_setup,
+	.tx_queue_setup       = i40e_vf_representor_tx_queue_setup,
+
+	.link_update          = i40e_vf_representor_link_update,
+
+	.stats_get            = i40e_vf_representor_stats_get,
+	.stats_reset          = i40e_vf_representor_stats_reset,
+
+	.promiscuous_enable   = i40e_vf_representor_promiscuous_enable,
+	.promiscuous_disable  = i40e_vf_representor_promiscuous_disable,
+
+	.allmulticast_enable  = i40e_vf_representor_allmulticast_enable,
+	.allmulticast_disable = i40e_vf_representor_allmulticast_disable,
+
+	.mac_addr_remove      = i40e_vf_representor_mac_addr_remove,
+	.mac_addr_set         = i40e_vf_representor_mac_addr_set,
+
+	.vlan_filter_set      = i40e_vf_representor_vlan_filter_set,
+	.vlan_offload_set     = i40e_vf_representor_vlan_offload_set,
+	.vlan_strip_queue_set = i40e_vf_representor_vlan_strip_queue_set,
+	.vlan_pvid_set        = i40e_vf_representor_vlan_pvid_set
+
+};
+
+
+int
+i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	struct i40e_pf *pf;
+	struct i40e_pf_vf *vf;
+	struct rte_eth_link *link;
+
+	representor->vf_id =
+		((struct i40e_vf_representor *)init_params)->vf_id;
+	representor->adapter =
+		((struct i40e_vf_representor *)init_params)->adapter;
+
+	pf = I40E_DEV_PRIVATE_TO_PF(
+		representor->adapter->eth_dev->data->dev_private);
+
+	if (representor->vf_id >= pf->vf_num)
+		return -ENODEV;
+
+	/* Set representor device ops */
+	ethdev->dev_ops = &i40e_representor_dev_ops;
+
+	/* No data-path so no RX/TX functions */
+	ethdev->rx_pkt_burst = NULL;
+	ethdev->tx_pkt_burst = NULL;
+
+	vf = &pf->vfs[representor->vf_id];
+
+	if (!vf->vsi) {
+		PMD_DRV_LOG(ERR, "Invalid VSI.");
+		return -ENODEV;
+	}
+	/* Setting the number queues allocated to the VF */
+	ethdev->data->nb_rx_queues = vf->vsi->nb_qps;
+	ethdev->data->nb_tx_queues = vf->vsi->nb_qps;
+
+	ethdev->data->mac_addrs = &vf->mac_addr;
+
+	/* Link state. Inherited from PF */
+	link = &representor->adapter->eth_dev->data->dev_link;
+
+	ethdev->data->dev_link.link_speed = link->link_speed;
+	ethdev->data->dev_link.link_duplex = link->link_duplex;
+	ethdev->data->dev_link.link_status = link->link_status;
+	ethdev->data->dev_link.link_autoneg = link->link_autoneg;
+
+	return 0;
+}
+
+
+int
+i40e_vf_representor_uninit(struct rte_eth_dev *ethdev __rte_unused)
+{
+	return 0;
+}
diff --git a/drivers/net/i40e/meson.build b/drivers/net/i40e/meson.build
index 8764b0e5b..706b5fce4 100644
--- a/drivers/net/i40e/meson.build
+++ b/drivers/net/i40e/meson.build
@@ -4,7 +4,8 @@
 cflags += ['-DPF_DRIVER',
 	'-DVF_DRIVER',
 	'-DINTEGRATED_VF',
-	'-DX722_A0_SUPPORT']
+	'-DX722_A0_SUPPORT',
+	'-DALLOW_EXPERIMENTAL_API']
 
 subdir('base')
 objs = [base_objs]
@@ -17,6 +18,7 @@ sources = files(
 	'i40e_fdir.c',
 	'i40e_flow.c',
 	'i40e_tm.c',
+	'i40e_vf_representor.c',
 	'rte_pmd_i40e.c'
 	)
 
diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index dae59e6dc..59680d9d6 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -570,6 +570,49 @@ rte_pmd_i40e_set_vf_mac_addr(uint16_t port, uint16_t vf_id,
 	return 0;
 }
 
+static const struct ether_addr null_mac_addr;
+
+int
+rte_pmd_i40e_remove_vf_mac_addr(uint16_t port, uint16_t vf_id,
+	struct ether_addr *mac_addr)
+{
+	struct rte_eth_dev *dev;
+	struct i40e_pf_vf *vf;
+	struct i40e_vsi *vsi;
+	struct i40e_pf *pf;
+
+	if (i40e_validate_mac_addr((u8 *)mac_addr) != I40E_SUCCESS)
+		return -EINVAL;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+	dev = &rte_eth_devices[port];
+
+	if (!is_i40e_supported(dev))
+		return -ENOTSUP;
+
+	pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+
+	if (vf_id >= pf->vf_num || !pf->vfs)
+		return -EINVAL;
+
+	vf = &pf->vfs[vf_id];
+	vsi = vf->vsi;
+	if (!vsi) {
+		PMD_DRV_LOG(ERR, "Invalid VSI.");
+		return -EINVAL;
+	}
+
+	if (is_same_ether_addr(mac_addr, &vf->mac_addr))
+		/* Reset the mac with NULL address */
+		ether_addr_copy(&null_mac_addr, &vf->mac_addr);
+
+	/* Remove the mac */
+	i40e_vsi_delete_mac(vsi, mac_addr);
+
+	return 0;
+}
+
 /* Set vlan strip on/off for specific VF from host */
 int
 rte_pmd_i40e_set_vf_vlan_stripq(uint16_t port, uint16_t vf_id, uint8_t on)
diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index d248adb1a..be4a6024a 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -455,6 +455,24 @@ int rte_pmd_i40e_set_vf_multicast_promisc(uint16_t port,
 int rte_pmd_i40e_set_vf_mac_addr(uint16_t port, uint16_t vf_id,
 				 struct ether_addr *mac_addr);
 
+/**
+ * Remove the VF MAC address.
+ *
+ * @param port
+ *   The port identifier of the Ethernet device.
+ * @param vf_id
+ *   VF id.
+ * @param mac_addr
+ *   VF MAC address.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port* invalid.
+ *   - (-EINVAL) if *vf* or *mac_addr* is invalid.
+ */
+int
+rte_pmd_i40e_remove_vf_mac_addr(uint16_t port, uint16_t vf_id,
+	struct ether_addr *mac_addr);
+
 /**
  * Enable/Disable vf vlan strip for all queues in a pool
  *
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v6 8/8] net/ixgbe: add support for representor ports
  2018-03-28 13:54 [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
                   ` (6 preceding siblings ...)
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 7/8] net/i40e: add support for representor ports Declan Doherty
@ 2018-03-28 13:54 ` Declan Doherty
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
  8 siblings, 0 replies; 73+ messages in thread
From: Declan Doherty @ 2018-03-28 13:54 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Qi Zhang, Alejandro Lucero, Andrew Rybchenko,
	Mohammad Abdul Awal, Remy Horton, John McNamara, Rony Efraim, Wu,
	Jingjing, Lu, Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson,
	Bruce, Ananyev, Konstantin, Wang, Zhihong, Declan Doherty

Add support for virtual function representor ports to the ixgbe PF driver.
When SR-IOV virtual functions devices are enabled a corresponding
representor port for each VF can be enabled in the process in which the
i40e PMD is running within, by specifying the representor devarg with
the list of VF ports that representors are to be created for.

An example of the devargs which would create VF representor for virtual
functions 0,2,4,5,6 and 7 is:

-w pci:D:B:D.F,representor=[0,2,4-7]

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 drivers/net/ixgbe/Makefile               |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c         |  70 +++++++++--
 drivers/net/ixgbe/ixgbe_ethdev.h         |  12 ++
 drivers/net/ixgbe/ixgbe_vf_representor.c | 210 +++++++++++++++++++++++++++++++
 drivers/net/ixgbe/meson.build            |   4 +-
 5 files changed, 288 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/ixgbe/ixgbe_vf_representor.c

diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index d0804fc5b..f8725aebb 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -103,6 +103,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_ipsec.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += rte_pmd_ixgbe.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_tm.c
+SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_vf_representor.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)-include := rte_pmd_ixgbe.h
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 448325857..632e3e116 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -28,6 +28,7 @@
 #include <rte_ether.h>
 #include <rte_ethdev_driver.h>
 #include <rte_ethdev_pci.h>
+#include <rte_ethdev_representor.h>
 #include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_dev.h>
@@ -133,7 +134,7 @@
 #define IXGBE_EXVET_VET_EXT_SHIFT              16
 #define IXGBE_DMATXCTL_VT_MASK                 0xFFFF0000
 
-static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
+static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params);
 static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev);
 static int ixgbe_fdir_filter_init(struct rte_eth_dev *eth_dev);
 static int ixgbe_fdir_filter_uninit(struct rte_eth_dev *eth_dev);
@@ -1096,7 +1097,7 @@ ixgbe_swfw_lock_reset(struct ixgbe_hw *hw)
  * It returns 0 on success.
  */
 static int
-eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
+eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
 {
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
 	struct rte_intr_handle *intr_handle = &pci_dev->intr_handle;
@@ -1755,16 +1756,69 @@ eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
 	return 0;
 }
 
-static int eth_ixgbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+static int
+eth_ixgbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_probe(pci_dev,
-		sizeof(struct ixgbe_adapter), eth_ixgbe_dev_init);
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_eth_devargs eth_da;
+	int i, retval;
+
+	retval = rte_eth_devargs_parse(pci_dev->device.devargs, &eth_da);
+	if (retval)
+		return retval;
+
+	/* physical port net_bdf_port */
+	snprintf(name, sizeof(name), "net_%s_%d", pci_dev->device.name, 0);
+
+	retval = rte_eth_dev_create(&pci_dev->device, name,
+		sizeof(struct ixgbe_adapter),
+		eth_dev_pci_specific_init, pci_dev,
+		eth_ixgbe_dev_init, NULL);
+
+	if (retval || eth_da.nb_representor_ports < 1)
+		return retval;
+
+	/* probe VF representor ports */
+	struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated(name);
+
+	for (i = 0; i < eth_da.nb_representor_ports; i++) {
+		struct ixgbe_vf_representor representor = {
+			.vf_id = eth_da.representor_ports[i],
+			.pf_ethdev = pf_ethdev
+		};
+
+		/* representor port net_bdf_port */
+		snprintf(name, sizeof(name), "net_%s_representor_%d",
+			pci_dev->device.name,
+			eth_da.representor_ports[i]);
+
+		retval = rte_eth_dev_create(&pci_dev->device, name,
+			sizeof(struct ixgbe_vf_representor),
+			eth_dev_representor_port_init, pf_ethdev,
+			ixgbe_vf_representor_init, &representor);
+
+		if (retval)
+			PMD_DRV_LOG(ERR, "failed to create ixgbe vf "
+				"representor %s.", name);
+	}
+
+	return 0;
 }
 
 static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_remove(pci_dev, eth_ixgbe_dev_uninit);
+	struct rte_eth_dev *ethdev;
+
+	ethdev = rte_eth_dev_allocated(pci_dev->device.name);
+	if (!ethdev)
+		return -ENODEV;
+
+	if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+		return rte_eth_dev_destroy(ethdev, ixgbe_vf_representor_uninit);
+	else
+		return rte_eth_dev_destroy(ethdev, eth_ixgbe_dev_uninit);
 }
 
 static struct rte_pci_driver rte_ixgbe_pmd = {
@@ -2881,7 +2935,7 @@ ixgbe_dev_reset(struct rte_eth_dev *dev)
 	if (ret)
 		return ret;
 
-	ret = eth_ixgbe_dev_init(dev);
+	ret = eth_ixgbe_dev_init(dev, NULL);
 
 	return ret;
 }
@@ -3941,7 +3995,7 @@ ixgbevf_check_link(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
 }
 
 /* return 0 means link status changed, -1 means not changed */
-static int
+int
 ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
 			    int wait_to_complete, int vf)
 {
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index c56d65244..cee87ca1f 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -480,6 +480,14 @@ struct ixgbe_adapter {
  	struct ixgbe_tm_conf        tm_conf;
 };
 
+struct ixgbe_vf_representor {
+	uint16_t vf_id;
+	struct rte_eth_dev *pf_ethdev;
+};
+
+int ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params);
+int ixgbe_vf_representor_uninit(struct rte_eth_dev *ethdev);
+
 #define IXGBE_DEV_PRIVATE_TO_HW(adapter)\
 	(&((struct ixgbe_adapter *)adapter)->hw)
 
@@ -652,6 +660,10 @@ int ixgbe_fdir_filter_program(struct rte_eth_dev *dev,
 
 void ixgbe_configure_dcb(struct rte_eth_dev *dev);
 
+int
+ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
+			    int wait_to_complete, int vf);
+
 /*
  * misc function prototypes
  */
diff --git a/drivers/net/ixgbe/ixgbe_vf_representor.c b/drivers/net/ixgbe/ixgbe_vf_representor.c
new file mode 100644
index 000000000..8394efd8d
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_vf_representor.c
@@ -0,0 +1,210 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <rte_ethdev.h>
+#include <rte_pci.h>
+#include <rte_malloc.h>
+
+#include "base/ixgbe_type.h"
+#include "base/ixgbe_vf.h"
+#include "ixgbe_ethdev.h"
+#include "ixgbe_rxtx.h"
+#include "rte_pmd_ixgbe.h"
+
+
+static int
+ixgbe_vf_representor_link_update(struct rte_eth_dev *ethdev,
+	int wait_to_complete)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	return ixgbe_dev_link_update_share(representor->pf_ethdev,
+		wait_to_complete, 1);
+}
+
+static void
+ixgbe_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
+	struct ether_addr *mac_addr)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_ixgbe_set_vf_mac_addr(representor->pf_ethdev->data->port_id,
+		representor->vf_id, mac_addr);
+}
+
+static void
+ixgbe_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
+	struct rte_eth_dev_info *dev_info)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(
+		representor->pf_ethdev->data->dev_private);
+
+	dev_info->pci_dev = NULL;
+
+	dev_info->min_rx_bufsize = 1024;
+	/**< Minimum size of RX buffer. */
+	dev_info->max_rx_pktlen = 9728;
+	/**< Maximum configurable length of RX pkt. */
+	dev_info->max_rx_queues = IXGBE_VF_MAX_RX_QUEUES;
+	/**< Maximum number of RX queues. */
+	dev_info->max_tx_queues = IXGBE_VF_MAX_TX_QUEUES;
+	/**< Maximum number of TX queues. */
+
+	dev_info->max_mac_addrs = hw->mac.num_rar_entries;
+	/**< Maximum number of MAC addresses. */
+
+	dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_IPV4_CKSUM |	DEV_RX_OFFLOAD_UDP_CKSUM  |
+		DEV_RX_OFFLOAD_TCP_CKSUM;
+	/**< Device RX offload capabilities. */
+
+	dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_CKSUM | DEV_TX_OFFLOAD_SCTP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_TSO;
+	/**< Device TX offload capabilities. */
+
+	dev_info->speed_capa =
+		representor->pf_ethdev->data->dev_link.link_speed;
+	/**< Supported speeds bitmap (ETH_LINK_SPEED_). */
+}
+
+static int ixgbe_vf_representor_dev_configure(
+		__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int ixgbe_vf_representor_rx_queue_setup(
+	__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_rxconf *rx_conf,
+	__rte_unused struct rte_mempool *mb_pool)
+{
+	return 0;
+}
+
+static int ixgbe_vf_representor_tx_queue_setup(
+	__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_txconf *tx_conf)
+{
+	return 0;
+}
+
+static int ixgbe_vf_representor_dev_start(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static void ixgbe_vf_representor_dev_stop(__rte_unused struct rte_eth_dev *dev)
+{
+}
+
+static int
+ixgbe_vf_representor_vlan_filter_set(struct rte_eth_dev *ethdev,
+	uint16_t vlan_id, int on)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+	uint64_t vf_mask = 1ULL << representor->vf_id;
+
+	return rte_pmd_ixgbe_set_vf_vlan_filter(
+		representor->pf_ethdev->data->port_id, vlan_id, vf_mask, on);
+}
+
+static void
+ixgbe_vf_representor_vlan_strip_queue_set(struct rte_eth_dev *ethdev,
+	__rte_unused uint16_t rx_queue_id, int on)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_ixgbe_set_vf_vlan_stripq(representor->pf_ethdev->data->port_id,
+		representor->vf_id, on);
+}
+
+struct eth_dev_ops ixgbe_vf_representor_dev_ops = {
+	.dev_infos_get		= ixgbe_vf_representor_dev_infos_get,
+
+	.dev_start		= ixgbe_vf_representor_dev_start,
+	.dev_configure		= ixgbe_vf_representor_dev_configure,
+	.dev_stop		= ixgbe_vf_representor_dev_stop,
+
+	.rx_queue_setup		= ixgbe_vf_representor_rx_queue_setup,
+	.tx_queue_setup		= ixgbe_vf_representor_tx_queue_setup,
+
+	.link_update		= ixgbe_vf_representor_link_update,
+
+	.vlan_filter_set	= ixgbe_vf_representor_vlan_filter_set,
+	.vlan_strip_queue_set	= ixgbe_vf_representor_vlan_strip_queue_set,
+
+	.mac_addr_set		= ixgbe_vf_representor_mac_addr_set,
+};
+
+
+int
+ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	struct ixgbe_vf_info *vf_data;
+	struct rte_pci_device *pci_dev;
+	struct rte_eth_link *link;
+
+	if (!representor)
+		return -ENOMEM;
+
+	representor->vf_id =
+		((struct ixgbe_vf_representor *)init_params)->vf_id;
+	representor->pf_ethdev =
+		((struct ixgbe_vf_representor *)init_params)->pf_ethdev;
+
+	pci_dev = RTE_ETH_DEV_TO_PCI(representor->pf_ethdev);
+
+	if (representor->vf_id >= pci_dev->max_vfs)
+		return -ENODEV;
+
+	/* Set representor device ops */
+	ethdev->dev_ops = &ixgbe_vf_representor_dev_ops;
+
+	/* No data-path so no RX/TX functions */
+	ethdev->rx_pkt_burst = NULL;
+	ethdev->tx_pkt_burst = NULL;
+
+	/* Setting the number queues allocated to the VF */
+	ethdev->data->nb_rx_queues = IXGBE_VF_MAX_RX_QUEUES;
+	ethdev->data->nb_tx_queues = IXGBE_VF_MAX_RX_QUEUES;
+
+	/* Reference VF mac address from PF data structure */
+	vf_data = *IXGBE_DEV_PRIVATE_TO_P_VFDATA(
+		representor->pf_ethdev->data->dev_private);
+
+	ethdev->data->mac_addrs = (struct ether_addr *)
+		vf_data[representor->vf_id].vf_mac_addresses;
+
+	/* Inherit Switch Identifier from PF */
+	ethdev->data->switch_id = representor->pf_ethdev->data->switch_id;
+
+	/* Link state. Inherited from PF */
+	link = &representor->pf_ethdev->data->dev_link;
+
+	ethdev->data->dev_link.link_speed = link->link_speed;
+	ethdev->data->dev_link.link_duplex = link->link_duplex;
+	ethdev->data->dev_link.link_status = link->link_status;
+	ethdev->data->dev_link.link_autoneg = link->link_autoneg;
+
+	return 0;
+}
+
+
+int
+ixgbe_vf_representor_uninit(struct rte_eth_dev *ethdev __rte_unused)
+{
+	return 0;
+}
diff --git a/drivers/net/ixgbe/meson.build b/drivers/net/ixgbe/meson.build
index 60af0baef..6862e7ed0 100644
--- a/drivers/net/ixgbe/meson.build
+++ b/drivers/net/ixgbe/meson.build
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-cflags += ['-DRTE_LIBRTE_IXGBE_BYPASS']
+cflags += ['-DRTE_LIBRTE_IXGBE_BYPASS',
+	'-DALLOW_EXPERIMENTAL_API']
 
 subdir('base')
 objs = [base_objs]
@@ -17,6 +18,7 @@ sources = files(
 	'ixgbe_pf.c',
 	'ixgbe_rxtx.c',
 	'ixgbe_tm.c',
+	'ixgbe_vf_rerpesentor.c',
 	'rte_pmd_ixgbe.c'
 )
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/8] doc: add switch representation documentation
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 1/8] doc: add switch representation documentation Declan Doherty
@ 2018-03-28 14:53   ` Thomas Monjalon
  2018-03-28 15:05     ` Doherty, Declan
  2018-04-03 15:52   ` Adrien Mazarguil
  1 sibling, 1 reply; 73+ messages in thread
From: Thomas Monjalon @ 2018-03-28 14:53 UTC (permalink / raw)
  To: Declan Doherty; +Cc: dev, adrien.mazarguil

28/03/2018 15:54, Declan Doherty:
> From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> 
> Add document to describe a model for representing switching capable
> devices in DPDK, using a general ethdev port model and through port
> representors.This document also details the port model and the
> rte_flow semantics required for flow programming, as well as listing
> some example use cases.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>

It is strange to have different From: and SoB:
If Adrien participated in this writing, he should have his SoB too I think.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/8] doc: add switch representation documentation
  2018-03-28 14:53   ` Thomas Monjalon
@ 2018-03-28 15:05     ` Doherty, Declan
  0 siblings, 0 replies; 73+ messages in thread
From: Doherty, Declan @ 2018-03-28 15:05 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, adrien.mazarguil

On 28/03/2018 3:53 PM, Thomas Monjalon wrote:
> 28/03/2018 15:54, Declan Doherty:
>> From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
>>
>> Add document to describe a model for representing switching capable
>> devices in DPDK, using a general ethdev port model and through port
>> representors.This document also details the port model and the
>> rte_flow semantics required for flow programming, as well as listing
>> some example use cases.
>>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> 
> It is strange to have different From: and SoB:
> If Adrien participated in this writing, he should have his SoB too I think.
> 
> 
> 

Yep, I just wanted to make sure that Adrien was credited with the 
generation of the content as he authored the vast majority of it in this 
mail (http://dpdk.org/ml/archives/dev/2018-March/092513.html) but I 
didn't want to assume his sign-off until he had a chance to comment. 
I'll address in next revision.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 2/8] ethdev: add switch identifier parameter to port
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 2/8] ethdev: add switch identifier parameter to port Declan Doherty
@ 2018-03-29  6:13   ` Shahaf Shuler
  2018-03-29  9:13     ` Doherty, Declan
  0 siblings, 1 reply; 73+ messages in thread
From: Shahaf Shuler @ 2018-03-29  6:13 UTC (permalink / raw)
  To: Declan Doherty, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

Hi Declan,

Thanks for the series! See some comments below

Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
> Subject: [dpdk-dev][PATCH v6 2/8] ethdev: add switch identifier parameter
> to port
> 
> Introduces a new port attribute to ethdev port's which denotes the switch
> domain a port belongs to. By default all port's switch identifiers are the their
> port_id. Ports which share a common switch domain are configured with the
> same switch id.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
>  app/test-pmd/config.c              | 1 +
>  lib/librte_ether/rte_ethdev.c      | 3 +++
>  lib/librte_ether/rte_ethdev.h      | 1 +
>  lib/librte_ether/rte_ethdev_core.h | 1 +
>  4 files changed, 6 insertions(+)
> 
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index
> 4bb255c62..e12f8c515 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -517,6 +517,7 @@ port_infos_display(portid_t port_id)
>  	printf("Min possible number of TXDs per queue: %hu\n",
>  		dev_info.tx_desc_lim.nb_min);
>  	printf("TXDs number alignment: %hu\n",
> dev_info.tx_desc_lim.nb_align);
> +	printf("Switch Id: %u\n", dev_info.switch_id);
>  }
> 
>  void
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 23857c91f..f32d18cad 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -290,6 +290,8 @@ rte_eth_dev_allocate(const char *name)
>  	eth_dev = eth_dev_get(port_id);
>  	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name),
> "%s", name);
>  	eth_dev->data->port_id = port_id;
> +	eth_dev->data->switch_id = port_id;
> +	/**< Default switch_id is the port_id of the device */

Why such default is needed? Why not let the PMD to set it always?

>  	eth_dev->data->mtu = ETHER_MTU;
> 
>  unlock:
> @@ -2395,6 +2397,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct
> rte_eth_dev_info *dev_info)
>  	dev_info->driver_name = dev->device->driver->name;
>  	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>  	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
> +	dev_info->switch_id = dev->data->switch_id;

Why there is a need to keep the switch_id on device data? 
I think PMD to store it on its private structure and report it in dev_info is enough. 

>  }
> 
>  int
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 036153306..dced4fc41 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1029,6 +1029,7 @@ struct rte_eth_dev_info {
>  	/** Configured number of rx/tx queues */
>  	uint16_t nb_rx_queues; /**< Number of RX queues. */
>  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> +	uint16_t switch_id; /**< Switch Domain Id */
>  };
> 
>  /**
> diff --git a/lib/librte_ether/rte_ethdev_core.h
> b/lib/librte_ether/rte_ethdev_core.h
> index e5681e466..caed7a4e6 100644
> --- a/lib/librte_ether/rte_ethdev_core.h
> +++ b/lib/librte_ether/rte_ethdev_core.h
> @@ -585,6 +585,7 @@ struct rte_eth_dev_data {
>  	struct ether_addr* hash_mac_addrs;
>  	/** Device Ethernet MAC addresses of hash filtering. */
>  	uint16_t port_id;           /**< Device [external] port identifier. */
> +	uint16_t switch_id;	    /**< Switch which port is associated with
> */
>  	__extension__
>  	uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0).
> */
>  		scattered_rx : 1,  /**< RX of scattered packets is ON(1) /
> OFF(0) */
> --
> 2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 3/8] ethdev: add generic create/destroy ethdev APIs
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 3/8] ethdev: add generic create/destroy ethdev APIs Declan Doherty
@ 2018-03-29  6:13   ` Shahaf Shuler
  2018-03-29  9:22     ` Doherty, Declan
  0 siblings, 1 reply; 73+ messages in thread
From: Shahaf Shuler @ 2018-03-29  6:13 UTC (permalink / raw)
  To: Declan Doherty, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
> Subject: [dpdk-dev][PATCH v6 3/8] ethdev: add generic create/destroy
> ethdev APIs
> 
> Add new bus generic ethdev create/destroy APIs which are bus independent
> and provide hooks for bus specific initialisation.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
>  lib/librte_ether/Makefile                 |  1 +
>  lib/librte_ether/meson.build              |  1 +
>  lib/librte_ether/rte_ethdev.c             | 96
> ++++++++++++++++++++++++++++++-
>  lib/librte_ether/rte_ethdev_driver.h      | 57 ++++++++++++++++++
>  lib/librte_ether/rte_ethdev_pci.h         | 12 ++++
>  lib/librte_ether/rte_ethdev_representor.h | 28 +++++++++
>  lib/librte_ether/rte_ethdev_version.map   |  8 +++
>  7 files changed, 202 insertions(+), 1 deletion(-)  create mode 100644
> lib/librte_ether/rte_ethdev_representor.h
> 
> diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile index
> 3ca5782bb..5698cd47b 100644
> --- a/lib/librte_ether/Makefile
> +++ b/lib/librte_ether/Makefile
> @@ -32,6 +32,7 @@ SYMLINK-y-include += rte_ethdev_driver.h  SYMLINK-y-
> include += rte_ethdev_core.h  SYMLINK-y-include += rte_ethdev_pci.h
> SYMLINK-y-include += rte_ethdev_vdev.h
> +SYMLINK-y-include += rte_ethdev_representor.h
>  SYMLINK-y-include += rte_eth_ctrl.h
>  SYMLINK-y-include += rte_dev_info.h
>  SYMLINK-y-include += rte_flow.h
> diff --git a/lib/librte_ether/meson.build b/lib/librte_ether/meson.build
> index 7fed86056..163891556 100644
> --- a/lib/librte_ether/meson.build
> +++ b/lib/librte_ether/meson.build
> @@ -15,6 +15,7 @@ headers = files('rte_ethdev.h',
>  	'rte_ethdev_core.h',
>  	'rte_ethdev_pci.h',
>  	'rte_ethdev_vdev.h',
> +	'rte_ethdev_representor.h',
>  	'rte_eth_ctrl.h',
>  	'rte_dev_info.h',
>  	'rte_flow.h',
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index f32d18cad..c719f84a3 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -345,7 +345,8 @@ rte_eth_dev_release_port(struct rte_eth_dev
> *eth_dev)
>  	rte_eth_dev_shared_data_prepare();
> 
>  	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
> -
> +	eth_dev->device = NULL;
> +	eth_dev->intr_handle = NULL;
>  	eth_dev->state = RTE_ETH_DEV_UNUSED;
> 
>  	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data)); @@ -
> 3403,6 +3404,99 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev
> *dev, const char *ring_name,
>  	return rte_memzone_reserve_aligned(z_name, size, socket_id, 0,
> align);  }
> 
> +int __rte_experimental
> +rte_eth_dev_create(struct rte_device *device, const char *name,
> +	size_t priv_data_size,
> +	ethdev_bus_specific_init ethdev_bus_specific_init,
> +	void *bus_init_params,
> +	ethdev_init_t ethdev_init, void *init_params) {
> +	struct rte_eth_dev *ethdev;
> +	int retval;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		ethdev = rte_eth_dev_allocate(name);
> +		if (!ethdev) {
> +			retval = -ENODEV;
> +			goto probe_failed;
> +		}
> +
> +		if (priv_data_size) {
> +			ethdev->data->dev_private = rte_zmalloc_socket(
> +				name, priv_data_size,
> RTE_CACHE_LINE_SIZE,
> +				device->numa_node);
> +
> +			if (!ethdev->data->dev_private) {
> +				RTE_LOG(ERR, EAL, "failed to allocate private
> data");
> +				retval = -ENOMEM;
> +				goto probe_failed;
> +			}
> +		}
> +	} else {
> +		ethdev = rte_eth_dev_attach_secondary(name);
> +		if (!ethdev) {
> +			RTE_LOG(ERR, EAL, "secondary process attach failed,
> "
> +				"ethdev doesn't exist");
> +			retval = -ENODEV;
> +			goto probe_failed;
> +		}
> +	}
> +
> +	ethdev->device = device;
> +
> +	if (ethdev_bus_specific_init) {
> +		retval = ethdev_bus_specific_init(ethdev, bus_init_params);
> +		if (retval) {
> +			RTE_LOG(ERR, EAL,
> +				"ethdev bus specific initialisation failed");
> +			goto probe_failed;
> +		}
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_init, -EINVAL);
> +	retval = ethdev_init(ethdev, init_params);
> +	if (retval) {
> +		RTE_LOG(ERR, EAL, "ethdev initialisation failed");
> +		goto probe_failed;
> +	}
> +
> +	return retval;
> +probe_failed:
> +	/* free ports private data if primary process */
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +		rte_free(ethdev->data->dev_private);
> +
> +	rte_eth_dev_release_port(ethdev);
> +
> +	return retval;
> +}
> +
> +int  __rte_experimental
> +rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
> +	ethdev_uninit_t ethdev_uninit)
> +{
> +	int ret;
> +
> +	ethdev = rte_eth_dev_allocated(ethdev->data->name);
> +	if (!ethdev)
> +		return -ENODEV;
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_uninit, -EINVAL);
> +	if (ethdev_uninit) {
> +		ret = ethdev_uninit(ethdev);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +		rte_free(ethdev->data->dev_private);
> +
> +	ethdev->data->dev_private = NULL;
> +
> +	return rte_eth_dev_release_port(ethdev); }
> +
> +
>  int
>  rte_eth_dev_rx_intr_ctl_q(uint16_t port_id, uint16_t queue_id,
>  			  int epfd, int op, void *data)
> diff --git a/lib/librte_ether/rte_ethdev_driver.h
> b/lib/librte_ether/rte_ethdev_driver.h
> index 45f08c65e..4896cea93 100644
> --- a/lib/librte_ether/rte_ethdev_driver.h
> +++ b/lib/librte_ether/rte_ethdev_driver.h
> @@ -125,6 +125,63 @@ rte_eth_dma_zone_reserve(const struct
> rte_eth_dev *eth_dev, const char *name,
>  			 uint16_t queue_id, size_t size,
>  			 unsigned align, int socket_id);
> 
> +
> +typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void
> +*init_params); typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev
> *ethdev,
> +	void *bus_specific_init_params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * PMD helper function for the creation of a new ethdev ports.
> + *
> + * @param device
> + *  rte_device handle.
> + * @param	name
> + *  port name.
> + * @param priv_data_size
> + *  size of private data required for port.
> + * @param bus_specific_init
> + *  port bus specific initialisation callback function
> + * @param bus_init_params
> + *  port bus specific initialisation parameters
> + * @param ethdev_init
> + *  device specific port initialization callback function
> + * @param init_params
> + *  port initialisation parameters
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_dev_create(struct rte_device *device, const char *name,
> +	size_t priv_data_size,
> +	ethdev_bus_specific_init bus_specific_init, void *bus_init_params,
> +	ethdev_init_t ethdev_init, void *init_params);
> +
> +
> +typedef int (*ethdev_uninit_t)(struct rte_eth_dev *ethdev);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * PMD helper function for cleaing up the resources of a ethdev port on
> +it's
> + * destruction.
> + *
> + * @param ethdev
> + *   ethdev handle of port.
> + * @param ethdev
> + *   device specific port un-initialise callback function
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
> +	ethdev_uninit_t ethdev_uninit);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_ether/rte_ethdev_pci.h
> b/lib/librte_ether/rte_ethdev_pci.h
> index 897ce5b41..8604a0474 100644
> --- a/lib/librte_ether/rte_ethdev_pci.h
> +++ b/lib/librte_ether/rte_ethdev_pci.h
> @@ -70,6 +70,18 @@ rte_eth_copy_pci_info(struct rte_eth_dev *eth_dev,
>  	eth_dev->data->numa_node = pci_dev->device.numa_node;  }
> 
> +static inline int
> +eth_dev_pci_specific_init(struct rte_eth_dev *eth_dev, void *bus_device)
> {
> +	struct rte_pci_device *pci_dev = bus_device;
> +
> +	if (!pci_dev)
> +		return -ENODEV;
> +
> +	rte_eth_copy_pci_info(eth_dev, pci_dev);
> +
> +	return 0;
> +}
> +
>  /**
>   * @internal
>   * Allocates a new ethdev slot for an ethernet device and returns the pointer
> diff --git a/lib/librte_ether/rte_ethdev_representor.h
> b/lib/librte_ether/rte_ethdev_representor.h
> new file mode 100644
> index 000000000..cbc1f2855
> --- /dev/null
> +++ b/lib/librte_ether/rte_ethdev_representor.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation.
> + */
> +
> +
> +#ifndef _RTE_ETHDEV_REPRESENTOR_H_
> +#define _RTE_ETHDEV_REPRESENTOR_H_
> +
> +#include <rte_ethdev_driver.h>
> +
> +static int
> +eth_dev_representor_port_init(struct rte_eth_dev *ethdev, void
> +*init_params) {
> +	struct rte_eth_dev *base_ethdev = init_params;
> +
> +	if (!ethdev || !base_ethdev)
> +		return -ENODEV;
> +
> +	/** representor shares same driver as it's base device */
> +	ethdev->device->driver = base_ethdev->device->driver;
> +
> +	/** representor inherits the switch id of it's base device */
> +	ethdev->data->switch_id = base_ethdev->data->switch_id;

Why not let the PMD to set it? 

The PMD knows the specific port is a represntor port and to which switch it belongs. 
Doing it on ethdev layer will block us in the future from having more complex model were, for example, there are multiple switch domain for a set of PF + representors. 

> +
> +	return 0;
> +}
> +
> +#endif /* _RTE_ETHDEV_REPRESENTOR_H_ */
> diff --git a/lib/librte_ether/rte_ethdev_version.map
> b/lib/librte_ether/rte_ethdev_version.map
> index 87f02fb74..48b08bc36 100644
> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -230,3 +230,11 @@ EXPERIMENTAL {
>  	rte_mtr_stats_update;
> 
>  } DPDK_17.11;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_eth_dev_create;
> +	rte_eth_dev_destroy;
> +
> +} DPDK_18.05;
> --
> 2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag Declan Doherty
@ 2018-03-29  6:13   ` Shahaf Shuler
  2018-03-29  7:34     ` Thomas Monjalon
  2018-03-29 14:53     ` Doherty, Declan
  0 siblings, 2 replies; 73+ messages in thread
From: Shahaf Shuler @ 2018-03-29  6:13 UTC (permalink / raw)
  To: Declan Doherty, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
> Subject: [dpdk-dev][PATCH v6 4/8] ethdev: Add port representor device flag
> 
> Add new device flag to specify that ethdev port is a port representor.
> Extend rte_eth_dev_info structure to expose device flags to user which
> enable applications to discover if a port is a representor port.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.c             | 1 +
>  lib/librte_ether/rte_ethdev.h             | 9 ++++++---
>  lib/librte_ether/rte_ethdev_representor.h | 3 +++
>  3 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index c719f84a3..163246433 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -2399,6 +2399,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct
> rte_eth_dev_info *dev_info)
>  	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>  	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
>  	dev_info->switch_id = dev->data->switch_id;
> +	dev_info->dev_flags = dev->data->dev_flags;
>  }
> 
>  int
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index dced4fc41..226acc8b1 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -996,6 +996,7 @@ struct rte_eth_dev_info {
>  	const char *driver_name; /**< Device Driver name. */
>  	unsigned int if_index; /**< Index to bound host interface, or 0 if
> none.
>  		Use if_indextoname() to translate into an interface name. */
> +	uint32_t dev_flags; /**< Device flags */
>  	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
>  	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX
> pkt. */
>  	uint16_t max_rx_queues; /**< Maximum number of RX queues. */
> @@ -1229,11 +1230,13 @@ struct rte_eth_dev_owner {  };
> 
>  /** Device supports link state interrupt */
> -#define RTE_ETH_DEV_INTR_LSC     0x0002
> +#define RTE_ETH_DEV_INTR_LSC		0x0002
>  /** Device is a bonded slave */
> -#define RTE_ETH_DEV_BONDED_SLAVE 0x0004
> +#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
>  /** Device supports device removal interrupt */
> -#define RTE_ETH_DEV_INTR_RMV     0x0008
> +#define RTE_ETH_DEV_INTR_RMV		0x0008
> +/** Device is port representor */
> +#define RTE_ETH_DEV_REPRESENTOR		0x0010

Maybe it is a good time to make some order here. 
I understand the decision to use flags instead of bit-field. It is better. 

However there is a mix here of device capabilities like : RTE_ETH_DEV_INTR_LSC   and RTE_ETH_DEV_INTR_RMV   
And device attributes like : RTE_ETH_DEV_BONDED_SLAVE and RTE_ETH_DEV_REPRESENTOR.
I don't think they belong together under the genetic name of dev_flags. 

Moreover, I am not sure the fact device is bonded slave should be exposed to the application. It should be internal to ethdev and its port iterators. 

Finally I think representor port may need more info now (and in the future), for example the associated vf id.
For that, I think it is better it to be exposed as a dedicated struct on device info.

> 
>  /**
>   * @warning
> diff --git a/lib/librte_ether/rte_ethdev_representor.h
> b/lib/librte_ether/rte_ethdev_representor.h
> index cbc1f2855..f3726d0ba 100644
> --- a/lib/librte_ether/rte_ethdev_representor.h
> +++ b/lib/librte_ether/rte_ethdev_representor.h
> @@ -22,6 +22,9 @@ eth_dev_representor_port_init(struct rte_eth_dev
> *ethdev, void *init_params)
>  	/** representor inherits the switch id of it's base device */
>  	ethdev->data->switch_id = base_ethdev->data->switch_id;
> 
> +	/** Set device flags to specify that device is a representor port */
> +	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;

Should be set in the PMD, not in ethdev layer. 

> +
>  	return 0;
>  }
> 
> --
> 2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag
  2018-03-29  6:13   ` Shahaf Shuler
@ 2018-03-29  7:34     ` Thomas Monjalon
  2018-03-29 14:53     ` Doherty, Declan
  1 sibling, 0 replies; 73+ messages in thread
From: Thomas Monjalon @ 2018-03-29  7:34 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Declan Doherty, dev, Alex Rosenbaum, Ferruh Yigit, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

29/03/2018 08:13, Shahaf Shuler:
> And device attributes like : RTE_ETH_DEV_BONDED_SLAVE and RTE_ETH_DEV_REPRESENTOR.
> I don't think they belong together under the genetic name of dev_flags. 
> 
> Moreover, I am not sure the fact device is bonded slave should be exposed to the application. It should be internal to ethdev and its port iterators.

RTE_ETH_DEV_BONDED_SLAVE flag is used to prevent a manual detach of a slave.
I think it is wrong.
The bonding PMD should be able to manage any detach at any time,
because a real hardware plug-out can happen.

So I am in favor of removing RTE_ETH_DEV_BONDED_SLAVE.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 2/8] ethdev: add switch identifier parameter to port
  2018-03-29  6:13   ` Shahaf Shuler
@ 2018-03-29  9:13     ` Doherty, Declan
  2018-03-29 10:12       ` Shahaf Shuler
  0 siblings, 1 reply; 73+ messages in thread
From: Doherty, Declan @ 2018-03-29  9:13 UTC (permalink / raw)
  To: Shahaf Shuler, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

On 29/03/2018 7:13 AM, Shahaf Shuler wrote:
> Hi Declan,
> 
> Thanks for the series! See some comments below
> 
> Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
>> Subject: [dpdk-dev][PATCH v6 2/8] ethdev: add switch identifier parameter
>> to port
>>
>> Introduces a new port attribute to ethdev port's which denotes the switch
>> domain a port belongs to. By default all port's switch identifiers are the their
>> port_id. Ports which share a common switch domain are configured with the
>> same switch id.
>>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>> ---
>>   app/test-pmd/config.c              | 1 +
>>   lib/librte_ether/rte_ethdev.c      | 3 +++
>>   lib/librte_ether/rte_ethdev.h      | 1 +
>>   lib/librte_ether/rte_ethdev_core.h | 1 +
>>   4 files changed, 6 insertions(+)
>>
>> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index
>> 4bb255c62..e12f8c515 100644
>> --- a/app/test-pmd/config.c
>> +++ b/app/test-pmd/config.c
>> @@ -517,6 +517,7 @@ port_infos_display(portid_t port_id)
>>   	printf("Min possible number of TXDs per queue: %hu\n",
>>   		dev_info.tx_desc_lim.nb_min);
>>   	printf("TXDs number alignment: %hu\n",
>> dev_info.tx_desc_lim.nb_align);
>> +	printf("Switch Id: %u\n", dev_info.switch_id);
>>   }
>>
>>   void
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index 23857c91f..f32d18cad 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -290,6 +290,8 @@ rte_eth_dev_allocate(const char *name)
>>   	eth_dev = eth_dev_get(port_id);
>>   	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name),
>> "%s", name);
>>   	eth_dev->data->port_id = port_id;
>> +	eth_dev->data->switch_id = port_id;
>> +	/**< Default switch_id is the port_id of the device */
> 
> Why such default is needed? Why not let the PMD to set it always?

I saw this a simple way to have a consistent default value (the port_id) 
for all PMDs, without the need to modify existing PMD which don't 
currently have any concept of a switch domain.

Also taking the approach of just leaving it up to the PMD to decide the 
value would mean that some form of synchronisation would be required so 
that two device don't select the switch domain identifier.

> 
>>   	eth_dev->data->mtu = ETHER_MTU;
>>
>>   unlock:
>> @@ -2395,6 +2397,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct
>> rte_eth_dev_info *dev_info)
>>   	dev_info->driver_name = dev->device->driver->name;
>>   	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>>   	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
>> +	dev_info->switch_id = dev->data->switch_id;
> 
> Why there is a need to keep the switch_id on device data?
> I think PMD to store it on its private structure and report it in dev_info is enough.
> 

That way would require every PMD to be modified to maintain a switch_id 
structure, which I know isn't a big deal, as only the device which need 
to support it would need it, but we would need to change the 
dev_info->switch_id from being a uint16_t to being a signed value so we 
could have a default of -1 for device which don't support switch domain. 
I thought the former approach was cleaner.

>>   }
>>
>>   int
>> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> index 036153306..dced4fc41 100644
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -1029,6 +1029,7 @@ struct rte_eth_dev_info {
>>   	/** Configured number of rx/tx queues */
>>   	uint16_t nb_rx_queues; /**< Number of RX queues. */
>>   	uint16_t nb_tx_queues; /**< Number of TX queues. */
>> +	uint16_t switch_id; /**< Switch Domain Id */
>>   };
>>
>>   /**
>> diff --git a/lib/librte_ether/rte_ethdev_core.h
>> b/lib/librte_ether/rte_ethdev_core.h
>> index e5681e466..caed7a4e6 100644
>> --- a/lib/librte_ether/rte_ethdev_core.h
>> +++ b/lib/librte_ether/rte_ethdev_core.h
>> @@ -585,6 +585,7 @@ struct rte_eth_dev_data {
>>   	struct ether_addr* hash_mac_addrs;
>>   	/** Device Ethernet MAC addresses of hash filtering. */
>>   	uint16_t port_id;           /**< Device [external] port identifier. */
>> +	uint16_t switch_id;	    /**< Switch which port is associated with
>> */
>>   	__extension__
>>   	uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0).
>> */
>>   		scattered_rx : 1,  /**< RX of scattered packets is ON(1) /
>> OFF(0) */
>> --
>> 2.14.3
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 3/8] ethdev: add generic create/destroy ethdev APIs
  2018-03-29  6:13   ` Shahaf Shuler
@ 2018-03-29  9:22     ` Doherty, Declan
  0 siblings, 0 replies; 73+ messages in thread
From: Doherty, Declan @ 2018-03-29  9:22 UTC (permalink / raw)
  To: Shahaf Shuler, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

On 29/03/2018 7:13 AM, Shahaf Shuler wrote:
> Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
>> Subject: [dpdk-dev][PATCH v6 3/8] ethdev: add generic create/destroy
>> ethdev APIs
>>
>> Add new bus generic ethdev create/destroy APIs which are bus independent
>> and provide hooks for bus specific initialisation.
>>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>> ---
>>   lib/librte_ether/Makefile                 |  1 +
>>   lib/librte_ether/meson.build              |  1 +
>>   lib/librte_ether/rte_ethdev.c             | 96
>> ++++++++++++++++++++++++++++++-
>>   lib/librte_ether/rte_ethdev_driver.h      | 57 ++++++++++++++++++
>>   lib/librte_ether/rte_ethdev_pci.h         | 12 ++++
>>   lib/librte_ether/rte_ethdev_representor.h | 28 +++++++++
>>   lib/librte_ether/rte_ethdev_version.map   |  8 +++
>>   7 files changed, 202 insertions(+), 1 deletion(-)  create mode 100644
>> lib/librte_ether/rte_ethdev_representor.h
>>
>> diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile index
>> 3ca5782bb..5698cd47b 100644
>> --- a/lib/librte_ether/Makefile
>> +++ b/lib/librte_ether/Makefile
>> @@ -32,6 +32,7 @@ SYMLINK-y-include += rte_ethdev_driver.h  SYMLINK-y-
>> include += rte_ethdev_core.h  SYMLINK-y-include += rte_ethdev_pci.h
>> SYMLINK-y-include += rte_ethdev_vdev.h
>> +SYMLINK-y-include += rte_ethdev_representor.h
>>   SYMLINK-y-include += rte_eth_ctrl.h
>>   SYMLINK-y-include += rte_dev_info.h
>>   SYMLINK-y-include += rte_flow.h
>> diff --git a/lib/librte_ether/meson.build b/lib/librte_ether/meson.build
>> index 7fed86056..163891556 100644
>> --- a/lib/librte_ether/meson.build
>> +++ b/lib/librte_ether/meson.build
>> @@ -15,6 +15,7 @@ headers = files('rte_ethdev.h',
>>   	'rte_ethdev_core.h',
>>   	'rte_ethdev_pci.h',
>>   	'rte_ethdev_vdev.h',
>> +	'rte_ethdev_representor.h',
>>   	'rte_eth_ctrl.h',
>>   	'rte_dev_info.h',
>>   	'rte_flow.h',
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index f32d18cad..c719f84a3 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -345,7 +345,8 @@ rte_eth_dev_release_port(struct rte_eth_dev
>> *eth_dev)
>>   	rte_eth_dev_shared_data_prepare();
>>
>>   	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
>> -
>> +	eth_dev->device = NULL;
>> +	eth_dev->intr_handle = NULL;
>>   	eth_dev->state = RTE_ETH_DEV_UNUSED;
>>
>>   	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data)); @@ -
>> 3403,6 +3404,99 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev
>> *dev, const char *ring_name,
>>   	return rte_memzone_reserve_aligned(z_name, size, socket_id, 0,
>> align);  }
>>
>> +int __rte_experimental
>> +rte_eth_dev_create(struct rte_device *device, const char *name,
>> +	size_t priv_data_size,
>> +	ethdev_bus_specific_init ethdev_bus_specific_init,
>> +	void *bus_init_params,
>> +	ethdev_init_t ethdev_init, void *init_params) {
>> +	struct rte_eth_dev *ethdev;
>> +	int retval;
>> +
>> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
>> +		ethdev = rte_eth_dev_allocate(name);
>> +		if (!ethdev) {
>> +			retval = -ENODEV;
>> +			goto probe_failed;
>> +		}
>> +
>> +		if (priv_data_size) {
>> +			ethdev->data->dev_private = rte_zmalloc_socket(
>> +				name, priv_data_size,
>> RTE_CACHE_LINE_SIZE,
>> +				device->numa_node);
>> +
>> +			if (!ethdev->data->dev_private) {
>> +				RTE_LOG(ERR, EAL, "failed to allocate private
>> data");
>> +				retval = -ENOMEM;
>> +				goto probe_failed;
>> +			}
>> +		}
>> +	} else {
>> +		ethdev = rte_eth_dev_attach_secondary(name);
>> +		if (!ethdev) {
>> +			RTE_LOG(ERR, EAL, "secondary process attach failed,
>> "
>> +				"ethdev doesn't exist");
>> +			retval = -ENODEV;
>> +			goto probe_failed;
>> +		}
>> +	}
>> +
>> +	ethdev->device = device;
>> +
>> +	if (ethdev_bus_specific_init) {
>> +		retval = ethdev_bus_specific_init(ethdev, bus_init_params);
>> +		if (retval) {
>> +			RTE_LOG(ERR, EAL,
>> +				"ethdev bus specific initialisation failed");
>> +			goto probe_failed;
>> +		}
>> +	}
>> +
>> +	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_init, -EINVAL);
>> +	retval = ethdev_init(ethdev, init_params);
>> +	if (retval) {
>> +		RTE_LOG(ERR, EAL, "ethdev initialisation failed");
>> +		goto probe_failed;
>> +	}
>> +
>> +	return retval;
>> +probe_failed:
>> +	/* free ports private data if primary process */
>> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
>> +		rte_free(ethdev->data->dev_private);
>> +
>> +	rte_eth_dev_release_port(ethdev);
>> +
>> +	return retval;
>> +}
>> +
>> +int  __rte_experimental
>> +rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
>> +	ethdev_uninit_t ethdev_uninit)
>> +{
>> +	int ret;
>> +
>> +	ethdev = rte_eth_dev_allocated(ethdev->data->name);
>> +	if (!ethdev)
>> +		return -ENODEV;
>> +
>> +	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_uninit, -EINVAL);
>> +	if (ethdev_uninit) {
>> +		ret = ethdev_uninit(ethdev);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
>> +		rte_free(ethdev->data->dev_private);
>> +
>> +	ethdev->data->dev_private = NULL;
>> +
>> +	return rte_eth_dev_release_port(ethdev); }
>> +
>> +
>>   int
>>   rte_eth_dev_rx_intr_ctl_q(uint16_t port_id, uint16_t queue_id,
>>   			  int epfd, int op, void *data)
>> diff --git a/lib/librte_ether/rte_ethdev_driver.h
>> b/lib/librte_ether/rte_ethdev_driver.h
>> index 45f08c65e..4896cea93 100644
>> --- a/lib/librte_ether/rte_ethdev_driver.h
>> +++ b/lib/librte_ether/rte_ethdev_driver.h
>> @@ -125,6 +125,63 @@ rte_eth_dma_zone_reserve(const struct
>> rte_eth_dev *eth_dev, const char *name,
>>   			 uint16_t queue_id, size_t size,
>>   			 unsigned align, int socket_id);
>>
>> +
>> +typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void
>> +*init_params); typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev
>> *ethdev,
>> +	void *bus_specific_init_params);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * PMD helper function for the creation of a new ethdev ports.
>> + *
>> + * @param device
>> + *  rte_device handle.
>> + * @param	name
>> + *  port name.
>> + * @param priv_data_size
>> + *  size of private data required for port.
>> + * @param bus_specific_init
>> + *  port bus specific initialisation callback function
>> + * @param bus_init_params
>> + *  port bus specific initialisation parameters
>> + * @param ethdev_init
>> + *  device specific port initialization callback function
>> + * @param init_params
>> + *  port initialisation parameters
>> + *
>> + * @return
>> + *   Negative errno value on error, 0 on success.
>> + */
>> +int __rte_experimental
>> +rte_eth_dev_create(struct rte_device *device, const char *name,
>> +	size_t priv_data_size,
>> +	ethdev_bus_specific_init bus_specific_init, void *bus_init_params,
>> +	ethdev_init_t ethdev_init, void *init_params);
>> +
>> +
>> +typedef int (*ethdev_uninit_t)(struct rte_eth_dev *ethdev);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * PMD helper function for cleaing up the resources of a ethdev port on
>> +it's
>> + * destruction.
>> + *
>> + * @param ethdev
>> + *   ethdev handle of port.
>> + * @param ethdev
>> + *   device specific port un-initialise callback function
>> + *
>> + * @return
>> + *   Negative errno value on error, 0 on success.
>> + */
>> +int __rte_experimental
>> +rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
>> +	ethdev_uninit_t ethdev_uninit);
>> +
>>   #ifdef __cplusplus
>>   }
>>   #endif
>> diff --git a/lib/librte_ether/rte_ethdev_pci.h
>> b/lib/librte_ether/rte_ethdev_pci.h
>> index 897ce5b41..8604a0474 100644
>> --- a/lib/librte_ether/rte_ethdev_pci.h
>> +++ b/lib/librte_ether/rte_ethdev_pci.h
>> @@ -70,6 +70,18 @@ rte_eth_copy_pci_info(struct rte_eth_dev *eth_dev,
>>   	eth_dev->data->numa_node = pci_dev->device.numa_node;  }
>>
>> +static inline int
>> +eth_dev_pci_specific_init(struct rte_eth_dev *eth_dev, void *bus_device)
>> {
>> +	struct rte_pci_device *pci_dev = bus_device;
>> +
>> +	if (!pci_dev)
>> +		return -ENODEV;
>> +
>> +	rte_eth_copy_pci_info(eth_dev, pci_dev);
>> +
>> +	return 0;
>> +}
>> +
>>   /**
>>    * @internal
>>    * Allocates a new ethdev slot for an ethernet device and returns the pointer
>> diff --git a/lib/librte_ether/rte_ethdev_representor.h
>> b/lib/librte_ether/rte_ethdev_representor.h
>> new file mode 100644
>> index 000000000..cbc1f2855
>> --- /dev/null
>> +++ b/lib/librte_ether/rte_ethdev_representor.h
>> @@ -0,0 +1,28 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2018 Intel Corporation.
>> + */
>> +
>> +
>> +#ifndef _RTE_ETHDEV_REPRESENTOR_H_
>> +#define _RTE_ETHDEV_REPRESENTOR_H_
>> +
>> +#include <rte_ethdev_driver.h>
>> +
>> +static int
>> +eth_dev_representor_port_init(struct rte_eth_dev *ethdev, void
>> +*init_params) {
>> +	struct rte_eth_dev *base_ethdev = init_params;
>> +
>> +	if (!ethdev || !base_ethdev)
>> +		return -ENODEV;
>> +
>> +	/** representor shares same driver as it's base device */
>> +	ethdev->device->driver = base_ethdev->device->driver;
>> +
>> +	/** representor inherits the switch id of it's base device */
>> +	ethdev->data->switch_id = base_ethdev->data->switch_id;
> 
> Why not let the PMD to set it?
> 
> The PMD knows the specific port is a represntor port and to which switch it belongs.
> Doing it on ethdev layer will block us in the future from having more complex model were, for example, there are multiple switch domain for a set of PF + representors.
> 

This is only a generic helper function for a representor port bus 
specific initialisation, and can be passed as the bus_specific_init 
parameter in the rte_eth_dev_create function, but whether or not to use 
it is complete up to the PMD. It only handles the simple case of a 
single function device with multiple representors on a single switch 
domain. For more complex use cases, like multiple switch domains, the 
PMD can provide it's own implementation of this. I thought it was useful 
to provide as a common function as for the ixgbe, i40e it fulfills there 
requirements, and will probably be sufficient for many other devices too.

>> +
>> +	return 0;
>> +}
>> +
>> +#endif /* _RTE_ETHDEV_REPRESENTOR_H_ */
>> diff --git a/lib/librte_ether/rte_ethdev_version.map
>> b/lib/librte_ether/rte_ethdev_version.map
>> index 87f02fb74..48b08bc36 100644
>> --- a/lib/librte_ether/rte_ethdev_version.map
>> +++ b/lib/librte_ether/rte_ethdev_version.map
>> @@ -230,3 +230,11 @@ EXPERIMENTAL {
>>   	rte_mtr_stats_update;
>>
>>   } DPDK_17.11;
>> +
>> +EXPERIMENTAL {
>> +	global:
>> +
>> +	rte_eth_dev_create;
>> +	rte_eth_dev_destroy;
>> +
>> +} DPDK_18.05;
>> --
>> 2.14.3
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 2/8] ethdev: add switch identifier parameter to port
  2018-03-29  9:13     ` Doherty, Declan
@ 2018-03-29 10:12       ` Shahaf Shuler
  2018-03-29 15:12         ` Doherty, Declan
  0 siblings, 1 reply; 73+ messages in thread
From: Shahaf Shuler @ 2018-03-29 10:12 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

Thursday, March 29, 2018 12:14 PM, Doherty, Declan:
> On 29/03/2018 7:13 AM, Shahaf Shuler wrote:
> > Hi Declan,
> >
> > Thanks for the series! See some comments below
> >
> > Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
> >> Subject: [dpdk-dev][PATCH v6 2/8] ethdev: add switch identifier
> >> parameter to port
> >>
> >> Introduces a new port attribute to ethdev port's which denotes the
> >> switch domain a port belongs to. By default all port's switch
> >> identifiers are the their port_id. Ports which share a common switch
> >> domain are configured with the same switch id.
> >>
> >> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> >> ---
> >>   app/test-pmd/config.c              | 1 +
> >>   lib/librte_ether/rte_ethdev.c      | 3 +++
> >>   lib/librte_ether/rte_ethdev.h      | 1 +
> >>   lib/librte_ether/rte_ethdev_core.h | 1 +
> >>   4 files changed, 6 insertions(+)
> >>
> >> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index
> >> 4bb255c62..e12f8c515 100644
> >> --- a/app/test-pmd/config.c
> >> +++ b/app/test-pmd/config.c
> >> @@ -517,6 +517,7 @@ port_infos_display(portid_t port_id)
> >>   	printf("Min possible number of TXDs per queue: %hu\n",
> >>   		dev_info.tx_desc_lim.nb_min);
> >>   	printf("TXDs number alignment: %hu\n",
> >> dev_info.tx_desc_lim.nb_align);
> >> +	printf("Switch Id: %u\n", dev_info.switch_id);
> >>   }
> >>
> >>   void
> >> diff --git a/lib/librte_ether/rte_ethdev.c
> >> b/lib/librte_ether/rte_ethdev.c index 23857c91f..f32d18cad 100644
> >> --- a/lib/librte_ether/rte_ethdev.c
> >> +++ b/lib/librte_ether/rte_ethdev.c
> >> @@ -290,6 +290,8 @@ rte_eth_dev_allocate(const char *name)
> >>   	eth_dev = eth_dev_get(port_id);
> >>   	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name),
> "%s",
> >> name);
> >>   	eth_dev->data->port_id = port_id;
> >> +	eth_dev->data->switch_id = port_id;
> >> +	/**< Default switch_id is the port_id of the device */
> >
> > Why such default is needed? Why not let the PMD to set it always?
> 
> I saw this a simple way to have a consistent default value (the port_id) for all
> PMDs, without the need to modify existing PMD which don't currently have
> any concept of a switch domain.

The default value don't makes much sense though. By default it would mean the application each port is on different switch domain. This is obviously not true in case of multiple VFs 

Maybe we can define ETH_SWITCH_ID_INVALID (0) to emphasis 0 is not the switch_id for the PMDs which didn't implemented it. 

> 
> Also taking the approach of just leaving it up to the PMD to decide the value
> would mean that some form of synchronisation would be required so that
> two device don't select the switch domain identifier.

What kind of knowledge ethdev layer has to set the switche_id? It should be based on the underlying capabilities of the device.
Some devices will use the kernel sysfs  to check that. 

Maybe there are devices which are able to expose the same switch for multiple devices, passing the packet between them using  peer2peer trough the PCI.

The point is we need to have APIs which will enable all the future flexibility. 

Maybe we can have array of switch_id which are allready taken by the underlying devices, and PMD will register their switch_id to it atomically. To ease the synchronization. 

> 
> >
> >>   	eth_dev->data->mtu = ETHER_MTU;
> >>
> >>   unlock:
> >> @@ -2395,6 +2397,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct
> >> rte_eth_dev_info *dev_info)
> >>   	dev_info->driver_name = dev->device->driver->name;
> >>   	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
> >>   	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
> >> +	dev_info->switch_id = dev->data->switch_id;
> >
> > Why there is a need to keep the switch_id on device data?
> > I think PMD to store it on its private structure and report it in dev_info is
> enough.
> >
> 
> That way would require every PMD to be modified to maintain a switch_id
> structure, which I know isn't a big deal, as only the device which need to
> support it would need it, but we would need to change the dev_info-
> >switch_id from being a uint16_t to being a signed value so we could have a
> default of -1 for device which don't support switch domain.

We can say also 0 is the invalid switch id. 

> I thought the former approach was cleaner.
> 
> >>   }
> >>
> >>   int
> >> diff --git a/lib/librte_ether/rte_ethdev.h
> >> b/lib/librte_ether/rte_ethdev.h index 036153306..dced4fc41 100644
> >> --- a/lib/librte_ether/rte_ethdev.h
> >> +++ b/lib/librte_ether/rte_ethdev.h
> >> @@ -1029,6 +1029,7 @@ struct rte_eth_dev_info {
> >>   	/** Configured number of rx/tx queues */
> >>   	uint16_t nb_rx_queues; /**< Number of RX queues. */
> >>   	uint16_t nb_tx_queues; /**< Number of TX queues. */
> >> +	uint16_t switch_id; /**< Switch Domain Id */
> >>   };
> >>
> >>   /**
> >> diff --git a/lib/librte_ether/rte_ethdev_core.h
> >> b/lib/librte_ether/rte_ethdev_core.h
> >> index e5681e466..caed7a4e6 100644
> >> --- a/lib/librte_ether/rte_ethdev_core.h
> >> +++ b/lib/librte_ether/rte_ethdev_core.h
> >> @@ -585,6 +585,7 @@ struct rte_eth_dev_data {
> >>   	struct ether_addr* hash_mac_addrs;
> >>   	/** Device Ethernet MAC addresses of hash filtering. */
> >>   	uint16_t port_id;           /**< Device [external] port identifier. */
> >> +	uint16_t switch_id;	    /**< Switch which port is associated with
> >> */
> >>   	__extension__
> >>   	uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0).
> >> */
> >>   		scattered_rx : 1,  /**< RX of scattered packets is ON(1) /
> >> OFF(0) */
> >> --
> >> 2.14.3
> >


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 6/8] ethdev: add common devargs parser
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 6/8] ethdev: add common devargs parser Declan Doherty
@ 2018-03-29 12:12   ` Gaëtan Rivet
  0 siblings, 0 replies; 73+ messages in thread
From: Gaëtan Rivet @ 2018-03-29 12:12 UTC (permalink / raw)
  To: Declan Doherty
  Cc: dev, Ferruh Yigit, Thomas Monjalon, Mohammad Abdul Awal,
	Remy Horton, Yuanhan Liu

Hi,

On Wed, Mar 28, 2018 at 02:54:31PM +0100, Declan Doherty wrote:
> From: Remy Horton <remy.horton@intel.com>
> 
> Introduces a new structure, rte_eth_devargs, to support generic
> ethdev arguments common across NET PMDs, with a new API
> rte_eth_devargs_parse API to support PMD parsing these arguments.
> 

Here is the future layout of rte_devargs:

   1. The rte_class introduced by [1], should be expanded with a "find_device"
      function, equivalent to that of the rte_bus interface.

   2. Class and Bus should export a match function as well as a parse
      function. The match function would be used for device querying,
      parsing would serve for device declaration.

   3. The match function is already implemented by [1].
      The parse function for bus already exists and should now
      be expanded to classes as well. Its expected input should
      change to be a list of kvargs.

   4. Accompanying those changes, the rte_devargs lib would
      now divide the device string in the three layers identified,
      use the class parse function to identify the intended class, and
      be able to feed each layers with its proper input.

This way, this API should be generic to all layers.

Now, this work in underway but takes time.
The current patch I think is an attempt to go in the right direction,
but in the end is only a compromise between the simple way and the
generic way.

Instead of having rte_eth_devargs_parse, you could have had a simple
rte_eth_representor_set(uint16_t *pid_list, size_t len);

That would have set the proper info within the rte_eth_dev_data. The port_id
list would have been parsed by your PMD by reading the representor
option.

The current version, that feeds directly the devargs to the eth layer,
makes conflicts inevitable (with PMDs having potential representor as
their parameter, or for future ether parameters such as "mac" that will
conflicts with current existing PMD parameters).

I would say that this implementation should be simple at first, for
the current work on representor. If the generic API is ready for this
release, then we might integrate afterward.

[1]: https://dpdk.org/ml/archives/dev/2018-March/092891.html

> Signed-off-by: Remy Horton <remy.horton@intel.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---

Regards,
-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag
  2018-03-29  6:13   ` Shahaf Shuler
  2018-03-29  7:34     ` Thomas Monjalon
@ 2018-03-29 14:53     ` Doherty, Declan
  2018-04-01  6:14       ` Shahaf Shuler
  1 sibling, 1 reply; 73+ messages in thread
From: Doherty, Declan @ 2018-03-29 14:53 UTC (permalink / raw)
  To: Shahaf Shuler, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

On 29/03/2018 7:13 AM, Shahaf Shuler wrote:
> Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
>> Subject: [dpdk-dev][PATCH v6 4/8] ethdev: Add port representor device flag
>>
>> Add new device flag to specify that ethdev port is a port representor.
>> Extend rte_eth_dev_info structure to expose device flags to user which
>> enable applications to discover if a port is a representor port.
>>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>> ---
>>   lib/librte_ether/rte_ethdev.c             | 1 +
>>   lib/librte_ether/rte_ethdev.h             | 9 ++++++---
>>   lib/librte_ether/rte_ethdev_representor.h | 3 +++
>>   3 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index c719f84a3..163246433 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -2399,6 +2399,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct
>> rte_eth_dev_info *dev_info)
>>   	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>>   	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
>>   	dev_info->switch_id = dev->data->switch_id;
>> +	dev_info->dev_flags = dev->data->dev_flags;
>>   }
>>
>>   int
>> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> index dced4fc41..226acc8b1 100644
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -996,6 +996,7 @@ struct rte_eth_dev_info {
>>   	const char *driver_name; /**< Device Driver name. */
>>   	unsigned int if_index; /**< Index to bound host interface, or 0 if
>> none.
>>   		Use if_indextoname() to translate into an interface name. */
>> +	uint32_t dev_flags; /**< Device flags */
>>   	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
>>   	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX
>> pkt. */
>>   	uint16_t max_rx_queues; /**< Maximum number of RX queues. */
>> @@ -1229,11 +1230,13 @@ struct rte_eth_dev_owner {  };
>>
>>   /** Device supports link state interrupt */
>> -#define RTE_ETH_DEV_INTR_LSC     0x0002
>> +#define RTE_ETH_DEV_INTR_LSC		0x0002
>>   /** Device is a bonded slave */
>> -#define RTE_ETH_DEV_BONDED_SLAVE 0x0004
>> +#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
>>   /** Device supports device removal interrupt */
>> -#define RTE_ETH_DEV_INTR_RMV     0x0008
>> +#define RTE_ETH_DEV_INTR_RMV		0x0008
>> +/** Device is port representor */
>> +#define RTE_ETH_DEV_REPRESENTOR		0x0010
> 
> Maybe it is a good time to make some order here.
> I understand the decision to use flags instead of bit-field. It is better.
> 
> However there is a mix here of device capabilities like : RTE_ETH_DEV_INTR_LSC   and RTE_ETH_DEV_INTR_RMV
> And device attributes like : RTE_ETH_DEV_BONDED_SLAVE and RTE_ETH_DEV_REPRESENTOR.
> I don't think they belong together under the genetic name of dev_flags.
> 
> Moreover, I am not sure the fact device is bonded slave should be exposed to the application. It should be internal to ethdev and its port iterators.

That's a good point on the bonded slave flag, I'll look at fixing that 
for the next release. I don't think changing it should effect ABI but 
I'll need to have a closer look.

Do you think that we should have a separate device attributes field, 
which the representor flag is contained in.

> 
> Finally I think representor port may need more info now (and in the future), for example the associated vf id.
> For that, I think it is better it to be exposed as a dedicated struct on device info.

I think a switch port id should suffice for that, for SR-IOV devices it 
would map to the vf_id.

> 
>>
>>   /**
>>    * @warning
>> diff --git a/lib/librte_ether/rte_ethdev_representor.h
>> b/lib/librte_ether/rte_ethdev_representor.h
>> index cbc1f2855..f3726d0ba 100644
>> --- a/lib/librte_ether/rte_ethdev_representor.h
>> +++ b/lib/librte_ether/rte_ethdev_representor.h
>> @@ -22,6 +22,9 @@ eth_dev_representor_port_init(struct rte_eth_dev
>> *ethdev, void *init_params)
>>   	/** representor inherits the switch id of it's base device */
>>   	ethdev->data->switch_id = base_ethdev->data->switch_id;
>>
>> +	/** Set device flags to specify that device is a representor port */
>> +	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
> 
> Should be set in the PMD, not in ethdev layer

As in the previous patch this is just a generic port bus init function 
which meets the simplest use case of representor port with a single 
switch domain, a PMD doesn't need to use it but having it here saves 
duplicating the same code across multiple PMD which are only supporting 
the basic mode.

> 
>> +
>>   	return 0;
>>   }
>>
>> --
>> 2.14.3
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 2/8] ethdev: add switch identifier parameter to port
  2018-03-29 10:12       ` Shahaf Shuler
@ 2018-03-29 15:12         ` Doherty, Declan
  2018-04-01  6:10           ` Shahaf Shuler
  0 siblings, 1 reply; 73+ messages in thread
From: Doherty, Declan @ 2018-03-29 15:12 UTC (permalink / raw)
  To: Shahaf Shuler, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Jingjing Wu, Wenzhuo Lu,
	Vincent Jardin, Yuanhan Liu, Bruce Richardson,
	Konstantin Ananyev, Zhihong Wang

On 29/03/2018 11:12 AM, Shahaf Shuler wrote:
> Thursday, March 29, 2018 12:14 PM, Doherty, Declan:
>> On 29/03/2018 7:13 AM, Shahaf Shuler wrote:
>>> Hi Declan,
>>>
>>> Thanks for the series! See some comments below
>>>
>>> Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
>>>> Subject: [dpdk-dev][PATCH v6 2/8] ethdev: add switch identifier
>>>> parameter to port
>>>>
>>>> Introduces a new port attribute to ethdev port's which denotes the
>>>> switch domain a port belongs to. By default all port's switch
>>>> identifiers are the their port_id. Ports which share a common switch
>>>> domain are configured with the same switch id.
>>>>
>>>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>>>> ---
>>>>    app/test-pmd/config.c              | 1 +
>>>>    lib/librte_ether/rte_ethdev.c      | 3 +++
>>>>    lib/librte_ether/rte_ethdev.h      | 1 +
>>>>    lib/librte_ether/rte_ethdev_core.h | 1 +
>>>>    4 files changed, 6 insertions(+)
>>>>
>>>> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index
>>>> 4bb255c62..e12f8c515 100644
>>>> --- a/app/test-pmd/config.c
>>>> +++ b/app/test-pmd/config.c
>>>> @@ -517,6 +517,7 @@ port_infos_display(portid_t port_id)
>>>>    	printf("Min possible number of TXDs per queue: %hu\n",
>>>>    		dev_info.tx_desc_lim.nb_min);
>>>>    	printf("TXDs number alignment: %hu\n",
>>>> dev_info.tx_desc_lim.nb_align);
>>>> +	printf("Switch Id: %u\n", dev_info.switch_id);
>>>>    }
>>>>
>>>>    void
>>>> diff --git a/lib/librte_ether/rte_ethdev.c
>>>> b/lib/librte_ether/rte_ethdev.c index 23857c91f..f32d18cad 100644
>>>> --- a/lib/librte_ether/rte_ethdev.c
>>>> +++ b/lib/librte_ether/rte_ethdev.c
>>>> @@ -290,6 +290,8 @@ rte_eth_dev_allocate(const char *name)
>>>>    	eth_dev = eth_dev_get(port_id);
>>>>    	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name),
>> "%s",
>>>> name);
>>>>    	eth_dev->data->port_id = port_id;
>>>> +	eth_dev->data->switch_id = port_id;
>>>> +	/**< Default switch_id is the port_id of the device */
>>>
>>> Why such default is needed? Why not let the PMD to set it always?
>>
>> I saw this a simple way to have a consistent default value (the port_id) for all
>> PMDs, without the need to modify existing PMD which don't currently have
>> any concept of a switch domain.
> 
> The default value don't makes much sense though. By default it would mean the application each port is on different switch domain. This is obviously not true in case of multiple VFs

But it is the case today for all ports, that they are assumed to be in 
separate switch domains or at least in unknowable switch domains by 
applications, with the exception of LAG groups, were it is implicit that 
they are in the same domain. Even if two ports were attached to the same 
switch, or if you where using multiple VFs from the same switch there is 
currently no way of telling the application that this is the case. What 
would be the use case for having multiple VFs from the same port/switch 
domain in a single data path application. It doesn't seem likely to me 
that you would use hardware switching to move traffic between 2 VFs in 
the same data path application.

If you are using port representors, I assumed that you would set a valid 
switch domain id, possibly picking up the port id of the first port 
created on that switch domain, but it would be completely up to the PMD 
how this was done, I have just shown the simplest model in which the 
port_id of the PF is used for all subsequent representor ports in that 
switch domain, but if you wanted to create multiple domains within the 
same device, you could just use subsequent port representors port_ids to 
setup further switch domains. A switch_id is only meaningful within the 
contex of DPDK process it is running in.

> 
> Maybe we can define ETH_SWITCH_ID_INVALID (0) to emphasis 0 is not the switch_id for the PMDs which didn't implemented it.
> 
>>
>> Also taking the approach of just leaving it up to the PMD to decide the value
>> would mean that some form of synchronisation would be required so that
>> two device don't select the switch domain identifier.
> 
> What kind of knowledge ethdev layer has to set the switche_id? It should be based on the underlying capabilities of the device.
> Some devices will use the kernel sysfs  to check that.
> 
> Maybe there are devices which are able to expose the same switch for multiple devices, passing the packet between them using  peer2peer trough the PCI.
> 
> The point is we need to have APIs which will enable all the future flexibility.
> 
> Maybe we can have array of switch_id which are allready taken by the underlying devices, and PMD will register their switch_id to it atomically. To ease the synchronization.
> 
>>
>>>
>>>>    	eth_dev->data->mtu = ETHER_MTU;
>>>>
>>>>    unlock:
>>>> @@ -2395,6 +2397,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct
>>>> rte_eth_dev_info *dev_info)
>>>>    	dev_info->driver_name = dev->device->driver->name;
>>>>    	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>>>>    	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
>>>> +	dev_info->switch_id = dev->data->switch_id;
>>>
>>> Why there is a need to keep the switch_id on device data?
>>> I think PMD to store it on its private structure and report it in dev_info is
>> enough.
>>>
>>
>> That way would require every PMD to be modified to maintain a switch_id
>> structure, which I know isn't a big deal, as only the device which need to
>> support it would need it, but we would need to change the dev_info-
>>> switch_id from being a uint16_t to being a signed value so we could have a
>> default of -1 for device which don't support switch domain.
> 
> We can say also 0 is the invalid switch id.
> 
>> I thought the former approach was cleaner.
>>
>>>>    }
>>>>
>>>>    int
>>>> diff --git a/lib/librte_ether/rte_ethdev.h
>>>> b/lib/librte_ether/rte_ethdev.h index 036153306..dced4fc41 100644
>>>> --- a/lib/librte_ether/rte_ethdev.h
>>>> +++ b/lib/librte_ether/rte_ethdev.h
>>>> @@ -1029,6 +1029,7 @@ struct rte_eth_dev_info {
>>>>    	/** Configured number of rx/tx queues */
>>>>    	uint16_t nb_rx_queues; /**< Number of RX queues. */
>>>>    	uint16_t nb_tx_queues; /**< Number of TX queues. */
>>>> +	uint16_t switch_id; /**< Switch Domain Id */
>>>>    };
>>>>
>>>>    /**
>>>> diff --git a/lib/librte_ether/rte_ethdev_core.h
>>>> b/lib/librte_ether/rte_ethdev_core.h
>>>> index e5681e466..caed7a4e6 100644
>>>> --- a/lib/librte_ether/rte_ethdev_core.h
>>>> +++ b/lib/librte_ether/rte_ethdev_core.h
>>>> @@ -585,6 +585,7 @@ struct rte_eth_dev_data {
>>>>    	struct ether_addr* hash_mac_addrs;
>>>>    	/** Device Ethernet MAC addresses of hash filtering. */
>>>>    	uint16_t port_id;           /**< Device [external] port identifier. */
>>>> +	uint16_t switch_id;	    /**< Switch which port is associated with
>>>> */
>>>>    	__extension__
>>>>    	uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0).
>>>> */
>>>>    		scattered_rx : 1,  /**< RX of scattered packets is ON(1) /
>>>> OFF(0) */
>>>> --
>>>> 2.14.3
>>>
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 2/8] ethdev: add switch identifier parameter to port
  2018-03-29 15:12         ` Doherty, Declan
@ 2018-04-01  6:10           ` Shahaf Shuler
  0 siblings, 0 replies; 73+ messages in thread
From: Shahaf Shuler @ 2018-04-01  6:10 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Jingjing Wu, Wenzhuo Lu,
	Vincent Jardin, Yuanhan Liu, Bruce Richardson,
	Konstantin Ananyev, Zhihong Wang



--Shahaf


> -----Original Message-----
> From: Doherty, Declan [mailto:declan.doherty@intel.com]
> Sent: Thursday, March 29, 2018 6:13 PM
> To: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Cc: Alex Rosenbaum <alexr@mellanox.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Qi
> Zhang <qi.z.zhang@intel.com>; Alejandro Lucero
> <alejandro.lucero@netronome.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Mohammad Abdul Awal
> <mohammad.abdul.awal@intel.com>; Remy Horton
> <remy.horton@intel.com>; John McNamara <john.mcnamara@intel.com>;
> Rony Efraim <ronye@mellanox.com>; Jingjing Wu <jingjing.wu@intel.com>;
> Wenzhuo Lu <wenzhuo.lu@intel.com>; Vincent Jardin
> <vincent.jardin@6wind.com>; Yuanhan Liu <yliu@fridaylinux.org>; Bruce
> Richardson <bruce.richardson@intel.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>; Zhihong Wang
> <zhihong.wang@intel.com>
> Subject: Re: [dpdk-dev][PATCH v6 2/8] ethdev: add switch identifier
> parameter to port
> 
> On 29/03/2018 11:12 AM, Shahaf Shuler wrote:
> > Thursday, March 29, 2018 12:14 PM, Doherty, Declan:
> >> On 29/03/2018 7:13 AM, Shahaf Shuler wrote:
> >>> Hi Declan,
> >>>
> >>> Thanks for the series! See some comments below
> >>>
> >>> Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
> >>>> Subject: [dpdk-dev][PATCH v6 2/8] ethdev: add switch identifier
> >>>> parameter to port
> >>>>
> >>>> Introduces a new port attribute to ethdev port's which denotes the
> >>>> switch domain a port belongs to. By default all port's switch
> >>>> identifiers are the their port_id. Ports which share a common
> >>>> switch domain are configured with the same switch id.
> >>>>
> >>>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> >>>> ---
> >>>>    app/test-pmd/config.c              | 1 +
> >>>>    lib/librte_ether/rte_ethdev.c      | 3 +++
> >>>>    lib/librte_ether/rte_ethdev.h      | 1 +
> >>>>    lib/librte_ether/rte_ethdev_core.h | 1 +
> >>>>    4 files changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index
> >>>> 4bb255c62..e12f8c515 100644
> >>>> --- a/app/test-pmd/config.c
> >>>> +++ b/app/test-pmd/config.c
> >>>> @@ -517,6 +517,7 @@ port_infos_display(portid_t port_id)
> >>>>    	printf("Min possible number of TXDs per queue: %hu\n",
> >>>>    		dev_info.tx_desc_lim.nb_min);
> >>>>    	printf("TXDs number alignment: %hu\n",
> >>>> dev_info.tx_desc_lim.nb_align);
> >>>> +	printf("Switch Id: %u\n", dev_info.switch_id);
> >>>>    }
> >>>>
> >>>>    void
> >>>> diff --git a/lib/librte_ether/rte_ethdev.c
> >>>> b/lib/librte_ether/rte_ethdev.c index 23857c91f..f32d18cad 100644
> >>>> --- a/lib/librte_ether/rte_ethdev.c
> >>>> +++ b/lib/librte_ether/rte_ethdev.c
> >>>> @@ -290,6 +290,8 @@ rte_eth_dev_allocate(const char *name)
> >>>>    	eth_dev = eth_dev_get(port_id);
> >>>>    	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name),
> >> "%s",
> >>>> name);
> >>>>    	eth_dev->data->port_id = port_id;
> >>>> +	eth_dev->data->switch_id = port_id;
> >>>> +	/**< Default switch_id is the port_id of the device */
> >>>
> >>> Why such default is needed? Why not let the PMD to set it always?
> >>
> >> I saw this a simple way to have a consistent default value (the
> >> port_id) for all PMDs, without the need to modify existing PMD which
> >> don't currently have any concept of a switch domain.
> >
> > The default value don't makes much sense though. By default it would
> > mean the application each port is on different switch domain. This is
> > obviously not true in case of multiple VFs
> 
> But it is the case today for all ports, that they are assumed to be in separate
> switch domains or at least in unknowable switch domains by applications,
> with the exception of LAG groups, were it is implicit that they are in the same
> domain. Even if two ports were attached to the same switch, or if you where
> using multiple VFs from the same switch there is currently no way of telling
> the application that this is the case. 

Right, but I thought your new parameter is the way :

Uint16_t switch_domain -
"Introduces a new port attribute to ethdev port's which denotes the
switch domain a port belongs to"


What would be the use case for having
> multiple VFs from the same port/switch domain in a single data path
> application. It doesn't seem likely to me that you would use hardware
> switching to move traffic between 2 VFs in the same data path application.

It is not likely. However the parameter you defined is not true only for the representor case, but for the generic case (unless you state otherwise in the documentation of it).

Maybe it can also use application like VFd to understand the connectivity between the different VFs.

My point is - unless this parameter is valid only for the reprsentor case, the default value should be the correct one. 
I think it is better to not limit this field to representor only. Extra info for application is never a bad thing. 

> 
> If you are using port representors, I assumed that you would set a valid
> switch domain id, possibly picking up the port id of the first port created on
> that switch domain, but it would be completely up to the PMD how this was
> done, I have just shown the simplest model in which the port_id of the PF is
> used for all subsequent representor ports in that switch domain, but if you
> wanted to create multiple domains within the same device, you could just
> use subsequent port representors port_ids to setup further switch domains.
> A switch_id is only meaningful within the contex of DPDK process it is running
> in.
> 
> >
> > Maybe we can define ETH_SWITCH_ID_INVALID (0) to emphasis 0 is not
> the switch_id for the PMDs which didn't implemented it.
> >
> >>
> >> Also taking the approach of just leaving it up to the PMD to decide
> >> the value would mean that some form of synchronisation would be
> >> required so that two device don't select the switch domain identifier.
> >
> > What kind of knowledge ethdev layer has to set the switche_id? It should
> be based on the underlying capabilities of the device.
> > Some devices will use the kernel sysfs  to check that.
> >
> > Maybe there are devices which are able to expose the same switch for
> multiple devices, passing the packet between them using  peer2peer trough
> the PCI.
> >
> > The point is we need to have APIs which will enable all the future flexibility.
> >
> > Maybe we can have array of switch_id which are allready taken by the
> underlying devices, and PMD will register their switch_id to it atomically. To
> ease the synchronization.
> >
> >>
> >>>
> >>>>    	eth_dev->data->mtu = ETHER_MTU;
> >>>>
> >>>>    unlock:
> >>>> @@ -2395,6 +2397,7 @@ rte_eth_dev_info_get(uint16_t port_id,
> struct
> >>>> rte_eth_dev_info *dev_info)
> >>>>    	dev_info->driver_name = dev->device->driver->name;
> >>>>    	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
> >>>>    	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
> >>>> +	dev_info->switch_id = dev->data->switch_id;
> >>>
> >>> Why there is a need to keep the switch_id on device data?
> >>> I think PMD to store it on its private structure and report it in
> >>> dev_info is
> >> enough.
> >>>
> >>
> >> That way would require every PMD to be modified to maintain a
> >> switch_id structure, which I know isn't a big deal, as only the
> >> device which need to support it would need it, but we would need to
> >> change the dev_info-
> >>> switch_id from being a uint16_t to being a signed value so we could
> >>> have a
> >> default of -1 for device which don't support switch domain.
> >
> > We can say also 0 is the invalid switch id.
> >
> >> I thought the former approach was cleaner.
> >>
> >>>>    }
> >>>>
> >>>>    int
> >>>> diff --git a/lib/librte_ether/rte_ethdev.h
> >>>> b/lib/librte_ether/rte_ethdev.h index 036153306..dced4fc41 100644
> >>>> --- a/lib/librte_ether/rte_ethdev.h
> >>>> +++ b/lib/librte_ether/rte_ethdev.h
> >>>> @@ -1029,6 +1029,7 @@ struct rte_eth_dev_info {
> >>>>    	/** Configured number of rx/tx queues */
> >>>>    	uint16_t nb_rx_queues; /**< Number of RX queues. */
> >>>>    	uint16_t nb_tx_queues; /**< Number of TX queues. */
> >>>> +	uint16_t switch_id; /**< Switch Domain Id */
> >>>>    };
> >>>>
> >>>>    /**
> >>>> diff --git a/lib/librte_ether/rte_ethdev_core.h
> >>>> b/lib/librte_ether/rte_ethdev_core.h
> >>>> index e5681e466..caed7a4e6 100644
> >>>> --- a/lib/librte_ether/rte_ethdev_core.h
> >>>> +++ b/lib/librte_ether/rte_ethdev_core.h
> >>>> @@ -585,6 +585,7 @@ struct rte_eth_dev_data {
> >>>>    	struct ether_addr* hash_mac_addrs;
> >>>>    	/** Device Ethernet MAC addresses of hash filtering. */
> >>>>    	uint16_t port_id;           /**< Device [external] port identifier. */
> >>>> +	uint16_t switch_id;	    /**< Switch which port is associated with
> >>>> */
> >>>>    	__extension__
> >>>>    	uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0).
> >>>> */
> >>>>    		scattered_rx : 1,  /**< RX of scattered packets is ON(1) /
> >>>> OFF(0) */
> >>>> --
> >>>> 2.14.3
> >>>
> >


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag
  2018-03-29 14:53     ` Doherty, Declan
@ 2018-04-01  6:14       ` Shahaf Shuler
  0 siblings, 0 replies; 73+ messages in thread
From: Shahaf Shuler @ 2018-04-01  6:14 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

Thursday, March 29, 2018 5:53 PM, Doherty, Declan:
> On 29/03/2018 7:13 AM, Shahaf Shuler wrote:
> > Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
> >> Subject: [dpdk-dev][PATCH v6 4/8] ethdev: Add port representor device
> >> flag
> >>
> >> Add new device flag to specify that ethdev port is a port representor.
> >> Extend rte_eth_dev_info structure to expose device flags to user
> >> which enable applications to discover if a port is a representor port.
> >>
> >> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> >> ---
> >>   lib/librte_ether/rte_ethdev.c             | 1 +
> >>   lib/librte_ether/rte_ethdev.h             | 9 ++++++---
> >>   lib/librte_ether/rte_ethdev_representor.h | 3 +++
> >>   3 files changed, 10 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/lib/librte_ether/rte_ethdev.c
> >> b/lib/librte_ether/rte_ethdev.c index c719f84a3..163246433 100644
> >> --- a/lib/librte_ether/rte_ethdev.c
> >> +++ b/lib/librte_ether/rte_ethdev.c
> >> @@ -2399,6 +2399,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct
> >> rte_eth_dev_info *dev_info)
> >>   	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
> >>   	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
> >>   	dev_info->switch_id = dev->data->switch_id;
> >> +	dev_info->dev_flags = dev->data->dev_flags;
> >>   }
> >>
> >>   int
> >> diff --git a/lib/librte_ether/rte_ethdev.h
> >> b/lib/librte_ether/rte_ethdev.h index dced4fc41..226acc8b1 100644
> >> --- a/lib/librte_ether/rte_ethdev.h
> >> +++ b/lib/librte_ether/rte_ethdev.h
> >> @@ -996,6 +996,7 @@ struct rte_eth_dev_info {
> >>   	const char *driver_name; /**< Device Driver name. */
> >>   	unsigned int if_index; /**< Index to bound host interface, or 0 if
> >> none.
> >>   		Use if_indextoname() to translate into an interface name. */
> >> +	uint32_t dev_flags; /**< Device flags */
> >>   	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
> >>   	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX
> >> pkt. */
> >>   	uint16_t max_rx_queues; /**< Maximum number of RX queues. */
> @@
> >> -1229,11 +1230,13 @@ struct rte_eth_dev_owner {  };
> >>
> >>   /** Device supports link state interrupt */
> >> -#define RTE_ETH_DEV_INTR_LSC     0x0002
> >> +#define RTE_ETH_DEV_INTR_LSC		0x0002
> >>   /** Device is a bonded slave */
> >> -#define RTE_ETH_DEV_BONDED_SLAVE 0x0004
> >> +#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
> >>   /** Device supports device removal interrupt */
> >> -#define RTE_ETH_DEV_INTR_RMV     0x0008
> >> +#define RTE_ETH_DEV_INTR_RMV		0x0008
> >> +/** Device is port representor */
> >> +#define RTE_ETH_DEV_REPRESENTOR		0x0010
> >
> > Maybe it is a good time to make some order here.
> > I understand the decision to use flags instead of bit-field. It is better.
> >
> > However there is a mix here of device capabilities like :
> RTE_ETH_DEV_INTR_LSC   and RTE_ETH_DEV_INTR_RMV
> > And device attributes like : RTE_ETH_DEV_BONDED_SLAVE and
> RTE_ETH_DEV_REPRESENTOR.
> > I don't think they belong together under the genetic name of dev_flags.
> >
> > Moreover, I am not sure the fact device is bonded slave should be exposed
> to the application. It should be internal to ethdev and its port iterators.
> 
> That's a good point on the bonded slave flag, I'll look at fixing that for the
> next release. I don't think changing it should effect ABI but I'll need to have a
> closer look.
> 
> Do you think that we should have a separate device attributes field, which
> the representor flag is contained in.
> 
> >
> > Finally I think representor port may need more info now (and in the
> future), for example the associated vf id.
> > For that, I think it is better it to be exposed as a dedicated struct on device
> info.
> 
> I think a switch port id should suffice for that, for SR-IOV devices it would
> map to the vf_id.

I think we need both switch_domain and vf_id. 
Because for representors, the application should know which VFs can be reached from this representor and which VF it represent. 

> 
> >
> >>
> >>   /**
> >>    * @warning
> >> diff --git a/lib/librte_ether/rte_ethdev_representor.h
> >> b/lib/librte_ether/rte_ethdev_representor.h
> >> index cbc1f2855..f3726d0ba 100644
> >> --- a/lib/librte_ether/rte_ethdev_representor.h
> >> +++ b/lib/librte_ether/rte_ethdev_representor.h
> >> @@ -22,6 +22,9 @@ eth_dev_representor_port_init(struct rte_eth_dev
> >> *ethdev, void *init_params)
> >>   	/** representor inherits the switch id of it's base device */
> >>   	ethdev->data->switch_id = base_ethdev->data->switch_id;
> >>
> >> +	/** Set device flags to specify that device is a representor port */
> >> +	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
> >
> > Should be set in the PMD, not in ethdev layer
> 
> As in the previous patch this is just a generic port bus init function which
> meets the simplest use case of representor port with a single switch domain,
> a PMD doesn't need to use it but having it here saves duplicating the same
> code across multiple PMD which are only supporting the basic mode.
> 
> >
> >> +
> >>   	return 0;
> >>   }
> >>
> >> --
> >> 2.14.3
> >


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/8] doc: add switch representation documentation
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 1/8] doc: add switch representation documentation Declan Doherty
  2018-03-28 14:53   ` Thomas Monjalon
@ 2018-04-03 15:52   ` Adrien Mazarguil
  1 sibling, 0 replies; 73+ messages in thread
From: Adrien Mazarguil @ 2018-04-03 15:52 UTC (permalink / raw)
  To: Declan Doherty; +Cc: dev

Hi Declan,

On Wed, Mar 28, 2018 at 02:54:26PM +0100, Declan Doherty wrote:
> From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> 
> Add document to describe a model for representing switching capable
> devices in DPDK, using a general ethdev port model and through port
> representors.This document also details the port model and the
> rte_flow semantics required for flow programming, as well as listing
> some example use cases.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>

OK for using the text of my original RFC, however since I'm not the *commit*
author, I suggest to make it yours with:

 git commit --amend --reset-author

You can then include my SoB line:

 Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

Thanks. More cosmetic comments below.

<snip>
> +Port Representors
> +-----------------
> +
> +In many cases, traffic steering rules cannot be determined in advance;
> +applications usually have to process a bit of traffic in software before
> +thinking about offloading specific flows to hardware.
> +
> +Applications therefore need the ability to receive and inject traffic to
> +various device endpoints (other VFs, PFs or physical ports) before
> +connecting them together. Device drivers must provide means to hook the
> +"other end" of these endpoints and to refer them when configuring flow
> +rules.
> +
> +This role is left to so-called "port representors" (also known as "VF
> +representors" in the specific context of VFs), which are to DPDK what the
> +Ethernet switch device driver model (**switchdev**) [1]_ is to Linux, and
> +which can be thought as a software "patch panel" front-end for applications.
> +
> +- DPDK port representors are implemented as additional virtual Ethernet
> +  device (**ethdev**) instances, spawned on an as needed basis through
> +  configuration parameters passed to the driver of the underlying
> +  device using devargs.
> +
> +::
> +
> +   -w pci:dbdf,representor=0
> +   -w pci:dbdf,representor=[0-3]
> +   -w pci:dbdf,representor=[0,5-11]
> +
> +- As virtual devices, they may be more limited than their physical
> +  counterparts, for instance by exposing only a subset of device
> +  configuration callbacks and/or by not necessarily having Rx/Tx capability.
> +
> +- Among other things, they can be used to assign MAC addresses to the
> +  resource they represent.
> +
> +- Applications can tell port representors apart from other physcial of virtual
> +  port by checking the dev_flags field within their device information
> +  structure for the RTE_ETH_DEV_REPRESENTOR bit-field.
> +
> +.. code-block:: c
> +
> +  struct rte_eth_dev_info {
> +	..
> +	uint32_t dev_flags; /**< Device flags */
> +	..
> +  };
> +
> +- The device or group relationship of ports can be discovered using the
> +  switch_id field within the device information structure. By default the
> +  switch_id of a port will be it's port_id but ports within the same switch
> +  domain will share the same *switch_id* which in the case of SR-IOV devices
> +  would align to the port_id of the physical function port.
> +
> +.. code-block:: c
> +
> +  struct rte_eth_dev_info {
> +	..
> +	uint16_t switch_id; /**< Switch Domain Id */
> +	..
> +  };
> +

OK for these additions, note this section may have to be updated later
depending on how the API settles (especially on the devargs side) according
to discussions which are still going on.

<snip>
> +VF representors
> +~~~~~~~~~~~~~~~

Looks like you capitalized all words in some section titles but missed
others such as this one. I'm not a huge fan of capitalization in the middle
of sentences and actually prefer the original form, but I know it's very
common.

So I don't mind which you choose, however it should be consistent across all
section titles.

<snip>
> +Switching Examples
> +------------------
> +
> +This section provides practical examples based on the established Testpmd
> +flow command syntax [2]_, in the context described in `traffic steering`_
> +
> +::
> +
> +      .-------------.                 .-------------. .-------------.
> +      | hypervisor  |                 |    VM 1     | |    VM 2     |
> +      | application |                 | application | | application |
> +      `--+---+---+--'                 `----------+--' `--+----------'
> +         |   |   |                               |       |
> +         |   |   `-------------------.           |       |
> +         |   `---------.             |           |       |
> +         |             |             |           |       |
> +   .----(A)----. .----(B)----. .----(C)----.     |       |
> +   | port_id 3 | | port_id 4 | | port_id 5 |     |       |
> +   `-----+-----' `-----+-----' `-----+-----'     |       |
> +        |             |             |           |       |
> +      .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
> +      | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
> +      `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
> +        |             |             |           |       |
> +        |             |   .---------'           |       |
> +        `-----.       |   |   .-----------------'       |
> +              |       |   |   |   .---------------------'
> +              |       |   |   |   |
> +           .--|-------|---|---|---|--.
> +           |  |       |   `---|---'  |
> +           |  |       `-------'      |
> +           |  `---------.            |
> +           `------------|------------'
> +                        |
> +                   .---(F)----.
> +                   | physical |
> +                   |  port 0  |
> +                   `----------'

This diagram is a somewhat broken horizontally.

> +
> +By default, PF (**A**) can communicate with the physical port it is
> +associated with (**F**), while VF 1 (**D**) and VF 2 (**E**) are isolated
> +and restricted to communicate with the hypervisor application through their
> +respective representors (**B** and **C**) if supported.
> +
> +Examples in subsequent sections apply to hypervisor applications only and
> +are based on port representors **A**, **B** and **C**.
> +
> +.. [2] `Flow syntax
> +    <http://dpdk.org/doc/guides/testpmd_app_ug/testpmd_funcs.html#flow-syntax>`

Internal documentation links should not go through HTTP where possible but
use the ":ref:`foo`" syntax, see doc/guides/contributing/documentation.rst.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v7 0/9] switching devices representation
  2018-03-28 13:54 [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
                   ` (7 preceding siblings ...)
  2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 8/8] net/ixgbe: " Declan Doherty
@ 2018-04-16 13:05 ` Declan Doherty
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 1/9] doc: add switch representation documentation Declan Doherty
                     ` (9 more replies)
  8 siblings, 10 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-16 13:05 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Declan Doherty

This patchset follows on from the port rerpesentor patchsets and the
community discussion that resulted. It outlines the model for
representing and controlling switching capable devices in a new
programmer's guide entry based upon the excellent summary by 
Adrien Mazarguil in 
(http://dpdk.org/ml/archives/dev/2018-March/092513.html).

The next patches introduce changes to librte_ether to:
1, support the definition of a switch domain and make it public to
application through the rte_eth_dev_info structure.
2, Add generic ethdev create/destroy APIs to facilitate and generalise the
creation of ethdev's on different bus types.
3, Add ethdev attribute to dev_flags to specify that a port is a
representor port and make public through the rte_eth_dev_info
structure.
4, Add devargs parsing for generic eth_devargs to facilate parsing in
NET PMDs. This will be refactored to take account of the changes in 
(http://dpdk.org/ml/archives/dev/2018-March/092513.html)
5, Add new API to allocate switch domain ids to devices which support
this feature. 

This patchset also includes the enablement of vf port representor for ixgbe 
and i40e PF devices.

V7: 
This patch address the following changes:
 - fixes in documentation patch
 - changes the default value of switch domain id to be INVALID to allow
   applications to easily identify devices which can/cannot support the
   concept. Updates the switch information available through the
   rte_eth_dev_info structure.
 - remove the rte_ethdev_representor.h header and leave representor
   specific initialisation to driver
 - add new APIs for allocating and freeing switch domain identifier to
   enable PMDs to have unique switch domaind ids without the ethdev
   infrastructure placing any restriction on how theses are managed by
   devices.
 - bug fix in ethdev args parsing code.

Declan Doherty (8):
  doc: add switch representation documentation
  ethdev: add switch identifier parameter to port
  ethdev: add generic create/destroy ethdev APIs
  ethdev: Add port representor device flag
  app/testpmd: add port name to device info
  ethdev: add switch domain allocator
  net/i40e: add support for representor ports
  net/ixgbe: add support for representor ports

Remy Horton (1):
  ethdev: add common devargs parser

 app/test-pmd/config.c                           |  15 +
 doc/guides/prog_guide/index.rst                 |   1 +
 doc/guides/prog_guide/switch_representation.rst | 837 ++++++++++++++++++++++++
 drivers/net/i40e/Makefile                       |   3 +
 drivers/net/i40e/i40e_ethdev.c                  |  82 ++-
 drivers/net/i40e/i40e_ethdev.h                  |  16 +
 drivers/net/i40e/i40e_vf_representor.c          | 405 ++++++++++++
 drivers/net/i40e/meson.build                    |   4 +-
 drivers/net/i40e/rte_pmd_i40e.c                 |  43 ++
 drivers/net/i40e/rte_pmd_i40e.h                 |  18 +
 drivers/net/ixgbe/Makefile                      |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c                |  73 ++-
 drivers/net/ixgbe/ixgbe_ethdev.h                |  14 +
 drivers/net/ixgbe/ixgbe_pf.c                    |   7 +
 drivers/net/ixgbe/ixgbe_vf_representor.c        | 217 ++++++
 drivers/net/ixgbe/meson.build                   |   1 +
 lib/Makefile                                    |   1 +
 lib/librte_ether/rte_ethdev.c                   | 345 +++++++++-
 lib/librte_ether/rte_ethdev.h                   |  26 +-
 lib/librte_ether/rte_ethdev_driver.h            | 126 ++++
 lib/librte_ether/rte_ethdev_pci.h               |  12 +
 lib/librte_ether/rte_ethdev_version.map         |  12 +
 22 files changed, 2239 insertions(+), 20 deletions(-)
 create mode 100644 doc/guides/prog_guide/switch_representation.rst
 create mode 100644 drivers/net/i40e/i40e_vf_representor.c
 create mode 100644 drivers/net/ixgbe/ixgbe_vf_representor.c

-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v7 1/9] doc: add switch representation documentation
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
@ 2018-04-16 13:05   ` Declan Doherty
  2018-04-16 15:55     ` Kovacevic, Marko
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 2/9] ethdev: add switch identifier parameter to port Declan Doherty
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 73+ messages in thread
From: Declan Doherty @ 2018-04-16 13:05 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Declan Doherty, Adrien Mazarguil

Add document to describe the  model for representing switching capable
devices in DPDK, using a general ethdev port model and through port
representors. This document also details the port model and the
rte_flow semantics required for flow programming, as well as listing
some example use cases.

Signed-off-by: Adrien Mazarguil <adrien.mazaguil@6wind.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 doc/guides/prog_guide/index.rst                 |   1 +
 doc/guides/prog_guide/switch_representation.rst | 837 ++++++++++++++++++++++++
 2 files changed, 838 insertions(+)
 create mode 100644 doc/guides/prog_guide/switch_representation.rst

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index bbbe7895d..09224af2e 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -17,6 +17,7 @@ Programmer's Guide
     mbuf_lib
     poll_mode_drv
     rte_flow
+    switch_representation
     traffic_metering_and_policing
     traffic_management
     bbdev
diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
new file mode 100644
index 000000000..8875d2846
--- /dev/null
+++ b/doc/guides/prog_guide/switch_representation.rst
@@ -0,0 +1,837 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 6WIND S.A.
+
+.. _switch_representation:
+
+Switch Representation within DPDK Applications
+==============================================
+
+.. contents:: :local:
+
+Introduction
+------------
+
+Network adapters with multiple physical ports and/or SR-IOV capabilities
+usually support the offload of traffic steering rules between their virtual
+functions (VFs), physical functions (PFs) and ports.
+
+Like for standard Ethernet switches, this involves a combination of
+automatic MAC learning and manual configuration. For most purposes it is
+managed by the host system and fully transparent to users and applications.
+
+On the other hand, applications typically found on hypervisors that process
+layer 2 (L2) traffic (such as OVS) need to steer traffic themselves
+according on their own criteria.
+
+Without a standard software interface to manage traffic steering rules
+between VFs, PFs and the various physical ports of a given device,
+applications cannot take advantage of these offloads; software processing is
+mandatory even for traffic which ends up re-injected into the device it
+originates from.
+
+This document describes how such steering rules can be configured through
+the DPDK flow API (**rte_flow**), with emphasis on the SR-IOV use case
+(PF/VF steering) using a single physical port for clarity, however the same
+logic applies to any number of ports without necessarily involving SR-IOV.
+
+Port Representors
+-----------------
+
+In many cases, traffic steering rules cannot be determined in advance;
+applications usually have to process a bit of traffic in software before
+thinking about offloading specific flows to hardware.
+
+Applications therefore need the ability to receive and inject traffic to
+various device endpoints (other VFs, PFs or physical ports) before
+connecting them together. Device drivers must provide means to hook the
+"other end" of these endpoints and to refer them when configuring flow
+rules.
+
+This role is left to so-called "port representors" (also known as "VF
+representors" in the specific context of VFs), which are to DPDK what the
+Ethernet switch device driver model (**switchdev**) [1]_ is to Linux, and
+which can be thought as a software "patch panel" front-end for applications.
+
+- DPDK port representors are implemented as additional virtual Ethernet
+  device (**ethdev**) instances, spawned on an as needed basis through
+  configuration parameters passed to the driver of the underlying
+  device using devargs.
+
+::
+
+   -w pci:dbdf,representor=0
+   -w pci:dbdf,representor=[0-3]
+   -w pci:dbdf,representor=[0,5-11]
+
+- As virtual devices, they may be more limited than their physical
+  counterparts, for instance by exposing only a subset of device
+  configuration callbacks and/or by not necessarily having Rx/Tx capability.
+
+- Among other things, they can be used to assign MAC addresses to the
+  resource they represent.
+
+- Applications can tell port representors apart from other physcial of virtual
+  port by checking the dev_flags field within their device information
+  structure for the RTE_ETH_DEV_REPRESENTOR bit-field.
+
+.. code-block:: c
+
+  struct rte_eth_dev_info {
+	..
+	uint32_t dev_flags; /**< Device flags */
+	..
+  };
+
+- The device or group relationship of ports can be discovered using the
+  switch ``domain_id`` field within the devices switch information structure. By
+  default the switch ``domain_id`` of a port will be
+  ``RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID`` to indicate that the port doesn't
+  support the concept of a switch domain, but ports which do support the concept
+  will be allocated a unique switch ``domain_id``, ports within the same switch
+  domain will share the same ``domain_id``. The switch ``port_id`` is used to 
+  specify the port_id in terms of the switch, so in the case of SR-IOV devices
+  the switch ``port_id`` would represent the virtual function identifier of the
+  port.
+
+.. code-block:: c
+
+   /**
+    * Ethernet device associated switch information
+    */
+   struct rte_eth_switch_info {
+       const char *name; /**< switch name */
+       uint16_t domain_id; /**< switch domain id */
+       uint16_t port_id; /**< switch port id */
+   };
+
+
+.. [1] `Ethernet switch device driver model (switchdev)
+       <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_
+
+Basic SR-IOV
+------------
+
+"Basic" in the sense that it is not managed by applications, which
+nonetheless expect traffic to flow between the various endpoints and the
+outside as if everything was linked by an Ethernet hub.
+
+The following diagram pictures a setup involving a device with one PF, two
+VFs and one shared physical port
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+----------'                 `----------+--' `--+----------'
+          |                                       |       |
+    .-----+-----.                                 |       |
+    | port_id 3 |                                 |       |
+    `-----+-----'                                 |       |
+          |                                       |       |
+        .-+--.                                .---+--. .--+---.
+        | PF |                                | VF 1 | | VF 2 |
+        `-+--'                                `---+--' `--+---'
+          |                                       |       |
+          `---------.     .-----------------------'       |
+                    |     |     .-------------------------'
+                    |     |     |
+                 .--+-----+-----+--.
+                 | interconnection |
+                 `--------+--------'
+                          |
+                     .----+-----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+- A DPDK application running on the hypervisor owns the PF device, which is
+  arbitrarily assigned port index 3.
+
+- Both VFs are assigned to VMs and used by unknown applications; they may be
+  DPDK-based or anything else.
+
+- Interconnection is not necessarily done through a true Ethernet switch and
+  may not even exist as a separate entity. The role of this block is to show
+  that something brings PF, VFs and physical ports together and enables
+  communication between them, with a number of built-in restrictions.
+
+Subsequent sections in this document describe means for DPDK applications
+running on the hypervisor to freely assign specific flows between PF, VFs
+and physical ports based on traffic properties, by managing this
+interconnection.
+
+Controlled SR-IOV
+-----------------
+
+Initialization
+~~~~~~~~~~~~~~
+
+When a DPDK application gets assigned a PF device and is deliberately not
+started in `basic SR-IOV`_ mode, any traffic coming from physical ports is
+received by PF according to default rules, while VFs remain isolated.
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+----------'                 `----------+--' `--+----------'
+          |                                       |       |
+    .-----+-----.                                 |       |
+    | port_id 3 |                                 |       |
+    `-----+-----'                                 |       |
+          |                                       |       |
+        .-+--.                                .---+--. .--+---.
+        | PF |                                | VF 1 | | VF 2 |
+        `-+--'                                `------' `------'
+          |
+          `-----.
+                |
+             .--+----------------------.
+             | managed interconnection |
+             `------------+------------'
+                          |
+                     .----+-----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+In this mode, interconnection must be configured by the application to
+enable VF communication, for instance by explicitly directing traffic with a
+given destination MAC address to VF 1 and allowing that with the same source
+MAC address to come out of it.
+
+For this to work, hypervisor applications need a way to refer to either VF 1
+or VF 2 in addition to the PF. This is addressed by `VF representors`_.
+
+VF Representors
+~~~~~~~~~~~~~~~
+
+VF representors are virtual but standard DPDK network devices (albeit with
+limited capabilities) created by PMDs when managing a PF device.
+
+Since they represent VF instances used by other applications, configuring
+them (e.g. assigning a MAC address or setting up promiscuous mode) affects
+interconnection accordingly. If supported, they may also be used as two-way
+communication ports with VFs (assuming **switchdev** topology)
+
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+---+---+--'                 `----------+--' `--+----------'
+          |   |   |                               |       |
+          |   |   `-------------------.           |       |
+          |   `---------.             |           |       |
+          |             |             |           |       |
+    .-----+-----. .-----+-----. .-----+-----.     |       |
+    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+    `-----+-----' `-----+-----' `-----+-----'     |       |
+          |             |             |           |       |
+        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+        `-+--'    `-----+-----' `-----+-----' `---+--' `--+---'
+          |             |             |           |       |
+          |             |   .---------'           |       |
+          `-----.       |   |   .-----------------'       |
+                |       |   |   |   .---------------------'
+                |       |   |   |   |
+             .--+-------+---+---+---+--.
+             | managed interconnection |
+             `------------+------------'
+                          |
+                     .----+-----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+- VF representors are assigned arbitrary port indices 4 and 5 in the
+  hypervisor application and are respectively associated with VF 1 and VF 2.
+
+- They can't be dissociated; even if VF 1 and VF 2 were not connected,
+  representors could still be used for configuration.
+
+- In this context, port index 3 can be thought as a representor for physical
+  port 0.
+
+As previously described, the "interconnection" block represents a logical
+concept. Interconnection occurs when hardware configuration enables traffic
+flows from one place to another (e.g. physical port 0 to VF 1) according to
+some criteria.
+
+This is discussed in more detail in `traffic steering`_.
+
+Traffic Steering
+~~~~~~~~~~~~~~~~
+
+In the following diagram, each meaningful traffic origin or endpoint as seen
+by the hypervisor application is tagged with a unique letter from A to F.
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+---+---+--'                 `----------+--' `--+----------'
+          |   |   |                               |       |
+          |   |   `-------------------.           |       |
+          |   `---------.             |           |       |
+          |             |             |           |       |
+    .----(A)----. .----(B)----. .----(C)----.     |       |
+    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+    `-----+-----' `-----+-----' `-----+-----'     |       |
+          |             |             |           |       |
+        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+        `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
+          |             |             |           |       |
+          |             |   .---------'           |       |
+          `-----.       |   |   .-----------------'       |
+                |       |   |   |   .---------------------'
+                |       |   |   |   |
+             .--+-------+---+---+---+--.
+             | managed interconnection |
+             `------------+------------'
+                          |
+                     .---(F)----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+- **A**: PF device.
+- **B**: port representor for VF 1.
+- **C**: port representor for VF 2.
+- **D**: VF 1 proper.
+- **E**: VF 2 proper.
+- **F**: physical port.
+
+Although uncommon, some devices do not enforce a one to one mapping between
+PF and physical ports. For instance, by default all ports of **mlx4**
+adapters are available to all their PF/VF instances, in which case
+additional ports appear next to **F** in the above diagram.
+
+Assuming no interconnection is provided by default in this mode, setting up
+a `basic SR-IOV`_ configuration involving physical port 0 could be broken
+down as:
+
+PF:
+
+- **A to F**: let everything through.
+- **F to A**: PF MAC as destination.
+
+VF 1:
+
+- **A to D**, **E to D** and **F to D**: VF 1 MAC as destination.
+- **D to A**: VF 1 MAC as source and PF MAC as destination.
+- **D to E**: VF 1 MAC as source and VF 2 MAC as destination.
+- **D to F**: VF 1 MAC as source.
+
+VF 2:
+
+- **A to E**, **D to E** and **F to E**: VF 2 MAC as destination.
+- **E to A**: VF 2 MAC as source and PF MAC as destination.
+- **E to D**: VF 2 MAC as source and VF 1 MAC as destination.
+- **E to F**: VF 2 MAC as source.
+
+Devices may additionally support advanced matching criteria such as
+IPv4/IPv6 addresses or TCP/UDP ports.
+
+The combination of matching criteria with target endpoints fits well with
+**rte_flow** [6]_, which expresses flow rules as combinations of patterns
+and actions.
+
+Enhancing **rte_flow** with the ability to make flow rules match and target
+these endpoints provides a standard interface to manage their
+interconnection without introducing new concepts and whole new API to
+implement them. This is described in `flow API (rte_flow)`_.
+
+.. [6] `Generic flow API (rte_flow)
+       <http://dpdk.org/doc/guides/prog_guide/rte_flow.html>`_
+
+Flow API (rte_flow)
+-------------------
+
+Extensions
+~~~~~~~~~~
+
+Compared to creating a brand new dedicated interface, **rte_flow** was
+deemed flexible enough to manage representor traffic only with minor
+extensions:
+
+- Using physical ports, PF, VF or port representors as targets.
+
+- Affecting traffic that is not necessarily addressed to the DPDK port ID a
+  flow rule is associated with (e.g. forcing VF traffic redirection to PF).
+
+For advanced uses:
+
+- Rule-based packet counters.
+
+- The ability to combine several identical actions for traffic duplication
+  (e.g. VF representor in addition to a physical port).
+
+- Dedicated actions for traffic encapsulation / decapsulation before
+  reaching a endpoint.
+
+Traffic Direction
+~~~~~~~~~~~~~~~~~
+
+From an application standpoint, "ingress" and "egress" flow rule attributes
+apply to the DPDK port ID they are associated with. They select a traffic
+direction for matching patterns, but have no impact on actions.
+
+When matching traffic coming from or going to a different place than the
+immediate port ID a flow rule is associated with, these attributes keep
+their meaning while applying to the chosen origin, as highlighted by the
+following diagram
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+---+---+--'                 `----------+--' `--+----------'
+          |   |   |                               |       |
+          |   |   `-------------------.           |       |
+          |   `---------.             |           |       |
+          | ^           | ^           | ^         |       |
+          | | ingress   | | ingress   | | ingress |       |
+          | | egress    | | egress    | | egress  |       |
+          | v           | v           | v         |       |
+    .----(A)----. .----(B)----. .----(C)----.     |       |
+    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+    `-----+-----' `-----+-----' `-----+-----'     |       |
+          |             |             |           |       |
+        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+        `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
+          |             |             |         ^ |       | ^
+          |             |             |  egress | |       | | egress
+          |             |             | ingress | |       | | ingress
+          |             |   .---------'         v |       | v
+          `-----.       |   |   .-----------------'       |
+                |       |   |   |   .---------------------'
+                |       |   |   |   |
+             .--+-------+---+---+---+--.
+             | managed interconnection |
+             `------------+------------'
+                        ^ |
+                ingress | |
+                 egress | |
+                        v |
+                     .---(F)----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+Ingress and egress are defined as relative to the application creating the
+flow rule.
+
+For instance, matching traffic sent by VM 2 would be done through an ingress
+flow rule on VF 2 (**E**). Likewise for incoming traffic on physical port
+(**F**). This also applies to **C** and **A** respectively.
+
+Transferring Traffic
+~~~~~~~~~~~~~~~~~~~~
+
+Without Port Representors
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`Traffic direction`_ describes how an application could match traffic coming
+from or going to a specific place reachable from a DPDK port ID. This makes
+sense when the traffic in question is normally seen (i.e. sent or received)
+by the application creating the flow rule (e.g. as in "redirect all traffic
+coming from VF 1 to local queue 6").
+
+However this does not force such traffic to take a specific route. Creating
+a flow rule on **A** matching traffic coming from **D** is only meaningful
+if it can be received by **A** in the first place, otherwise doing so simply
+has no effect.
+
+A new flow rule attribute named "transfer" is necessary for that. Combining
+it with "ingress" or "egress" and a specific origin requests a flow rule to
+be applied at the lowest level
+
+::
+
+             ingress only           :       ingress + transfer
+                                    :
+    .-------------. .-------------. : .-------------. .-------------.
+    | hypervisor  | |    VM 1     | : | hypervisor  | |    VM 1     |
+    | application | | application | : | application | | application |
+    `------+------' `--+----------' : `------+------' `--+----------'
+           |           | | traffic  :        |           | | traffic
+     .----(A)----.     | v          :  .----(A)----.     | v
+     | port_id 3 |     |            :  | port_id 3 |     |
+     `-----+-----'     |            :  `-----+-----'     |
+           |           |            :        | ^         |
+           |           |            :        | | traffic |
+         .-+--.    .---+--.         :      .-+--.    .---+--.
+         | PF |    | VF 1 |         :      | PF |    | VF 1 |
+         `-+--'    `--(D)-'         :      `-+--'    `--(D)-'
+           |           | | traffic  :        | ^         | | traffic
+           |           | v          :        | | traffic | v
+        .--+-----------+--.         :     .--+-----------+--.
+        | interconnection |         :     | interconnection |
+        `--------+--------'         :     `--------+--------'
+                 | | traffic        :              |
+                 | v                :              |
+            .---(F)----.            :         .---(F)----.
+            | physical |            :         | physical |
+            |  port 0  |            :         |  port 0  |
+            `----------'            :         `----------'
+
+With "ingress" only, traffic is matched on **A** thus still goes to physical
+port **F** by default
+
+
+::
+
+   testpmd> flow create 3 ingress pattern vf id is 1 / end
+              actions queue index 6 / end
+
+With "ingress + transfer", traffic is matched on **D** and is therefore
+successfully assigned to queue 6 on **A**
+
+
+::
+
+    testpmd> flow create 3 ingress transfer pattern vf id is 1 / end
+              actions queue index 6 / end
+
+
+With Port Representors
+^^^^^^^^^^^^^^^^^^^^^^
+
+When port representors exist, implicit flow rules with the "transfer"
+attribute (described in `without port representors`_) are be assumed to
+exist between them and their represented resources. These may be immutable.
+
+In this case, traffic is received by default through the representor and
+neither the "transfer" attribute nor traffic origin in flow rule patterns
+are necessary. They simply have to be created on the representor port
+directly and may target a different representor as described in `PORT_ID
+action`_.
+
+Implicit traffic flow with port representor
+
+::
+
+       .-------------.   .-------------.
+       | hypervisor  |   |    VM 1     |
+       | application |   | application |
+       `--+-------+--'   `----------+--'
+          |       | ^               | | traffic
+          |       | | traffic       | v
+          |       `-----.           |
+          |             |           |
+    .----(A)----. .----(B)----.     |
+    | port_id 3 | | port_id 4 |     |
+    `-----+-----' `-----+-----'     |
+          |             |           |
+        .-+--.    .-----+-----. .---+--.
+        | PF |    | VF 1 rep. | | VF 1 |
+        `-+--'    `-----+-----' `--(D)-'
+          |             |           |
+       .--|-------------|-----------|--.
+       |  |             |           |  |
+       |  |             `-----------'  |
+       |  |              <-- traffic   |
+       `--|----------------------------'
+          |
+     .---(F)----.
+     | physical |
+     |  port 0  |
+     `----------'
+
+Pattern Items And Actions
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PORT Pattern Item
+^^^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) a physical
+port of the underlying device.
+
+Using this pattern item without specifying a port index matches the physical
+port associated with the current DPDK port ID by default. As described in
+`traffic steering`_, specifying it should be rarely needed.
+
+- Matches **F** in `traffic steering`_.
+
+PORT Action
+^^^^^^^^^^^
+
+Directs matching traffic to a given physical port index.
+
+- Targets **F** in `traffic steering`_.
+
+PORT_ID Pattern Item
+^^^^^^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) a given DPDK
+port ID.
+
+Normally only supported if the port ID in question is known by the
+underlying PMD and related to the device the flow rule is created against.
+
+This must not be confused with the `PORT pattern item`_ which refers to the
+physical port of a device. ``PORT_ID`` refers to a ``struct rte_eth_dev``
+object on the application side (also known as "port representor" depending
+on the kind of underlying device).
+
+- Matches **A**, **B** or **C** in `traffic steering`_.
+
+PORT_ID Action
+^^^^^^^^^^^^^^
+
+Directs matching traffic to a given DPDK port ID.
+
+Same restrictions as `PORT_ID pattern item`_.
+
+- Targets **A**, **B** or **C** in `traffic steering`_.
+
+PF Pattern Item
+^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) the physical
+function of the current device.
+
+If supported, should work even if the physical function is not managed by
+the application and thus not associated with a DPDK port ID. Its behavior is
+otherwise similar to `PORT_ID pattern item`_ using PF port ID.
+
+- Matches **A** in `traffic steering`_.
+
+PF Action
+^^^^^^^^^
+
+Directs matching traffic to the physical function of the current device.
+
+Same restrictions as `PF pattern item`_.
+
+- Targets **A** in `traffic steering`_.
+
+VF Pattern Item
+^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) a given
+virtual function of the current device.
+
+If supported, should work even if the virtual function is not managed by
+the application and thus not associated with a DPDK port ID. Its behavior is
+otherwise similar to `PORT_ID pattern item`_ using VF port ID.
+
+Note this pattern item does not match VF representors traffic which, as
+separate entities, should be addressed through their own port IDs.
+
+- Matches **D** or **E** in `traffic steering`_.
+
+VF Action
+^^^^^^^^^
+
+Directs matching traffic to a given virtual function of the current device.
+
+Same restrictions as `VF pattern item`_.
+
+- Targets **D** or **E** in `traffic steering`_.
+
+\*_ENCAP actions
+^^^^^^^^^^^^^^^^
+
+These actions are named according to the protocol they encapsulate traffic
+with (e.g. ``VXLAN_ENCAP``) and using specific parameters (e.g. VNI for
+VXLAN).
+
+While they modify traffic and can be used multiple times (order matters),
+unlike `PORT_ID action`_ and friends, they have no impact on steering.
+
+As described in `actions order and repetition`_ this means they are useless
+if used alone in an action list, the resulting traffic gets dropped unless
+combined with either ``PASSTHRU`` or other endpoint-targeting actions.
+
+\*_DECAP actions
+^^^^^^^^^^^^^^^^
+
+They perform the reverse of `\*_ENCAP actions`_ by popping protocol headers
+from traffic instead of pushing them. They can be used multiple times as
+well.
+
+Note that using these actions on non-matching traffic results in undefined
+behavior. It is recommended to match the protocol headers to decapsulate on
+the pattern side of a flow rule in order to use these actions or otherwise
+make sure only matching traffic goes through.
+
+Actions Order and Repetition
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Flow rules are currently restricted to at most a single action of each
+supported type, performed in an unpredictable order (or all at once). To
+repeat actions in a predictable fashion, applications have to make rules
+pass-through and use priority levels.
+
+It's now clear that PMD support for chaining multiple non-terminating flow
+rules of varying priority levels is prohibitively difficult to implement
+compared to simply allowing multiple identical actions performed in a
+defined order by a single flow rule.
+
+- This change is required to support protocol encapsulation offloads and the
+  ability to perform them multiple times (e.g. VLAN then VXLAN).
+
+- It makes the ``DUP`` action redundant since multiple ``QUEUE`` actions can
+  be combined for duplication.
+
+- The (non-)terminating property of actions must be discarded. Instead, flow
+  rules themselves must be considered terminating by default (i.e. dropping
+  traffic if there is no specific target) unless a ``PASSTHRU`` action is
+  also specified.
+
+Switching Examples
+------------------
+
+This section provides practical examples based on the established Testpmd
+flow command syntax [2]_, in the context described in `traffic steering`_
+
+::
+
+      .-------------.                 .-------------. .-------------.
+      | hypervisor  |                 |    VM 1     | |    VM 2     |
+      | application |                 | application | | application |
+      `--+---+---+--'                 `----------+--' `--+----------'
+         |   |   |                               |       |
+         |   |   `-------------------.           |       |
+         |   `---------.             |           |       |
+         |             |             |           |       |
+   .----(A)----. .----(B)----. .----(C)----.     |       |
+   | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+   `-----+-----' `-----+-----' `-----+-----'     |       |
+         |             |             |           |       |
+       .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+       | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+       `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
+         |             |             |           |       |
+         |             |   .---------'           |       |
+         `-----.       |   |   .-----------------'       |
+               |       |   |   |   .---------------------'
+               |       |   |   |   |
+            .--|-------|---|---|---|--.
+            |  |       |   `---|---'  |
+            |  |       `-------'      |
+            |  `---------.            |
+            `------------|------------'
+                         |
+                    .---(F)----.
+                    | physical |
+                    |  port 0  |
+                    `----------'
+
+By default, PF (**A**) can communicate with the physical port it is
+associated with (**F**), while VF 1 (**D**) and VF 2 (**E**) are isolated
+and restricted to communicate with the hypervisor application through their
+respective representors (**B** and **C**) if supported.
+
+Examples in subsequent sections apply to hypervisor applications only and
+are based on port representors **A**, **B** and **C**.
+
+.. [2] `Flow syntax
+    <http://dpdk.org/doc/guides/testpmd_app_ug/testpmd_funcs.html#flow-syntax>`
+
+Associating VF 1 with Physical Port 0
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Assign all port traffic (**F**) to VF 1 (**D**) indiscriminately through
+their representors
+
+::
+
+   flow create 3 ingress pattern / end actions port_id id 4 / end
+   flow create 4 ingress pattern / end actions port_id id 3 / end
+
+More practical example with MAC address restrictions
+
+::
+
+   flow create 3 ingress
+       pattern eth dst is {VF 1 MAC} / end
+       actions port_id id 4 / end
+
+::
+
+   flow create 4 ingress
+       pattern eth src is {VF 1 MAC} / end
+       actions port_id id 3 / end
+
+
+Sharing Broadcasts
+~~~~~~~~~~~~~~~~~~
+
+From outside to PF and VFs
+
+::
+
+   flow create 3 ingress
+      pattern eth dst is ff:ff:ff:ff:ff:ff / end
+      actions port_id id 3 / port_id id 4 / port_id id 5 / end
+
+Note ``port_id id 3`` is necessary otherwise only VFs would receive matching
+traffic.
+
+From PF to outside and VFs
+
+::
+
+   flow create 3 egress
+      pattern eth dst is ff:ff:ff:ff:ff:ff / end
+      actions port / port_id id 4 / port_id id 5 / end
+
+From VFs to outside and PF
+
+::
+
+   flow create 4 ingress
+      pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 1 MAC} / end
+      actions port_id id 3 / port_id id 5 / end
+
+   flow create 5 ingress
+      pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 2 MAC} / end
+      actions port_id id 4 / port_id id 4 / end
+
+Similar ``33:33:*`` rules based on known MAC addresses should be added for
+IPv6 traffic.
+
+Encapsulating VF 2 Traffic in VXLAN
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Assuming pass-through flow rules are supported
+
+::
+
+   flow create 5 ingress
+      pattern eth / end
+      actions vxlan_encap vni 42 / passthru / end
+
+::
+
+   flow create 5 egress
+      pattern vxlan vni is 42 / end
+      actions vxlan_decap / passthru / end
+
+Here ``passthru`` is needed since as described in `actions order and
+repetition`_, flow rules are otherwise terminating; if supported, a rule
+without a target endpoint will drop traffic.
+
+Without pass-through support, ingress encapsulation on the destination
+endpoint might not be supported and action list must provide one
+
+::
+
+   flow create 5 ingress
+      pattern eth src is {VF 2 MAC} / end
+      actions vxlan_encap vni 42 / port_id id 3 / end
+
+   flow create 3 ingress
+      pattern vxlan vni is 42 / end
+      actions vxlan_decap / port_id id 5 / end
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v7 2/9] ethdev: add switch identifier parameter to port
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 1/9] doc: add switch representation documentation Declan Doherty
@ 2018-04-16 13:05   ` Declan Doherty
  2018-04-24 16:38     ` Thomas Monjalon
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 3/9] ethdev: add generic create/destroy ethdev APIs Declan Doherty
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 73+ messages in thread
From: Declan Doherty @ 2018-04-16 13:05 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Declan Doherty

Introduces a new port attribute to ethdev port's which denotes the
switch domain a port belongs to. By default all port's switch
identifiers are set to RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID. Ports 
which supported the concept of switch domains can be configured with
the same switch domain id.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 app/test-pmd/config.c         | 12 ++++++++++++
 lib/librte_ether/rte_ethdev.h | 17 +++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index dd051f5ca..884bcb3b6 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -517,6 +517,18 @@ port_infos_display(portid_t port_id)
 	printf("Min possible number of TXDs per queue: %hu\n",
 		dev_info.tx_desc_lim.nb_min);
 	printf("TXDs number alignment: %hu\n", dev_info.tx_desc_lim.nb_align);
+
+	/* Show switch info only if valid switch domain and port id is set */
+	if (dev_info.switch_info.domain_id !=
+		RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID) {
+		if (dev_info.switch_info.name)
+			printf("Switch name: %s\n", dev_info.switch_info.name);
+
+		printf("Switch domain Id: %u\n",
+			dev_info.switch_info.domain_id);
+		printf("Switch Port Id: %u\n",
+			dev_info.switch_info.port_id);
+	}
 }
 
 void
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 4f417f573..b5e5fc52a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1009,6 +1009,21 @@ struct rte_eth_dev_portconf {
 	uint16_t nb_queues; /**< Device-preferred number of queues */
 };
 
+/**
+ * Default values for switch domain id when ethdev does not support switch
+ * domain definitions.
+ */
+#define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID	(0)
+
+/**
+ * Ethernet device associated switch information
+ */
+struct rte_eth_switch_info {
+	const char *name;	/**< switch name */
+	uint16_t domain_id;	/**< switch domain id */
+	uint16_t port_id;	/**< switch port id */
+};
+
 /**
  * Ethernet device information
  */
@@ -1054,6 +1069,8 @@ struct rte_eth_dev_info {
 	struct rte_eth_dev_portconf default_rxportconf;
 	/** Tx parameter recommendations */
 	struct rte_eth_dev_portconf default_txportconf;
+	/** ethdev switch information */
+	struct rte_eth_switch_info switch_info;
 };
 
 /**
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v7 3/9] ethdev: add generic create/destroy ethdev APIs
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 1/9] doc: add switch representation documentation Declan Doherty
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 2/9] ethdev: add switch identifier parameter to port Declan Doherty
@ 2018-04-16 13:05   ` Declan Doherty
  2018-04-20 13:01     ` Ananyev, Konstantin
  2018-04-24 17:48     ` Thomas Monjalon
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 4/9] ethdev: Add port representor device flag Declan Doherty
                     ` (6 subsequent siblings)
  9 siblings, 2 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-16 13:05 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Declan Doherty

Add new bus generic ethdev create/destroy APIs which are bus independent
and provide hooks for bus specific initialisation.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 lib/librte_ether/rte_ethdev.c           | 95 ++++++++++++++++++++++++++++++++-
 lib/librte_ether/rte_ethdev_driver.h    | 57 ++++++++++++++++++++
 lib/librte_ether/rte_ethdev_pci.h       | 12 +++++
 lib/librte_ether/rte_ethdev_version.map |  8 +++
 4 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 3c049ef43..b16d23b9a 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -348,7 +348,8 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
 	rte_eth_dev_shared_data_prepare();
 
 	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
-
+	eth_dev->device = NULL;
+	eth_dev->intr_handle = NULL;
 	eth_dev->state = RTE_ETH_DEV_UNUSED;
 
 	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
@@ -3439,6 +3440,98 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
 			RTE_MEMZONE_IOVA_CONTIG, align);
 }
 
+int __rte_experimental
+rte_eth_dev_create(struct rte_device *device, const char *name,
+	size_t priv_data_size,
+	ethdev_bus_specific_init ethdev_bus_specific_init,
+	void *bus_init_params,
+	ethdev_init_t ethdev_init, void *init_params)
+{
+	struct rte_eth_dev *ethdev;
+	int retval;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		ethdev = rte_eth_dev_allocate(name);
+		if (!ethdev) {
+			retval = -ENODEV;
+			goto probe_failed;
+		}
+
+		if (priv_data_size) {
+			ethdev->data->dev_private = rte_zmalloc_socket(
+				name, priv_data_size, RTE_CACHE_LINE_SIZE,
+				device->numa_node);
+
+			if (!ethdev->data->dev_private) {
+				RTE_LOG(ERR, EAL, "failed to allocate private data");
+				retval = -ENOMEM;
+				goto probe_failed;
+			}
+		}
+	} else {
+		ethdev = rte_eth_dev_attach_secondary(name);
+		if (!ethdev) {
+			RTE_LOG(ERR, EAL, "secondary process attach failed, "
+				"ethdev doesn't exist");
+			retval = -ENODEV;
+			goto probe_failed;
+		}
+	}
+
+	ethdev->device = device;
+
+	if (ethdev_bus_specific_init) {
+		retval = ethdev_bus_specific_init(ethdev, bus_init_params);
+		if (retval) {
+			RTE_LOG(ERR, EAL,
+				"ethdev bus specific initialisation failed");
+			goto probe_failed;
+		}
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_init, -EINVAL);
+	retval = ethdev_init(ethdev, init_params);
+	if (retval) {
+		RTE_LOG(ERR, EAL, "ethdev initialisation failed");
+		goto probe_failed;
+	}
+
+	return retval;
+probe_failed:
+	/* free ports private data if primary process */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_free(ethdev->data->dev_private);
+
+	rte_eth_dev_release_port(ethdev);
+
+	return retval;
+}
+
+int  __rte_experimental
+rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
+	ethdev_uninit_t ethdev_uninit)
+{
+	int ret;
+
+	ethdev = rte_eth_dev_allocated(ethdev->data->name);
+	if (!ethdev)
+		return -ENODEV;
+
+	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_uninit, -EINVAL);
+	if (ethdev_uninit) {
+		ret = ethdev_uninit(ethdev);
+		if (ret)
+			return ret;
+	}
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_free(ethdev->data->dev_private);
+
+	ethdev->data->dev_private = NULL;
+
+	return rte_eth_dev_release_port(ethdev);
+}
+
 int
 rte_eth_dev_rx_intr_ctl_q(uint16_t port_id, uint16_t queue_id,
 			  int epfd, int op, void *data)
diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
index a406ef123..e52add0ad 100644
--- a/lib/librte_ether/rte_ethdev_driver.h
+++ b/lib/librte_ether/rte_ethdev_driver.h
@@ -188,6 +188,63 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
 #endif
 }
 
+
+typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void *init_params);
+typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
+	void *bus_specific_init_params);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * PMD helper function for the creation of a new ethdev ports.
+ *
+ * @param device
+ *  rte_device handle.
+ * @param	name
+ *  port name.
+ * @param priv_data_size
+ *  size of private data required for port.
+ * @param bus_specific_init
+ *  port bus specific initialisation callback function
+ * @param bus_init_params
+ *  port bus specific initialisation parameters
+ * @param ethdev_init
+ *  device specific port initialization callback function
+ * @param init_params
+ *  port initialisation parameters
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_dev_create(struct rte_device *device, const char *name,
+	size_t priv_data_size,
+	ethdev_bus_specific_init bus_specific_init, void *bus_init_params,
+	ethdev_init_t ethdev_init, void *init_params);
+
+
+typedef int (*ethdev_uninit_t)(struct rte_eth_dev *ethdev);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * PMD helper function for cleaing up the resources of a ethdev port on it's
+ * destruction.
+ *
+ * @param ethdev
+ *   ethdev handle of port.
+ * @param ethdev
+ *   device specific port un-initialise callback function
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
+	ethdev_uninit_t ethdev_uninit);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ether/rte_ethdev_pci.h b/lib/librte_ether/rte_ethdev_pci.h
index 6565ae7d3..603287c28 100644
--- a/lib/librte_ether/rte_ethdev_pci.h
+++ b/lib/librte_ether/rte_ethdev_pci.h
@@ -70,6 +70,18 @@ rte_eth_copy_pci_info(struct rte_eth_dev *eth_dev,
 	eth_dev->data->numa_node = pci_dev->device.numa_node;
 }
 
+static inline int
+eth_dev_pci_specific_init(struct rte_eth_dev *eth_dev, void *bus_device) {
+	struct rte_pci_device *pci_dev = bus_device;
+
+	if (!pci_dev)
+		return -ENODEV;
+
+	rte_eth_copy_pci_info(eth_dev, pci_dev);
+
+	return 0;
+}
+
 /**
  * @internal
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 34df6c8b5..bd7232923 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -229,3 +229,11 @@ EXPERIMENTAL {
 	rte_mtr_stats_update;
 
 } DPDK_17.11;
+
+EXPERIMENTAL {
+	global:
+
+	rte_eth_dev_create;
+	rte_eth_dev_destroy;
+
+} DPDK_18.05;
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v7 4/9] ethdev: Add port representor device flag
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
                     ` (2 preceding siblings ...)
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 3/9] ethdev: add generic create/destroy ethdev APIs Declan Doherty
@ 2018-04-16 13:06   ` Declan Doherty
  2018-04-24 19:37     ` Thomas Monjalon
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 5/9] app/testpmd: add port name to device info Declan Doherty
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 73+ messages in thread
From: Declan Doherty @ 2018-04-16 13:06 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Declan Doherty

Add new device flag to specify that an ethdev port is a port representor.
Extend rte_eth_dev_info structure to expose device flags to the user which
enables applications to discover if a port is a representor port.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 lib/librte_ether/rte_ethdev.c | 2 ++
 lib/librte_ether/rte_ethdev.h | 9 ++++++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index b16d23b9a..1d38d8e75 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2431,6 +2431,8 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
 	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
+
+	dev_info->dev_flags = dev->data->dev_flags;
 }
 
 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index b5e5fc52a..0a52067d3 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1032,6 +1032,7 @@ struct rte_eth_dev_info {
 	const char *driver_name; /**< Device Driver name. */
 	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
 		Use if_indextoname() to translate into an interface name. */
+	uint32_t dev_flags; /**< Device flags */
 	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
 	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
 	uint16_t max_rx_queues; /**< Maximum number of RX queues. */
@@ -1268,11 +1269,13 @@ struct rte_eth_dev_owner {
 };
 
 /** Device supports link state interrupt */
-#define RTE_ETH_DEV_INTR_LSC     0x0002
+#define RTE_ETH_DEV_INTR_LSC		0x0002
 /** Device is a bonded slave */
-#define RTE_ETH_DEV_BONDED_SLAVE 0x0004
+#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
 /** Device supports device removal interrupt */
-#define RTE_ETH_DEV_INTR_RMV     0x0008
+#define RTE_ETH_DEV_INTR_RMV		0x0008
+/** Device is port representor */
+#define RTE_ETH_DEV_REPRESENTOR		0x0010
 
 /**
  * @warning
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v7 5/9] app/testpmd: add port name to device info
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
                     ` (3 preceding siblings ...)
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 4/9] ethdev: Add port representor device flag Declan Doherty
@ 2018-04-16 13:06   ` Declan Doherty
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser Declan Doherty
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-16 13:06 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Declan Doherty

Add the port name to information printed by show port info <port_id>

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 app/test-pmd/config.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 884bcb3b6..1b985056a 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -407,6 +407,7 @@ port_infos_display(portid_t port_id)
 	static const char *info_border = "*********************";
 	portid_t pid;
 	uint16_t mtu;
+	char name[RTE_ETH_NAME_MAX_LEN];
 
 	if (port_id_is_invalid(port_id, ENABLED_WARN)) {
 		printf("Valid port range is [0");
@@ -423,6 +424,8 @@ port_infos_display(portid_t port_id)
 	       info_border, port_id, info_border);
 	rte_eth_macaddr_get(port_id, &mac_addr);
 	print_ethaddr("MAC address: ", &mac_addr);
+	rte_eth_dev_get_name_by_port(port_id, name);
+	printf("\nDevice name: %s", name);
 	printf("\nDriver name: %s", dev_info.driver_name);
 	printf("\nConnect to socket: %u", port->socket_id);
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
                     ` (4 preceding siblings ...)
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 5/9] app/testpmd: add port name to device info Declan Doherty
@ 2018-04-16 13:06   ` Declan Doherty
  2018-04-20 13:16     ` Ananyev, Konstantin
  2018-04-24 19:53     ` Thomas Monjalon
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 7/9] ethdev: add switch domain allocator Declan Doherty
                     ` (3 subsequent siblings)
  9 siblings, 2 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-16 13:06 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Remy Horton, Declan Doherty

From: Remy Horton <remy.horton@intel.com>

Introduces a new structure, rte_eth_devargs, to support generic
ethdev arguments common across NET PMDs, with a new API
rte_eth_devargs_parse API to support PMD parsing these arguments.

Signed-off-by: Remy Horton <remy.horton@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 lib/Makefile                            |   1 +
 lib/librte_ether/rte_ethdev.c           | 195 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev_driver.h    |  30 +++++
 lib/librte_ether/rte_ethdev_version.map |   1 +
 4 files changed, 227 insertions(+)

diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..4144d99f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -21,6 +21,7 @@ DEPDIRS-librte_cmdline := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
 DEPDIRS-librte_ether := librte_net librte_eal librte_mempool librte_ring
 DEPDIRS-librte_ether += librte_mbuf
+DEPDIRS-librte_ether += librte_kvargs
 DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += librte_bbdev
 DEPDIRS-librte_bbdev := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 1d38d8e75..a082b211c 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -34,6 +34,7 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
++#include <rte_kvargs.h>
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
@@ -4149,6 +4150,200 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
 	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
 }
 
+typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
+
+static int
+rte_eth_devargs_tokenise(struct rte_kvargs *arglist, const char *str_in)
+{
+	int state;
+	struct rte_kvargs_pair *pair;
+	char *letter;
+
+	arglist->str = strdup(str_in);
+	if (arglist->str == NULL)
+		return -ENOMEM;
+
+	letter = arglist->str;
+	state = 0;
+	arglist->count = 0;
+	pair = &arglist->pairs[0];
+	while (1) {
+		switch (state) {
+		case 0: /* Initial */
+			if (*letter == '=')
+				return -EINVAL;
+			else if (*letter == '\0')
+				return 0;
+
+			state = 1;
+			pair->key = letter;
+			/* fall-thru */
+
+		case 1: /* Parsing key */
+			if (*letter == '=') {
+				*letter = '\0';
+				pair->value = letter + 1;
+				state = 2;
+			} else if (*letter == ',' || *letter == '\0')
+				return -EINVAL;
+			break;
+
+
+		case 2: /* Parsing value */
+			if (*letter == '[')
+				state = 3;
+			else if (*letter == ',') {
+				*letter = '\0';
+				arglist->count++;
+				pair = &arglist->pairs[arglist->count];
+				state = 0;
+			} else if (*letter == '\0') {
+				letter--;
+				arglist->count++;
+				pair = &arglist->pairs[arglist->count];
+				state = 0;
+			}
+			break;
+
+		case 3: /* Parsing list */
+			if (*letter == ']')
+				state = 2;
+			else if (*letter == '\0')
+				return -EINVAL;
+			break;
+		}
+		letter++;
+	}
+}
+
+static int
+rte_eth_devargs_parse_list(char *str, rte_eth_devargs_callback_t callback,
+	void *data)
+{
+	char *str_start;
+	int state;
+	int result;
+
+	if (*str != '[')
+		/* Single element, not a list */
+		return callback(str, data);
+
+	/* Sanity check, then strip the brackets */
+	str_start = &str[strlen(str) - 1];
+	if (*str_start != ']') {
+		RTE_LOG(ERR, EAL, "(%s): List does not end with ']'", str);
+		return -EINVAL;
+	}
+	str++;
+	*str_start = '\0';
+
+	/* Process list elements */
+	state = 0;
+	while (1) {
+		if (state == 0) {
+			if (*str == '\0')
+				break;
+			if (*str != ',') {
+				str_start = str;
+				state = 1;
+			}
+		} else if (state == 1) {
+			if (*str == ',' || *str == '\0') {
+				if (str > str_start) {
+					/* Non-empty string fragment */
+					*str = '\0';
+					result = callback(str_start, data);
+					if (result < 0)
+						return result;
+				}
+				state = 0;
+			}
+		}
+		str++;
+	}
+	return 0;
+}
+
+static int
+rte_eth_devargs_process_range(char *str, uint16_t *list, uint16_t *len_list,
+	const uint16_t max_list)
+{
+	unsigned int lo;
+	unsigned int hi;
+	unsigned int value;
+	int result;
+
+	result = sscanf(str, "%u-%u", &lo, &hi);
+	if (result == 1) {
+		if (*len_list >= max_list)
+			return -ENOMEM;
+		list[(*len_list)++] = lo;
+	} else if (result == 2) {
+		if (lo >= hi)
+			return -EINVAL;
+		for (value = lo; value <= hi; value++) {
+			if (*len_list >= max_list)
+				return -ENOMEM;
+			list[(*len_list)++] = value;
+		}
+	} else
+		return -EINVAL;
+	return 0;
+}
+
+static int
+rte_eth_devargs_parse_ports(char *str, void *data)
+{
+	struct rte_eth_devargs *eth_da = data;
+
+	return rte_eth_devargs_process_range(str, eth_da->ports,
+		&eth_da->nb_ports, RTE_MAX_ETHPORTS);
+}
+
+
+static int
+rte_eth_devargs_parse_representor_ports(char *str, void *data)
+{
+	struct rte_eth_devargs *eth_da = data;
+
+	return rte_eth_devargs_process_range(str, eth_da->representor_ports,
+		&eth_da->nb_representor_ports, RTE_MAX_ETHPORTS);
+}
+
+int __rte_experimental
+rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
+{
+	struct rte_kvargs args;
+	struct rte_kvargs_pair *pair;
+	unsigned int i;
+	int result;
+
+	memset(eth_da, 0, sizeof(*eth_da));
+
+	result = rte_eth_devargs_tokenise(&args, dargs);
+	if (result < 0)
+		return result;
+
+	for (i = 0; i < args.count; i++) {
+		pair = &args.pairs[i];
+
+		if (strcmp("port", pair->key) == 0) {
+			result = rte_eth_devargs_parse_list(pair->value,
+				rte_eth_devargs_parse_ports, eth_da);
+			if (result < 0)
+				return result;
+		} else if (strcmp("representor", pair->key) == 0) {
+			result = rte_eth_devargs_parse_list(pair->value,
+				rte_eth_devargs_parse_representor_ports,
+				eth_da);
+			if (result < 0)
+				return result;
+		}
+	}
+
+	return 0;
+}
+
 RTE_INIT(ethdev_init_log);
 static void
 ethdev_init_log(void)
diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
index e52add0ad..3bce5747d 100644
--- a/lib/librte_ether/rte_ethdev_driver.h
+++ b/lib/librte_ether/rte_ethdev_driver.h
@@ -189,6 +189,36 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
 }
 
 
+/** Generic Ethernet device arguments  */
+struct rte_eth_devargs {
+	uint16_t ports[RTE_MAX_ETHPORTS];
+	/** port/s number to enable on a multi-port single function */
+	uint16_t nb_ports;
+	/** number of ports in ports field */
+	uint16_t representor_ports[RTE_MAX_ETHPORTS];
+	/** representor port/s identifier to enable on device */
+	uint16_t nb_representor_ports;
+	/** number of ports in representor port field */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * PMD helper function to parse ethdev arguments
+ *
+ * @param devargs
+ *  device arguments
+ * @param eth_devargs
+ *  parsed ethdev specific arguments.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_devargs);
+
+
 typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void *init_params);
 typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
 	void *bus_specific_init_params);
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index bd7232923..62ecbdb8a 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -233,6 +233,7 @@ EXPERIMENTAL {
 EXPERIMENTAL {
 	global:
 
+	rt_eth_devargs_parse;
 	rte_eth_dev_create;
 	rte_eth_dev_destroy;
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v7 7/9] ethdev: add switch domain allocator
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
                     ` (5 preceding siblings ...)
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser Declan Doherty
@ 2018-04-16 13:06   ` Declan Doherty
  2018-04-20 13:22     ` Ananyev, Konstantin
  2018-04-24 19:58     ` Thomas Monjalon
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 8/9] net/i40e: add support for representor ports Declan Doherty
                     ` (2 subsequent siblings)
  9 siblings, 2 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-16 13:06 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Declan Doherty

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 lib/librte_ether/rte_ethdev.c           | 53 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev_driver.h    | 39 ++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev_version.map |  3 ++
 3 files changed, 95 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a082b211c..d1f95161f 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -4300,6 +4300,59 @@ rte_eth_devargs_parse_ports(char *str, void *data)
 		&eth_da->nb_ports, RTE_MAX_ETHPORTS);
 }
 
+/**
+ * A set of values to describe the possible states of a switch domain.
+ */
+enum rte_eth_switch_domain_state {
+	RTE_ETH_SWITCH_DOMAIN_UNUSED = 0,
+	RTE_ETH_SWITCH_DOMAIN_ALLOCATED
+};
+
+/**
+ * Array of switch domains available for allocation. Array is sized to
+ * RTE_MAX_ETHPORTS elements as there cannot be more active switch domains than
+ * ethdev ports in a single process.
+ */
+struct rte_eth_dev_switch {
+	enum rte_eth_switch_domain_state state;
+} rte_eth_switch_domains[RTE_MAX_ETHPORTS];
+
+int __rte_experimental
+rte_eth_switch_domain_alloc(uint16_t *domain_id)
+{
+	unsigned int i;
+
+	*domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
+
+	for (i = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID + 1;
+		i < RTE_MAX_ETHPORTS; i++) {
+		if (rte_eth_switch_domains[i].state ==
+			RTE_ETH_SWITCH_DOMAIN_UNUSED) {
+			rte_eth_switch_domains[i].state =
+				RTE_ETH_SWITCH_DOMAIN_ALLOCATED;
+			*domain_id = i;
+			return 0;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+int __rte_experimental
+rte_eth_switch_domain_free(uint16_t domain_id)
+{
+	if (domain_id == RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID ||
+		domain_id >= RTE_MAX_ETHPORTS)
+		return -EINVAL;
+
+	if (rte_eth_switch_domains[domain_id].state !=
+		RTE_ETH_SWITCH_DOMAIN_ALLOCATED)
+		return -EINVAL;
+
+	rte_eth_switch_domains[domain_id].state = RTE_ETH_SWITCH_DOMAIN_UNUSED;
+
+	return 0;
+}
 
 static int
 rte_eth_devargs_parse_representor_ports(char *str, void *data)
diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
index 3bce5747d..c22fcbde1 100644
--- a/lib/librte_ether/rte_ethdev_driver.h
+++ b/lib/librte_ether/rte_ethdev_driver.h
@@ -188,6 +188,45 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
 #endif
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate an unique switch domain identifier.
+ *
+ * A pool of switch domain identifiers which can be allocated on request. This
+ * will enabled devices which support the concept of switch domains to request
+ * a switch domain id which is guaranteed to be unique from other devices
+ * running in the same process.
+ *
+ * @param domain_id
+ *  switch domain identifier parameter to pass back to application
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_switch_domain_alloc(uint16_t *domain_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Free switch domain.
+ *
+ * Return a switch domain identifier to the pool of free identifiers after it is
+ * no longer in use by device.
+ *
+ * @param domain_id
+ *  switch domain identifier to free
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_switch_domain_free(uint16_t domain_id);
+
+
 
 /** Generic Ethernet device arguments  */
 struct rte_eth_devargs {
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 62ecbdb8a..6601ef106 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -236,5 +236,8 @@ EXPERIMENTAL {
 	rt_eth_devargs_parse;
 	rte_eth_dev_create;
 	rte_eth_dev_destroy;
+	rte_eth_switch_domain_alloc;
+	rte_eth_switch_domain_free;
+	rte_eth_switch_domains;
 
 } DPDK_18.05;
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v7 8/9] net/i40e: add support for representor ports
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
                     ` (6 preceding siblings ...)
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 7/9] ethdev: add switch domain allocator Declan Doherty
@ 2018-04-16 13:06   ` Declan Doherty
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 9/9] net/ixgbe: " Declan Doherty
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
  9 siblings, 0 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-16 13:06 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Declan Doherty, Mohammad Abdul Awal, Remy Horton

Add support for virtual function representor ports to the i40e PF driver.
When SR-IOV virtual functions devices are enabled a corresponding
representor port for each VF can be enabled, in the process in which the
i40e PMD is running, by specifying the representor devargs with
the list of VF ports that representors are to be created for.

An example of the devargs which would create VF representor for virtual
functions 0,2,4,5,6 and 7 is:

-w pci:D:B:D.F,representor=[0,2,4-7]

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 drivers/net/i40e/Makefile              |   3 +
 drivers/net/i40e/i40e_ethdev.c         |  82 ++++++-
 drivers/net/i40e/i40e_ethdev.h         |  16 ++
 drivers/net/i40e/i40e_vf_representor.c | 405 +++++++++++++++++++++++++++++++++
 drivers/net/i40e/meson.build           |   4 +-
 drivers/net/i40e/rte_pmd_i40e.c        |  43 ++++
 drivers/net/i40e/rte_pmd_i40e.h        |  18 ++
 lib/librte_ether/rte_ethdev.c          |   2 +-
 8 files changed, 564 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_vf_representor.c

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 5663f5b1c..6184b38f3 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -11,6 +11,8 @@ LIB = librte_pmd_i40e.a
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -DPF_DRIVER -DVF_DRIVER -DINTEGRATED_VF
 CFLAGS += -DX722_A0_SUPPORT
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash
 LDLIBS += -lrte_bus_pci
@@ -85,6 +87,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += rte_pmd_i40e.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_tm.c
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_vf_representor.c
 
 ifeq ($(findstring RTE_MACHINE_CPUFLAG_AVX2,$(CFLAGS)),RTE_MACHINE_CPUFLAG_AVX2)
 	CC_AVX2_SUPPORT=1
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 180ac7449..ad19af42f 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -213,7 +213,7 @@
 /* Bit mask of Extended Tag enable/disable */
 #define PCI_DEV_CTRL_EXT_TAG_MASK  (1 << PCI_DEV_CTRL_EXT_TAG_SHIFT)
 
-static int eth_i40e_dev_init(struct rte_eth_dev *eth_dev);
+static int eth_i40e_dev_init(struct rte_eth_dev *eth_dev, void *init_params);
 static int eth_i40e_dev_uninit(struct rte_eth_dev *eth_dev);
 static int i40e_dev_configure(struct rte_eth_dev *dev);
 static int i40e_dev_start(struct rte_eth_dev *dev);
@@ -607,16 +607,74 @@ static const struct rte_i40e_xstats_name_off rte_i40e_txq_prio_strings[] = {
 #define I40E_NB_TXQ_PRIO_XSTATS (sizeof(rte_i40e_txq_prio_strings) / \
 		sizeof(rte_i40e_txq_prio_strings[0]))
 
-static int eth_i40e_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+
+static int
+eth_i40e_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_probe(pci_dev,
-		sizeof(struct i40e_adapter), eth_i40e_dev_init);
+	char name[RTE_ETH_NAME_MAX_LEN];
+	struct rte_eth_devargs eth_da = { .nb_representor_ports = 0 };
+	int i, retval;
+
+	retval = rte_eth_devargs_parse(pci_dev->device.devargs->args, &eth_da);
+	if (retval)
+		return retval;
+
+	/* physical port net_bdf_port */
+	snprintf(name, sizeof(name), "net_%s", pci_dev->device.name);
+
+	retval = rte_eth_dev_create(&pci_dev->device, name,
+		sizeof(struct i40e_adapter),
+		eth_dev_pci_specific_init, pci_dev,
+		eth_i40e_dev_init, NULL);
+
+	if (retval || eth_da.nb_representor_ports < 1)
+		return retval;
+
+	/* probe VF representor ports */
+	struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated(name);
+
+	if (pf_ethdev == NULL)
+		return -ENODEV;
+
+	for (i = 0; i < eth_da.nb_representor_ports; i++) {
+		struct i40e_vf_representor representor = {
+			.vf_id = eth_da.representor_ports[i],
+			.switch_domain_id = I40E_DEV_PRIVATE_TO_PF(
+				pf_ethdev->data->dev_private)->switch_domain_id,
+			.adapter = I40E_DEV_PRIVATE_TO_ADAPTER(
+				pf_ethdev->data->dev_private)
+		};
+
+		/* representor port net_bdf_port */
+		snprintf(name, sizeof(name), "net_%s_representor_%d",
+			pci_dev->device.name, eth_da.representor_ports[i]);
+
+		retval = rte_eth_dev_create(&pci_dev->device, name,
+			sizeof(struct i40e_vf_representor), NULL, NULL,
+			i40e_vf_representor_init, &representor);
+
+		if (retval)
+			PMD_DRV_LOG(ERR, "failed to create i40e vf "
+				"representor %s.", name);
+	}
+
+	return 0;
 }
 
 static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_remove(pci_dev, eth_i40e_dev_uninit);
+	struct rte_eth_dev *ethdev;
+
+	ethdev = rte_eth_dev_allocated(pci_dev->device.name);
+	if (!ethdev)
+		return -ENODEV;
+
+
+	if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+		return rte_eth_dev_destroy(ethdev, i40e_vf_representor_uninit);
+	else
+		return rte_eth_dev_destroy(ethdev, eth_i40e_dev_uninit);
 }
 
 static struct rte_pci_driver rte_i40e_pmd = {
@@ -1090,7 +1148,7 @@ i40e_support_multi_driver(struct rte_eth_dev *dev)
 }
 
 static int
-eth_i40e_dev_init(struct rte_eth_dev *dev)
+eth_i40e_dev_init(struct rte_eth_dev *dev, void *init_params __rte_unused)
 {
 	struct rte_pci_device *pci_dev;
 	struct rte_intr_handle *intr_handle;
@@ -1517,6 +1575,10 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
 	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
 	intr_handle = &pci_dev->intr_handle;
 
+	ret = rte_eth_switch_domain_free(pf->switch_domain_id);
+	if (ret)
+		PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
+
 	if (hw->adapter_stopped == 0)
 		i40e_dev_close(dev);
 
@@ -2323,7 +2385,7 @@ i40e_dev_reset(struct rte_eth_dev *dev)
 	if (ret)
 		return ret;
 
-	ret = eth_i40e_dev_init(dev);
+	ret = eth_i40e_dev_init(dev, NULL);
 
 	return ret;
 }
@@ -5748,6 +5810,12 @@ i40e_pf_setup(struct i40e_pf *pf)
 		PMD_DRV_LOG(ERR, "Could not get switch config, err %d", ret);
 		return ret;
 	}
+
+	ret = rte_eth_switch_domain_alloc(&pf->switch_domain_id);
+	if (ret)
+		PMD_INIT_LOG(WARNING,
+			"failed to allocate switch domain for device %d", ret);
+
 	if (pf->flags & I40E_FLAG_FDIR) {
 		/* make queue allocated first, let FDIR use queue pair 0*/
 		ret = i40e_res_pool_alloc(&pf->qp_pool, I40E_DEFAULT_QP_NUM_FDIR);
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index d33b255e7..7aaae71c0 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -957,6 +957,8 @@ struct i40e_pf {
 	bool gtp_support; /* 1 - support GTP-C and GTP-U */
 	/* customer customized pctype */
 	struct i40e_customized_pctype customized_pctype[I40E_CUSTOMIZED_MAX];
+	/* Switch Domain Id */
+	uint16_t switch_domain_id;
 };
 
 enum pending_msg {
@@ -1062,6 +1064,18 @@ struct i40e_adapter {
 	uint64_t pctypes_mask;
 };
 
+/**
+ * Strucute to store private data for each VF representor instance
+ */
+struct i40e_vf_representor {
+	uint16_t switch_domain_id;
+	/**< Virtual Function ID */
+	uint16_t vf_id;
+	/**< Virtual Function ID */
+	struct i40e_adapter *adapter;
+	/**< Private data store of assocaiated physical function */
+};
+
 extern const struct rte_flow_ops i40e_flow_ops;
 
 union i40e_filter_t {
@@ -1221,6 +1235,8 @@ int i40e_set_rss_key(struct i40e_vsi *vsi, uint8_t *key, uint8_t key_len);
 int i40e_set_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size);
 int i40e_config_rss_filter(struct i40e_pf *pf,
 		struct i40e_rte_flow_rss_conf *conf, bool add);
+int i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params);
+int i40e_vf_representor_uninit(struct rte_eth_dev *ethdev);
 
 #define I40E_DEV_TO_PCI(eth_dev) \
 	RTE_DEV_TO_PCI((eth_dev)->device)
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
new file mode 100644
index 000000000..e11d9c0c9
--- /dev/null
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -0,0 +1,405 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <rte_bus_pci.h>
+#include <rte_ethdev.h>
+#include <rte_pci.h>
+#include <rte_malloc.h>
+
+#include "base/i40e_type.h"
+#include "base/virtchnl.h"
+#include "i40e_ethdev.h"
+#include "i40e_rxtx.h"
+#include "rte_pmd_i40e.h"
+
+static int
+i40e_vf_representor_link_update(struct rte_eth_dev *ethdev,
+	int wait_to_complete)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return i40e_dev_link_update(representor->adapter->eth_dev,
+		wait_to_complete);
+}
+static void
+i40e_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
+	struct rte_eth_dev_info *dev_info)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	/* get dev info for the vdev */
+	dev_info->device = ethdev->device;
+
+	dev_info->max_rx_queues = ethdev->data->nb_rx_queues;
+	dev_info->max_tx_queues = ethdev->data->nb_tx_queues;
+
+	dev_info->min_rx_bufsize = I40E_BUF_SIZE_MIN;
+	dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
+	dev_info->hash_key_size = (I40E_VFQF_HKEY_MAX_INDEX + 1) *
+		sizeof(uint32_t);
+	dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
+	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
+	dev_info->max_mac_addrs = I40E_NUM_MACADDR_MAX;
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP |
+		DEV_RX_OFFLOAD_IPV4_CKSUM |
+		DEV_RX_OFFLOAD_UDP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_CKSUM;
+	dev_info->tx_offload_capa =
+		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_QINQ_INSERT |
+		DEV_TX_OFFLOAD_IPV4_CKSUM |
+		DEV_TX_OFFLOAD_UDP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_CKSUM |
+		DEV_TX_OFFLOAD_SCTP_CKSUM |
+		DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM |
+		DEV_TX_OFFLOAD_TCP_TSO |
+		DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
+		DEV_TX_OFFLOAD_GRE_TNL_TSO |
+		DEV_TX_OFFLOAD_IPIP_TNL_TSO |
+		DEV_TX_OFFLOAD_GENEVE_TNL_TSO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_thresh = {
+			.pthresh = I40E_DEFAULT_RX_PTHRESH,
+			.hthresh = I40E_DEFAULT_RX_HTHRESH,
+			.wthresh = I40E_DEFAULT_RX_WTHRESH,
+		},
+		.rx_free_thresh = I40E_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_thresh = {
+			.pthresh = I40E_DEFAULT_TX_PTHRESH,
+			.hthresh = I40E_DEFAULT_TX_HTHRESH,
+			.wthresh = I40E_DEFAULT_TX_WTHRESH,
+		},
+		.tx_free_thresh = I40E_DEFAULT_TX_FREE_THRESH,
+		.tx_rs_thresh = I40E_DEFAULT_TX_RSBIT_THRESH,
+		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
+				ETH_TXQ_FLAGS_NOOFFLOADS,
+	};
+
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = I40E_MAX_RING_DESC,
+		.nb_min = I40E_MIN_RING_DESC,
+		.nb_align = I40E_ALIGN_RING_DESC,
+	};
+
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = I40E_MAX_RING_DESC,
+		.nb_min = I40E_MIN_RING_DESC,
+		.nb_align = I40E_ALIGN_RING_DESC,
+	};
+
+	dev_info->switch_info.name =
+		representor->adapter->eth_dev->device->name;
+	dev_info->switch_info.domain_id = representor->switch_domain_id;
+	dev_info->switch_info.port_id = representor->vf_id;
+}
+
+static int
+i40e_vf_representor_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+i40e_vf_representor_dev_start(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static void
+i40e_vf_representor_dev_stop(__rte_unused struct rte_eth_dev *dev)
+{
+}
+
+static int
+i40e_vf_representor_rx_queue_setup(__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_rxconf *rx_conf,
+	__rte_unused struct rte_mempool *mb_pool)
+{
+	return 0;
+}
+
+static int
+i40e_vf_representor_tx_queue_setup(__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_txconf *tx_conf)
+{
+	return 0;
+}
+
+static int
+i40e_vf_representor_stats_get(struct rte_eth_dev *ethdev,
+		struct rte_eth_stats *stats)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return rte_pmd_i40e_get_vf_stats(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, stats);
+}
+
+static void
+i40e_vf_representor_stats_reset(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_reset_vf_stats(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id);
+}
+
+static void
+i40e_vf_representor_promiscuous_enable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_unicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, 1);
+}
+
+static void
+i40e_vf_representor_promiscuous_disable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_unicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, 0);
+}
+
+
+static void
+i40e_vf_representor_allmulticast_enable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_multicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id,  1);
+}
+
+static void
+i40e_vf_representor_allmulticast_disable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_multicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id,  0);
+}
+
+static void
+i40e_vf_representor_mac_addr_remove(struct rte_eth_dev *ethdev, uint32_t index)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_remove_vf_mac_addr(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, &ethdev->data->mac_addrs[index]);
+}
+
+static int
+i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
+		struct ether_addr *mac_addr)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return rte_pmd_i40e_set_vf_mac_addr(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, mac_addr);
+}
+
+static int
+i40e_vf_representor_vlan_filter_set(struct rte_eth_dev *ethdev,
+		uint16_t vlan_id, int on)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+	uint64_t vf_mask = 1ULL << representor->vf_id;
+
+	return rte_pmd_i40e_set_vf_vlan_filter(
+		representor->adapter->eth_dev->data->port_id,
+		vlan_id, vf_mask, on);
+}
+
+static int
+i40e_vf_representor_vlan_offload_set(struct rte_eth_dev *ethdev, int mask)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+	struct rte_eth_dev *pdev;
+	struct i40e_pf_vf *vf;
+	struct i40e_vsi *vsi;
+	struct i40e_pf *pf;
+	uint32_t vfid;
+
+	pdev = representor->adapter->eth_dev;
+	vfid = representor->vf_id;
+
+	if (!is_i40e_supported(pdev)) {
+		PMD_DRV_LOG(ERR, "Invalid PF dev.");
+		return -EINVAL;
+	}
+
+	pf = I40E_DEV_PRIVATE_TO_PF(pdev->data->dev_private);
+
+	if (vfid >= pf->vf_num || !pf->vfs) {
+		PMD_DRV_LOG(ERR, "Invalid VF ID.");
+		return -EINVAL;
+	}
+
+	vf = &pf->vfs[vfid];
+	vsi = vf->vsi;
+	if (!vsi) {
+		PMD_DRV_LOG(ERR, "Invalid VSI.");
+		return -EINVAL;
+	}
+
+	if (mask & ETH_VLAN_FILTER_MASK) {
+		/* Enable or disable VLAN filtering offload */
+		if (ethdev->data->dev_conf.rxmode.hw_vlan_filter)
+			return i40e_vsi_config_vlan_filter(vsi, TRUE);
+		else
+			return i40e_vsi_config_vlan_filter(vsi, FALSE);
+	}
+
+	if (mask & ETH_VLAN_STRIP_MASK) {
+		/* Enable or disable VLAN stripping offload */
+		if (ethdev->data->dev_conf.rxmode.hw_vlan_strip)
+			return i40e_vsi_config_vlan_stripping(vsi, TRUE);
+		else
+			return i40e_vsi_config_vlan_stripping(vsi, FALSE);
+	}
+
+	return -EINVAL;
+}
+
+static void
+i40e_vf_representor_vlan_strip_queue_set(struct rte_eth_dev *ethdev,
+	__rte_unused uint16_t rx_queue_id, int on)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_vlan_stripq(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, on);
+}
+
+static int
+i40e_vf_representor_vlan_pvid_set(struct rte_eth_dev *ethdev, uint16_t vlan_id,
+	__rte_unused int on)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return rte_pmd_i40e_set_vf_vlan_insert(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, vlan_id);
+}
+
+struct eth_dev_ops i40e_representor_dev_ops = {
+	.dev_infos_get        = i40e_vf_representor_dev_infos_get,
+
+	.dev_start            = i40e_vf_representor_dev_start,
+	.dev_configure        = i40e_vf_representor_dev_configure,
+	.dev_stop             = i40e_vf_representor_dev_stop,
+
+	.rx_queue_setup       = i40e_vf_representor_rx_queue_setup,
+	.tx_queue_setup       = i40e_vf_representor_tx_queue_setup,
+
+	.link_update          = i40e_vf_representor_link_update,
+
+	.stats_get            = i40e_vf_representor_stats_get,
+	.stats_reset          = i40e_vf_representor_stats_reset,
+
+	.promiscuous_enable   = i40e_vf_representor_promiscuous_enable,
+	.promiscuous_disable  = i40e_vf_representor_promiscuous_disable,
+
+	.allmulticast_enable  = i40e_vf_representor_allmulticast_enable,
+	.allmulticast_disable = i40e_vf_representor_allmulticast_disable,
+
+	.mac_addr_remove      = i40e_vf_representor_mac_addr_remove,
+	.mac_addr_set         = i40e_vf_representor_mac_addr_set,
+
+	.vlan_filter_set      = i40e_vf_representor_vlan_filter_set,
+	.vlan_offload_set     = i40e_vf_representor_vlan_offload_set,
+	.vlan_strip_queue_set = i40e_vf_representor_vlan_strip_queue_set,
+	.vlan_pvid_set        = i40e_vf_representor_vlan_pvid_set
+
+};
+
+
+int
+i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	struct i40e_pf *pf;
+	struct i40e_pf_vf *vf;
+	struct rte_eth_link *link;
+
+	representor->vf_id =
+		((struct i40e_vf_representor *)init_params)->vf_id;
+	representor->switch_domain_id =
+		((struct i40e_vf_representor *)init_params)->switch_domain_id;
+	representor->adapter =
+		((struct i40e_vf_representor *)init_params)->adapter;
+
+	pf = I40E_DEV_PRIVATE_TO_PF(
+		representor->adapter->eth_dev->data->dev_private);
+
+	if (representor->vf_id >= pf->vf_num)
+		return -ENODEV;
+
+	/** representor shares the same driver as it's PF device */
+	ethdev->device->driver = representor->adapter->eth_dev->device->driver;
+
+	/* Set representor device ops */
+	ethdev->dev_ops = &i40e_representor_dev_ops;
+
+	/* No data-path so no RX/TX functions */
+	ethdev->rx_pkt_burst = NULL;
+	ethdev->tx_pkt_burst = NULL;
+
+	vf = &pf->vfs[representor->vf_id];
+
+	if (!vf->vsi) {
+		PMD_DRV_LOG(ERR, "Invalid VSI.");
+		return -ENODEV;
+	}
+
+	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
+
+	/* Setting the number queues allocated to the VF */
+	ethdev->data->nb_rx_queues = vf->vsi->nb_qps;
+	ethdev->data->nb_tx_queues = vf->vsi->nb_qps;
+
+	ethdev->data->mac_addrs = &vf->mac_addr;
+
+	/* Link state. Inherited from PF */
+	link = &representor->adapter->eth_dev->data->dev_link;
+
+	ethdev->data->dev_link.link_speed = link->link_speed;
+	ethdev->data->dev_link.link_duplex = link->link_duplex;
+	ethdev->data->dev_link.link_status = link->link_status;
+	ethdev->data->dev_link.link_autoneg = link->link_autoneg;
+
+	return 0;
+}
+
+
+int
+i40e_vf_representor_uninit(struct rte_eth_dev *ethdev __rte_unused)
+{
+	return 0;
+}
diff --git a/drivers/net/i40e/meson.build b/drivers/net/i40e/meson.build
index 197e611d8..f2129df07 100644
--- a/drivers/net/i40e/meson.build
+++ b/drivers/net/i40e/meson.build
@@ -6,7 +6,8 @@ version = 2
 cflags += ['-DPF_DRIVER',
 	'-DVF_DRIVER',
 	'-DINTEGRATED_VF',
-	'-DX722_A0_SUPPORT']
+	'-DX722_A0_SUPPORT',
+	'-DALLOW_EXPERIMENTAL_API']
 
 subdir('base')
 objs = [base_objs]
@@ -19,6 +20,7 @@ sources = files(
 	'i40e_fdir.c',
 	'i40e_flow.c',
 	'i40e_tm.c',
+	'i40e_vf_representor.c',
 	'rte_pmd_i40e.c'
 	)
 
diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index 9f9a6504d..7aa1a7518 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -570,6 +570,49 @@ rte_pmd_i40e_set_vf_mac_addr(uint16_t port, uint16_t vf_id,
 	return 0;
 }
 
+static const struct ether_addr null_mac_addr;
+
+int
+rte_pmd_i40e_remove_vf_mac_addr(uint16_t port, uint16_t vf_id,
+	struct ether_addr *mac_addr)
+{
+	struct rte_eth_dev *dev;
+	struct i40e_pf_vf *vf;
+	struct i40e_vsi *vsi;
+	struct i40e_pf *pf;
+
+	if (i40e_validate_mac_addr((u8 *)mac_addr) != I40E_SUCCESS)
+		return -EINVAL;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+	dev = &rte_eth_devices[port];
+
+	if (!is_i40e_supported(dev))
+		return -ENOTSUP;
+
+	pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+
+	if (vf_id >= pf->vf_num || !pf->vfs)
+		return -EINVAL;
+
+	vf = &pf->vfs[vf_id];
+	vsi = vf->vsi;
+	if (!vsi) {
+		PMD_DRV_LOG(ERR, "Invalid VSI.");
+		return -EINVAL;
+	}
+
+	if (is_same_ether_addr(mac_addr, &vf->mac_addr))
+		/* Reset the mac with NULL address */
+		ether_addr_copy(&null_mac_addr, &vf->mac_addr);
+
+	/* Remove the mac */
+	i40e_vsi_delete_mac(vsi, mac_addr);
+
+	return 0;
+}
+
 /* Set vlan strip on/off for specific VF from host */
 int
 rte_pmd_i40e_set_vf_vlan_stripq(uint16_t port, uint16_t vf_id, uint8_t on)
diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index d248adb1a..be4a6024a 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -455,6 +455,24 @@ int rte_pmd_i40e_set_vf_multicast_promisc(uint16_t port,
 int rte_pmd_i40e_set_vf_mac_addr(uint16_t port, uint16_t vf_id,
 				 struct ether_addr *mac_addr);
 
+/**
+ * Remove the VF MAC address.
+ *
+ * @param port
+ *   The port identifier of the Ethernet device.
+ * @param vf_id
+ *   VF id.
+ * @param mac_addr
+ *   VF MAC address.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port* invalid.
+ *   - (-EINVAL) if *vf* or *mac_addr* is invalid.
+ */
+int
+rte_pmd_i40e_remove_vf_mac_addr(uint16_t port, uint16_t vf_id,
+	struct ether_addr *mac_addr);
+
 /**
  * Enable/Disable vf vlan strip for all queues in a pool
  *
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index d1f95161f..7a4bde127 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -34,7 +34,7 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
-+#include <rte_kvargs.h>
+#include <rte_kvargs.h>
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v7 9/9] net/ixgbe: add support for representor ports
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
                     ` (7 preceding siblings ...)
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 8/9] net/i40e: add support for representor ports Declan Doherty
@ 2018-04-16 13:06   ` Declan Doherty
  2018-04-20 13:29     ` Ananyev, Konstantin
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
  9 siblings, 1 reply; 73+ messages in thread
From: Declan Doherty @ 2018-04-16 13:06 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Declan Doherty, Mohammad Abdul Awal, Remy Horton

Add support for virtual function representor ports to the ixgbe PF driver.
When SR-IOV virtual functions devices are enabled a corresponding
representor port for each VF can be enabled in the process in which the
i40e PMD is running within, by specifying the representor devargs with
the list of VF ports that representors are to be created for.

An example of the devargs which would create VF representor for virtual
functions 0,2,4,5,6 and 7 is:

-w pci:D:B:D.F,representor=[0,2,4-7]

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 drivers/net/ixgbe/Makefile               |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c         |  73 +++++++++--
 drivers/net/ixgbe/ixgbe_ethdev.h         |  14 ++
 drivers/net/ixgbe/ixgbe_pf.c             |   7 +
 drivers/net/ixgbe/ixgbe_vf_representor.c | 217 +++++++++++++++++++++++++++++++
 drivers/net/ixgbe/meson.build            |   1 +
 6 files changed, 305 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/ixgbe/ixgbe_vf_representor.c

diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index d0804fc5b..f8725aebb 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -103,6 +103,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_ipsec.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += rte_pmd_ixgbe.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_tm.c
+SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_vf_representor.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)-include := rte_pmd_ixgbe.h
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index a5e2fc0ca..0d81db008 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -132,7 +132,7 @@
 #define IXGBE_EXVET_VET_EXT_SHIFT              16
 #define IXGBE_DMATXCTL_VT_MASK                 0xFFFF0000
 
-static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
+static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params);
 static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev);
 static int ixgbe_fdir_filter_init(struct rte_eth_dev *eth_dev);
 static int ixgbe_fdir_filter_uninit(struct rte_eth_dev *eth_dev);
@@ -1043,7 +1043,7 @@ ixgbe_swfw_lock_reset(struct ixgbe_hw *hw)

  * It returns 0 on success.
  */
 static int
-eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
+eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
 {
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
 	struct rte_intr_handle *intr_handle = &pci_dev->intr_handle;
@@ -1226,6 +1226,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	/* initialize PF if max_vfs not zero */
 	ixgbe_pf_host_init(eth_dev);
 
+
 	ctrl_ext = IXGBE_READ_REG(hw, IXGBE_CTRL_EXT);
 	/* let hardware know driver is loaded */
 	ctrl_ext |= IXGBE_CTRL_EXT_DRV_LOAD;
@@ -1716,16 +1717,72 @@ eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
 	return 0;
 }
 
-static int eth_ixgbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+static int
+eth_ixgbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_probe(pci_dev,
-		sizeof(struct ixgbe_adapter), eth_ixgbe_dev_init);
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_eth_devargs eth_da;
+	int i, retval;
+
+	retval = rte_eth_devargs_parse(pci_dev->device.devargs->args, &eth_da);
+	if (retval)
+		return retval;
+
+	/* physical port net_bdf_port */
+	snprintf(name, sizeof(name), "net_%s_%d", pci_dev->device.name, 0);
+
+	retval = rte_eth_dev_create(&pci_dev->device, name,
+		sizeof(struct ixgbe_adapter),
+		eth_dev_pci_specific_init, pci_dev,
+		eth_ixgbe_dev_init, NULL);
+
+	if (retval || eth_da.nb_representor_ports < 1)
+		return retval;
+
+	/* probe VF representor ports */
+	struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated(name);
+
+	for (i = 0; i < eth_da.nb_representor_ports; i++) {
+		struct ixgbe_vf_info *vfinfo = *IXGBE_DEV_PRIVATE_TO_P_VFDATA(
+			pf_ethdev->data->dev_private);
+
+		struct ixgbe_vf_representor representor = {
+			.vf_id = eth_da.representor_ports[i],
+			.switch_domain_id = vfinfo->switch_domain_id,
+			.pf_ethdev = pf_ethdev
+		};
+
+		/* representor port net_bdf_port */
+		snprintf(name, sizeof(name), "net_%s_representor_%d",
+			pci_dev->device.name,
+			eth_da.representor_ports[i]);
+
+		retval = rte_eth_dev_create(&pci_dev->device, name,
+			sizeof(struct ixgbe_vf_representor), NULL, NULL,
+			ixgbe_vf_representor_init, &representor);
+
+		if (retval)
+			PMD_DRV_LOG(ERR, "failed to create ixgbe vf "
+				"representor %s.", name);
+	}
+
+	return 0;
 }
 
 static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_remove(pci_dev, eth_ixgbe_dev_uninit);
+	struct rte_eth_dev *ethdev;
+
+	ethdev = rte_eth_dev_allocated(pci_dev->device.name);
+	if (!ethdev)
+		return -ENODEV;
+
+	if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+		return rte_eth_dev_destroy(ethdev, ixgbe_vf_representor_uninit);
+	else
+		return rte_eth_dev_destroy(ethdev, eth_ixgbe_dev_uninit);
 }
 
 static struct rte_pci_driver rte_ixgbe_pmd = {
@@ -2868,7 +2925,7 @@ ixgbe_dev_reset(struct rte_eth_dev *dev)
 	if (ret)
 		return ret;
 
-	ret = eth_ixgbe_dev_init(dev);
+	ret = eth_ixgbe_dev_init(dev, NULL);
 
 	return ret;
 }
@@ -3883,7 +3940,7 @@ ixgbevf_check_link(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
 }
 
 /* return 0 means link status changed, -1 means not changed */
-static int
+int
 ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
 			    int wait_to_complete, int vf)
 {
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 655077700..1947442d9 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -253,6 +253,7 @@ struct ixgbe_vf_info {
 	uint16_t vlan_count;
 	uint8_t spoofchk_enabled;
 	uint8_t api_version;
+	uint16_t switch_domain_id;
 };
 
 /*
@@ -480,6 +481,15 @@ struct ixgbe_adapter {
  	struct ixgbe_tm_conf        tm_conf;
 };
 
+struct ixgbe_vf_representor {
+	uint16_t vf_id;
+	uint16_t switch_domain_id;
+	struct rte_eth_dev *pf_ethdev;
+};
+
+int ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params);
+int ixgbe_vf_representor_uninit(struct rte_eth_dev *ethdev);
+
 #define IXGBE_DEV_PRIVATE_TO_HW(adapter)\
 	(&((struct ixgbe_adapter *)adapter)->hw)
 
@@ -652,6 +662,10 @@ int ixgbe_fdir_filter_program(struct rte_eth_dev *dev,
 
 void ixgbe_configure_dcb(struct rte_eth_dev *dev);
 
+int
+ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
+			    int wait_to_complete, int vf);
+
 /*
  * misc function prototypes
  */
diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index 4e61310af..4d199c802 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -90,6 +90,8 @@ void ixgbe_pf_host_init(struct rte_eth_dev *eth_dev)
 	if (*vfinfo == NULL)
 		rte_panic("Cannot allocate memory for private VF data\n");
 
+	rte_eth_switch_domain_alloc(&(*vfinfo)->switch_domain_id);
+
 	memset(mirror_info, 0, sizeof(struct ixgbe_mirror_info));
 	memset(uta_info, 0, sizeof(struct ixgbe_uta_info));
 	hw->mac.mc_filter_type = 0;
@@ -122,6 +124,7 @@ void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
 {
 	struct ixgbe_vf_info **vfinfo;
 	uint16_t vf_num;
+	int ret;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -132,6 +135,10 @@ void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
 	RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx = 0;
 	RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx = 0;
 
+	ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
+	if (ret)
+		PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
+
 	vf_num = dev_num_vf(eth_dev);
 	if (vf_num == 0)
 		return;
diff --git a/drivers/net/ixgbe/ixgbe_vf_representor.c b/drivers/net/ixgbe/ixgbe_vf_representor.c
new file mode 100644
index 000000000..6254f6afa
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_vf_representor.c
@@ -0,0 +1,217 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <rte_ethdev.h>
+#include <rte_pci.h>
+#include <rte_malloc.h>
+
+#include "base/ixgbe_type.h"
+#include "base/ixgbe_vf.h"
+#include "ixgbe_ethdev.h"
+#include "ixgbe_rxtx.h"
+#include "rte_pmd_ixgbe.h"
+
+
+static int
+ixgbe_vf_representor_link_update(struct rte_eth_dev *ethdev,
+	int wait_to_complete)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	return ixgbe_dev_link_update_share(representor->pf_ethdev,
+		wait_to_complete, 1);
+}
+
+static int
+ixgbe_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
+	struct ether_addr *mac_addr)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	return rte_pmd_ixgbe_set_vf_mac_addr(
+		representor->pf_ethdev->data->port_id,
+		representor->vf_id, mac_addr);
+}
+
+static void
+ixgbe_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
+	struct rte_eth_dev_info *dev_info)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(
+		representor->pf_ethdev->data->dev_private);
+
+	dev_info->device = representor->pf_ethdev->device;
+
+	dev_info->min_rx_bufsize = 1024;
+	/**< Minimum size of RX buffer. */
+	dev_info->max_rx_pktlen = 9728;
+	/**< Maximum configurable length of RX pkt. */
+	dev_info->max_rx_queues = IXGBE_VF_MAX_RX_QUEUES;
+	/**< Maximum number of RX queues. */
+	dev_info->max_tx_queues = IXGBE_VF_MAX_TX_QUEUES;
+	/**< Maximum number of TX queues. */
+
+	dev_info->max_mac_addrs = hw->mac.num_rar_entries;
+	/**< Maximum number of MAC addresses. */
+
+	dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_IPV4_CKSUM |	DEV_RX_OFFLOAD_UDP_CKSUM  |
+		DEV_RX_OFFLOAD_TCP_CKSUM;
+	/**< Device RX offload capabilities. */
+
+	dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_CKSUM | DEV_TX_OFFLOAD_SCTP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_TSO;
+	/**< Device TX offload capabilities. */
+
+	dev_info->speed_capa =
+		representor->pf_ethdev->data->dev_link.link_speed;
+	/**< Supported speeds bitmap (ETH_LINK_SPEED_). */
+
+	dev_info->switch_info.name =
+		representor->pf_ethdev->device->name;
+	dev_info->switch_info.domain_id = representor->switch_domain_id;
+	dev_info->switch_info.port_id = representor->vf_id;
+}
+
+static int ixgbe_vf_representor_dev_configure(
+		__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int ixgbe_vf_representor_rx_queue_setup(
+	__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_rxconf *rx_conf,
+	__rte_unused struct rte_mempool *mb_pool)
+{
+	return 0;
+}
+
+static int ixgbe_vf_representor_tx_queue_setup(
+	__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_txconf *tx_conf)
+{
+	return 0;
+}
+
+static int ixgbe_vf_representor_dev_start(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static void ixgbe_vf_representor_dev_stop(__rte_unused struct rte_eth_dev *dev)
+{
+}
+
+static int
+ixgbe_vf_representor_vlan_filter_set(struct rte_eth_dev *ethdev,
+	uint16_t vlan_id, int on)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+	uint64_t vf_mask = 1ULL << representor->vf_id;
+
+	return rte_pmd_ixgbe_set_vf_vlan_filter(
+		representor->pf_ethdev->data->port_id, vlan_id, vf_mask, on);
+}
+
+static void
+ixgbe_vf_representor_vlan_strip_queue_set(struct rte_eth_dev *ethdev,
+	__rte_unused uint16_t rx_queue_id, int on)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_ixgbe_set_vf_vlan_stripq(representor->pf_ethdev->data->port_id,
+		representor->vf_id, on);
+}
+
+struct eth_dev_ops ixgbe_vf_representor_dev_ops = {
+	.dev_infos_get		= ixgbe_vf_representor_dev_infos_get,
+
+	.dev_start		= ixgbe_vf_representor_dev_start,
+	.dev_configure		= ixgbe_vf_representor_dev_configure,
+	.dev_stop		= ixgbe_vf_representor_dev_stop,
+
+	.rx_queue_setup		= ixgbe_vf_representor_rx_queue_setup,
+	.tx_queue_setup		= ixgbe_vf_representor_tx_queue_setup,
+
+	.link_update		= ixgbe_vf_representor_link_update,
+
+	.vlan_filter_set	= ixgbe_vf_representor_vlan_filter_set,
+	.vlan_strip_queue_set	= ixgbe_vf_representor_vlan_strip_queue_set,
+
+	.mac_addr_set		= ixgbe_vf_representor_mac_addr_set,
+};
+
+
+int
+ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	struct ixgbe_vf_info *vf_data;
+	struct rte_pci_device *pci_dev;
+	struct rte_eth_link *link;
+
+	if (!representor)
+		return -ENOMEM;
+
+	representor->vf_id =
+		((struct ixgbe_vf_representor *)init_params)->vf_id;
+	representor->switch_domain_id =
+		((struct ixgbe_vf_representor *)init_params)->switch_domain_id;
+	representor->pf_ethdev =
+		((struct ixgbe_vf_representor *)init_params)->pf_ethdev;
+
+	pci_dev = RTE_ETH_DEV_TO_PCI(representor->pf_ethdev);
+
+	if (representor->vf_id >= pci_dev->max_vfs)
+		return -ENODEV;
+
+	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
+
+	/* Set representor device ops */
+	ethdev->dev_ops = &ixgbe_vf_representor_dev_ops;
+
+	/* No data-path so no RX/TX functions */
+	ethdev->rx_pkt_burst = NULL;
+	ethdev->tx_pkt_burst = NULL;
+
+	/* Setting the number queues allocated to the VF */
+	ethdev->data->nb_rx_queues = IXGBE_VF_MAX_RX_QUEUES;
+	ethdev->data->nb_tx_queues = IXGBE_VF_MAX_RX_QUEUES;
+
+	/* Reference VF mac address from PF data structure */
+	vf_data = *IXGBE_DEV_PRIVATE_TO_P_VFDATA(
+		representor->pf_ethdev->data->dev_private);
+
+	ethdev->data->mac_addrs = (struct ether_addr *)
+		vf_data[representor->vf_id].vf_mac_addresses;
+
+	/* Link state. Inherited from PF */
+	link = &representor->pf_ethdev->data->dev_link;
+
+	ethdev->data->dev_link.link_speed = link->link_speed;
+	ethdev->data->dev_link.link_duplex = link->link_duplex;
+	ethdev->data->dev_link.link_status = link->link_status;
+	ethdev->data->dev_link.link_autoneg = link->link_autoneg;
+
+	return 0;
+}
+
+
+int
+ixgbe_vf_representor_uninit(struct rte_eth_dev *ethdev __rte_unused)
+{
+	return 0;
+}
diff --git a/drivers/net/ixgbe/meson.build b/drivers/net/ixgbe/meson.build
index f649e659d..5c3a8ca9f 100644
--- a/drivers/net/ixgbe/meson.build
+++ b/drivers/net/ixgbe/meson.build
@@ -19,6 +19,7 @@ sources = files(
 	'ixgbe_pf.c',
 	'ixgbe_rxtx.c',
 	'ixgbe_tm.c',
+	'ixgbe_vf_rerpesentor.c',
 	'rte_pmd_ixgbe.c'
 )
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 1/9] doc: add switch representation documentation
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 1/9] doc: add switch representation documentation Declan Doherty
@ 2018-04-16 15:55     ` Kovacevic, Marko
  0 siblings, 0 replies; 73+ messages in thread
From: Kovacevic, Marko @ 2018-04-16 15:55 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Adrien Mazarguil, Yigit, Ferruh, Thomas Monjalon, Shahaf Shuler,
	Doherty, Declan, Adrien Mazarguil

Small changes commented below.


> Add document to describe the  model for representing switching capable
> devices in DPDK, using a general ethdev port model and through port
> representors. This document also details the port model and the rte_flow
> semantics required for flow programming, as well as listing some example
> use cases.
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazaguil@6wind.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
>  doc/guides/prog_guide/index.rst                 |   1 +
>  doc/guides/prog_guide/switch_representation.rst | 837
> ++++++++++++++++++++++++
>  2 files changed, 838 insertions(+)

<...>

> +- As virtual devices, they may be more limited than their physical
> +  counterparts, for instance by exposing only a subset of device
> +  configuration callbacks and/or by not necessarily having Rx/Tx capability.
> +
> +- Among other things, they can be used to assign MAC addresses to the
> +  resource they represent.
> +
> +- Applications can tell port representors apart from other physcial of
> +virtual
> +  port by checking the dev_flags field within their device information
> +  structure for the RTE_ETH_DEV_REPRESENTOR bit-field.

physcial / physical


> +.. code-block:: c
> +
> +  struct rte_eth_dev_info {
> +	..
> +	uint32_t dev_flags; /**< Device flags */
> +	..
> +  };

In the code block above tabs were used you can use spaces instead

<...>

> +Extensions
> +~~~~~~~~~~
> +
> +Compared to creating a brand new dedicated interface, **rte_flow** was
> +deemed flexible enough to manage representor traffic only with minor
> +extensions:
> +
> +- Using physical ports, PF, VF or port representors as targets.
> +
> +- Affecting traffic that is not necessarily addressed to the DPDK port
> +ID a
> +  flow rule is associated with (e.g. forcing VF traffic redirection to PF).
> +
> +For advanced uses:
> +
> +- Rule-based packet counters.
> +
> +- The ability to combine several identical actions for traffic
> +duplication
> +  (e.g. VF representor in addition to a physical port).
> +
> +- Dedicated actions for traffic encapsulation / decapsulation before
> +  reaching a endpoint.
> +

^^^^^^^^^^^^^
reaching an endpoint


<...>

> +Switching Examples
> +------------------
> +
> +This section provides practical examples based on the established
> +Testpmd flow command syntax [2]_, in the context described in `traffic
> +steering`_
> +
> +::
> +
> +      .-------------.                 .-------------. .-------------.
> +      | hypervisor  |                 |    VM 1     | |    VM 2     |
> +      | application |                 | application | | application |
> +      `--+---+---+--'                 `----------+--' `--+----------'
> +         |   |   |                               |       |
> +         |   |   `-------------------.           |       |
> +         |   `---------.             |           |       |
> +         |             |             |           |       |
> +   .----(A)----. .----(B)----. .----(C)----.     |       |
> +   | port_id 3 | | port_id 4 | | port_id 5 |     |       |
> +   `-----+-----' `-----+-----' `-----+-----'     |       |
> +         |             |             |           |       |
> +       .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
> +       | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
> +       `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
> +         |             |             |           |       |
> +         |             |   .---------'           |       |
> +         `-----.       |   |   .-----------------'       |
> +               |       |   |   |   .---------------------'
> +               |       |   |   |   |
> +            .--|-------|---|---|---|--.
> +            |  |       |   `---|---'  |
> +            |  |       `-------'      |
> +            |  `---------.            |
> +            `------------|------------'
> +                         |
> +                    .---(F)----.
> +                    | physical |
> +                    |  port 0  |
> +                    `----------'
> +
> +By default, PF (**A**) can communicate with the physical port it is
> +associated with (**F**), while VF 1 (**D**) and VF 2 (**E**) are
> +isolated and restricted to communicate with the hypervisor application
> +through their respective representors (**B** and **C**) if supported.
> +
> +Examples in subsequent sections apply to hypervisor applications only
> +and are based on port representors **A**, **B** and **C**.
> +
> +.. [2] `Flow syntax
> +
> +<http://dpdk.org/doc/guides/testpmd_app_ug/testpmd_funcs.html#flow-
> synt
> +ax>`
> +

^^^^^^^^^^^^^^^^^^^^^^^^^
You're missing an underscore to complete the link for the case above its not clickable at the moment

<...>

Reviewed-by: Marko Kovacevic <marko.kovacevic@intel.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 3/9] ethdev: add generic create/destroy ethdev APIs
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 3/9] ethdev: add generic create/destroy ethdev APIs Declan Doherty
@ 2018-04-20 13:01     ` Ananyev, Konstantin
  2018-04-24 17:48     ` Thomas Monjalon
  1 sibling, 0 replies; 73+ messages in thread
From: Ananyev, Konstantin @ 2018-04-20 13:01 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Adrien Mazarguil, Yigit, Ferruh, Thomas Monjalon, Shahaf Shuler,
	Doherty, Declan

Hi Declan,

> Add new bus generic ethdev create/destroy APIs which are bus independent
> and provide hooks for bus specific initialisation.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.c           | 95 ++++++++++++++++++++++++++++++++-
>  lib/librte_ether/rte_ethdev_driver.h    | 57 ++++++++++++++++++++
>  lib/librte_ether/rte_ethdev_pci.h       | 12 +++++
>  lib/librte_ether/rte_ethdev_version.map |  8 +++
>  4 files changed, 171 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 3c049ef43..b16d23b9a 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -348,7 +348,8 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
>  	rte_eth_dev_shared_data_prepare();
> 
>  	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
> -
> +	eth_dev->device = NULL;
> +	eth_dev->intr_handle = NULL;
>  	eth_dev->state = RTE_ETH_DEV_UNUSED;
> 
>  	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> @@ -3439,6 +3440,98 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>  			RTE_MEMZONE_IOVA_CONTIG, align);
>  }
> 
> +int __rte_experimental
> +rte_eth_dev_create(struct rte_device *device, const char *name,
> +	size_t priv_data_size,
> +	ethdev_bus_specific_init ethdev_bus_specific_init,
> +	void *bus_init_params,
> +	ethdev_init_t ethdev_init, void *init_params)
> +{
> +	struct rte_eth_dev *ethdev;
> +	int retval;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		ethdev = rte_eth_dev_allocate(name);
> +		if (!ethdev) {
> +			retval = -ENODEV;
> +			goto probe_failed;
> +		}
> +
> +		if (priv_data_size) {
> +			ethdev->data->dev_private = rte_zmalloc_socket(
> +				name, priv_data_size, RTE_CACHE_LINE_SIZE,
> +				device->numa_node);
> +
> +			if (!ethdev->data->dev_private) {
> +				RTE_LOG(ERR, EAL, "failed to allocate private data");
> +				retval = -ENOMEM;
> +				goto probe_failed;
> +			}
> +		}
> +	} else {
> +		ethdev = rte_eth_dev_attach_secondary(name);
> +		if (!ethdev) {
> +			RTE_LOG(ERR, EAL, "secondary process attach failed, "
> +				"ethdev doesn't exist");
> +			retval = -ENODEV;
> +			goto probe_failed;
> +		}
> +	}
> +
> +	ethdev->device = device;
> +
> +	if (ethdev_bus_specific_init) {
> +		retval = ethdev_bus_specific_init(ethdev, bus_init_params);
> +		if (retval) {
> +			RTE_LOG(ERR, EAL,
> +				"ethdev bus specific initialisation failed");
> +			goto probe_failed;
> +		}
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_init, -EINVAL);

You probably have to do it at the start - before allocating ethdev, etc.

> +	retval = ethdev_init(ethdev, init_params);
> +	if (retval) {
> +		RTE_LOG(ERR, EAL, "ethdev initialisation failed");
> +		goto probe_failed;
> +	}
> +
> +	return retval;
> +probe_failed:
> +	/* free ports private data if primary process */
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +		rte_free(ethdev->data->dev_private);
> +
> +	rte_eth_dev_release_port(ethdev);
> +
> +	return retval;
> +}
> +
> +int  __rte_experimental
> +rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
> +	ethdev_uninit_t ethdev_uninit)
> +{
> +	int ret;
> +
> +	ethdev = rte_eth_dev_allocated(ethdev->data->name);
> +	if (!ethdev)
> +		return -ENODEV;
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_uninit, -EINVAL);
> +	if (ethdev_uninit) {
> +		ret = ethdev_uninit(ethdev);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +		rte_free(ethdev->data->dev_private);
> +
> +	ethdev->data->dev_private = NULL;
> +
> +	return rte_eth_dev_release_port(ethdev);
> +}
> +
>  int
>  rte_eth_dev_rx_intr_ctl_q(uint16_t port_id, uint16_t queue_id,
>  			  int epfd, int op, void *data)
> diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
> index a406ef123..e52add0ad 100644
> --- a/lib/librte_ether/rte_ethdev_driver.h
> +++ b/lib/librte_ether/rte_ethdev_driver.h
> @@ -188,6 +188,63 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
>  #endif
>  }
> 
> +
> +typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void *init_params);
> +typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
> +	void *bus_specific_init_params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * PMD helper function for the creation of a new ethdev ports.
> + *
> + * @param device
> + *  rte_device handle.
> + * @param	name
> + *  port name.
> + * @param priv_data_size
> + *  size of private data required for port.
> + * @param bus_specific_init
> + *  port bus specific initialisation callback function
> + * @param bus_init_params
> + *  port bus specific initialisation parameters
> + * @param ethdev_init
> + *  device specific port initialization callback function
> + * @param init_params
> + *  port initialisation parameters
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_dev_create(struct rte_device *device, const char *name,
> +	size_t priv_data_size,
> +	ethdev_bus_specific_init bus_specific_init, void *bus_init_params,
> +	ethdev_init_t ethdev_init, void *init_params);
> +
> +
> +typedef int (*ethdev_uninit_t)(struct rte_eth_dev *ethdev);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * PMD helper function for cleaing up the resources of a ethdev port on it's
> + * destruction.
> + *
> + * @param ethdev
> + *   ethdev handle of port.
> + * @param ethdev
> + *   device specific port un-initialise callback function
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
> +	ethdev_uninit_t ethdev_uninit);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_ether/rte_ethdev_pci.h b/lib/librte_ether/rte_ethdev_pci.h
> index 6565ae7d3..603287c28 100644
> --- a/lib/librte_ether/rte_ethdev_pci.h
> +++ b/lib/librte_ether/rte_ethdev_pci.h
> @@ -70,6 +70,18 @@ rte_eth_copy_pci_info(struct rte_eth_dev *eth_dev,
>  	eth_dev->data->numa_node = pci_dev->device.numa_node;
>  }
> 
> +static inline int
> +eth_dev_pci_specific_init(struct rte_eth_dev *eth_dev, void *bus_device) {
> +	struct rte_pci_device *pci_dev = bus_device;
> +
> +	if (!pci_dev)
> +		return -ENODEV;
> +
> +	rte_eth_copy_pci_info(eth_dev, pci_dev);
> +
> +	return 0;
> +}
> +
>  /**
>   * @internal
>   * Allocates a new ethdev slot for an ethernet device and returns the pointer
> diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
> index 34df6c8b5..bd7232923 100644
> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -229,3 +229,11 @@ EXPERIMENTAL {
>  	rte_mtr_stats_update;
> 
>  } DPDK_17.11;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_eth_dev_create;
> +	rte_eth_dev_destroy;
> +
> +} DPDK_18.05;
> --
> 2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser Declan Doherty
@ 2018-04-20 13:16     ` Ananyev, Konstantin
  2018-04-24 19:53     ` Thomas Monjalon
  1 sibling, 0 replies; 73+ messages in thread
From: Ananyev, Konstantin @ 2018-04-20 13:16 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Adrien Mazarguil, Yigit, Ferruh, Thomas Monjalon, Shahaf Shuler,
	Horton, Remy, Doherty, Declan

> Introduces a new structure, rte_eth_devargs, to support generic
> ethdev arguments common across NET PMDs, with a new API
> rte_eth_devargs_parse API to support PMD parsing these arguments.
> 
> Signed-off-by: Remy Horton <remy.horton@intel.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
>  lib/Makefile                            |   1 +
>  lib/librte_ether/rte_ethdev.c           | 195 ++++++++++++++++++++++++++++++++
>  lib/librte_ether/rte_ethdev_driver.h    |  30 +++++
>  lib/librte_ether/rte_ethdev_version.map |   1 +
>  4 files changed, 227 insertions(+)
> 
> diff --git a/lib/Makefile b/lib/Makefile
> index ec965a606..4144d99f9 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -21,6 +21,7 @@ DEPDIRS-librte_cmdline := librte_eal
>  DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
>  DEPDIRS-librte_ether := librte_net librte_eal librte_mempool librte_ring
>  DEPDIRS-librte_ether += librte_mbuf
> +DEPDIRS-librte_ether += librte_kvargs
>  DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += librte_bbdev
>  DEPDIRS-librte_bbdev := librte_eal librte_mempool librte_mbuf
>  DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 1d38d8e75..a082b211c 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -34,6 +34,7 @@
>  #include <rte_errno.h>
>  #include <rte_spinlock.h>
>  #include <rte_string_fns.h>
> ++#include <rte_kvargs.h>
> 
>  #include "rte_ether.h"
>  #include "rte_ethdev.h"
> @@ -4149,6 +4150,200 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>  	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
>  }
> 
> +typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
> +
> +static int
> +rte_eth_devargs_tokenise(struct rte_kvargs *arglist, const char *str_in)
> +{
> +	int state;
> +	struct rte_kvargs_pair *pair;
> +	char *letter;

Hmm, so that extends rte_kvarrgs to be able to parse something like: "key=[val1,val2,...,valn]", right?
If so shouldn't it be in rte_kvargs then?
BTW, as I remember rte_kvargs allows you to have multiple identical key, i.e: "key=val1,key=val2,...,key=valn".
I suppose that approach would allow you to avoid rte_kvargs modifications.

> +
> +	arglist->str = strdup(str_in);

Who is going to free it?

> +	if (arglist->str == NULL)
> +		return -ENOMEM;
> +
> +	letter = arglist->str;
> +	state = 0;
> +	arglist->count = 0;
> +	pair = &arglist->pairs[0];
> +	while (1) {
> +		switch (state) {
> +		case 0: /* Initial */
> +			if (*letter == '=')
> +				return -EINVAL;
> +			else if (*letter == '\0')
> +				return 0;
> +
> +			state = 1;
> +			pair->key = letter;
> +			/* fall-thru */
> +
> +		case 1: /* Parsing key */
> +			if (*letter == '=') {
> +				*letter = '\0';
> +				pair->value = letter + 1;
> +				state = 2;
> +			} else if (*letter == ',' || *letter == '\0')
> +				return -EINVAL;
> +			break;
> +
> +
> +		case 2: /* Parsing value */
> +			if (*letter == '[')
> +				state = 3;
> +			else if (*letter == ',') {
> +				*letter = '\0';
> +				arglist->count++;
> +				pair = &arglist->pairs[arglist->count];
> +				state = 0;
> +			} else if (*letter == '\0') {
> +				letter--;
> +				arglist->count++;
> +				pair = &arglist->pairs[arglist->count];
> +				state = 0;
> +			}
> +			break;
> +
> +		case 3: /* Parsing list */
> +			if (*letter == ']')
> +				state = 2;
> +			else if (*letter == '\0')
> +				return -EINVAL;
> +			break;
> +		}
> +		letter++;
> +	}
> +}
> +
> +static int
> +rte_eth_devargs_parse_list(char *str, rte_eth_devargs_callback_t callback,
> +	void *data)
> +{
> +	char *str_start;
> +	int state;
> +	int result;
> +
> +	if (*str != '[')
> +		/* Single element, not a list */
> +		return callback(str, data);
> +
> +	/* Sanity check, then strip the brackets */
> +	str_start = &str[strlen(str) - 1];
> +	if (*str_start != ']') {
> +		RTE_LOG(ERR, EAL, "(%s): List does not end with ']'", str);
> +		return -EINVAL;
> +	}
> +	str++;
> +	*str_start = '\0';
> +
> +	/* Process list elements */
> +	state = 0;
> +	while (1) {
> +		if (state == 0) {
> +			if (*str == '\0')
> +				break;
> +			if (*str != ',') {
> +				str_start = str;
> +				state = 1;
> +			}
> +		} else if (state == 1) {
> +			if (*str == ',' || *str == '\0') {
> +				if (str > str_start) {
> +					/* Non-empty string fragment */
> +					*str = '\0';
> +					result = callback(str_start, data);
> +					if (result < 0)
> +						return result;
> +				}
> +				state = 0;
> +			}
> +		}
> +		str++;
> +	}
> +	return 0;
> +}
> +
> +static int
> +rte_eth_devargs_process_range(char *str, uint16_t *list, uint16_t *len_list,
> +	const uint16_t max_list)
> +{
> +	unsigned int lo;
> +	unsigned int hi;
> +	unsigned int value;
> +	int result;
> +
> +	result = sscanf(str, "%u-%u", &lo, &hi);

Should probably be %hu.
And probably check that lo and hi values are less than MAX_PORTS.

> +	if (result == 1) {
> +		if (*len_list >= max_list)
> +			return -ENOMEM;
> +		list[(*len_list)++] = lo;
> +	} else if (result == 2) {
> +		if (lo >= hi)
> +			return -EINVAL;
> +		for (value = lo; value <= hi; value++) {
> +			if (*len_list >= max_list)
> +				return -ENOMEM;
> +			list[(*len_list)++] = value;
> +		}
> +	} else
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static int
> +rte_eth_devargs_parse_ports(char *str, void *data)
> +{
> +	struct rte_eth_devargs *eth_da = data;
> +
> +	return rte_eth_devargs_process_range(str, eth_da->ports,
> +		&eth_da->nb_ports, RTE_MAX_ETHPORTS);
> +}
> +
> +
> +static int
> +rte_eth_devargs_parse_representor_ports(char *str, void *data)
> +{
> +	struct rte_eth_devargs *eth_da = data;
> +
> +	return rte_eth_devargs_process_range(str, eth_da->representor_ports,
> +		&eth_da->nb_representor_ports, RTE_MAX_ETHPORTS);
> +}
> +
> +int __rte_experimental
> +rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
> +{
> +	struct rte_kvargs args;
> +	struct rte_kvargs_pair *pair;
> +	unsigned int i;
> +	int result;
> +
> +	memset(eth_da, 0, sizeof(*eth_da));
> +
> +	result = rte_eth_devargs_tokenise(&args, dargs);
> +	if (result < 0)
> +		return result;
> +
> +	for (i = 0; i < args.count; i++) {
> +		pair = &args.pairs[i];
> +
> +		if (strcmp("port", pair->key) == 0) {
> +			result = rte_eth_devargs_parse_list(pair->value,
> +				rte_eth_devargs_parse_ports, eth_da);
> +			if (result < 0)
> +				return result;
> +		} else if (strcmp("representor", pair->key) == 0) {
> +			result = rte_eth_devargs_parse_list(pair->value,
> +				rte_eth_devargs_parse_representor_ports,
> +				eth_da);
> +			if (result < 0)
> +				return result;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  RTE_INIT(ethdev_init_log);
>  static void
>  ethdev_init_log(void)
> diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
> index e52add0ad..3bce5747d 100644
> --- a/lib/librte_ether/rte_ethdev_driver.h
> +++ b/lib/librte_ether/rte_ethdev_driver.h
> @@ -189,6 +189,36 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
>  }
> 
> 
> +/** Generic Ethernet device arguments  */
> +struct rte_eth_devargs {
> +	uint16_t ports[RTE_MAX_ETHPORTS];
> +	/** port/s number to enable on a multi-port single function */
> +	uint16_t nb_ports;
> +	/** number of ports in ports field */
> +	uint16_t representor_ports[RTE_MAX_ETHPORTS];
> +	/** representor port/s identifier to enable on device */
> +	uint16_t nb_representor_ports;
> +	/** number of ports in representor port field */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * PMD helper function to parse ethdev arguments
> + *
> + * @param devargs
> + *  device arguments
> + * @param eth_devargs
> + *  parsed ethdev specific arguments.
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_devargs);
> +
> +
>  typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void *init_params);
>  typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
>  	void *bus_specific_init_params);
> diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
> index bd7232923..62ecbdb8a 100644
> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -233,6 +233,7 @@ EXPERIMENTAL {
>  EXPERIMENTAL {
>  	global:
> 
> +	rt_eth_devargs_parse;
>  	rte_eth_dev_create;
>  	rte_eth_dev_destroy;
> 
> --
> 2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 7/9] ethdev: add switch domain allocator
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 7/9] ethdev: add switch domain allocator Declan Doherty
@ 2018-04-20 13:22     ` Ananyev, Konstantin
  2018-04-24 19:58     ` Thomas Monjalon
  1 sibling, 0 replies; 73+ messages in thread
From: Ananyev, Konstantin @ 2018-04-20 13:22 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Adrien Mazarguil, Yigit, Ferruh, Thomas Monjalon, Shahaf Shuler,
	Doherty, Declan



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Declan Doherty
> Sent: Monday, April 16, 2018 2:06 PM
> To: dev@dpdk.org
> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> Shahaf Shuler <shahafs@mellanox.com>; Doherty, Declan <declan.doherty@intel.com>
> Subject: [dpdk-dev] [PATCH v7 7/9] ethdev: add switch domain allocator
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.c           | 53 +++++++++++++++++++++++++++++++++
>  lib/librte_ether/rte_ethdev_driver.h    | 39 ++++++++++++++++++++++++
>  lib/librte_ether/rte_ethdev_version.map |  3 ++
>  3 files changed, 95 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index a082b211c..d1f95161f 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -4300,6 +4300,59 @@ rte_eth_devargs_parse_ports(char *str, void *data)
>  		&eth_da->nb_ports, RTE_MAX_ETHPORTS);
>  }
> 
> +/**
> + * A set of values to describe the possible states of a switch domain.
> + */
> +enum rte_eth_switch_domain_state {
> +	RTE_ETH_SWITCH_DOMAIN_UNUSED = 0,
> +	RTE_ETH_SWITCH_DOMAIN_ALLOCATED
> +};
> +
> +/**
> + * Array of switch domains available for allocation. Array is sized to
> + * RTE_MAX_ETHPORTS elements as there cannot be more active switch domains than
> + * ethdev ports in a single process.
> + */
> +struct rte_eth_dev_switch {
> +	enum rte_eth_switch_domain_state state;
> +} rte_eth_switch_domains[RTE_MAX_ETHPORTS];

Probably already discussed before, but  if we can't have more than one switch_id per port,
while we can't use port_id as switch_id?

> +
> +int __rte_experimental
> +rte_eth_switch_domain_alloc(uint16_t *domain_id)
> +{
> +	unsigned int i;
> +
> +	*domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
> +
> +	for (i = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID + 1;
> +		i < RTE_MAX_ETHPORTS; i++) {
> +		if (rte_eth_switch_domains[i].state ==
> +			RTE_ETH_SWITCH_DOMAIN_UNUSED) {
> +			rte_eth_switch_domains[i].state =
> +				RTE_ETH_SWITCH_DOMAIN_ALLOCATED;
> +			*domain_id = i;
> +			return 0;
> +		}
> +	}

So all we need for state is just one status bit (occupied/free)?
Wouldn't bitmap do then?

> +
> +	return -ENOSPC;
> +}
> +
> +int __rte_experimental
> +rte_eth_switch_domain_free(uint16_t domain_id)
> +{
> +	if (domain_id == RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID ||
> +		domain_id >= RTE_MAX_ETHPORTS)
> +		return -EINVAL;
> +
> +	if (rte_eth_switch_domains[domain_id].state !=
> +		RTE_ETH_SWITCH_DOMAIN_ALLOCATED)
> +		return -EINVAL;
> +
> +	rte_eth_switch_domains[domain_id].state = RTE_ETH_SWITCH_DOMAIN_UNUSED;
> +
> +	return 0;
> +}
> 
>  static int
>  rte_eth_devargs_parse_representor_ports(char *str, void *data)
> diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
> index 3bce5747d..c22fcbde1 100644
> --- a/lib/librte_ether/rte_ethdev_driver.h
> +++ b/lib/librte_ether/rte_ethdev_driver.h
> @@ -188,6 +188,45 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
>  #endif
>  }
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate an unique switch domain identifier.
> + *
> + * A pool of switch domain identifiers which can be allocated on request. This
> + * will enabled devices which support the concept of switch domains to request
> + * a switch domain id which is guaranteed to be unique from other devices
> + * running in the same process.
> + *
> + * @param domain_id
> + *  switch domain identifier parameter to pass back to application
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_switch_domain_alloc(uint16_t *domain_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Free switch domain.
> + *
> + * Return a switch domain identifier to the pool of free identifiers after it is
> + * no longer in use by device.
> + *
> + * @param domain_id
> + *  switch domain identifier to free
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_switch_domain_free(uint16_t domain_id);
> +
> +
> 
>  /** Generic Ethernet device arguments  */
>  struct rte_eth_devargs {
> diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
> index 62ecbdb8a..6601ef106 100644
> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -236,5 +236,8 @@ EXPERIMENTAL {
>  	rt_eth_devargs_parse;
>  	rte_eth_dev_create;
>  	rte_eth_dev_destroy;
> +	rte_eth_switch_domain_alloc;
> +	rte_eth_switch_domain_free;
> +	rte_eth_switch_domains;
> 
>  } DPDK_18.05;
> --
> 2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 9/9] net/ixgbe: add support for representor ports
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 9/9] net/ixgbe: " Declan Doherty
@ 2018-04-20 13:29     ` Ananyev, Konstantin
  0 siblings, 0 replies; 73+ messages in thread
From: Ananyev, Konstantin @ 2018-04-20 13:29 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Adrien Mazarguil, Yigit, Ferruh, Thomas Monjalon, Shahaf Shuler,
	Doherty, Declan, Awal, Mohammad Abdul, Horton, Remy


> --- /dev/null
> +++ b/drivers/net/ixgbe/ixgbe_vf_representor.c
> @@ -0,0 +1,217 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation.
> + */
> +
> +#include <rte_ethdev.h>
> +#include <rte_pci.h>
> +#include <rte_malloc.h>
> +
> +#include "base/ixgbe_type.h"
> +#include "base/ixgbe_vf.h"
> +#include "ixgbe_ethdev.h"
> +#include "ixgbe_rxtx.h"
> +#include "rte_pmd_ixgbe.h"
> +
> +
> +static int
> +ixgbe_vf_representor_link_update(struct rte_eth_dev *ethdev,
> +	int wait_to_complete)
> +{
> +	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
> +
> +	return ixgbe_dev_link_update_share(representor->pf_ethdev,
> +		wait_to_complete, 1);
> +}
> +
> +static int
> +ixgbe_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
> +	struct ether_addr *mac_addr)
> +{
> +	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
> +
> +	return rte_pmd_ixgbe_set_vf_mac_addr(
> +		representor->pf_ethdev->data->port_id,
> +		representor->vf_id, mac_addr);
> +}
> +
> +static void
> +ixgbe_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
> +	struct rte_eth_dev_info *dev_info)
> +{
> +	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
> +
> +	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(
> +		representor->pf_ethdev->data->dev_private);
> +
> +	dev_info->device = representor->pf_ethdev->device;
> +
> +	dev_info->min_rx_bufsize = 1024;
> +	/**< Minimum size of RX buffer. */
> +	dev_info->max_rx_pktlen = 9728;
> +	/**< Maximum configurable length of RX pkt. */
> +	dev_info->max_rx_queues = IXGBE_VF_MAX_RX_QUEUES;
> +	/**< Maximum number of RX queues. */
> +	dev_info->max_tx_queues = IXGBE_VF_MAX_TX_QUEUES;
> +	/**< Maximum number of TX queues. */

Sort of generic question - for representor ports that do only control path -
shouldn't we have max_rx_queues=max_tx_queues=0, zero and make
queue_setup/rx_burst/tx_burst, etc. to return an error? 

> +
> +	dev_info->max_mac_addrs = hw->mac.num_rar_entries;
> +	/**< Maximum number of MAC addresses. */
> +
> +	dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |
> +		DEV_RX_OFFLOAD_IPV4_CKSUM |	DEV_RX_OFFLOAD_UDP_CKSUM  |
> +		DEV_RX_OFFLOAD_TCP_CKSUM;
> +	/**< Device RX offload capabilities. */
> +
> +	dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT |
> +		DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM |
> +		DEV_TX_OFFLOAD_TCP_CKSUM | DEV_TX_OFFLOAD_SCTP_CKSUM |
> +		DEV_TX_OFFLOAD_TCP_TSO;
> +	/**< Device TX offload capabilities. */
> +
> +	dev_info->speed_capa =
> +		representor->pf_ethdev->data->dev_link.link_speed;
> +	/**< Supported speeds bitmap (ETH_LINK_SPEED_). */
> +
> +	dev_info->switch_info.name =
> +		representor->pf_ethdev->device->name;
> +	dev_info->switch_info.domain_id = representor->switch_domain_id;
> +	dev_info->switch_info.port_id = representor->vf_id;
> +}
> +
> +static int ixgbe_vf_representor_dev_configure(
> +		__rte_unused struct rte_eth_dev *dev)
> +{
> +	return 0;
> +}
> +
> +static int ixgbe_vf_representor_rx_queue_setup(
> +	__rte_unused struct rte_eth_dev *dev,
> +	__rte_unused uint16_t rx_queue_id,
> +	__rte_unused uint16_t nb_rx_desc,
> +	__rte_unused unsigned int socket_id,
> +	__rte_unused const struct rte_eth_rxconf *rx_conf,
> +	__rte_unused struct rte_mempool *mb_pool)
> +{
> +	return 0;
> +}
> +
> +static int ixgbe_vf_representor_tx_queue_setup(
> +	__rte_unused struct rte_eth_dev *dev,
> +	__rte_unused uint16_t rx_queue_id,
> +	__rte_unused uint16_t nb_rx_desc,
> +	__rte_unused unsigned int socket_id,
> +	__rte_unused const struct rte_eth_txconf *tx_conf)
> +{
> +	return 0;
> +}
> +

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 2/9] ethdev: add switch identifier parameter to port
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 2/9] ethdev: add switch identifier parameter to port Declan Doherty
@ 2018-04-24 16:38     ` Thomas Monjalon
  0 siblings, 0 replies; 73+ messages in thread
From: Thomas Monjalon @ 2018-04-24 16:38 UTC (permalink / raw)
  To: Declan Doherty; +Cc: dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler

16/04/2018 15:05, Declan Doherty:
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> +/**
> + * Default values for switch domain id when ethdev does not support switch
> + * domain definitions.

values -> value

> + */
> +#define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID	(0)
> +
> +/**
> + * Ethernet device associated switch information
> + */
> +struct rte_eth_switch_info {
> +	const char *name;	/**< switch name */
> +	uint16_t domain_id;	/**< switch domain id */
> +	uint16_t port_id;	/**< switch port id */

I feel we need more details about what is the "switch port id".

[...]
> @@ -1054,6 +1069,8 @@ struct rte_eth_dev_info {
>  	struct rte_eth_dev_portconf default_rxportconf;
>  	/** Tx parameter recommendations */
>  	struct rte_eth_dev_portconf default_txportconf;
> +	/** ethdev switch information */

Can we reword it to express that it is about the hardware
built-in switch hard wired to this port?

> +	struct rte_eth_switch_info switch_info;
>  };

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 3/9] ethdev: add generic create/destroy ethdev APIs
  2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 3/9] ethdev: add generic create/destroy ethdev APIs Declan Doherty
  2018-04-20 13:01     ` Ananyev, Konstantin
@ 2018-04-24 17:48     ` Thomas Monjalon
  1 sibling, 0 replies; 73+ messages in thread
From: Thomas Monjalon @ 2018-04-24 17:48 UTC (permalink / raw)
  To: Declan Doherty; +Cc: dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler

16/04/2018 15:05, Declan Doherty:
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -348,7 +348,8 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
>  	rte_eth_dev_shared_data_prepare();
>  
>  	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
> -
> +	eth_dev->device = NULL;
> +	eth_dev->intr_handle = NULL;
>  	eth_dev->state = RTE_ETH_DEV_UNUSED;

Shouldn't it be in a separate patch with a proper explanation?


> + * @param device
> + *  rte_device handle.
> + * @param	name

There is a tab between param and name.


> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -229,3 +229,11 @@ EXPERIMENTAL {
>  	rte_mtr_stats_update;
>  
>  } DPDK_17.11;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_eth_dev_create;
> +	rte_eth_dev_destroy;
> +
> +} DPDK_18.05;

There is already an EXPERIMENTAL block.
Maybe you need to rebase.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/9] ethdev: Add port representor device flag
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 4/9] ethdev: Add port representor device flag Declan Doherty
@ 2018-04-24 19:37     ` Thomas Monjalon
  2018-04-25 12:17       ` Doherty, Declan
  0 siblings, 1 reply; 73+ messages in thread
From: Thomas Monjalon @ 2018-04-24 19:37 UTC (permalink / raw)
  To: Declan Doherty, qi.z.zhang
  Cc: dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler

16/04/2018 15:06, Declan Doherty:
> Add new device flag to specify that an ethdev port is a port representor.
> Extend rte_eth_dev_info structure to expose device flags to the user which
> enables applications to discover if a port is a representor port.
[...]
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -2431,6 +2431,8 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
>  	dev_info->driver_name = dev->device->driver->name;
>  	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>  	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
> +
> +	dev_info->dev_flags = dev->data->dev_flags;
>  }
[...]
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1032,6 +1032,7 @@ struct rte_eth_dev_info {
>  	const char *driver_name; /**< Device Driver name. */
>  	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
>  		Use if_indextoname() to translate into an interface name. */
> +	uint32_t dev_flags; /**< Device flags */

A similar field has been added recently:

	http://dpdk.org/browse/next/dpdk-next-net/tree/lib/librte_ether/rte_ethdev.h#n1074

	/** Generic device capabilities */
	uint64_t dev_capa;

It is for flags DEV_CAPA_*
Note that the prefix should be fixed to RTE_ETH_DEV,
and the doxygen comment should mention the flags prefix.
Qi, please fix.

I think dev_capa and dev_flags are the same thing.
They could be merged.

>  /** Device supports link state interrupt */
> -#define RTE_ETH_DEV_INTR_LSC     0x0002
> +#define RTE_ETH_DEV_INTR_LSC		0x0002
>  /** Device is a bonded slave */
> -#define RTE_ETH_DEV_BONDED_SLAVE 0x0004
> +#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
>  /** Device supports device removal interrupt */
> -#define RTE_ETH_DEV_INTR_RMV     0x0008
> +#define RTE_ETH_DEV_INTR_RMV		0x0008
> +/** Device is port representor */
> +#define RTE_ETH_DEV_REPRESENTOR		0x0010

It seems you tried to re-align but it fails.
Better to use spaces for alignment.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser Declan Doherty
  2018-04-20 13:16     ` Ananyev, Konstantin
@ 2018-04-24 19:53     ` Thomas Monjalon
  2018-04-25  9:40       ` Remy Horton
  1 sibling, 1 reply; 73+ messages in thread
From: Thomas Monjalon @ 2018-04-24 19:53 UTC (permalink / raw)
  To: Declan Doherty, Remy Horton
  Cc: dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler

16/04/2018 15:06, Declan Doherty:
> From: Remy Horton <remy.horton@intel.com>
> 
> Introduces a new structure, rte_eth_devargs, to support generic
> ethdev arguments common across NET PMDs, with a new API
> rte_eth_devargs_parse API to support PMD parsing these arguments.

Most of the parsing functions should be removed when the new devargs
framework will be ready.
So I won't look specifically at this code.

But I would like to review the devargs you are standardizing.
Unfortunately, I cannot find a documentation about it.
How users are supposed to use it?
Can you, at least, describe the syntax in the commit log, please?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 7/9] ethdev: add switch domain allocator
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 7/9] ethdev: add switch domain allocator Declan Doherty
  2018-04-20 13:22     ` Ananyev, Konstantin
@ 2018-04-24 19:58     ` Thomas Monjalon
  1 sibling, 0 replies; 73+ messages in thread
From: Thomas Monjalon @ 2018-04-24 19:58 UTC (permalink / raw)
  To: Declan Doherty; +Cc: dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler

16/04/2018 15:06, Declan Doherty:
> +/**
> + * Array of switch domains available for allocation. Array is sized to
> + * RTE_MAX_ETHPORTS elements as there cannot be more active switch domains than
> + * ethdev ports in a single process.
> + */
> +struct rte_eth_dev_switch {
> +	enum rte_eth_switch_domain_state state;
> +} rte_eth_switch_domains[RTE_MAX_ETHPORTS];
[...]
> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -236,5 +236,8 @@ EXPERIMENTAL {
>  	rt_eth_devargs_parse;
>  	rte_eth_dev_create;
>  	rte_eth_dev_destroy;
> +	rte_eth_switch_domain_alloc;
> +	rte_eth_switch_domain_free;
> +	rte_eth_switch_domains;

Why the table rte_eth_switch_domains is exported?
Can we use an iterator function + macro instead?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser
  2018-04-24 19:53     ` Thomas Monjalon
@ 2018-04-25  9:40       ` Remy Horton
  2018-04-25 10:06         ` Thomas Monjalon
  0 siblings, 1 reply; 73+ messages in thread
From: Remy Horton @ 2018-04-25  9:40 UTC (permalink / raw)
  To: Thomas Monjalon, Declan Doherty
  Cc: dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler


On 24/04/2018 20:53, Thomas Monjalon wrote:
[..]
> But I would like to review the devargs you are standardizing.
> Unfortunately, I cannot find a documentation about it.
> How users are supposed to use it?
> Can you, at least, describe the syntax in the commit log, please?

The patch follows this pseudo-BNF:

cfg   := pair (',' pair)*
pair  := (key '=' value)
key   := 'port' | 'representor'
value := range | list
range := int ('-' int)?
int   := [0-9]+
list  := '[' range (',' range)* ']'

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser
  2018-04-25  9:40       ` Remy Horton
@ 2018-04-25 10:06         ` Thomas Monjalon
  2018-04-25 10:45           ` Remy Horton
  0 siblings, 1 reply; 73+ messages in thread
From: Thomas Monjalon @ 2018-04-25 10:06 UTC (permalink / raw)
  To: Remy Horton
  Cc: Declan Doherty, dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler

25/04/2018 11:40, Remy Horton:
> 
> On 24/04/2018 20:53, Thomas Monjalon wrote:
> [..]
> > But I would like to review the devargs you are standardizing.
> > Unfortunately, I cannot find a documentation about it.
> > How users are supposed to use it?
> > Can you, at least, describe the syntax in the commit log, please?
> 
> The patch follows this pseudo-BNF:
> 
> cfg   := pair (',' pair)*
> pair  := (key '=' value)
> key   := 'port' | 'representor'
> value := range | list
> range := int ('-' int)?
> int   := [0-9]+
> list  := '[' range (',' range)* ']'

OK
Please can you add it as a comment in the parsing code?

We will need one or two examples in the commit message too.

Can you show a complete command line please?
How do you give ethdev properties without the new devargs syntax
(in progress by Gaetan)?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser
  2018-04-25 10:06         ` Thomas Monjalon
@ 2018-04-25 10:45           ` Remy Horton
  0 siblings, 0 replies; 73+ messages in thread
From: Remy Horton @ 2018-04-25 10:45 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Declan Doherty, dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler


On 25/04/2018 11:06, Thomas Monjalon wrote:
> 25/04/2018 11:40, Remy Horton:
>>
>> On 24/04/2018 20:53, Thomas Monjalon wrote:
[..]
> OK
> Please can you add it as a comment in the parsing code?
>
> We will need one or two examples in the commit message too.

Docs are being updated, so it should be in the v8 patchset.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/9] ethdev: Add port representor device flag
  2018-04-24 19:37     ` Thomas Monjalon
@ 2018-04-25 12:17       ` Doherty, Declan
  2018-04-25 12:23         ` Thomas Monjalon
  0 siblings, 1 reply; 73+ messages in thread
From: Doherty, Declan @ 2018-04-25 12:17 UTC (permalink / raw)
  To: Thomas Monjalon, qi.z.zhang
  Cc: dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler

On 24/04/2018 8:37 PM, Thomas Monjalon wrote:
> 16/04/2018 15:06, Declan Doherty:
>> Add new device flag to specify that an ethdev port is a port representor.
>> Extend rte_eth_dev_info structure to expose device flags to the user which
>> enables applications to discover if a port is a representor port.
> [...]
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -2431,6 +2431,8 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
>>   	dev_info->driver_name = dev->device->driver->name;
>>   	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>>   	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
>> +
>> +	dev_info->dev_flags = dev->data->dev_flags;
>>   }
> [...]
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -1032,6 +1032,7 @@ struct rte_eth_dev_info {
>>   	const char *driver_name; /**< Device Driver name. */
>>   	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
>>   		Use if_indextoname() to translate into an interface name. */
>> +	uint32_t dev_flags; /**< Device flags */
> 
> A similar field has been added recently:
> 
> 	http://dpdk.org/browse/next/dpdk-next-net/tree/lib/librte_ether/rte_ethdev.h#n1074
> 
> 	/** Generic device capabilities */
> 	uint64_t dev_capa;
> 
> It is for flags DEV_CAPA_*
> Note that the prefix should be fixed to RTE_ETH_DEV,
> and the doxygen comment should mention the flags prefix.
> Qi, please fix.
> 
> I think dev_capa and dev_flags are the same thing.
> They could be merged.

Do you have a preference for which one to keep, as dev_flags within
rte_eth_dev_data is widely used by PMDs and passing this same 
information out through rte_eth_dev_info makes sense to me?

> 
>>   /** Device supports link state interrupt */
>> -#define RTE_ETH_DEV_INTR_LSC     0x0002
>> +#define RTE_ETH_DEV_INTR_LSC		0x0002
>>   /** Device is a bonded slave */
>> -#define RTE_ETH_DEV_BONDED_SLAVE 0x0004
>> +#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
>>   /** Device supports device removal interrupt */
>> -#define RTE_ETH_DEV_INTR_RMV     0x0008
>> +#define RTE_ETH_DEV_INTR_RMV		0x0008
>> +/** Device is port representor */
>> +#define RTE_ETH_DEV_REPRESENTOR		0x0010
> 
> It seems you tried to re-align but it fails.
> Better to use spaces for alignment.
> 
sure will fix.
> 
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/9] ethdev: Add port representor device flag
  2018-04-25 12:17       ` Doherty, Declan
@ 2018-04-25 12:23         ` Thomas Monjalon
  0 siblings, 0 replies; 73+ messages in thread
From: Thomas Monjalon @ 2018-04-25 12:23 UTC (permalink / raw)
  To: Doherty, Declan
  Cc: qi.z.zhang, dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler

25/04/2018 14:17, Doherty, Declan:
> On 24/04/2018 8:37 PM, Thomas Monjalon wrote:
> > I think dev_capa and dev_flags are the same thing.
> > They could be merged.
> 
> Do you have a preference for which one to keep, as dev_flags within
> rte_eth_dev_data is widely used by PMDs and passing this same 
> information out through rte_eth_dev_info makes sense to me?

I think a big cleanup in ethdev structures is required.
We could avoid copying fields from one struct to the other.
But I don't want to block this patch, so go ahead and we will clean
this mess later.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation
  2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
                     ` (8 preceding siblings ...)
  2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 9/9] net/ixgbe: " Declan Doherty
@ 2018-04-26 10:40   ` Declan Doherty
  2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 1/9] doc: add switch representation documentation Declan Doherty
                       ` (9 more replies)
  9 siblings, 10 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-26 10:40 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Declan Doherty

This patchset follows on from the port rerpesentor patchsets and the
community discussion that resulted. It outlines the model for
representing and controlling switching capable devices in a new
programmer's guide entry based upon the excellent summary by
Adrien Mazarguil in
(http://dpdk.org/ml/archives/dev/2018-March/092513.html).

The next patches introduce changes to librte_ether to:
1, support the definition of a switch domain and make it public to
application through the rte_eth_dev_info structure.
2, Add generic ethdev create/destroy APIs to facilitate and generalise the
creation of ethdev's on different bus types.
3, Add ethdev attribute to dev_flags to specify that a port is a
representor port and make public through the rte_eth_dev_info
structure.
4, Add devargs parsing for generic eth_devargs to facilate parsing in
NET PMDs. This will be refactored to take account of the changes in
(http://dpdk.org/ml/archives/dev/2018-March/092513.html)
5, Add new API to allocate switch domain ids to devices which support
this feature.

This patchset also includes the enablement of vf port representor for ixgbe
and i40e PF devices.

V8:
- add detailed descriptions to switch information structures
- fix err condition checking ethdev create helper function
- fix devargs memory leak and error checking + add documentation on
  ethdev args.
- remove rte_eth_switch_domains structure from export items.

V7:

This patch address the following changes:
 - fixes in documentation patch
 - changes the default value of switch domain id to be INVALID to allow
   applications to easily identify devices which can/cannot support the
   concept. Updates the switch information available through the
   rte_eth_dev_info structure.
 - remove the rte_ethdev_representor.h header and leave representor
   specific initialisation to driver
 - add new APIs for allocating and freeing switch domain identifier to
   enable PMDs to have unique switch domaind ids without the ethdev
   infrastructure placing any restriction on how theses are managed by
   devices.
 - bug fix in ethdev args parsing code.

Declan Doherty (8):
  doc: add switch representation documentation
  ethdev: add switch identifier parameter to port
  ethdev: add generic create/destroy ethdev APIs
  ethdev: Add port representor device flag
  app/testpmd: add port name to device info
  ethdev: add switch domain allocator
  net/i40e: add support for representor ports
  net/ixgbe: add support for representor ports

Remy Horton (1):
  ethdev: add common devargs parser

 app/test-pmd/config.c                           |  15 +
 doc/guides/nics/i40e.rst                        |  15 +
 doc/guides/nics/ixgbe.rst                       |  14 +
 doc/guides/prog_guide/index.rst                 |   1 +
 doc/guides/prog_guide/poll_mode_drv.rst         |  19 +
 doc/guides/prog_guide/switch_representation.rst | 837 ++++++++++++++++++++++++
 drivers/net/i40e/Makefile                       |   3 +
 drivers/net/i40e/i40e_ethdev.c                  |  82 ++-
 drivers/net/i40e/i40e_ethdev.h                  |  16 +
 drivers/net/i40e/i40e_vf_representor.c          | 405 ++++++++++++
 drivers/net/i40e/meson.build                    |   4 +-
 drivers/net/i40e/rte_pmd_i40e.c                 |  43 ++
 drivers/net/i40e/rte_pmd_i40e.h                 |  18 +
 drivers/net/ixgbe/Makefile                      |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c                |  80 ++-
 drivers/net/ixgbe/ixgbe_ethdev.h                |  14 +
 drivers/net/ixgbe/ixgbe_pf.c                    |   7 +
 drivers/net/ixgbe/ixgbe_vf_representor.c        | 217 ++++++
 drivers/net/ixgbe/meson.build                   |   1 +
 lib/Makefile                                    |   1 +
 lib/librte_ether/rte_ethdev.c                   | 331 ++++++++++
 lib/librte_ether/rte_ethdev.h                   |  30 +
 lib/librte_ether/rte_ethdev_driver.h            | 126 ++++
 lib/librte_ether/rte_ethdev_pci.h               |  12 +
 lib/librte_ether/rte_ethdev_version.map         |   5 +
 25 files changed, 2280 insertions(+), 17 deletions(-)
 create mode 100644 doc/guides/prog_guide/switch_representation.rst
 create mode 100644 drivers/net/i40e/i40e_vf_representor.c
 create mode 100644 drivers/net/ixgbe/ixgbe_vf_representor.c

-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v8 1/9] doc: add switch representation documentation
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
@ 2018-04-26 10:40     ` Declan Doherty
  2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 2/9] ethdev: add switch identifier parameter to port Declan Doherty
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-26 10:40 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Declan Doherty, Adrien Mazarguil

Add document to describe the  model for representing switching capable
devices in DPDK, using a general ethdev port model and through port
representors. This document also details the port model and the
rte_flow semantics required for flow programming, as well as listing
some example use cases.

Signed-off-by: Adrien Mazarguil <adrien.mazaguil@6wind.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Reviewed-by: Marko Kovacevic <marko.kovacevic@intel.com>
---
 doc/guides/prog_guide/index.rst                 |   1 +
 doc/guides/prog_guide/switch_representation.rst | 837 ++++++++++++++++++++++++
 2 files changed, 838 insertions(+)
 create mode 100644 doc/guides/prog_guide/switch_representation.rst

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 589c05d96..235ad0201 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -17,6 +17,7 @@ Programmer's Guide
     mbuf_lib
     poll_mode_drv
     rte_flow
+    switch_representation
     traffic_metering_and_policing
     traffic_management
     bbdev
diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
new file mode 100644
index 000000000..f5ee516f6
--- /dev/null
+++ b/doc/guides/prog_guide/switch_representation.rst
@@ -0,0 +1,837 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 6WIND S.A.
+
+.. _switch_representation:
+
+Switch Representation within DPDK Applications
+==============================================
+
+.. contents:: :local:
+
+Introduction
+------------
+
+Network adapters with multiple physical ports and/or SR-IOV capabilities
+usually support the offload of traffic steering rules between their virtual
+functions (VFs), physical functions (PFs) and ports.
+
+Like for standard Ethernet switches, this involves a combination of
+automatic MAC learning and manual configuration. For most purposes it is
+managed by the host system and fully transparent to users and applications.
+
+On the other hand, applications typically found on hypervisors that process
+layer 2 (L2) traffic (such as OVS) need to steer traffic themselves
+according on their own criteria.
+
+Without a standard software interface to manage traffic steering rules
+between VFs, PFs and the various physical ports of a given device,
+applications cannot take advantage of these offloads; software processing is
+mandatory even for traffic which ends up re-injected into the device it
+originates from.
+
+This document describes how such steering rules can be configured through
+the DPDK flow API (**rte_flow**), with emphasis on the SR-IOV use case
+(PF/VF steering) using a single physical port for clarity, however the same
+logic applies to any number of ports without necessarily involving SR-IOV.
+
+Port Representors
+-----------------
+
+In many cases, traffic steering rules cannot be determined in advance;
+applications usually have to process a bit of traffic in software before
+thinking about offloading specific flows to hardware.
+
+Applications therefore need the ability to receive and inject traffic to
+various device endpoints (other VFs, PFs or physical ports) before
+connecting them together. Device drivers must provide means to hook the
+"other end" of these endpoints and to refer them when configuring flow
+rules.
+
+This role is left to so-called "port representors" (also known as "VF
+representors" in the specific context of VFs), which are to DPDK what the
+Ethernet switch device driver model (**switchdev**) [1]_ is to Linux, and
+which can be thought as a software "patch panel" front-end for applications.
+
+- DPDK port representors are implemented as additional virtual Ethernet
+  device (**ethdev**) instances, spawned on an as needed basis through
+  configuration parameters passed to the driver of the underlying
+  device using devargs.
+
+::
+
+   -w pci:dbdf,representor=0
+   -w pci:dbdf,representor=[0-3]
+   -w pci:dbdf,representor=[0,5-11]
+
+- As virtual devices, they may be more limited than their physical
+  counterparts, for instance by exposing only a subset of device
+  configuration callbacks and/or by not necessarily having Rx/Tx capability.
+
+- Among other things, they can be used to assign MAC addresses to the
+  resource they represent.
+
+- Applications can tell port representors apart from other physical of virtual
+  port by checking the dev_flags field within their device information
+  structure for the RTE_ETH_DEV_REPRESENTOR bit-field.
+
+.. code-block:: c
+
+  struct rte_eth_dev_info {
+      ...
+      uint32_t dev_flags; /**< Device flags */
+      ...
+  };
+
+- The device or group relationship of ports can be discovered using the
+  switch ``domain_id`` field within the devices switch information structure. By
+  default the switch ``domain_id`` of a port will be
+  ``RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID`` to indicate that the port doesn't
+  support the concept of a switch domain, but ports which do support the concept
+  will be allocated a unique switch ``domain_id``, ports within the same switch
+  domain will share the same ``domain_id``. The switch ``port_id`` is used to
+  specify the port_id in terms of the switch, so in the case of SR-IOV devices
+  the switch ``port_id`` would represent the virtual function identifier of the
+  port.
+
+.. code-block:: c
+
+   /**
+    * Ethernet device associated switch information
+    */
+   struct rte_eth_switch_info {
+       const char *name; /**< switch name */
+       uint16_t domain_id; /**< switch domain id */
+       uint16_t port_id; /**< switch port id */
+   };
+
+
+.. [1] `Ethernet switch device driver model (switchdev)
+       <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_
+
+Basic SR-IOV
+------------
+
+"Basic" in the sense that it is not managed by applications, which
+nonetheless expect traffic to flow between the various endpoints and the
+outside as if everything was linked by an Ethernet hub.
+
+The following diagram pictures a setup involving a device with one PF, two
+VFs and one shared physical port
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+----------'                 `----------+--' `--+----------'
+          |                                       |       |
+    .-----+-----.                                 |       |
+    | port_id 3 |                                 |       |
+    `-----+-----'                                 |       |
+          |                                       |       |
+        .-+--.                                .---+--. .--+---.
+        | PF |                                | VF 1 | | VF 2 |
+        `-+--'                                `---+--' `--+---'
+          |                                       |       |
+          `---------.     .-----------------------'       |
+                    |     |     .-------------------------'
+                    |     |     |
+                 .--+-----+-----+--.
+                 | interconnection |
+                 `--------+--------'
+                          |
+                     .----+-----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+- A DPDK application running on the hypervisor owns the PF device, which is
+  arbitrarily assigned port index 3.
+
+- Both VFs are assigned to VMs and used by unknown applications; they may be
+  DPDK-based or anything else.
+
+- Interconnection is not necessarily done through a true Ethernet switch and
+  may not even exist as a separate entity. The role of this block is to show
+  that something brings PF, VFs and physical ports together and enables
+  communication between them, with a number of built-in restrictions.
+
+Subsequent sections in this document describe means for DPDK applications
+running on the hypervisor to freely assign specific flows between PF, VFs
+and physical ports based on traffic properties, by managing this
+interconnection.
+
+Controlled SR-IOV
+-----------------
+
+Initialization
+~~~~~~~~~~~~~~
+
+When a DPDK application gets assigned a PF device and is deliberately not
+started in `basic SR-IOV`_ mode, any traffic coming from physical ports is
+received by PF according to default rules, while VFs remain isolated.
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+----------'                 `----------+--' `--+----------'
+          |                                       |       |
+    .-----+-----.                                 |       |
+    | port_id 3 |                                 |       |
+    `-----+-----'                                 |       |
+          |                                       |       |
+        .-+--.                                .---+--. .--+---.
+        | PF |                                | VF 1 | | VF 2 |
+        `-+--'                                `------' `------'
+          |
+          `-----.
+                |
+             .--+----------------------.
+             | managed interconnection |
+             `------------+------------'
+                          |
+                     .----+-----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+In this mode, interconnection must be configured by the application to
+enable VF communication, for instance by explicitly directing traffic with a
+given destination MAC address to VF 1 and allowing that with the same source
+MAC address to come out of it.
+
+For this to work, hypervisor applications need a way to refer to either VF 1
+or VF 2 in addition to the PF. This is addressed by `VF representors`_.
+
+VF Representors
+~~~~~~~~~~~~~~~
+
+VF representors are virtual but standard DPDK network devices (albeit with
+limited capabilities) created by PMDs when managing a PF device.
+
+Since they represent VF instances used by other applications, configuring
+them (e.g. assigning a MAC address or setting up promiscuous mode) affects
+interconnection accordingly. If supported, they may also be used as two-way
+communication ports with VFs (assuming **switchdev** topology)
+
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+---+---+--'                 `----------+--' `--+----------'
+          |   |   |                               |       |
+          |   |   `-------------------.           |       |
+          |   `---------.             |           |       |
+          |             |             |           |       |
+    .-----+-----. .-----+-----. .-----+-----.     |       |
+    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+    `-----+-----' `-----+-----' `-----+-----'     |       |
+          |             |             |           |       |
+        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+        `-+--'    `-----+-----' `-----+-----' `---+--' `--+---'
+          |             |             |           |       |
+          |             |   .---------'           |       |
+          `-----.       |   |   .-----------------'       |
+                |       |   |   |   .---------------------'
+                |       |   |   |   |
+             .--+-------+---+---+---+--.
+             | managed interconnection |
+             `------------+------------'
+                          |
+                     .----+-----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+- VF representors are assigned arbitrary port indices 4 and 5 in the
+  hypervisor application and are respectively associated with VF 1 and VF 2.
+
+- They can't be dissociated; even if VF 1 and VF 2 were not connected,
+  representors could still be used for configuration.
+
+- In this context, port index 3 can be thought as a representor for physical
+  port 0.
+
+As previously described, the "interconnection" block represents a logical
+concept. Interconnection occurs when hardware configuration enables traffic
+flows from one place to another (e.g. physical port 0 to VF 1) according to
+some criteria.
+
+This is discussed in more detail in `traffic steering`_.
+
+Traffic Steering
+~~~~~~~~~~~~~~~~
+
+In the following diagram, each meaningful traffic origin or endpoint as seen
+by the hypervisor application is tagged with a unique letter from A to F.
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+---+---+--'                 `----------+--' `--+----------'
+          |   |   |                               |       |
+          |   |   `-------------------.           |       |
+          |   `---------.             |           |       |
+          |             |             |           |       |
+    .----(A)----. .----(B)----. .----(C)----.     |       |
+    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+    `-----+-----' `-----+-----' `-----+-----'     |       |
+          |             |             |           |       |
+        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+        `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
+          |             |             |           |       |
+          |             |   .---------'           |       |
+          `-----.       |   |   .-----------------'       |
+                |       |   |   |   .---------------------'
+                |       |   |   |   |
+             .--+-------+---+---+---+--.
+             | managed interconnection |
+             `------------+------------'
+                          |
+                     .---(F)----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+- **A**: PF device.
+- **B**: port representor for VF 1.
+- **C**: port representor for VF 2.
+- **D**: VF 1 proper.
+- **E**: VF 2 proper.
+- **F**: physical port.
+
+Although uncommon, some devices do not enforce a one to one mapping between
+PF and physical ports. For instance, by default all ports of **mlx4**
+adapters are available to all their PF/VF instances, in which case
+additional ports appear next to **F** in the above diagram.
+
+Assuming no interconnection is provided by default in this mode, setting up
+a `basic SR-IOV`_ configuration involving physical port 0 could be broken
+down as:
+
+PF:
+
+- **A to F**: let everything through.
+- **F to A**: PF MAC as destination.
+
+VF 1:
+
+- **A to D**, **E to D** and **F to D**: VF 1 MAC as destination.
+- **D to A**: VF 1 MAC as source and PF MAC as destination.
+- **D to E**: VF 1 MAC as source and VF 2 MAC as destination.
+- **D to F**: VF 1 MAC as source.
+
+VF 2:
+
+- **A to E**, **D to E** and **F to E**: VF 2 MAC as destination.
+- **E to A**: VF 2 MAC as source and PF MAC as destination.
+- **E to D**: VF 2 MAC as source and VF 1 MAC as destination.
+- **E to F**: VF 2 MAC as source.
+
+Devices may additionally support advanced matching criteria such as
+IPv4/IPv6 addresses or TCP/UDP ports.
+
+The combination of matching criteria with target endpoints fits well with
+**rte_flow** [6]_, which expresses flow rules as combinations of patterns
+and actions.
+
+Enhancing **rte_flow** with the ability to make flow rules match and target
+these endpoints provides a standard interface to manage their
+interconnection without introducing new concepts and whole new API to
+implement them. This is described in `flow API (rte_flow)`_.
+
+.. [6] `Generic flow API (rte_flow)
+       <http://dpdk.org/doc/guides/prog_guide/rte_flow.html>`_
+
+Flow API (rte_flow)
+-------------------
+
+Extensions
+~~~~~~~~~~
+
+Compared to creating a brand new dedicated interface, **rte_flow** was
+deemed flexible enough to manage representor traffic only with minor
+extensions:
+
+- Using physical ports, PF, VF or port representors as targets.
+
+- Affecting traffic that is not necessarily addressed to the DPDK port ID a
+  flow rule is associated with (e.g. forcing VF traffic redirection to PF).
+
+For advanced uses:
+
+- Rule-based packet counters.
+
+- The ability to combine several identical actions for traffic duplication
+  (e.g. VF representor in addition to a physical port).
+
+- Dedicated actions for traffic encapsulation / decapsulation before
+  reaching an endpoint.
+
+Traffic Direction
+~~~~~~~~~~~~~~~~~
+
+From an application standpoint, "ingress" and "egress" flow rule attributes
+apply to the DPDK port ID they are associated with. They select a traffic
+direction for matching patterns, but have no impact on actions.
+
+When matching traffic coming from or going to a different place than the
+immediate port ID a flow rule is associated with, these attributes keep
+their meaning while applying to the chosen origin, as highlighted by the
+following diagram
+
+::
+
+       .-------------.                 .-------------. .-------------.
+       | hypervisor  |                 |    VM 1     | |    VM 2     |
+       | application |                 | application | | application |
+       `--+---+---+--'                 `----------+--' `--+----------'
+          |   |   |                               |       |
+          |   |   `-------------------.           |       |
+          |   `---------.             |           |       |
+          | ^           | ^           | ^         |       |
+          | | ingress   | | ingress   | | ingress |       |
+          | | egress    | | egress    | | egress  |       |
+          | v           | v           | v         |       |
+    .----(A)----. .----(B)----. .----(C)----.     |       |
+    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+    `-----+-----' `-----+-----' `-----+-----'     |       |
+          |             |             |           |       |
+        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+        `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
+          |             |             |         ^ |       | ^
+          |             |             |  egress | |       | | egress
+          |             |             | ingress | |       | | ingress
+          |             |   .---------'         v |       | v
+          `-----.       |   |   .-----------------'       |
+                |       |   |   |   .---------------------'
+                |       |   |   |   |
+             .--+-------+---+---+---+--.
+             | managed interconnection |
+             `------------+------------'
+                        ^ |
+                ingress | |
+                 egress | |
+                        v |
+                     .---(F)----.
+                     | physical |
+                     |  port 0  |
+                     `----------'
+
+Ingress and egress are defined as relative to the application creating the
+flow rule.
+
+For instance, matching traffic sent by VM 2 would be done through an ingress
+flow rule on VF 2 (**E**). Likewise for incoming traffic on physical port
+(**F**). This also applies to **C** and **A** respectively.
+
+Transferring Traffic
+~~~~~~~~~~~~~~~~~~~~
+
+Without Port Representors
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`Traffic direction`_ describes how an application could match traffic coming
+from or going to a specific place reachable from a DPDK port ID. This makes
+sense when the traffic in question is normally seen (i.e. sent or received)
+by the application creating the flow rule (e.g. as in "redirect all traffic
+coming from VF 1 to local queue 6").
+
+However this does not force such traffic to take a specific route. Creating
+a flow rule on **A** matching traffic coming from **D** is only meaningful
+if it can be received by **A** in the first place, otherwise doing so simply
+has no effect.
+
+A new flow rule attribute named "transfer" is necessary for that. Combining
+it with "ingress" or "egress" and a specific origin requests a flow rule to
+be applied at the lowest level
+
+::
+
+             ingress only           :       ingress + transfer
+                                    :
+    .-------------. .-------------. : .-------------. .-------------.
+    | hypervisor  | |    VM 1     | : | hypervisor  | |    VM 1     |
+    | application | | application | : | application | | application |
+    `------+------' `--+----------' : `------+------' `--+----------'
+           |           | | traffic  :        |           | | traffic
+     .----(A)----.     | v          :  .----(A)----.     | v
+     | port_id 3 |     |            :  | port_id 3 |     |
+     `-----+-----'     |            :  `-----+-----'     |
+           |           |            :        | ^         |
+           |           |            :        | | traffic |
+         .-+--.    .---+--.         :      .-+--.    .---+--.
+         | PF |    | VF 1 |         :      | PF |    | VF 1 |
+         `-+--'    `--(D)-'         :      `-+--'    `--(D)-'
+           |           | | traffic  :        | ^         | | traffic
+           |           | v          :        | | traffic | v
+        .--+-----------+--.         :     .--+-----------+--.
+        | interconnection |         :     | interconnection |
+        `--------+--------'         :     `--------+--------'
+                 | | traffic        :              |
+                 | v                :              |
+            .---(F)----.            :         .---(F)----.
+            | physical |            :         | physical |
+            |  port 0  |            :         |  port 0  |
+            `----------'            :         `----------'
+
+With "ingress" only, traffic is matched on **A** thus still goes to physical
+port **F** by default
+
+
+::
+
+   testpmd> flow create 3 ingress pattern vf id is 1 / end
+              actions queue index 6 / end
+
+With "ingress + transfer", traffic is matched on **D** and is therefore
+successfully assigned to queue 6 on **A**
+
+
+::
+
+    testpmd> flow create 3 ingress transfer pattern vf id is 1 / end
+              actions queue index 6 / end
+
+
+With Port Representors
+^^^^^^^^^^^^^^^^^^^^^^
+
+When port representors exist, implicit flow rules with the "transfer"
+attribute (described in `without port representors`_) are be assumed to
+exist between them and their represented resources. These may be immutable.
+
+In this case, traffic is received by default through the representor and
+neither the "transfer" attribute nor traffic origin in flow rule patterns
+are necessary. They simply have to be created on the representor port
+directly and may target a different representor as described in `PORT_ID
+action`_.
+
+Implicit traffic flow with port representor
+
+::
+
+       .-------------.   .-------------.
+       | hypervisor  |   |    VM 1     |
+       | application |   | application |
+       `--+-------+--'   `----------+--'
+          |       | ^               | | traffic
+          |       | | traffic       | v
+          |       `-----.           |
+          |             |           |
+    .----(A)----. .----(B)----.     |
+    | port_id 3 | | port_id 4 |     |
+    `-----+-----' `-----+-----'     |
+          |             |           |
+        .-+--.    .-----+-----. .---+--.
+        | PF |    | VF 1 rep. | | VF 1 |
+        `-+--'    `-----+-----' `--(D)-'
+          |             |           |
+       .--|-------------|-----------|--.
+       |  |             |           |  |
+       |  |             `-----------'  |
+       |  |              <-- traffic   |
+       `--|----------------------------'
+          |
+     .---(F)----.
+     | physical |
+     |  port 0  |
+     `----------'
+
+Pattern Items And Actions
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PORT Pattern Item
+^^^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) a physical
+port of the underlying device.
+
+Using this pattern item without specifying a port index matches the physical
+port associated with the current DPDK port ID by default. As described in
+`traffic steering`_, specifying it should be rarely needed.
+
+- Matches **F** in `traffic steering`_.
+
+PORT Action
+^^^^^^^^^^^
+
+Directs matching traffic to a given physical port index.
+
+- Targets **F** in `traffic steering`_.
+
+PORT_ID Pattern Item
+^^^^^^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) a given DPDK
+port ID.
+
+Normally only supported if the port ID in question is known by the
+underlying PMD and related to the device the flow rule is created against.
+
+This must not be confused with the `PORT pattern item`_ which refers to the
+physical port of a device. ``PORT_ID`` refers to a ``struct rte_eth_dev``
+object on the application side (also known as "port representor" depending
+on the kind of underlying device).
+
+- Matches **A**, **B** or **C** in `traffic steering`_.
+
+PORT_ID Action
+^^^^^^^^^^^^^^
+
+Directs matching traffic to a given DPDK port ID.
+
+Same restrictions as `PORT_ID pattern item`_.
+
+- Targets **A**, **B** or **C** in `traffic steering`_.
+
+PF Pattern Item
+^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) the physical
+function of the current device.
+
+If supported, should work even if the physical function is not managed by
+the application and thus not associated with a DPDK port ID. Its behavior is
+otherwise similar to `PORT_ID pattern item`_ using PF port ID.
+
+- Matches **A** in `traffic steering`_.
+
+PF Action
+^^^^^^^^^
+
+Directs matching traffic to the physical function of the current device.
+
+Same restrictions as `PF pattern item`_.
+
+- Targets **A** in `traffic steering`_.
+
+VF Pattern Item
+^^^^^^^^^^^^^^^
+
+Matches traffic originating from (ingress) or going to (egress) a given
+virtual function of the current device.
+
+If supported, should work even if the virtual function is not managed by
+the application and thus not associated with a DPDK port ID. Its behavior is
+otherwise similar to `PORT_ID pattern item`_ using VF port ID.
+
+Note this pattern item does not match VF representors traffic which, as
+separate entities, should be addressed through their own port IDs.
+
+- Matches **D** or **E** in `traffic steering`_.
+
+VF Action
+^^^^^^^^^
+
+Directs matching traffic to a given virtual function of the current device.
+
+Same restrictions as `VF pattern item`_.
+
+- Targets **D** or **E** in `traffic steering`_.
+
+\*_ENCAP actions
+^^^^^^^^^^^^^^^^
+
+These actions are named according to the protocol they encapsulate traffic
+with (e.g. ``VXLAN_ENCAP``) and using specific parameters (e.g. VNI for
+VXLAN).
+
+While they modify traffic and can be used multiple times (order matters),
+unlike `PORT_ID action`_ and friends, they have no impact on steering.
+
+As described in `actions order and repetition`_ this means they are useless
+if used alone in an action list, the resulting traffic gets dropped unless
+combined with either ``PASSTHRU`` or other endpoint-targeting actions.
+
+\*_DECAP actions
+^^^^^^^^^^^^^^^^
+
+They perform the reverse of `\*_ENCAP actions`_ by popping protocol headers
+from traffic instead of pushing them. They can be used multiple times as
+well.
+
+Note that using these actions on non-matching traffic results in undefined
+behavior. It is recommended to match the protocol headers to decapsulate on
+the pattern side of a flow rule in order to use these actions or otherwise
+make sure only matching traffic goes through.
+
+Actions Order and Repetition
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Flow rules are currently restricted to at most a single action of each
+supported type, performed in an unpredictable order (or all at once). To
+repeat actions in a predictable fashion, applications have to make rules
+pass-through and use priority levels.
+
+It's now clear that PMD support for chaining multiple non-terminating flow
+rules of varying priority levels is prohibitively difficult to implement
+compared to simply allowing multiple identical actions performed in a
+defined order by a single flow rule.
+
+- This change is required to support protocol encapsulation offloads and the
+  ability to perform them multiple times (e.g. VLAN then VXLAN).
+
+- It makes the ``DUP`` action redundant since multiple ``QUEUE`` actions can
+  be combined for duplication.
+
+- The (non-)terminating property of actions must be discarded. Instead, flow
+  rules themselves must be considered terminating by default (i.e. dropping
+  traffic if there is no specific target) unless a ``PASSTHRU`` action is
+  also specified.
+
+Switching Examples
+------------------
+
+This section provides practical examples based on the established testpmd
+flow command syntax [2]_, in the context described in `traffic steering`_
+
+::
+
+      .-------------.                 .-------------. .-------------.
+      | hypervisor  |                 |    VM 1     | |    VM 2     |
+      | application |                 | application | | application |
+      `--+---+---+--'                 `----------+--' `--+----------'
+         |   |   |                               |       |
+         |   |   `-------------------.           |       |
+         |   `---------.             |           |       |
+         |             |             |           |       |
+   .----(A)----. .----(B)----. .----(C)----.     |       |
+   | port_id 3 | | port_id 4 | | port_id 5 |     |       |
+   `-----+-----' `-----+-----' `-----+-----'     |       |
+         |             |             |           |       |
+       .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
+       | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
+       `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
+         |             |             |           |       |
+         |             |   .---------'           |       |
+         `-----.       |   |   .-----------------'       |
+               |       |   |   |   .---------------------'
+               |       |   |   |   |
+            .--|-------|---|---|---|--.
+            |  |       |   `---|---'  |
+            |  |       `-------'      |
+            |  `---------.            |
+            `------------|------------'
+                         |
+                    .---(F)----.
+                    | physical |
+                    |  port 0  |
+                    `----------'
+
+By default, PF (**A**) can communicate with the physical port it is
+associated with (**F**), while VF 1 (**D**) and VF 2 (**E**) are isolated
+and restricted to communicate with the hypervisor application through their
+respective representors (**B** and **C**) if supported.
+
+Examples in subsequent sections apply to hypervisor applications only and
+are based on port representors **A**, **B** and **C**.
+
+.. [2] `Flow syntax
+    <http://dpdk.org/doc/guides/testpmd_app_ug/testpmd_funcs.html#flow-syntax>`_
+
+Associating VF 1 with Physical Port 0
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Assign all port traffic (**F**) to VF 1 (**D**) indiscriminately through
+their representors
+
+::
+
+   flow create 3 ingress pattern / end actions port_id id 4 / end
+   flow create 4 ingress pattern / end actions port_id id 3 / end
+
+More practical example with MAC address restrictions
+
+::
+
+   flow create 3 ingress
+       pattern eth dst is {VF 1 MAC} / end
+       actions port_id id 4 / end
+
+::
+
+   flow create 4 ingress
+       pattern eth src is {VF 1 MAC} / end
+       actions port_id id 3 / end
+
+
+Sharing Broadcasts
+~~~~~~~~~~~~~~~~~~
+
+From outside to PF and VFs
+
+::
+
+   flow create 3 ingress
+      pattern eth dst is ff:ff:ff:ff:ff:ff / end
+      actions port_id id 3 / port_id id 4 / port_id id 5 / end
+
+Note ``port_id id 3`` is necessary otherwise only VFs would receive matching
+traffic.
+
+From PF to outside and VFs
+
+::
+
+   flow create 3 egress
+      pattern eth dst is ff:ff:ff:ff:ff:ff / end
+      actions port / port_id id 4 / port_id id 5 / end
+
+From VFs to outside and PF
+
+::
+
+   flow create 4 ingress
+      pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 1 MAC} / end
+      actions port_id id 3 / port_id id 5 / end
+
+   flow create 5 ingress
+      pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 2 MAC} / end
+      actions port_id id 4 / port_id id 4 / end
+
+Similar ``33:33:*`` rules based on known MAC addresses should be added for
+IPv6 traffic.
+
+Encapsulating VF 2 Traffic in VXLAN
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Assuming pass-through flow rules are supported
+
+::
+
+   flow create 5 ingress
+      pattern eth / end
+      actions vxlan_encap vni 42 / passthru / end
+
+::
+
+   flow create 5 egress
+      pattern vxlan vni is 42 / end
+      actions vxlan_decap / passthru / end
+
+Here ``passthru`` is needed since as described in `actions order and
+repetition`_, flow rules are otherwise terminating; if supported, a rule
+without a target endpoint will drop traffic.
+
+Without pass-through support, ingress encapsulation on the destination
+endpoint might not be supported and action list must provide one
+
+::
+
+   flow create 5 ingress
+      pattern eth src is {VF 2 MAC} / end
+      actions vxlan_encap vni 42 / port_id id 3 / end
+
+   flow create 3 ingress
+      pattern vxlan vni is 42 / end
+      actions vxlan_decap / port_id id 5 / end
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v8 2/9] ethdev: add switch identifier parameter to port
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
  2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 1/9] doc: add switch representation documentation Declan Doherty
@ 2018-04-26 10:40     ` Declan Doherty
  2018-04-26 12:02       ` Thomas Monjalon
  2018-04-27 16:29       ` Ferruh Yigit
  2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 3/9] ethdev: add generic create/destroy ethdev APIs Declan Doherty
                       ` (7 subsequent siblings)
  9 siblings, 2 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-26 10:40 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Declan Doherty

Introduces a new port attribute to ethdev port's which denotes the
switch domain a port belongs to. By default all port's switch
identifiers are set to RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID. Ports
which supported the concept of switch domains can be configured with
the same switch domain id.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 app/test-pmd/config.c         | 12 ++++++++++++
 lib/librte_ether/rte_ethdev.h | 27 +++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 216a7eb4e..26f416100 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -517,6 +517,18 @@ port_infos_display(portid_t port_id)
 	printf("Min possible number of TXDs per queue: %hu\n",
 		dev_info.tx_desc_lim.nb_min);
 	printf("TXDs number alignment: %hu\n", dev_info.tx_desc_lim.nb_align);
+
+	/* Show switch info only if valid switch domain and port id is set */
+	if (dev_info.switch_info.domain_id !=
+		RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID) {
+		if (dev_info.switch_info.name)
+			printf("Switch name: %s\n", dev_info.switch_info.name);
+
+		printf("Switch domain Id: %u\n",
+			dev_info.switch_info.domain_id);
+		printf("Switch Port Id: %u\n",
+			dev_info.switch_info.port_id);
+	}
 }
 
 void
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index efd84bb7b..06d9b288b 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1026,6 +1026,28 @@ struct rte_eth_dev_portconf {
 	uint16_t nb_queues; /**< Device-preferred number of queues */
 };
 
+/**
+ * Default values for switch domain id when ethdev does not support switch
+ * domain definitions.
+ */
+#define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID	(0)
+
+/**
+ * Ethernet device associated switch information
+ */
+struct rte_eth_switch_info {
+	const char *name;	/**< switch name */
+	uint16_t domain_id;	/**< switch domain id */
+	uint16_t port_id;
+	/**<
+	 * mapping to the devices physical switch port as enumerated from the
+	 * perspective of the embedded interconnect/switch. For SR-IOV enabled
+	 * device this may correspond to the VF_ID of each virtual function,
+	 * but each driver should explicitly define the mapping of switch
+	 * port identifier to that physical interconnect/switch
+	 */
+};
+
 /**
  * Ethernet device information
  */
@@ -1073,6 +1095,11 @@ struct rte_eth_dev_info {
 	struct rte_eth_dev_portconf default_txportconf;
 	/** Generic device capabilities (RTE_ETH_DEV_CAPA_). */
 	uint64_t dev_capa;
+	/**
+	 * Switching information for ports on a device with a
+	 * embedded managed interconnect/switch.
+	 */
+	struct rte_eth_switch_info switch_info;
 };
 
 /**
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v8 3/9] ethdev: add generic create/destroy ethdev APIs
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
  2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 1/9] doc: add switch representation documentation Declan Doherty
  2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 2/9] ethdev: add switch identifier parameter to port Declan Doherty
@ 2018-04-26 10:40     ` Declan Doherty
  2018-04-26 12:16       ` Ferruh Yigit
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 4/9] ethdev: Add port representor device flag Declan Doherty
                       ` (6 subsequent siblings)
  9 siblings, 1 reply; 73+ messages in thread
From: Declan Doherty @ 2018-04-26 10:40 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Declan Doherty

Add new bus generic ethdev create/destroy APIs which are bus independent
and provide hooks for bus specific initialisation.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 lib/librte_ether/rte_ethdev.c           | 93 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev_driver.h    | 57 ++++++++++++++++++++
 lib/librte_ether/rte_ethdev_pci.h       | 12 +++++
 lib/librte_ether/rte_ethdev_version.map |  2 +
 4 files changed, 164 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5f1a1bf2b..6f7695ab3 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3391,6 +3391,99 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
 			RTE_MEMZONE_IOVA_CONTIG, align);
 }
 
+int __rte_experimental
+rte_eth_dev_create(struct rte_device *device, const char *name,
+	size_t priv_data_size,
+	ethdev_bus_specific_init ethdev_bus_specific_init,
+	void *bus_init_params,
+	ethdev_init_t ethdev_init, void *init_params)
+{
+	struct rte_eth_dev *ethdev;
+	int retval;
+
+	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_init, -EINVAL);
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		ethdev = rte_eth_dev_allocate(name);
+		if (!ethdev) {
+			retval = -ENODEV;
+			goto probe_failed;
+		}
+
+		if (priv_data_size) {
+			ethdev->data->dev_private = rte_zmalloc_socket(
+				name, priv_data_size, RTE_CACHE_LINE_SIZE,
+				device->numa_node);
+
+			if (!ethdev->data->dev_private) {
+				RTE_LOG(ERR, EAL, "failed to allocate private data");
+				retval = -ENOMEM;
+				goto probe_failed;
+			}
+		}
+	} else {
+		ethdev = rte_eth_dev_attach_secondary(name);
+		if (!ethdev) {
+			RTE_LOG(ERR, EAL, "secondary process attach failed, "
+				"ethdev doesn't exist");
+			retval = -ENODEV;
+			goto probe_failed;
+		}
+	}
+
+	ethdev->device = device;
+
+	if (ethdev_bus_specific_init) {
+		retval = ethdev_bus_specific_init(ethdev, bus_init_params);
+		if (retval) {
+			RTE_LOG(ERR, EAL,
+				"ethdev bus specific initialisation failed");
+			goto probe_failed;
+		}
+	}
+
+	retval = ethdev_init(ethdev, init_params);
+	if (retval) {
+		RTE_LOG(ERR, EAL, "ethdev initialisation failed");
+		goto probe_failed;
+	}
+
+	return retval;
+probe_failed:
+	/* free ports private data if primary process */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_free(ethdev->data->dev_private);
+
+	rte_eth_dev_release_port(ethdev);
+
+	return retval;
+}
+
+int  __rte_experimental
+rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
+	ethdev_uninit_t ethdev_uninit)
+{
+	int ret;
+
+	ethdev = rte_eth_dev_allocated(ethdev->data->name);
+	if (!ethdev)
+		return -ENODEV;
+
+	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_uninit, -EINVAL);
+	if (ethdev_uninit) {
+		ret = ethdev_uninit(ethdev);
+		if (ret)
+			return ret;
+	}
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_free(ethdev->data->dev_private);
+
+	ethdev->data->dev_private = NULL;
+
+	return rte_eth_dev_release_port(ethdev);
+}
+
 int
 rte_eth_dev_rx_intr_ctl_q(uint16_t port_id, uint16_t queue_id,
 			  int epfd, int op, void *data)
diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
index a406ef123..8c61ab2f4 100644
--- a/lib/librte_ether/rte_ethdev_driver.h
+++ b/lib/librte_ether/rte_ethdev_driver.h
@@ -188,6 +188,63 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
 #endif
 }
 
+
+typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void *init_params);
+typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
+	void *bus_specific_init_params);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * PMD helper function for the creation of a new ethdev ports.
+ *
+ * @param device
+ *  rte_device handle.
+ * @param name
+ *  port name.
+ * @param priv_data_size
+ *  size of private data required for port.
+ * @param bus_specific_init
+ *  port bus specific initialisation callback function
+ * @param bus_init_params
+ *  port bus specific initialisation parameters
+ * @param ethdev_init
+ *  device specific port initialization callback function
+ * @param init_params
+ *  port initialisation parameters
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_dev_create(struct rte_device *device, const char *name,
+	size_t priv_data_size,
+	ethdev_bus_specific_init bus_specific_init, void *bus_init_params,
+	ethdev_init_t ethdev_init, void *init_params);
+
+
+typedef int (*ethdev_uninit_t)(struct rte_eth_dev *ethdev);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * PMD helper function for cleaing up the resources of a ethdev port on it's
+ * destruction.
+ *
+ * @param ethdev
+ *   ethdev handle of port.
+ * @param ethdev
+ *   device specific port un-initialise callback function
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
+	ethdev_uninit_t ethdev_uninit);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ether/rte_ethdev_pci.h b/lib/librte_ether/rte_ethdev_pci.h
index 6565ae7d3..603287c28 100644
--- a/lib/librte_ether/rte_ethdev_pci.h
+++ b/lib/librte_ether/rte_ethdev_pci.h
@@ -70,6 +70,18 @@ rte_eth_copy_pci_info(struct rte_eth_dev *eth_dev,
 	eth_dev->data->numa_node = pci_dev->device.numa_node;
 }
 
+static inline int
+eth_dev_pci_specific_init(struct rte_eth_dev *eth_dev, void *bus_device) {
+	struct rte_pci_device *pci_dev = bus_device;
+
+	if (!pci_dev)
+		return -ENODEV;
+
+	rte_eth_copy_pci_info(eth_dev, pci_dev);
+
+	return 0;
+}
+
 /**
  * @internal
  * Allocates a new ethdev slot for an ethernet device and returns the pointer
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 8fe07880f..c4380aa31 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -208,6 +208,8 @@ EXPERIMENTAL {
 
 	rte_eth_dev_count_avail;
 	rte_eth_dev_count_total;
+	rte_eth_dev_create;
+	rte_eth_dev_destroy;
 	rte_eth_dev_is_removed;
 	rte_eth_dev_owner_delete;
 	rte_eth_dev_owner_get;
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v8 4/9] ethdev: Add port representor device flag
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
                       ` (2 preceding siblings ...)
  2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 3/9] ethdev: add generic create/destroy ethdev APIs Declan Doherty
@ 2018-04-26 10:41     ` Declan Doherty
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 5/9] app/testpmd: add port name to device info Declan Doherty
                       ` (5 subsequent siblings)
  9 siblings, 0 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-26 10:41 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Declan Doherty

Add new device flag to specify that an ethdev port is a port representor.
Extend rte_eth_dev_info structure to expose device flags to the user which
enables applications to discover if a port is a representor port.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 lib/librte_ether/rte_ethdev.c | 2 ++
 lib/librte_ether/rte_ethdev.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 6f7695ab3..621f8af7f 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2380,6 +2380,8 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
 	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
+
+	dev_info->dev_flags = &dev->data->dev_flags;
 }
 
 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 06d9b288b..0f28ee50d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1056,6 +1056,7 @@ struct rte_eth_dev_info {
 	const char *driver_name; /**< Device Driver name. */
 	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
 		Use if_indextoname() to translate into an interface name. */
+	const uint32_t *dev_flags; /**< Device flags */
 	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
 	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
 	uint16_t max_rx_queues; /**< Maximum number of RX queues. */
@@ -1302,6 +1303,8 @@ struct rte_eth_dev_owner {
 #define RTE_ETH_DEV_BONDED_SLAVE 0x0004
 /** Device supports device removal interrupt */
 #define RTE_ETH_DEV_INTR_RMV     0x0008
+/** Device is port representor */
+#define RTE_ETH_DEV_REPRESENTOR  0x0010
 
 /**
  * @warning
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v8 5/9] app/testpmd: add port name to device info
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
                       ` (3 preceding siblings ...)
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 4/9] ethdev: Add port representor device flag Declan Doherty
@ 2018-04-26 10:41     ` Declan Doherty
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser Declan Doherty
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-26 10:41 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Declan Doherty

Add the port name to information printed by show port info <port_id>

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 app/test-pmd/config.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 26f416100..57853e58f 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -407,6 +407,7 @@ port_infos_display(portid_t port_id)
 	static const char *info_border = "*********************";
 	portid_t pid;
 	uint16_t mtu;
+	char name[RTE_ETH_NAME_MAX_LEN];
 
 	if (port_id_is_invalid(port_id, ENABLED_WARN)) {
 		printf("Valid port range is [0");
@@ -423,6 +424,8 @@ port_infos_display(portid_t port_id)
 	       info_border, port_id, info_border);
 	rte_eth_macaddr_get(port_id, &mac_addr);
 	print_ethaddr("MAC address: ", &mac_addr);
+	rte_eth_dev_get_name_by_port(port_id, name);
+	printf("\nDevice name: %s", name);
 	printf("\nDriver name: %s", dev_info.driver_name);
 	printf("\nConnect to socket: %u", port->socket_id);
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
                       ` (4 preceding siblings ...)
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 5/9] app/testpmd: add port name to device info Declan Doherty
@ 2018-04-26 10:41     ` Declan Doherty
  2018-04-26 12:03       ` Ananyev, Konstantin
  2018-04-26 12:15       ` Ferruh Yigit
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 7/9] ethdev: add switch domain allocator Declan Doherty
                       ` (3 subsequent siblings)
  9 siblings, 2 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-26 10:41 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Remy Horton, Declan Doherty

From: Remy Horton <remy.horton@intel.com>

Introduces a new structure, rte_eth_devargs, to support generic
ethdev arguments common across NET PMDs, with a new API
rte_eth_devargs_parse API to support PMD parsing these arguments. The
patch add support for a representor argument  passed with passed with
the EAL -w option. The representor parameter allows the user to specify
which representor ports to initialise on a device.

The argument supports passing a single representor port, a list of
port values or a range of port values.

-w BDF,representor=1  # create representor port 1 on pci device BDF
-w BDF,representor=[1,2,5,6,10] # create representor ports in list
-w BDF,representor=[0-31] # create representor ports in range

Signed-off-by: Remy Horton <remy.horton@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 doc/guides/prog_guide/poll_mode_drv.rst |  19 ++++
 lib/Makefile                            |   1 +
 lib/librte_ether/rte_ethdev.c           | 182 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev_driver.h    |  30 ++++++
 lib/librte_ether/rte_ethdev_version.map |   1 +
 5 files changed, 233 insertions(+)

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index e5d01874e..09a93baec 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -345,6 +345,25 @@ Ethernet Device API
 
 The Ethernet device API exported by the Ethernet PMDs is described in the *DPDK API Reference*.
 
+Ethernet Device Standard Device Arguments
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Standard Ethernet device arguments allow for a set of commonly used arguments/
+parameters which are applicable to all Ethernet devices to be available to for
+specification of specific device and for passing common configuration
+parameters to those ports.
+
+* ``representor`` for a device which supports the creation of representor ports
+  this argument allows user to specify which switch ports to enable port
+  representors for.::
+
+   -w BDBF,representor=0
+   -w BDBF,representor=[0,4,6,9]
+   -w BDBF,representor=[0-31]
+
+Note: PMDs are not required to support the standard device arguments and users
+should consult the relevant PMD documentation to see support devargs.
+
 Extended Statistics API
 ~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/Makefile b/lib/Makefile
index 965be6c8d..536775e59 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -21,6 +21,7 @@ DEPDIRS-librte_cmdline := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
 DEPDIRS-librte_ether := librte_net librte_eal librte_mempool librte_ring
 DEPDIRS-librte_ether += librte_mbuf
+DEPDIRS-librte_ether += librte_kvargs
 DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += librte_bbdev
 DEPDIRS-librte_bbdev := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 621f8af7f..cb85d8bb7 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -34,6 +34,7 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
++#include <rte_kvargs.h>
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
@@ -4101,6 +4102,187 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
 	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
 }
 
+typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
+
+static int
+rte_eth_devargs_tokenise(struct rte_kvargs *arglist, const char *str_in)
+{
+	int state;
+	struct rte_kvargs_pair *pair;
+	char *letter;
+
+	arglist->str = strdup(str_in);
+	if (arglist->str == NULL)
+		return -ENOMEM;
+
+	letter = arglist->str;
+	state = 0;
+	arglist->count = 0;
+	pair = &arglist->pairs[0];
+	while (1) {
+		switch (state) {
+		case 0: /* Initial */
+			if (*letter == '=')
+				return -EINVAL;
+			else if (*letter == '\0')
+				return 0;
+
+			state = 1;
+			pair->key = letter;
+			/* fall-thru */
+
+		case 1: /* Parsing key */
+			if (*letter == '=') {
+				*letter = '\0';
+				pair->value = letter + 1;
+				state = 2;
+			} else if (*letter == ',' || *letter == '\0')
+				return -EINVAL;
+			break;
+
+
+		case 2: /* Parsing value */
+			if (*letter == '[')
+				state = 3;
+			else if (*letter == ',') {
+				*letter = '\0';
+				arglist->count++;
+				pair = &arglist->pairs[arglist->count];
+				state = 0;
+			} else if (*letter == '\0') {
+				letter--;
+				arglist->count++;
+				pair = &arglist->pairs[arglist->count];
+				state = 0;
+			}
+			break;
+
+		case 3: /* Parsing list */
+			if (*letter == ']')
+				state = 2;
+			else if (*letter == '\0')
+				return -EINVAL;
+			break;
+		}
+		letter++;
+	}
+}
+
+static int
+rte_eth_devargs_parse_list(char *str, rte_eth_devargs_callback_t callback,
+	void *data)
+{
+	char *str_start;
+	int state;
+	int result;
+
+	if (*str != '[')
+		/* Single element, not a list */
+		return callback(str, data);
+
+	/* Sanity check, then strip the brackets */
+	str_start = &str[strlen(str) - 1];
+	if (*str_start != ']') {
+		RTE_LOG(ERR, EAL, "(%s): List does not end with ']'", str);
+		return -EINVAL;
+	}
+	str++;
+	*str_start = '\0';
+
+	/* Process list elements */
+	state = 0;
+	while (1) {
+		if (state == 0) {
+			if (*str == '\0')
+				break;
+			if (*str != ',') {
+				str_start = str;
+				state = 1;
+			}
+		} else if (state == 1) {
+			if (*str == ',' || *str == '\0') {
+				if (str > str_start) {
+					/* Non-empty string fragment */
+					*str = '\0';
+					result = callback(str_start, data);
+					if (result < 0)
+						return result;
+				}
+				state = 0;
+			}
+		}
+		str++;
+	}
+	return 0;
+}
+
+static int
+rte_eth_devargs_process_range(char *str, uint16_t *list, uint16_t *len_list,
+	const uint16_t max_list)
+{
+	uint16_t lo, hi, val;
+	int result;
+
+	result = sscanf(str, "%hu-%hu", &lo, &hi);
+	if (result == 1) {
+		if (*len_list >= max_list)
+			return -ENOMEM;
+		list[(*len_list)++] = lo;
+	} else if (result == 2) {
+		if (lo >= hi || lo > RTE_MAX_ETHPORTS || hi > RTE_MAX_ETHPORTS)
+			return -EINVAL;
+		for (val = lo; val <= hi; val++) {
+			if (*len_list >= max_list)
+				return -ENOMEM;
+			list[(*len_list)++] = val;
+		}
+	} else
+		return -EINVAL;
+	return 0;
+}
+
+
+static int
+rte_eth_devargs_parse_representor_ports(char *str, void *data)
+{
+	struct rte_eth_devargs *eth_da = data;
+
+	return rte_eth_devargs_process_range(str, eth_da->representor_ports,
+		&eth_da->nb_representor_ports, RTE_MAX_ETHPORTS);
+}
+
+int __rte_experimental
+rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
+{
+	struct rte_kvargs args;
+	struct rte_kvargs_pair *pair;
+	unsigned int i;
+	int result = 0;
+
+	memset(eth_da, 0, sizeof(*eth_da));
+
+	result = rte_eth_devargs_tokenise(&args, dargs);
+	if (result < 0)
+		goto parse_cleanup;
+
+	for (i = 0; i < args.count; i++) {
+		pair = &args.pairs[i];
+		if (strcmp("representor", pair->key) == 0) {
+			result = rte_eth_devargs_parse_list(pair->value,
+				rte_eth_devargs_parse_representor_ports,
+				eth_da);
+			if (result < 0)
+				goto parse_cleanup;
+		}
+	}
+
+parse_cleanup:
+	if (args.str)
+		free(args.str);
+
+	return result;
+}
+
 RTE_INIT(ethdev_init_log);
 static void
 ethdev_init_log(void)
diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
index 8c61ab2f4..492da754a 100644
--- a/lib/librte_ether/rte_ethdev_driver.h
+++ b/lib/librte_ether/rte_ethdev_driver.h
@@ -189,6 +189,36 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
 }
 
 
+/** Generic Ethernet device arguments  */
+struct rte_eth_devargs {
+	uint16_t ports[RTE_MAX_ETHPORTS];
+	/** port/s number to enable on a multi-port single function */
+	uint16_t nb_ports;
+	/** number of ports in ports field */
+	uint16_t representor_ports[RTE_MAX_ETHPORTS];
+	/** representor port/s identifier to enable on device */
+	uint16_t nb_representor_ports;
+	/** number of ports in representor port field */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * PMD helper function to parse ethdev arguments
+ *
+ * @param devargs
+ *  device arguments
+ * @param eth_devargs
+ *  parsed ethdev specific arguments.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_devargs);
+
+
 typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void *init_params);
 typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
 	void *bus_specific_init_params);
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index c4380aa31..41c3d2699 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -206,6 +206,7 @@ DPDK_18.02 {
 EXPERIMENTAL {
 	global:
 
+	rte_eth_devargs_parse;
 	rte_eth_dev_count_avail;
 	rte_eth_dev_count_total;
 	rte_eth_dev_create;
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v8 7/9] ethdev: add switch domain allocator
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
                       ` (5 preceding siblings ...)
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser Declan Doherty
@ 2018-04-26 10:41     ` Declan Doherty
  2018-04-26 12:27       ` Ananyev, Konstantin
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 8/9] net/i40e: add support for representor ports Declan Doherty
                       ` (2 subsequent siblings)
  9 siblings, 1 reply; 73+ messages in thread
From: Declan Doherty @ 2018-04-26 10:41 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Declan Doherty

Add switch domain allocate and free API to enable NET devices to synchronise
switch domain allocation.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
---
 lib/librte_ether/rte_ethdev.c           | 54 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev_driver.h    | 39 ++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev_version.map |  2 ++
 3 files changed, 95 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index cb85d8bb7..a09c7e5b3 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -4102,6 +4102,60 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
 	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
 }
 
+/**
+ * A set of values to describe the possible states of a switch domain.
+ */
+enum rte_eth_switch_domain_state {
+	RTE_ETH_SWITCH_DOMAIN_UNUSED = 0,
+	RTE_ETH_SWITCH_DOMAIN_ALLOCATED
+};
+
+/**
+ * Array of switch domains available for allocation. Array is sized to
+ * RTE_MAX_ETHPORTS elements as there cannot be more active switch domains than
+ * ethdev ports in a single process.
+ */
+struct rte_eth_dev_switch {
+	enum rte_eth_switch_domain_state state;
+} rte_eth_switch_domains[RTE_MAX_ETHPORTS];
+
+int __rte_experimental
+rte_eth_switch_domain_alloc(uint16_t *domain_id)
+{
+	unsigned int i;
+
+	*domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
+
+	for (i = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID + 1;
+		i < RTE_MAX_ETHPORTS; i++) {
+		if (rte_eth_switch_domains[i].state ==
+			RTE_ETH_SWITCH_DOMAIN_UNUSED) {
+			rte_eth_switch_domains[i].state =
+				RTE_ETH_SWITCH_DOMAIN_ALLOCATED;
+			*domain_id = i;
+			return 0;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+int __rte_experimental
+rte_eth_switch_domain_free(uint16_t domain_id)
+{
+	if (domain_id == RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID ||
+		domain_id >= RTE_MAX_ETHPORTS)
+		return -EINVAL;
+
+	if (rte_eth_switch_domains[domain_id].state !=
+		RTE_ETH_SWITCH_DOMAIN_ALLOCATED)
+		return -EINVAL;
+
+	rte_eth_switch_domains[domain_id].state = RTE_ETH_SWITCH_DOMAIN_UNUSED;
+
+	return 0;
+}
+
 typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
 
 static int
diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
index 492da754a..f428afa72 100644
--- a/lib/librte_ether/rte_ethdev_driver.h
+++ b/lib/librte_ether/rte_ethdev_driver.h
@@ -188,6 +188,45 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
 #endif
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate an unique switch domain identifier.
+ *
+ * A pool of switch domain identifiers which can be allocated on request. This
+ * will enabled devices which support the concept of switch domains to request
+ * a switch domain id which is guaranteed to be unique from other devices
+ * running in the same process.
+ *
+ * @param domain_id
+ *  switch domain identifier parameter to pass back to application
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_switch_domain_alloc(uint16_t *domain_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Free switch domain.
+ *
+ * Return a switch domain identifier to the pool of free identifiers after it is
+ * no longer in use by device.
+ *
+ * @param domain_id
+ *  switch domain identifier to free
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int __rte_experimental
+rte_eth_switch_domain_free(uint16_t domain_id);
+
+
 
 /** Generic Ethernet device arguments  */
 struct rte_eth_devargs {
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 41c3d2699..86f06769a 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -220,6 +220,8 @@ EXPERIMENTAL {
 	rte_eth_dev_rx_offload_name;
 	rte_eth_dev_tx_offload_name;
 	rte_eth_find_next_owned_by;
+	rte_eth_switch_domain_alloc;
+	rte_eth_switch_domain_free;
 	rte_mtr_capabilities_get;
 	rte_mtr_create;
 	rte_mtr_destroy;
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v8 8/9] net/i40e: add support for representor ports
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
                       ` (6 preceding siblings ...)
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 7/9] ethdev: add switch domain allocator Declan Doherty
@ 2018-04-26 10:41     ` Declan Doherty
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 9/9] net/ixgbe: " Declan Doherty
  2018-04-26 16:24     ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Ferruh Yigit
  9 siblings, 0 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-26 10:41 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Declan Doherty, Mohammad Abdul Awal,
	Remy Horton

Add support for virtual function representor ports to the i40e PF driver.
When SR-IOV virtual functions devices are enabled a corresponding
representor port for each VF can be enabled, in the process in which the
i40e PMD is running, by specifying the representor devargs with
the list of VF ports that representors are to be created for.

An example of the devargs which would create VF representor for virtual
functions 0,2,4,5,6 and 7 is:

-w DBDF,representor=[0,2,4-7]

and to just specify a single representor on virtual function 3 (switch
port id):

-w DBDF,representor=3

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 doc/guides/nics/i40e.rst               |  15 ++
 drivers/net/i40e/Makefile              |   3 +
 drivers/net/i40e/i40e_ethdev.c         |  82 ++++++-
 drivers/net/i40e/i40e_ethdev.h         |  16 ++
 drivers/net/i40e/i40e_vf_representor.c | 405 +++++++++++++++++++++++++++++++++
 drivers/net/i40e/meson.build           |   4 +-
 drivers/net/i40e/rte_pmd_i40e.c        |  43 ++++
 drivers/net/i40e/rte_pmd_i40e.h        |  18 ++
 lib/librte_ether/rte_ethdev.c          |   2 +-
 9 files changed, 579 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_vf_representor.c

diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
index e1b8083c1..212faf4b4 100644
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -40,6 +40,7 @@ Features of the I40E PMD are:
 - VF Daemon (VFD) - EXPERIMENTAL
 - Dynamic Device Personalization (DDP)
 - Queue region configuration
+- Vitrual Function Port Representors
 
 Prerequisites
 -------------
@@ -121,6 +122,20 @@ Runtime Config Options
   will switch PF interrupt from IntN to Int0 to avoid interrupt conflict between
   DPDK and Linux Kernel.
 
+- ``Support VF Port Representor`` (default ``not enabled``)
+
+  The i40e PF PMD supports the creation of VF port representors for the control
+  and monitoring of i40e virtual function devices. Each port representor
+  corresponds to a single virtual function of that device. Using the ``devargs``
+  option ``representor`` the user can specify which virtual functions to create
+  port representors for on initialization of the PF PMD by passing the VF IDs of
+  the VFs which are required.::
+
+  -w DBDF,representor=[0,1,4]
+
+  Currently hot-plugging of representor ports is not supported so all required
+  representors must be specified on the creation of the PF.
+
 Driver compilation and testing
 ------------------------------
 
diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 7e34b50a7..3f869a8d6 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -11,6 +11,8 @@ LIB = librte_pmd_i40e.a
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -DPF_DRIVER -DVF_DRIVER -DINTEGRATED_VF
 CFLAGS += -DX722_A0_SUPPORT
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash
 LDLIBS += -lrte_bus_pci
@@ -85,6 +87,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += rte_pmd_i40e.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_tm.c
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_vf_representor.c
 
 ifeq ($(findstring RTE_MACHINE_CPUFLAG_AVX2,$(CFLAGS)),RTE_MACHINE_CPUFLAG_AVX2)
 	CC_AVX2_SUPPORT=1
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 2fc98a7e7..0082ca693 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -213,7 +213,7 @@
 /* Bit mask of Extended Tag enable/disable */
 #define PCI_DEV_CTRL_EXT_TAG_MASK  (1 << PCI_DEV_CTRL_EXT_TAG_SHIFT)
 
-static int eth_i40e_dev_init(struct rte_eth_dev *eth_dev);
+static int eth_i40e_dev_init(struct rte_eth_dev *eth_dev, void *init_params);
 static int eth_i40e_dev_uninit(struct rte_eth_dev *eth_dev);
 static int i40e_dev_configure(struct rte_eth_dev *dev);
 static int i40e_dev_start(struct rte_eth_dev *dev);
@@ -607,16 +607,74 @@ static const struct rte_i40e_xstats_name_off rte_i40e_txq_prio_strings[] = {
 #define I40E_NB_TXQ_PRIO_XSTATS (sizeof(rte_i40e_txq_prio_strings) / \
 		sizeof(rte_i40e_txq_prio_strings[0]))
 
-static int eth_i40e_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+
+static int
+eth_i40e_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_probe(pci_dev,
-		sizeof(struct i40e_adapter), eth_i40e_dev_init);
+	char name[RTE_ETH_NAME_MAX_LEN];
+	struct rte_eth_devargs eth_da = { .nb_representor_ports = 0 };
+	int i, retval;
+
+	retval = rte_eth_devargs_parse(pci_dev->device.devargs->args, &eth_da);
+	if (retval)
+		return retval;
+
+	/* physical port net_bdf_port */
+	snprintf(name, sizeof(name), "net_%s", pci_dev->device.name);
+
+	retval = rte_eth_dev_create(&pci_dev->device, name,
+		sizeof(struct i40e_adapter),
+		eth_dev_pci_specific_init, pci_dev,
+		eth_i40e_dev_init, NULL);
+
+	if (retval || eth_da.nb_representor_ports < 1)
+		return retval;
+
+	/* probe VF representor ports */
+	struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated(name);
+
+	if (pf_ethdev == NULL)
+		return -ENODEV;
+
+	for (i = 0; i < eth_da.nb_representor_ports; i++) {
+		struct i40e_vf_representor representor = {
+			.vf_id = eth_da.representor_ports[i],
+			.switch_domain_id = I40E_DEV_PRIVATE_TO_PF(
+				pf_ethdev->data->dev_private)->switch_domain_id,
+			.adapter = I40E_DEV_PRIVATE_TO_ADAPTER(
+				pf_ethdev->data->dev_private)
+		};
+
+		/* representor port net_bdf_port */
+		snprintf(name, sizeof(name), "net_%s_representor_%d",
+			pci_dev->device.name, eth_da.representor_ports[i]);
+
+		retval = rte_eth_dev_create(&pci_dev->device, name,
+			sizeof(struct i40e_vf_representor), NULL, NULL,
+			i40e_vf_representor_init, &representor);
+
+		if (retval)
+			PMD_DRV_LOG(ERR, "failed to create i40e vf "
+				"representor %s.", name);
+	}
+
+	return 0;
 }
 
 static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_remove(pci_dev, eth_i40e_dev_uninit);
+	struct rte_eth_dev *ethdev;
+
+	ethdev = rte_eth_dev_allocated(pci_dev->device.name);
+	if (!ethdev)
+		return -ENODEV;
+
+
+	if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+		return rte_eth_dev_destroy(ethdev, i40e_vf_representor_uninit);
+	else
+		return rte_eth_dev_destroy(ethdev, eth_i40e_dev_uninit);
 }
 
 static struct rte_pci_driver rte_i40e_pmd = {
@@ -1090,7 +1148,7 @@ i40e_support_multi_driver(struct rte_eth_dev *dev)
 }
 
 static int
-eth_i40e_dev_init(struct rte_eth_dev *dev)
+eth_i40e_dev_init(struct rte_eth_dev *dev, void *init_params __rte_unused)
 {
 	struct rte_pci_device *pci_dev;
 	struct rte_intr_handle *intr_handle;
@@ -1517,6 +1575,10 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
 	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
 	intr_handle = &pci_dev->intr_handle;
 
+	ret = rte_eth_switch_domain_free(pf->switch_domain_id);
+	if (ret)
+		PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
+
 	if (hw->adapter_stopped == 0)
 		i40e_dev_close(dev);
 
@@ -2323,7 +2385,7 @@ i40e_dev_reset(struct rte_eth_dev *dev)
 	if (ret)
 		return ret;
 
-	ret = eth_i40e_dev_init(dev);
+	ret = eth_i40e_dev_init(dev, NULL);
 
 	return ret;
 }
@@ -5752,6 +5814,12 @@ i40e_pf_setup(struct i40e_pf *pf)
 		PMD_DRV_LOG(ERR, "Could not get switch config, err %d", ret);
 		return ret;
 	}
+
+	ret = rte_eth_switch_domain_alloc(&pf->switch_domain_id);
+	if (ret)
+		PMD_INIT_LOG(WARNING,
+			"failed to allocate switch domain for device %d", ret);
+
 	if (pf->flags & I40E_FLAG_FDIR) {
 		/* make queue allocated first, let FDIR use queue pair 0*/
 		ret = i40e_res_pool_alloc(&pf->qp_pool, I40E_DEFAULT_QP_NUM_FDIR);
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index d33b255e7..7aaae71c0 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -957,6 +957,8 @@ struct i40e_pf {
 	bool gtp_support; /* 1 - support GTP-C and GTP-U */
 	/* customer customized pctype */
 	struct i40e_customized_pctype customized_pctype[I40E_CUSTOMIZED_MAX];
+	/* Switch Domain Id */
+	uint16_t switch_domain_id;
 };
 
 enum pending_msg {
@@ -1062,6 +1064,18 @@ struct i40e_adapter {
 	uint64_t pctypes_mask;
 };
 
+/**
+ * Strucute to store private data for each VF representor instance
+ */
+struct i40e_vf_representor {
+	uint16_t switch_domain_id;
+	/**< Virtual Function ID */
+	uint16_t vf_id;
+	/**< Virtual Function ID */
+	struct i40e_adapter *adapter;
+	/**< Private data store of assocaiated physical function */
+};
+
 extern const struct rte_flow_ops i40e_flow_ops;
 
 union i40e_filter_t {
@@ -1221,6 +1235,8 @@ int i40e_set_rss_key(struct i40e_vsi *vsi, uint8_t *key, uint8_t key_len);
 int i40e_set_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size);
 int i40e_config_rss_filter(struct i40e_pf *pf,
 		struct i40e_rte_flow_rss_conf *conf, bool add);
+int i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params);
+int i40e_vf_representor_uninit(struct rte_eth_dev *ethdev);
 
 #define I40E_DEV_TO_PCI(eth_dev) \
 	RTE_DEV_TO_PCI((eth_dev)->device)
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
new file mode 100644
index 000000000..e11d9c0c9
--- /dev/null
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -0,0 +1,405 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <rte_bus_pci.h>
+#include <rte_ethdev.h>
+#include <rte_pci.h>
+#include <rte_malloc.h>
+
+#include "base/i40e_type.h"
+#include "base/virtchnl.h"
+#include "i40e_ethdev.h"
+#include "i40e_rxtx.h"
+#include "rte_pmd_i40e.h"
+
+static int
+i40e_vf_representor_link_update(struct rte_eth_dev *ethdev,
+	int wait_to_complete)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return i40e_dev_link_update(representor->adapter->eth_dev,
+		wait_to_complete);
+}
+static void
+i40e_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
+	struct rte_eth_dev_info *dev_info)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	/* get dev info for the vdev */
+	dev_info->device = ethdev->device;
+
+	dev_info->max_rx_queues = ethdev->data->nb_rx_queues;
+	dev_info->max_tx_queues = ethdev->data->nb_tx_queues;
+
+	dev_info->min_rx_bufsize = I40E_BUF_SIZE_MIN;
+	dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
+	dev_info->hash_key_size = (I40E_VFQF_HKEY_MAX_INDEX + 1) *
+		sizeof(uint32_t);
+	dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
+	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
+	dev_info->max_mac_addrs = I40E_NUM_MACADDR_MAX;
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP |
+		DEV_RX_OFFLOAD_IPV4_CKSUM |
+		DEV_RX_OFFLOAD_UDP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_CKSUM;
+	dev_info->tx_offload_capa =
+		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_QINQ_INSERT |
+		DEV_TX_OFFLOAD_IPV4_CKSUM |
+		DEV_TX_OFFLOAD_UDP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_CKSUM |
+		DEV_TX_OFFLOAD_SCTP_CKSUM |
+		DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM |
+		DEV_TX_OFFLOAD_TCP_TSO |
+		DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
+		DEV_TX_OFFLOAD_GRE_TNL_TSO |
+		DEV_TX_OFFLOAD_IPIP_TNL_TSO |
+		DEV_TX_OFFLOAD_GENEVE_TNL_TSO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_thresh = {
+			.pthresh = I40E_DEFAULT_RX_PTHRESH,
+			.hthresh = I40E_DEFAULT_RX_HTHRESH,
+			.wthresh = I40E_DEFAULT_RX_WTHRESH,
+		},
+		.rx_free_thresh = I40E_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_thresh = {
+			.pthresh = I40E_DEFAULT_TX_PTHRESH,
+			.hthresh = I40E_DEFAULT_TX_HTHRESH,
+			.wthresh = I40E_DEFAULT_TX_WTHRESH,
+		},
+		.tx_free_thresh = I40E_DEFAULT_TX_FREE_THRESH,
+		.tx_rs_thresh = I40E_DEFAULT_TX_RSBIT_THRESH,
+		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
+				ETH_TXQ_FLAGS_NOOFFLOADS,
+	};
+
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = I40E_MAX_RING_DESC,
+		.nb_min = I40E_MIN_RING_DESC,
+		.nb_align = I40E_ALIGN_RING_DESC,
+	};
+
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = I40E_MAX_RING_DESC,
+		.nb_min = I40E_MIN_RING_DESC,
+		.nb_align = I40E_ALIGN_RING_DESC,
+	};
+
+	dev_info->switch_info.name =
+		representor->adapter->eth_dev->device->name;
+	dev_info->switch_info.domain_id = representor->switch_domain_id;
+	dev_info->switch_info.port_id = representor->vf_id;
+}
+
+static int
+i40e_vf_representor_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+i40e_vf_representor_dev_start(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static void
+i40e_vf_representor_dev_stop(__rte_unused struct rte_eth_dev *dev)
+{
+}
+
+static int
+i40e_vf_representor_rx_queue_setup(__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_rxconf *rx_conf,
+	__rte_unused struct rte_mempool *mb_pool)
+{
+	return 0;
+}
+
+static int
+i40e_vf_representor_tx_queue_setup(__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_txconf *tx_conf)
+{
+	return 0;
+}
+
+static int
+i40e_vf_representor_stats_get(struct rte_eth_dev *ethdev,
+		struct rte_eth_stats *stats)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return rte_pmd_i40e_get_vf_stats(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, stats);
+}
+
+static void
+i40e_vf_representor_stats_reset(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_reset_vf_stats(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id);
+}
+
+static void
+i40e_vf_representor_promiscuous_enable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_unicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, 1);
+}
+
+static void
+i40e_vf_representor_promiscuous_disable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_unicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, 0);
+}
+
+
+static void
+i40e_vf_representor_allmulticast_enable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_multicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id,  1);
+}
+
+static void
+i40e_vf_representor_allmulticast_disable(struct rte_eth_dev *ethdev)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_multicast_promisc(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id,  0);
+}
+
+static void
+i40e_vf_representor_mac_addr_remove(struct rte_eth_dev *ethdev, uint32_t index)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_remove_vf_mac_addr(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, &ethdev->data->mac_addrs[index]);
+}
+
+static int
+i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
+		struct ether_addr *mac_addr)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return rte_pmd_i40e_set_vf_mac_addr(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, mac_addr);
+}
+
+static int
+i40e_vf_representor_vlan_filter_set(struct rte_eth_dev *ethdev,
+		uint16_t vlan_id, int on)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+	uint64_t vf_mask = 1ULL << representor->vf_id;
+
+	return rte_pmd_i40e_set_vf_vlan_filter(
+		representor->adapter->eth_dev->data->port_id,
+		vlan_id, vf_mask, on);
+}
+
+static int
+i40e_vf_representor_vlan_offload_set(struct rte_eth_dev *ethdev, int mask)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+	struct rte_eth_dev *pdev;
+	struct i40e_pf_vf *vf;
+	struct i40e_vsi *vsi;
+	struct i40e_pf *pf;
+	uint32_t vfid;
+
+	pdev = representor->adapter->eth_dev;
+	vfid = representor->vf_id;
+
+	if (!is_i40e_supported(pdev)) {
+		PMD_DRV_LOG(ERR, "Invalid PF dev.");
+		return -EINVAL;
+	}
+
+	pf = I40E_DEV_PRIVATE_TO_PF(pdev->data->dev_private);
+
+	if (vfid >= pf->vf_num || !pf->vfs) {
+		PMD_DRV_LOG(ERR, "Invalid VF ID.");
+		return -EINVAL;
+	}
+
+	vf = &pf->vfs[vfid];
+	vsi = vf->vsi;
+	if (!vsi) {
+		PMD_DRV_LOG(ERR, "Invalid VSI.");
+		return -EINVAL;
+	}
+
+	if (mask & ETH_VLAN_FILTER_MASK) {
+		/* Enable or disable VLAN filtering offload */
+		if (ethdev->data->dev_conf.rxmode.hw_vlan_filter)
+			return i40e_vsi_config_vlan_filter(vsi, TRUE);
+		else
+			return i40e_vsi_config_vlan_filter(vsi, FALSE);
+	}
+
+	if (mask & ETH_VLAN_STRIP_MASK) {
+		/* Enable or disable VLAN stripping offload */
+		if (ethdev->data->dev_conf.rxmode.hw_vlan_strip)
+			return i40e_vsi_config_vlan_stripping(vsi, TRUE);
+		else
+			return i40e_vsi_config_vlan_stripping(vsi, FALSE);
+	}
+
+	return -EINVAL;
+}
+
+static void
+i40e_vf_representor_vlan_strip_queue_set(struct rte_eth_dev *ethdev,
+	__rte_unused uint16_t rx_queue_id, int on)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_i40e_set_vf_vlan_stripq(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, on);
+}
+
+static int
+i40e_vf_representor_vlan_pvid_set(struct rte_eth_dev *ethdev, uint16_t vlan_id,
+	__rte_unused int on)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	return rte_pmd_i40e_set_vf_vlan_insert(
+		representor->adapter->eth_dev->data->port_id,
+		representor->vf_id, vlan_id);
+}
+
+struct eth_dev_ops i40e_representor_dev_ops = {
+	.dev_infos_get        = i40e_vf_representor_dev_infos_get,
+
+	.dev_start            = i40e_vf_representor_dev_start,
+	.dev_configure        = i40e_vf_representor_dev_configure,
+	.dev_stop             = i40e_vf_representor_dev_stop,
+
+	.rx_queue_setup       = i40e_vf_representor_rx_queue_setup,
+	.tx_queue_setup       = i40e_vf_representor_tx_queue_setup,
+
+	.link_update          = i40e_vf_representor_link_update,
+
+	.stats_get            = i40e_vf_representor_stats_get,
+	.stats_reset          = i40e_vf_representor_stats_reset,
+
+	.promiscuous_enable   = i40e_vf_representor_promiscuous_enable,
+	.promiscuous_disable  = i40e_vf_representor_promiscuous_disable,
+
+	.allmulticast_enable  = i40e_vf_representor_allmulticast_enable,
+	.allmulticast_disable = i40e_vf_representor_allmulticast_disable,
+
+	.mac_addr_remove      = i40e_vf_representor_mac_addr_remove,
+	.mac_addr_set         = i40e_vf_representor_mac_addr_set,
+
+	.vlan_filter_set      = i40e_vf_representor_vlan_filter_set,
+	.vlan_offload_set     = i40e_vf_representor_vlan_offload_set,
+	.vlan_strip_queue_set = i40e_vf_representor_vlan_strip_queue_set,
+	.vlan_pvid_set        = i40e_vf_representor_vlan_pvid_set
+
+};
+
+
+int
+i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
+{
+	struct i40e_vf_representor *representor = ethdev->data->dev_private;
+
+	struct i40e_pf *pf;
+	struct i40e_pf_vf *vf;
+	struct rte_eth_link *link;
+
+	representor->vf_id =
+		((struct i40e_vf_representor *)init_params)->vf_id;
+	representor->switch_domain_id =
+		((struct i40e_vf_representor *)init_params)->switch_domain_id;
+	representor->adapter =
+		((struct i40e_vf_representor *)init_params)->adapter;
+
+	pf = I40E_DEV_PRIVATE_TO_PF(
+		representor->adapter->eth_dev->data->dev_private);
+
+	if (representor->vf_id >= pf->vf_num)
+		return -ENODEV;
+
+	/** representor shares the same driver as it's PF device */
+	ethdev->device->driver = representor->adapter->eth_dev->device->driver;
+
+	/* Set representor device ops */
+	ethdev->dev_ops = &i40e_representor_dev_ops;
+
+	/* No data-path so no RX/TX functions */
+	ethdev->rx_pkt_burst = NULL;
+	ethdev->tx_pkt_burst = NULL;
+
+	vf = &pf->vfs[representor->vf_id];
+
+	if (!vf->vsi) {
+		PMD_DRV_LOG(ERR, "Invalid VSI.");
+		return -ENODEV;
+	}
+
+	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
+
+	/* Setting the number queues allocated to the VF */
+	ethdev->data->nb_rx_queues = vf->vsi->nb_qps;
+	ethdev->data->nb_tx_queues = vf->vsi->nb_qps;
+
+	ethdev->data->mac_addrs = &vf->mac_addr;
+
+	/* Link state. Inherited from PF */
+	link = &representor->adapter->eth_dev->data->dev_link;
+
+	ethdev->data->dev_link.link_speed = link->link_speed;
+	ethdev->data->dev_link.link_duplex = link->link_duplex;
+	ethdev->data->dev_link.link_status = link->link_status;
+	ethdev->data->dev_link.link_autoneg = link->link_autoneg;
+
+	return 0;
+}
+
+
+int
+i40e_vf_representor_uninit(struct rte_eth_dev *ethdev __rte_unused)
+{
+	return 0;
+}
diff --git a/drivers/net/i40e/meson.build b/drivers/net/i40e/meson.build
index 197e611d8..f2129df07 100644
--- a/drivers/net/i40e/meson.build
+++ b/drivers/net/i40e/meson.build
@@ -6,7 +6,8 @@ version = 2
 cflags += ['-DPF_DRIVER',
 	'-DVF_DRIVER',
 	'-DINTEGRATED_VF',
-	'-DX722_A0_SUPPORT']
+	'-DX722_A0_SUPPORT',
+	'-DALLOW_EXPERIMENTAL_API']
 
 subdir('base')
 objs = [base_objs]
@@ -19,6 +20,7 @@ sources = files(
 	'i40e_fdir.c',
 	'i40e_flow.c',
 	'i40e_tm.c',
+	'i40e_vf_representor.c',
 	'rte_pmd_i40e.c'
 	)
 
diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index 9f9a6504d..7aa1a7518 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -570,6 +570,49 @@ rte_pmd_i40e_set_vf_mac_addr(uint16_t port, uint16_t vf_id,
 	return 0;
 }
 
+static const struct ether_addr null_mac_addr;
+
+int
+rte_pmd_i40e_remove_vf_mac_addr(uint16_t port, uint16_t vf_id,
+	struct ether_addr *mac_addr)
+{
+	struct rte_eth_dev *dev;
+	struct i40e_pf_vf *vf;
+	struct i40e_vsi *vsi;
+	struct i40e_pf *pf;
+
+	if (i40e_validate_mac_addr((u8 *)mac_addr) != I40E_SUCCESS)
+		return -EINVAL;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+	dev = &rte_eth_devices[port];
+
+	if (!is_i40e_supported(dev))
+		return -ENOTSUP;
+
+	pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+
+	if (vf_id >= pf->vf_num || !pf->vfs)
+		return -EINVAL;
+
+	vf = &pf->vfs[vf_id];
+	vsi = vf->vsi;
+	if (!vsi) {
+		PMD_DRV_LOG(ERR, "Invalid VSI.");
+		return -EINVAL;
+	}
+
+	if (is_same_ether_addr(mac_addr, &vf->mac_addr))
+		/* Reset the mac with NULL address */
+		ether_addr_copy(&null_mac_addr, &vf->mac_addr);
+
+	/* Remove the mac */
+	i40e_vsi_delete_mac(vsi, mac_addr);
+
+	return 0;
+}
+
 /* Set vlan strip on/off for specific VF from host */
 int
 rte_pmd_i40e_set_vf_vlan_stripq(uint16_t port, uint16_t vf_id, uint8_t on)
diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index d248adb1a..be4a6024a 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -455,6 +455,24 @@ int rte_pmd_i40e_set_vf_multicast_promisc(uint16_t port,
 int rte_pmd_i40e_set_vf_mac_addr(uint16_t port, uint16_t vf_id,
 				 struct ether_addr *mac_addr);
 
+/**
+ * Remove the VF MAC address.
+ *
+ * @param port
+ *   The port identifier of the Ethernet device.
+ * @param vf_id
+ *   VF id.
+ * @param mac_addr
+ *   VF MAC address.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port* invalid.
+ *   - (-EINVAL) if *vf* or *mac_addr* is invalid.
+ */
+int
+rte_pmd_i40e_remove_vf_mac_addr(uint16_t port, uint16_t vf_id,
+	struct ether_addr *mac_addr);
+
 /**
  * Enable/Disable vf vlan strip for all queues in a pool
  *
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a09c7e5b3..6ed0ffa49 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -34,7 +34,7 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
-+#include <rte_kvargs.h>
+#include <rte_kvargs.h>
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [dpdk-dev] [PATCH v8 9/9] net/ixgbe: add support for representor ports
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
                       ` (7 preceding siblings ...)
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 8/9] net/i40e: add support for representor ports Declan Doherty
@ 2018-04-26 10:41     ` Declan Doherty
  2018-04-26 16:24     ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Ferruh Yigit
  9 siblings, 0 replies; 73+ messages in thread
From: Declan Doherty @ 2018-04-26 10:41 UTC (permalink / raw)
  To: dev
  Cc: Adrien Mazarguil, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Declan Doherty, Mohammad Abdul Awal,
	Remy Horton

Add support for virtual function representor ports to the ixgbe PF driver.
When SR-IOV virtual functions devices are enabled a corresponding
representor port for each VF can be enabled in the process in which the
i40e PMD is running within, by specifying the representor devargs with
the list of VF ports that representors are to be created for.

An example of the devargs which would create VF representor for virtual
functions 0,2,4,5,6 and 7 is:

-w DBDF,representor=[0,2,4-7]

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 doc/guides/nics/ixgbe.rst                |  14 ++
 drivers/net/ixgbe/Makefile               |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c         |  80 ++++++++++--
 drivers/net/ixgbe/ixgbe_ethdev.h         |  14 ++
 drivers/net/ixgbe/ixgbe_pf.c             |   7 +
 drivers/net/ixgbe/ixgbe_vf_representor.c | 217 +++++++++++++++++++++++++++++++
 drivers/net/ixgbe/meson.build            |   1 +
 7 files changed, 325 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/ixgbe/ixgbe_vf_representor.c

diff --git a/doc/guides/nics/ixgbe.rst b/doc/guides/nics/ixgbe.rst
index 0c660f298..5512e0b08 100644
--- a/doc/guides/nics/ixgbe.rst
+++ b/doc/guides/nics/ixgbe.rst
@@ -228,6 +228,20 @@ For more details see the IPsec Security Gateway Sample Application and Security
 library documentation.
 
 
+Virtual Function Port Representors
+----------------------------------
+The IXGBE PF PMD supports the creation of VF port representors for the control
+and monitoring of IXGBE virtual function devices. Each port representor
+corresponds to a single virtual function of that device. Using the ``devargs``
+option ``representor`` the user can specify which virtual functions to create
+port representors for on initialization of the PF PMD by passing the VF IDs of
+the VFs which are required.::
+
+  -w DBDF,representor=[0,1,4]
+
+Currently hot-plugging of representor ports is not supported so all required
+representors must be specified on the creation of the PF.
+
 Supported Chipsets and NICs
 ---------------------------
 
diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index f8cad125b..7b6af3532 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -104,6 +104,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_ipsec.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += rte_pmd_ixgbe.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_tm.c
+SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_vf_representor.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)-include := rte_pmd_ixgbe.h
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 73a24b88a..ea2e58b16 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -132,7 +132,7 @@
 #define IXGBE_EXVET_VET_EXT_SHIFT              16
 #define IXGBE_DMATXCTL_VT_MASK                 0xFFFF0000
 
-static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
+static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params);
 static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev);
 static int ixgbe_fdir_filter_init(struct rte_eth_dev *eth_dev);
 static int ixgbe_fdir_filter_uninit(struct rte_eth_dev *eth_dev);
@@ -1043,7 +1043,7 @@ ixgbe_swfw_lock_reset(struct ixgbe_hw *hw)
  * It returns 0 on success.
  */
 static int
-eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
+eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
 {
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
 	struct rte_intr_handle *intr_handle = &pci_dev->intr_handle;
@@ -1716,16 +1716,78 @@ eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
 	return 0;
 }
 
-static int eth_ixgbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-	struct rte_pci_device *pci_dev)
+static int
+eth_ixgbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+		struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_probe(pci_dev,
-		sizeof(struct ixgbe_adapter), eth_ixgbe_dev_init);
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_eth_devargs eth_da;
+	int i, retval;
+
+	retval = rte_eth_devargs_parse(pci_dev->device.devargs->args, &eth_da);
+	if (retval)
+		return retval;
+
+	/* physical port net_bdf_port */
+	snprintf(name, sizeof(name), "net_%s_%d", pci_dev->device.name, 0);
+
+	retval = rte_eth_dev_create(&pci_dev->device, name,
+		sizeof(struct ixgbe_adapter),
+		eth_dev_pci_specific_init, pci_dev,
+		eth_ixgbe_dev_init, NULL);
+
+	if (retval || eth_da.nb_representor_ports < 1)
+		return retval;
+
+	/* probe VF representor ports */
+	struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated(name);
+
+	for (i = 0; i < eth_da.nb_representor_ports; i++) {
+		struct ixgbe_vf_info *vfinfo;
+		struct ixgbe_vf_representor representor;
+
+		vfinfo = *IXGBE_DEV_PRIVATE_TO_P_VFDATA(
+			pf_ethdev->data->dev_private);
+		if (vfinfo == NULL) {
+			PMD_DRV_LOG(ERR,
+				"no virtual functions supported by PF");
+			break;
+		}
+
+		representor.vf_id = eth_da.representor_ports[i];
+		representor.switch_domain_id = vfinfo->switch_domain_id;
+		representor.pf_ethdev = pf_ethdev;
+
+		/* representor port net_bdf_port */
+		snprintf(name, sizeof(name), "net_%s_representor_%d",
+			pci_dev->device.name,
+			eth_da.representor_ports[i]);
+
+		retval = rte_eth_dev_create(&pci_dev->device, name,
+			sizeof(struct ixgbe_vf_representor), NULL, NULL,
+			ixgbe_vf_representor_init, &representor);
+
+		if (retval)
+			PMD_DRV_LOG(ERR, "failed to create ixgbe vf "
+				"representor %s.", name);
+	}
+
+	return 0;
 }
 
 static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
 {
-	return rte_eth_dev_pci_generic_remove(pci_dev, eth_ixgbe_dev_uninit);
+	struct rte_eth_dev *ethdev;
+
+	ethdev = rte_eth_dev_allocated(pci_dev->device.name);
+	if (!ethdev)
+		return -ENODEV;
+
+	if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+		return rte_eth_dev_destroy(ethdev, ixgbe_vf_representor_uninit);
+	else
+		return rte_eth_dev_destroy(ethdev, eth_ixgbe_dev_uninit);
 }
 
 static struct rte_pci_driver rte_ixgbe_pmd = {
@@ -2868,7 +2930,7 @@ ixgbe_dev_reset(struct rte_eth_dev *dev)
 	if (ret)
 		return ret;
 
-	ret = eth_ixgbe_dev_init(dev);
+	ret = eth_ixgbe_dev_init(dev, NULL);
 
 	return ret;
 }
@@ -3883,7 +3945,7 @@ ixgbevf_check_link(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
 }
 
 /* return 0 means link status changed, -1 means not changed */
-static int
+int
 ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
 			    int wait_to_complete, int vf)
 {
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 655077700..1947442d9 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -253,6 +253,7 @@ struct ixgbe_vf_info {
 	uint16_t vlan_count;
 	uint8_t spoofchk_enabled;
 	uint8_t api_version;
+	uint16_t switch_domain_id;
 };
 
 /*
@@ -480,6 +481,15 @@ struct ixgbe_adapter {
  	struct ixgbe_tm_conf        tm_conf;
 };
 
+struct ixgbe_vf_representor {
+	uint16_t vf_id;
+	uint16_t switch_domain_id;
+	struct rte_eth_dev *pf_ethdev;
+};
+
+int ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params);
+int ixgbe_vf_representor_uninit(struct rte_eth_dev *ethdev);
+
 #define IXGBE_DEV_PRIVATE_TO_HW(adapter)\
 	(&((struct ixgbe_adapter *)adapter)->hw)
 
@@ -652,6 +662,10 @@ int ixgbe_fdir_filter_program(struct rte_eth_dev *dev,
 
 void ixgbe_configure_dcb(struct rte_eth_dev *dev);
 
+int
+ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
+			    int wait_to_complete, int vf);
+
 /*
  * misc function prototypes
  */
diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index 4e61310af..4d199c802 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -90,6 +90,8 @@ void ixgbe_pf_host_init(struct rte_eth_dev *eth_dev)
 	if (*vfinfo == NULL)
 		rte_panic("Cannot allocate memory for private VF data\n");
 
+	rte_eth_switch_domain_alloc(&(*vfinfo)->switch_domain_id);
+
 	memset(mirror_info, 0, sizeof(struct ixgbe_mirror_info));
 	memset(uta_info, 0, sizeof(struct ixgbe_uta_info));
 	hw->mac.mc_filter_type = 0;
@@ -122,6 +124,7 @@ void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
 {
 	struct ixgbe_vf_info **vfinfo;
 	uint16_t vf_num;
+	int ret;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -132,6 +135,10 @@ void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
 	RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx = 0;
 	RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx = 0;
 
+	ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
+	if (ret)
+		PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
+
 	vf_num = dev_num_vf(eth_dev);
 	if (vf_num == 0)
 		return;
diff --git a/drivers/net/ixgbe/ixgbe_vf_representor.c b/drivers/net/ixgbe/ixgbe_vf_representor.c
new file mode 100644
index 000000000..e9edf0f67
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_vf_representor.c
@@ -0,0 +1,217 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <rte_ethdev.h>
+#include <rte_pci.h>
+#include <rte_malloc.h>
+
+#include "base/ixgbe_type.h"
+#include "base/ixgbe_vf.h"
+#include "ixgbe_ethdev.h"
+#include "ixgbe_rxtx.h"
+#include "rte_pmd_ixgbe.h"
+
+
+static int
+ixgbe_vf_representor_link_update(struct rte_eth_dev *ethdev,
+	int wait_to_complete)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	return ixgbe_dev_link_update_share(representor->pf_ethdev,
+		wait_to_complete, 0);
+}
+
+static int
+ixgbe_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
+	struct ether_addr *mac_addr)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	return rte_pmd_ixgbe_set_vf_mac_addr(
+		representor->pf_ethdev->data->port_id,
+		representor->vf_id, mac_addr);
+}
+
+static void
+ixgbe_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
+	struct rte_eth_dev_info *dev_info)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(
+		representor->pf_ethdev->data->dev_private);
+
+	dev_info->device = representor->pf_ethdev->device;
+
+	dev_info->min_rx_bufsize = 1024;
+	/**< Minimum size of RX buffer. */
+	dev_info->max_rx_pktlen = 9728;
+	/**< Maximum configurable length of RX pkt. */
+	dev_info->max_rx_queues = IXGBE_VF_MAX_RX_QUEUES;
+	/**< Maximum number of RX queues. */
+	dev_info->max_tx_queues = IXGBE_VF_MAX_TX_QUEUES;
+	/**< Maximum number of TX queues. */
+
+	dev_info->max_mac_addrs = hw->mac.num_rar_entries;
+	/**< Maximum number of MAC addresses. */
+
+	dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_IPV4_CKSUM |	DEV_RX_OFFLOAD_UDP_CKSUM  |
+		DEV_RX_OFFLOAD_TCP_CKSUM;
+	/**< Device RX offload capabilities. */
+
+	dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_CKSUM | DEV_TX_OFFLOAD_SCTP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_TSO;
+	/**< Device TX offload capabilities. */
+
+	dev_info->speed_capa =
+		representor->pf_ethdev->data->dev_link.link_speed;
+	/**< Supported speeds bitmap (ETH_LINK_SPEED_). */
+
+	dev_info->switch_info.name =
+		representor->pf_ethdev->device->name;
+	dev_info->switch_info.domain_id = representor->switch_domain_id;
+	dev_info->switch_info.port_id = representor->vf_id;
+}
+
+static int ixgbe_vf_representor_dev_configure(
+		__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int ixgbe_vf_representor_rx_queue_setup(
+	__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_rxconf *rx_conf,
+	__rte_unused struct rte_mempool *mb_pool)
+{
+	return 0;
+}
+
+static int ixgbe_vf_representor_tx_queue_setup(
+	__rte_unused struct rte_eth_dev *dev,
+	__rte_unused uint16_t rx_queue_id,
+	__rte_unused uint16_t nb_rx_desc,
+	__rte_unused unsigned int socket_id,
+	__rte_unused const struct rte_eth_txconf *tx_conf)
+{
+	return 0;
+}
+
+static int ixgbe_vf_representor_dev_start(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static void ixgbe_vf_representor_dev_stop(__rte_unused struct rte_eth_dev *dev)
+{
+}
+
+static int
+ixgbe_vf_representor_vlan_filter_set(struct rte_eth_dev *ethdev,
+	uint16_t vlan_id, int on)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+	uint64_t vf_mask = 1ULL << representor->vf_id;
+
+	return rte_pmd_ixgbe_set_vf_vlan_filter(
+		representor->pf_ethdev->data->port_id, vlan_id, vf_mask, on);
+}
+
+static void
+ixgbe_vf_representor_vlan_strip_queue_set(struct rte_eth_dev *ethdev,
+	__rte_unused uint16_t rx_queue_id, int on)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	rte_pmd_ixgbe_set_vf_vlan_stripq(representor->pf_ethdev->data->port_id,
+		representor->vf_id, on);
+}
+
+struct eth_dev_ops ixgbe_vf_representor_dev_ops = {
+	.dev_infos_get		= ixgbe_vf_representor_dev_infos_get,
+
+	.dev_start		= ixgbe_vf_representor_dev_start,
+	.dev_configure		= ixgbe_vf_representor_dev_configure,
+	.dev_stop		= ixgbe_vf_representor_dev_stop,
+
+	.rx_queue_setup		= ixgbe_vf_representor_rx_queue_setup,
+	.tx_queue_setup		= ixgbe_vf_representor_tx_queue_setup,
+
+	.link_update		= ixgbe_vf_representor_link_update,
+
+	.vlan_filter_set	= ixgbe_vf_representor_vlan_filter_set,
+	.vlan_strip_queue_set	= ixgbe_vf_representor_vlan_strip_queue_set,
+
+	.mac_addr_set		= ixgbe_vf_representor_mac_addr_set,
+};
+
+
+int
+ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
+{
+	struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
+
+	struct ixgbe_vf_info *vf_data;
+	struct rte_pci_device *pci_dev;
+	struct rte_eth_link *link;
+
+	if (!representor)
+		return -ENOMEM;
+
+	representor->vf_id =
+		((struct ixgbe_vf_representor *)init_params)->vf_id;
+	representor->switch_domain_id =
+		((struct ixgbe_vf_representor *)init_params)->switch_domain_id;
+	representor->pf_ethdev =
+		((struct ixgbe_vf_representor *)init_params)->pf_ethdev;
+
+	pci_dev = RTE_ETH_DEV_TO_PCI(representor->pf_ethdev);
+
+	if (representor->vf_id >= pci_dev->max_vfs)
+		return -ENODEV;
+
+	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
+
+	/* Set representor device ops */
+	ethdev->dev_ops = &ixgbe_vf_representor_dev_ops;
+
+	/* No data-path so no RX/TX functions */
+	ethdev->rx_pkt_burst = NULL;
+	ethdev->tx_pkt_burst = NULL;
+
+	/* Setting the number queues allocated to the VF */
+	ethdev->data->nb_rx_queues = IXGBE_VF_MAX_RX_QUEUES;
+	ethdev->data->nb_tx_queues = IXGBE_VF_MAX_RX_QUEUES;
+
+	/* Reference VF mac address from PF data structure */
+	vf_data = *IXGBE_DEV_PRIVATE_TO_P_VFDATA(
+		representor->pf_ethdev->data->dev_private);
+
+	ethdev->data->mac_addrs = (struct ether_addr *)
+		vf_data[representor->vf_id].vf_mac_addresses;
+
+	/* Link state. Inherited from PF */
+	link = &representor->pf_ethdev->data->dev_link;
+
+	ethdev->data->dev_link.link_speed = link->link_speed;
+	ethdev->data->dev_link.link_duplex = link->link_duplex;
+	ethdev->data->dev_link.link_status = link->link_status;
+	ethdev->data->dev_link.link_autoneg = link->link_autoneg;
+
+	return 0;
+}
+
+
+int
+ixgbe_vf_representor_uninit(struct rte_eth_dev *ethdev __rte_unused)
+{
+	return 0;
+}
diff --git a/drivers/net/ixgbe/meson.build b/drivers/net/ixgbe/meson.build
index f649e659d..02d5ef5e4 100644
--- a/drivers/net/ixgbe/meson.build
+++ b/drivers/net/ixgbe/meson.build
@@ -19,6 +19,7 @@ sources = files(
 	'ixgbe_pf.c',
 	'ixgbe_rxtx.c',
 	'ixgbe_tm.c',
+	'ixgbe_vf_representor.c',
 	'rte_pmd_ixgbe.c'
 )
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 2/9] ethdev: add switch identifier parameter to port
  2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 2/9] ethdev: add switch identifier parameter to port Declan Doherty
@ 2018-04-26 12:02       ` Thomas Monjalon
  2018-04-26 14:26         ` Thomas Monjalon
  2018-04-27 16:29       ` Ferruh Yigit
  1 sibling, 1 reply; 73+ messages in thread
From: Thomas Monjalon @ 2018-04-26 12:02 UTC (permalink / raw)
  To: Declan Doherty
  Cc: dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler, Konstantin Ananyev

26/04/2018 12:40, Declan Doherty:
> Introduces a new port attribute to ethdev port's which denotes the
> switch domain a port belongs to. By default all port's switch
> identifiers are set to RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID. Ports
> which supported the concept of switch domains can be configured with
> the same switch domain id.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>

It's very well detailed now :)

Acked-by: Thomas Monjalon <thomas@monjalon.net>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser Declan Doherty
@ 2018-04-26 12:03       ` Ananyev, Konstantin
  2018-04-26 14:21         ` Ferruh Yigit
                           ` (2 more replies)
  2018-04-26 12:15       ` Ferruh Yigit
  1 sibling, 3 replies; 73+ messages in thread
From: Ananyev, Konstantin @ 2018-04-26 12:03 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Adrien Mazarguil, Yigit, Ferruh, Thomas Monjalon, Shahaf Shuler,
	Horton, Remy



> -----Original Message-----
> From: Doherty, Declan
> Sent: Thursday, April 26, 2018 11:41 AM
> To: dev@dpdk.org
> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> Shahaf Shuler <shahafs@mellanox.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Horton, Remy <remy.horton@intel.com>;
> Doherty, Declan <declan.doherty@intel.com>
> Subject: [dpdk-dev][PATCH v8 6/9] ethdev: add common devargs parser
> 
> From: Remy Horton <remy.horton@intel.com>
> 
> Introduces a new structure, rte_eth_devargs, to support generic
> ethdev arguments common across NET PMDs, with a new API
> rte_eth_devargs_parse API to support PMD parsing these arguments. The
> patch add support for a representor argument  passed with passed with
> the EAL -w option. The representor parameter allows the user to specify
> which representor ports to initialise on a device.
> 
> The argument supports passing a single representor port, a list of
> port values or a range of port values.
> 
> -w BDF,representor=1  # create representor port 1 on pci device BDF
> -w BDF,representor=[1,2,5,6,10] # create representor ports in list
> -w BDF,representor=[0-31] # create representor ports in range
> 
> Signed-off-by: Remy Horton <remy.horton@intel.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
>  doc/guides/prog_guide/poll_mode_drv.rst |  19 ++++
>  lib/Makefile                            |   1 +
>  lib/librte_ether/rte_ethdev.c           | 182 ++++++++++++++++++++++++++++++++
>  lib/librte_ether/rte_ethdev_driver.h    |  30 ++++++
>  lib/librte_ether/rte_ethdev_version.map |   1 +
>  5 files changed, 233 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> index e5d01874e..09a93baec 100644
> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -345,6 +345,25 @@ Ethernet Device API
> 
>  The Ethernet device API exported by the Ethernet PMDs is described in the *DPDK API Reference*.
> 
> +Ethernet Device Standard Device Arguments
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Standard Ethernet device arguments allow for a set of commonly used arguments/
> +parameters which are applicable to all Ethernet devices to be available to for
> +specification of specific device and for passing common configuration
> +parameters to those ports.
> +
> +* ``representor`` for a device which supports the creation of representor ports
> +  this argument allows user to specify which switch ports to enable port
> +  representors for.::
> +
> +   -w BDBF,representor=0
> +   -w BDBF,representor=[0,4,6,9]
> +   -w BDBF,representor=[0-31]
> +
> +Note: PMDs are not required to support the standard device arguments and users
> +should consult the relevant PMD documentation to see support devargs.
> +
>  Extended Statistics API
>  ~~~~~~~~~~~~~~~~~~~~~~~
> 
> diff --git a/lib/Makefile b/lib/Makefile
> index 965be6c8d..536775e59 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -21,6 +21,7 @@ DEPDIRS-librte_cmdline := librte_eal
>  DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
>  DEPDIRS-librte_ether := librte_net librte_eal librte_mempool librte_ring
>  DEPDIRS-librte_ether += librte_mbuf
> +DEPDIRS-librte_ether += librte_kvargs
>  DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += librte_bbdev
>  DEPDIRS-librte_bbdev := librte_eal librte_mempool librte_mbuf
>  DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 621f8af7f..cb85d8bb7 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -34,6 +34,7 @@
>  #include <rte_errno.h>
>  #include <rte_spinlock.h>
>  #include <rte_string_fns.h>
> ++#include <rte_kvargs.h>
> 
>  #include "rte_ether.h"
>  #include "rte_ethdev.h"
> @@ -4101,6 +4102,187 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>  	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
>  }
> 
> +typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
> +
> +static int
> +rte_eth_devargs_tokenise(struct rte_kvargs *arglist, const char *str_in)
> +{

I still think that if you'd like to extend rte_kvarrgs to be able to parse something like: "key=[val1,val2,...,valn]",
you have to make it generic kvargs ability and put it into librte_kvargs, not try to introduce your own new parser here.
Imagine that in addition to your 'port=[val1,val2, ..valn]' devargs string would contain some extra (let say device specific)
parameters.
What would happen, when PMD will try to use rte_kvargs_parse() on such string?
My understanding - it would fail, correct?

As an alternative - as I remember rte_kvargs allows you to have multiple identical key, i.e: "key=val1,key=val2,...,key=valn".
Why not to use that way, if you don't want to introduce extra code in rte_kvargs?

> +	int state;
> +	struct rte_kvargs_pair *pair;
> +	char *letter;
> +
> +	arglist->str = strdup(str_in);
> +	if (arglist->str == NULL)
> +		return -ENOMEM;
> +
> +	letter = arglist->str;
> +	state = 0;
> +	arglist->count = 0;
> +	pair = &arglist->pairs[0];
> +	while (1) {
> +		switch (state) {
> +		case 0: /* Initial */
> +			if (*letter == '=')
> +				return -EINVAL;
> +			else if (*letter == '\0')
> +				return 0;
> +
> +			state = 1;
> +			pair->key = letter;
> +			/* fall-thru */
> +
> +		case 1: /* Parsing key */
> +			if (*letter == '=') {
> +				*letter = '\0';
> +				pair->value = letter + 1;
> +				state = 2;
> +			} else if (*letter == ',' || *letter == '\0')
> +				return -EINVAL;
> +			break;
> +
> +
> +		case 2: /* Parsing value */
> +			if (*letter == '[')
> +				state = 3;
> +			else if (*letter == ',') {
> +				*letter = '\0';
> +				arglist->count++;
> +				pair = &arglist->pairs[arglist->count];
> +				state = 0;
> +			} else if (*letter == '\0') {
> +				letter--;
> +				arglist->count++;
> +				pair = &arglist->pairs[arglist->count];
> +				state = 0;
> +			}
> +			break;
> +
> +		case 3: /* Parsing list */
> +			if (*letter == ']')
> +				state = 2;
> +			else if (*letter == '\0')
> +				return -EINVAL;
> +			break;
> +		}
> +		letter++;
> +	}
> +}
> +
> +static int
> +rte_eth_devargs_parse_list(char *str, rte_eth_devargs_callback_t callback,
> +	void *data)
> +{
> +	char *str_start;
> +	int state;
> +	int result;
> +
> +	if (*str != '[')
> +		/* Single element, not a list */
> +		return callback(str, data);
> +
> +	/* Sanity check, then strip the brackets */
> +	str_start = &str[strlen(str) - 1];
> +	if (*str_start != ']') {
> +		RTE_LOG(ERR, EAL, "(%s): List does not end with ']'", str);
> +		return -EINVAL;
> +	}
> +	str++;
> +	*str_start = '\0';
> +
> +	/* Process list elements */
> +	state = 0;
> +	while (1) {
> +		if (state == 0) {
> +			if (*str == '\0')
> +				break;
> +			if (*str != ',') {
> +				str_start = str;
> +				state = 1;
> +			}
> +		} else if (state == 1) {
> +			if (*str == ',' || *str == '\0') {
> +				if (str > str_start) {
> +					/* Non-empty string fragment */
> +					*str = '\0';
> +					result = callback(str_start, data);
> +					if (result < 0)
> +						return result;
> +				}
> +				state = 0;
> +			}
> +		}
> +		str++;
> +	}
> +	return 0;
> +}
> +
> +static int
> +rte_eth_devargs_process_range(char *str, uint16_t *list, uint16_t *len_list,
> +	const uint16_t max_list)
> +{
> +	uint16_t lo, hi, val;
> +	int result;
> +
> +	result = sscanf(str, "%hu-%hu", &lo, &hi);
> +	if (result == 1) {
> +		if (*len_list >= max_list)
> +			return -ENOMEM;
> +		list[(*len_list)++] = lo;
> +	} else if (result == 2) {
> +		if (lo >= hi || lo > RTE_MAX_ETHPORTS || hi > RTE_MAX_ETHPORTS)

lo > RTE_MAX_ETHPORTS is redundant here.

> +			return -EINVAL;
> +		for (val = lo; val <= hi; val++) {
> +			if (*len_list >= max_list)
> +				return -ENOMEM;
> +			list[(*len_list)++] = val;
> +		}
> +	} else
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +
> +static int
> +rte_eth_devargs_parse_representor_ports(char *str, void *data)
> +{
> +	struct rte_eth_devargs *eth_da = data;
> +
> +	return rte_eth_devargs_process_range(str, eth_da->representor_ports,
> +		&eth_da->nb_representor_ports, RTE_MAX_ETHPORTS);
> +}
> +
> +int __rte_experimental
> +rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
> +{
> +	struct rte_kvargs args;
> +	struct rte_kvargs_pair *pair;
> +	unsigned int i;
> +	int result = 0;
> +
> +	memset(eth_da, 0, sizeof(*eth_da));
> +
> +	result = rte_eth_devargs_tokenise(&args, dargs);
> +	if (result < 0)
> +		goto parse_cleanup;
> +
> +	for (i = 0; i < args.count; i++) {
> +		pair = &args.pairs[i];
> +		if (strcmp("representor", pair->key) == 0) {
> +			result = rte_eth_devargs_parse_list(pair->value,
> +				rte_eth_devargs_parse_representor_ports,
> +				eth_da);
> +			if (result < 0)
> +				goto parse_cleanup;
> +		}
> +	}
> +
> +parse_cleanup:
> +	if (args.str)
> +		free(args.str);
> +
> +	return result;
> +}
> +
>  RTE_INIT(ethdev_init_log);
>  static void
>  ethdev_init_log(void)
> diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
> index 8c61ab2f4..492da754a 100644
> --- a/lib/librte_ether/rte_ethdev_driver.h
> +++ b/lib/librte_ether/rte_ethdev_driver.h
> @@ -189,6 +189,36 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
>  }
> 
> 
> +/** Generic Ethernet device arguments  */
> +struct rte_eth_devargs {
> +	uint16_t ports[RTE_MAX_ETHPORTS];
> +	/** port/s number to enable on a multi-port single function */
> +	uint16_t nb_ports;
> +	/** number of ports in ports field */
> +	uint16_t representor_ports[RTE_MAX_ETHPORTS];
> +	/** representor port/s identifier to enable on device */
> +	uint16_t nb_representor_ports;
> +	/** number of ports in representor port field */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * PMD helper function to parse ethdev arguments
> + *
> + * @param devargs
> + *  device arguments
> + * @param eth_devargs
> + *  parsed ethdev specific arguments.
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_devargs);
> +
> +
>  typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void *init_params);
>  typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
>  	void *bus_specific_init_params);
> diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
> index c4380aa31..41c3d2699 100644
> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -206,6 +206,7 @@ DPDK_18.02 {
>  EXPERIMENTAL {
>  	global:
> 
> +	rte_eth_devargs_parse;
>  	rte_eth_dev_count_avail;
>  	rte_eth_dev_count_total;
>  	rte_eth_dev_create;
> --
> 2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser Declan Doherty
  2018-04-26 12:03       ` Ananyev, Konstantin
@ 2018-04-26 12:15       ` Ferruh Yigit
  1 sibling, 0 replies; 73+ messages in thread
From: Ferruh Yigit @ 2018-04-26 12:15 UTC (permalink / raw)
  To: Declan Doherty, dev
  Cc: Adrien Mazarguil, Thomas Monjalon, Shahaf Shuler,
	Konstantin Ananyev, Remy Horton

On 4/26/2018 11:41 AM, Declan Doherty wrote:
> From: Remy Horton <remy.horton@intel.com>
> 
> Introduces a new structure, rte_eth_devargs, to support generic
> ethdev arguments common across NET PMDs, with a new API
> rte_eth_devargs_parse API to support PMD parsing these arguments. The
> patch add support for a representor argument  passed with passed with
> the EAL -w option. The representor parameter allows the user to specify
> which representor ports to initialise on a device.
> 
> The argument supports passing a single representor port, a list of
> port values or a range of port values.
> 
> -w BDF,representor=1  # create representor port 1 on pci device BDF
> -w BDF,representor=[1,2,5,6,10] # create representor ports in list
> -w BDF,representor=[0-31] # create representor ports in range
> 
> Signed-off-by: Remy Horton <remy.horton@intel.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>

<...>

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * PMD helper function to parse ethdev arguments
> + *
> + * @param devargs
> + *  device arguments
> + * @param eth_devargs
> + *  parsed ethdev specific arguments.
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_devargs);

API doc build giving warning because of doxygen comments, devargs vs dargs, I
will fix while applying as "devargs"

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/9] ethdev: add generic create/destroy ethdev APIs
  2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 3/9] ethdev: add generic create/destroy ethdev APIs Declan Doherty
@ 2018-04-26 12:16       ` Ferruh Yigit
  0 siblings, 0 replies; 73+ messages in thread
From: Ferruh Yigit @ 2018-04-26 12:16 UTC (permalink / raw)
  To: Declan Doherty, dev
  Cc: Adrien Mazarguil, Thomas Monjalon, Shahaf Shuler, Konstantin Ananyev

On 4/26/2018 11:40 AM, Declan Doherty wrote:
> Add new bus generic ethdev create/destroy APIs which are bus independent
> and provide hooks for bus specific initialisation.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>

<...>

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * PMD helper function for cleaing up the resources of a ethdev port on it's
> + * destruction.
> + *
> + * @param ethdev
> + *   ethdev handle of port.
> + * @param ethdev
> + *   device specific port un-initialise callback function
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
> +	ethdev_uninit_t ethdev_uninit);

Will fix doxygen warning while applying, ethdev vs ethdev_uninit

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 7/9] ethdev: add switch domain allocator
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 7/9] ethdev: add switch domain allocator Declan Doherty
@ 2018-04-26 12:27       ` Ananyev, Konstantin
  0 siblings, 0 replies; 73+ messages in thread
From: Ananyev, Konstantin @ 2018-04-26 12:27 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Adrien Mazarguil, Yigit, Ferruh, Thomas Monjalon, Shahaf Shuler



> -----Original Message-----
> From: Doherty, Declan
> Sent: Thursday, April 26, 2018 11:41 AM
> To: dev@dpdk.org
> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> Shahaf Shuler <shahafs@mellanox.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Doherty, Declan
> <declan.doherty@intel.com>
> Subject: [dpdk-dev][PATCH v8 7/9] ethdev: add switch domain allocator
> 
> Add switch domain allocate and free API to enable NET devices to synchronise
> switch domain allocation.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.c           | 54 +++++++++++++++++++++++++++++++++
>  lib/librte_ether/rte_ethdev_driver.h    | 39 ++++++++++++++++++++++++
>  lib/librte_ether/rte_ethdev_version.map |  2 ++
>  3 files changed, 95 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index cb85d8bb7..a09c7e5b3 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -4102,6 +4102,60 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>  	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
>  }
> 
> +/**
> + * A set of values to describe the possible states of a switch domain.
> + */
> +enum rte_eth_switch_domain_state {
> +	RTE_ETH_SWITCH_DOMAIN_UNUSED = 0,
> +	RTE_ETH_SWITCH_DOMAIN_ALLOCATED
> +};
> +
> +/**
> + * Array of switch domains available for allocation. Array is sized to
> + * RTE_MAX_ETHPORTS elements as there cannot be more active switch domains than
> + * ethdev ports in a single process.
> + */

Question from previous version review:
Probably already discussed before, but  if we can't have more than one switch_id per port,
while we can't use port_id as switch_id?
Or switch_id can represent some other entity (not rte_ethdev)?
Konstantin

> +struct rte_eth_dev_switch {
> +	enum rte_eth_switch_domain_state state;
> +} rte_eth_switch_domains[RTE_MAX_ETHPORTS];
> +
> +int __rte_experimental
> +rte_eth_switch_domain_alloc(uint16_t *domain_id)
> +{
> +	unsigned int i;
> +
> +	*domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
> +
> +	for (i = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID + 1;
> +		i < RTE_MAX_ETHPORTS; i++) {
> +		if (rte_eth_switch_domains[i].state ==
> +			RTE_ETH_SWITCH_DOMAIN_UNUSED) {
> +			rte_eth_switch_domains[i].state =
> +				RTE_ETH_SWITCH_DOMAIN_ALLOCATED;
> +			*domain_id = i;
> +			return 0;
> +		}
> +	}
> +
> +	return -ENOSPC;
> +}
> +
> +int __rte_experimental
> +rte_eth_switch_domain_free(uint16_t domain_id)
> +{
> +	if (domain_id == RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID ||
> +		domain_id >= RTE_MAX_ETHPORTS)
> +		return -EINVAL;
> +
> +	if (rte_eth_switch_domains[domain_id].state !=
> +		RTE_ETH_SWITCH_DOMAIN_ALLOCATED)
> +		return -EINVAL;
> +
> +	rte_eth_switch_domains[domain_id].state = RTE_ETH_SWITCH_DOMAIN_UNUSED;
> +
> +	return 0;
> +}
> +
>  typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
> 
>  static int
> diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
> index 492da754a..f428afa72 100644
> --- a/lib/librte_ether/rte_ethdev_driver.h
> +++ b/lib/librte_ether/rte_ethdev_driver.h
> @@ -188,6 +188,45 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
>  #endif
>  }
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate an unique switch domain identifier.
> + *
> + * A pool of switch domain identifiers which can be allocated on request. This
> + * will enabled devices which support the concept of switch domains to request
> + * a switch domain id which is guaranteed to be unique from other devices
> + * running in the same process.
> + *
> + * @param domain_id
> + *  switch domain identifier parameter to pass back to application
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_switch_domain_alloc(uint16_t *domain_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Free switch domain.
> + *
> + * Return a switch domain identifier to the pool of free identifiers after it is
> + * no longer in use by device.
> + *
> + * @param domain_id
> + *  switch domain identifier to free
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int __rte_experimental
> +rte_eth_switch_domain_free(uint16_t domain_id);
> +
> +
> 
>  /** Generic Ethernet device arguments  */
>  struct rte_eth_devargs {
> diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
> index 41c3d2699..86f06769a 100644
> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -220,6 +220,8 @@ EXPERIMENTAL {
>  	rte_eth_dev_rx_offload_name;
>  	rte_eth_dev_tx_offload_name;
>  	rte_eth_find_next_owned_by;
> +	rte_eth_switch_domain_alloc;
> +	rte_eth_switch_domain_free;
>  	rte_mtr_capabilities_get;
>  	rte_mtr_create;
>  	rte_mtr_destroy;
> --
> 2.14.3

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser
  2018-04-26 12:03       ` Ananyev, Konstantin
@ 2018-04-26 14:21         ` Ferruh Yigit
  2018-04-26 14:28         ` Doherty, Declan
  2018-04-26 14:30         ` Remy Horton
  2 siblings, 0 replies; 73+ messages in thread
From: Ferruh Yigit @ 2018-04-26 14:21 UTC (permalink / raw)
  To: Ananyev, Konstantin, Doherty, Declan, dev
  Cc: Adrien Mazarguil, Thomas Monjalon, Shahaf Shuler, Horton, Remy

On 4/26/2018 1:03 PM, Ananyev, Konstantin wrote:
> 
> 
>> -----Original Message-----
>> From: Doherty, Declan
>> Sent: Thursday, April 26, 2018 11:41 AM
>> To: dev@dpdk.org
>> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
>> Shahaf Shuler <shahafs@mellanox.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Horton, Remy <remy.horton@intel.com>;
>> Doherty, Declan <declan.doherty@intel.com>
>> Subject: [dpdk-dev][PATCH v8 6/9] ethdev: add common devargs parser
>>
>> From: Remy Horton <remy.horton@intel.com>
>>
>> Introduces a new structure, rte_eth_devargs, to support generic
>> ethdev arguments common across NET PMDs, with a new API
>> rte_eth_devargs_parse API to support PMD parsing these arguments. The
>> patch add support for a representor argument  passed with passed with
>> the EAL -w option. The representor parameter allows the user to specify
>> which representor ports to initialise on a device.
>>
>> The argument supports passing a single representor port, a list of
>> port values or a range of port values.
>>
>> -w BDF,representor=1  # create representor port 1 on pci device BDF
>> -w BDF,representor=[1,2,5,6,10] # create representor ports in list
>> -w BDF,representor=[0-31] # create representor ports in range
>>
>> Signed-off-by: Remy Horton <remy.horton@intel.com>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>> ---
>>  doc/guides/prog_guide/poll_mode_drv.rst |  19 ++++
>>  lib/Makefile                            |   1 +
>>  lib/librte_ether/rte_ethdev.c           | 182 ++++++++++++++++++++++++++++++++
>>  lib/librte_ether/rte_ethdev_driver.h    |  30 ++++++
>>  lib/librte_ether/rte_ethdev_version.map |   1 +
>>  5 files changed, 233 insertions(+)
>>
>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>> index e5d01874e..09a93baec 100644
>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>> @@ -345,6 +345,25 @@ Ethernet Device API
>>
>>  The Ethernet device API exported by the Ethernet PMDs is described in the *DPDK API Reference*.
>>
>> +Ethernet Device Standard Device Arguments
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +Standard Ethernet device arguments allow for a set of commonly used arguments/
>> +parameters which are applicable to all Ethernet devices to be available to for
>> +specification of specific device and for passing common configuration
>> +parameters to those ports.
>> +
>> +* ``representor`` for a device which supports the creation of representor ports
>> +  this argument allows user to specify which switch ports to enable port
>> +  representors for.::
>> +
>> +   -w BDBF,representor=0
>> +   -w BDBF,representor=[0,4,6,9]
>> +   -w BDBF,representor=[0-31]
>> +
>> +Note: PMDs are not required to support the standard device arguments and users
>> +should consult the relevant PMD documentation to see support devargs.
>> +
>>  Extended Statistics API
>>  ~~~~~~~~~~~~~~~~~~~~~~~
>>
>> diff --git a/lib/Makefile b/lib/Makefile
>> index 965be6c8d..536775e59 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -21,6 +21,7 @@ DEPDIRS-librte_cmdline := librte_eal
>>  DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
>>  DEPDIRS-librte_ether := librte_net librte_eal librte_mempool librte_ring
>>  DEPDIRS-librte_ether += librte_mbuf
>> +DEPDIRS-librte_ether += librte_kvargs
>>  DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += librte_bbdev
>>  DEPDIRS-librte_bbdev := librte_eal librte_mempool librte_mbuf
>>  DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index 621f8af7f..cb85d8bb7 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -34,6 +34,7 @@
>>  #include <rte_errno.h>
>>  #include <rte_spinlock.h>
>>  #include <rte_string_fns.h>
>> ++#include <rte_kvargs.h>
>>
>>  #include "rte_ether.h"
>>  #include "rte_ethdev.h"
>> @@ -4101,6 +4102,187 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>>  	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
>>  }
>>
>> +typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
>> +
>> +static int
>> +rte_eth_devargs_tokenise(struct rte_kvargs *arglist, const char *str_in)
>> +{
> 
> I still think that if you'd like to extend rte_kvarrgs to be able to parse something like: "key=[val1,val2,...,valn]",
> you have to make it generic kvargs ability and put it into librte_kvargs, not try to introduce your own new parser here.
> Imagine that in addition to your 'port=[val1,val2, ..valn]' devargs string would contain some extra (let say device specific)
> parameters.
> What would happen, when PMD will try to use rte_kvargs_parse() on such string?
> My understanding - it would fail, correct?
> 
> As an alternative - as I remember rte_kvargs allows you to have multiple identical key, i.e: "key=val1,key=val2,...,key=valn".
> Why not to use that way, if you don't want to introduce extra code in rte_kvargs?

Hi Declan, Remy,

I will continue with existing patchset, for the sake of the rc1,
can you please address these comments as incremental updates to the set?

>> +static int
>> +rte_eth_devargs_process_range(char *str, uint16_t *list, uint16_t *len_list,
>> +	const uint16_t max_list)
>> +{
>> +	uint16_t lo, hi, val;
>> +	int result;
>> +
>> +	result = sscanf(str, "%hu-%hu", &lo, &hi);
>> +	if (result == 1) {
>> +		if (*len_list >= max_list)
>> +			return -ENOMEM;
>> +		list[(*len_list)++] = lo;
>> +	} else if (result == 2) {
>> +		if (lo >= hi || lo > RTE_MAX_ETHPORTS || hi > RTE_MAX_ETHPORTS)
> 
> lo > RTE_MAX_ETHPORTS is redundant here.

Same here, thanks.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 2/9] ethdev: add switch identifier parameter to port
  2018-04-26 12:02       ` Thomas Monjalon
@ 2018-04-26 14:26         ` Thomas Monjalon
  0 siblings, 0 replies; 73+ messages in thread
From: Thomas Monjalon @ 2018-04-26 14:26 UTC (permalink / raw)
  To: Declan Doherty
  Cc: dev, Adrien Mazarguil, Ferruh Yigit, Shahaf Shuler, Konstantin Ananyev

26/04/2018 14:02, Thomas Monjalon:
> 26/04/2018 12:40, Declan Doherty:
> > Introduces a new port attribute to ethdev port's which denotes the
> > switch domain a port belongs to. By default all port's switch
> > identifiers are set to RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID. Ports
> > which supported the concept of switch domains can be configured with
> > the same switch domain id.
> > 
> > Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> 
> It's very well detailed now :)
> 
> Acked-by: Thomas Monjalon <thomas@monjalon.net>

One miss: you forgot to remove the deprecation notice in this patch.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser
  2018-04-26 12:03       ` Ananyev, Konstantin
  2018-04-26 14:21         ` Ferruh Yigit
@ 2018-04-26 14:28         ` Doherty, Declan
  2018-04-26 14:44           ` Thomas Monjalon
  2018-04-26 14:48           ` Ananyev, Konstantin
  2018-04-26 14:30         ` Remy Horton
  2 siblings, 2 replies; 73+ messages in thread
From: Doherty, Declan @ 2018-04-26 14:28 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: Adrien Mazarguil, Yigit, Ferruh, Thomas Monjalon, Shahaf Shuler,
	Horton, Remy

On 26/04/2018 1:03 PM, Ananyev, Konstantin wrote:
> 
> 
>> -----Original Message-----
>> From: Doherty, Declan
>> Sent: Thursday, April 26, 2018 11:41 AM
>> To: dev@dpdk.org
>> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
>> Shahaf Shuler <shahafs@mellanox.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Horton, Remy <remy.horton@intel.com>;
>> Doherty, Declan <declan.doherty@intel.com>
>> Subject: [dpdk-dev][PATCH v8 6/9] ethdev: add common devargs parser
>>
>> From: Remy Horton <remy.horton@intel.com>
>>
>> Introduces a new structure, rte_eth_devargs, to support generic
>> ethdev arguments common across NET PMDs, with a new API
>> rte_eth_devargs_parse API to support PMD parsing these arguments. The
>> patch add support for a representor argument  passed with passed with
>> the EAL -w option. The representor parameter allows the user to specify
>> which representor ports to initialise on a device.
>>
>> The argument supports passing a single representor port, a list of
>> port values or a range of port values.
>>
>> -w BDF,representor=1  # create representor port 1 on pci device BDF
>> -w BDF,representor=[1,2,5,6,10] # create representor ports in list
>> -w BDF,representor=[0-31] # create representor ports in range
>>
>> Signed-off-by: Remy Horton <remy.horton@intel.com>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>> ---
>>   doc/guides/prog_guide/poll_mode_drv.rst |  19 ++++
>>   lib/Makefile                            |   1 +
>>   lib/librte_ether/rte_ethdev.c           | 182 ++++++++++++++++++++++++++++++++
>>   lib/librte_ether/rte_ethdev_driver.h    |  30 ++++++
>>   lib/librte_ether/rte_ethdev_version.map |   1 +
>>   5 files changed, 233 insertions(+)
>>
>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>> index e5d01874e..09a93baec 100644
>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>> @@ -345,6 +345,25 @@ Ethernet Device API
>>
>>   The Ethernet device API exported by the Ethernet PMDs is described in the *DPDK API Reference*.
>>
>> +Ethernet Device Standard Device Arguments
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +Standard Ethernet device arguments allow for a set of commonly used arguments/
>> +parameters which are applicable to all Ethernet devices to be available to for
>> +specification of specific device and for passing common configuration
>> +parameters to those ports.
>> +
>> +* ``representor`` for a device which supports the creation of representor ports
>> +  this argument allows user to specify which switch ports to enable port
>> +  representors for.::
>> +
>> +   -w BDBF,representor=0
>> +   -w BDBF,representor=[0,4,6,9]
>> +   -w BDBF,representor=[0-31]
>> +
>> +Note: PMDs are not required to support the standard device arguments and users
>> +should consult the relevant PMD documentation to see support devargs.
>> +
>>   Extended Statistics API
>>   ~~~~~~~~~~~~~~~~~~~~~~~
>>
>> diff --git a/lib/Makefile b/lib/Makefile
>> index 965be6c8d..536775e59 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -21,6 +21,7 @@ DEPDIRS-librte_cmdline := librte_eal
>>   DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
>>   DEPDIRS-librte_ether := librte_net librte_eal librte_mempool librte_ring
>>   DEPDIRS-librte_ether += librte_mbuf
>> +DEPDIRS-librte_ether += librte_kvargs
>>   DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += librte_bbdev
>>   DEPDIRS-librte_bbdev := librte_eal librte_mempool librte_mbuf
>>   DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index 621f8af7f..cb85d8bb7 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -34,6 +34,7 @@
>>   #include <rte_errno.h>
>>   #include <rte_spinlock.h>
>>   #include <rte_string_fns.h>
>> ++#include <rte_kvargs.h>
>>
>>   #include "rte_ether.h"
>>   #include "rte_ethdev.h"
>> @@ -4101,6 +4102,187 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>>   	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
>>   }
>>
>> +typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
>> +
>> +static int
>> +rte_eth_devargs_tokenise(struct rte_kvargs *arglist, const char *str_in)
>> +{
> 
> I still think that if you'd like to extend rte_kvarrgs to be able to parse something like: "key=[val1,val2,...,valn]",
> you have to make it generic kvargs ability and put it into librte_kvargs, not try to introduce your own new parser here.
> Imagine that in addition to your 'port=[val1,val2, ..valn]' devargs string would contain some extra (let say device specific)
> parameters.
> What would happen, when PMD will try to use rte_kvargs_parse() on such string?
> My understanding - it would fail, correct?
> 
> As an alternative - as I remember rte_kvargs allows you to have multiple identical key, i.e: "key=val1,key=val2,...,key=valn".
> Why not to use that way, if you don't want to introduce extra code in rte_kvargs?
> 

Hey Konstantin, the rationale for keeping this independent from 
librte_kvargs was that it is likely that the implementation of parsing 
devarfs will change in the next release due the proposed rework on the 
whole devargs infrastructure. I hadn't considered the potential issue 
with rte_kvargs_parse() on string using the proposed syntax here. I'll 
send a patch for alignment with librte_kvargs for the next release 
candidate.

>> +	int state;
>> +	struct rte_kvargs_pair *pair;
>> +	char *letter;
>> +
>> +	arglist->str = strdup(str_in);
>> +	if (arglist->str == NULL)
>> +		return -ENOMEM;
>> +
>> +	letter = arglist->str;
>> +	state = 0;
>> +	arglist->count = 0;
>> +	pair = &arglist->pairs[0];
>> +	while (1) {
>> +		switch (state) {
>> +		case 0: /* Initial */
>> +			if (*letter == '=')
>> +				return -EINVAL;
>> +			else if (*letter == '\0')
>> +				return 0;
>> +
>> +			state = 1;
>> +			pair->key = letter;
>> +			/* fall-thru */
>> +
>> +		case 1: /* Parsing key */
>> +			if (*letter == '=') {
>> +				*letter = '\0';
>> +				pair->value = letter + 1;
>> +				state = 2;
>> +			} else if (*letter == ',' || *letter == '\0')
>> +				return -EINVAL;
>> +			break;
>> +
>> +
>> +		case 2: /* Parsing value */
>> +			if (*letter == '[')
>> +				state = 3;
>> +			else if (*letter == ',') {
>> +				*letter = '\0';
>> +				arglist->count++;
>> +				pair = &arglist->pairs[arglist->count];
>> +				state = 0;
>> +			} else if (*letter == '\0') {
>> +				letter--;
>> +				arglist->count++;
>> +				pair = &arglist->pairs[arglist->count];
>> +				state = 0;
>> +			}
>> +			break;
>> +
>> +		case 3: /* Parsing list */
>> +			if (*letter == ']')
>> +				state = 2;
>> +			else if (*letter == '\0')
>> +				return -EINVAL;
>> +			break;
>> +		}
>> +		letter++;
>> +	}
>> +}
>> +
>> +static int
>> +rte_eth_devargs_parse_list(char *str, rte_eth_devargs_callback_t callback,
>> +	void *data)
>> +{
>> +	char *str_start;
>> +	int state;
>> +	int result;
>> +
>> +	if (*str != '[')
>> +		/* Single element, not a list */
>> +		return callback(str, data);
>> +
>> +	/* Sanity check, then strip the brackets */
>> +	str_start = &str[strlen(str) - 1];
>> +	if (*str_start != ']') {
>> +		RTE_LOG(ERR, EAL, "(%s): List does not end with ']'", str);
>> +		return -EINVAL;
>> +	}
>> +	str++;
>> +	*str_start = '\0';
>> +
>> +	/* Process list elements */
>> +	state = 0;
>> +	while (1) {
>> +		if (state == 0) {
>> +			if (*str == '\0')
>> +				break;
>> +			if (*str != ',') {
>> +				str_start = str;
>> +				state = 1;
>> +			}
>> +		} else if (state == 1) {
>> +			if (*str == ',' || *str == '\0') {
>> +				if (str > str_start) {
>> +					/* Non-empty string fragment */
>> +					*str = '\0';
>> +					result = callback(str_start, data);
>> +					if (result < 0)
>> +						return result;
>> +				}
>> +				state = 0;
>> +			}
>> +		}
>> +		str++;
>> +	}
>> +	return 0;
>> +}
>> +
>> +static int
>> +rte_eth_devargs_process_range(char *str, uint16_t *list, uint16_t *len_list,
>> +	const uint16_t max_list)
>> +{
>> +	uint16_t lo, hi, val;
>> +	int result;
>> +
>> +	result = sscanf(str, "%hu-%hu", &lo, &hi);
>> +	if (result == 1) {
>> +		if (*len_list >= max_list)
>> +			return -ENOMEM;
>> +		list[(*len_list)++] = lo;
>> +	} else if (result == 2) {
>> +		if (lo >= hi || lo > RTE_MAX_ETHPORTS || hi > RTE_MAX_ETHPORTS)
> 
> lo > RTE_MAX_ETHPORTS is redundant here.
> 
>> +			return -EINVAL;
>> +		for (val = lo; val <= hi; val++) {
>> +			if (*len_list >= max_list)
>> +				return -ENOMEM;
>> +			list[(*len_list)++] = val;
>> +		}
>> +	} else
>> +		return -EINVAL;
>> +	return 0;
>> +}
>> +
>> +
>> +static int
>> +rte_eth_devargs_parse_representor_ports(char *str, void *data)
>> +{
>> +	struct rte_eth_devargs *eth_da = data;
>> +
>> +	return rte_eth_devargs_process_range(str, eth_da->representor_ports,
>> +		&eth_da->nb_representor_ports, RTE_MAX_ETHPORTS);
>> +}
>> +
>> +int __rte_experimental
>> +rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
>> +{
>> +	struct rte_kvargs args;
>> +	struct rte_kvargs_pair *pair;
>> +	unsigned int i;
>> +	int result = 0;
>> +
>> +	memset(eth_da, 0, sizeof(*eth_da));
>> +
>> +	result = rte_eth_devargs_tokenise(&args, dargs);
>> +	if (result < 0)
>> +		goto parse_cleanup;
>> +
>> +	for (i = 0; i < args.count; i++) {
>> +		pair = &args.pairs[i];
>> +		if (strcmp("representor", pair->key) == 0) {
>> +			result = rte_eth_devargs_parse_list(pair->value,
>> +				rte_eth_devargs_parse_representor_ports,
>> +				eth_da);
>> +			if (result < 0)
>> +				goto parse_cleanup;
>> +		}
>> +	}
>> +
>> +parse_cleanup:
>> +	if (args.str)
>> +		free(args.str);
>> +
>> +	return result;
>> +}
>> +
>>   RTE_INIT(ethdev_init_log);
>>   static void
>>   ethdev_init_log(void)
>> diff --git a/lib/librte_ether/rte_ethdev_driver.h b/lib/librte_ether/rte_ethdev_driver.h
>> index 8c61ab2f4..492da754a 100644
>> --- a/lib/librte_ether/rte_ethdev_driver.h
>> +++ b/lib/librte_ether/rte_ethdev_driver.h
>> @@ -189,6 +189,36 @@ rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
>>   }
>>
>>
>> +/** Generic Ethernet device arguments  */
>> +struct rte_eth_devargs {
>> +	uint16_t ports[RTE_MAX_ETHPORTS];
>> +	/** port/s number to enable on a multi-port single function */
>> +	uint16_t nb_ports;
>> +	/** number of ports in ports field */
>> +	uint16_t representor_ports[RTE_MAX_ETHPORTS];
>> +	/** representor port/s identifier to enable on device */
>> +	uint16_t nb_representor_ports;
>> +	/** number of ports in representor port field */
>> +};
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * PMD helper function to parse ethdev arguments
>> + *
>> + * @param devargs
>> + *  device arguments
>> + * @param eth_devargs
>> + *  parsed ethdev specific arguments.
>> + *
>> + * @return
>> + *   Negative errno value on error, 0 on success.
>> + */
>> +int __rte_experimental
>> +rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_devargs);
>> +
>> +
>>   typedef int (*ethdev_init_t)(struct rte_eth_dev *ethdev, void *init_params);
>>   typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
>>   	void *bus_specific_init_params);
>> diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
>> index c4380aa31..41c3d2699 100644
>> --- a/lib/librte_ether/rte_ethdev_version.map
>> +++ b/lib/librte_ether/rte_ethdev_version.map
>> @@ -206,6 +206,7 @@ DPDK_18.02 {
>>   EXPERIMENTAL {
>>   	global:
>>
>> +	rte_eth_devargs_parse;
>>   	rte_eth_dev_count_avail;
>>   	rte_eth_dev_count_total;
>>   	rte_eth_dev_create;
>> --
>> 2.14.3
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser
  2018-04-26 12:03       ` Ananyev, Konstantin
  2018-04-26 14:21         ` Ferruh Yigit
  2018-04-26 14:28         ` Doherty, Declan
@ 2018-04-26 14:30         ` Remy Horton
  2 siblings, 0 replies; 73+ messages in thread
From: Remy Horton @ 2018-04-26 14:30 UTC (permalink / raw)
  To: Ananyev, Konstantin, Doherty, Declan, dev
  Cc: Adrien Mazarguil, Yigit, Ferruh, Thomas Monjalon, Shahaf Shuler


On 26/04/2018 13:03, Ananyev, Konstantin wrote:
[..]
> I still think that if you'd like to extend rte_kvarrgs to be able to parse something like: "key=[val1,val2,...,valn]",
> you have to make it generic kvargs ability and put it into librte_kvargs, not try to introduce your own new parser here.
> Imagine that in addition to your 'port=[val1,val2, ..valn]' devargs string would contain some extra (let say device specific)
> parameters.
> What would happen, when PMD will try to use rte_kvargs_parse() on such string?
> My understanding - it would fail, correct?

This is partly dependent on what will (and won't) devargs provide when 
it is finalised. It was insourced in order to unblock the rest of the 
patchset in the meantime.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser
  2018-04-26 14:28         ` Doherty, Declan
@ 2018-04-26 14:44           ` Thomas Monjalon
  2018-04-26 14:48           ` Ananyev, Konstantin
  1 sibling, 0 replies; 73+ messages in thread
From: Thomas Monjalon @ 2018-04-26 14:44 UTC (permalink / raw)
  To: Doherty, Declan
  Cc: Ananyev, Konstantin, dev, Adrien Mazarguil, Yigit, Ferruh,
	Shahaf Shuler, Horton, Remy, gaetan.rivet

26/04/2018 16:28, Doherty, Declan:
> On 26/04/2018 1:03 PM, Ananyev, Konstantin wrote:
> > From: Doherty, Declan
> >> +typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
> >> +
> >> +static int
> >> +rte_eth_devargs_tokenise(struct rte_kvargs *arglist, const char *str_in)
> >> +{
> > 
> > I still think that if you'd like to extend rte_kvarrgs to be able to parse something like: "key=[val1,val2,...,valn]",
> > you have to make it generic kvargs ability and put it into librte_kvargs, not try to introduce your own new parser here.
> > Imagine that in addition to your 'port=[val1,val2, ..valn]' devargs string would contain some extra (let say device specific)
> > parameters.
> > What would happen, when PMD will try to use rte_kvargs_parse() on such string?
> > My understanding - it would fail, correct?
> > 
> > As an alternative - as I remember rte_kvargs allows you to have multiple identical key, i.e: "key=val1,key=val2,...,key=valn".
> > Why not to use that way, if you don't want to introduce extra code in rte_kvargs?
> > 
> 
> Hey Konstantin, the rationale for keeping this independent from 
> librte_kvargs was that it is likely that the implementation of parsing 
> devarfs will change in the next release due the proposed rework on the 
> whole devargs infrastructure. I hadn't considered the potential issue 
> with rte_kvargs_parse() on string using the proposed syntax here. I'll 
> send a patch for alignment with librte_kvargs for the next release 
> candidate.

The new devargs infra will rely on librte_kvargs.
So improving kvargs is the right way here.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser
  2018-04-26 14:28         ` Doherty, Declan
  2018-04-26 14:44           ` Thomas Monjalon
@ 2018-04-26 14:48           ` Ananyev, Konstantin
  1 sibling, 0 replies; 73+ messages in thread
From: Ananyev, Konstantin @ 2018-04-26 14:48 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Adrien Mazarguil, Yigit, Ferruh, Thomas Monjalon, Shahaf Shuler,
	Horton, Remy



> -----Original Message-----
> From: Doherty, Declan
> Sent: Thursday, April 26, 2018 3:29 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org
> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> Shahaf Shuler <shahafs@mellanox.com>; Horton, Remy <remy.horton@intel.com>
> Subject: Re: [dpdk-dev][PATCH v8 6/9] ethdev: add common devargs parser
> 
> On 26/04/2018 1:03 PM, Ananyev, Konstantin wrote:
> >
> >
> >> -----Original Message-----
> >> From: Doherty, Declan
> >> Sent: Thursday, April 26, 2018 11:41 AM
> >> To: dev@dpdk.org
> >> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas Monjalon
> <thomas@monjalon.net>;
> >> Shahaf Shuler <shahafs@mellanox.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Horton, Remy
> <remy.horton@intel.com>;
> >> Doherty, Declan <declan.doherty@intel.com>
> >> Subject: [dpdk-dev][PATCH v8 6/9] ethdev: add common devargs parser
> >>
> >> From: Remy Horton <remy.horton@intel.com>
> >>
> >> Introduces a new structure, rte_eth_devargs, to support generic
> >> ethdev arguments common across NET PMDs, with a new API
> >> rte_eth_devargs_parse API to support PMD parsing these arguments. The
> >> patch add support for a representor argument  passed with passed with
> >> the EAL -w option. The representor parameter allows the user to specify
> >> which representor ports to initialise on a device.
> >>
> >> The argument supports passing a single representor port, a list of
> >> port values or a range of port values.
> >>
> >> -w BDF,representor=1  # create representor port 1 on pci device BDF
> >> -w BDF,representor=[1,2,5,6,10] # create representor ports in list
> >> -w BDF,representor=[0-31] # create representor ports in range
> >>
> >> Signed-off-by: Remy Horton <remy.horton@intel.com>
> >> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> >> ---
> >>   doc/guides/prog_guide/poll_mode_drv.rst |  19 ++++
> >>   lib/Makefile                            |   1 +
> >>   lib/librte_ether/rte_ethdev.c           | 182 ++++++++++++++++++++++++++++++++
> >>   lib/librte_ether/rte_ethdev_driver.h    |  30 ++++++
> >>   lib/librte_ether/rte_ethdev_version.map |   1 +
> >>   5 files changed, 233 insertions(+)
> >>
> >> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> >> index e5d01874e..09a93baec 100644
> >> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> >> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> >> @@ -345,6 +345,25 @@ Ethernet Device API
> >>
> >>   The Ethernet device API exported by the Ethernet PMDs is described in the *DPDK API Reference*.
> >>
> >> +Ethernet Device Standard Device Arguments
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +Standard Ethernet device arguments allow for a set of commonly used arguments/
> >> +parameters which are applicable to all Ethernet devices to be available to for
> >> +specification of specific device and for passing common configuration
> >> +parameters to those ports.
> >> +
> >> +* ``representor`` for a device which supports the creation of representor ports
> >> +  this argument allows user to specify which switch ports to enable port
> >> +  representors for.::
> >> +
> >> +   -w BDBF,representor=0
> >> +   -w BDBF,representor=[0,4,6,9]
> >> +   -w BDBF,representor=[0-31]
> >> +
> >> +Note: PMDs are not required to support the standard device arguments and users
> >> +should consult the relevant PMD documentation to see support devargs.
> >> +
> >>   Extended Statistics API
> >>   ~~~~~~~~~~~~~~~~~~~~~~~
> >>
> >> diff --git a/lib/Makefile b/lib/Makefile
> >> index 965be6c8d..536775e59 100644
> >> --- a/lib/Makefile
> >> +++ b/lib/Makefile
> >> @@ -21,6 +21,7 @@ DEPDIRS-librte_cmdline := librte_eal
> >>   DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
> >>   DEPDIRS-librte_ether := librte_net librte_eal librte_mempool librte_ring
> >>   DEPDIRS-librte_ether += librte_mbuf
> >> +DEPDIRS-librte_ether += librte_kvargs
> >>   DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += librte_bbdev
> >>   DEPDIRS-librte_bbdev := librte_eal librte_mempool librte_mbuf
> >>   DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
> >> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> >> index 621f8af7f..cb85d8bb7 100644
> >> --- a/lib/librte_ether/rte_ethdev.c
> >> +++ b/lib/librte_ether/rte_ethdev.c
> >> @@ -34,6 +34,7 @@
> >>   #include <rte_errno.h>
> >>   #include <rte_spinlock.h>
> >>   #include <rte_string_fns.h>
> >> ++#include <rte_kvargs.h>
> >>
> >>   #include "rte_ether.h"
> >>   #include "rte_ethdev.h"
> >> @@ -4101,6 +4102,187 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
> >>   	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
> >>   }
> >>
> >> +typedef int (*rte_eth_devargs_callback_t)(char *str, void *data);
> >> +
> >> +static int
> >> +rte_eth_devargs_tokenise(struct rte_kvargs *arglist, const char *str_in)
> >> +{
> >
> > I still think that if you'd like to extend rte_kvarrgs to be able to parse something like: "key=[val1,val2,...,valn]",
> > you have to make it generic kvargs ability and put it into librte_kvargs, not try to introduce your own new parser here.
> > Imagine that in addition to your 'port=[val1,val2, ..valn]' devargs string would contain some extra (let say device specific)
> > parameters.
> > What would happen, when PMD will try to use rte_kvargs_parse() on such string?
> > My understanding - it would fail, correct?
> >
> > As an alternative - as I remember rte_kvargs allows you to have multiple identical key, i.e: "key=val1,key=val2,...,key=valn".
> > Why not to use that way, if you don't want to introduce extra code in rte_kvargs?
> >
> 
> Hey Konstantin, the rationale for keeping this independent from
> librte_kvargs was that it is likely that the implementation of parsing
> devarfs will change in the next release due the proposed rework on the
> whole devargs infrastructure. I hadn't considered the potential issue
> with rte_kvargs_parse() on string using the proposed syntax here. I'll
> send a patch for alignment with librte_kvargs for the next release
> candidate.

Ok, thanks Declan.
Konstantin


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation
  2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
                       ` (8 preceding siblings ...)
  2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 9/9] net/ixgbe: " Declan Doherty
@ 2018-04-26 16:24     ` Ferruh Yigit
  2018-04-26 16:35       ` Ferruh Yigit
  9 siblings, 1 reply; 73+ messages in thread
From: Ferruh Yigit @ 2018-04-26 16:24 UTC (permalink / raw)
  To: Declan Doherty, dev
  Cc: Adrien Mazarguil, Thomas Monjalon, Shahaf Shuler, Konstantin Ananyev

On 4/26/2018 11:40 AM, Declan Doherty wrote:
> This patchset follows on from the port rerpesentor patchsets and the
> community discussion that resulted. It outlines the model for
> representing and controlling switching capable devices in a new
> programmer's guide entry based upon the excellent summary by
> Adrien Mazarguil in
> (http://dpdk.org/ml/archives/dev/2018-March/092513.html).
> 
> The next patches introduce changes to librte_ether to:
> 1, support the definition of a switch domain and make it public to
> application through the rte_eth_dev_info structure.
> 2, Add generic ethdev create/destroy APIs to facilitate and generalise the
> creation of ethdev's on different bus types.
> 3, Add ethdev attribute to dev_flags to specify that a port is a
> representor port and make public through the rte_eth_dev_info
> structure.
> 4, Add devargs parsing for generic eth_devargs to facilate parsing in
> NET PMDs. This will be refactored to take account of the changes in
> (http://dpdk.org/ml/archives/dev/2018-March/092513.html)
> 5, Add new API to allocate switch domain ids to devices which support
> this feature.
> 
> This patchset also includes the enablement of vf port representor for ixgbe
> and i40e PF devices.
> 
> V8:
> - add detailed descriptions to switch information structures
> - fix err condition checking ethdev create helper function
> - fix devargs memory leak and error checking + add documentation on
>   ethdev args.
> - remove rte_eth_switch_domains structure from export items.
> 
> V7:
> 
> This patch address the following changes:
>  - fixes in documentation patch
>  - changes the default value of switch domain id to be INVALID to allow
>    applications to easily identify devices which can/cannot support the
>    concept. Updates the switch information available through the
>    rte_eth_dev_info structure.
>  - remove the rte_ethdev_representor.h header and leave representor
>    specific initialisation to driver
>  - add new APIs for allocating and freeing switch domain identifier to
>    enable PMDs to have unique switch domaind ids without the ethdev
>    infrastructure placing any restriction on how theses are managed by
>    devices.
>  - bug fix in ethdev args parsing code.
> 
> Declan Doherty (8):
>   doc: add switch representation documentation
>   ethdev: add switch identifier parameter to port
>   ethdev: add generic create/destroy ethdev APIs
>   ethdev: Add port representor device flag
>   app/testpmd: add port name to device info
>   ethdev: add switch domain allocator
>   net/i40e: add support for representor ports
>   net/ixgbe: add support for representor ports
> 
> Remy Horton (1):
>   ethdev: add common devargs parser


For series,
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation
  2018-04-26 16:24     ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Ferruh Yigit
@ 2018-04-26 16:35       ` Ferruh Yigit
  0 siblings, 0 replies; 73+ messages in thread
From: Ferruh Yigit @ 2018-04-26 16:35 UTC (permalink / raw)
  To: Declan Doherty, dev
  Cc: Adrien Mazarguil, Thomas Monjalon, Shahaf Shuler, Konstantin Ananyev

On 4/26/2018 5:24 PM, Ferruh Yigit wrote:
> On 4/26/2018 11:40 AM, Declan Doherty wrote:
>> This patchset follows on from the port rerpesentor patchsets and the
>> community discussion that resulted. It outlines the model for
>> representing and controlling switching capable devices in a new
>> programmer's guide entry based upon the excellent summary by
>> Adrien Mazarguil in
>> (http://dpdk.org/ml/archives/dev/2018-March/092513.html).
>>
>> The next patches introduce changes to librte_ether to:
>> 1, support the definition of a switch domain and make it public to
>> application through the rte_eth_dev_info structure.
>> 2, Add generic ethdev create/destroy APIs to facilitate and generalise the
>> creation of ethdev's on different bus types.
>> 3, Add ethdev attribute to dev_flags to specify that a port is a
>> representor port and make public through the rte_eth_dev_info
>> structure.
>> 4, Add devargs parsing for generic eth_devargs to facilate parsing in
>> NET PMDs. This will be refactored to take account of the changes in
>> (http://dpdk.org/ml/archives/dev/2018-March/092513.html)
>> 5, Add new API to allocate switch domain ids to devices which support
>> this feature.
>>
>> This patchset also includes the enablement of vf port representor for ixgbe
>> and i40e PF devices.
>>
>> V8:
>> - add detailed descriptions to switch information structures
>> - fix err condition checking ethdev create helper function
>> - fix devargs memory leak and error checking + add documentation on
>>   ethdev args.
>> - remove rte_eth_switch_domains structure from export items.
>>
>> V7:
>>
>> This patch address the following changes:
>>  - fixes in documentation patch
>>  - changes the default value of switch domain id to be INVALID to allow
>>    applications to easily identify devices which can/cannot support the
>>    concept. Updates the switch information available through the
>>    rte_eth_dev_info structure.
>>  - remove the rte_ethdev_representor.h header and leave representor
>>    specific initialisation to driver
>>  - add new APIs for allocating and freeing switch domain identifier to
>>    enable PMDs to have unique switch domaind ids without the ethdev
>>    infrastructure placing any restriction on how theses are managed by
>>    devices.
>>  - bug fix in ethdev args parsing code.
>>
>> Declan Doherty (8):
>>   doc: add switch representation documentation
>>   ethdev: add switch identifier parameter to port
>>   ethdev: add generic create/destroy ethdev APIs
>>   ethdev: Add port representor device flag
>>   app/testpmd: add port name to device info
>>   ethdev: add switch domain allocator
>>   net/i40e: add support for representor ports
>>   net/ixgbe: add support for representor ports
>>
>> Remy Horton (1):
>>   ethdev: add common devargs parser
> 
> 
> For series,
> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Series applied to dpdk-next-net/master, thanks.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [dpdk-dev] [PATCH v8 2/9] ethdev: add switch identifier parameter to port
  2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 2/9] ethdev: add switch identifier parameter to port Declan Doherty
  2018-04-26 12:02       ` Thomas Monjalon
@ 2018-04-27 16:29       ` Ferruh Yigit
  1 sibling, 0 replies; 73+ messages in thread
From: Ferruh Yigit @ 2018-04-27 16:29 UTC (permalink / raw)
  To: Declan Doherty, dev
  Cc: Adrien Mazarguil, Thomas Monjalon, Shahaf Shuler, Konstantin Ananyev

On 4/26/2018 11:40 AM, Declan Doherty wrote:
> Introduces a new port attribute to ethdev port's which denotes the
> switch domain a port belongs to. By default all port's switch
> identifiers are set to RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID. Ports
> which supported the concept of switch domains can be configured with
> the same switch domain id.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
>  app/test-pmd/config.c         | 12 ++++++++++++
>  lib/librte_ether/rte_ethdev.h | 27 +++++++++++++++++++++++++++

Patch updated in next-net to remove deprecation notice:

  diff --git a/doc/guides/rel_notes/deprecation.rst
b/doc/guides/rel_notes/deprecation.rst
 index fd85a141b..c3b79a22f 100644
  --- a/doc/guides/rel_notes/deprecation.rst
  +++ b/doc/guides/rel_notes/deprecation.rst
 @@ -59,12 +59,6 @@ Deprecation Notices
    Target release for removal of the legacy API will be defined once most
     PMDs have switched to rte_flow.

  -* ethdev: A work is being planned for 18.05 to expose VF port representors
  -  as a mean to perform control and data path operation on the different VFs.
  -  As VF representor is an ethdev port, new fields are needed in order to map
  -  between the VF representor and the VF or the parent PF. Those new fields
  -  are to be included in ``rte_eth_dev_info`` struct.
  -
   * i40e: The default flexible payload configuration which extracts the first 16
     bytes of the payload for RSS will be deprecated starting from 18.02. If
     required the previous behavior can be configured using existing flow

^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2018-04-27 16:29 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-28 13:54 [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 1/8] doc: add switch representation documentation Declan Doherty
2018-03-28 14:53   ` Thomas Monjalon
2018-03-28 15:05     ` Doherty, Declan
2018-04-03 15:52   ` Adrien Mazarguil
2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 2/8] ethdev: add switch identifier parameter to port Declan Doherty
2018-03-29  6:13   ` Shahaf Shuler
2018-03-29  9:13     ` Doherty, Declan
2018-03-29 10:12       ` Shahaf Shuler
2018-03-29 15:12         ` Doherty, Declan
2018-04-01  6:10           ` Shahaf Shuler
2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 3/8] ethdev: add generic create/destroy ethdev APIs Declan Doherty
2018-03-29  6:13   ` Shahaf Shuler
2018-03-29  9:22     ` Doherty, Declan
2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag Declan Doherty
2018-03-29  6:13   ` Shahaf Shuler
2018-03-29  7:34     ` Thomas Monjalon
2018-03-29 14:53     ` Doherty, Declan
2018-04-01  6:14       ` Shahaf Shuler
2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 5/8] app/testpmd: add port name to device info Declan Doherty
2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 6/8] ethdev: add common devargs parser Declan Doherty
2018-03-29 12:12   ` Gaëtan Rivet
2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 7/8] net/i40e: add support for representor ports Declan Doherty
2018-03-28 13:54 ` [dpdk-dev] [PATCH v6 8/8] net/ixgbe: " Declan Doherty
2018-04-16 13:05 ` [dpdk-dev] [PATCH v7 0/9] switching devices representation Declan Doherty
2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 1/9] doc: add switch representation documentation Declan Doherty
2018-04-16 15:55     ` Kovacevic, Marko
2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 2/9] ethdev: add switch identifier parameter to port Declan Doherty
2018-04-24 16:38     ` Thomas Monjalon
2018-04-16 13:05   ` [dpdk-dev] [PATCH v7 3/9] ethdev: add generic create/destroy ethdev APIs Declan Doherty
2018-04-20 13:01     ` Ananyev, Konstantin
2018-04-24 17:48     ` Thomas Monjalon
2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 4/9] ethdev: Add port representor device flag Declan Doherty
2018-04-24 19:37     ` Thomas Monjalon
2018-04-25 12:17       ` Doherty, Declan
2018-04-25 12:23         ` Thomas Monjalon
2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 5/9] app/testpmd: add port name to device info Declan Doherty
2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 6/9] ethdev: add common devargs parser Declan Doherty
2018-04-20 13:16     ` Ananyev, Konstantin
2018-04-24 19:53     ` Thomas Monjalon
2018-04-25  9:40       ` Remy Horton
2018-04-25 10:06         ` Thomas Monjalon
2018-04-25 10:45           ` Remy Horton
2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 7/9] ethdev: add switch domain allocator Declan Doherty
2018-04-20 13:22     ` Ananyev, Konstantin
2018-04-24 19:58     ` Thomas Monjalon
2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 8/9] net/i40e: add support for representor ports Declan Doherty
2018-04-16 13:06   ` [dpdk-dev] [PATCH v7 9/9] net/ixgbe: " Declan Doherty
2018-04-20 13:29     ` Ananyev, Konstantin
2018-04-26 10:40   ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Declan Doherty
2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 1/9] doc: add switch representation documentation Declan Doherty
2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 2/9] ethdev: add switch identifier parameter to port Declan Doherty
2018-04-26 12:02       ` Thomas Monjalon
2018-04-26 14:26         ` Thomas Monjalon
2018-04-27 16:29       ` Ferruh Yigit
2018-04-26 10:40     ` [dpdk-dev] [PATCH v8 3/9] ethdev: add generic create/destroy ethdev APIs Declan Doherty
2018-04-26 12:16       ` Ferruh Yigit
2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 4/9] ethdev: Add port representor device flag Declan Doherty
2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 5/9] app/testpmd: add port name to device info Declan Doherty
2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 6/9] ethdev: add common devargs parser Declan Doherty
2018-04-26 12:03       ` Ananyev, Konstantin
2018-04-26 14:21         ` Ferruh Yigit
2018-04-26 14:28         ` Doherty, Declan
2018-04-26 14:44           ` Thomas Monjalon
2018-04-26 14:48           ` Ananyev, Konstantin
2018-04-26 14:30         ` Remy Horton
2018-04-26 12:15       ` Ferruh Yigit
2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 7/9] ethdev: add switch domain allocator Declan Doherty
2018-04-26 12:27       ` Ananyev, Konstantin
2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 8/9] net/i40e: add support for representor ports Declan Doherty
2018-04-26 10:41     ` [dpdk-dev] [PATCH v8 9/9] net/ixgbe: " Declan Doherty
2018-04-26 16:24     ` [dpdk-dev] [dpdk=-dev][PATCH v8 0/9] switching devices representation Ferruh Yigit
2018-04-26 16:35       ` Ferruh Yigit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).