* [dpdk-dev] [PATCH v2 1/3] ethdev: identify SR-IOV VF from host
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 0/3] " Thomas Monjalon
@ 2019-10-29 18:50 ` Thomas Monjalon
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 2/3] ethdev: set VF MAC address " Thomas Monjalon
` (3 subsequent siblings)
4 siblings, 0 replies; 36+ messages in thread
From: Thomas Monjalon @ 2019-10-29 18:50 UTC (permalink / raw)
To: Ferruh Yigit, Andrew Rybchenko; +Cc: dev
In a virtual environment, the network controller may have to configure
some SR-IOV VF parameters for security reasons.
When the PF (host port) is driven by DPDK (OVS-DPDK case),
we face two different cases:
- driver is bifurcated (Mellanox case),
so the VF can be configured via the kernel.
- driver is on top of UIO or VFIO, so DPDK API is required,
and PMD-specific APIs were used.
This new generic API will avoid vendors fragmentation.
In order to target a VF (which has no port ID in the host),
the higher bit of port ID is reserved to be used
with port representor ID in existing functions.
Summary:
representor ID + VF bit == VF ID
If a function is not expected to do VF configuration,
or if the port does not control any VF configuration,
it returns -EINVAL or -ENODEV.
If a function can do VF configuration,
but the PMD does not support it, then -ENOTSUP should be returned.
The port can allow the use of the VF bit per function
by adding the implementation to its vf_ops.
The new macro RTE_ETH_VALID_ID_OR_ERR_RET must be called
instead of RTE_ETH_VALID_PORTID_OR_ERR_RET
to allow the use of the VF bit.
The new macro CALL_OP_OR_ERR_RET can be used to help
calling the right function in dev_ops or vf_ops,
depending on the VF bit.
No feature is enabled in this commit.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
lib/librte_ethdev/rte_ethdev.c | 44 ++++++++++++++++++++++--
lib/librte_ethdev/rte_ethdev.h | 38 ++++++++++++++++++++
lib/librte_ethdev/rte_ethdev_core.h | 1 +
lib/librte_ethdev/rte_ethdev_version.map | 1 +
4 files changed, 81 insertions(+), 3 deletions(-)
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 7743205d38..fb3da4dcc3 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -593,11 +593,36 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
int
rte_eth_dev_is_valid_port(uint16_t port_id)
{
+ /* legacy behaviour - without VF flag */
+ return rte_eth_dev_is_valid(port_id, 0);
+}
+
+static uint16_t
+port_id_parse(uint16_t port_id, bool *is_vf)
+{
+ *is_vf = (port_id & RTE_ETH_PORT_VF_FLAG) != 0;
+ return port_id & RTE_ETH_PORT_ID_MASK;
+}
+
+int
+rte_eth_dev_is_valid(uint16_t port_id, char allow_vf)
+{
+ bool is_vf;
+
+ port_id = port_id_parse(port_id, &is_vf);
+ if (is_vf && !allow_vf)
+ return 0; /* unallowed VF */
+
if (port_id >= RTE_MAX_ETHPORTS ||
(rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED))
- return 0;
- else
- return 1;
+ return 0; /* invalid port */
+
+ if (!is_vf)
+ return 1; /* valid port */
+
+ if (rte_eth_devices[port_id].vf_ops == NULL)
+ return 0; /* VF flag applies only to port controlling a VF */
+ return 2; /* VF connected to a valid port on the host */
}
static int
@@ -851,6 +876,19 @@ eth_err(uint16_t port_id, int ret)
return ret;
}
+static inline const struct eth_dev_ops *
+eth_dev_ops_get(const struct rte_eth_dev *dev, bool is_vf)
+{
+ if (is_vf)
+ return dev->vf_ops;
+ return dev->dev_ops;
+}
+
+#define ETH_DEV_OP_CALL(dev, vf, op, ...) ({ \
+ RTE_FUNC_PTR_OR_ERR_RET((eth_dev_ops_get(dev, vf)->op), -ENOTSUP); \
+ (eth_dev_ops_get(dev, vf)->op)(dev, ## __VA_ARGS__); \
+})
+
static int
rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
{
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index c36c1b631f..a410a195ce 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1345,6 +1345,13 @@ struct rte_eth_dcb_info {
#define RTE_ETH_ALL RTE_MAX_ETHPORTS
/* Macros to check for valid port */
+#define RTE_ETH_VALID_ID_OR_ERR_RET(port_id, retval) do { \
+ if (!rte_eth_dev_is_valid(port_id, 1)) { \
+ RTE_ETHDEV_LOG(ERR, "Invalid port_id=%u\n", port_id); \
+ return retval; \
+ } \
+} while (0)
+
#define RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, retval) do { \
if (!rte_eth_dev_is_valid_port(port_id)) { \
RTE_ETHDEV_LOG(ERR, "Invalid port_id=%u\n", port_id); \
@@ -1468,6 +1475,17 @@ struct rte_eth_dev_owner {
/** Device does not support MAC change after started */
#define RTE_ETH_DEV_NOLIVE_MAC_ADDR 0x0020
+/**
+ * Highest bit of port ID is reserved for targeting controlled VF.
+ * This bit can be combined with the port ID of a representor
+ * which implements some vf_ops.
+ * The meaning is to target the VF connected with the representor port
+ * instead of the representor port itself.
+ */
+#define RTE_ETH_PORT_VF_FLAG (1 << 15)
+/** Mask to get representor port ID from VF ID, excluding VF flag. */
+#define RTE_ETH_PORT_ID_MASK (RTE_ETH_PORT_VF_FLAG - 1)
+
/**
* Iterates over valid ethdev ports owned by a specific owner.
*
@@ -1909,6 +1927,26 @@ int rte_eth_dev_socket_id(uint16_t port_id);
*/
int rte_eth_dev_is_valid_port(uint16_t port_id);
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check if port_id of device is attached.
+ * The port_id can represent a VF connected to port
+ * implementing some vf_ops.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @param allow_vf
+ * The bit RTE_ETH_PORT_VF_FLAG is considered valid.
+ * @return
+ * - 0 if port is not attached or unallowed VF
+ * - 1 if device is attached and not representing a VF
+ * - 2 if is a remote VF connected to a port implementing vf_ops
+ */
+__rte_experimental
+int rte_eth_dev_is_valid(uint16_t port_id, char allow_vf);
+
/**
* Start specified RX queue of a port. It is used when rx_deferred_start
* flag of the specified queue is true.
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 392aea8e6b..46bc01926d 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -682,6 +682,7 @@ struct rte_eth_dev {
struct rte_eth_dev_data *data; /**< Pointer to device data. */
void *process_private; /**< Pointer to per-process device data. */
const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
+ const struct eth_dev_ops *vf_ops; /**< Functions for VF control */
struct rte_device *device; /**< Backing device */
struct rte_intr_handle *intr_handle; /**< Device interrupt handle */
/** User application callbacks for NIC interrupts */
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index e59d51648f..09670d4bb3 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -285,6 +285,7 @@ EXPERIMENTAL {
rte_eth_read_clock;
# added in 19.11
+ rte_eth_dev_is_valid;
rte_eth_rx_burst_mode_get;
rte_eth_tx_burst_mode_get;
rte_eth_burst_mode_option_name;
--
2.23.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* [dpdk-dev] [PATCH v2 2/3] ethdev: set VF MAC address from host
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 0/3] " Thomas Monjalon
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 1/3] ethdev: identify " Thomas Monjalon
@ 2019-10-29 18:50 ` Thomas Monjalon
2019-11-01 0:18 ` [dpdk-dev] [RFC PATCH] net/i[xgb|40]e: " Thomas Monjalon
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 3/3] net/mlx5: " Thomas Monjalon
` (2 subsequent siblings)
4 siblings, 1 reply; 36+ messages in thread
From: Thomas Monjalon @ 2019-10-29 18:50 UTC (permalink / raw)
To: Ferruh Yigit, Andrew Rybchenko; +Cc: dev
The API to set a default MAC address is extended
to support a VF ID as port ID.
In order to be supported by a driver,
the related vf_ops must be implemented.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
lib/librte_ethdev/rte_ethdev.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index fb3da4dcc3..9cf82ff10b 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -3386,20 +3386,23 @@ int
rte_eth_dev_default_mac_addr_set(uint16_t port_id, struct rte_ether_addr *addr)
{
struct rte_eth_dev *dev;
+ bool vf; /* true if port_id targets a connected VF */
int ret;
- RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ RTE_ETH_VALID_ID_OR_ERR_RET(port_id, -ENODEV);
if (!rte_is_valid_assigned_ether_addr(addr))
return -EINVAL;
+ port_id = port_id_parse(port_id, &vf);
dev = &rte_eth_devices[port_id];
- RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mac_addr_set, -ENOTSUP);
-
- ret = (*dev->dev_ops->mac_addr_set)(dev, addr);
+ ret = ETH_DEV_OP_CALL(dev, vf, mac_addr_set, addr);
if (ret < 0)
return ret;
+ if (vf)
+ return 0;
+
/* Update default address in NIC data structure */
rte_ether_addr_copy(addr, &dev->data->mac_addrs[0]);
--
2.23.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* [dpdk-dev] [RFC PATCH] net/i[xgb|40]e: set VF MAC address from host
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 2/3] ethdev: set VF MAC address " Thomas Monjalon
@ 2019-11-01 0:18 ` Thomas Monjalon
0 siblings, 0 replies; 36+ messages in thread
From: Thomas Monjalon @ 2019-11-01 0:18 UTC (permalink / raw)
To: Beilei Xing, Qi Zhang, Wenzhuo Lu, Konstantin Ananyev
Cc: dev, ferruh.yigit, bernard.iremonger, daniels, alexz,
declan.doherty, john.mcnamara
Allow to configure the default MAC address of a VF
via its representor port in the host,
when the VF is explicitly targeted with RTE_ETH_PORT_VF_FLAG.
In the long term, the representor behaviour should be changed
to configure itself when the VF is not targeted.
This RFC patch will be split for ixgbe and i40e.
If we can agree on implementing this VF API,
OVS will be have a generic solution.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
drivers/net/i40e/i40e_vf_representor.c | 19 ++++++++++++++++++-
drivers/net/ixgbe/ixgbe_vf_representor.c | 19 ++++++++++++++++++-
2 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
index b07b35c03f..e944288959 100644
--- a/drivers/net/i40e/i40e_vf_representor.c
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -325,7 +325,7 @@ i40e_vf_representor_mac_addr_remove(struct rte_eth_dev *ethdev, uint32_t index)
}
static int
-i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
+i40e_remote_vf_mac_addr_set(struct rte_eth_dev *ethdev,
struct rte_ether_addr *mac_addr)
{
struct i40e_vf_representor *representor = ethdev->data->dev_private;
@@ -335,6 +335,17 @@ i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
representor->vf_id, mac_addr);
}
+static int
+i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
+ struct rte_ether_addr *mac_addr)
+{
+ /* Should configure the representor port
+ * but configure the VF port instead,
+ * for compatibility purpose.
+ */
+ return i40e_remote_vf_mac_addr_set(ethdev, mac_addr);
+}
+
static int
i40e_vf_representor_vlan_filter_set(struct rte_eth_dev *ethdev,
uint16_t vlan_id, int on)
@@ -453,6 +464,11 @@ static const struct eth_dev_ops i40e_representor_dev_ops = {
};
+/* operations on VF from representor */
+const struct eth_dev_ops i40e_remote_vf_ops = {
+ .mac_addr_set = i40e_remote_vf_mac_addr_set,
+};
+
static uint16_t
i40e_vf_representor_rx_burst(__rte_unused void *rx_queue,
__rte_unused struct rte_mbuf **rx_pkts, __rte_unused uint16_t nb_pkts)
@@ -491,6 +507,7 @@ i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
/* Set representor device ops */
ethdev->dev_ops = &i40e_representor_dev_ops;
+ ethdev->vf_ops = &i40e_remote_vf_ops;
/* No data-path, but need stub Rx/Tx functions to avoid crash
* when testing with the likes of testpmd.
diff --git a/drivers/net/ixgbe/ixgbe_vf_representor.c b/drivers/net/ixgbe/ixgbe_vf_representor.c
index dbbef294ae..17fccaf4f3 100644
--- a/drivers/net/ixgbe/ixgbe_vf_representor.c
+++ b/drivers/net/ixgbe/ixgbe_vf_representor.c
@@ -24,7 +24,7 @@ ixgbe_vf_representor_link_update(struct rte_eth_dev *ethdev,
}
static int
-ixgbe_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
+ixgbe_remote_vf_mac_addr_set(struct rte_eth_dev *ethdev,
struct rte_ether_addr *mac_addr)
{
struct ixgbe_vf_representor *representor = ethdev->data->dev_private;
@@ -34,6 +34,17 @@ ixgbe_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
representor->vf_id, mac_addr);
}
+static int
+ixgbe_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
+ struct rte_ether_addr *mac_addr)
+{
+ /* Should configure the representor port
+ * but configure the VF port instead,
+ * for compatibility purpose.
+ */
+ return ixgbe_remote_vf_mac_addr_set(ethdev, mac_addr);
+}
+
static int
ixgbe_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
struct rte_eth_dev_info *dev_info)
@@ -155,6 +166,11 @@ static const struct eth_dev_ops ixgbe_vf_representor_dev_ops = {
.mac_addr_set = ixgbe_vf_representor_mac_addr_set,
};
+/* operations on VF from representor */
+const struct eth_dev_ops ixgbe_remote_vf_ops = {
+ .mac_addr_set = ixgbe_remote_vf_mac_addr_set,
+};
+
static uint16_t
ixgbe_vf_representor_rx_burst(__rte_unused void *rx_queue,
__rte_unused struct rte_mbuf **rx_pkts, __rte_unused uint16_t nb_pkts)
@@ -198,6 +214,7 @@ ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
/* Set representor device ops */
ethdev->dev_ops = &ixgbe_vf_representor_dev_ops;
+ ethdev->vf_ops = &ixgbe_remote_vf_ops;
/* No data-path, but need stub Rx/Tx functions to avoid crash
* when testing with the likes of testpmd.
--
2.23.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* [dpdk-dev] [PATCH v2 3/3] net/mlx5: set VF MAC address from host
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 0/3] " Thomas Monjalon
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 1/3] ethdev: identify " Thomas Monjalon
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 2/3] ethdev: set VF MAC address " Thomas Monjalon
@ 2019-10-29 18:50 ` Thomas Monjalon
2019-10-30 4:08 ` [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF " Jerin Jacob
2019-10-30 15:07 ` Ilya Maximets
4 siblings, 0 replies; 36+ messages in thread
From: Thomas Monjalon @ 2019-10-29 18:50 UTC (permalink / raw)
To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko; +Cc: dev
Allow to configure the default MAC address of a VF
via its representor port in the host.
This patch is a stub to demonstrate how to implement a VF operation.
The real code is not implemented.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
drivers/net/mlx5/mlx5.c | 6 ++++++
drivers/net/mlx5/mlx5.h | 1 +
drivers/net/mlx5/mlx5_mac.c | 19 +++++++++++++++++++
3 files changed, 26 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fac510507d..b679b5e299 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1073,6 +1073,11 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
.get_module_eeprom = mlx5_get_module_eeprom,
};
+/* Available operations for VF connected to a representor. */
+const struct eth_dev_ops mlx5_remote_vf_ops = {
+ .mac_addr_set = mlx5_vf_mac_addr_set,
+};
+
/**
* Verify and store value for device argument.
*
@@ -2130,6 +2135,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
eth_dev->rx_pkt_burst = removed_rx_burst;
eth_dev->tx_pkt_burst = removed_tx_burst;
eth_dev->dev_ops = &mlx5_dev_ops;
+ eth_dev->vf_ops = &mlx5_remote_vf_ops;
/* Register MAC address. */
claim_zero(mlx5_mac_addr_add(eth_dev, &mac, 0, 0));
if (config.vf && config.vf_nl_en)
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b6a51b2b4d..e34a24ee7c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -754,6 +754,7 @@ void mlx5_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
int mlx5_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
uint32_t index, uint32_t vmdq);
int mlx5_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr);
+int mlx5_vf_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac);
int mlx5_set_mc_addr_list(struct rte_eth_dev *dev,
struct rte_ether_addr *mc_addr_set,
uint32_t nb_mc_addr);
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 0ffef5c5db..4c4dd55b86 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -202,6 +202,25 @@ mlx5_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
return mlx5_mac_addr_add(dev, mac_addr, 0, 0);
}
+/**
+ * DPDK callback to set primary MAC address of the remote VF.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure of a representor.
+ * @param mac_addr
+ * MAC address to register in the remote VF.
+ *
+ * @return
+ * 0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_vf_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac)
+{
+ DRV_LOG(DEBUG, "VF represented by port %u setting primary MAC address",
+ dev->data->port_id);
+ /* TODO: should not be this dev but its VF */
+ return mlx5_nl_mac_addr_add(dev, mac, 0);
+}
+
/**
* DPDK callback to set multicast addresses list.
*
--
2.23.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 0/3] " Thomas Monjalon
` (2 preceding siblings ...)
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 3/3] net/mlx5: " Thomas Monjalon
@ 2019-10-30 4:08 ` Jerin Jacob
2019-10-30 7:22 ` Shahaf Shuler
2019-10-30 8:56 ` Thomas Monjalon
2019-10-30 15:07 ` Ilya Maximets
4 siblings, 2 replies; 36+ messages in thread
From: Jerin Jacob @ 2019-10-30 4:08 UTC (permalink / raw)
To: Thomas Monjalon, Stephen Hemminger, Andrew Rybchenko, Ferruh Yigit
Cc: dpdk-dev
On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> In a virtual environment, the network controller may have to configure
> some SR-IOV VF parameters for security reasons.
Just to understand, Could you explain more details/examples for
security reasons?
>
> When the PF (host port) is driven by DPDK (OVS-DPDK case),
> we face two different cases:
> - driver is bifurcated (Mellanox case),
> so the VF can be configured via the kernel.
> - driver is on top of UIO or VFIO, so DPDK API is required,
Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from
the PF device.
It is only allowed through igb-uio out of tree driver without iommu support.
> and PMD-specific APIs were used.
> This new generic API will avoid vendors fragmentation.
The API is good. But I have concerns about the vendor implementation
of this API.
It can support only vendors with bifurcated driver(Mellanox case).
or using igb_uio(non iommu case) but not the devices with VFIO(Which
is the first-class citizen).
All the control plane control stuff to replace Linux with "port
representor" logic
will be of the mercy of an "out of tree" driver either with igb_uio
or http://patches.dpdk.org/patch/58810/
I am _not against_ on DPDK supports port representor or controlling
netdev VF traffic,
but if we have taken that path then DPDK should have the
infrastructure to support for
all driver models like VFIO(Addressed in [1])
I would have this question when DPDK starts supporting port
representor(but I was not
aware that kernel security issue on netdev ports controlled by DPDK in
non-bifurcated driver case
and concise effort block such scheme by kernel [2])
[1]
http://patches.dpdk.org/patch/58810/
[2]
https://patchwork.kernel.org/patch/10522381/
>
> Some PMD-specific API could migrate to this generic model.
> As an example, the default MAC address configuration is demonstrated
> for a VF mapped to mlx5 representor port.
>
> As it breaks the ABI, I propose to merge this API in DPDK 19.11-rc2.
>
> I am sorry I had not send a patch since proposing a RFC in August.
> (I gave priority to the summit and the -rc1 release)
>
>
> Thomas Monjalon (3):
> ethdev: identify SR-IOV VF from host
> ethdev: set VF MAC address from host
> net/mlx5: set VF MAC address from host
>
> drivers/net/mlx5/mlx5.c | 6 +++
> drivers/net/mlx5/mlx5.h | 1 +
> drivers/net/mlx5/mlx5_mac.c | 19 ++++++++
> lib/librte_ethdev/rte_ethdev.c | 55 +++++++++++++++++++++---
> lib/librte_ethdev/rte_ethdev.h | 38 ++++++++++++++++
> lib/librte_ethdev/rte_ethdev_core.h | 1 +
> lib/librte_ethdev/rte_ethdev_version.map | 1 +
> 7 files changed, 114 insertions(+), 7 deletions(-)
>
> --
> 2.23.0
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-30 4:08 ` [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF " Jerin Jacob
@ 2019-10-30 7:22 ` Shahaf Shuler
2019-10-30 9:24 ` Jerin Jacob
2019-10-30 8:56 ` Thomas Monjalon
1 sibling, 1 reply; 36+ messages in thread
From: Shahaf Shuler @ 2019-10-30 7:22 UTC (permalink / raw)
To: Jerin Jacob, Thomas Monjalon, Stephen Hemminger,
Andrew Rybchenko, Ferruh Yigit
Cc: dpdk-dev
Wednesday, October 30, 2019 6:09 AM, Jerin Jacob:
> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> host
>
> On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> >
> > In a virtual environment, the network controller may have to configure
> > some SR-IOV VF parameters for security reasons.
>
> Just to understand, Could you explain more details/examples for security
> reasons?
>
> >
> > When the PF (host port) is driven by DPDK (OVS-DPDK case), we face two
> > different cases:
> > - driver is bifurcated (Mellanox case),
> > so the VF can be configured via the kernel.
> > - driver is on top of UIO or VFIO, so DPDK API is required,
>
> Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from the
> PF device.
> It is only allowed through igb-uio out of tree driver without iommu support.
Per my understanding Thomas proposal is not to create the VFs from the PF device. it is to configure their network attributes from the PF after they have been created.
>
>
> > and PMD-specific APIs were used.
> > This new generic API will avoid vendors fragmentation.
>
> The API is good. But I have concerns about the vendor implementation of
> this API.
> It can support only vendors with bifurcated driver(Mellanox case).
> or using igb_uio(non iommu case) but not the devices with VFIO(Which is the
> first-class citizen).
>
> All the control plane control stuff to replace Linux with "port representor"
> logic will be of the mercy of an "out of tree" driver either with igb_uio or
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatch
> es.dpdk.org%2Fpatch%2F58810%2F&data=02%7C01%7Cshahafs%40mel
> lanox.com%7C8da6ffacecde48af24f608d75ceee28c%7Ca652971c7d2e4d9ba6
> a4d149256f461b%7C0%7C0%7C637080053397844419&sdata=sAIRqTnAN
> G8lIb2eYhvcylU%2F6%2F81eXPDeGbnrUdMnis%3D&reserved=0
I am not sure I follow.
Device that supports representor should enable the HV to configure their macs. It is the best if it can allow it using the in-tree drivers (VFIO, Mellanox bifurcated..) by using, for example, so device registers on the device bar.
Otherwise such vendor will need to recommend its customers to use other, out of tree, driver to get the needed functionality to enable switchdev and representors.
>
> I am _not against_ on DPDK supports port representor or controlling netdev
> VF traffic, but if we have taken that path then DPDK should have the
> infrastructure to support for all driver models like VFIO(Addressed in [1])
>
> I would have this question when DPDK starts supporting port
> representor(but I was not aware that kernel security issue on netdev ports
> controlled by DPDK in non-bifurcated driver case and concise effort block
> such scheme by kernel [2])
>
>
> [1]
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatch
> es.dpdk.org%2Fpatch%2F58810%2F&data=02%7C01%7Cshahafs%40mel
> lanox.com%7C8da6ffacecde48af24f608d75ceee28c%7Ca652971c7d2e4d9ba6
> a4d149256f461b%7C0%7C0%7C637080053397844419&sdata=sAIRqTnAN
> G8lIb2eYhvcylU%2F6%2F81eXPDeGbnrUdMnis%3D&reserved=0
> [2]
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatch
> work.kernel.org%2Fpatch%2F10522381%2F&data=02%7C01%7Cshahafs
> %40mellanox.com%7C8da6ffacecde48af24f608d75ceee28c%7Ca652971c7d2e
> 4d9ba6a4d149256f461b%7C0%7C0%7C637080053397844419&sdata=fyEo
> fHJQM51L8ssvLNyaLwrsCK8bBJiuPT%2FgMje3QxE%3D&reserved=0
>
>
>
> >
> > Some PMD-specific API could migrate to this generic model.
> > As an example, the default MAC address configuration is demonstrated
> > for a VF mapped to mlx5 representor port.
> >
> > As it breaks the ABI, I propose to merge this API in DPDK 19.11-rc2.
> >
> > I am sorry I had not send a patch since proposing a RFC in August.
> > (I gave priority to the summit and the -rc1 release)
> >
> >
> > Thomas Monjalon (3):
> > ethdev: identify SR-IOV VF from host
> > ethdev: set VF MAC address from host
> > net/mlx5: set VF MAC address from host
> >
> > drivers/net/mlx5/mlx5.c | 6 +++
> > drivers/net/mlx5/mlx5.h | 1 +
> > drivers/net/mlx5/mlx5_mac.c | 19 ++++++++
> > lib/librte_ethdev/rte_ethdev.c | 55 +++++++++++++++++++++---
> > lib/librte_ethdev/rte_ethdev.h | 38 ++++++++++++++++
> > lib/librte_ethdev/rte_ethdev_core.h | 1 +
> > lib/librte_ethdev/rte_ethdev_version.map | 1 +
> > 7 files changed, 114 insertions(+), 7 deletions(-)
> >
> > --
> > 2.23.0
> >
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-30 7:22 ` Shahaf Shuler
@ 2019-10-30 9:24 ` Jerin Jacob
2019-11-01 0:24 ` Thomas Monjalon
0 siblings, 1 reply; 36+ messages in thread
From: Jerin Jacob @ 2019-10-30 9:24 UTC (permalink / raw)
To: Shahaf Shuler
Cc: Thomas Monjalon, Stephen Hemminger, Andrew Rybchenko,
Ferruh Yigit, dpdk-dev
On Wed, Oct 30, 2019 at 12:52 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Wednesday, October 30, 2019 6:09 AM, Jerin Jacob:
> > Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> > host
> >
> > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
> > <thomas@monjalon.net> wrote:
> > >
> > > In a virtual environment, the network controller may have to configure
> > > some SR-IOV VF parameters for security reasons.
> >
> > Just to understand, Could you explain more details/examples for security
> > reasons?
> >
> > >
> > > When the PF (host port) is driven by DPDK (OVS-DPDK case), we face two
> > > different cases:
> > > - driver is bifurcated (Mellanox case),
> > > so the VF can be configured via the kernel.
> > > - driver is on top of UIO or VFIO, so DPDK API is required,
> >
> > Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from the
> > PF device.
> > It is only allowed through igb-uio out of tree driver without iommu support.
>
> Per my understanding Thomas proposal is not to create the VFs from the PF device. it is to configure their network attributes from the PF after they have been created.
Yes. My question is without creating the VF, How do you control them?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-30 9:24 ` Jerin Jacob
@ 2019-11-01 0:24 ` Thomas Monjalon
2019-11-01 9:06 ` Ilya Maximets
0 siblings, 1 reply; 36+ messages in thread
From: Thomas Monjalon @ 2019-11-01 0:24 UTC (permalink / raw)
To: Jerin Jacob
Cc: dev, Shahaf Shuler, Stephen Hemminger, Andrew Rybchenko, Ferruh Yigit
30/10/2019 10:24, Jerin Jacob:
> On Wed, Oct 30, 2019 at 12:52 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
> > Wednesday, October 30, 2019 6:09 AM, Jerin Jacob:
> > > Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> > > host
> > >
> > > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
> > > <thomas@monjalon.net> wrote:
> > > >
> > > > In a virtual environment, the network controller may have to configure
> > > > some SR-IOV VF parameters for security reasons.
> > >
> > > Just to understand, Could you explain more details/examples for security
> > > reasons?
> > >
> > > >
> > > > When the PF (host port) is driven by DPDK (OVS-DPDK case), we face two
> > > > different cases:
> > > > - driver is bifurcated (Mellanox case),
> > > > so the VF can be configured via the kernel.
> > > > - driver is on top of UIO or VFIO, so DPDK API is required,
> > >
> > > Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from the
> > > PF device.
> > > It is only allowed through igb-uio out of tree driver without iommu support.
> >
> > Per my understanding Thomas proposal is not to create the VFs
> > from the PF device. it is to configure their network attributes
> > from the PF after they have been created.
>
> Yes. My question is without creating the VF, How do you control them?
We can create the VF via the kernel PF driver, before binding the PF to VFIO.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-01 0:24 ` Thomas Monjalon
@ 2019-11-01 9:06 ` Ilya Maximets
2019-11-01 9:56 ` Ilya Maximets
0 siblings, 1 reply; 36+ messages in thread
From: Ilya Maximets @ 2019-11-01 9:06 UTC (permalink / raw)
To: Thomas Monjalon, Jerin Jacob
Cc: dev, Shahaf Shuler, Stephen Hemminger, Andrew Rybchenko, Ferruh Yigit
On 01.11.2019 1:24, Thomas Monjalon wrote:
> 30/10/2019 10:24, Jerin Jacob:
>> On Wed, Oct 30, 2019 at 12:52 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>>> Wednesday, October 30, 2019 6:09 AM, Jerin Jacob:
>>>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>>>> host
>>>>
>>>> On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
>>>> <thomas@monjalon.net> wrote:
>>>>>
>>>>> In a virtual environment, the network controller may have to configure
>>>>> some SR-IOV VF parameters for security reasons.
>>>>
>>>> Just to understand, Could you explain more details/examples for security
>>>> reasons?
>>>>
>>>>>
>>>>> When the PF (host port) is driven by DPDK (OVS-DPDK case), we face two
>>>>> different cases:
>>>>> - driver is bifurcated (Mellanox case),
>>>>> so the VF can be configured via the kernel.
>>>>> - driver is on top of UIO or VFIO, so DPDK API is required,
>>>>
>>>> Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from the
>>>> PF device.
>>>> It is only allowed through igb-uio out of tree driver without iommu support.
>>>
>>> Per my understanding Thomas proposal is not to create the VFs
>>> from the PF device. it is to configure their network attributes
>>> from the PF after they have been created.
>>
>> Yes. My question is without creating the VF, How do you control them?
>
> We can create the VF via the kernel PF driver, before binding the PF to VFIO.
AFAIK, this is not possible. VFs are gone as soon as you're unbinding kernel
PF driver. And after binding of vfio-pci you can no longer create VFs.
I tried to check some representor functionality about 2 months ago and didn't
find a way to enable VFs on Intel NICs if PF is under control of vfio-pci.
Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-01 9:06 ` Ilya Maximets
@ 2019-11-01 9:56 ` Ilya Maximets
0 siblings, 0 replies; 36+ messages in thread
From: Ilya Maximets @ 2019-11-01 9:56 UTC (permalink / raw)
To: Ilya Maximets, Thomas Monjalon, Jerin Jacob
Cc: dev, Shahaf Shuler, Stephen Hemminger, Andrew Rybchenko, Ferruh Yigit
On 01.11.2019 10:06, Ilya Maximets wrote:
> On 01.11.2019 1:24, Thomas Monjalon wrote:
>> 30/10/2019 10:24, Jerin Jacob:
>>> On Wed, Oct 30, 2019 at 12:52 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>>>> Wednesday, October 30, 2019 6:09 AM, Jerin Jacob:
>>>>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>>>>> host
>>>>>
>>>>> On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
>>>>> <thomas@monjalon.net> wrote:
>>>>>>
>>>>>> In a virtual environment, the network controller may have to configure
>>>>>> some SR-IOV VF parameters for security reasons.
>>>>>
>>>>> Just to understand, Could you explain more details/examples for security
>>>>> reasons?
>>>>>
>>>>>>
>>>>>> When the PF (host port) is driven by DPDK (OVS-DPDK case), we face two
>>>>>> different cases:
>>>>>> - driver is bifurcated (Mellanox case),
>>>>>> so the VF can be configured via the kernel.
>>>>>> - driver is on top of UIO or VFIO, so DPDK API is required,
>>>>>
>>>>> Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from the
>>>>> PF device.
>>>>> It is only allowed through igb-uio out of tree driver without iommu support.
>>>>
>>>> Per my understanding Thomas proposal is not to create the VFs
>>>> from the PF device. it is to configure their network attributes
>>>> from the PF after they have been created.
>>>
>>> Yes. My question is without creating the VF, How do you control them?
>>
>> We can create the VF via the kernel PF driver, before binding the PF to VFIO.
>
> AFAIK, this is not possible. VFs are gone as soon as you're unbinding kernel
> PF driver. And after binding of vfio-pci you can no longer create VFs.
>
> I tried to check some representor functionality about 2 months ago and didn't
> find a way to enable VFs on Intel NICs if PF is under control of vfio-pci.
Likely it was i40e driver from the kernel side. I see from the lkml thread that
some drivers might not clear sriov on exit (ixgbe), but that wasn't my case and
it's actually a controversial feature in general.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-30 4:08 ` [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF " Jerin Jacob
2019-10-30 7:22 ` Shahaf Shuler
@ 2019-10-30 8:56 ` Thomas Monjalon
2019-10-30 9:15 ` Jerin Jacob
1 sibling, 1 reply; 36+ messages in thread
From: Thomas Monjalon @ 2019-10-30 8:56 UTC (permalink / raw)
To: Jerin Jacob; +Cc: Stephen Hemminger, Andrew Rybchenko, Ferruh Yigit, dpdk-dev
30/10/2019 05:08, Jerin Jacob:
> On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > In a virtual environment, the network controller may have to configure
> > some SR-IOV VF parameters for security reasons.
>
> Just to understand, Could you explain more details/examples for
> security reasons?
Examples are setting the MAC address or the promiscuous mode.
These settings should be decided by the hypervisor,
and not freely set by the VM.
> > When the PF (host port) is driven by DPDK (OVS-DPDK case),
> > we face two different cases:
> > - driver is bifurcated (Mellanox case),
> > so the VF can be configured via the kernel.
> > - driver is on top of UIO or VFIO, so DPDK API is required,
>
> Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from
> the PF device.
> It is only allowed through igb-uio out of tree driver without iommu support.
Not sure I said the contrary.
igb_uio and vfio_pf are 2 implementations of UIO and VFIO.
> > and PMD-specific APIs were used.
> > This new generic API will avoid vendors fragmentation.
>
> The API is good. But I have concerns about the vendor implementation
> of this API.
> It can support only vendors with bifurcated driver(Mellanox case).
> or using igb_uio(non iommu case) but not the devices with VFIO(Which
> is the first-class citizen).
Why not? (see above)
The API is agnostic to the kernel driver in use.
> All the control plane control stuff to replace Linux with "port
> representor" logic
> will be of the mercy of an "out of tree" driver either with igb_uio
> or http://patches.dpdk.org/patch/58810/
>
> I am _not against_ on DPDK supports port representor or controlling
> netdev VF traffic,
> but if we have taken that path then DPDK should have the
> infrastructure to support for
> all driver models like VFIO(Addressed in [1])
>
> I would have this question when DPDK starts supporting port
> representor(but I was not
> aware that kernel security issue on netdev ports controlled by DPDK in
> non-bifurcated driver case
> and concise effort block such scheme by kernel [2])
>
>
> [1]
> http://patches.dpdk.org/patch/58810/
> [2]
> https://patchwork.kernel.org/patch/10522381/
I feel you are using this thread to promote your vfio_pf implementation.
But this API and the kernel module are orthogonals.
Please can we focus on the DPDK API in this thread,
and talk about VFIO implementation in the other thread?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-30 8:56 ` Thomas Monjalon
@ 2019-10-30 9:15 ` Jerin Jacob
2019-11-01 0:33 ` Thomas Monjalon
0 siblings, 1 reply; 36+ messages in thread
From: Jerin Jacob @ 2019-10-30 9:15 UTC (permalink / raw)
To: Thomas Monjalon
Cc: Stephen Hemminger, Andrew Rybchenko, Ferruh Yigit, dpdk-dev
On Wed, Oct 30, 2019 at 2:26 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 30/10/2019 05:08, Jerin Jacob:
> > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > In a virtual environment, the network controller may have to configure
> > > some SR-IOV VF parameters for security reasons.
> >
> > Just to understand, Could you explain more details/examples for
> > security reasons?
>
> Examples are setting the MAC address or the promiscuous mode.
> These settings should be decided by the hypervisor,
> and not freely set by the VM.
What is hypervisor here, rte_flow based DPDK application over using
port representor?
>
> > > When the PF (host port) is driven by DPDK (OVS-DPDK case),
> > > we face two different cases:
> > > - driver is bifurcated (Mellanox case),
> > > so the VF can be configured via the kernel.
> > > - driver is on top of UIO or VFIO, so DPDK API is required,
> >
> > Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from
> > the PF device.
> > It is only allowed through igb-uio out of tree driver without iommu support.
>
> Not sure I said the contrary.
> igb_uio and vfio_pf are 2 implementations of UIO and VFIO.
Yes. I am saying without out of tree module it is not possible.
>
> > > and PMD-specific APIs were used.
> > > This new generic API will avoid vendors fragmentation.
> >
> > The API is good. But I have concerns about the vendor implementation
> > of this API.
> > It can support only vendors with bifurcated driver(Mellanox case).
> > or using igb_uio(non iommu case) but not the devices with VFIO(Which
> > is the first-class citizen).
>
> Why not? (see above)
> The API is agnostic to the kernel driver in use.
My question is how do you enable in DPDK with VFIO if DPDK is giving
this feature?
>
> > All the control plane control stuff to replace Linux with "port
> > representor" logic
> > will be of the mercy of an "out of tree" driver either with igb_uio
> > or http://patches.dpdk.org/patch/58810/
> >
> > I am _not against_ on DPDK supports port representor or controlling
> > netdev VF traffic,
> > but if we have taken that path then DPDK should have the
> > infrastructure to support for
> > all driver models like VFIO(Addressed in [1])
> >
> > I would have this question when DPDK starts supporting port
> > representor(but I was not
> > aware that kernel security issue on netdev ports controlled by DPDK in
> > non-bifurcated driver case
> > and concise effort block such scheme by kernel [2])
> >
> >
> > [1]
> > http://patches.dpdk.org/patch/58810/
> > [2]
> > https://patchwork.kernel.org/patch/10522381/
>
> I feel you are using this thread to promote your vfio_pf implementation.
Yes. I am. Because, I need to think, How I can enable this API with VFIO.
Otherwise, I can not implement this API.
> But this API and the kernel module are orthogonals.
> Please can we focus on the DPDK API in this thread,
> and talk about VFIO implementation in the other thread?
Yes. My concerns are we keep adding APIs without thinking about how it
can be implemented
in the overall scheme of things. Just API is not enough.
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-30 9:15 ` Jerin Jacob
@ 2019-11-01 0:33 ` Thomas Monjalon
2019-11-01 11:01 ` Jerin Jacob
2019-11-01 13:25 ` Jerin Jacob
0 siblings, 2 replies; 36+ messages in thread
From: Thomas Monjalon @ 2019-11-01 0:33 UTC (permalink / raw)
To: Jerin Jacob; +Cc: dev, Stephen Hemminger, Andrew Rybchenko, Ferruh Yigit
30/10/2019 10:15, Jerin Jacob:
> On Wed, Oct 30, 2019 at 2:26 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 30/10/2019 05:08, Jerin Jacob:
> > > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > >
> > > > In a virtual environment, the network controller may have to configure
> > > > some SR-IOV VF parameters for security reasons.
> > >
> > > Just to understand, Could you explain more details/examples for
> > > security reasons?
> >
> > Examples are setting the MAC address or the promiscuous mode.
> > These settings should be decided by the hypervisor,
> > and not freely set by the VM.
>
> What is hypervisor here, rte_flow based DPDK application over using
> port representor?
Yes, something like that. An example is OpenStack/OVS.
> > > > When the PF (host port) is driven by DPDK (OVS-DPDK case),
> > > > we face two different cases:
> > > > - driver is bifurcated (Mellanox case),
> > > > so the VF can be configured via the kernel.
> > > > - driver is on top of UIO or VFIO, so DPDK API is required,
> > >
> > > Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from
> > > the PF device.
> > > It is only allowed through igb-uio out of tree driver without iommu support.
> >
> > Not sure I said the contrary.
> > igb_uio and vfio_pf are 2 implementations of UIO and VFIO.
>
> Yes. I am saying without out of tree module it is not possible.
>
> > > > and PMD-specific APIs were used.
> > > > This new generic API will avoid vendors fragmentation.
> > >
> > > The API is good. But I have concerns about the vendor implementation
> > > of this API.
> > > It can support only vendors with bifurcated driver(Mellanox case).
> > > or using igb_uio(non iommu case) but not the devices with VFIO(Which
> > > is the first-class citizen).
> >
> > Why not? (see above)
> > The API is agnostic to the kernel driver in use.
>
> My question is how do you enable in DPDK with VFIO if DPDK is giving
> this feature?
I didn't get your question.
If it is the same question as before, I think you can create the VF
before binding the PF to VFIO.
> > > All the control plane control stuff to replace Linux with "port
> > > representor" logic
> > > will be of the mercy of an "out of tree" driver either with igb_uio
> > > or http://patches.dpdk.org/patch/58810/
> > >
> > > I am _not against_ on DPDK supports port representor or controlling
> > > netdev VF traffic,
> > > but if we have taken that path then DPDK should have the
> > > infrastructure to support for
> > > all driver models like VFIO(Addressed in [1])
> > >
> > > I would have this question when DPDK starts supporting port
> > > representor(but I was not
> > > aware that kernel security issue on netdev ports controlled by DPDK in
> > > non-bifurcated driver case
> > > and concise effort block such scheme by kernel [2])
> > >
> > >
> > > [1]
> > > http://patches.dpdk.org/patch/58810/
> > > [2]
> > > https://patchwork.kernel.org/patch/10522381/
> >
> > I feel you are using this thread to promote your vfio_pf implementation.
>
> Yes. I am. Because, I need to think, How I can enable this API with VFIO.
> Otherwise, I can not implement this API.
>
> > But this API and the kernel module are orthogonals.
> > Please can we focus on the DPDK API in this thread,
> > and talk about VFIO implementation in the other thread?
>
> Yes. My concerns are we keep adding APIs without thinking about how it
> can be implemented
> in the overall scheme of things. Just API is not enough.
We keep thinking about all scenarios (maybe in other threads).
And adding an API is a step in the right direction in my opinion.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-01 0:33 ` Thomas Monjalon
@ 2019-11-01 11:01 ` Jerin Jacob
2019-11-01 13:25 ` Jerin Jacob
1 sibling, 0 replies; 36+ messages in thread
From: Jerin Jacob @ 2019-11-01 11:01 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dpdk-dev, Stephen Hemminger, Andrew Rybchenko, Ferruh Yigit
On Fri, Nov 1, 2019 at 6:03 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 30/10/2019 10:15, Jerin Jacob:
> > On Wed, Oct 30, 2019 at 2:26 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 30/10/2019 05:08, Jerin Jacob:
> > > > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > >
> > > > > In a virtual environment, the network controller may have to configure
> > > > > some SR-IOV VF parameters for security reasons.
> > > >
> > > > Just to understand, Could you explain more details/examples for
> > > > security reasons?
> > >
> > > Examples are setting the MAC address or the promiscuous mode.
> > > These settings should be decided by the hypervisor,
> > > and not freely set by the VM.
> >
> > What is hypervisor here, rte_flow based DPDK application over using
> > port representor?
>
> Yes, something like that. An example is OpenStack/OVS.
> > > > When the PF (host port) is driven by DPDK (OVS-DPDK case),
> > > > > we face two different cases:
> > > > > - driver is bifurcated (Mellanox case),
> > > > > so the VF can be configured via the kernel.
> > > > > - driver is on top of UIO or VFIO, so DPDK API is required,
> > > >
> > > > Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from
> > > > the PF device.
> > > > It is only allowed through igb-uio out of tree driver without iommu support.
> > >
> > > Not sure I said the contrary.
> > > igb_uio and vfio_pf are 2 implementations of UIO and VFIO.
> >
> > Yes. I am saying without out of tree module it is not possible.
> >
> > > > > and PMD-specific APIs were used.
> > > > > This new generic API will avoid vendors fragmentation.
> > > >
> > > > The API is good. But I have concerns about the vendor implementation
> > > > of this API.
> > > > It can support only vendors with bifurcated driver(Mellanox case).
> > > > or using igb_uio(non iommu case) but not the devices with VFIO(Which
> > > > is the first-class citizen).
> > >
> > > Why not? (see above)
> > > The API is agnostic to the kernel driver in use.
> >
> > My question is how do you enable in DPDK with VFIO if DPDK is giving
> > this feature?
>
> I didn't get your question.
> If it is the same question as before, I think you can create the VF
> before binding the PF to VFIO.
No more. Following patch made to stop that hack.
https://patchwork.kernel.org/patch/10522381/
>
> > > > All the control plane control stuff to replace Linux with "port
> > > > representor" logic
> > > > will be of the mercy of an "out of tree" driver either with igb_uio
> > > > or http://patches.dpdk.org/patch/58810/
> > > >
> > > > I am _not against_ on DPDK supports port representor or controlling
> > > > netdev VF traffic,
> > > > but if we have taken that path then DPDK should have the
> > > > infrastructure to support for
> > > > all driver models like VFIO(Addressed in [1])
> > > >
> > > > I would have this question when DPDK starts supporting port
> > > > representor(but I was not
> > > > aware that kernel security issue on netdev ports controlled by DPDK in
> > > > non-bifurcated driver case
> > > > and concise effort block such scheme by kernel [2])
> > > >
> > > >
> > > > [1]
> > > > http://patches.dpdk.org/patch/58810/
> > > > [2]
> > > > https://patchwork.kernel.org/patch/10522381/
> > >
> > > I feel you are using this thread to promote your vfio_pf implementation.
> >
> > Yes. I am. Because, I need to think, How I can enable this API with VFIO.
> > Otherwise, I can not implement this API.
> >
> > > But this API and the kernel module are orthogonals.
> > > Please can we focus on the DPDK API in this thread,
> > > and talk about VFIO implementation in the other thread?
> >
> > Yes. My concerns are we keep adding APIs without thinking about how it
> > can be implemented
> > in the overall scheme of things. Just API is not enough.
>
> We keep thinking about all scenarios (maybe in other threads).
> And adding an API is a step in the right direction in my opinion.
I am not against DPDK adding the API.
My concerns are Linux kernel folks don't like the parrel
infrastructure we are building
with port representor bypassing Linux kernel tools and infrastructure tools.
ie, DPDK controlling the Netdev VFS due to security issues.
And I agree. The Mellanox case is not an issue as bifurcated driver
architecture.
But all other vendors will have the issue. (using UIO or VFIO)
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-01 0:33 ` Thomas Monjalon
2019-11-01 11:01 ` Jerin Jacob
@ 2019-11-01 13:25 ` Jerin Jacob
2019-11-03 6:31 ` Shahaf Shuler
1 sibling, 1 reply; 36+ messages in thread
From: Jerin Jacob @ 2019-11-01 13:25 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dpdk-dev, Stephen Hemminger, Andrew Rybchenko, Ferruh Yigit
On Fri, Nov 1, 2019 at 6:03 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 30/10/2019 10:15, Jerin Jacob:
> > On Wed, Oct 30, 2019 at 2:26 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 30/10/2019 05:08, Jerin Jacob:
> > > > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > >
> > > > > In a virtual environment, the network controller may have to configure
> > > > > some SR-IOV VF parameters for security reasons.
> > > >
> > > > Just to understand, Could you explain more details/examples for
> > > > security reasons?
> > >
> > > Examples are setting the MAC address or the promiscuous mode.
> > > These settings should be decided by the hypervisor,
> > > and not freely set by the VM.
> >
> > What is hypervisor here, rte_flow based DPDK application over using
> > port representor?
>
> Yes, something like that. An example is OpenStack/OVS.
So it is more of an orchestration primitive. Not the security
primitive as mentioned above,
Because in case the kernel will approve the data set from the guest. Right?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-01 13:25 ` Jerin Jacob
@ 2019-11-03 6:31 ` Shahaf Shuler
0 siblings, 0 replies; 36+ messages in thread
From: Shahaf Shuler @ 2019-11-03 6:31 UTC (permalink / raw)
To: Jerin Jacob, Thomas Monjalon
Cc: dpdk-dev, Stephen Hemminger, Andrew Rybchenko, Ferruh Yigit
Friday, November 1, 2019 3:25 PM, Jerin Jacob:
> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> host
>
> On Fri, Nov 1, 2019 at 6:03 AM Thomas Monjalon <thomas@monjalon.net>
> wrote:
> >
> > 30/10/2019 10:15, Jerin Jacob:
> > > On Wed, Oct 30, 2019 at 2:26 PM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> > > >
> > > > 30/10/2019 05:08, Jerin Jacob:
> > > > > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> > > > > >
> > > > > > In a virtual environment, the network controller may have to
> > > > > > configure some SR-IOV VF parameters for security reasons.
> > > > >
> > > > > Just to understand, Could you explain more details/examples for
> > > > > security reasons?
> > > >
> > > > Examples are setting the MAC address or the promiscuous mode.
> > > > These settings should be decided by the hypervisor, and not freely
> > > > set by the VM.
> > >
> > > What is hypervisor here, rte_flow based DPDK application over using
> > > port representor?
> >
> > Yes, something like that. An example is OpenStack/OVS.
>
> So it is more of an orchestration primitive. Not the security primitive as
> mentioned above,
Yes this feature is to support deployment w/ OpenStack and DPDK vSwitch using switchdev (representors).
One must configure the guest VF permanent MAC. w/ OVS kernel this is not an issue as all drivers bind to their kernel module.
w/ DPDK standard API is needed.
w/ Switchdev, even though the VF are promisc, all traffic is validated by the vSwitch running on the HV. Only after flow validation vSwitch will insert rule for device to directly fwd the packet to its target location.
Because in case the kernel will approve the data set from
> the guest. Right?
The exact behavior is device specific. But the kernel/user space driver on the HV is required to listen to events of recv mode change from VF and act upon.
All recv mode modification by guest should be validated by kernel, however this is a completely diff issue than what discussed here.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-29 18:50 ` [dpdk-dev] [PATCH v2 0/3] " Thomas Monjalon
` (3 preceding siblings ...)
2019-10-30 4:08 ` [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF " Jerin Jacob
@ 2019-10-30 15:07 ` Ilya Maximets
2019-10-30 15:49 ` Thomas Monjalon
4 siblings, 1 reply; 36+ messages in thread
From: Ilya Maximets @ 2019-10-30 15:07 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Shahaf Shuler, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger
On 29.10.2019 19:50, Thomas Monjalon wrote:
> In a virtual environment, the network controller may have to configure
> some SR-IOV VF parameters for security reasons.
>
> When the PF (host port) is driven by DPDK (OVS-DPDK case),
> we face two different cases:
> - driver is bifurcated (Mellanox case),
> so the VF can be configured via the kernel.
> - driver is on top of UIO or VFIO, so DPDK API is required,
> and PMD-specific APIs were used.
So, what is wrong with setting VF mac via the representor port as
it done now? From our previous discussion, I thought that we concluded
that is enough to have current API, i.e. just call set_default_mac()
for a representor port and this will lead to setting mac for VF.
This is how it's implemented in Linux kernel and this is how it's
implemented in current DPDK drivers that supports setting mac for
the representor.
The only use case for this new API is to be able to control mac of
the representor itself, which doesn't make much sense. At least there
are only hypothetical use cases. And once again, Linux kernel doesn't
support this behavior.
> This new generic API will avoid vendors fragmentation.
I don't see the fragmentation. Right now you can set VF mac from DPDK
by calling set_default_mac() for representor. This API exists for all
vendors. Not implemented for Intel, but new API is not implemented too.
The this is that this new API will produce conceptual fragmentation
between DPDK and the Linux kernel, because to do the same thing you'll
have to use different ways. I mean, to change mac of VF in kernel you
need to set mac to the representor, but in DPDK changing setting mac to
representor will lead to changing the mac of the representor itself,
not the VF. This will be really confusing for users.
>
> Some PMD-specific API could migrate to this generic model.
> As an example, the default MAC address configuration is demonstrated
> for a VF mapped to mlx5 representor port.
>
> As it breaks the ABI, I propose to merge this API in DPDK 19.11-rc2.
>
> I am sorry I had not send a patch since proposing a RFC in August.
> (I gave priority to the summit and the -rc1 release)
>
>
> Thomas Monjalon (3):
> ethdev: identify SR-IOV VF from host
> ethdev: set VF MAC address from host
> net/mlx5: set VF MAC address from host
>
> drivers/net/mlx5/mlx5.c | 6 +++
> drivers/net/mlx5/mlx5.h | 1 +
> drivers/net/mlx5/mlx5_mac.c | 19 ++++++++
> lib/librte_ethdev/rte_ethdev.c | 55 +++++++++++++++++++++---
> lib/librte_ethdev/rte_ethdev.h | 38 ++++++++++++++++
> lib/librte_ethdev/rte_ethdev_core.h | 1 +
> lib/librte_ethdev/rte_ethdev_version.map | 1 +
> 7 files changed, 114 insertions(+), 7 deletions(-)
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-30 15:07 ` Ilya Maximets
@ 2019-10-30 15:49 ` Thomas Monjalon
2019-10-30 16:09 ` Ilya Maximets
0 siblings, 1 reply; 36+ messages in thread
From: Thomas Monjalon @ 2019-10-30 15:49 UTC (permalink / raw)
To: Ilya Maximets
Cc: dev, Shahaf Shuler, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, declan.doherty,
bernard.iremonger, ferruh.yigit, arybchenko
30/10/2019 16:07, Ilya Maximets:
> On 29.10.2019 19:50, Thomas Monjalon wrote:
> > In a virtual environment, the network controller may have to configure
> > some SR-IOV VF parameters for security reasons.
> >
> > When the PF (host port) is driven by DPDK (OVS-DPDK case),
> > we face two different cases:
> > - driver is bifurcated (Mellanox case),
> > so the VF can be configured via the kernel.
> > - driver is on top of UIO or VFIO, so DPDK API is required,
> > and PMD-specific APIs were used.
>
> So, what is wrong with setting VF mac via the representor port as
> it done now? From our previous discussion, I thought that we concluded
> that is enough to have current API, i.e. just call set_default_mac()
> for a representor port and this will lead to setting mac for VF.
> This is how it's implemented in Linux kernel and this is how it's
> implemented in current DPDK drivers that supports setting mac for
> the representor.
I don't know what is done in the Linux kernel.
In DPDK, setting the MAC of the representor is really setting
the MAC of the representor. Is it crazy?
> The only use case for this new API is to be able to control mac of
> the representor itself, which doesn't make much sense. At least there
> are only hypothetical use cases. And once again, Linux kernel doesn't
> support this behavior.
I think it is sane to be able to set different MAC addresses
for the representor and the VF.
> > This new generic API will avoid vendors fragmentation.
>
> I don't see the fragmentation. Right now you can set VF mac from DPDK
> by calling set_default_mac() for representor. This API exists for all
> vendors. Not implemented for Intel, but new API is not implemented too.
No, the current situation is the following:
- for mlx5, VF MAC can be configured with iproute2 or netlink
- for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
- for i40e, rte_pmd_i40e_set_vf_mac_addr()
- for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
> The this is that this new API will produce conceptual fragmentation
> between DPDK and the Linux kernel, because to do the same thing you'll
> have to use different ways. I mean, to change mac of VF in kernel you
> need to set mac to the representor, but in DPDK changing setting mac to
> representor will lead to changing the mac of the representor itself,
> not the VF. This will be really confusing for users.
I am not responsible of the choices in Linux.
But I agree it would be interesting to check the reason of such decision.
Rony, please could you explain?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-30 15:49 ` Thomas Monjalon
@ 2019-10-30 16:09 ` Ilya Maximets
2019-10-30 21:42 ` Thomas Monjalon
0 siblings, 1 reply; 36+ messages in thread
From: Ilya Maximets @ 2019-10-30 16:09 UTC (permalink / raw)
To: Thomas Monjalon, Ilya Maximets
Cc: dev, Shahaf Shuler, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, declan.doherty,
bernard.iremonger
On 30.10.2019 16:49, Thomas Monjalon wrote:
> 30/10/2019 16:07, Ilya Maximets:
>> On 29.10.2019 19:50, Thomas Monjalon wrote:
>>> In a virtual environment, the network controller may have to configure
>>> some SR-IOV VF parameters for security reasons.
>>>
>>> When the PF (host port) is driven by DPDK (OVS-DPDK case),
>>> we face two different cases:
>>> - driver is bifurcated (Mellanox case),
>>> so the VF can be configured via the kernel.
>>> - driver is on top of UIO or VFIO, so DPDK API is required,
>>> and PMD-specific APIs were used.
>>
>> So, what is wrong with setting VF mac via the representor port as
>> it done now? From our previous discussion, I thought that we concluded
>> that is enough to have current API, i.e. just call set_default_mac()
>> for a representor port and this will lead to setting mac for VF.
>> This is how it's implemented in Linux kernel and this is how it's
>> implemented in current DPDK drivers that supports setting mac for
>> the representor.
>
> I don't know what is done in the Linux kernel.
> In DPDK, setting the MAC of the representor is really setting
> the MAC of the representor. Is it crazy?
Kind of. And no, it doesn't work this way in DPDK.
Just look at the i40e driver:
325 static int
326 i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
327 struct ether_addr *mac_addr)
328 {
329 struct i40e_vf_representor *representor = ethdev->data->dev_private;
330
331 return rte_pmd_i40e_set_vf_mac_addr(
332 representor->adapter->eth_dev->data->port_id,
333 representor->vf_id, mac_addr);
334 }
....
423 static const struct eth_dev_ops i40e_representor_dev_ops = {
<...>
445 .mac_addr_set = i40e_vf_representor_mac_addr_set,
It really calls the function to set VF mac address.
And for MLX drivers it's the same.
MLX driver call netlink to set representor MAC, but netlink is in
kernel and this call inside the kernel translates to the setting
of mac address of the VF.
Am I missing something?
>
>> The only use case for this new API is to be able to control mac of
>> the representor itself, which doesn't make much sense. At least there
>> are only hypothetical use cases. And once again, Linux kernel doesn't
>> support this behavior.
>
> I think it is sane to be able to set different MAC addresses
> for the representor and the VF.
>
>>> This new generic API will avoid vendors fragmentation.
>>
>> I don't see the fragmentation. Right now you can set VF mac from DPDK
>> by calling set_default_mac() for representor. This API exists for all
>> vendors. Not implemented for Intel, but new API is not implemented too.
>
> No, the current situation is the following:
> - for mlx5, VF MAC can be configured with iproute2 or netlink
> - for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
> - for i40e, rte_pmd_i40e_set_vf_mac_addr()
> - for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
>
>> The this is that this new API will produce conceptual fragmentation
>> between DPDK and the Linux kernel, because to do the same thing you'll
>> have to use different ways. I mean, to change mac of VF in kernel you
>> need to set mac to the representor, but in DPDK changing setting mac to
>> representor will lead to changing the mac of the representor itself,
>> not the VF. This will be really confusing for users.
>
> I am not responsible of the choices in Linux.
> But I agree it would be interesting to check the reason of such decision.
> Rony, please could you explain?
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-30 16:09 ` Ilya Maximets
@ 2019-10-30 21:42 ` Thomas Monjalon
2019-11-01 9:32 ` Ilya Maximets
0 siblings, 1 reply; 36+ messages in thread
From: Thomas Monjalon @ 2019-10-30 21:42 UTC (permalink / raw)
To: Ilya Maximets
Cc: dev, Shahaf Shuler, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, declan.doherty,
bernard.iremonger, ajit.khaparde
30/10/2019 17:09, Ilya Maximets:
> On 30.10.2019 16:49, Thomas Monjalon wrote:
> > 30/10/2019 16:07, Ilya Maximets:
> >> On 29.10.2019 19:50, Thomas Monjalon wrote:
> >>> In a virtual environment, the network controller may have to configure
> >>> some SR-IOV VF parameters for security reasons.
> >>>
> >>> When the PF (host port) is driven by DPDK (OVS-DPDK case),
> >>> we face two different cases:
> >>> - driver is bifurcated (Mellanox case),
> >>> so the VF can be configured via the kernel.
> >>> - driver is on top of UIO or VFIO, so DPDK API is required,
> >>> and PMD-specific APIs were used.
> >>
> >> So, what is wrong with setting VF mac via the representor port as
> >> it done now? From our previous discussion, I thought that we concluded
> >> that is enough to have current API, i.e. just call set_default_mac()
> >> for a representor port and this will lead to setting mac for VF.
> >> This is how it's implemented in Linux kernel and this is how it's
> >> implemented in current DPDK drivers that supports setting mac for
> >> the representor.
> >
> > I don't know what is done in the Linux kernel.
> > In DPDK, setting the MAC of the representor is really setting
> > the MAC of the representor. Is it crazy?
>
> Kind of. And no, it doesn't work this way in DPDK.
> Just look at the i40e driver:
>
> 325 static int
> 326 i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
> 327 struct ether_addr *mac_addr)
> 328 {
> 329 struct i40e_vf_representor *representor = ethdev->data->dev_private;
> 330
> 331 return rte_pmd_i40e_set_vf_mac_addr(
> 332 representor->adapter->eth_dev->data->port_id,
> 333 representor->vf_id, mac_addr);
> 334 }
> ....
> 423 static const struct eth_dev_ops i40e_representor_dev_ops = {
> <...>
> 445 .mac_addr_set = i40e_vf_representor_mac_addr_set,
>
>
> It really calls the function to set VF mac address.
Indeed, I missed that i40e_vf_representor_mac_addr_set() is calling
rte_pmd_i40e_set_vf_mac_addr().
> And for MLX drivers it's the same.
> MLX driver call netlink to set representor MAC, but netlink is in
> kernel and this call inside the kernel translates to the setting
> of mac address of the VF.
>
> Am I missing something?
I am not sure about this kernel translation. At least not in mlx5.
Setting MAC address of a VF representor seems not supported on mlx5.
But there is a specific netlink request to set a VF MAC address:
ip link set <PF> vf <VF> mac <MAC>
> >> The only use case for this new API is to be able to control mac of
> >> the representor itself, which doesn't make much sense. At least there
> >> are only hypothetical use cases. And once again, Linux kernel doesn't
> >> support this behavior.
> >
> > I think it is sane to be able to set different MAC addresses
> > for the representor and the VF.
Let me explain better my thoughts.
In the switchdev design, VF and uplink ports are all connected
together via a switch.
The representors are mirrors of those switch ports.
So a VF representor port is supposed to mirror the switch port
where a VF is connected to. It is different of the VF itself.
This is a drawing of my understanding of switchdev design:
VF1 ------ VF1 rep /--------\
| switch | uplink rep ------ uplink ------ wire
VF2 ------ VF2 rep \--------/ (PF)
Of course, there can be more VF and uplink ports.
With some switch-aware protocols, it might be interesting to have
different MAC addresses on a VF and its representor.
And more generally, I imagine that the config of a VF representor
could be different of the config of a VF.
> >>> This new generic API will avoid vendors fragmentation.
> >>
> >> I don't see the fragmentation. Right now you can set VF mac from DPDK
> >> by calling set_default_mac() for representor. This API exists for all
> >> vendors. Not implemented for Intel, but new API is not implemented too.
> >
> > No, the current situation is the following:
> > - for mlx5, VF MAC can be configured with iproute2 or netlink
> > - for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
> > - for i40e, rte_pmd_i40e_set_vf_mac_addr()
> > - for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
Thanks to Ilya's comment, this is an update of the DPDK situation:
- for mlx5, VF MAC can be configured with iproute2 or netlink
and rte_eth_dev_default_mac_addr_set(rep) is not supported
- for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
and rte_eth_dev_default_mac_addr_set(rep) does the same
- for i40e, rte_pmd_i40e_set_vf_mac_addr()
and rte_eth_dev_default_mac_addr_set(rep) does the same
- for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
and no representor
If we consider what Intel did, i.e. configure VF in place of representor
for some operations, there are two drawbacks:
- confusing that some ops apply to representor, others apply to VF
- some ops are not possible on representor (because targetted to VF)
I still feel that the addition of one single bit in the port ID
is an elegant solution to target either the VF or its representor.
> >> The this is that this new API will produce conceptual fragmentation
> >> between DPDK and the Linux kernel, because to do the same thing you'll
> >> have to use different ways. I mean, to change mac of VF in kernel you
> >> need to set mac to the representor, but in DPDK changing setting mac to
> >> representor will lead to changing the mac of the representor itself,
> >> not the VF. This will be really confusing for users.
> >
> > I am not responsible of the choices in Linux.
> > But I agree it would be interesting to check the reason of such decision.
> > Rony, please could you explain?
I looked at few Linux drivers:
bnxt_vf_rep_netdev_ops has no op to set MAC
bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
lio_vf_rep_ndev_ops has no op to set MAC
lionetdevops.ndo_set_vf_mac = set VF MAC from PF
mlx5e_netdev_ops_rep has no op to set MAC
mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from PF
nfp_repr_netdev_ops.ndo_set_mac_address = set representor MAC
nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from representor
nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
There is a big chance that the behaviour is not standardized in Linux
(as usual). So it is already confusing for users of Linux.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-10-30 21:42 ` Thomas Monjalon
@ 2019-11-01 9:32 ` Ilya Maximets
2019-11-03 6:48 ` Shahaf Shuler
0 siblings, 1 reply; 36+ messages in thread
From: Ilya Maximets @ 2019-11-01 9:32 UTC (permalink / raw)
To: Thomas Monjalon, Ilya Maximets
Cc: dev, Shahaf Shuler, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, declan.doherty,
bernard.iremonger, ajit.khaparde
On 30.10.2019 22:42, Thomas Monjalon wrote:
> 30/10/2019 17:09, Ilya Maximets:
>> On 30.10.2019 16:49, Thomas Monjalon wrote:
>>> 30/10/2019 16:07, Ilya Maximets:
>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
>>>>> In a virtual environment, the network controller may have to configure
>>>>> some SR-IOV VF parameters for security reasons.
>>>>>
>>>>> When the PF (host port) is driven by DPDK (OVS-DPDK case),
>>>>> we face two different cases:
>>>>> - driver is bifurcated (Mellanox case),
>>>>> so the VF can be configured via the kernel.
>>>>> - driver is on top of UIO or VFIO, so DPDK API is required,
>>>>> and PMD-specific APIs were used.
>>>>
>>>> So, what is wrong with setting VF mac via the representor port as
>>>> it done now? From our previous discussion, I thought that we concluded
>>>> that is enough to have current API, i.e. just call set_default_mac()
>>>> for a representor port and this will lead to setting mac for VF.
>>>> This is how it's implemented in Linux kernel and this is how it's
>>>> implemented in current DPDK drivers that supports setting mac for
>>>> the representor.
>>>
>>> I don't know what is done in the Linux kernel.
>>> In DPDK, setting the MAC of the representor is really setting
>>> the MAC of the representor. Is it crazy?
>>
>> Kind of. And no, it doesn't work this way in DPDK.
>> Just look at the i40e driver:
>>
>> 325 static int
>> 326 i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
>> 327 struct ether_addr *mac_addr)
>> 328 {
>> 329 struct i40e_vf_representor *representor = ethdev->data->dev_private;
>> 330
>> 331 return rte_pmd_i40e_set_vf_mac_addr(
>> 332 representor->adapter->eth_dev->data->port_id,
>> 333 representor->vf_id, mac_addr);
>> 334 }
>> ....
>> 423 static const struct eth_dev_ops i40e_representor_dev_ops = {
>> <...>
>> 445 .mac_addr_set = i40e_vf_representor_mac_addr_set,
>>
>>
>> It really calls the function to set VF mac address.
>
> Indeed, I missed that i40e_vf_representor_mac_addr_set() is calling
> rte_pmd_i40e_set_vf_mac_addr().
>
>> And for MLX drivers it's the same.
>> MLX driver call netlink to set representor MAC, but netlink is in
>> kernel and this call inside the kernel translates to the setting
>> of mac address of the VF.
>>
>> Am I missing something?
>
> I am not sure about this kernel translation. At least not in mlx5.
> Setting MAC address of a VF representor seems not supported on mlx5.
> But there is a specific netlink request to set a VF MAC address:
> ip link set <PF> vf <VF> mac <MAC>
>
>
>>>> The only use case for this new API is to be able to control mac of
>>>> the representor itself, which doesn't make much sense. At least there
>>>> are only hypothetical use cases. And once again, Linux kernel doesn't
>>>> support this behavior.
>>>
>>> I think it is sane to be able to set different MAC addresses
>>> for the representor and the VF.
>
> Let me explain better my thoughts.
> In the switchdev design, VF and uplink ports are all connected
> together via a switch.
> The representors are mirrors of those switch ports.
> So a VF representor port is supposed to mirror the switch port
> where a VF is connected to. It is different of the VF itself.
>
> This is a drawing of my understanding of switchdev design:
>
> VF1 ------ VF1 rep /--------\
> | switch | uplink rep ------ uplink ------ wire
> VF2 ------ VF2 rep \--------/ (PF)
>
> Of course, there can be more VF and uplink ports.
>
> With some switch-aware protocols, it might be interesting to have
> different MAC addresses on a VF and its representor.
> And more generally, I imagine that the config of a VF representor
> could be different of the config of a VF.
>
>
>>>>> This new generic API will avoid vendors fragmentation.
>>>>
>>>> I don't see the fragmentation. Right now you can set VF mac from DPDK
>>>> by calling set_default_mac() for representor. This API exists for all
>>>> vendors. Not implemented for Intel, but new API is not implemented too.
>>>
>>> No, the current situation is the following:
>>> - for mlx5, VF MAC can be configured with iproute2 or netlink
>>> - for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
>>> - for i40e, rte_pmd_i40e_set_vf_mac_addr()
>>> - for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
>
> Thanks to Ilya's comment, this is an update of the DPDK situation:
> - for mlx5, VF MAC can be configured with iproute2 or netlink
> and rte_eth_dev_default_mac_addr_set(rep) is not supported
> - for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
> and rte_eth_dev_default_mac_addr_set(rep) does the same
> - for i40e, rte_pmd_i40e_set_vf_mac_addr()
> and rte_eth_dev_default_mac_addr_set(rep) does the same
> - for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
> and no representor
>
> If we consider what Intel did, i.e. configure VF in place of representor
> for some operations, there are two drawbacks:
> - confusing that some ops apply to representor, others apply to VF
> - some ops are not possible on representor (because targetted to VF)
>
> I still feel that the addition of one single bit in the port ID
> is an elegant solution to target either the VF or its representor.
Since we already have a confusion about what is configured when
operations are performed on a representor port we have 2 options:
1. Have this proposed API to configure representor itself while
setting config to representor and configuring VF if special
bit enabled.
2. Reverse the logic of current proposal, i.e. always apply
configuration to VF while working with representor and apply
configuration to representor itself if special bit is set.
I'd probably prefer option #2, because:
- From the OVS and OpenStack point of view, I think, we don't
really need to configure representor itself in most cases.
And OVS really should not know if it works with representor
or some real port.
- It seems that most of the existing code in DPDK already works
like this, i.e. applying configs to VF itself. Intel drivers
works like this and Mellanox drivers, as Thomas said, doesn't
have this functionality at all.
>
>
>>>> The this is that this new API will produce conceptual fragmentation
>>>> between DPDK and the Linux kernel, because to do the same thing you'll
>>>> have to use different ways. I mean, to change mac of VF in kernel you
>>>> need to set mac to the representor, but in DPDK changing setting mac to
>>>> representor will lead to changing the mac of the representor itself,
>>>> not the VF. This will be really confusing for users.
>>>
>>> I am not responsible of the choices in Linux.
>>> But I agree it would be interesting to check the reason of such decision.
>>> Rony, please could you explain?
>
> I looked at few Linux drivers:
>
> bnxt_vf_rep_netdev_ops has no op to set MAC
> bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>
> lio_vf_rep_ndev_ops has no op to set MAC
> lionetdevops.ndo_set_vf_mac = set VF MAC from PF
>
> mlx5e_netdev_ops_rep has no op to set MAC
> mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from PF
>
> nfp_repr_netdev_ops.ndo_set_mac_address = set representor MAC
> nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from representor
> nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>
> There is a big chance that the behaviour is not standardized in Linux
> (as usual). So it is already confusing for users of Linux.
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-01 9:32 ` Ilya Maximets
@ 2019-11-03 6:48 ` Shahaf Shuler
2019-11-03 15:27 ` Ananyev, Konstantin
2019-11-04 10:28 ` Ilya Maximets
0 siblings, 2 replies; 36+ messages in thread
From: Shahaf Shuler @ 2019-11-03 6:48 UTC (permalink / raw)
To: Ilya Maximets, Thomas Monjalon
Cc: dev, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, declan.doherty,
bernard.iremonger, ajit.khaparde
Friday, November 1, 2019 11:33 AM, Ilya Maximets:
> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> host
>
> On 30.10.2019 22:42, Thomas Monjalon wrote:
> > 30/10/2019 17:09, Ilya Maximets:
> >> On 30.10.2019 16:49, Thomas Monjalon wrote:
> >>> 30/10/2019 16:07, Ilya Maximets:
> >>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
> >>>>> In a virtual environment, the network controller may have to
> >>>>> configure some SR-IOV VF parameters for security reasons.
> >>>>>
[...]
> > If we consider what Intel did, i.e. configure VF in place of
> > representor for some operations, there are two drawbacks:
> > - confusing that some ops apply to representor, others apply to VF
> > - some ops are not possible on representor (because targetted to VF)
> >
> > I still feel that the addition of one single bit in the port ID is an
> > elegant solution to target either the VF or its representor.
>
> Since we already have a confusion about what is configured when operations
> are performed on a representor port we have 2 options:
I don't agree we have. I don't think there is any design note or API doc that says the ethdev configuration on representor should be applied on VF (please share if I missed it).
The fact that there are some drivers that implemented it doesn't mean it is correct.
> 1. Have this proposed API to configure representor itself while
> setting config to representor and configuring VF if special
> bit enabled.
> 2. Reverse the logic of current proposal, i.e. always apply
> configuration to VF while working with representor and apply
> configuration to representor itself if special bit is set.
>
> I'd probably prefer option #2, because:
> - From the OVS and OpenStack point of view, I think, we don't
> really need to configure representor itself in most cases.
> And OVS really should not know if it works with representor
> or some real port.
I don't thinks OVS can be really agnostic to the fact it runs on top of representors:
1. probing of representor has different command line -w <bdf>,representor=XXX
2. the whole acceleration framework based on insertion of flow rules for direct forward from the VF to target entity. Rules are applied on the representor and would not work if port is not such.
3. some multi-port devices cannot do direct fwd between its different port. This is why rep has switch_id and application should query it and act upon.
4. representor carry the VF port id. This is how application know to which VF (or vport) they associated with on their other side.
> - It seems that most of the existing code in DPDK already works
> like this, i.e. applying configs to VF itself. Intel drivers
> works like this and Mellanox drivers, as Thomas said, doesn't
> have this functionality at all.
As I said above, I don't think we need to refer to specific driver behavior, rather the API guidelines.
To me, it is a bit strange and not natural that ethdev configuration is applied to different port w/o any explicit request from the application.
This is why I would prefer #1 above.
>
> >
> >
> >>>> The this is that this new API will produce conceptual fragmentation
> >>>> between DPDK and the Linux kernel, because to do the same thing
> >>>> you'll have to use different ways. I mean, to change mac of VF in
> >>>> kernel you need to set mac to the representor, but in DPDK changing
> >>>> setting mac to representor will lead to changing the mac of the
> >>>> representor itself, not the VF. This will be really confusing for users.
> >>>
> >>> I am not responsible of the choices in Linux.
> >>> But I agree it would be interesting to check the reason of such decision.
> >>> Rony, please could you explain?
> >
> > I looked at few Linux drivers:
> >
> > bnxt_vf_rep_netdev_ops has no op to set MAC
> > bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> >
> > lio_vf_rep_ndev_ops has no op to set MAC
> > lionetdevops.ndo_set_vf_mac = set VF MAC from PF
> >
> > mlx5e_netdev_ops_rep has no op to set MAC
> > mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> > mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from
> PF
> >
> > nfp_repr_netdev_ops.ndo_set_mac_address = set representor
> MAC
> > nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from
> representor
> > nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> >
> > There is a big chance that the behaviour is not standardized in Linux
> > (as usual). So it is already confusing for users of Linux.
> >
> >
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-03 6:48 ` Shahaf Shuler
@ 2019-11-03 15:27 ` Ananyev, Konstantin
2019-11-03 22:09 ` Thomas Monjalon
2019-11-04 10:28 ` Ilya Maximets
1 sibling, 1 reply; 36+ messages in thread
From: Ananyev, Konstantin @ 2019-11-03 15:27 UTC (permalink / raw)
To: Shahaf Shuler, Ilya Maximets, Thomas Monjalon
Cc: dev, Jerin Jacob, Andrew Rybchenko, Yigit, Ferruh,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, Doherty, Declan,
Iremonger, Bernard, ajit.khaparde
> > > If we consider what Intel did, i.e. configure VF in place of
> > > representor for some operations, there are two drawbacks:
> > > - confusing that some ops apply to representor, others apply to VF
> > > - some ops are not possible on representor (because targetted to VF)
> > >
> > > I still feel that the addition of one single bit in the port ID is an
> > > elegant solution to target either the VF or its representor.
> >
> > Since we already have a confusion about what is configured when operations
> > are performed on a representor port we have 2 options:
>
> I don't agree we have. I don't think there is any design note or API doc that says the ethdev configuration on representor should be applied
> on VF (please share if I missed it).
> The fact that there are some drivers that implemented it doesn't mean it is correct.
Well, it means that at least authors and reviewers of these patches,
plus probably next-net maintainers believe that it is correct.
At least they did - when patch was applied.
If that is not clearly stated in the doc - it might be just the gap in the documentation,
that needs to be fixed, not a mandate to break existing behavior.
>
> > 1. Have this proposed API to configure representor itself while
> > setting config to representor and configuring VF if special
> > bit enabled.
> > 2. Reverse the logic of current proposal, i.e. always apply
> > configuration to VF while working with representor and apply
> > configuration to representor itself if special bit is set.
> >
> > I'd probably prefer option #2, because:
> > - From the OVS and OpenStack point of view, I think, we don't
> > really need to configure representor itself in most cases.
> > And OVS really should not know if it works with representor
> > or some real port.
+1 to keep existing behavior if possible.
> I don't thinks OVS can be really agnostic to the fact it runs on top of representors:
> 1. probing of representor has different command line -w <bdf>,representor=XXX
> 2. the whole acceleration framework based on insertion of flow rules for direct forward from the VF to target entity. Rules are applied on
> the representor and would not work if port is not such.
> 3. some multi-port devices cannot do direct fwd between its different port. This is why rep has switch_id and application should query it and
> act upon.
> 4. representor carry the VF port id. This is how application know to which VF (or vport) they associated with on their other side.
>
> > - It seems that most of the existing code in DPDK already works
> > like this, i.e. applying configs to VF itself. Intel drivers
> > works like this and Mellanox drivers, as Thomas said, doesn't
> > have this functionality at all.
>
> As I said above, I don't think we need to refer to specific driver behavior, rather the API guidelines.
> To me, it is a bit strange and not natural that ethdev configuration is applied to different port w/o any explicit request from the application.
> This is why I would prefer #1 above.
>
> >
> > >
> > >
> > >>>> The this is that this new API will produce conceptual fragmentation
> > >>>> between DPDK and the Linux kernel, because to do the same thing
> > >>>> you'll have to use different ways. I mean, to change mac of VF in
> > >>>> kernel you need to set mac to the representor, but in DPDK changing
> > >>>> setting mac to representor will lead to changing the mac of the
> > >>>> representor itself, not the VF. This will be really confusing for users.
> > >>>
> > >>> I am not responsible of the choices in Linux.
> > >>> But I agree it would be interesting to check the reason of such decision.
> > >>> Rony, please could you explain?
> > >
> > > I looked at few Linux drivers:
> > >
> > > bnxt_vf_rep_netdev_ops has no op to set MAC
> > > bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> > >
> > > lio_vf_rep_ndev_ops has no op to set MAC
> > > lionetdevops.ndo_set_vf_mac = set VF MAC from PF
> > >
> > > mlx5e_netdev_ops_rep has no op to set MAC
> > > mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> > > mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from
> > PF
> > >
> > > nfp_repr_netdev_ops.ndo_set_mac_address = set representor
> > MAC
> > > nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from
> > representor
> > > nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> > >
> > > There is a big chance that the behaviour is not standardized in Linux
> > > (as usual). So it is already confusing for users of Linux.
> > >
> > >
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-03 15:27 ` Ananyev, Konstantin
@ 2019-11-03 22:09 ` Thomas Monjalon
2019-11-07 14:44 ` Thomas Monjalon
0 siblings, 1 reply; 36+ messages in thread
From: Thomas Monjalon @ 2019-11-03 22:09 UTC (permalink / raw)
To: Ananyev, Konstantin
Cc: Shahaf Shuler, Ilya Maximets, dev, Jerin Jacob, Andrew Rybchenko,
Yigit, Ferruh, Stephen Hemminger, Roni Bar Yanai, Rony Efraim,
Doherty, Declan, Iremonger, Bernard, ajit.khaparde,
david.marchand
03/11/2019 16:27, Ananyev, Konstantin:
>
> > > > If we consider what Intel did, i.e. configure VF in place of
> > > > representor for some operations, there are two drawbacks:
> > > > - confusing that some ops apply to representor, others apply to VF
> > > > - some ops are not possible on representor (because targetted to VF)
> > > >
> > > > I still feel that the addition of one single bit in the port ID is an
> > > > elegant solution to target either the VF or its representor.
> > >
> > > Since we already have a confusion about what is configured when operations
> > > are performed on a representor port we have 2 options:
> >
> > I don't agree we have. I don't think there is any design note or API doc that says the ethdev configuration on representor should be applied
> > on VF (please share if I missed it).
> > The fact that there are some drivers that implemented it doesn't mean it is correct.
>
> Well, it means that at least authors and reviewers of these patches,
> plus probably next-net maintainers believe that it is correct.
> At least they did - when patch was applied.
> If that is not clearly stated in the doc - it might be just the gap in the documentation,
Gap in the documentation? We should state that the config should be applied
to the port specified with port_id and no other one? Funny
> that needs to be fixed, not a mandate to break existing behavior.
So because you managed to have a wrong patch applied in your PMD,
you want to make it the generic API in ethdev?
What a process!
Hey guys, if you want to change an API behaviour in a way others don't,
you just have to implement what you want in your PMD silently,
then you will be able to change the API to comply with your behaviour.
Wonderful.
If we allow such practice, DPDK is dead.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-03 22:09 ` Thomas Monjalon
@ 2019-11-07 14:44 ` Thomas Monjalon
0 siblings, 0 replies; 36+ messages in thread
From: Thomas Monjalon @ 2019-11-07 14:44 UTC (permalink / raw)
To: dev
Cc: Ananyev, Konstantin, Shahaf Shuler, Ilya Maximets, Jerin Jacob,
Andrew Rybchenko, Yigit, Ferruh, Stephen Hemminger,
Roni Bar Yanai, Rony Efraim, Doherty, Declan, Iremonger, Bernard,
ajit.khaparde, david.marchand, rasland
Bad news for the DPDK API below:
03/11/2019 23:09, Thomas Monjalon:
> 03/11/2019 16:27, Ananyev, Konstantin:
> >
> > > > > If we consider what Intel did, i.e. configure VF in place of
> > > > > representor for some operations, there are two drawbacks:
> > > > > - confusing that some ops apply to representor, others apply to VF
> > > > > - some ops are not possible on representor (because targetted to VF)
It seems this confusion and limitation is not scary enough to convince
of the right approach.
Most probably this figure was not well understood:
VF1 ------ VF1 rep /--------\
| switch | uplink rep ------ uplink ------ wire
VF2 ------ VF2 rep \--------/ (PF)
Please understand that a packet going from the VF rep to the VF,
is Tx from the representor point of view,
while it is Rx from the VF point of view.
It should explain why VF and rep are different.
One more explanation: doing Rx/Tx from the VF rep is the way,
for a vSwitch, to manage the forwarding in CPU,
before offloading it in the HW switch with flow rules.
After reading how the representor is implemented in Intel PMDs,
I understand there is no intent to do some vSwitch offload with SR-IOV.
The Mellanox PMD is capable of such full offload, that's why
the representor port must be a real DPDK port, not a ghost.
> > > > >
> > > > > I still feel that the addition of one single bit in the port ID is an
> > > > > elegant solution to target either the VF or its representor.
> > > >
> > > > Since we already have a confusion about what is configured when operations
> > > > are performed on a representor port we have 2 options:
> > >
> > > I don't agree we have. I don't think there is any design note or API doc that says the ethdev configuration on representor should be applied
> > > on VF (please share if I missed it).
> > > The fact that there are some drivers that implemented it doesn't mean it is correct.
> >
> > Well, it means that at least authors and reviewers of these patches,
> > plus probably next-net maintainers believe that it is correct.
> > At least they did - when patch was applied.
For information, a Mellanox paper of a netdevconf talk in 2016,
which is giving more information about the ideas behind the switchdev model:
https://netdevconf.info/1.2/papers/efraim-gerlitz-sriov-ovs-final.pdf
> > If that is not clearly stated in the doc - it might be just the gap in the documentation,
>
> Gap in the documentation? We should state that the config should be applied
> to the port specified with port_id and no other one? Funny
>
> > that needs to be fixed, not a mandate to break existing behavior.
>
> So because you managed to have a wrong patch applied in your PMD,
> you want to make it the generic API in ethdev?
> What a process!
>
> Hey guys, if you want to change an API behaviour in a way others don't,
> you just have to implement what you want in your PMD silently,
> then you will be able to change the API to comply with your behaviour.
> Wonderful.
> If we allow such practice, DPDK is dead.
During the Technical Board yesterday, it was decided to go with Intel
understanding of what is a representor, i.e. a ghost of the VF.
It was asked to implement the representor operations on the representor
port with a special flag to make explicit a non-VF operation.
I really tried to implement this reversed logic.
One blocker is to check the port ID flag in every functions,
including the inline Rx/Tx. So I came to the conclusion that the
reverse logic is really ugly and it has no short-term benefit
compared to the mix we have currently.
After long hours thinking in the night, I prefer focusing on the release.
It means we will continue to mix VF and representor operations
with the same port ID. For the record, I believe it is very bad.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-03 6:48 ` Shahaf Shuler
2019-11-03 15:27 ` Ananyev, Konstantin
@ 2019-11-04 10:28 ` Ilya Maximets
2019-11-04 14:30 ` Asaf Penso
2019-11-04 20:33 ` Shahaf Shuler
1 sibling, 2 replies; 36+ messages in thread
From: Ilya Maximets @ 2019-11-04 10:28 UTC (permalink / raw)
To: Shahaf Shuler, Ilya Maximets, Thomas Monjalon
Cc: dev, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, declan.doherty,
bernard.iremonger, ajit.khaparde, Ananyev, Konstantin
On 03.11.2019 7:48, Shahaf Shuler wrote:
> Friday, November 1, 2019 11:33 AM, Ilya Maximets:
>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>> host
>>
>> On 30.10.2019 22:42, Thomas Monjalon wrote:
>>> 30/10/2019 17:09, Ilya Maximets:
>>>> On 30.10.2019 16:49, Thomas Monjalon wrote:
>>>>> 30/10/2019 16:07, Ilya Maximets:
>>>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
>>>>>>> In a virtual environment, the network controller may have to
>>>>>>> configure some SR-IOV VF parameters for security reasons.
>>>>>>>
> [...]
>
>>> If we consider what Intel did, i.e. configure VF in place of
>>> representor for some operations, there are two drawbacks:
>>> - confusing that some ops apply to representor, others apply to VF
>>> - some ops are not possible on representor (because targetted to VF)
>>>
>>> I still feel that the addition of one single bit in the port ID is an
>>> elegant solution to target either the VF or its representor.
>>
>> Since we already have a confusion about what is configured when operations
>> are performed on a representor port we have 2 options:
>
> I don't agree we have. I don't think there is any design note or API doc that says the ethdev configuration on representor should be applied on VF (please share if I missed it).
> The fact that there are some drivers that implemented it doesn't mean it is correct.
>
>> 1. Have this proposed API to configure representor itself while
>> setting config to representor and configuring VF if special
>> bit enabled.
>> 2. Reverse the logic of current proposal, i.e. always apply
>> configuration to VF while working with representor and apply
>> configuration to representor itself if special bit is set.
>>
>> I'd probably prefer option #2, because:
>> - From the OVS and OpenStack point of view, I think, we don't
>> really need to configure representor itself in most cases.
>> And OVS really should not know if it works with representor
>> or some real port.
>
> I don't thinks OVS can be really agnostic to the fact it runs on top of representors:
> 1. probing of representor has different command line -w <bdf>,representor=XXX
OVS doesn't care about content of devargs. It just passes them to hotplug
engine without any parsing (except a single case that must be eliminated
with a proper device iterators, not an OVS issue).
> 2. the whole acceleration framework based on insertion of flow rules for direct forward from the VF to target entity. Rules are applied on the representor and would not work if port is not such.
OVS tries to offload rules to the netdev from which packet was received.
That's it. If it succeeds - OK. If not, OVS doesn't care.
> 3. some multi-port devices cannot do direct fwd between its different port. This is why rep has switch_id and application should query it and act upon.
This is part of offloading engine that doesn't affect the generic code.
If needed, OVS could request switch_id for netdev it tries to offload rules on.
OVS should not know if it representor port or not. If this operation will not
succeed for non-representors, OVS should not care because we can't offload
anything for non-representors anyway.
> 4. representor carry the VF port id. This is how application know to which VF (or vport) they associated with on their other side.
This is just part of devargs, i.e. part of device unique identifier.
Once again, OVS doesn't parse devargs and should not do that.
>
>> - It seems that most of the existing code in DPDK already works
>> like this, i.e. applying configs to VF itself. Intel drivers
>> works like this and Mellanox drivers, as Thomas said, doesn't
>> have this functionality at all.
>
> As I said above, I don't think we need to refer to specific driver behavior, rather the API guidelines.
> To me, it is a bit strange and not natural that ethdev configuration is applied to different port w/o any explicit request from the application.
> This is why I would prefer #1 above.
IMHO, the whole concept of representors is that representor is a
way of attaching same port both to VM and vSwitch/hypervisor.
If you're looking at representors as a separate ports on a switch, well..
In this case, for me VF configuration looks like something that
vSwitch should not do at all, because it should not configure ports
that doesn't attached to it. It's like configuring the other
side of veth pair, which is nonsense.
BTW, I don't know a way to find out if port is a representor of something
or not in Linux kernel.
>
>>
>>>
>>>
>>>>>> The this is that this new API will produce conceptual fragmentation
>>>>>> between DPDK and the Linux kernel, because to do the same thing
>>>>>> you'll have to use different ways. I mean, to change mac of VF in
>>>>>> kernel you need to set mac to the representor, but in DPDK changing
>>>>>> setting mac to representor will lead to changing the mac of the
>>>>>> representor itself, not the VF. This will be really confusing for users.
>>>>>
>>>>> I am not responsible of the choices in Linux.
>>>>> But I agree it would be interesting to check the reason of such decision.
>>>>> Rony, please could you explain?
>>>
>>> I looked at few Linux drivers:
>>>
>>> bnxt_vf_rep_netdev_ops has no op to set MAC
>>> bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>>
>>> lio_vf_rep_ndev_ops has no op to set MAC
>>> lionetdevops.ndo_set_vf_mac = set VF MAC from PF
>>>
>>> mlx5e_netdev_ops_rep has no op to set MAC
>>> mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>> mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from
>> PF
>>>
>>> nfp_repr_netdev_ops.ndo_set_mac_address = set representor
>> MAC
>>> nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from
>> representor
>>> nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>>
>>> There is a big chance that the behaviour is not standardized in Linux
>>> (as usual). So it is already confusing for users of Linux.
>>>
>>>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-04 10:28 ` Ilya Maximets
@ 2019-11-04 14:30 ` Asaf Penso
2019-11-04 14:58 ` Ilya Maximets
2019-11-04 20:33 ` Shahaf Shuler
1 sibling, 1 reply; 36+ messages in thread
From: Asaf Penso @ 2019-11-04 14:30 UTC (permalink / raw)
To: Ilya Maximets, Shahaf Shuler, Thomas Monjalon
Cc: dev, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, declan.doherty,
bernard.iremonger, ajit.khaparde, Ananyev, Konstantin
Regards,
Asaf Penso
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Ilya Maximets
> Sent: Monday, November 4, 2019 12:28 PM
> To: Shahaf Shuler <shahafs@mellanox.com>; Ilya Maximets
> <i.maximets@ovn.org>; Thomas Monjalon <thomas@monjalon.net>
> Cc: dev@dpdk.org; Jerin Jacob <jerinjacobk@gmail.com>; Andrew
> Rybchenko <arybchenko@solarflare.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>; Roni Bar Yanai <roniba@mellanox.com>;
> Rony Efraim <ronye@mellanox.com>; declan.doherty@intel.com;
> bernard.iremonger@intel.com; ajit.khaparde@broadcom.com; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> host
>
> On 03.11.2019 7:48, Shahaf Shuler wrote:
> > Friday, November 1, 2019 11:33 AM, Ilya Maximets:
> >> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> >> host
> >>
> >> On 30.10.2019 22:42, Thomas Monjalon wrote:
> >>> 30/10/2019 17:09, Ilya Maximets:
> >>>> On 30.10.2019 16:49, Thomas Monjalon wrote:
> >>>>> 30/10/2019 16:07, Ilya Maximets:
> >>>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
> >>>>>>> In a virtual environment, the network controller may have to
> >>>>>>> configure some SR-IOV VF parameters for security reasons.
> >>>>>>>
> > [...]
> >
> >>> If we consider what Intel did, i.e. configure VF in place of
> >>> representor for some operations, there are two drawbacks:
> >>> - confusing that some ops apply to representor, others apply to VF
> >>> - some ops are not possible on representor (because targetted to VF)
> >>>
> >>> I still feel that the addition of one single bit in the port ID is an
> >>> elegant solution to target either the VF or its representor.
> >>
> >> Since we already have a confusion about what is configured when
> operations
> >> are performed on a representor port we have 2 options:
> >
> > I don't agree we have. I don't think there is any design note or API doc that
> says the ethdev configuration on representor should be applied on VF
> (please share if I missed it).
> > The fact that there are some drivers that implemented it doesn't mean it is
> correct.
> >
> >> 1. Have this proposed API to configure representor itself while
> >> setting config to representor and configuring VF if special
> >> bit enabled.
> >> 2. Reverse the logic of current proposal, i.e. always apply
> >> configuration to VF while working with representor and apply
> >> configuration to representor itself if special bit is set.
> >>
> >> I'd probably prefer option #2, because:
> >> - From the OVS and OpenStack point of view, I think, we don't
> >> really need to configure representor itself in most cases.
> >> And OVS really should not know if it works with representor
> >> or some real port.
> >
> > I don't thinks OVS can be really agnostic to the fact it runs on top of
> representors:
> > 1. probing of representor has different command line -w
> <bdf>,representor=XXX
>
> OVS doesn't care about content of devargs. It just passes them to hotplug
> engine without any parsing (except a single case that must be eliminated
> with a proper device iterators, not an OVS issue).
>
> > 2. the whole acceleration framework based on insertion of flow rules for
> direct forward from the VF to target entity. Rules are applied on the
> representor and would not work if port is not such.
>
> OVS tries to offload rules to the netdev from which packet was received.
> That's it. If it succeeds - OK. If not, OVS doesn't care.
>
> > 3. some multi-port devices cannot do direct fwd between its different port.
> This is why rep has switch_id and application should query it and act upon.
>
> This is part of offloading engine that doesn't affect the generic code.
> If needed, OVS could request switch_id for netdev it tries to offload rules on.
> OVS should not know if it representor port or not. If this operation will not
> succeed for non-representors, OVS should not care because we can't offload
> anything for non-representors anyway.
>
> > 4. representor carry the VF port id. This is how application know to which
> VF (or vport) they associated with on their other side.
>
> This is just part of devargs, i.e. part of device unique identifier.
> Once again, OVS doesn't parse devargs and should not do that.
>
> >
> >> - It seems that most of the existing code in DPDK already works
> >> like this, i.e. applying configs to VF itself. Intel drivers
> >> works like this and Mellanox drivers, as Thomas said, doesn't
> >> have this functionality at all.
> >
> > As I said above, I don't think we need to refer to specific driver behavior,
> rather the API guidelines.
> > To me, it is a bit strange and not natural that ethdev configuration is applied
> to different port w/o any explicit request from the application.
> > This is why I would prefer #1 above.
>
> IMHO, the whole concept of representors is that representor is a
> way of attaching same port both to VM and vSwitch/hypervisor.
>
> If you're looking at representors as a separate ports on a switch, well..
> In this case, for me VF configuration looks like something that
> vSwitch should not do at all, because it should not configure ports
> that doesn't attached to it. It's like configuring the other
> side of veth pair, which is nonsense.
>
>
> BTW, I don't know a way to find out if port is a representor of something
> or not in Linux kernel.
Specifically in OVS, in function dpdk_eth_dev_init, you can do this to detect:
*info.dev_flags & RTE_ETH_DEV_REPRESENTOR
>
> >
> >>
> >>>
> >>>
> >>>>>> The this is that this new API will produce conceptual fragmentation
> >>>>>> between DPDK and the Linux kernel, because to do the same thing
> >>>>>> you'll have to use different ways. I mean, to change mac of VF in
> >>>>>> kernel you need to set mac to the representor, but in DPDK changing
> >>>>>> setting mac to representor will lead to changing the mac of the
> >>>>>> representor itself, not the VF. This will be really confusing for users.
> >>>>>
> >>>>> I am not responsible of the choices in Linux.
> >>>>> But I agree it would be interesting to check the reason of such
> decision.
> >>>>> Rony, please could you explain?
> >>>
> >>> I looked at few Linux drivers:
> >>>
> >>> bnxt_vf_rep_netdev_ops has no op to set MAC
> >>> bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> >>>
> >>> lio_vf_rep_ndev_ops has no op to set MAC
> >>> lionetdevops.ndo_set_vf_mac = set VF MAC from PF
> >>>
> >>> mlx5e_netdev_ops_rep has no op to set MAC
> >>> mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> >>> mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from
> >> PF
> >>>
> >>> nfp_repr_netdev_ops.ndo_set_mac_address = set representor
> >> MAC
> >>> nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from
> >> representor
> >>> nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> >>>
> >>> There is a big chance that the behaviour is not standardized in Linux
> >>> (as usual). So it is already confusing for users of Linux.
> >>>
> >>>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-04 14:30 ` Asaf Penso
@ 2019-11-04 14:58 ` Ilya Maximets
0 siblings, 0 replies; 36+ messages in thread
From: Ilya Maximets @ 2019-11-04 14:58 UTC (permalink / raw)
To: Asaf Penso, Ilya Maximets, Shahaf Shuler, Thomas Monjalon
Cc: dev, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, declan.doherty,
bernard.iremonger, ajit.khaparde, Ananyev, Konstantin
On 04.11.2019 15:30, Asaf Penso wrote:
>
>
> Regards,
> Asaf Penso
>
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Ilya Maximets
>> Sent: Monday, November 4, 2019 12:28 PM
>> To: Shahaf Shuler <shahafs@mellanox.com>; Ilya Maximets
>> <i.maximets@ovn.org>; Thomas Monjalon <thomas@monjalon.net>
>> Cc: dev@dpdk.org; Jerin Jacob <jerinjacobk@gmail.com>; Andrew
>> Rybchenko <arybchenko@solarflare.com>; Ferruh Yigit
>> <ferruh.yigit@intel.com>; Stephen Hemminger
>> <stephen@networkplumber.org>; Roni Bar Yanai <roniba@mellanox.com>;
>> Rony Efraim <ronye@mellanox.com>; declan.doherty@intel.com;
>> bernard.iremonger@intel.com; ajit.khaparde@broadcom.com; Ananyev,
>> Konstantin <konstantin.ananyev@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>> host
>>
>> On 03.11.2019 7:48, Shahaf Shuler wrote:
>>> Friday, November 1, 2019 11:33 AM, Ilya Maximets:
>>>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>>>> host
>>>>
>>>> On 30.10.2019 22:42, Thomas Monjalon wrote:
>>>>> 30/10/2019 17:09, Ilya Maximets:
>>>>>> On 30.10.2019 16:49, Thomas Monjalon wrote:
>>>>>>> 30/10/2019 16:07, Ilya Maximets:
>>>>>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
>>>>>>>>> In a virtual environment, the network controller may have to
>>>>>>>>> configure some SR-IOV VF parameters for security reasons.
>>>>>>>>>
>>> [...]
>>>
>>>>> If we consider what Intel did, i.e. configure VF in place of
>>>>> representor for some operations, there are two drawbacks:
>>>>> - confusing that some ops apply to representor, others apply to VF
>>>>> - some ops are not possible on representor (because targetted to VF)
>>>>>
>>>>> I still feel that the addition of one single bit in the port ID is an
>>>>> elegant solution to target either the VF or its representor.
>>>>
>>>> Since we already have a confusion about what is configured when
>> operations
>>>> are performed on a representor port we have 2 options:
>>>
>>> I don't agree we have. I don't think there is any design note or API doc that
>> says the ethdev configuration on representor should be applied on VF
>> (please share if I missed it).
>>> The fact that there are some drivers that implemented it doesn't mean it is
>> correct.
>>>
>>>> 1. Have this proposed API to configure representor itself while
>>>> setting config to representor and configuring VF if special
>>>> bit enabled.
>>>> 2. Reverse the logic of current proposal, i.e. always apply
>>>> configuration to VF while working with representor and apply
>>>> configuration to representor itself if special bit is set.
>>>>
>>>> I'd probably prefer option #2, because:
>>>> - From the OVS and OpenStack point of view, I think, we don't
>>>> really need to configure representor itself in most cases.
>>>> And OVS really should not know if it works with representor
>>>> or some real port.
>>>
>>> I don't thinks OVS can be really agnostic to the fact it runs on top of
>> representors:
>>> 1. probing of representor has different command line -w
>> <bdf>,representor=XXX
>>
>> OVS doesn't care about content of devargs. It just passes them to hotplug
>> engine without any parsing (except a single case that must be eliminated
>> with a proper device iterators, not an OVS issue).
>>
>>> 2. the whole acceleration framework based on insertion of flow rules for
>> direct forward from the VF to target entity. Rules are applied on the
>> representor and would not work if port is not such.
>>
>> OVS tries to offload rules to the netdev from which packet was received.
>> That's it. If it succeeds - OK. If not, OVS doesn't care.
>>
>>> 3. some multi-port devices cannot do direct fwd between its different port.
>> This is why rep has switch_id and application should query it and act upon.
>>
>> This is part of offloading engine that doesn't affect the generic code.
>> If needed, OVS could request switch_id for netdev it tries to offload rules on.
>> OVS should not know if it representor port or not. If this operation will not
>> succeed for non-representors, OVS should not care because we can't offload
>> anything for non-representors anyway.
>>
>>> 4. representor carry the VF port id. This is how application know to which
>> VF (or vport) they associated with on their other side.
>>
>> This is just part of devargs, i.e. part of device unique identifier.
>> Once again, OVS doesn't parse devargs and should not do that.
>>
>>>
>>>> - It seems that most of the existing code in DPDK already works
>>>> like this, i.e. applying configs to VF itself. Intel drivers
>>>> works like this and Mellanox drivers, as Thomas said, doesn't
>>>> have this functionality at all.
>>>
>>> As I said above, I don't think we need to refer to specific driver behavior,
>> rather the API guidelines.
>>> To me, it is a bit strange and not natural that ethdev configuration is applied
>> to different port w/o any explicit request from the application.
>>> This is why I would prefer #1 above.
>>
>> IMHO, the whole concept of representors is that representor is a
>> way of attaching same port both to VM and vSwitch/hypervisor.
>>
>> If you're looking at representors as a separate ports on a switch, well..
>> In this case, for me VF configuration looks like something that
>> vSwitch should not do at all, because it should not configure ports
>> that doesn't attached to it. It's like configuring the other
>> side of veth pair, which is nonsense.
>>
>>
>> BTW, I don't know a way to find out if port is a representor of something
>> or not in Linux kernel.
>
> Specifically in OVS, in function dpdk_eth_dev_init, you can do this to detect:
> *info.dev_flags & RTE_ETH_DEV_REPRESENTOR
Sure, but I meant a way to do that for Linux network interface, not DPDK.
>
>>
>>>
>>>>
>>>>>
>>>>>
>>>>>>>> The this is that this new API will produce conceptual fragmentation
>>>>>>>> between DPDK and the Linux kernel, because to do the same thing
>>>>>>>> you'll have to use different ways. I mean, to change mac of VF in
>>>>>>>> kernel you need to set mac to the representor, but in DPDK changing
>>>>>>>> setting mac to representor will lead to changing the mac of the
>>>>>>>> representor itself, not the VF. This will be really confusing for users.
>>>>>>>
>>>>>>> I am not responsible of the choices in Linux.
>>>>>>> But I agree it would be interesting to check the reason of such
>> decision.
>>>>>>> Rony, please could you explain?
>>>>>
>>>>> I looked at few Linux drivers:
>>>>>
>>>>> bnxt_vf_rep_netdev_ops has no op to set MAC
>>>>> bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>>>>
>>>>> lio_vf_rep_ndev_ops has no op to set MAC
>>>>> lionetdevops.ndo_set_vf_mac = set VF MAC from PF
>>>>>
>>>>> mlx5e_netdev_ops_rep has no op to set MAC
>>>>> mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>>>> mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from
>>>> PF
>>>>>
>>>>> nfp_repr_netdev_ops.ndo_set_mac_address = set representor
>>>> MAC
>>>>> nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from
>>>> representor
>>>>> nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>>>>
>>>>> There is a big chance that the behaviour is not standardized in Linux
>>>>> (as usual). So it is already confusing for users of Linux.
>>>>>
>>>>>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-04 10:28 ` Ilya Maximets
2019-11-04 14:30 ` Asaf Penso
@ 2019-11-04 20:33 ` Shahaf Shuler
2019-11-05 12:15 ` Ilya Maximets
1 sibling, 1 reply; 36+ messages in thread
From: Shahaf Shuler @ 2019-11-04 20:33 UTC (permalink / raw)
To: Ilya Maximets, Thomas Monjalon
Cc: dev, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, declan.doherty,
bernard.iremonger, ajit.khaparde, Ananyev, Konstantin
Monday, November 4, 2019 12:28 PM, Ilya Maximets:
> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> host
>
> On 03.11.2019 7:48, Shahaf Shuler wrote:
> > Friday, November 1, 2019 11:33 AM, Ilya Maximets:
> >> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF
> >> from host
> >>
> >> On 30.10.2019 22:42, Thomas Monjalon wrote:
> >>> 30/10/2019 17:09, Ilya Maximets:
> >>>> On 30.10.2019 16:49, Thomas Monjalon wrote:
> >>>>> 30/10/2019 16:07, Ilya Maximets:
> >>>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
> >>>>>>> In a virtual environment, the network controller may have to
> >>>>>>> configure some SR-IOV VF parameters for security reasons.
> >>>>>>>
> > [...]
> >
> >>> If we consider what Intel did, i.e. configure VF in place of
> >>> representor for some operations, there are two drawbacks:
> >>> - confusing that some ops apply to representor, others apply to VF
> >>> - some ops are not possible on representor (because targetted to VF)
> >>>
> >>> I still feel that the addition of one single bit in the port ID is
> >>> an elegant solution to target either the VF or its representor.
> >>
> >> Since we already have a confusion about what is configured when
> >> operations are performed on a representor port we have 2 options:
> >
> > I don't agree we have. I don't think there is any design note or API doc that
> says the ethdev configuration on representor should be applied on VF
> (please share if I missed it).
> > The fact that there are some drivers that implemented it doesn't mean it is
> correct.
> >
> >> 1. Have this proposed API to configure representor itself while
> >> setting config to representor and configuring VF if special
> >> bit enabled.
> >> 2. Reverse the logic of current proposal, i.e. always apply
> >> configuration to VF while working with representor and apply
> >> configuration to representor itself if special bit is set.
> >>
> >> I'd probably prefer option #2, because:
> >> - From the OVS and OpenStack point of view, I think, we don't
> >> really need to configure representor itself in most cases.
> >> And OVS really should not know if it works with representor
> >> or some real port.
> >
> > I don't thinks OVS can be really agnostic to the fact it runs on top of
> representors:
> > 1. probing of representor has different command line -w
> > <bdf>,representor=XXX
>
> OVS doesn't care about content of devargs. It just passes them to hotplug
> engine without any parsing (except a single case that must be eliminated
> with a proper device iterators, not an OVS issue).
>
> > 2. the whole acceleration framework based on insertion of flow rules for
> direct forward from the VF to target entity. Rules are applied on the
> representor and would not work if port is not such.
>
> OVS tries to offload rules to the netdev from which packet was received.
> That's it. If it succeeds - OK. If not, OVS doesn't care.
>
> > 3. some multi-port devices cannot do direct fwd between its different port.
> This is why rep has switch_id and application should query it and act upon.
>
> This is part of offloading engine that doesn't affect the generic code.
> If needed, OVS could request switch_id for netdev it tries to offload rules on.
> OVS should not know if it representor port or not. If this operation will not
> succeed for non-representors, OVS should not care because we can't offload
> anything for non-representors anyway.
I am not speaking on specific module in OVS if it is aware or not, rather OVS as a whole.
There is a big code part in OVS that its sole purpose is to create flow rule on representors.
There is no meaning for this code w/o representors and it should be used only when OVS runs on top of representors (otherwise I would expect bad perf due to the constant preparation of rte_flow rules).
>
> > 4. representor carry the VF port id. This is how application know to which
> VF (or vport) they associated with on their other side.
>
> This is just part of devargs,
It is not part of devargs, it is part of the ethdev info.
OVS should expose such info to the user in order to help w/ the open flow rules creation.
i.e. part of device unique identifier.
> Once again, OVS doesn't parse devargs and should not do that.
So how can OVS user know the mapping between representor and its VF?
>
> >
> >> - It seems that most of the existing code in DPDK already works
> >> like this, i.e. applying configs to VF itself. Intel drivers
> >> works like this and Mellanox drivers, as Thomas said, doesn't
> >> have this functionality at all.
> >
> > As I said above, I don't think we need to refer to specific driver behavior,
> rather the API guidelines.
> > To me, it is a bit strange and not natural that ethdev configuration is applied
> to different port w/o any explicit request from the application.
> > This is why I would prefer #1 above.
>
> IMHO, the whole concept of representors is that representor is a way of
> attaching same port both to VM and vSwitch/hypervisor.
>
> If you're looking at representors as a separate ports on a switch, well..
I think we cannot ignore the fact that both in Linux and DPDK those are netdev/ethdev and as such holds some network configuration (even though on some case it is very minimal or some random values).
> In this case, for me VF configuration looks like something that vSwitch should
> not do at all, because it should not configure ports that doesn't attached to it.
> It's like configuring the other side of veth pair, which is nonsense.
It seems to me we configure the remote VF on every possible solution. The suggest one discussed here just make it more explicit.
>
>
> BTW, I don't know a way to find out if port is a representor of something or
> not in Linux kernel.
One example is by querying the netdev phys_port_name. representor ports will have specific strings to describe the remote end (e.g. pf0vf0) while regular port (e.g. uplink) will have pf0.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from host
2019-11-04 20:33 ` Shahaf Shuler
@ 2019-11-05 12:15 ` Ilya Maximets
0 siblings, 0 replies; 36+ messages in thread
From: Ilya Maximets @ 2019-11-05 12:15 UTC (permalink / raw)
To: Shahaf Shuler, Ilya Maximets, Thomas Monjalon
Cc: dev, Jerin Jacob, Andrew Rybchenko, Ferruh Yigit,
Stephen Hemminger, Roni Bar Yanai, Rony Efraim, declan.doherty,
bernard.iremonger, ajit.khaparde, Ananyev, Konstantin
On 04.11.2019 21:33, Shahaf Shuler wrote:
> Monday, November 4, 2019 12:28 PM, Ilya Maximets:
>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>> host
>>
>> On 03.11.2019 7:48, Shahaf Shuler wrote:
>>> Friday, November 1, 2019 11:33 AM, Ilya Maximets:
>>>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF
>>>> from host
>>>>
>>>> On 30.10.2019 22:42, Thomas Monjalon wrote:
>>>>> 30/10/2019 17:09, Ilya Maximets:
>>>>>> On 30.10.2019 16:49, Thomas Monjalon wrote:
>>>>>>> 30/10/2019 16:07, Ilya Maximets:
>>>>>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
>>>>>>>>> In a virtual environment, the network controller may have to
>>>>>>>>> configure some SR-IOV VF parameters for security reasons.
>>>>>>>>>
>>> [...]
>>>
>>>>> If we consider what Intel did, i.e. configure VF in place of
>>>>> representor for some operations, there are two drawbacks:
>>>>> - confusing that some ops apply to representor, others apply to VF
>>>>> - some ops are not possible on representor (because targetted to VF)
>>>>>
>>>>> I still feel that the addition of one single bit in the port ID is
>>>>> an elegant solution to target either the VF or its representor.
>>>>
>>>> Since we already have a confusion about what is configured when
>>>> operations are performed on a representor port we have 2 options:
>>>
>>> I don't agree we have. I don't think there is any design note or API doc that
>> says the ethdev configuration on representor should be applied on VF
>> (please share if I missed it).
>>> The fact that there are some drivers that implemented it doesn't mean it is
>> correct.
>>>
>>>> 1. Have this proposed API to configure representor itself while
>>>> setting config to representor and configuring VF if special
>>>> bit enabled.
>>>> 2. Reverse the logic of current proposal, i.e. always apply
>>>> configuration to VF while working with representor and apply
>>>> configuration to representor itself if special bit is set.
>>>>
>>>> I'd probably prefer option #2, because:
>>>> - From the OVS and OpenStack point of view, I think, we don't
>>>> really need to configure representor itself in most cases.
>>>> And OVS really should not know if it works with representor
>>>> or some real port.
>>>
>>> I don't thinks OVS can be really agnostic to the fact it runs on top of
>> representors:
>>> 1. probing of representor has different command line -w
>>> <bdf>,representor=XXX
>>
>> OVS doesn't care about content of devargs. It just passes them to hotplug
>> engine without any parsing (except a single case that must be eliminated
>> with a proper device iterators, not an OVS issue).
>>
>>> 2. the whole acceleration framework based on insertion of flow rules for
>> direct forward from the VF to target entity. Rules are applied on the
>> representor and would not work if port is not such.
>>
>> OVS tries to offload rules to the netdev from which packet was received.
>> That's it. If it succeeds - OK. If not, OVS doesn't care.
>>
>>> 3. some multi-port devices cannot do direct fwd between its different port.
>> This is why rep has switch_id and application should query it and act upon.
>>
>> This is part of offloading engine that doesn't affect the generic code.
>> If needed, OVS could request switch_id for netdev it tries to offload rules on.
>> OVS should not know if it representor port or not. If this operation will not
>> succeed for non-representors, OVS should not care because we can't offload
>> anything for non-representors anyway.
>
> I am not speaking on specific module in OVS if it is aware or not, rather OVS as a whole.
> There is a big code part in OVS that its sole purpose is to create flow rule on representors.
> There is no meaning for this code w/o representors and it should be used only when OVS runs on top of representors (otherwise I would expect bad perf due to the constant preparation of rte_flow rules).
I tend to disagree with that statement. You're missing cases like
MARK+RSS actions which is the only offloading type supported now
in OVS userspace datapath. Few more cases I can think of are ingress
policing, access control, some fast rx drop rules that could be used
directly with any HW port that supports offloading, not only representors.
I don't know if all of this is possible right now with DPDK, but these
are definitely valid use cases.
Second point is that current implementation in OVS tries to offload
to any port, and you were one of the co-authors of that code.
After re-working of offloading achitecture that was done this year it's
possible to probe support of the feature before enabling offloading
for particular port, but it's 'TODO' item that is not implemented.
>
>>
>>> 4. representor carry the VF port id. This is how application know to which
>> VF (or vport) they associated with on their other side.
>>
>> This is just part of devargs,
>
> It is not part of devargs, it is part of the ethdev info.
> OVS should expose such info to the user in order to help w/ the open flow rules creation.
This info already exposed via devargs that are stored in ovsdb.
>
> i.e. part of device unique identifier.
>> Once again, OVS doesn't parse devargs and should not do that.
>
> So how can OVS user know the mapping between representor and its VF?
User already knows that because he/she is the one who wrote devargs.
The case is that OVS should not care about that. If users doesn't
want to check devargs passed to OVS to remember which port is the
representor of which VF, he/she could use meaningful port names.
>
>>
>>>
>>>> - It seems that most of the existing code in DPDK already works
>>>> like this, i.e. applying configs to VF itself. Intel drivers
>>>> works like this and Mellanox drivers, as Thomas said, doesn't
>>>> have this functionality at all.
>>>
>>> As I said above, I don't think we need to refer to specific driver behavior,
>> rather the API guidelines.
>>> To me, it is a bit strange and not natural that ethdev configuration is applied
>> to different port w/o any explicit request from the application.
>>> This is why I would prefer #1 above.
>>
>> IMHO, the whole concept of representors is that representor is a way of
>> attaching same port both to VM and vSwitch/hypervisor.
>>
>> If you're looking at representors as a separate ports on a switch, well..
>
> I think we cannot ignore the fact that both in Linux and DPDK those are netdev/ethdev and as such holds some network configuration (even though on some case it is very minimal or some random values).
>
>> In this case, for me VF configuration looks like something that vSwitch should
>> not do at all, because it should not configure ports that doesn't attached to it.
>> It's like configuring the other side of veth pair, which is nonsense.
>
> It seems to me we configure the remote VF on every possible solution. The suggest one discussed here just make it more explicit.
>
>>
>>
>> BTW, I don't know a way to find out if port is a representor of something or
>> not in Linux kernel.
>
> One example is by querying the netdev phys_port_name. representor ports will have specific strings to describe the remote end (e.g. pf0vf0) while regular port (e.g. uplink) will have pf0.
>
^ permalink raw reply [flat|nested] 36+ messages in thread