* [PATCH v1 0/3] add support for infiniband BTH match
@ 2023-05-11 7:55 Dong Zhou
2023-05-11 7:55 ` [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
` (3 more replies)
0 siblings, 4 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-11 7:55 UTC (permalink / raw)
To: orika, viacheslavo, thomas; +Cc: dev, rasland
Add a new rte_flow item to match the InfiniBand BTH in RoCE packets.
Dong Zhou (3):
ethdev: add flow item for RoCE infiniband BTH
net/mlx5: add support for infiniband BTH match
net/mlx5/hws: add support for infiniband BTH match
app/test-pmd/cmdline_flow.c | 58 +++++++++++
doc/guides/nics/features/default.ini | 1 +
doc/guides/prog_guide/rte_flow.rst | 7 ++
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 ++
drivers/common/mlx5/mlx5_prm.h | 5 +-
drivers/net/mlx5/hws/mlx5dr_definer.c | 76 ++++++++++++++-
drivers/net/mlx5/hws/mlx5dr_definer.h | 2 +
drivers/net/mlx5/mlx5_flow.h | 6 ++
drivers/net/mlx5/mlx5_flow_dv.c | 102 ++++++++++++++++++++
drivers/net/mlx5/mlx5_flow_hw.c | 1 +
lib/ethdev/rte_flow.c | 1 +
lib/ethdev/rte_flow.h | 27 ++++++
lib/net/meson.build | 1 +
lib/net/rte_ib.h | 68 +++++++++++++
14 files changed, 359 insertions(+), 3 deletions(-)
create mode 100644 lib/net/rte_ib.h
--
2.27.0
* [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH
2023-05-11 7:55 [PATCH v1 0/3] add support for infiniband BTH match Dong Zhou
@ 2023-05-11 7:55 ` Dong Zhou
2023-05-17 17:06 ` Ori Kam
2023-05-11 7:55 ` [PATCH v1 2/3] net/mlx5: add support for infiniband BTH match Dong Zhou
` (2 subsequent siblings)
3 siblings, 1 reply; 23+ messages in thread
From: Dong Zhou @ 2023-05-11 7:55 UTC (permalink / raw)
To: orika, viacheslavo, thomas, Aman Singh, Yuying Zhang,
Ferruh Yigit, Andrew Rybchenko, Olivier Matz
Cc: dev, rasland
IB (InfiniBand) is a type of networking used in high-performance
computing, offering high throughput and low latency. Like Ethernet,
IB defines a layered protocol (Physical, Link, Network, Transport
Layers). IB provides native support for RDMA (Remote DMA), an
extension of DMA that allows direct access to remote host memory
without CPU intervention. An IB network requires NICs and switches
that support the IB protocol.
RoCE (RDMA over Converged Ethernet) is a network protocol that
allows RDMA to run on Ethernet. RoCE encapsulates IB packets on
ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
ethernet link layer protocol; IB packets are encapsulated in the
ethernet layer and use ethernet type 0x8915. RoCEv2 is an internet
layer protocol; IB packets are encapsulated in the UDP payload and
use destination port 4791. The format of the RoCEv2 packet is
as follows:
ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
BTH (Base Transport Header) is the IB transport layer header; both
RoCEv1 and RoCEv2 contain it. This patch introduces a new RTE item
to match the IB BTH in RoCE packets. One use of this match is to
monitor RoCEv2 CNPs (Congestion Notification Packets) by matching
BTH opcode 0x81.
This patch also adds the testpmd command line to match the RoCEv2
BTH. Usage example:
testpmd> flow create 0 group 1 ingress pattern
eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
dst_qp is 0xd3 / end actions queue index 0 / end
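
For reference, the same rule can be built with the rte_flow C API.
A minimal sketch, assuming the item definitions introduced by this
patch (port id, queue index and QP number are illustrative; error
handling is omitted):

#include <rte_byteorder.h>
#include <rte_flow.h>
#include <rte_ib.h>

static struct rte_flow *
create_cnp_rule(uint16_t port_id)
{
    struct rte_flow_attr attr = { .group = 1, .ingress = 1 };
    /* Illustrative values: UDP dport 4791 (RoCEv2), opcode 0x81 (CNP),
     * destination QP 0xd3 given as a 3-byte big-endian array.
     */
    struct rte_flow_item_udp udp_spec = {
        .hdr = { .dst_port = RTE_BE16(4791) },
    };
    struct rte_flow_item_udp udp_mask = {
        .hdr = { .dst_port = RTE_BE16(0xffff) },
    };
    struct rte_flow_item_ib_bth bth_spec = {
        .hdr = { .opcode = 0x81, .dst_qp = "\x00\x00\xd3" },
    };
    struct rte_flow_item_ib_bth bth_mask = {
        .hdr = { .opcode = 0xff, .dst_qp = "\xff\xff\xff" },
    };
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
        { .type = RTE_FLOW_ITEM_TYPE_UDP,
          .spec = &udp_spec, .mask = &udp_mask },
        { .type = RTE_FLOW_ITEM_TYPE_IB_BTH,
          .spec = &bth_spec, .mask = &bth_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action_queue queue = { .index = 0 };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_error error;

    return rte_flow_create(port_id, &attr, pattern, actions, &error);
}
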
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
---
app/test-pmd/cmdline_flow.c | 58 ++++++++++++++++++
doc/guides/nics/features/default.ini | 1 +
doc/guides/prog_guide/rte_flow.rst | 7 +++
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 +++
lib/ethdev/rte_flow.c | 1 +
lib/ethdev/rte_flow.h | 27 ++++++++
lib/net/meson.build | 1 +
lib/net/rte_ib.h | 68 +++++++++++++++++++++
8 files changed, 170 insertions(+)
create mode 100644 lib/net/rte_ib.h
diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 58939ec321..3ade229ffc 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -496,6 +496,11 @@ enum index {
ITEM_QUOTA_STATE_NAME,
ITEM_AGGR_AFFINITY,
ITEM_AGGR_AFFINITY_VALUE,
+ ITEM_IB_BTH,
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
/* Validate/create actions. */
ACTIONS,
@@ -1452,6 +1457,7 @@ static const enum index next_item[] = {
ITEM_METER,
ITEM_QUOTA,
ITEM_AGGR_AFFINITY,
+ ITEM_IB_BTH,
END_SET,
ZERO,
};
@@ -1953,6 +1959,15 @@ static const enum index item_aggr_affinity[] = {
ZERO,
};
+static const enum index item_ib_bth[] = {
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
+ ITEM_NEXT,
+ ZERO,
+};
+
static const enum index next_action[] = {
ACTION_END,
ACTION_VOID,
@@ -5523,6 +5538,46 @@ static const struct token token_list[] = {
.call = parse_quota_state_name,
.comp = comp_quota_state_name
},
+ [ITEM_IB_BTH] = {
+ .name = "ib_bth",
+ .help = "match ib bth fields",
+ .priv = PRIV_ITEM(IB_BTH,
+ sizeof(struct rte_flow_item_ib_bth)),
+ .next = NEXT(item_ib_bth),
+ .call = parse_vc,
+ },
+ [ITEM_IB_BTH_OPCODE] = {
+ .name = "opcode",
+ .help = "match ib bth opcode",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.opcode)),
+ },
+ [ITEM_IB_BTH_PKEY] = {
+ .name = "pkey",
+ .help = "partition key",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.pkey)),
+ },
+ [ITEM_IB_BTH_DST_QPN] = {
+ .name = "dst_qp",
+ .help = "destination qp",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.dst_qp)),
+ },
+ [ITEM_IB_BTH_PSN] = {
+ .name = "psn",
+ .help = "packet sequence number",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.psn)),
+ },
/* Validate/create actions. */
[ACTIONS] = {
.name = "actions",
@@ -11849,6 +11904,9 @@ flow_item_default_mask(const struct rte_flow_item *item)
case RTE_FLOW_ITEM_TYPE_AGGR_AFFINITY:
mask = &rte_flow_item_aggr_affinity_mask;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ mask = &rte_flow_item_ib_bth_mask;
+ break;
default:
break;
}
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 1a5087abad..1738715e26 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -104,6 +104,7 @@ gtpc =
gtpu =
gtp_psc =
higig2 =
+ib_bth =
icmp =
icmp6 =
icmp6_echo_request =
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 32fc45516a..e2957df71c 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1551,6 +1551,13 @@ Matches flow quota state set by quota action.
- ``state``: Flow quota state
+Item: ``IB_BTH``
+^^^^^^^^^^^^^^^^
+
+Matches an InfiniBand base transport header in RoCE packet.
+
+- ``hdr``: InfiniBand base transport header definition (``rte_ib.h``).
+
Actions
~~~~~~~
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 8f23847859..4bad244029 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3781,6 +3781,13 @@ This section lists supported pattern items and their attributes, if any.
- ``send_to_kernel``: send packets to kernel.
+- ``ib_bth``: match InfiniBand BTH(base transport header).
+
+ - ``opcode {unsigned}``: Opcode.
+ - ``pkey {unsigned}``: Partition key.
+ - ``dst_qp {unsigned}``: Destination Queue Pair.
+ - ``psn {unsigned}``: Packet Sequence Number.
+
Actions list
^^^^^^^^^^^^
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 69e6e749f7..6e099deca3 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -164,6 +164,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
MK_FLOW_ITEM(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext)),
MK_FLOW_ITEM(QUOTA, sizeof(struct rte_flow_item_quota)),
MK_FLOW_ITEM(AGGR_AFFINITY, sizeof(struct rte_flow_item_aggr_affinity)),
+ MK_FLOW_ITEM(IB_BTH, sizeof(struct rte_flow_item_ib_bth)),
};
/** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 713ba8b65c..2b7f144c27 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -38,6 +38,7 @@
#include <rte_ppp.h>
#include <rte_gre.h>
#include <rte_macsec.h>
+#include <rte_ib.h>
#ifdef __cplusplus
extern "C" {
@@ -672,6 +673,13 @@ enum rte_flow_item_type {
* @see struct rte_flow_item_aggr_affinity.
*/
RTE_FLOW_ITEM_TYPE_AGGR_AFFINITY,
+
+ /**
+ * Matches an InfiniBand base transport header in RoCE packet.
+ *
+ * See struct rte_flow_item_ib_bth.
+ */
+ RTE_FLOW_ITEM_TYPE_IB_BTH,
};
/**
@@ -2260,6 +2268,25 @@ rte_flow_item_aggr_affinity_mask = {
};
#endif
+/**
+ * RTE_FLOW_ITEM_TYPE_IB_BTH.
+ *
+ * Matches an InfiniBand base transport header in RoCE packet.
+ */
+struct rte_flow_item_ib_bth {
+ struct rte_ib_bth hdr; /**< InfiniBand base transport header definition. */
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_IB_BTH. */
+#ifndef __cplusplus
+static const struct rte_flow_item_ib_bth rte_flow_item_ib_bth_mask = {
+ .hdr = {
+ .opcode = 0xff,
+ .dst_qp = "\xff\xff\xff",
+ },
+};
+#endif
+
/**
* Action types.
*
diff --git a/lib/net/meson.build b/lib/net/meson.build
index 379d161ee0..b7a0684101 100644
--- a/lib/net/meson.build
+++ b/lib/net/meson.build
@@ -22,6 +22,7 @@ headers = files(
'rte_geneve.h',
'rte_l2tpv2.h',
'rte_ppp.h',
+ 'rte_ib.h',
)
sources = files(
diff --git a/lib/net/rte_ib.h b/lib/net/rte_ib.h
new file mode 100644
index 0000000000..c1b2797815
--- /dev/null
+++ b/lib/net/rte_ib.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_IB_H
+#define RTE_IB_H
+
+/**
+ * @file
+ *
+ * InfiniBand header definitions
+ *
+ * The InfiniBand headers are used by RoCE (RDMA over Converged Ethernet).
+ */
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * InfiniBand Base Transport Header according to
+ * IB Specification Vol 1-Release-1.4.
+ */
+__extension__
+struct rte_ib_bth {
+ uint8_t opcode; /**< Opcode. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t tver:4; /**< Transport Header Version. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t se:1; /**< Solicited Event. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t se:1; /**< Solicited Event. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t tver:4; /**< Transport Header Version. */
+#endif
+ rte_be16_t pkey; /**< Partition key. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd0:6; /**< Reserved. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t f:1; /**< FECN. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t f:1; /**< FECN. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t rsvd0:6; /**< Reserved. */
+#endif
+ uint8_t dst_qp[3]; /**< Destination QP */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd1:7; /**< Reserved. */
+ uint8_t a:1; /**< Acknowledge Request. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t a:1; /**< Acknowledge Request. */
+ uint8_t rsvd1:7; /**< Reserved. */
+#endif
+ uint8_t psn[3]; /**< Packet Sequence Number */
+} __rte_packed;
+
+/** RoCEv2 default port. */
+#define RTE_ROCEV2_DEFAULT_PORT 4791
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_IB_H */
--
2.27.0
* [PATCH v1 2/3] net/mlx5: add support for infiniband BTH match
2023-05-11 7:55 [PATCH v1 0/3] add support for infiniband BTH match Dong Zhou
2023-05-11 7:55 ` [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
@ 2023-05-11 7:55 ` Dong Zhou
2023-05-11 7:55 ` [PATCH v1 3/3] net/mlx5/hws: " Dong Zhou
2023-05-24 10:08 ` [PATCH v2 0/3] " Dong Zhou
3 siblings, 0 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-11 7:55 UTC (permalink / raw)
To: orika, viacheslavo, thomas, Matan Azrad; +Cc: dev, rasland
This patch adds support for matching the opcode and dst_qp fields
in the InfiniBand BTH. Currently, only RoCEv2 packets are supported;
the input BTH match item defaults to matching a RoCEv2 packet.
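
Note that dst_qp is carried as a 3-byte big-endian array, so a 24-bit
QP number has to be serialized into it by the caller. A minimal sketch
of an illustrative helper, together with a mask limited to the two
fields this patch supports (both are illustrative, not part of the
patch):

#include <stdint.h>
#include <rte_flow.h>
#include <rte_ib.h>

/* Encode a 24-bit QP number into the 3-byte big-endian dst_qp field. */
static void
ib_bth_set_dst_qp(struct rte_flow_item_ib_bth *item, uint32_t qpn)
{
    item->hdr.dst_qp[0] = (qpn >> 16) & 0xff;
    item->hdr.dst_qp[1] = (qpn >> 8) & 0xff;
    item->hdr.dst_qp[2] = qpn & 0xff;
}

/* Per this patch, mlx5 flow validation rejects any mask covering
 * fields other than opcode and dst_qp.
 */
static const struct rte_flow_item_ib_bth bth_qp_only_mask = {
    .hdr = { .dst_qp = "\xff\xff\xff" },
};
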
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
---
drivers/common/mlx5/mlx5_prm.h | 5 +-
drivers/net/mlx5/mlx5_flow.h | 6 ++
drivers/net/mlx5/mlx5_flow_dv.c | 102 ++++++++++++++++++++++++++++++++
3 files changed, 111 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index ed3d5efbb7..8f55fd59b3 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -932,7 +932,7 @@ struct mlx5_ifc_fte_match_set_misc_bits {
u8 gre_key_h[0x18];
u8 gre_key_l[0x8];
u8 vxlan_vni[0x18];
- u8 reserved_at_b8[0x8];
+ u8 bth_opcode[0x8];
u8 geneve_vni[0x18];
u8 lag_rx_port_affinity[0x4];
u8 reserved_at_e8[0x2];
@@ -945,7 +945,8 @@ struct mlx5_ifc_fte_match_set_misc_bits {
u8 reserved_at_120[0xa];
u8 geneve_opt_len[0x6];
u8 geneve_protocol_type[0x10];
- u8 reserved_at_140[0x20];
+ u8 reserved_at_140[0x8];
+ u8 bth_dst_qp[0x18];
u8 inner_esp_spi[0x20];
u8 outer_esp_spi[0x20];
u8 reserved_at_1a0[0x60];
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1d116ea0f6..c1d6a71708 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -227,6 +227,9 @@ enum mlx5_feature_name {
/* Aggregated affinity item */
#define MLX5_FLOW_ITEM_AGGR_AFFINITY (UINT64_C(1) << 49)
+/* IB BTH ITEM. */
+#define MLX5_FLOW_ITEM_IB_BTH (1ull << 51)
+
/* Outer Masks. */
#define MLX5_FLOW_LAYER_OUTER_L3 \
(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
@@ -364,6 +367,9 @@ enum mlx5_feature_name {
#define MLX5_UDP_PORT_VXLAN 4789
#define MLX5_UDP_PORT_VXLAN_GPE 4790
+/* UDP port numbers for RoCEv2. */
+#define MLX5_UDP_PORT_ROCEv2 4791
+
/* UDP port numbers for GENEVE. */
#define MLX5_UDP_PORT_GENEVE 6081
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index f136f43b0a..b7dc8ecaf7 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -7193,6 +7193,65 @@ flow_dv_validate_item_flex(struct rte_eth_dev *dev,
return 0;
}
+/**
+ * Validate IB BTH item.
+ *
+ * @param[in] dev
+ * Pointer to the rte_eth_dev structure.
+ * @param[in] udp_dport
+ * UDP destination port
+ * @param[in] item
+ * Item specification.
+ * @param root
+ * Whether action is on root table.
+ * @param[out] error
+ * Pointer to the error structure.
+ *
+ * @return
+ * 0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_validate_item_ib_bth(struct rte_eth_dev *dev,
+ uint16_t udp_dport,
+ const struct rte_flow_item *item,
+ bool root,
+ struct rte_flow_error *error)
+{
+ const struct rte_flow_item_ib_bth *mask = item->mask;
+ struct mlx5_priv *priv = dev->data->dev_private;
+ const struct rte_flow_item_ib_bth *valid_mask;
+ int ret;
+
+ valid_mask = &rte_flow_item_ib_bth_mask;
+ if (udp_dport && udp_dport != MLX5_UDP_PORT_ROCEv2)
+ return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "protocol filtering not compatible"
+ " with UDP layer");
+ if (mask && (mask->hdr.se || mask->hdr.m || mask->hdr.padcnt ||
+ mask->hdr.tver || mask->hdr.pkey || mask->hdr.f || mask->hdr.b ||
+ mask->hdr.rsvd0 || mask->hdr.a || mask->hdr.rsvd1 ||
+ mask->hdr.psn[0] || mask->hdr.psn[1] || mask->hdr.psn[2]))
+ return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "only opcode and dst_qp are supported");
+ if (root || priv->sh->steering_format_version ==
+ MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5)
+ return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM,
+ item,
+ "IB BTH item is not supported");
+ if (!mask)
+ mask = &rte_flow_item_ib_bth_mask;
+ ret = mlx5_flow_item_acceptable(item, (const uint8_t *)mask,
+ (const uint8_t *)valid_mask,
+ sizeof(struct rte_flow_item_ib_bth),
+ MLX5_ITEM_RANGE_NOT_ACCEPTED, error);
+ if (ret < 0)
+ return ret;
+ return 0;
+}
+
/**
* Internal validation function. For validating both actions and items.
*
@@ -7700,6 +7759,14 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
return ret;
last_item = MLX5_FLOW_ITEM_AGGR_AFFINITY;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ ret = mlx5_flow_validate_item_ib_bth(dev, udp_dport,
+ items, is_root, error);
+ if (ret < 0)
+ return ret;
+
+ last_item = MLX5_FLOW_ITEM_IB_BTH;
+ break;
default:
return rte_flow_error_set(error, ENOTSUP,
RTE_FLOW_ERROR_TYPE_ITEM,
@@ -10971,6 +11038,37 @@ flow_dv_translate_item_aggr_affinity(void *key,
affinity_v->affinity & affinity_m->affinity);
}
+static void
+flow_dv_translate_item_ib_bth(void *key,
+ const struct rte_flow_item *item,
+ int inner, uint32_t key_type)
+{
+ const struct rte_flow_item_ib_bth *bth_m;
+ const struct rte_flow_item_ib_bth *bth_v;
+ void *headers_v, *misc_v;
+ uint16_t udp_dport;
+ char *qpn_v;
+ int i, size;
+
+ headers_v = inner ? MLX5_ADDR_OF(fte_match_param, key, inner_headers) :
+ MLX5_ADDR_OF(fte_match_param, key, outer_headers);
+ if (!MLX5_GET16(fte_match_set_lyr_2_4, headers_v, udp_dport)) {
+ udp_dport = key_type & MLX5_SET_MATCHER_M ?
+ 0xFFFF : MLX5_UDP_PORT_ROCEv2;
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, udp_dport, udp_dport);
+ }
+ if (MLX5_ITEM_VALID(item, key_type))
+ return;
+ MLX5_ITEM_UPDATE(item, key_type, bth_v, bth_m, &rte_flow_item_ib_bth_mask);
+ misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+ MLX5_SET(fte_match_set_misc, misc_v, bth_opcode,
+ bth_v->hdr.opcode & bth_m->hdr.opcode);
+ qpn_v = MLX5_ADDR_OF(fte_match_set_misc, misc_v, bth_dst_qp);
+ size = sizeof(bth_m->hdr.dst_qp);
+ for (i = 0; i < size; ++i)
+ qpn_v[i] = bth_m->hdr.dst_qp[i] & bth_v->hdr.dst_qp[i];
+}
+
static uint32_t matcher_zero[MLX5_ST_SZ_DW(fte_match_param)] = { 0 };
#define HEADER_IS_ZERO(match_criteria, headers) \
@@ -13772,6 +13870,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
flow_dv_translate_item_aggr_affinity(key, items, key_type);
last_item = MLX5_FLOW_ITEM_AGGR_AFFINITY;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ flow_dv_translate_item_ib_bth(key, items, tunnel, key_type);
+ last_item = MLX5_FLOW_ITEM_IB_BTH;
+ break;
default:
break;
}
--
2.27.0
* [PATCH v1 3/3] net/mlx5/hws: add support for infiniband BTH match
2023-05-11 7:55 [PATCH v1 0/3] add support for infiniband BTH match Dong Zhou
2023-05-11 7:55 ` [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
2023-05-11 7:55 ` [PATCH v1 2/3] net/mlx5: add support for infiniband BTH match Dong Zhou
@ 2023-05-11 7:55 ` Dong Zhou
2023-05-24 10:08 ` [PATCH v2 0/3] " Dong Zhou
3 siblings, 0 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-11 7:55 UTC (permalink / raw)
To: orika, viacheslavo, thomas, Matan Azrad; +Cc: dev, rasland
This patch adds support for matching the opcode and dst_qp fields
in the InfiniBand BTH. Currently, only RoCEv2 packets are supported;
the input BTH match item defaults to matching a RoCEv2 packet.
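
With HWS, the item is typically consumed through the flow template
API. A minimal sketch of a pattern template matching on the BTH
opcode (attribute values are illustrative; error handling is omitted):

#include <rte_flow.h>
#include <rte_ib.h>

static struct rte_flow_pattern_template *
create_bth_pattern_template(uint16_t port_id)
{
    /* Only the mask matters in a pattern template; per-rule opcode
     * values are supplied when rules are enqueued.
     */
    static const struct rte_flow_item_ib_bth bth_mask = {
        .hdr = { .opcode = 0xff },
    };
    const struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
        { .type = RTE_FLOW_ITEM_TYPE_UDP },
        { .type = RTE_FLOW_ITEM_TYPE_IB_BTH, .mask = &bth_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    const struct rte_flow_pattern_template_attr attr = {
        .relaxed_matching = 0,
        .ingress = 1,
    };
    struct rte_flow_error error;

    return rte_flow_pattern_template_create(port_id, &attr, pattern,
                                            &error);
}
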
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
---
drivers/net/mlx5/hws/mlx5dr_definer.c | 76 ++++++++++++++++++++++++++-
drivers/net/mlx5/hws/mlx5dr_definer.h | 2 +
drivers/net/mlx5/mlx5_flow_hw.c | 1 +
3 files changed, 78 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c b/drivers/net/mlx5/hws/mlx5dr_definer.c
index f92d3e8e1f..1a427c9b64 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -10,6 +10,7 @@
#define ETH_TYPE_IPV6_VXLAN 0x86DD
#define ETH_VXLAN_DEFAULT_PORT 4789
#define IP_UDP_PORT_MPLS 6635
+#define UDP_ROCEV2_PORT 4791
#define DR_FLOW_LAYER_TUNNEL_NO_MPLS (MLX5_FLOW_LAYER_TUNNEL & ~MLX5_FLOW_LAYER_MPLS)
#define STE_NO_VLAN 0x0
@@ -171,7 +172,9 @@ struct mlx5dr_definer_conv_data {
X(SET_BE16, gre_opt_checksum, v->checksum_rsvd.checksum, rte_flow_item_gre_opt) \
X(SET, meter_color, rte_col_2_mlx5_col(v->color), rte_flow_item_meter_color) \
X(SET_BE32, ipsec_spi, v->hdr.spi, rte_flow_item_esp) \
- X(SET_BE32, ipsec_sequence_number, v->hdr.seq, rte_flow_item_esp)
+ X(SET_BE32, ipsec_sequence_number, v->hdr.seq, rte_flow_item_esp) \
+ X(SET, ib_l4_udp_port, UDP_ROCEV2_PORT, rte_flow_item_ib_bth) \
+ X(SET, ib_l4_opcode, v->hdr.opcode, rte_flow_item_ib_bth)
/* Item set function format */
#define X(set_type, func_name, value, item_type) \
@@ -583,6 +586,16 @@ mlx5dr_definer_mpls_label_set(struct mlx5dr_definer_fc *fc,
memcpy(tag + fc->byte_off + sizeof(v->label_tc_s), &v->ttl, sizeof(v->ttl));
}
+static void
+mlx5dr_definer_ib_l4_qp_set(struct mlx5dr_definer_fc *fc,
+ const void *item_spec,
+ uint8_t *tag)
+{
+ const struct rte_flow_item_ib_bth *v = item_spec;
+
+ memcpy(tag + fc->byte_off, &v->hdr.dst_qp, sizeof(v->hdr.dst_qp));
+}
+
static int
mlx5dr_definer_conv_item_eth(struct mlx5dr_definer_conv_data *cd,
struct rte_flow_item *item,
@@ -2041,6 +2054,63 @@ mlx5dr_definer_conv_item_flex_parser(struct mlx5dr_definer_conv_data *cd,
return 0;
}
+static int
+mlx5dr_definer_conv_item_ib_l4(struct mlx5dr_definer_conv_data *cd,
+ struct rte_flow_item *item,
+ int item_idx)
+{
+ const struct rte_flow_item_ib_bth *m = item->mask;
+ struct mlx5dr_definer_fc *fc;
+ bool inner = cd->tunnel;
+
+ /* In order to match on RoCEv2(layer4 ib), we must match
+ * on ip_protocol and l4_dport.
+ */
+ if (!cd->relaxed) {
+ fc = &cd->fc[DR_CALC_FNAME(IP_PROTOCOL, inner)];
+ if (!fc->tag_set) {
+ fc->item_idx = item_idx;
+ fc->tag_mask_set = &mlx5dr_definer_ones_set;
+ fc->tag_set = &mlx5dr_definer_udp_protocol_set;
+ DR_CALC_SET(fc, eth_l2, l4_type_bwc, inner);
+ }
+
+ fc = &cd->fc[DR_CALC_FNAME(L4_DPORT, inner)];
+ if (!fc->tag_set) {
+ fc->item_idx = item_idx;
+ fc->tag_mask_set = &mlx5dr_definer_ones_set;
+ fc->tag_set = &mlx5dr_definer_ib_l4_udp_port_set;
+ DR_CALC_SET(fc, eth_l4, destination_port, inner);
+ }
+ }
+
+ if (!m)
+ return 0;
+
+ if (m->hdr.se || m->hdr.m || m->hdr.padcnt || m->hdr.tver ||
+ m->hdr.pkey || m->hdr.f || m->hdr.b || m->hdr.rsvd0 ||
+ m->hdr.a || m->hdr.rsvd1 || !is_mem_zero(m->hdr.psn, 3)) {
+ rte_errno = ENOTSUP;
+ return rte_errno;
+ }
+
+ if (m->hdr.opcode) {
+ fc = &cd->fc[MLX5DR_DEFINER_FNAME_IB_L4_OPCODE];
+ fc->item_idx = item_idx;
+ fc->tag_set = &mlx5dr_definer_ib_l4_opcode_set;
+ DR_CALC_SET_HDR(fc, ib_l4, opcode);
+ }
+
+ if (!is_mem_zero(m->hdr.dst_qp, 3)) {
+ fc = &cd->fc[MLX5DR_DEFINER_FNAME_IB_L4_QPN];
+ fc->item_idx = item_idx;
+ fc->tag_set = &mlx5dr_definer_ib_l4_qp_set;
+ DR_CALC_SET_HDR(fc, ib_l4, qp);
+ }
+
+ return 0;
+}
+
static int
mlx5dr_definer_conv_items_to_hl(struct mlx5dr_context *ctx,
struct mlx5dr_match_template *mt,
@@ -2182,6 +2252,10 @@ mlx5dr_definer_conv_items_to_hl(struct mlx5dr_context *ctx,
item_flags |= MLX5_FLOW_LAYER_MPLS;
cd.mpls_idx++;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ ret = mlx5dr_definer_conv_item_ib_l4(&cd, items, i);
+ item_flags |= MLX5_FLOW_ITEM_IB_BTH;
+ break;
default:
DR_LOG(ERR, "Unsupported item type %d", items->type);
rte_errno = ENOTSUP;
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.h b/drivers/net/mlx5/hws/mlx5dr_definer.h
index 90ec4ce845..6b645f4cf0 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.h
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.h
@@ -134,6 +134,8 @@ enum mlx5dr_definer_fname {
MLX5DR_DEFINER_FNAME_OKS2_MPLS2_I,
MLX5DR_DEFINER_FNAME_OKS2_MPLS3_I,
MLX5DR_DEFINER_FNAME_OKS2_MPLS4_I,
+ MLX5DR_DEFINER_FNAME_IB_L4_OPCODE,
+ MLX5DR_DEFINER_FNAME_IB_L4_QPN,
MLX5DR_DEFINER_FNAME_MAX,
};
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 7e0ee8d883..9381646267 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4969,6 +4969,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
case RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT:
case RTE_FLOW_ITEM_TYPE_ESP:
case RTE_FLOW_ITEM_TYPE_FLEX:
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
break;
case RTE_FLOW_ITEM_TYPE_INTEGRITY:
/*
--
2.27.0
* RE: [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH
2023-05-11 7:55 ` [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
@ 2023-05-17 17:06 ` Ori Kam
2023-05-22 7:01 ` Andrew Rybchenko
0 siblings, 1 reply; 23+ messages in thread
From: Ori Kam @ 2023-05-17 17:06 UTC (permalink / raw)
To: Bill Zhou, Slava Ovsiienko,
NBU-Contact-Thomas Monjalon (EXTERNAL),
Aman Singh, Yuying Zhang, Ferruh Yigit, Andrew Rybchenko,
Olivier Matz
Cc: dev, Raslan Darawsheh
Hi Bill,
> -----Original Message-----
> From: Bill Zhou <dongzhou@nvidia.com>
> Sent: Thursday, May 11, 2023 10:55 AM
> Subject: [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH
>
> IB(InfiniBand) is one type of networking used in high-performance
> computing with high throughput and low latency. Like Ethernet,
> IB defines a layered protocol (Physical, Link, Network, Transport
> Layers). IB provides native support for RDMA(Remote DMA), an
> extension of the DMA that allows direct access to remote host
> memory without CPU intervention. IB network requires NICs and
> switches to support the IB protocol.
>
> RoCE(RDMA over Converged Ethernet) is a network protocol that
> allows RDMA to run on Ethernet. RoCE encapsulates IB packets on
> ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
> ethernet link layer protocol, IB packets are encapsulated in the
> ethernet layer and use ethernet type 0x8915. RoCEv2 is an internet
> layer protocol, IB packets are encapsulated in UDP payload and
> use a destination port 4791, The format of the RoCEv2 packet is
> as follows:
> ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
>
> BTH(Base Transport Header) is the IB transport layer header, RoCEv1
> and RoCEv2 both contain this header. This patch introduces a new
> RTE item to match the IB BTH in RoCE packets. One use of this match
> is that the user can monitor RoCEv2's CNP(Congestion Notification
> Packet) by matching BTH opcode 0x81.
>
> This patch also adds the testpmd command line to match the RoCEv2
> BTH. Usage example:
>
> testpmd> flow create 0 group 1 ingress pattern
> eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
> dst_qp is 0xd3 / end actions queue index 0 / end
>
> Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
> ---
Acked-by: Ori Kam <orika@nvidia.com>
Best,
Ori
* Re: [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH
2023-05-17 17:06 ` Ori Kam
@ 2023-05-22 7:01 ` Andrew Rybchenko
2023-05-24 6:58 ` Bill Zhou
0 siblings, 1 reply; 23+ messages in thread
From: Andrew Rybchenko @ 2023-05-22 7:01 UTC (permalink / raw)
To: Ori Kam, Bill Zhou, Slava Ovsiienko,
NBU-Contact-Thomas Monjalon (EXTERNAL),
Aman Singh, Yuying Zhang, Ferruh Yigit, Olivier Matz
Cc: dev, Raslan Darawsheh
On 5/17/23 20:06, Ori Kam wrote:
> Hi Bill,
>
>> -----Original Message-----
>> From: Bill Zhou <dongzhou@nvidia.com>
>> Sent: Thursday, May 11, 2023 10:55 AM
>> Subject: [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH
RoCE should be added to devtools/words-case.txt,
and IB as well.
>>
>> IB(InfiniBand) is one type of networking used in high-performance
>> computing with high throughput and low latency. Like Ethernet,
>> IB defines a layered protocol (Physical, Link, Network, Transport
>> Layers). IB provides native support for RDMA(Remote DMA), an
>> extension of the DMA that allows direct access to remote host
>> memory without CPU intervention. IB network requires NICs and
>> switches to support the IB protocol.
>>
>> RoCE(RDMA over Converged Ethernet) is a network protocol that
>> allows RDMA to run on Ethernet. RoCE encapsulates IB packets on
>> ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
>> ethernet link layer protocol, IB packets are encapsulated in the
>> ethernet layer and use ethernet type 0x8915. RoCEv2 is an internet
ethernet -> Ethernet (4 times above)
>> layer protocol, IB packets are encapsulated in UDP payload and
>> use a destination port 4791, The format of the RoCEv2 packet is
>> as follows:
>> ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
>>
>> BTH(Base Transport Header) is the IB transport layer header, RoCEv1
>> and RoCEv2 both contain this header. This patch introduces a new
>> RTE item to match the IB BTH in RoCE packets. One use of this match
>> is that the user can monitor RoCEv2's CNP(Congestion Notification
>> Packet) by matching BTH opcode 0x81.
>>
>> This patch also adds the testpmd command line to match the RoCEv2
>> BTH. Usage example:
>>
>> testpmd> flow create 0 group 1 ingress pattern
>> eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
>> dst_qp is 0xd3 / end actions queue index 0 / end
>>
>> Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
>> ---
>
> Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
* RE: [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH
2023-05-22 7:01 ` Andrew Rybchenko
@ 2023-05-24 6:58 ` Bill Zhou
0 siblings, 0 replies; 23+ messages in thread
From: Bill Zhou @ 2023-05-24 6:58 UTC (permalink / raw)
To: Andrew Rybchenko, Ori Kam, Slava Ovsiienko,
NBU-Contact-Thomas Monjalon (EXTERNAL),
Aman Singh, Yuying Zhang, Ferruh Yigit, Olivier Matz
Cc: dev, Raslan Darawsheh
Hi Andrew, I will update those 2 comments in v2, thanks.
> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, May 22, 2023 3:02 PM
> To: Ori Kam <orika@nvidia.com>; Bill Zhou <dongzhou@nvidia.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> Ferruh Yigit <ferruh.yigit@amd.com>; Olivier Matz <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: Re: [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH
>
> On 5/17/23 20:06, Ori Kam wrote:
> > Hi Bill,
> >
> >> -----Original Message-----
> >> From: Bill Zhou <dongzhou@nvidia.com>
> >> Sent: Thursday, May 11, 2023 10:55 AM
> >> Subject: [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH
>
> RoCE should be added to devtools/words-case.txt, and IB as well.
>
> >>
> >> IB(InfiniBand) is one type of networking used in high-performance
> >> computing with high throughput and low latency. Like Ethernet, IB
> >> defines a layered protocol (Physical, Link, Network, Transport
> >> Layers). IB provides native support for RDMA(Remote DMA), an
> >> extension of the DMA that allows direct access to remote host memory
> >> without CPU intervention. IB network requires NICs and switches to
> >> support the IB protocol.
> >>
> >> RoCE(RDMA over Converged Ethernet) is a network protocol that allows
> >> RDMA to run on Ethernet. RoCE encapsulates IB packets on ethernet and
> >> has two versions, RoCEv1 and RoCEv2. RoCEv1 is an ethernet link layer
> >> protocol, IB packets are encapsulated in the ethernet layer and use
> >> ethernet type 0x8915. RoCEv2 is an internet
>
> ethernet -> Ethernet (4 times above)
>
> >> layer protocol, IB packets are encapsulated in UDP payload and use a
> >> destination port 4791, The format of the RoCEv2 packet is as follows:
> >> ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
> >>
> >> BTH(Base Transport Header) is the IB transport layer header, RoCEv1
> >> and RoCEv2 both contain this header. This patch introduces a new RTE
> >> item to match the IB BTH in RoCE packets. One use of this match is
> >> that the user can monitor RoCEv2's CNP(Congestion Notification
> >> Packet) by matching BTH opcode 0x81.
> >>
> >> This patch also adds the testpmd command line to match the RoCEv2
> >> BTH. Usage example:
> >>
> >> testpmd> flow create 0 group 1 ingress pattern
> >> eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
> >> dst_qp is 0xd3 / end actions queue index 0 / end
> >>
> >> Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
> >> ---
> >
> > Acked-by: Ori Kam <orika@nvidia.com>
>
> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>
* [PATCH v2 0/3] add support for infiniband BTH match
2023-05-11 7:55 [PATCH v1 0/3] add support for infiniband BTH match Dong Zhou
` (2 preceding siblings ...)
2023-05-11 7:55 ` [PATCH v1 3/3] net/mlx5/hws: " Dong Zhou
@ 2023-05-24 10:08 ` Dong Zhou
2023-05-24 10:08 ` [PATCH v2 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
` (3 more replies)
3 siblings, 4 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-24 10:08 UTC (permalink / raw)
To: orika, viacheslavo, thomas; +Cc: dev, rasland
Add a new rte_flow item to match the InfiniBand BTH in RoCE packets.
v2:
- Change "ethernet" name to "Ethernet" in the commit log.
- Add "RoCE" and "IB" 2 words to words-case.txt.
- Add "rte_byteorder.h" header file in "rte_ib.h" to fix compile errors.
- Add "Acked-by" labels in the first ethdev patch.
Dong Zhou (3):
ethdev: add flow item for RoCE infiniband BTH
net/mlx5: add support for infiniband BTH match
net/mlx5/hws: add support for infiniband BTH match
app/test-pmd/cmdline_flow.c | 58 +++++++++++
devtools/words-case.txt | 2 +
doc/guides/nics/features/default.ini | 1 +
doc/guides/prog_guide/rte_flow.rst | 7 ++
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 ++
drivers/common/mlx5/mlx5_prm.h | 5 +-
drivers/net/mlx5/hws/mlx5dr_definer.c | 76 ++++++++++++++-
drivers/net/mlx5/hws/mlx5dr_definer.h | 2 +
drivers/net/mlx5/mlx5_flow.h | 6 ++
drivers/net/mlx5/mlx5_flow_dv.c | 102 ++++++++++++++++++++
drivers/net/mlx5/mlx5_flow_hw.c | 1 +
lib/ethdev/rte_flow.c | 1 +
lib/ethdev/rte_flow.h | 27 ++++++
lib/net/meson.build | 1 +
lib/net/rte_ib.h | 70 ++++++++++++++
15 files changed, 363 insertions(+), 3 deletions(-)
create mode 100644 lib/net/rte_ib.h
--
2.27.0
* [PATCH v2 1/3] ethdev: add flow item for RoCE infiniband BTH
2023-05-24 10:08 ` [PATCH v2 0/3] " Dong Zhou
@ 2023-05-24 10:08 ` Dong Zhou
2023-05-24 10:08 ` [PATCH v2 2/3] net/mlx5: add support for infiniband BTH match Dong Zhou
` (2 subsequent siblings)
3 siblings, 0 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-24 10:08 UTC (permalink / raw)
To: orika, viacheslavo, thomas, Aman Singh, Yuying Zhang,
Ferruh Yigit, Andrew Rybchenko, Olivier Matz
Cc: dev, rasland
IB (InfiniBand) is a type of networking used in high-performance
computing, offering high throughput and low latency. Like Ethernet,
IB defines a layered protocol (Physical, Link, Network, Transport
Layers). IB provides native support for RDMA (Remote DMA), an
extension of DMA that allows direct access to remote host memory
without CPU intervention. An IB network requires NICs and switches
that support the IB protocol.
RoCE (RDMA over Converged Ethernet) is a network protocol that
allows RDMA to run on Ethernet. RoCE encapsulates IB packets on
Ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
Ethernet link layer protocol; IB packets are encapsulated in the
Ethernet layer and use Ethernet type 0x8915. RoCEv2 is an internet
layer protocol; IB packets are encapsulated in the UDP payload and
use destination port 4791. The format of the RoCEv2 packet is
as follows:
ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
BTH (Base Transport Header) is the IB transport layer header; both
RoCEv1 and RoCEv2 contain it. This patch introduces a new RTE item
to match the IB BTH in RoCE packets. One use of this match is to
monitor RoCEv2 CNPs (Congestion Notification Packets) by matching
BTH opcode 0x81.
This patch also adds the testpmd command line to match the RoCEv2
BTH. Usage example:
testpmd> flow create 0 group 1 ingress pattern
eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
dst_qp is 0xd3 / end actions queue index 0 / end
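
Beyond flow offloading, the new struct rte_ib_bth also allows parsing
received RoCEv2 packets in software. A minimal sketch, assuming an
untagged Ethernet/IPv4 packet without IP options (the helper is
illustrative, not part of this patch; bounds checks are omitted):

#include <netinet/in.h>

#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_ib.h>
#include <rte_ip.h>
#include <rte_mbuf.h>
#include <rte_udp.h>

/* Return the BTH opcode of a RoCEv2 packet, or -1 if not RoCEv2. */
static int
roce_v2_opcode(const struct rte_mbuf *m)
{
    const struct rte_ipv4_hdr *ip;
    const struct rte_udp_hdr *udp;
    const struct rte_ib_bth *bth;

    ip = rte_pktmbuf_mtod_offset(m, const struct rte_ipv4_hdr *,
                                 sizeof(struct rte_ether_hdr));
    if (ip->next_proto_id != IPPROTO_UDP)
        return -1;
    udp = (const struct rte_udp_hdr *)(ip + 1);
    if (udp->dst_port != rte_cpu_to_be_16(RTE_ROCEV2_DEFAULT_PORT))
        return -1;
    bth = (const struct rte_ib_bth *)(udp + 1);
    return bth->opcode; /* 0x81 indicates a RoCEv2 CNP */
}
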
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
app/test-pmd/cmdline_flow.c | 58 +++++++++++++++++
devtools/words-case.txt | 2 +
doc/guides/nics/features/default.ini | 1 +
doc/guides/prog_guide/rte_flow.rst | 7 +++
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 +++
lib/ethdev/rte_flow.c | 1 +
lib/ethdev/rte_flow.h | 27 ++++++++
lib/net/meson.build | 1 +
lib/net/rte_ib.h | 70 +++++++++++++++++++++
9 files changed, 174 insertions(+)
create mode 100644 lib/net/rte_ib.h
diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index a68a6080a8..b9ecbe3c8d 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -498,6 +498,11 @@ enum index {
ITEM_AGGR_AFFINITY_VALUE,
ITEM_TX_QUEUE,
ITEM_TX_QUEUE_VALUE,
+ ITEM_IB_BTH,
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
/* Validate/create actions. */
ACTIONS,
@@ -1455,6 +1460,7 @@ static const enum index next_item[] = {
ITEM_QUOTA,
ITEM_AGGR_AFFINITY,
ITEM_TX_QUEUE,
+ ITEM_IB_BTH,
END_SET,
ZERO,
};
@@ -1962,6 +1968,15 @@ static const enum index item_tx_queue[] = {
ZERO,
};
+static const enum index item_ib_bth[] = {
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
+ ITEM_NEXT,
+ ZERO,
+};
+
static const enum index next_action[] = {
ACTION_END,
ACTION_VOID,
@@ -5532,6 +5547,46 @@ static const struct token token_list[] = {
.call = parse_quota_state_name,
.comp = comp_quota_state_name
},
+ [ITEM_IB_BTH] = {
+ .name = "ib_bth",
+ .help = "match ib bth fields",
+ .priv = PRIV_ITEM(IB_BTH,
+ sizeof(struct rte_flow_item_ib_bth)),
+ .next = NEXT(item_ib_bth),
+ .call = parse_vc,
+ },
+ [ITEM_IB_BTH_OPCODE] = {
+ .name = "opcode",
+ .help = "match ib bth opcode",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.opcode)),
+ },
+ [ITEM_IB_BTH_PKEY] = {
+ .name = "pkey",
+ .help = "partition key",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.pkey)),
+ },
+ [ITEM_IB_BTH_DST_QPN] = {
+ .name = "dst_qp",
+ .help = "destination qp",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.dst_qp)),
+ },
+ [ITEM_IB_BTH_PSN] = {
+ .name = "psn",
+ .help = "packet sequence number",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.psn)),
+ },
/* Validate/create actions. */
[ACTIONS] = {
.name = "actions",
@@ -11877,6 +11932,9 @@ flow_item_default_mask(const struct rte_flow_item *item)
case RTE_FLOW_ITEM_TYPE_TX_QUEUE:
mask = &rte_flow_item_tx_queue_mask;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ mask = &rte_flow_item_ib_bth_mask;
+ break;
default:
break;
}
diff --git a/devtools/words-case.txt b/devtools/words-case.txt
index 42c7861b68..5bd34e8b88 100644
--- a/devtools/words-case.txt
+++ b/devtools/words-case.txt
@@ -27,6 +27,7 @@ GENEVE
GTPU
GUID
HW
+IB
ICMP
ID
IO
@@ -74,6 +75,7 @@ QinQ
RDMA
RETA
ROC
+RoCE
RQ
RSS
RVU
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 1a5087abad..1738715e26 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -104,6 +104,7 @@ gtpc =
gtpu =
gtp_psc =
higig2 =
+ib_bth =
icmp =
icmp6 =
icmp6_echo_request =
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index ac5c65131f..b82e9d99d4 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1558,6 +1558,13 @@ Matches flow quota state set by quota action.
- ``state``: Flow quota state
+Item: ``IB_BTH``
+^^^^^^^^^^^^^^^^
+
+Matches an InfiniBand base transport header in RoCE packet.
+
+- ``hdr``: InfiniBand base transport header definition (``rte_ib.h``).
+
Actions
~~~~~~~
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 29f7dd4428..049af62d88 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3785,6 +3785,13 @@ This section lists supported pattern items and their attributes, if any.
- ``send_to_kernel``: send packets to kernel.
+- ``ib_bth``: match InfiniBand BTH(base transport header).
+
+ - ``opcode {unsigned}``: Opcode.
+ - ``pkey {unsigned}``: Partition key.
+ - ``dst_qp {unsigned}``: Destination Queue Pair.
+ - ``psn {unsigned}``: Packet Sequence Number.
+
Actions list
^^^^^^^^^^^^
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index f0d7f868fa..163e662598 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -165,6 +165,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
MK_FLOW_ITEM(QUOTA, sizeof(struct rte_flow_item_quota)),
MK_FLOW_ITEM(AGGR_AFFINITY, sizeof(struct rte_flow_item_aggr_affinity)),
MK_FLOW_ITEM(TX_QUEUE, sizeof(struct rte_flow_item_tx_queue)),
+ MK_FLOW_ITEM(IB_BTH, sizeof(struct rte_flow_item_ib_bth)),
};
/** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index fe28ba0a82..14ef25edd8 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -38,6 +38,7 @@
#include <rte_ppp.h>
#include <rte_gre.h>
#include <rte_macsec.h>
+#include <rte_ib.h>
#ifdef __cplusplus
extern "C" {
@@ -679,6 +680,13 @@ enum rte_flow_item_type {
* @see struct rte_flow_item_tx_queue
*/
RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+
+ /**
+ * Matches an InfiniBand base transport header in RoCE packet.
+ *
+ * See struct rte_flow_item_ib_bth.
+ */
+ RTE_FLOW_ITEM_TYPE_IB_BTH,
};
/**
@@ -2286,6 +2294,25 @@ rte_flow_item_aggr_affinity_mask = {
};
#endif
+/**
+ * RTE_FLOW_ITEM_TYPE_IB_BTH.
+ *
+ * Matches an InfiniBand base transport header in RoCE packet.
+ */
+struct rte_flow_item_ib_bth {
+ struct rte_ib_bth hdr; /**< InfiniBand base transport header definition. */
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_IB_BTH. */
+#ifndef __cplusplus
+static const struct rte_flow_item_ib_bth rte_flow_item_ib_bth_mask = {
+ .hdr = {
+ .opcode = 0xff,
+ .dst_qp = "\xff\xff\xff",
+ },
+};
+#endif
+
/**
* Action types.
*
diff --git a/lib/net/meson.build b/lib/net/meson.build
index 379d161ee0..b7a0684101 100644
--- a/lib/net/meson.build
+++ b/lib/net/meson.build
@@ -22,6 +22,7 @@ headers = files(
'rte_geneve.h',
'rte_l2tpv2.h',
'rte_ppp.h',
+ 'rte_ib.h',
)
sources = files(
diff --git a/lib/net/rte_ib.h b/lib/net/rte_ib.h
new file mode 100644
index 0000000000..9eab5f9e15
--- /dev/null
+++ b/lib/net/rte_ib.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_IB_H
+#define RTE_IB_H
+
+/**
+ * @file
+ *
+ * InfiniBand header definitions
+ *
+ * The InfiniBand headers are used by RoCE (RDMA over Converged Ethernet).
+ */
+
+#include <stdint.h>
+
+#include <rte_byteorder.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * InfiniBand Base Transport Header according to
+ * IB Specification Vol 1-Release-1.4.
+ */
+__extension__
+struct rte_ib_bth {
+ uint8_t opcode; /**< Opcode. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t tver:4; /**< Transport Header Version. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t se:1; /**< Solicited Event. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t se:1; /**< Solicited Event. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t tver:4; /**< Transport Header Version. */
+#endif
+ rte_be16_t pkey; /**< Partition key. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd0:6; /**< Reserved. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t f:1; /**< FECN. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t f:1; /**< FECN. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t rsvd0:6; /**< Reserved. */
+#endif
+ uint8_t dst_qp[3]; /**< Destination QP */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd1:7; /**< Reserved. */
+ uint8_t a:1; /**< Acknowledge Request. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t a:1; /**< Acknowledge Request. */
+ uint8_t rsvd1:7; /**< Reserved. */
+#endif
+ uint8_t psn[3]; /**< Packet Sequence Number */
+} __rte_packed;
+
+/** RoCEv2 default port. */
+#define RTE_ROCEV2_DEFAULT_PORT 4791
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_IB_H */
--
2.27.0
* [PATCH v2 2/3] net/mlx5: add support for infiniband BTH match
2023-05-24 10:08 ` [PATCH v2 0/3] " Dong Zhou
2023-05-24 10:08 ` [PATCH v2 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
@ 2023-05-24 10:08 ` Dong Zhou
2023-05-24 12:54 ` Ori Kam
2023-05-24 10:08 ` [PATCH v2 3/3] net/mlx5/hws: " Dong Zhou
2023-05-25 7:40 ` [PATCH v3 0/3] " Dong Zhou
3 siblings, 1 reply; 23+ messages in thread
From: Dong Zhou @ 2023-05-24 10:08 UTC (permalink / raw)
To: orika, viacheslavo, thomas, Matan Azrad; +Cc: dev, rasland
This patch adds support for matching the opcode and dst_qp fields
in the InfiniBand BTH. Currently, only RoCEv2 packets are supported;
the input BTH match item defaults to matching a RoCEv2 packet.
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
---
drivers/common/mlx5/mlx5_prm.h | 5 +-
drivers/net/mlx5/mlx5_flow.h | 6 ++
drivers/net/mlx5/mlx5_flow_dv.c | 102 ++++++++++++++++++++++++++++++++
3 files changed, 111 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index ed3d5efbb7..8f55fd59b3 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -932,7 +932,7 @@ struct mlx5_ifc_fte_match_set_misc_bits {
u8 gre_key_h[0x18];
u8 gre_key_l[0x8];
u8 vxlan_vni[0x18];
- u8 reserved_at_b8[0x8];
+ u8 bth_opcode[0x8];
u8 geneve_vni[0x18];
u8 lag_rx_port_affinity[0x4];
u8 reserved_at_e8[0x2];
@@ -945,7 +945,8 @@ struct mlx5_ifc_fte_match_set_misc_bits {
u8 reserved_at_120[0xa];
u8 geneve_opt_len[0x6];
u8 geneve_protocol_type[0x10];
- u8 reserved_at_140[0x20];
+ u8 reserved_at_140[0x8];
+ u8 bth_dst_qp[0x18];
u8 inner_esp_spi[0x20];
u8 outer_esp_spi[0x20];
u8 reserved_at_1a0[0x60];
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1d116ea0f6..c1d6a71708 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -227,6 +227,9 @@ enum mlx5_feature_name {
/* Aggregated affinity item */
#define MLX5_FLOW_ITEM_AGGR_AFFINITY (UINT64_C(1) << 49)
+/* IB BTH ITEM. */
+#define MLX5_FLOW_ITEM_IB_BTH (1ull << 51)
+
/* Outer Masks. */
#define MLX5_FLOW_LAYER_OUTER_L3 \
(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
@@ -364,6 +367,9 @@ enum mlx5_feature_name {
#define MLX5_UDP_PORT_VXLAN 4789
#define MLX5_UDP_PORT_VXLAN_GPE 4790
+/* UDP port numbers for RoCEv2. */
+#define MLX5_UDP_PORT_ROCEv2 4791
+
/* UDP port numbers for GENEVE. */
#define MLX5_UDP_PORT_GENEVE 6081
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 7fcba284ad..d0d8a0739f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -7193,6 +7193,65 @@ flow_dv_validate_item_flex(struct rte_eth_dev *dev,
return 0;
}
+/**
+ * Validate IB BTH item.
+ *
+ * @param[in] dev
+ * Pointer to the rte_eth_dev structure.
+ * @param[in] udp_dport
+ * UDP destination port
+ * @param[in] item
+ * Item specification.
+ * @param root
+ * Whether action is on root table.
+ * @param[out] error
+ * Pointer to the error structure.
+ *
+ * @return
+ * 0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_validate_item_ib_bth(struct rte_eth_dev *dev,
+ uint16_t udp_dport,
+ const struct rte_flow_item *item,
+ bool root,
+ struct rte_flow_error *error)
+{
+ const struct rte_flow_item_ib_bth *mask = item->mask;
+ struct mlx5_priv *priv = dev->data->dev_private;
+ const struct rte_flow_item_ib_bth *valid_mask;
+ int ret;
+
+ valid_mask = &rte_flow_item_ib_bth_mask;
+ if (udp_dport && udp_dport != MLX5_UDP_PORT_ROCEv2)
+ return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "protocol filtering not compatible"
+ " with UDP layer");
+ if (mask && (mask->hdr.se || mask->hdr.m || mask->hdr.padcnt ||
+ mask->hdr.tver || mask->hdr.pkey || mask->hdr.f || mask->hdr.b ||
+ mask->hdr.rsvd0 || mask->hdr.a || mask->hdr.rsvd1 ||
+ mask->hdr.psn[0] || mask->hdr.psn[1] || mask->hdr.psn[2]))
+ return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "only opcode and dst_qp are supported");
+ if (root || priv->sh->steering_format_version ==
+ MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5)
+ return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM,
+ item,
+ "IB BTH item is not supported");
+ if (!mask)
+ mask = &rte_flow_item_ib_bth_mask;
+ ret = mlx5_flow_item_acceptable(item, (const uint8_t *)mask,
+ (const uint8_t *)valid_mask,
+ sizeof(struct rte_flow_item_ib_bth),
+ MLX5_ITEM_RANGE_NOT_ACCEPTED, error);
+ if (ret < 0)
+ return ret;
+ return 0;
+}
+
/**
* Internal validation function. For validating both actions and items.
*
@@ -7700,6 +7759,14 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
return ret;
last_item = MLX5_FLOW_ITEM_AGGR_AFFINITY;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ ret = mlx5_flow_validate_item_ib_bth(dev, udp_dport,
+ items, is_root, error);
+ if (ret < 0)
+ return ret;
+
+ last_item = MLX5_FLOW_ITEM_IB_BTH;
+ break;
default:
return rte_flow_error_set(error, ENOTSUP,
RTE_FLOW_ERROR_TYPE_ITEM,
@@ -10956,6 +11023,37 @@ flow_dv_translate_item_aggr_affinity(void *key,
affinity_v->affinity & affinity_m->affinity);
}
+static void
+flow_dv_translate_item_ib_bth(void *key,
+ const struct rte_flow_item *item,
+ int inner, uint32_t key_type)
+{
+ const struct rte_flow_item_ib_bth *bth_m;
+ const struct rte_flow_item_ib_bth *bth_v;
+ void *headers_v, *misc_v;
+ uint16_t udp_dport;
+ char *qpn_v;
+ int i, size;
+
+ headers_v = inner ? MLX5_ADDR_OF(fte_match_param, key, inner_headers) :
+ MLX5_ADDR_OF(fte_match_param, key, outer_headers);
+ if (!MLX5_GET16(fte_match_set_lyr_2_4, headers_v, udp_dport)) {
+ udp_dport = key_type & MLX5_SET_MATCHER_M ?
+ 0xFFFF : MLX5_UDP_PORT_ROCEv2;
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, udp_dport, udp_dport);
+ }
+ if (MLX5_ITEM_VALID(item, key_type))
+ return;
+ MLX5_ITEM_UPDATE(item, key_type, bth_v, bth_m, &rte_flow_item_ib_bth_mask);
+ misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+ MLX5_SET(fte_match_set_misc, misc_v, bth_opcode,
+ bth_v->hdr.opcode & bth_m->hdr.opcode);
+ qpn_v = MLX5_ADDR_OF(fte_match_set_misc, misc_v, bth_dst_qp);
+ size = sizeof(bth_m->hdr.dst_qp);
+ for (i = 0; i < size; ++i)
+ qpn_v[i] = bth_m->hdr.dst_qp[i] & bth_v->hdr.dst_qp[i];
+}
+
static uint32_t matcher_zero[MLX5_ST_SZ_DW(fte_match_param)] = { 0 };
#define HEADER_IS_ZERO(match_criteria, headers) \
@@ -13757,6 +13855,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
flow_dv_translate_item_aggr_affinity(key, items, key_type);
last_item = MLX5_FLOW_ITEM_AGGR_AFFINITY;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ flow_dv_translate_item_ib_bth(key, items, tunnel, key_type);
+ last_item = MLX5_FLOW_ITEM_IB_BTH;
+ break;
default:
break;
}
--
2.27.0
* [PATCH v2 3/3] net/mlx5/hws: add support for infiniband BTH match
2023-05-24 10:08 ` [PATCH v2 0/3] " Dong Zhou
2023-05-24 10:08 ` [PATCH v2 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
2023-05-24 10:08 ` [PATCH v2 2/3] net/mlx5: add support for infiniband BTH match Dong Zhou
@ 2023-05-24 10:08 ` Dong Zhou
2023-05-25 7:40 ` [PATCH v3 0/3] " Dong Zhou
3 siblings, 0 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-24 10:08 UTC (permalink / raw)
To: orika, viacheslavo, thomas, Matan Azrad; +Cc: dev, rasland
This patch adds support for matching the opcode and dst_qp fields
in the InfiniBand BTH. Currently, only RoCEv2 packets are supported;
the input BTH match item defaults to matching a RoCEv2 packet.
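
A rule using this item can then be enqueued against a template table
built from such a pattern template (see the sketch in the v1 3/3
message above). A minimal sketch with the asynchronous flow API (the
table, queue id, actions and template indexes are illustrative):

#include <rte_flow.h>
#include <rte_ib.h>

static struct rte_flow *
enqueue_cnp_rule(uint16_t port_id, uint32_t queue_id,
                 struct rte_flow_template_table *table,
                 const struct rte_flow_action actions[])
{
    /* Spec only: the mask comes from the table's pattern template. */
    const struct rte_flow_item_ib_bth bth_spec = {
        .hdr = { .opcode = 0x81 }, /* CNP */
    };
    const struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
        { .type = RTE_FLOW_ITEM_TYPE_UDP },
        { .type = RTE_FLOW_ITEM_TYPE_IB_BTH, .spec = &bth_spec },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    const struct rte_flow_op_attr op_attr = { .postpone = 0 };
    struct rte_flow_error error;

    return rte_flow_async_create(port_id, queue_id, &op_attr, table,
                                 pattern, 0, actions, 0, NULL, &error);
}
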
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
---
drivers/net/mlx5/hws/mlx5dr_definer.c | 76 ++++++++++++++++++++++++++-
drivers/net/mlx5/hws/mlx5dr_definer.h | 2 +
drivers/net/mlx5/mlx5_flow_hw.c | 1 +
3 files changed, 78 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c b/drivers/net/mlx5/hws/mlx5dr_definer.c
index f92d3e8e1f..1a427c9b64 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -10,6 +10,7 @@
#define ETH_TYPE_IPV6_VXLAN 0x86DD
#define ETH_VXLAN_DEFAULT_PORT 4789
#define IP_UDP_PORT_MPLS 6635
+#define UDP_ROCEV2_PORT 4791
#define DR_FLOW_LAYER_TUNNEL_NO_MPLS (MLX5_FLOW_LAYER_TUNNEL & ~MLX5_FLOW_LAYER_MPLS)
#define STE_NO_VLAN 0x0
@@ -171,7 +172,9 @@ struct mlx5dr_definer_conv_data {
X(SET_BE16, gre_opt_checksum, v->checksum_rsvd.checksum, rte_flow_item_gre_opt) \
X(SET, meter_color, rte_col_2_mlx5_col(v->color), rte_flow_item_meter_color) \
X(SET_BE32, ipsec_spi, v->hdr.spi, rte_flow_item_esp) \
- X(SET_BE32, ipsec_sequence_number, v->hdr.seq, rte_flow_item_esp)
+ X(SET_BE32, ipsec_sequence_number, v->hdr.seq, rte_flow_item_esp) \
+ X(SET, ib_l4_udp_port, UDP_ROCEV2_PORT, rte_flow_item_ib_bth) \
+ X(SET, ib_l4_opcode, v->hdr.opcode, rte_flow_item_ib_bth)
/* Item set function format */
#define X(set_type, func_name, value, item_type) \
@@ -583,6 +586,16 @@ mlx5dr_definer_mpls_label_set(struct mlx5dr_definer_fc *fc,
memcpy(tag + fc->byte_off + sizeof(v->label_tc_s), &v->ttl, sizeof(v->ttl));
}
+static void
+mlx5dr_definer_ib_l4_qp_set(struct mlx5dr_definer_fc *fc,
+ const void *item_spec,
+ uint8_t *tag)
+{
+ const struct rte_flow_item_ib_bth *v = item_spec;
+
+ memcpy(tag + fc->byte_off, &v->hdr.dst_qp, sizeof(v->hdr.dst_qp));
+}
+
static int
mlx5dr_definer_conv_item_eth(struct mlx5dr_definer_conv_data *cd,
struct rte_flow_item *item,
@@ -2041,6 +2054,63 @@ mlx5dr_definer_conv_item_flex_parser(struct mlx5dr_definer_conv_data *cd,
return 0;
}
+static int
+mlx5dr_definer_conv_item_ib_l4(struct mlx5dr_definer_conv_data *cd,
+ struct rte_flow_item *item,
+ int item_idx)
+{
+ const struct rte_flow_item_ib_bth *m = item->mask;
+ struct mlx5dr_definer_fc *fc;
+ bool inner = cd->tunnel;
+
+ /* In order to match on RoCEv2(layer4 ib), we must match
+ * on ip_protocol and l4_dport.
+ */
+ if (!cd->relaxed) {
+ fc = &cd->fc[DR_CALC_FNAME(IP_PROTOCOL, inner)];
+ if (!fc->tag_set) {
+ fc->item_idx = item_idx;
+ fc->tag_mask_set = &mlx5dr_definer_ones_set;
+ fc->tag_set = &mlx5dr_definer_udp_protocol_set;
+ DR_CALC_SET(fc, eth_l2, l4_type_bwc, inner);
+ }
+
+ fc = &cd->fc[DR_CALC_FNAME(L4_DPORT, inner)];
+ if (!fc->tag_set) {
+ fc->item_idx = item_idx;
+ fc->tag_mask_set = &mlx5dr_definer_ones_set;
+ fc->tag_set = &mlx5dr_definer_ib_l4_udp_port_set;
+ DR_CALC_SET(fc, eth_l4, destination_port, inner);
+ }
+ }
+
+ if (!m)
+ return 0;
+
+ if (m->hdr.se || m->hdr.m || m->hdr.padcnt || m->hdr.tver ||
+ m->hdr.pkey || m->hdr.f || m->hdr.b || m->hdr.rsvd0 ||
+ m->hdr.a || m->hdr.rsvd1 || !is_mem_zero(m->hdr.psn, 3)) {
+ rte_errno = ENOTSUP;
+ return rte_errno;
+ }
+
+ if (m->hdr.opcode) {
+ fc = &cd->fc[MLX5DR_DEFINER_FNAME_IB_L4_OPCODE];
+ fc->item_idx = item_idx;
+ fc->tag_set = &mlx5dr_definer_ib_l4_opcode_set;
+ DR_CALC_SET_HDR(fc, ib_l4, opcode);
+ }
+
+ if (!is_mem_zero(m->hdr.dst_qp, 3)) {
+ fc = &cd->fc[MLX5DR_DEFINER_FNAME_IB_L4_QPN];
+ fc->item_idx = item_idx;
+ fc->tag_set = &mlx5dr_definer_ib_l4_qp_set;
+ DR_CALC_SET_HDR(fc, ib_l4, qp);
+ }
+
+ return 0;
+}
+
static int
mlx5dr_definer_conv_items_to_hl(struct mlx5dr_context *ctx,
struct mlx5dr_match_template *mt,
@@ -2182,6 +2252,10 @@ mlx5dr_definer_conv_items_to_hl(struct mlx5dr_context *ctx,
item_flags |= MLX5_FLOW_LAYER_MPLS;
cd.mpls_idx++;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ ret = mlx5dr_definer_conv_item_ib_l4(&cd, items, i);
+ item_flags |= MLX5_FLOW_ITEM_IB_BTH;
+ break;
default:
DR_LOG(ERR, "Unsupported item type %d", items->type);
rte_errno = ENOTSUP;
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.h b/drivers/net/mlx5/hws/mlx5dr_definer.h
index 90ec4ce845..6b645f4cf0 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.h
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.h
@@ -134,6 +134,8 @@ enum mlx5dr_definer_fname {
MLX5DR_DEFINER_FNAME_OKS2_MPLS2_I,
MLX5DR_DEFINER_FNAME_OKS2_MPLS3_I,
MLX5DR_DEFINER_FNAME_OKS2_MPLS4_I,
+ MLX5DR_DEFINER_FNAME_IB_L4_OPCODE,
+ MLX5DR_DEFINER_FNAME_IB_L4_QPN,
MLX5DR_DEFINER_FNAME_MAX,
};
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 853c94af9c..f9e7f844ea 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4969,6 +4969,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
case RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT:
case RTE_FLOW_ITEM_TYPE_ESP:
case RTE_FLOW_ITEM_TYPE_FLEX:
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
break;
case RTE_FLOW_ITEM_TYPE_INTEGRITY:
/*
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [PATCH v2 2/3] net/mlx5: add support for infiniband BTH match
2023-05-24 10:08 ` [PATCH v2 2/3] net/mlx5: add support for infiniband BTH match Dong Zhou
@ 2023-05-24 12:54 ` Ori Kam
0 siblings, 0 replies; 23+ messages in thread
From: Ori Kam @ 2023-05-24 12:54 UTC (permalink / raw)
To: Bill Zhou, Slava Ovsiienko,
NBU-Contact-Thomas Monjalon (EXTERNAL),
Matan Azrad
Cc: dev, Raslan Darawsheh
Hi Bill,
> -----Original Message-----
> From: Bill Zhou <dongzhou@nvidia.com>
> Sent: Wednesday, May 24, 2023 1:08 PM
>
> This patch adds support for matching the opcode and dst_qp fields in
> the InfiniBand BTH. Currently, only RoCEv2 packets are supported; the
> BTH match item defaults to matching RoCEv2 packets.
>
> Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
> ---
Acked-by: Ori Kam <orika@nvidia.com>
Ori
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v3 0/3] add support for infiniband BTH match
2023-05-24 10:08 ` [PATCH v2 0/3] " Dong Zhou
` (2 preceding siblings ...)
2023-05-24 10:08 ` [PATCH v2 3/3] net/mlx5/hws: " Dong Zhou
@ 2023-05-25 7:40 ` Dong Zhou
2023-05-25 7:40 ` [PATCH v3 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
` (4 more replies)
3 siblings, 5 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-25 7:40 UTC (permalink / raw)
To: orika, viacheslavo, thomas; +Cc: dev, rasland
Add a new rte_flow item to match the InfiniBand BTH in RoCE packets.
v2:
- Change "ethernet" name to "Ethernet" in the commit log.
- Add "RoCE" and "IB" 2 words to words-case.txt.
- Add "rte_byteorder.h" header file in "rte_ib.h" to fix compile errors.
- Add "Acked-by" labels in the first ethdev patch.
v3:
- Do rebase to fix the patch apply failure.
- Add "Acked-by" label in the second net/mlx5 patch.
Dong Zhou (3):
ethdev: add flow item for RoCE infiniband BTH
net/mlx5: add support for infiniband BTH match
net/mlx5/hws: add support for infiniband BTH match
app/test-pmd/cmdline_flow.c | 58 +++++++++++
devtools/words-case.txt | 2 +
doc/guides/nics/features/default.ini | 1 +
doc/guides/prog_guide/rte_flow.rst | 7 ++
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 ++
drivers/common/mlx5/mlx5_prm.h | 5 +-
drivers/net/mlx5/hws/mlx5dr_definer.c | 76 ++++++++++++++-
drivers/net/mlx5/hws/mlx5dr_definer.h | 2 +
drivers/net/mlx5/mlx5_flow.h | 6 ++
drivers/net/mlx5/mlx5_flow_dv.c | 102 ++++++++++++++++++++
drivers/net/mlx5/mlx5_flow_hw.c | 1 +
lib/ethdev/rte_flow.c | 1 +
lib/ethdev/rte_flow.h | 27 ++++++
lib/net/meson.build | 1 +
lib/net/rte_ib.h | 70 ++++++++++++++
15 files changed, 363 insertions(+), 3 deletions(-)
create mode 100644 lib/net/rte_ib.h
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v3 1/3] ethdev: add flow item for RoCE infiniband BTH
2023-05-25 7:40 ` [PATCH v3 0/3] " Dong Zhou
@ 2023-05-25 7:40 ` Dong Zhou
2023-05-25 7:40 ` [PATCH v3 2/3] net/mlx5: add support for infiniband BTH match Dong Zhou
` (3 subsequent siblings)
4 siblings, 0 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-25 7:40 UTC (permalink / raw)
To: orika, viacheslavo, thomas, Aman Singh, Yuying Zhang,
Ferruh Yigit, Andrew Rybchenko, Olivier Matz
Cc: dev, rasland
IB(InfiniBand) is a type of networking used in high-performance
computing that provides high throughput and low latency. Like
Ethernet, IB defines a layered protocol (Physical, Link, Network,
Transport Layers). IB provides native support for RDMA(Remote DMA),
an extension of DMA that allows direct access to remote host
memory without CPU intervention. An IB network requires NICs and
switches that support the IB protocol.
RoCE(RDMA over Converged Ethernet) is a network protocol that
allows RDMA to run on Ethernet. RoCE encapsulates IB packets on
Ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
Ethernet link layer protocol; IB packets are encapsulated in the
Ethernet layer and use Ethernet type 0x8915. RoCEv2 is an internet
layer protocol; IB packets are encapsulated in the UDP payload and
use destination port 4791. The format of the RoCEv2 packet is
as follows:
ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
BTH(Base Transport Header) is the IB transport layer header; RoCEv1
and RoCEv2 both contain this header. This patch introduces a new
RTE item to match the IB BTH in RoCE packets. One use of this match
is that the user can monitor RoCEv2's CNP(Congestion Notification
Packet) by matching BTH opcode 0x81.
This patch also adds a testpmd command line to match the RoCEv2
BTH. Usage example:
testpmd> flow create 0 group 1 ingress pattern
eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
dst_qp is 0xd3 / end actions queue index 0 / end
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
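[Editor's note: for reference, the flow in the usage example above can
also be created through the rte_flow C API. The sketch below mirrors
the testpmd command; the port ID, queue index, and helper name are
illustrative assumptions, not part of this patch.]

#include <rte_flow.h>

/* Sketch: steer RoCEv2 CNP packets (BTH opcode 0x81, dst_qp 0xd3)
 * to Rx queue 0, mirroring the testpmd example above. */
static struct rte_flow *
create_cnp_rule(uint16_t port_id)
{
        struct rte_flow_attr attr = { .group = 1, .ingress = 1 };
        struct rte_flow_item_udp udp_spec = {
                .hdr = { .dst_port = RTE_BE16(4791) },
        };
        struct rte_flow_item_udp udp_mask = {
                .hdr = { .dst_port = RTE_BE16(0xffff) },
        };
        struct rte_flow_item_ib_bth bth_spec = {
                .hdr = { .opcode = 0x81, .dst_qp = "\x00\x00\xd3" },
        };
        struct rte_flow_item_ib_bth bth_mask = {
                .hdr = { .opcode = 0xff, .dst_qp = "\xff\xff\xff" },
        };
        struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH },
                { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
                { .type = RTE_FLOW_ITEM_TYPE_UDP,
                  .spec = &udp_spec, .mask = &udp_mask },
                { .type = RTE_FLOW_ITEM_TYPE_IB_BTH,
                  .spec = &bth_spec, .mask = &bth_mask },
                { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_queue queue = { .index = 0 };
        struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_error error;

        return rte_flow_create(port_id, &attr, pattern, actions, &error);
}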
app/test-pmd/cmdline_flow.c | 58 +++++++++++++++++
devtools/words-case.txt | 2 +
doc/guides/nics/features/default.ini | 1 +
doc/guides/prog_guide/rte_flow.rst | 7 +++
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 +++
lib/ethdev/rte_flow.c | 1 +
lib/ethdev/rte_flow.h | 27 ++++++++
lib/net/meson.build | 1 +
lib/net/rte_ib.h | 70 +++++++++++++++++++++
9 files changed, 174 insertions(+)
create mode 100644 lib/net/rte_ib.h
diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 58939ec321..3ade229ffc 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -496,6 +496,11 @@ enum index {
ITEM_QUOTA_STATE_NAME,
ITEM_AGGR_AFFINITY,
ITEM_AGGR_AFFINITY_VALUE,
+ ITEM_IB_BTH,
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
/* Validate/create actions. */
ACTIONS,
@@ -1452,6 +1457,7 @@ static const enum index next_item[] = {
ITEM_METER,
ITEM_QUOTA,
ITEM_AGGR_AFFINITY,
+ ITEM_IB_BTH,
END_SET,
ZERO,
};
@@ -1953,6 +1959,15 @@ static const enum index item_aggr_affinity[] = {
ZERO,
};
+static const enum index item_ib_bth[] = {
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
+ ITEM_NEXT,
+ ZERO,
+};
+
static const enum index next_action[] = {
ACTION_END,
ACTION_VOID,
@@ -5523,6 +5538,46 @@ static const struct token token_list[] = {
.call = parse_quota_state_name,
.comp = comp_quota_state_name
},
+ [ITEM_IB_BTH] = {
+ .name = "ib_bth",
+ .help = "match ib bth fields",
+ .priv = PRIV_ITEM(IB_BTH,
+ sizeof(struct rte_flow_item_ib_bth)),
+ .next = NEXT(item_ib_bth),
+ .call = parse_vc,
+ },
+ [ITEM_IB_BTH_OPCODE] = {
+ .name = "opcode",
+ .help = "match ib bth opcode",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.opcode)),
+ },
+ [ITEM_IB_BTH_PKEY] = {
+ .name = "pkey",
+ .help = "partition key",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.pkey)),
+ },
+ [ITEM_IB_BTH_DST_QPN] = {
+ .name = "dst_qp",
+ .help = "destination qp",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.dst_qp)),
+ },
+ [ITEM_IB_BTH_PSN] = {
+ .name = "psn",
+ .help = "packet sequence number",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.psn)),
+ },
/* Validate/create actions. */
[ACTIONS] = {
.name = "actions",
@@ -11849,6 +11904,9 @@ flow_item_default_mask(const struct rte_flow_item *item)
case RTE_FLOW_ITEM_TYPE_AGGR_AFFINITY:
mask = &rte_flow_item_aggr_affinity_mask;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ mask = &rte_flow_item_ib_bth_mask;
+ break;
default:
break;
}
diff --git a/devtools/words-case.txt b/devtools/words-case.txt
index 42c7861b68..5bd34e8b88 100644
--- a/devtools/words-case.txt
+++ b/devtools/words-case.txt
@@ -27,6 +27,7 @@ GENEVE
GTPU
GUID
HW
+IB
ICMP
ID
IO
@@ -74,6 +75,7 @@ QinQ
RDMA
RETA
ROC
+RoCE
RQ
RSS
RVU
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 1a5087abad..1738715e26 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -104,6 +104,7 @@ gtpc =
gtpu =
gtp_psc =
higig2 =
+ib_bth =
icmp =
icmp6 =
icmp6_echo_request =
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 32fc45516a..e2957df71c 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1551,6 +1551,13 @@ Matches flow quota state set by quota action.
- ``state``: Flow quota state
+Item: ``IB_BTH``
+^^^^^^^^^^^^^^^^
+
+Matches an InfiniBand base transport header in a RoCE packet.
+
+- ``hdr``: InfiniBand base transport header definition (``rte_ib.h``).
+
Actions
~~~~~~~
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 8f23847859..4bad244029 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3781,6 +3781,13 @@ This section lists supported pattern items and their attributes, if any.
- ``send_to_kernel``: send packets to kernel.
+- ``ib_bth``: match InfiniBand BTH(base transport header).
+
+ - ``opcode {unsigned}``: Opcode.
+ - ``pkey {unsigned}``: Partition key.
+ - ``dst_qp {unsigned}``: Destination Queue Pair.
+ - ``psn {unsigned}``: Packet Sequence Number.
+
Actions list
^^^^^^^^^^^^
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 69e6e749f7..6e099deca3 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -164,6 +164,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
MK_FLOW_ITEM(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext)),
MK_FLOW_ITEM(QUOTA, sizeof(struct rte_flow_item_quota)),
MK_FLOW_ITEM(AGGR_AFFINITY, sizeof(struct rte_flow_item_aggr_affinity)),
+ MK_FLOW_ITEM(IB_BTH, sizeof(struct rte_flow_item_ib_bth)),
};
/** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 713ba8b65c..2b7f144c27 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -38,6 +38,7 @@
#include <rte_ppp.h>
#include <rte_gre.h>
#include <rte_macsec.h>
+#include <rte_ib.h>
#ifdef __cplusplus
extern "C" {
@@ -672,6 +673,13 @@ enum rte_flow_item_type {
* @see struct rte_flow_item_aggr_affinity.
*/
RTE_FLOW_ITEM_TYPE_AGGR_AFFINITY,
+
+ /**
+ * Matches an InfiniBand base transport header in a RoCE packet.
+ *
+ * See struct rte_flow_item_ib_bth.
+ */
+ RTE_FLOW_ITEM_TYPE_IB_BTH,
};
/**
@@ -2260,6 +2268,25 @@ rte_flow_item_aggr_affinity_mask = {
};
#endif
+/**
+ * RTE_FLOW_ITEM_TYPE_IB_BTH.
+ *
+ * Matches an InfiniBand base transport header in a RoCE packet.
+ */
+struct rte_flow_item_ib_bth {
+ struct rte_ib_bth hdr; /**< InfiniBand base transport header definition. */
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_IB_BTH. */
+#ifndef __cplusplus
+static const struct rte_flow_item_ib_bth rte_flow_item_ib_bth_mask = {
+ .hdr = {
+ .opcode = 0xff,
+ .dst_qp = "\xff\xff\xff",
+ },
+};
+#endif
+
/**
* Action types.
*
diff --git a/lib/net/meson.build b/lib/net/meson.build
index 379d161ee0..b7a0684101 100644
--- a/lib/net/meson.build
+++ b/lib/net/meson.build
@@ -22,6 +22,7 @@ headers = files(
'rte_geneve.h',
'rte_l2tpv2.h',
'rte_ppp.h',
+ 'rte_ib.h',
)
sources = files(
diff --git a/lib/net/rte_ib.h b/lib/net/rte_ib.h
new file mode 100644
index 0000000000..9eab5f9e15
--- /dev/null
+++ b/lib/net/rte_ib.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_IB_H
+#define RTE_IB_H
+
+/**
+ * @file
+ *
+ * InfiniBand header definitions
+ *
+ * The InfiniBand headers are used by RoCE (RDMA over Converged Ethernet).
+ */
+
+#include <stdint.h>
+
+#include <rte_byteorder.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * InfiniBand Base Transport Header according to
+ * IB Specification Vol 1-Release-1.4.
+ */
+__extension__
+struct rte_ib_bth {
+ uint8_t opcode; /**< Opcode. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t tver:4; /**< Transport Header Version. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t se:1; /**< Solicited Event. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t se:1; /**< Solicited Event. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t tver:4; /**< Transport Header Version. */
+#endif
+ rte_be16_t pkey; /**< Partition key. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd0:6; /**< Reserved. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t f:1; /**< FECN. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t f:1; /**< FECN. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t rsvd0:6; /**< Reserved. */
+#endif
+ uint8_t dst_qp[3]; /**< Destination QP */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd1:7; /**< Reserved. */
+ uint8_t a:1; /**< Acknowledge Request. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t a:1; /**< Acknowledge Request. */
+ uint8_t rsvd1:7; /**< Reserved. */
+#endif
+ uint8_t psn[3]; /**< Packet Sequence Number */
+} __rte_packed;
+
+/** RoCEv2 default port. */
+#define RTE_ROCEV2_DEFAULT_PORT 4791
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_IB_H */
--
2.27.0
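[Editor's note: dst_qp and psn are carried in struct rte_ib_bth as
3-byte big-endian arrays, so applications need a small conversion step
to use them as integers. The helper below is an illustrative sketch
with a hypothetical name, not part of the patch.]

#include <stdint.h>

#include <rte_ib.h>

/* Assemble a 24-bit big-endian field (dst_qp or psn) into a
 * host-order 32-bit value. */
static inline uint32_t
ib_bth_field_u24(const uint8_t b[3])
{
        return ((uint32_t)b[0] << 16) | ((uint32_t)b[1] << 8) | b[2];
}

/* Usage: given a pointer bth to the BTH of a received RoCEv2 packet,
 * ib_bth_field_u24(bth->dst_qp) yields the destination QP number. */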
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v3 2/3] net/mlx5: add support for infiniband BTH match
2023-05-25 7:40 ` [PATCH v3 0/3] " Dong Zhou
2023-05-25 7:40 ` [PATCH v3 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
@ 2023-05-25 7:40 ` Dong Zhou
2023-05-25 7:40 ` [PATCH v3 3/3] net/mlx5/hws: " Dong Zhou
` (2 subsequent siblings)
4 siblings, 0 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-25 7:40 UTC (permalink / raw)
To: orika, viacheslavo, thomas, Matan Azrad, Suanming Mou; +Cc: dev, rasland
This patch adds support for matching the opcode and dst_qp fields
in the InfiniBand BTH. Currently, only RoCEv2 packets are
supported; the BTH match item defaults to matching RoCEv2 packets.
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
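[Editor's note: a practical consequence of the RoCEv2 defaulting
described above is that the UDP destination port may be left
unspecified in the pattern; the translation layer fills in 4791 when
the match key does not already set it. A minimal sketch, assuming the
mlx5 DV path of this patch (opcode value illustrative):]

#include <rte_flow.h>

static const struct rte_flow_item_ib_bth bth_spec = {
        .hdr = { .opcode = 0x81 },
};
static const struct rte_flow_item_ib_bth bth_mask = {
        .hdr = { .opcode = 0xff },
};
static const struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
        /* No UDP spec/mask: the PMD defaults dport to 4791 (RoCEv2). */
        { .type = RTE_FLOW_ITEM_TYPE_UDP },
        { .type = RTE_FLOW_ITEM_TYPE_IB_BTH,
          .spec = &bth_spec, .mask = &bth_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
};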
drivers/common/mlx5/mlx5_prm.h | 5 +-
drivers/net/mlx5/mlx5_flow.h | 6 ++
drivers/net/mlx5/mlx5_flow_dv.c | 102 ++++++++++++++++++++++++++++++++
3 files changed, 111 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index ed3d5efbb7..8f55fd59b3 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -932,7 +932,7 @@ struct mlx5_ifc_fte_match_set_misc_bits {
u8 gre_key_h[0x18];
u8 gre_key_l[0x8];
u8 vxlan_vni[0x18];
- u8 reserved_at_b8[0x8];
+ u8 bth_opcode[0x8];
u8 geneve_vni[0x18];
u8 lag_rx_port_affinity[0x4];
u8 reserved_at_e8[0x2];
@@ -945,7 +945,8 @@ struct mlx5_ifc_fte_match_set_misc_bits {
u8 reserved_at_120[0xa];
u8 geneve_opt_len[0x6];
u8 geneve_protocol_type[0x10];
- u8 reserved_at_140[0x20];
+ u8 reserved_at_140[0x8];
+ u8 bth_dst_qp[0x18];
u8 inner_esp_spi[0x20];
u8 outer_esp_spi[0x20];
u8 reserved_at_1a0[0x60];
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1d116ea0f6..c1d6a71708 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -227,6 +227,9 @@ enum mlx5_feature_name {
/* Aggregated affinity item */
#define MLX5_FLOW_ITEM_AGGR_AFFINITY (UINT64_C(1) << 49)
+/* IB BTH ITEM. */
+#define MLX5_FLOW_ITEM_IB_BTH (1ull << 51)
+
/* Outer Masks. */
#define MLX5_FLOW_LAYER_OUTER_L3 \
(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
@@ -364,6 +367,9 @@ enum mlx5_feature_name {
#define MLX5_UDP_PORT_VXLAN 4789
#define MLX5_UDP_PORT_VXLAN_GPE 4790
+/* UDP port numbers for RoCEv2. */
+#define MLX5_UDP_PORT_ROCEv2 4791
+
/* UDP port numbers for GENEVE. */
#define MLX5_UDP_PORT_GENEVE 6081
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index d14661298c..a3b72dbb5f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -7193,6 +7193,65 @@ flow_dv_validate_item_flex(struct rte_eth_dev *dev,
return 0;
}
+/**
+ * Validate IB BTH item.
+ *
+ * @param[in] dev
+ * Pointer to the rte_eth_dev structure.
+ * @param[in] udp_dport
+ * UDP destination port
+ * @param[in] item
+ * Item specification.
+ * @param root
+ * Whether action is on root table.
+ * @param[out] error
+ * Pointer to the error structure.
+ *
+ * @return
+ * 0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_validate_item_ib_bth(struct rte_eth_dev *dev,
+ uint16_t udp_dport,
+ const struct rte_flow_item *item,
+ bool root,
+ struct rte_flow_error *error)
+{
+ const struct rte_flow_item_ib_bth *mask = item->mask;
+ struct mlx5_priv *priv = dev->data->dev_private;
+ const struct rte_flow_item_ib_bth *valid_mask;
+ int ret;
+
+ valid_mask = &rte_flow_item_ib_bth_mask;
+ if (udp_dport && udp_dport != MLX5_UDP_PORT_ROCEv2)
+ return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "protocol filtering not compatible"
+ " with UDP layer");
+ if (mask && (mask->hdr.se || mask->hdr.m || mask->hdr.padcnt ||
+ mask->hdr.tver || mask->hdr.pkey || mask->hdr.f || mask->hdr.b ||
+ mask->hdr.rsvd0 || mask->hdr.a || mask->hdr.rsvd1 ||
+ mask->hdr.psn[0] || mask->hdr.psn[1] || mask->hdr.psn[2]))
+ return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "only opcode and dst_qp are supported");
+ if (root || priv->sh->steering_format_version ==
+ MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5)
+ return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM,
+ item,
+ "IB BTH item is not supported");
+ if (!mask)
+ mask = &rte_flow_item_ib_bth_mask;
+ ret = mlx5_flow_item_acceptable(item, (const uint8_t *)mask,
+ (const uint8_t *)valid_mask,
+ sizeof(struct rte_flow_item_ib_bth),
+ MLX5_ITEM_RANGE_NOT_ACCEPTED, error);
+ if (ret < 0)
+ return ret;
+ return 0;
+}
+
/**
* Internal validation function. For validating both actions and items.
*
@@ -7700,6 +7759,14 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
return ret;
last_item = MLX5_FLOW_ITEM_AGGR_AFFINITY;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ ret = mlx5_flow_validate_item_ib_bth(dev, udp_dport,
+ items, is_root, error);
+ if (ret < 0)
+ return ret;
+
+ last_item = MLX5_FLOW_ITEM_IB_BTH;
+ break;
default:
return rte_flow_error_set(error, ENOTSUP,
RTE_FLOW_ERROR_TYPE_ITEM,
@@ -10971,6 +11038,37 @@ flow_dv_translate_item_aggr_affinity(void *key,
affinity_v->affinity & affinity_m->affinity);
}
+static void
+flow_dv_translate_item_ib_bth(void *key,
+ const struct rte_flow_item *item,
+ int inner, uint32_t key_type)
+{
+ const struct rte_flow_item_ib_bth *bth_m;
+ const struct rte_flow_item_ib_bth *bth_v;
+ void *headers_v, *misc_v;
+ uint16_t udp_dport;
+ char *qpn_v;
+ int i, size;
+
+ headers_v = inner ? MLX5_ADDR_OF(fte_match_param, key, inner_headers) :
+ MLX5_ADDR_OF(fte_match_param, key, outer_headers);
+ if (!MLX5_GET16(fte_match_set_lyr_2_4, headers_v, udp_dport)) {
+ udp_dport = key_type & MLX5_SET_MATCHER_M ?
+ 0xFFFF : MLX5_UDP_PORT_ROCEv2;
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, udp_dport, udp_dport);
+ }
+ if (MLX5_ITEM_VALID(item, key_type))
+ return;
+ MLX5_ITEM_UPDATE(item, key_type, bth_v, bth_m, &rte_flow_item_ib_bth_mask);
+ misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+ MLX5_SET(fte_match_set_misc, misc_v, bth_opcode,
+ bth_v->hdr.opcode & bth_m->hdr.opcode);
+ qpn_v = MLX5_ADDR_OF(fte_match_set_misc, misc_v, bth_dst_qp);
+ size = sizeof(bth_m->hdr.dst_qp);
+ for (i = 0; i < size; ++i)
+ qpn_v[i] = bth_m->hdr.dst_qp[i] & bth_v->hdr.dst_qp[i];
+}
+
static uint32_t matcher_zero[MLX5_ST_SZ_DW(fte_match_param)] = { 0 };
#define HEADER_IS_ZERO(match_criteria, headers) \
@@ -13772,6 +13870,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
flow_dv_translate_item_aggr_affinity(key, items, key_type);
last_item = MLX5_FLOW_ITEM_AGGR_AFFINITY;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ flow_dv_translate_item_ib_bth(key, items, tunnel, key_type);
+ last_item = MLX5_FLOW_ITEM_IB_BTH;
+ break;
default:
break;
}
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v3 3/3] net/mlx5/hws: add support for infiniband BTH match
2023-05-25 7:40 ` [PATCH v3 0/3] " Dong Zhou
2023-05-25 7:40 ` [PATCH v3 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
2023-05-25 7:40 ` [PATCH v3 2/3] net/mlx5: add support for infiniband BTH match Dong Zhou
@ 2023-05-25 7:40 ` Dong Zhou
2023-05-29 13:36 ` Alex Vesker
2023-05-30 3:06 ` [PATCH v4] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
2023-05-31 3:26 ` [PATCH v5] " Dong Zhou
4 siblings, 1 reply; 23+ messages in thread
From: Dong Zhou @ 2023-05-25 7:40 UTC (permalink / raw)
To: orika, viacheslavo, thomas, Matan Azrad, Suanming Mou; +Cc: dev, rasland
This patch adds support for matching the opcode and dst_qp fields
in the InfiniBand BTH. Currently, only RoCEv2 packets are
supported; the BTH match item defaults to matching RoCEv2 packets.
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
---
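[Editor's note: the relaxed flag tested in the definer code below is
driven by the template API's relaxed matching attribute; when set,
only explicitly masked fields are matched and the implicit
ip_protocol/l4_dport (4791) fields are not added. A hedged sketch of
that attribute (values illustrative):]

#include <rte_flow.h>

/* Pattern template attribute opting out of implicit header matching. */
static const struct rte_flow_pattern_template_attr tmpl_attr = {
        .relaxed_matching = 1,
        .ingress = 1,
};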
drivers/net/mlx5/hws/mlx5dr_definer.c | 76 ++++++++++++++++++++++++++-
drivers/net/mlx5/hws/mlx5dr_definer.h | 2 +
drivers/net/mlx5/mlx5_flow_hw.c | 1 +
3 files changed, 78 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c b/drivers/net/mlx5/hws/mlx5dr_definer.c
index f92d3e8e1f..1a427c9b64 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -10,6 +10,7 @@
#define ETH_TYPE_IPV6_VXLAN 0x86DD
#define ETH_VXLAN_DEFAULT_PORT 4789
#define IP_UDP_PORT_MPLS 6635
+#define UDP_ROCEV2_PORT 4791
#define DR_FLOW_LAYER_TUNNEL_NO_MPLS (MLX5_FLOW_LAYER_TUNNEL & ~MLX5_FLOW_LAYER_MPLS)
#define STE_NO_VLAN 0x0
@@ -171,7 +172,9 @@ struct mlx5dr_definer_conv_data {
X(SET_BE16, gre_opt_checksum, v->checksum_rsvd.checksum, rte_flow_item_gre_opt) \
X(SET, meter_color, rte_col_2_mlx5_col(v->color), rte_flow_item_meter_color) \
X(SET_BE32, ipsec_spi, v->hdr.spi, rte_flow_item_esp) \
- X(SET_BE32, ipsec_sequence_number, v->hdr.seq, rte_flow_item_esp)
+ X(SET_BE32, ipsec_sequence_number, v->hdr.seq, rte_flow_item_esp) \
+ X(SET, ib_l4_udp_port, UDP_ROCEV2_PORT, rte_flow_item_ib_bth) \
+ X(SET, ib_l4_opcode, v->hdr.opcode, rte_flow_item_ib_bth)
/* Item set function format */
#define X(set_type, func_name, value, item_type) \
@@ -583,6 +586,16 @@ mlx5dr_definer_mpls_label_set(struct mlx5dr_definer_fc *fc,
memcpy(tag + fc->byte_off + sizeof(v->label_tc_s), &v->ttl, sizeof(v->ttl));
}
+static void
+mlx5dr_definer_ib_l4_qp_set(struct mlx5dr_definer_fc *fc,
+ const void *item_spec,
+ uint8_t *tag)
+{
+ const struct rte_flow_item_ib_bth *v = item_spec;
+
+ memcpy(tag + fc->byte_off, &v->hdr.dst_qp, sizeof(v->hdr.dst_qp));
+}
+
static int
mlx5dr_definer_conv_item_eth(struct mlx5dr_definer_conv_data *cd,
struct rte_flow_item *item,
@@ -2041,6 +2054,63 @@ mlx5dr_definer_conv_item_flex_parser(struct mlx5dr_definer_conv_data *cd,
return 0;
}
+static int
+mlx5dr_definer_conv_item_ib_l4(struct mlx5dr_definer_conv_data *cd,
+ struct rte_flow_item *item,
+ int item_idx)
+{
+ const struct rte_flow_item_ib_bth *m = item->mask;
+ struct mlx5dr_definer_fc *fc;
+ bool inner = cd->tunnel;
+
+ /* In order to match on RoCEv2(layer4 ib), we must match
+ * on ip_protocol and l4_dport.
+ */
+ if (!cd->relaxed) {
+ fc = &cd->fc[DR_CALC_FNAME(IP_PROTOCOL, inner)];
+ if (!fc->tag_set) {
+ fc->item_idx = item_idx;
+ fc->tag_mask_set = &mlx5dr_definer_ones_set;
+ fc->tag_set = &mlx5dr_definer_udp_protocol_set;
+ DR_CALC_SET(fc, eth_l2, l4_type_bwc, inner);
+ }
+
+ fc = &cd->fc[DR_CALC_FNAME(L4_DPORT, inner)];
+ if (!fc->tag_set) {
+ fc->item_idx = item_idx;
+ fc->tag_mask_set = &mlx5dr_definer_ones_set;
+ fc->tag_set = &mlx5dr_definer_ib_l4_udp_port_set;
+ DR_CALC_SET(fc, eth_l4, destination_port, inner);
+ }
+ }
+
+ if (!m)
+ return 0;
+
+ if (m->hdr.se || m->hdr.m || m->hdr.padcnt || m->hdr.tver ||
+ m->hdr.pkey || m->hdr.f || m->hdr.b || m->hdr.rsvd0 ||
+ m->hdr.a || m->hdr.rsvd1 || !is_mem_zero(m->hdr.psn, 3)) {
+ rte_errno = ENOTSUP;
+ return rte_errno;
+ }
+
+ if (m->hdr.opcode) {
+ fc = &cd->fc[MLX5DR_DEFINER_FNAME_IB_L4_OPCODE];
+ fc->item_idx = item_idx;
+ fc->tag_set = &mlx5dr_definer_ib_l4_opcode_set;
+ DR_CALC_SET_HDR(fc, ib_l4, opcode);
+ }
+
+ if (!is_mem_zero(m->hdr.dst_qp, 3)) {
+ fc = &cd->fc[MLX5DR_DEFINER_FNAME_IB_L4_QPN];
+ fc->item_idx = item_idx;
+ fc->tag_set = &mlx5dr_definer_ib_l4_qp_set;
+ DR_CALC_SET_HDR(fc, ib_l4, qp);
+ }
+
+ return 0;
+}
+
static int
mlx5dr_definer_conv_items_to_hl(struct mlx5dr_context *ctx,
struct mlx5dr_match_template *mt,
@@ -2182,6 +2252,10 @@ mlx5dr_definer_conv_items_to_hl(struct mlx5dr_context *ctx,
item_flags |= MLX5_FLOW_LAYER_MPLS;
cd.mpls_idx++;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ ret = mlx5dr_definer_conv_item_ib_l4(&cd, items, i);
+ item_flags |= MLX5_FLOW_ITEM_IB_BTH;
+ break;
default:
DR_LOG(ERR, "Unsupported item type %d", items->type);
rte_errno = ENOTSUP;
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.h b/drivers/net/mlx5/hws/mlx5dr_definer.h
index 90ec4ce845..6b645f4cf0 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.h
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.h
@@ -134,6 +134,8 @@ enum mlx5dr_definer_fname {
MLX5DR_DEFINER_FNAME_OKS2_MPLS2_I,
MLX5DR_DEFINER_FNAME_OKS2_MPLS3_I,
MLX5DR_DEFINER_FNAME_OKS2_MPLS4_I,
+ MLX5DR_DEFINER_FNAME_IB_L4_OPCODE,
+ MLX5DR_DEFINER_FNAME_IB_L4_QPN,
MLX5DR_DEFINER_FNAME_MAX,
};
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 853c94af9c..f9e7f844ea 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4969,6 +4969,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
case RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT:
case RTE_FLOW_ITEM_TYPE_ESP:
case RTE_FLOW_ITEM_TYPE_FLEX:
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
break;
case RTE_FLOW_ITEM_TYPE_INTEGRITY:
/*
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [PATCH v3 3/3] net/mlx5/hws: add support for infiniband BTH match
2023-05-25 7:40 ` [PATCH v3 3/3] net/mlx5/hws: " Dong Zhou
@ 2023-05-29 13:36 ` Alex Vesker
0 siblings, 0 replies; 23+ messages in thread
From: Alex Vesker @ 2023-05-29 13:36 UTC (permalink / raw)
To: Bill Zhou, Ori Kam, Slava Ovsiienko,
NBU-Contact-Thomas Monjalon (EXTERNAL),
Matan Azrad, Suanming Mou
Cc: dev, Raslan Darawsheh
Hi,
> -----Original Message-----
> From: Dong Zhou <dongzhou@nvidia.com>
> Sent: Thursday, 25 May 2023 10:41
> To: Ori Kam <orika@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>;
> NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> Matan Azrad <matan@nvidia.com>; Suanming Mou
> <suanmingm@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: [PATCH v3 3/3] net/mlx5/hws: add support for infiniband BTH match
>
> This patch adds support for matching the opcode and dst_qp fields in the
> InfiniBand BTH. Currently, only RoCEv2 packets are supported; the BTH
> match item defaults to matching RoCEv2 packets.
>
> Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
> ---
> [...]
Acked-by: Alex Vesker <valex@nvidia.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v4] ethdev: add flow item for RoCE infiniband BTH
2023-05-25 7:40 ` [PATCH v3 0/3] " Dong Zhou
` (2 preceding siblings ...)
2023-05-25 7:40 ` [PATCH v3 3/3] net/mlx5/hws: " Dong Zhou
@ 2023-05-30 3:06 ` Dong Zhou
2023-05-30 17:46 ` Ferruh Yigit
2023-05-31 3:26 ` [PATCH v5] " Dong Zhou
4 siblings, 1 reply; 23+ messages in thread
From: Dong Zhou @ 2023-05-30 3:06 UTC (permalink / raw)
To: orika, thomas, Aman Singh, Yuying Zhang, Ferruh Yigit,
Andrew Rybchenko, Olivier Matz
Cc: dev
IB(InfiniBand) is a type of networking used in high-performance
computing that provides high throughput and low latency. Like
Ethernet, IB defines a layered protocol (Physical, Link, Network,
Transport Layers). IB provides native support for RDMA(Remote DMA),
an extension of DMA that allows direct access to remote host
memory without CPU intervention. An IB network requires NICs and
switches that support the IB protocol.
RoCE(RDMA over Converged Ethernet) is a network protocol that
allows RDMA to run on Ethernet. RoCE encapsulates IB packets on
Ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
Ethernet link layer protocol; IB packets are encapsulated in the
Ethernet layer and use Ethernet type 0x8915. RoCEv2 is an internet
layer protocol; IB packets are encapsulated in the UDP payload and
use destination port 4791. The format of the RoCEv2 packet is
as follows:
ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
BTH(Base Transport Header) is the IB transport layer header; RoCEv1
and RoCEv2 both contain this header. This patch introduces a new
RTE item to match the IB BTH in RoCE packets. One use of this match
is that the user can monitor RoCEv2's CNP(Congestion Notification
Packet) by matching BTH opcode 0x81.
This patch also adds a testpmd command line to match the RoCEv2
BTH. Usage example:
testpmd> flow create 0 group 1 ingress pattern
eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
dst_qp is 0xd3 / end actions queue index 0 / end
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
v2:
- Change "ethernet" name to "Ethernet" in the commit log.
- Add "RoCE" and "IB" 2 words to words-case.txt.
- Add "rte_byteorder.h" header file in "rte_ib.h" to fix compile errors.
- Add "Acked-by" labels in the first ethdev patch.
v3:
- Do rebase to fix the patch apply failure.
- Add "Acked-by" label in the second net/mlx5 patch.
v4:
- Split this series of patches, only keep the first ethdev patch.
---
app/test-pmd/cmdline_flow.c | 58 +++++++++++++++++
devtools/words-case.txt | 2 +
doc/guides/nics/features/default.ini | 1 +
doc/guides/prog_guide/rte_flow.rst | 7 +++
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 +++
lib/ethdev/rte_flow.c | 1 +
lib/ethdev/rte_flow.h | 27 ++++++++
lib/net/meson.build | 1 +
lib/net/rte_ib.h | 70 +++++++++++++++++++++
9 files changed, 174 insertions(+)
create mode 100644 lib/net/rte_ib.h
diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 58939ec321..3ade229ffc 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -496,6 +496,11 @@ enum index {
ITEM_QUOTA_STATE_NAME,
ITEM_AGGR_AFFINITY,
ITEM_AGGR_AFFINITY_VALUE,
+ ITEM_IB_BTH,
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
/* Validate/create actions. */
ACTIONS,
@@ -1452,6 +1457,7 @@ static const enum index next_item[] = {
ITEM_METER,
ITEM_QUOTA,
ITEM_AGGR_AFFINITY,
+ ITEM_IB_BTH,
END_SET,
ZERO,
};
@@ -1953,6 +1959,15 @@ static const enum index item_aggr_affinity[] = {
ZERO,
};
+static const enum index item_ib_bth[] = {
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
+ ITEM_NEXT,
+ ZERO,
+};
+
static const enum index next_action[] = {
ACTION_END,
ACTION_VOID,
@@ -5523,6 +5538,46 @@ static const struct token token_list[] = {
.call = parse_quota_state_name,
.comp = comp_quota_state_name
},
+ [ITEM_IB_BTH] = {
+ .name = "ib_bth",
+ .help = "match ib bth fields",
+ .priv = PRIV_ITEM(IB_BTH,
+ sizeof(struct rte_flow_item_ib_bth)),
+ .next = NEXT(item_ib_bth),
+ .call = parse_vc,
+ },
+ [ITEM_IB_BTH_OPCODE] = {
+ .name = "opcode",
+ .help = "match ib bth opcode",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.opcode)),
+ },
+ [ITEM_IB_BTH_PKEY] = {
+ .name = "pkey",
+ .help = "partition key",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.pkey)),
+ },
+ [ITEM_IB_BTH_DST_QPN] = {
+ .name = "dst_qp",
+ .help = "destination qp",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.dst_qp)),
+ },
+ [ITEM_IB_BTH_PSN] = {
+ .name = "psn",
+ .help = "packet sequence number",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.psn)),
+ },
/* Validate/create actions. */
[ACTIONS] = {
.name = "actions",
@@ -11849,6 +11904,9 @@ flow_item_default_mask(const struct rte_flow_item *item)
case RTE_FLOW_ITEM_TYPE_AGGR_AFFINITY:
mask = &rte_flow_item_aggr_affinity_mask;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ mask = &rte_flow_item_ib_bth_mask;
+ break;
default:
break;
}
diff --git a/devtools/words-case.txt b/devtools/words-case.txt
index 42c7861b68..5bd34e8b88 100644
--- a/devtools/words-case.txt
+++ b/devtools/words-case.txt
@@ -27,6 +27,7 @@ GENEVE
GTPU
GUID
HW
+IB
ICMP
ID
IO
@@ -74,6 +75,7 @@ QinQ
RDMA
RETA
ROC
+RoCE
RQ
RSS
RVU
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 1a5087abad..1738715e26 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -104,6 +104,7 @@ gtpc =
gtpu =
gtp_psc =
higig2 =
+ib_bth =
icmp =
icmp6 =
icmp6_echo_request =
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 32fc45516a..e2957df71c 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1551,6 +1551,13 @@ Matches flow quota state set by quota action.
- ``state``: Flow quota state
+Item: ``IB_BTH``
+^^^^^^^^^^^^^^^^
+
+Matches an InfiniBand base transport header in a RoCE packet.
+
+- ``hdr``: InfiniBand base transport header definition (``rte_ib.h``).
+
Actions
~~~~~~~
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 8f23847859..4bad244029 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3781,6 +3781,13 @@ This section lists supported pattern items and their attributes, if any.
- ``send_to_kernel``: send packets to kernel.
+- ``ib_bth``: match InfiniBand BTH(base transport header).
+
+ - ``opcode {unsigned}``: Opcode.
+ - ``pkey {unsigned}``: Partition key.
+ - ``dst_qp {unsigned}``: Destination Queue Pair.
+ - ``psn {unsigned}``: Packet Sequence Number.
+
Actions list
^^^^^^^^^^^^
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 69e6e749f7..6e099deca3 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -164,6 +164,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
MK_FLOW_ITEM(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext)),
MK_FLOW_ITEM(QUOTA, sizeof(struct rte_flow_item_quota)),
MK_FLOW_ITEM(AGGR_AFFINITY, sizeof(struct rte_flow_item_aggr_affinity)),
+ MK_FLOW_ITEM(IB_BTH, sizeof(struct rte_flow_item_ib_bth)),
};
/** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 713ba8b65c..2b7f144c27 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -38,6 +38,7 @@
#include <rte_ppp.h>
#include <rte_gre.h>
#include <rte_macsec.h>
+#include <rte_ib.h>
#ifdef __cplusplus
extern "C" {
@@ -672,6 +673,13 @@ enum rte_flow_item_type {
* @see struct rte_flow_item_aggr_affinity.
*/
RTE_FLOW_ITEM_TYPE_AGGR_AFFINITY,
+
+ /**
+ * Matches an InfiniBand base transport header in a RoCE packet.
+ *
+ * See struct rte_flow_item_ib_bth.
+ */
+ RTE_FLOW_ITEM_TYPE_IB_BTH,
};
/**
@@ -2260,6 +2268,25 @@ rte_flow_item_aggr_affinity_mask = {
};
#endif
+/**
+ * RTE_FLOW_ITEM_TYPE_IB_BTH.
+ *
+ * Matches an InfiniBand base transport header in a RoCE packet.
+ */
+struct rte_flow_item_ib_bth {
+ struct rte_ib_bth hdr; /**< InfiniBand base transport header definition. */
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_IB_BTH. */
+#ifndef __cplusplus
+static const struct rte_flow_item_ib_bth rte_flow_item_ib_bth_mask = {
+ .hdr = {
+ .opcode = 0xff,
+ .dst_qp = "\xff\xff\xff",
+ },
+};
+#endif
+
/**
* Action types.
*
diff --git a/lib/net/meson.build b/lib/net/meson.build
index 379d161ee0..b7a0684101 100644
--- a/lib/net/meson.build
+++ b/lib/net/meson.build
@@ -22,6 +22,7 @@ headers = files(
'rte_geneve.h',
'rte_l2tpv2.h',
'rte_ppp.h',
+ 'rte_ib.h',
)
sources = files(
diff --git a/lib/net/rte_ib.h b/lib/net/rte_ib.h
new file mode 100644
index 0000000000..9eab5f9e15
--- /dev/null
+++ b/lib/net/rte_ib.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_IB_H
+#define RTE_IB_H
+
+/**
+ * @file
+ *
+ * InfiniBand header definitions
+ *
+ * The InfiniBand headers are used by RoCE (RDMA over Converged Ethernet).
+ */
+
+#include <stdint.h>
+
+#include <rte_byteorder.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * InfiniBand Base Transport Header according to
+ * IB Specification Vol 1-Release-1.4.
+ */
+__extension__
+struct rte_ib_bth {
+ uint8_t opcode; /**< Opcode. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t tver:4; /**< Transport Header Version. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t se:1; /**< Solicited Event. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t se:1; /**< Solicited Event. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t tver:4; /**< Transport Header Version. */
+#endif
+ rte_be16_t pkey; /**< Partition key. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd0:6; /**< Reserved. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t f:1; /**< FECN. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t f:1; /**< FECN. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t rsvd0:6; /**< Reserved. */
+#endif
+ uint8_t dst_qp[3]; /**< Destination QP */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd1:7; /**< Reserved. */
+ uint8_t a:1; /**< Acknowledge Request. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t a:1; /**< Acknowledge Request. */
+ uint8_t rsvd1:7; /**< Reserved. */
+#endif
+ uint8_t psn[3]; /**< Packet Sequence Number */
+} __rte_packed;
+
+/** RoCEv2 default port. */
+#define RTE_ROCEV2_DEFAULT_PORT 4791
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_IB_H */
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4] ethdev: add flow item for RoCE infiniband BTH
2023-05-30 3:06 ` [PATCH v4] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
@ 2023-05-30 17:46 ` Ferruh Yigit
2023-05-31 3:22 ` Dong Zhou
0 siblings, 1 reply; 23+ messages in thread
From: Ferruh Yigit @ 2023-05-30 17:46 UTC (permalink / raw)
To: Dong Zhou, orika, thomas, Aman Singh, Yuying Zhang,
Andrew Rybchenko, Olivier Matz
Cc: dev
On 5/30/2023 4:06 AM, Dong Zhou wrote:
> [...]
Patch looks good, can you please add a release notes update too?
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [PATCH v4] ethdev: add flow item for RoCE infiniband BTH
2023-05-30 17:46 ` Ferruh Yigit
@ 2023-05-31 3:22 ` Dong Zhou
0 siblings, 0 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-31 3:22 UTC (permalink / raw)
To: orika, thomas; +Cc: dev
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, May 31, 2023 1:46 AM
> To: Dong Zhou <dongzhou@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-
> Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v4] ethdev: add flow item for RoCE infiniband BTH
>
> On 5/30/2023 4:06 AM, Dong Zhou wrote:
> > [...]
>
> Patch looks good, can you please add a release notes update too?
Sure, will send the v5 patch to update it.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v5] ethdev: add flow item for RoCE infiniband BTH
2023-05-25 7:40 ` [PATCH v3 0/3] " Dong Zhou
` (3 preceding siblings ...)
2023-05-30 3:06 ` [PATCH v4] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
@ 2023-05-31 3:26 ` Dong Zhou
2023-05-31 8:01 ` [PATCH v6] " Dong Zhou
2023-05-31 8:47 ` [PATCH v5] " Ferruh Yigit
4 siblings, 2 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-31 3:26 UTC (permalink / raw)
To: orika, thomas, Aman Singh, Yuying Zhang, Ferruh Yigit,
Andrew Rybchenko, Olivier Matz
Cc: dev
IB(InfiniBand) is a type of networking used in high-performance
computing that provides high throughput and low latency. Like
Ethernet, IB defines a layered protocol (Physical, Link, Network,
Transport Layers). IB provides native support for RDMA(Remote DMA),
an extension of DMA that allows direct access to remote host
memory without CPU intervention. An IB network requires NICs and
switches that support the IB protocol.
RoCE(RDMA over Converged Ethernet) is a network protocol that
allows RDMA to run on Ethernet. RoCE encapsulates IB packets on
Ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
Ethernet link layer protocol; IB packets are encapsulated in the
Ethernet layer and use Ethernet type 0x8915. RoCEv2 is an internet
layer protocol; IB packets are encapsulated in the UDP payload and
use destination port 4791. The format of the RoCEv2 packet is
as follows:
ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
BTH(Base Transport Header) is the IB transport layer header; RoCEv1
and RoCEv2 both contain this header. This patch introduces a new
RTE item to match the IB BTH in RoCE packets. One use of this match
is that the user can monitor RoCEv2's CNP(Congestion Notification
Packet) by matching BTH opcode 0x81.
This patch also adds a testpmd command line to match the RoCEv2
BTH. Usage example:
testpmd> flow create 0 group 1 ingress pattern
eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
dst_qp is 0xd3 / end actions queue index 0 / end
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
v2:
- Change "ethernet" name to "Ethernet" in the commit log.
- Add "RoCE" and "IB" 2 words to words-case.txt.
- Add "rte_byteorder.h" header file in "rte_ib.h" to fix compile errors.
- Add "Acked-by" labels in the first ethdev patch.
v3:
- Do rebase to fix the patch apply failure.
- Add "Acked-by" label in the second net/mlx5 patch.
v4:
- Split this series of patches, only keep the first ethdev patch.
v5:
- Update the release notes.
- Update the doxy-api-index.md file.
---
app/test-pmd/cmdline_flow.c | 58 +++++++++++++++++
devtools/words-case.txt | 2 +
doc/api/doxy-api-index.md | 3 +-
doc/guides/nics/features/default.ini | 1 +
doc/guides/prog_guide/rte_flow.rst | 7 +++
doc/guides/rel_notes/release_23_07.rst | 3 +
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 +++
lib/ethdev/rte_flow.c | 1 +
lib/ethdev/rte_flow.h | 27 ++++++++
lib/net/meson.build | 1 +
lib/net/rte_ib.h | 70 +++++++++++++++++++++
11 files changed, 179 insertions(+), 1 deletion(-)
create mode 100644 lib/net/rte_ib.h
diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 58939ec321..3ade229ffc 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -496,6 +496,11 @@ enum index {
ITEM_QUOTA_STATE_NAME,
ITEM_AGGR_AFFINITY,
ITEM_AGGR_AFFINITY_VALUE,
+ ITEM_IB_BTH,
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
/* Validate/create actions. */
ACTIONS,
@@ -1452,6 +1457,7 @@ static const enum index next_item[] = {
ITEM_METER,
ITEM_QUOTA,
ITEM_AGGR_AFFINITY,
+ ITEM_IB_BTH,
END_SET,
ZERO,
};
@@ -1953,6 +1959,15 @@ static const enum index item_aggr_affinity[] = {
ZERO,
};
+static const enum index item_ib_bth[] = {
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
+ ITEM_NEXT,
+ ZERO,
+};
+
static const enum index next_action[] = {
ACTION_END,
ACTION_VOID,
@@ -5523,6 +5538,46 @@ static const struct token token_list[] = {
.call = parse_quota_state_name,
.comp = comp_quota_state_name
},
+ [ITEM_IB_BTH] = {
+ .name = "ib_bth",
+ .help = "match ib bth fields",
+ .priv = PRIV_ITEM(IB_BTH,
+ sizeof(struct rte_flow_item_ib_bth)),
+ .next = NEXT(item_ib_bth),
+ .call = parse_vc,
+ },
+ [ITEM_IB_BTH_OPCODE] = {
+ .name = "opcode",
+ .help = "match ib bth opcode",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.opcode)),
+ },
+ [ITEM_IB_BTH_PKEY] = {
+ .name = "pkey",
+ .help = "partition key",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.pkey)),
+ },
+ [ITEM_IB_BTH_DST_QPN] = {
+ .name = "dst_qp",
+ .help = "destination qp",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.dst_qp)),
+ },
+ [ITEM_IB_BTH_PSN] = {
+ .name = "psn",
+ .help = "packet sequence number",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.psn)),
+ },
/* Validate/create actions. */
[ACTIONS] = {
.name = "actions",
@@ -11849,6 +11904,9 @@ flow_item_default_mask(const struct rte_flow_item *item)
case RTE_FLOW_ITEM_TYPE_AGGR_AFFINITY:
mask = &rte_flow_item_aggr_affinity_mask;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ mask = &rte_flow_item_ib_bth_mask;
+ break;
default:
break;
}
diff --git a/devtools/words-case.txt b/devtools/words-case.txt
index 42c7861b68..5bd34e8b88 100644
--- a/devtools/words-case.txt
+++ b/devtools/words-case.txt
@@ -27,6 +27,7 @@ GENEVE
GTPU
GUID
HW
+IB
ICMP
ID
IO
@@ -74,6 +75,7 @@ QinQ
RDMA
RETA
ROC
+RoCE
RQ
RSS
RVU
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index c709fd48ad..a98439c4fb 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -127,7 +127,8 @@ The public API headers are grouped by topics:
[Geneve](@ref rte_geneve.h),
[eCPRI](@ref rte_ecpri.h),
[L2TPv2](@ref rte_l2tpv2.h),
- [PPP](@ref rte_ppp.h)
+ [PPP](@ref rte_ppp.h),
+ [IB](@ref rte_ib.h)
- **QoS**:
[metering](@ref rte_meter.h),
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 1a5087abad..1738715e26 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -104,6 +104,7 @@ gtpc =
gtpu =
gtp_psc =
higig2 =
+ib_bth =
icmp =
icmp6 =
icmp6_echo_request =
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 32fc45516a..e2957df71c 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1551,6 +1551,13 @@ Matches flow quota state set by quota action.
- ``state``: Flow quota state
+Item: ``IB_BTH``
+^^^^^^^^^^^^^^^^
+
+Matches an InfiniBand base transport header in a RoCE packet.
+
+- ``hdr``: InfiniBand base transport header definition (``rte_ib.h``).
+
Actions
~~~~~~~
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index a9b1293689..ac27de5761 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -54,6 +54,9 @@ New Features
This section is a comment. Do not overwrite or remove it.
Also, make sure to start the actual text at the margin.
=======================================================
+* **Added flow matching of InfiniBand BTH.**
+
+ Added ``RTE_FLOW_ITEM_TYPE_IB_BTH`` to match InfiniBand BTH fields.
Removed Items
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 8f23847859..4bad244029 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3781,6 +3781,13 @@ This section lists supported pattern items and their attributes, if any.
- ``send_to_kernel``: send packets to kernel.
+- ``ib_bth``: match InfiniBand BTH (base transport header).
+
+ - ``opcode {unsigned}``: Opcode.
+ - ``pkey {unsigned}``: Partition key.
+ - ``dst_qp {unsigned}``: Destination Queue Pair.
+ - ``psn {unsigned}``: Packet Sequence Number.
+
Actions list
^^^^^^^^^^^^
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 69e6e749f7..6e099deca3 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -164,6 +164,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
MK_FLOW_ITEM(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext)),
MK_FLOW_ITEM(QUOTA, sizeof(struct rte_flow_item_quota)),
MK_FLOW_ITEM(AGGR_AFFINITY, sizeof(struct rte_flow_item_aggr_affinity)),
+ MK_FLOW_ITEM(IB_BTH, sizeof(struct rte_flow_item_ib_bth)),
};
/** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 713ba8b65c..2b7f144c27 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -38,6 +38,7 @@
#include <rte_ppp.h>
#include <rte_gre.h>
#include <rte_macsec.h>
+#include <rte_ib.h>
#ifdef __cplusplus
extern "C" {
@@ -672,6 +673,13 @@ enum rte_flow_item_type {
* @see struct rte_flow_item_aggr_affinity.
*/
RTE_FLOW_ITEM_TYPE_AGGR_AFFINITY,
+
+ /**
+ * Matches an InfiniBand base transport header in a RoCE packet.
+ *
+ * See struct rte_flow_item_ib_bth.
+ */
+ RTE_FLOW_ITEM_TYPE_IB_BTH,
};
/**
@@ -2260,6 +2268,25 @@ rte_flow_item_aggr_affinity_mask = {
};
#endif
+/**
+ * RTE_FLOW_ITEM_TYPE_IB_BTH.
+ *
+ * Matches an InfiniBand base transport header in a RoCE packet.
+ */
+struct rte_flow_item_ib_bth {
+ struct rte_ib_bth hdr; /**< InfiniBand base transport header definition. */
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_IB_BTH. */
+#ifndef __cplusplus
+static const struct rte_flow_item_ib_bth rte_flow_item_ib_bth_mask = {
+ .hdr = {
+ .opcode = 0xff,
+ .dst_qp = "\xff\xff\xff",
+ },
+};
+#endif
+
/**
* Action types.
*
diff --git a/lib/net/meson.build b/lib/net/meson.build
index 379d161ee0..b7a0684101 100644
--- a/lib/net/meson.build
+++ b/lib/net/meson.build
@@ -22,6 +22,7 @@ headers = files(
'rte_geneve.h',
'rte_l2tpv2.h',
'rte_ppp.h',
+ 'rte_ib.h',
)
sources = files(
diff --git a/lib/net/rte_ib.h b/lib/net/rte_ib.h
new file mode 100644
index 0000000000..9eab5f9e15
--- /dev/null
+++ b/lib/net/rte_ib.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_IB_H
+#define RTE_IB_H
+
+/**
+ * @file
+ *
+ * InfiniBand header definitions
+ *
+ * The InfiniBand headers are used by RoCE (RDMA over Converged Ethernet).
+ */
+
+#include <stdint.h>
+
+#include <rte_byteorder.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * InfiniBand Base Transport Header according to
+ * IB Specification Vol 1-Release-1.4.
+ */
+__extension__
+struct rte_ib_bth {
+ uint8_t opcode; /**< Opcode. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t tver:4; /**< Transport Header Version. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t se:1; /**< Solicited Event. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t se:1; /**< Solicited Event. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t tver:4; /**< Transport Header Version. */
+#endif
+ rte_be16_t pkey; /**< Partition key. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd0:6; /**< Reserved. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t f:1; /**< FECN. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t f:1; /**< FECN. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t rsvd0:6; /**< Reserved. */
+#endif
+ uint8_t dst_qp[3]; /**< Destination QP. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd1:7; /**< Reserved. */
+ uint8_t a:1; /**< Acknowledge Request. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t a:1; /**< Acknowledge Request. */
+ uint8_t rsvd1:7; /**< Reserved. */
+#endif
+ uint8_t psn[3]; /**< Packet Sequence Number. */
+} __rte_packed;
+
+/** RoCEv2 default port. */
+#define RTE_ROCEV2_DEFAULT_PORT 4791
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_IB_H */
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
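The rte_ib.h header introduced above contains plain protocol definitions, so it is also usable for software-side classification outside of rte_flow. The sketch below is illustrative only (not part of the patch): it assumes an untagged IPv4 packet with no IP options, and the helper names are hypothetical.

#include <netinet/in.h>

#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_udp.h>
#include <rte_mbuf.h>
#include <rte_ib.h>

/* Return the BTH of a RoCEv2 packet, or NULL if the mbuf is not RoCEv2.
 * Assumes untagged IPv4 without options (illustration only). */
static const struct rte_ib_bth *
rocev2_bth(const struct rte_mbuf *m)
{
        const struct rte_ether_hdr *eth;
        const struct rte_ipv4_hdr *ip;
        const struct rte_udp_hdr *udp;

        eth = rte_pktmbuf_mtod(m, const struct rte_ether_hdr *);
        if (eth->ether_type != rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4))
                return NULL;
        ip = (const struct rte_ipv4_hdr *)(eth + 1);
        if (ip->next_proto_id != IPPROTO_UDP)
                return NULL;
        udp = (const struct rte_udp_hdr *)(ip + 1);
        if (udp->dst_port != rte_cpu_to_be_16(RTE_ROCEV2_DEFAULT_PORT))
                return NULL;
        return (const struct rte_ib_bth *)(udp + 1);
}

/* dst_qp and psn are 3-byte big-endian arrays rather than integers;
 * a small helper reads one as a host-order value. */
static uint32_t
ib_u24_to_host(const uint8_t b[3])
{
        return ((uint32_t)b[0] << 16) | ((uint32_t)b[1] << 8) | b[2];
}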
* [PATCH v6] ethdev: add flow item for RoCE infiniband BTH
2023-05-31 3:26 ` [PATCH v5] " Dong Zhou
@ 2023-05-31 8:01 ` Dong Zhou
2023-05-31 8:47 ` [PATCH v5] " Ferruh Yigit
1 sibling, 0 replies; 23+ messages in thread
From: Dong Zhou @ 2023-05-31 8:01 UTC (permalink / raw)
To: orika, thomas, Aman Singh, Yuying Zhang, Ferruh Yigit,
Andrew Rybchenko, Olivier Matz
Cc: dev
IB (InfiniBand) is a networking technology used in high-performance
computing, offering high throughput and low latency. Like Ethernet,
IB defines a layered protocol (Physical, Link, Network and Transport
layers). IB provides native support for RDMA (Remote DMA), an
extension of DMA that allows direct access to remote host memory
without CPU intervention. An IB network requires NICs and switches
that support the IB protocol.
RoCE (RDMA over Converged Ethernet) is a network protocol that
allows RDMA to run over Ethernet. RoCE encapsulates IB packets in
Ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
Ethernet link layer protocol: IB packets are encapsulated directly
in Ethernet frames with Ethernet type 0x8915. RoCEv2 is an internet
layer protocol: IB packets are encapsulated in a UDP payload with
destination port 4791. The format of a RoCEv2 packet is as follows:
ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
BTH (Base Transport Header) is the IB transport layer header; both
RoCEv1 and RoCEv2 contain it. This patch introduces a new RTE item
to match the IB BTH in RoCE packets. One use of this match is that
the user can monitor RoCEv2 CNPs (Congestion Notification Packets)
by matching BTH opcode 0x81.
This patch also adds the testpmd command line to match the RoCEv2
BTH. Usage example:
testpmd> flow create 0 group 1 ingress pattern
eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
dst_qp is 0xd3 / end actions queue index 0 / end
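For applications that program rules through the C API rather than
testpmd, an equivalent rule might be built as in the following sketch
(illustrative, not part of the patch; the function name is hypothetical,
error handling is trimmed, and it assumes the port is started and its
PMD supports the new item):

#include <rte_byteorder.h>
#include <rte_flow.h>

/* Sketch: steer RoCEv2 CNPs (BTH opcode 0x81, dst_qp 0xd3) to queue 0. */
static struct rte_flow *
create_cnp_rule(uint16_t port_id, struct rte_flow_error *error)
{
        struct rte_flow_attr attr = { .group = 1, .ingress = 1 };
        struct rte_flow_item_udp udp_spec = { .hdr.dst_port = RTE_BE16(4791) };
        /* Mask only the destination port, unlike the default UDP mask. */
        struct rte_flow_item_udp udp_mask = { .hdr.dst_port = RTE_BE16(0xffff) };
        struct rte_flow_item_ib_bth bth_spec = {
                .hdr.opcode = 0x81,
                .hdr.dst_qp = "\x00\x00\xd3", /* 24-bit big-endian QP number */
        };
        struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH },
                { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
                { .type = RTE_FLOW_ITEM_TYPE_UDP,
                  .spec = &udp_spec, .mask = &udp_mask },
                { .type = RTE_FLOW_ITEM_TYPE_IB_BTH,
                  .spec = &bth_spec, .mask = &rte_flow_item_ib_bth_mask },
                { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_queue queue = { .index = 0 };
        struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, error);
}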
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
v2:
- Change "ethernet" name to "Ethernet" in the commit log.
- Add "RoCE" and "IB" 2 words to words-case.txt.
- Add "rte_byteorder.h" header file in "rte_ib.h" to fix compile errors.
- Add "Acked-by" labels in the first ethdev patch.
v3:
- Rebase to fix the patch apply failure.
- Add "Acked-by" label in the second net/mlx5 patch.
v4:
- Split this series of patches, keeping only the first ethdev patch.
v5:
- Update the release notes.
- Update the doxy-api-index.md file.
v6:
- Fix warning in release_23_07.rst.
---
app/test-pmd/cmdline_flow.c | 58 +++++++++++++++++
devtools/words-case.txt | 2 +
doc/api/doxy-api-index.md | 3 +-
doc/guides/nics/features/default.ini | 1 +
doc/guides/prog_guide/rte_flow.rst | 7 +++
doc/guides/rel_notes/release_23_07.rst | 4 ++
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 +++
lib/ethdev/rte_flow.c | 1 +
lib/ethdev/rte_flow.h | 27 ++++++++
lib/net/meson.build | 1 +
lib/net/rte_ib.h | 70 +++++++++++++++++++++
11 files changed, 180 insertions(+), 1 deletion(-)
create mode 100644 lib/net/rte_ib.h
diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 58939ec321..3ade229ffc 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -496,6 +496,11 @@ enum index {
ITEM_QUOTA_STATE_NAME,
ITEM_AGGR_AFFINITY,
ITEM_AGGR_AFFINITY_VALUE,
+ ITEM_IB_BTH,
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
/* Validate/create actions. */
ACTIONS,
@@ -1452,6 +1457,7 @@ static const enum index next_item[] = {
ITEM_METER,
ITEM_QUOTA,
ITEM_AGGR_AFFINITY,
+ ITEM_IB_BTH,
END_SET,
ZERO,
};
@@ -1953,6 +1959,15 @@ static const enum index item_aggr_affinity[] = {
ZERO,
};
+static const enum index item_ib_bth[] = {
+ ITEM_IB_BTH_OPCODE,
+ ITEM_IB_BTH_PKEY,
+ ITEM_IB_BTH_DST_QPN,
+ ITEM_IB_BTH_PSN,
+ ITEM_NEXT,
+ ZERO,
+};
+
static const enum index next_action[] = {
ACTION_END,
ACTION_VOID,
@@ -5523,6 +5538,46 @@ static const struct token token_list[] = {
.call = parse_quota_state_name,
.comp = comp_quota_state_name
},
+ [ITEM_IB_BTH] = {
+ .name = "ib_bth",
+ .help = "match ib bth fields",
+ .priv = PRIV_ITEM(IB_BTH,
+ sizeof(struct rte_flow_item_ib_bth)),
+ .next = NEXT(item_ib_bth),
+ .call = parse_vc,
+ },
+ [ITEM_IB_BTH_OPCODE] = {
+ .name = "opcode",
+ .help = "match ib bth opcode",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.opcode)),
+ },
+ [ITEM_IB_BTH_PKEY] = {
+ .name = "pkey",
+ .help = "partition key",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.pkey)),
+ },
+ [ITEM_IB_BTH_DST_QPN] = {
+ .name = "dst_qp",
+ .help = "destination qp",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.dst_qp)),
+ },
+ [ITEM_IB_BTH_PSN] = {
+ .name = "psn",
+ .help = "packet sequence number",
+ .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+ item_param),
+ .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+ hdr.psn)),
+ },
/* Validate/create actions. */
[ACTIONS] = {
.name = "actions",
@@ -11849,6 +11904,9 @@ flow_item_default_mask(const struct rte_flow_item *item)
case RTE_FLOW_ITEM_TYPE_AGGR_AFFINITY:
mask = &rte_flow_item_aggr_affinity_mask;
break;
+ case RTE_FLOW_ITEM_TYPE_IB_BTH:
+ mask = &rte_flow_item_ib_bth_mask;
+ break;
default:
break;
}
diff --git a/devtools/words-case.txt b/devtools/words-case.txt
index 42c7861b68..5bd34e8b88 100644
--- a/devtools/words-case.txt
+++ b/devtools/words-case.txt
@@ -27,6 +27,7 @@ GENEVE
GTPU
GUID
HW
+IB
ICMP
ID
IO
@@ -74,6 +75,7 @@ QinQ
RDMA
RETA
ROC
+RoCE
RQ
RSS
RVU
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index c709fd48ad..a98439c4fb 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -127,7 +127,8 @@ The public API headers are grouped by topics:
[Geneve](@ref rte_geneve.h),
[eCPRI](@ref rte_ecpri.h),
[L2TPv2](@ref rte_l2tpv2.h),
- [PPP](@ref rte_ppp.h)
+ [PPP](@ref rte_ppp.h),
+ [IB](@ref rte_ib.h)
- **QoS**:
[metering](@ref rte_meter.h),
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 1a5087abad..1738715e26 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -104,6 +104,7 @@ gtpc =
gtpu =
gtp_psc =
higig2 =
+ib_bth =
icmp =
icmp6 =
icmp6_echo_request =
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 32fc45516a..e2957df71c 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1551,6 +1551,13 @@ Matches flow quota state set by quota action.
- ``state``: Flow quota state
+Item: ``IB_BTH``
+^^^^^^^^^^^^^^^^
+
+Matches an InfiniBand base transport header in a RoCE packet.
+
+- ``hdr``: InfiniBand base transport header definition (``rte_ib.h``).
+
Actions
~~~~~~~
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index a9b1293689..cc9ee8e170 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -55,6 +55,10 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Added flow matching of InfiniBand BTH.**
+
+ Added ``RTE_FLOW_ITEM_TYPE_IB_BTH`` to match InfiniBand BTH fields.
+
Removed Items
-------------
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 8f23847859..4bad244029 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3781,6 +3781,13 @@ This section lists supported pattern items and their attributes, if any.
- ``send_to_kernel``: send packets to kernel.
+- ``ib_bth``: match InfiniBand BTH (base transport header).
+
+ - ``opcode {unsigned}``: Opcode.
+ - ``pkey {unsigned}``: Partition key.
+ - ``dst_qp {unsigned}``: Destination Queue Pair.
+ - ``psn {unsigned}``: Packet Sequence Number.
+
Actions list
^^^^^^^^^^^^
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 69e6e749f7..6e099deca3 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -164,6 +164,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
MK_FLOW_ITEM(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext)),
MK_FLOW_ITEM(QUOTA, sizeof(struct rte_flow_item_quota)),
MK_FLOW_ITEM(AGGR_AFFINITY, sizeof(struct rte_flow_item_aggr_affinity)),
+ MK_FLOW_ITEM(IB_BTH, sizeof(struct rte_flow_item_ib_bth)),
};
/** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 713ba8b65c..2b7f144c27 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -38,6 +38,7 @@
#include <rte_ppp.h>
#include <rte_gre.h>
#include <rte_macsec.h>
+#include <rte_ib.h>
#ifdef __cplusplus
extern "C" {
@@ -672,6 +673,13 @@ enum rte_flow_item_type {
* @see struct rte_flow_item_aggr_affinity.
*/
RTE_FLOW_ITEM_TYPE_AGGR_AFFINITY,
+
+ /**
+ * Matches an InfiniBand base transport header in a RoCE packet.
+ *
+ * See struct rte_flow_item_ib_bth.
+ */
+ RTE_FLOW_ITEM_TYPE_IB_BTH,
};
/**
@@ -2260,6 +2268,25 @@ rte_flow_item_aggr_affinity_mask = {
};
#endif
+/**
+ * RTE_FLOW_ITEM_TYPE_IB_BTH.
+ *
+ * Matches an InfiniBand base transport header in a RoCE packet.
+ */
+struct rte_flow_item_ib_bth {
+ struct rte_ib_bth hdr; /**< InfiniBand base transport header definition. */
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_IB_BTH. */
+#ifndef __cplusplus
+static const struct rte_flow_item_ib_bth rte_flow_item_ib_bth_mask = {
+ .hdr = {
+ .opcode = 0xff,
+ .dst_qp = "\xff\xff\xff",
+ },
+};
+#endif
+
/**
* Action types.
*
diff --git a/lib/net/meson.build b/lib/net/meson.build
index 379d161ee0..b7a0684101 100644
--- a/lib/net/meson.build
+++ b/lib/net/meson.build
@@ -22,6 +22,7 @@ headers = files(
'rte_geneve.h',
'rte_l2tpv2.h',
'rte_ppp.h',
+ 'rte_ib.h',
)
sources = files(
diff --git a/lib/net/rte_ib.h b/lib/net/rte_ib.h
new file mode 100644
index 0000000000..9eab5f9e15
--- /dev/null
+++ b/lib/net/rte_ib.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_IB_H
+#define RTE_IB_H
+
+/**
+ * @file
+ *
+ * InfiniBand header definitions
+ *
+ * The InfiniBand headers are used by RoCE (RDMA over Converged Ethernet).
+ */
+
+#include <stdint.h>
+
+#include <rte_byteorder.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * InfiniBand Base Transport Header according to
+ * IB Specification Vol 1-Release-1.4.
+ */
+__extension__
+struct rte_ib_bth {
+ uint8_t opcode; /**< Opcode. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t tver:4; /**< Transport Header Version. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t se:1; /**< Solicited Event. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t se:1; /**< Solicited Event. */
+ uint8_t m:1; /**< MigReq. */
+ uint8_t padcnt:2; /**< Pad Count. */
+ uint8_t tver:4; /**< Transport Header Version. */
+#endif
+ rte_be16_t pkey; /**< Partition key. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd0:6; /**< Reserved. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t f:1; /**< FECN. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t f:1; /**< FECN. */
+ uint8_t b:1; /**< BECN. */
+ uint8_t rsvd0:6; /**< Reserved. */
+#endif
+ uint8_t dst_qp[3]; /**< Destination QP. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+ uint8_t rsvd1:7; /**< Reserved. */
+ uint8_t a:1; /**< Acknowledge Request. */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+ uint8_t a:1; /**< Acknowledge Request. */
+ uint8_t rsvd1:7; /**< Reserved. */
+#endif
+ uint8_t psn[3]; /**< Packet Sequence Number. */
+} __rte_packed;
+
+/** RoCEv2 default port. */
+#define RTE_ROCEV2_DEFAULT_PORT 4791
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_IB_H */
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
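A caveat on the default mask defined above: it covers both the opcode
and the full 24-bit destination QP, so a C-API rule that sets only the
opcode in its spec while reusing rte_flow_item_ib_bth_mask would also
constrain dst_qp to zero. Matching CNPs on any QP needs an opcode-only
mask, as in this sketch (variable names are illustrative, and it assumes
the PMD accepts partial masks):

#include <rte_flow.h>

/* Opcode-only mask: match CNPs (opcode 0x81) on any destination QP. */
static const struct rte_flow_item_ib_bth bth_opcode_mask = {
        .hdr.opcode = 0xff,
};
static const struct rte_flow_item_ib_bth cnp_spec = {
        .hdr.opcode = 0x81,
};
static const struct rte_flow_item bth_any_qp_item = {
        .type = RTE_FLOW_ITEM_TYPE_IB_BTH,
        .spec = &cnp_spec,
        .mask = &bth_opcode_mask,
};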
* Re: [PATCH v5] ethdev: add flow item for RoCE infiniband BTH
2023-05-31 3:26 ` [PATCH v5] " Dong Zhou
2023-05-31 8:01 ` [PATCH v6] " Dong Zhou
@ 2023-05-31 8:47 ` Ferruh Yigit
1 sibling, 0 replies; 23+ messages in thread
From: Ferruh Yigit @ 2023-05-31 8:47 UTC (permalink / raw)
To: Dong Zhou, orika, thomas, Aman Singh, Yuying Zhang,
Andrew Rybchenko, Olivier Matz
Cc: dev
On 5/31/2023 4:26 AM, Dong Zhou wrote:
> IB (InfiniBand) is a networking technology used in high-performance
> computing, offering high throughput and low latency. Like Ethernet,
> IB defines a layered protocol (Physical, Link, Network and Transport
> layers). IB provides native support for RDMA (Remote DMA), an
> extension of DMA that allows direct access to remote host memory
> without CPU intervention. An IB network requires NICs and switches
> that support the IB protocol.
>
> RoCE (RDMA over Converged Ethernet) is a network protocol that
> allows RDMA to run over Ethernet. RoCE encapsulates IB packets in
> Ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
> Ethernet link layer protocol: IB packets are encapsulated directly
> in Ethernet frames with Ethernet type 0x8915. RoCEv2 is an internet
> layer protocol: IB packets are encapsulated in a UDP payload with
> destination port 4791. The format of a RoCEv2 packet is as follows:
> ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
>
> BTH (Base Transport Header) is the IB transport layer header; both
> RoCEv1 and RoCEv2 contain it. This patch introduces a new RTE item
> to match the IB BTH in RoCE packets. One use of this match is that
> the user can monitor RoCEv2 CNPs (Congestion Notification Packets)
> by matching BTH opcode 0x81.
>
> This patch also adds the testpmd command line to match the RoCEv2
> BTH. Usage example:
>
> testpmd> flow create 0 group 1 ingress pattern
> eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
> dst_qp is 0xd3 / end actions queue index 0 / end
>
> Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
> Acked-by: Ori Kam <orika@nvidia.com>
> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>
> v2:
> - Change "ethernet" name to "Ethernet" in the commit log.
> - Add "RoCE" and "IB" 2 words to words-case.txt.
> - Add "rte_byteorder.h" header file in "rte_ib.h" to fix compile errors.
> - Add "Acked-by" labels in the first ethdev patch.
>
> v3:
> - Rebase to fix the patch apply failure.
> - Add "Acked-by" label in the second net/mlx5 patch.
>
> v4:
> - Split this series of patches, keeping only the first ethdev patch.
>
> v5:
> - Update the release notes.
> - Update the doxy-api-index.md file.
>
Applied to dpdk-next-net/main, thanks.
(release notes warning fixed while merging.)
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread (newest: 2023-05-31 8:47 UTC)
Thread overview: 23+ messages
2023-05-11 7:55 [PATCH v1 0/3] add support for infiniband BTH match Dong Zhou
2023-05-11 7:55 ` [PATCH v1 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
2023-05-17 17:06 ` Ori Kam
2023-05-22 7:01 ` Andrew Rybchenko
2023-05-24 6:58 ` Bill Zhou
2023-05-11 7:55 ` [PATCH v1 2/3] net/mlx5: add support for infiniband BTH match Dong Zhou
2023-05-11 7:55 ` [PATCH v1 3/3] net/mlx5/hws: " Dong Zhou
2023-05-24 10:08 ` [PATCH v2 0/3] " Dong Zhou
2023-05-24 10:08 ` [PATCH v2 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
2023-05-24 10:08 ` [PATCH v2 2/3] net/mlx5: add support for infiniband BTH match Dong Zhou
2023-05-24 12:54 ` Ori Kam
2023-05-24 10:08 ` [PATCH v2 3/3] net/mlx5/hws: " Dong Zhou
2023-05-25 7:40 ` [PATCH v3 0/3] " Dong Zhou
2023-05-25 7:40 ` [PATCH v3 1/3] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
2023-05-25 7:40 ` [PATCH v3 2/3] net/mlx5: add support for infiniband BTH match Dong Zhou
2023-05-25 7:40 ` [PATCH v3 3/3] net/mlx5/hws: " Dong Zhou
2023-05-29 13:36 ` Alex Vesker
2023-05-30 3:06 ` [PATCH v4] ethdev: add flow item for RoCE infiniband BTH Dong Zhou
2023-05-30 17:46 ` Ferruh Yigit
2023-05-31 3:22 ` Dong Zhou
2023-05-31 3:26 ` [PATCH v5] " Dong Zhou
2023-05-31 8:01 ` [PATCH v6] " Dong Zhou
2023-05-31 8:47 ` [PATCH v5] " Ferruh Yigit